
CP for LLMs #65

Closed
4 tasks done
pat-alt opened this issue Jun 1, 2023 · 4 comments · Fixed by #70
Assignee: pat-alt
Labels: enhancement (New feature or request), medium (This is expected to be medium.)

Comments

pat-alt commented Jun 1, 2023

Issue set up for Experiment Week

  • Study this paper
  • Get data from Hugging Face
  • Train a small transformer model from scratch
  • Look at fine-tuning a pre-trained model
pat-alt added the enhancement and medium labels on Jun 1, 2023
pat-alt self-assigned this on Jun 1, 2023
pat-alt commented Jun 1, 2023

cc @florisdenhengst

pat-alt commented Jun 12, 2023

Notes from the paper

  • Context: LLMs for the task of multiple-choice question answering (MCQA)
  • Uncertainty, as estimated through CP, is highly correlated with model accuracy
  • Works with inductive/split conformal prediction to avoid retraining
  • Model: LLaMA-13B
  • Data: MMLU benchmark containing MCQA questions from 57 domains, with 4 possible answers each
  • Finds that average set size is higher (closer to 4) for more difficult domains
  • Finds that set size negatively correlates with top-1 accuracy and argues that this can be used to filter low-quality predictions
  • Also finds fairly robust size-stratified coverage, even though they do not appear to have used adaptive prediction sets
  • Shows that coverage guarantees do not hold if the exchangeability assumption is violated (see below)

[Figure: coverage breaks down when the exchangeability assumption is violated]
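For reference, the split/inductive CP procedure from the notes above can be sketched in a few lines. This is an illustrative toy example, not the paper's code: the softmax scores here are randomly generated stand-ins for LLM answer probabilities, and the nonconformity score is the common LAC choice (one minus the probability of the true answer).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for LLM softmax scores over the 4 MCQA options:
# rows are questions, columns are answer choices.
n_cal, n_test, n_choices = 500, 5, 4
cal_probs = rng.dirichlet(np.ones(n_choices), size=n_cal)
cal_labels = rng.integers(0, n_choices, size=n_cal)
test_probs = rng.dirichlet(np.ones(n_choices), size=n_test)

alpha = 0.1  # target miscoverage: aim for >= 90% marginal coverage

# Split CP needs only a held-out calibration set, no retraining:
# nonconformity score s = 1 - p(true answer).
cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(cal_scores, q_level, method="higher")

# Prediction set: all answers whose score does not exceed the threshold.
pred_sets = [np.where(1.0 - p <= qhat)[0] for p in test_probs]
set_sizes = [len(s) for s in pred_sets]  # larger sets ~ harder questions
```

The `set_sizes` values are what the paper correlates with difficulty and top-1 accuracy; note the guarantee is marginal and relies on exchangeability between calibration and test questions.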

pat-alt commented Jun 12, 2023

For work in Julia, we probably want to look at Transformers.jl

pat-alt commented Jun 12, 2023

Idea: for small dataset/model, check if we can use conformal training
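The core trick behind conformal training is making the prediction-set size differentiable so it can be penalized during training. A minimal sketch of that relaxation (my own illustrative code, with a hypothetical LAC-style score and a hand-picked threshold, not the method's full training loop):

```python
import numpy as np

def soft_set_size(probs, tau, temperature=0.1):
    """Smooth (differentiable) prediction-set size.

    Each class's set membership is relaxed from the hard indicator
    1[score <= tau] to a sigmoid, so average set size can serve as a
    gradient-friendly training penalty."""
    probs = np.asarray(probs, dtype=float)
    scores = 1.0 - probs  # LAC-style nonconformity score per class
    membership = 1.0 / (1.0 + np.exp((scores - tau) / temperature))
    return membership.sum(axis=-1)

# Confident predictions should yield smaller soft set sizes than
# uniform (uninformative) ones for the same threshold.
confident = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
size_conf = soft_set_size(confident, tau=0.8)
size_unif = soft_set_size(uniform, tau=0.8)
```

In the full method, a penalty on this quantity is added to the usual classification loss, nudging the model toward sharper (smaller-set) predictions while a conformal step preserves coverage.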

pat-alt linked a pull request on Jun 27, 2023 that will close this issue