add task for mmlu evaluation in arc multiple choice format #1745

jonabur · 2024-04-24T13:38:33Z

This PR adds the mmlu_arc_style task, which presents the MMLU questions in the same manner as the ARC evals: the loglikelihood of each answer is scored as a continuation, rather than selecting the letter of the correct answer from a presented list.
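To illustrate the difference between the two formats, here is a minimal sketch. It is not the code from this PR: the prompt templates only approximate the harness's MMLU and ARC templates, and `letter_style_requests`, `continuation_style_requests`, `pick`, and the `loglikelihood` callable are hypothetical names used for illustration.

```python
from typing import Callable, List, Sequence, Tuple

LETTERS = ["A", "B", "C", "D"]

# A request is a (context, continuation) pair to be scored by the model.
Request = Tuple[str, str]


def letter_style_requests(question: str, choices: Sequence[str]) -> List[Request]:
    """Classic MMLU style: all options appear in the prompt and the
    candidate continuations are the answer letters."""
    options = "\n".join(f"{l}. {c}" for l, c in zip(LETTERS, choices))
    context = f"{question.strip()}\n{options}\nAnswer:"
    return [(context, f" {l}") for l in LETTERS[: len(choices)]]


def continuation_style_requests(question: str, choices: Sequence[str]) -> List[Request]:
    """ARC style: options are not shown and the candidate continuations
    are the answer texts themselves."""
    context = f"Question: {question.strip()}\nAnswer:"
    return [(context, f" {c}") for c in choices]


def pick(requests: Sequence[Request], loglikelihood: Callable[[str, str], float]) -> int:
    """Return the index of the continuation with the highest loglikelihood.
    `loglikelihood` stands in for a real model call (an assumption here)."""
    scores = [loglikelihood(ctx, cont) for ctx, cont in requests]
    return max(range(len(scores)), key=scores.__getitem__)
```

In both cases the prediction is the argmax over per-choice loglikelihoods; only the context and the scored continuations differ.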

I compared it against standard mmlu using mistralai/Mistral-7B-v0.1 and got the following results:

mmlu, acc: 0.6256
mmlu_arc_style, acc: 0.4718

I was surprised to see the accuracy go down with what seems like a simpler prompting format. It is likely that models gain information from seeing all the options presented at once, which allows them to make better choices in the classic MMLU multiple-choice format.

I'm glad to rename the task if you'd prefer it be called something else.

CLAassistant · 2024-04-24T13:38:40Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Jonathan Burdge seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

lintangsutawika · 2024-04-24T16:06:35Z

This is actually consistent with a few experiments I ran. Conversely, I tried evaluating Mistral on ARC with MMLU's prompt style, and it actually worked better.

I think a better name for the prompt would be something like direct_continuation or maybe just continuation.

jonabur · 2024-04-25T07:58:36Z

I updated the name and ran a test to verify everything is still working.

I've signed the CLA, but the status doesn't seem to be updating. Is that possibly because I'm using an anonymized GitHub email address?

Also, is this check failure legitimate? It looks to me like it might be unrelated.

jonabur · 2024-05-08T11:09:40Z

Hi, any chance of getting this merged?

lintangsutawika · 2024-05-08T12:38:01Z

Yeah, sorry this PR got piled under other PRs.

add mmlu arc style evaluation

4b7feca

jonabur requested review from haileyschoelkopf and lintangsutawika as code owners April 24, 2024 13:38

rename arc_style to continuation

9b72773

lintangsutawika approved these changes May 8, 2024


lintangsutawika merged commit 9097ad3 into EleutherAI:main May 8, 2024
6 of 8 checks passed
