add task for mmlu evaluation in arc multiple choice format #1745

Merged
lintangsutawika merged 2 commits into EleutherAI:main on May 8, 2024

@jonabur
Contributor

jonabur commented Apr 24, 2024

This PR adds the mmlu_arc_style task, which presents the MMLU questions the same way the ARC evals do: each answer text is scored by its loglikelihood as a continuation of the question, rather than having the model select the letter of the correct answer from a presented list.
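To make the difference between the two formats concrete, here is a minimal sketch of how the scoring requests differ. It is not the task code from this PR: the prompt templates, the function names, and the `toy_loglikelihood` scorer are all hypothetical stand-ins chosen for illustration, and a real run would replace the scorer with a model's summed token log-probabilities.

```python
from typing import Callable, List, Tuple

LETTERS = ["A", "B", "C", "D"]
Request = Tuple[str, str]  # (context, candidate continuation)


def letter_style_requests(question: str, choices: List[str]) -> List[Request]:
    """Classic MMLU style: all options appear in the prompt, candidates are letters."""
    options = "\n".join(f"{letter}. {text}" for letter, text in zip(LETTERS, choices))
    context = f"{question}\n{options}\nAnswer:"
    return [(context, f" {letter}") for letter in LETTERS[: len(choices)]]


def continuation_style_requests(question: str, choices: List[str]) -> List[Request]:
    """ARC style: options are hidden from the prompt, candidates are the answer texts."""
    context = f"Question: {question}\nAnswer:"  # assumed template, for illustration only
    return [(context, f" {text}") for text in choices]


def pick_answer(requests: List[Request], loglikelihood: Callable[[str, str], float]) -> int:
    """Return the index of the candidate with the highest loglikelihood."""
    scores = [loglikelihood(context, continuation) for context, continuation in requests]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_loglikelihood(context: str, continuation: str) -> float:
    """Hypothetical placeholder scorer, NOT a real model: favors one fixed answer."""
    return 0.0 if continuation.strip() in ("B", "Paris") else -1.0


question = "What is the capital of France?"
choices = ["London", "Paris", "Rome", "Berlin"]
letter_pick = pick_answer(letter_style_requests(question, choices), toy_loglikelihood)
continuation_pick = pick_answer(continuation_style_requests(question, choices), toy_loglikelihood)
```

In the letter format the model sees every option before scoring a single-letter continuation; in the continuation format it never sees the alternatives and must assign high likelihood to the full answer text on its own.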

I compared the new task against standard mmlu using mistralai/Mistral-7B-v0.1 and got the following results:

  • mmlu, acc: .6256
  • mmlu_arc_style, acc: .4718

I was surprised to see accuracy drop with what seems like a simpler prompting format. My guess is that models gain information from seeing all the options presented at once, which lets them make better choices in the classic MMLU multiple-choice presentation.

I'm glad to rename the task if you'd prefer it be called something else.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Jonathan Burdge seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@lintangsutawika
Contributor

This is consistent with a few experiments I ran. Conversely, I tried evaluating Mistral on ARC with MMLU's prompt style, and it actually scored better.

I think a better name for the prompt would be something like direct_continuation or maybe just continuation.

@jonabur
Contributor Author

jonabur commented Apr 25, 2024

I updated the name and ran a test to verify everything is still working.

I've signed the CLA, but the status doesn't seem to be updating -- possibly because I'm using an anonymized GitHub email address?

Also, is this check failure legitimate? It looks to me like it might be unrelated.

@jonabur
Contributor Author

jonabur commented May 8, 2024

Hi, any chance of getting this merged?

lintangsutawika merged commit 9097ad3 into EleutherAI:main on May 8, 2024
6 of 8 checks passed
@lintangsutawika
Contributor

Yeah, sorry this PR got piled under other PRs.
