about nq_open results #1288
Hi! Thanks for opening this. Are you using the most recent version of this codebase? I think the discrepancy may be due to not passing a prepended description (see #789 (comment)). We changed from expecting users to pass this via the CLI to allowing it to be specified in that task's config. Would you be willing to add it there? If so, we can merge that fix. I'm sorry for the issues!
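For readers following along, here is a rough sketch of what an in-config prepended description can look like. The `description` key is part of the task config schema; the description string and the prompt templates below are illustrative assumptions, not the exact values from the referenced PR or the nq_open config:

```yaml
# Sketch of a task config with a prepended description set in the YAML
# rather than on the CLI. The description string and templates are
# illustrative assumptions, not the exact values merged for nq_open.
task: nq_open
output_type: generate_until
description: "Answer these questions:\n\n"  # prepended to the few-shot context
doc_to_text: "Q: {{question}}\nA:"
doc_to_target: "{{answer}}"
```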
Hi, thanks for the response! I used the latest version. The YAML file I use for nq_open is this: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/nq_open/nq_open.yaml. There is already … Do you mean I should change it to …? Thanks again for your help!
Haha, this is crazy. Simply adding a … fixed the discrepancy.
*sigh* Sometimes I hate LLM evals. I'm glad this works to reproduce, though! To match the old …
Hi, thanks for fixing! For anyone who is also interested in the nq_open results: my numbers came out slightly lower than the results in this PR, reason unknown.
Hi,
I have problems reproducing the results on the Natural Questions dataset from this PR. These are the commands I use (a representative invocation is sketched after the list):
For llama1-0-shot:
For llama1-5-shot:
For llama2-0-shot:
For llama2-5-shot:
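The exact commands were not preserved above, but a representative invocation with the current `lm_eval` CLI, assuming the Hugging Face backend (the model path, batch size, and device below are placeholders), would look something like this:

```bash
# Illustrative only: model path, batch size, and device are placeholders,
# not the exact command used in this issue.
lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks nq_open \
  --num_fewshot 5 \
  --batch_size 8 \
  --device cuda:0
```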
These are the results I got:
I noticed there was a refactor of the codebase, and I am not sure whether that would cause a discrepancy here.