
about nq_open results #1288

Closed · Hannibal046 opened this issue Jan 15, 2024 · 5 comments · Fixed by #1289

@Hannibal046
Contributor

Hannibal046 commented Jan 15, 2024

Hi,
I'm having trouble reproducing the results on the Natural Questions dataset from this PR. These are the commands I use:

For llama1-0-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=huggyllama/llama-7b

For llama1-5-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=huggyllama/llama-7b --num_fewshot 5

For llama2-0-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=meta-llama/Llama-2-7b-hf

For llama2-5-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=meta-llama/Llama-2-7b-hf --num_fewshot 5

These are the results I got:

| | 0-shot llama1-7b | 5-shot llama1-7b | 0-shot llama2-7b | 5-shot llama2-7b |
|---|---|---|---|---|
| original PR | 18.3 | 23.9 | 17.5 | 26.1 |
| reproduced | 13.3 | 21.5 | 14.5 | 25.2 |

I noticed there was a refactor of the codebase, and I am not sure whether that could cause the discrepancy here.

@haileyschoelkopf
Contributor

Hi! Thanks for opening this. Are you using the most recent version of this codebase?

I think the discrepancy may be due to a missing prepended description (see #789 (comment)): we changed from expecting users to pass this via the CLI to allowing it to be specified in each task's config. Would you be willing to add description: "Answer these questions:\n\n" to the NaturalQs YAML file and see if you can replicate one of these results?
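For reference, a minimal sketch of what that would look like in the task config (only the description key is the relevant change; the other fields are illustrative of the harness's YAML task format, not copied from the actual file):

```yaml
# lm_eval/tasks/nq_open/nq_open.yaml (sketch, most fields omitted)
task: nq_open
dataset_path: nq_open
# Prepended once to the prompt, before any few-shot examples and the question.
description: "Answer these questions:\n\n"
```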

If so then we can merge that fix. I'm sorry for the issues!

@Hannibal046
Contributor Author

Hi, thanks for the response! I used the latest version (cloned the repo and ran pip install -e .); the commit id is 588a493c41b58d048d3994f99800edd27223bd3a.

The YAML file I use for nq_open is this: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/nq_open/nq_open.yaml.

There is already a description field:

description: "Answer these questions:\n"

Do you mean I should change it to "Answer these questions:\n\n"? I will try it out.

Thanks again for your help!

@Hannibal046
Contributor Author

Haha, this is crazy. Simply adding a \n to the current description takes llama-2-7b 0-shot from 14.5 to 18.4.

@haileyschoelkopf
Contributor

Sigh. Sometimes I hate LLM evals. I'm glad this reproduces the results, though!

To match the old master implementation, \n\n is needed. I'll PR this fix!
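Concretely, the fix is the one-line change to the description in lm_eval/tasks/nq_open/nq_open.yaml discussed above:

```yaml
# before (current main):
# description: "Answer these questions:\n"
# after (matches the old master prompt):
description: "Answer these questions:\n\n"
```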

@Hannibal046
Contributor Author

Hi, thanks for fixing!

For anyone who is also interested in the nq_open results:

| llama1 | Description | 0-shot | 1-shot | 5-shot |
|---|---|---|---|---|
| 7b | Answer these questions:\n | 13.4 | 17.3 | 21.4 |
| 7b | Answer these questions:\n\n | 17.3 | 18.8 | 23.0 |
| 13b | Answer these questions:\n | 14.7 | 21.5 | 26.5 |
| 13b | Answer these questions:\n\n | 22.2 | 23.5 | 28.0 |

| llama2 | Description | 0-shot | 1-shot | 5-shot |
|---|---|---|---|---|
| 7b | Answer these questions:\n | 14.5 | 21.3 | 25.2 |
| 7b | Answer these questions:\n\n | 18.4 | 22.5 | 25.5 |
| 13b | Answer these questions:\n | 14.4 | 23.9 | 28.5 |
| 13b | Answer these questions:\n\n | 22.8 | 26.0 | 29.7 |

These are slightly lower than the results in that PR; reason unknown.
