
about nq_open results #1288

Closed · Hannibal046 opened this issue Jan 15, 2024 · 5 comments · Fixed by #1289

@Hannibal046
Contributor

Hannibal046 commented Jan 15, 2024

Hi,
I'm having trouble reproducing the results on the Natural Questions dataset from this PR. These are the commands I use:

For llama1-0-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=huggyllama/llama-7b

For llama1-5-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=huggyllama/llama-7b --num_fewshot 5

For llama2-0-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=meta-llama/Llama-2-7b-hf

For llama2-5-shot:

lm-eval --model hf --batch_size auto --task nq_open --model_args pretrained=meta-llama/Llama-2-7b-hf --num_fewshot 5

These are the results I got:

| | 0-shot llama1-7b | 5-shot llama1-7b | 0-shot llama2-7b | 5-shot llama2-7b |
|---|---|---|---|---|
| original PR | 18.3 | 23.9 | 17.5 | 26.1 |
| reproduced | 13.3 | 21.5 | 14.5 | 25.2 |

I noticed there was a refactor of the codebase, and I am not sure whether that could cause the discrepancy here.

@haileyschoelkopf
Contributor

Hi! Thanks for opening this. Are you using the most recent version of this codebase?

I think the discrepancy may be due to a missing prepended description (see #789 (comment)): we changed from expecting users to pass this via the CLI to allowing it to be specified in each task's config. Would you be willing to add description: "Answer these questions:\n\n" to the NaturalQs YAML file and see if you can replicate one of these results?
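For reference, a minimal sketch of what that would look like in the task config (only the description key is the relevant change; the other fields are illustrative of the harness's YAML task format, not copied from the actual file):

```yaml
# lm_eval/tasks/nq_open/nq_open.yaml (sketch, most fields omitted)
task: nq_open
dataset_path: nq_open
# Prepended once to the prompt, before any few-shot examples and the question.
description: "Answer these questions:\n\n"
```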

If so then we can merge that fix. I'm sorry for the issues!

@Hannibal046
Contributor Author

Hi, thanks for the response! I used the latest version (cloned the repo and ran pip install -e .); the commit id is 588a493c41b58d048d3994f99800edd27223bd3a.

The YAML file I use for nq_open is this: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/nq_open/nq_open.yaml.

There is already a description field:

description: "Answer these questions:\n"

Do you mean I should change it to "Answer these questions:\n\n"? I will try it out.

Thanks again for your help!

@Hannibal046
Contributor Author

Haha, this is crazy. Simply adding a \n to the current description takes llama-2-7b 0-shot from 14.5 to 18.4.

@haileyschoelkopf
Contributor

Sigh. Sometimes I hate LLM evals. I'm glad this reproduces the results, though!

To match the old master implementation, \n\n is needed. I'll PR this fix!
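Concretely, the fix is the one-line change to the description in lm_eval/tasks/nq_open/nq_open.yaml discussed above:

```yaml
# before (current main):
# description: "Answer these questions:\n"
# after (matches the old master prompt):
description: "Answer these questions:\n\n"
```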

@Hannibal046
Contributor Author

Hi, thanks for fixing!

For anyone who is also interested in the nq_open results:

| llama1 | Description | 0-shot | 1-shot | 5-shot |
|---|---|---|---|---|
| 7b | Answer these questions:\n | 13.4 | 17.3 | 21.4 |
| 7b | Answer these questions:\n\n | 17.3 | 18.8 | 23.0 |
| 13b | Answer these questions:\n | 14.7 | 21.5 | 26.5 |
| 13b | Answer these questions:\n\n | 22.2 | 23.5 | 28.0 |

| llama2 | Description | 0-shot | 1-shot | 5-shot |
|---|---|---|---|---|
| 7b | Answer these questions:\n | 14.5 | 21.3 | 25.2 |
| 7b | Answer these questions:\n\n | 18.4 | 22.5 | 25.5 |
| 13b | Answer these questions:\n | 14.4 | 23.9 | 28.5 |
| 13b | Answer these questions:\n\n | 22.8 | 26.0 | 29.7 |

These are slightly lower than the results in that PR; reason unknown.
