Evaluation results of llama2 with lm-evaluation-harness using wikitext-2 #1833

l2002924700 · 2024-05-13T11:15:02Z

Hi, kind helpers,
I am new to lm-eval, and I want to test the Llama 2 model with lm-evaluation-harness on the wikitext-2 dataset. Here are the steps I followed:

1. Downloaded the Llama 2 model from the Hugging Face website.
2. Installed lm-eval as described in the README: https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#install
3. Downloaded the wikitext dataset from https://huggingface.co/datasets/wikitext/tree/main/wikitext-2-raw-v1
4. Configured the lm_eval parameters and ran the test with this command:
   `lm_eval --model hf --model_args pretrained=~/LLM-Models/Llama-2-7b-hf --tasks wikitext --device cuda --batch_size 1 --output_path ./eval_harness/Llama-2-7b-hf-16b`

I got the following results:
hf-auto (pretrained=LLM-Models/Llama-2-7b-hf), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1

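| Tasks    | Version | Filter | Metric          |   Value |   | Stderr |
|----------|--------:|--------|-----------------|--------:|---|--------|
| wikitext |       2 | none   | word_perplexity | 15.0550 | ± | N/A    |
|          |         | none   | byte_perplexity |  1.6875 | ± | N/A    |
|          |         | none   | bits_per_byte   |  0.7549 | ± | N/A    |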
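For reading the table: the three metrics are different normalizations of one summed log-likelihood over the test set. The sketch below assumes corpus-level definitions (word_perplexity = exp(-LL/words), byte_perplexity = exp(-LL/bytes), bits_per_byte = -LL/(bytes * ln 2)); `wikitext_metrics` is a hypothetical helper for illustration, not an lm-eval function. Under these definitions bits_per_byte must equal log2(byte_perplexity), and the reported values are consistent with that.

```python
import math

def wikitext_metrics(total_loglik: float, num_words: int, num_bytes: int) -> dict:
    """Hypothetical helper: derive the three wikitext metrics from a summed
    natural-log likelihood, assuming corpus-level word/byte normalization."""
    nll = -total_loglik  # total negative log-likelihood, in nats
    return {
        "word_perplexity": math.exp(nll / num_words),
        "byte_perplexity": math.exp(nll / num_bytes),
        "bits_per_byte": nll / (num_bytes * math.log(2)),
    }

# Consistency check on the reported values: bits_per_byte == log2(byte_perplexity).
reported_word_ppl = 15.0550
reported_byte_ppl = 1.6875
print(round(math.log2(reported_byte_ppl), 4))  # 0.7549

# Implied average number of UTF-8 bytes per word in the evaluated text.
print(math.log(reported_word_ppl) / math.log(reported_byte_ppl))
```

The last line recovers the average bytes-per-word ratio implied by the two perplexities, since both share the same total log-likelihood.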

Are these results normal? The values seem too high compared with the wikitext-2 perplexity results in the llama.cpp README: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#quantization
Could anyone help me?
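One possible source of the gap, assuming the llama.cpp figures are per-token perplexities: word_perplexity divides the total negative log-likelihood by the number of words, while a per-token perplexity divides the same total by the number of tokens. The Llama 2 tokenizer typically emits more tokens than there are words, so the per-token number comes out lower for the same model. The sketch below re-expresses a perplexity over a different unit count; `renormalize_perplexity` and the counts are hypothetical placeholders, and other settings (context length, how documents are split) may also differ between the two tools.

```python
import math

def renormalize_perplexity(ppl: float, old_units: int, new_units: int) -> float:
    """Hypothetical helper: re-express a corpus perplexity over a new unit count.

    The total negative log-likelihood is ln(ppl) * old_units; dividing that
    total by new_units gives the perplexity per new unit.
    """
    total_nll = math.log(ppl) * old_units
    return math.exp(total_nll / new_units)

word_ppl = 15.0550   # reported word_perplexity
num_words = 1000     # hypothetical word count, for illustration only
num_tokens = 1500    # hypothetical token count, for illustration only
print(renormalize_perplexity(word_ppl, num_words, num_tokens))
```

With the real word and token counts of the evaluated text, the same function would give the per-token perplexity implied by the word-level result.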

l2002924700 · 2024-05-14T11:46:41Z

Regarding the wikitext dataset: I downloaded it from "https://huggingface.co/datasets/wikitext/tree/main/wikitext-2-raw-v1". The test results still don't look normal.
