Llama 3.1 8B and 70B checkpoints #1619
Conversation
Finetuning works fine, but there's something weird about the RoPE scaling when evaluating with:

litgpt evaluate /teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final --tasks mmlu
{'access_token': None,
'batch_size': 1,
'checkpoint_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final'),
'device': None,
'dtype': None,
'force_conversion': False,
'limit': None,
'num_fewshot': None,
'out_dir': None,
'save_filepath': None,
'seed': 1234,
'tasks': 'mmlu'}
{'checkpoint_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final'),
'output_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final/evaluate')}
2024-07-23:20:12:02,098 INFO [huggingface.py:170] Using device 'cuda'
Traceback (most recent call last):
File "/home/zeus/miniconda3/envs/cloudspace/bin/litgpt", line 8, in <module>
sys.exit(main())
File "/teamspace/studios/this_studio/litgpt2/litgpt/__main__.py", line 71, in main
CLI(parser_data)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jsonargparse/_cli.py", line 119, in CLI
return _run_component(component, init.get(subcommand))
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jsonargparse/_cli.py", line 204, in _run_component
return component(**cfg)
File "/teamspace/studios/this_studio/litgpt2/litgpt/eval/evaluate.py", line 106, in convert_and_evaluate
model = HFLM(pretrained=str(out_dir.resolve()), device=device, batch_size=batch_size, dtype=dtype)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 196, in __init__
self._get_config(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 470, in _get_config
self._config = transformers.AutoConfig.from_pretrained(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
return config_class.from_dict(config_dict, **unused_kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
config = cls(**config_dict)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
self._rope_scaling_validation()
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

They must have changed something in the Llama 3.1 RoPE. I guess I have to buckle up and read the 92-page paper tonight.
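For context, the mismatch is just between two `rope_scaling` formats: the Llama 3.1 `config.json` ships the extended dict shown in the error, while the validation in this transformers version only accepts the old `{type, factor}` pair. A quick way to check what the converted checkpoint actually contains (the path here is the one from this run; adjust as needed):

```python
import json
from pathlib import Path

# Path from this run; adjust to your converted output directory.
config_path = Path("out/finetune/qlora-llama3.1-8b/final/config.json")
config = json.loads(config_path.read_text())

# Older transformers LlamaConfig validation only accepts {"type": ..., "factor": ...},
# while the Llama 3.1 config carries the extended dict seen in the traceback:
# {'rope_type': 'llama3', 'factor': 8.0, 'low_freq_factor': 1.0,
#  'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192}
print(config.get("rope_scaling"))
```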
I don't know; do we need to add tests for version 3.1 by adding it here? Lines 208 to 217 in 5ff6343
There are no architectural changes, and this test overrides some of the params anyway.
As for the RoPE thing during evaluation, either something changed in this particular part of the Llama 3.1 architecture, or it's something in the eval that is specific to this model. Or the test is incorrect.
I have now read the complete paper and couldn't find anything particular about the RoPE scaling in Llama 3.1. I did some research on the internet, and it seems there is nothing particular about it either. Also, when I tried it, it works fine with the standard Llama 3 RoPE (see the discussion at https://news.ycombinator.com/item?id=41053201). HF transformers may have added something RoPE-specific to the Llama 3 model which causes this error during evaluation.
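If transformers did add something RoPE-specific for Llama 3.1, the extra fields in the error above (`low_freq_factor`, `high_freq_factor`, `original_max_position_embeddings`) would suggest a wavelength-dependent rescaling of the RoPE inverse frequencies rather than a plain `{type, factor}` scaling. A rough sketch of what such a scheme could look like, assuming the interpolation later described for Llama 3.1 in transformers (not code from this repo, and the exact formula may differ):

```python
import math
import torch

def llama3_scale_inv_freq(
    inv_freq: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    """Rescale RoPE inverse frequencies as hinted by the `rope_type: llama3` fields."""
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor
    wavelen = 2 * math.pi / inv_freq

    # Long wavelengths (low frequencies) are divided by `factor`,
    # short wavelengths (high frequencies) are kept unchanged,
    # and the band in between is smoothly interpolated.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    is_medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    smoothed = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    return torch.where(is_medium, smoothed, scaled)

# Example with assumed Llama 3/3.1 defaults (head_dim=128, rope_theta=500000):
inv_freq = 1.0 / (500000.0 ** (torch.arange(0, 128, 2, dtype=torch.float32) / 128))
print(llama3_scale_inv_freq(inv_freq)[:4])
```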
Adds the new Llama 3.1 8B and 70B checkpoints.
405B will be done separately as it requires tensor parallelism.