
Llama 3.1 8B and 70B checkpoints #1619

Merged: 4 commits merged into main on Jul 24, 2024
Conversation

@rasbt (Collaborator) commented on Jul 23, 2024

Adds the new Llama 3.1 8B and 70B checkpoints.

405B will be done separately as it requires tensor parallelism.

  • Update config files
  • Download and convert models
  • Update checkpoint conversion (if applicable)
  • Update prompt style (if applicable)
  • Test models for inference (generate, chat, Python API; see the sketch after this list)
  • Add unit tests
  • Try fine-tuning, add config_hub file
  • Update README
  • Update download tutorial page
  • Find out about / implement new RoPE (works great without it, but we should still look into this for correctness)
  • Make new release
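
To make the "Test models for inference" item above concrete, here is a minimal sketch using the litgpt Python API. The repo id "meta-llama/Meta-Llama-3.1-8B" and the prompt are assumptions until the download tutorial is updated:

# Minimal sketch, assuming the litgpt Python API and that the new 8B checkpoint
# is available under the (assumed) repo id "meta-llama/Meta-Llama-3.1-8B".
from litgpt import LLM

llm = LLM.load("meta-llama/Meta-Llama-3.1-8B")
print(llm.generate("What do llamas eat?", max_new_tokens=50))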

@rasbt rasbt marked this pull request as draft July 23, 2024 15:29
@rasbt (Collaborator, Author) commented on Jul 23, 2024

Finetuning works fine, but there's something weird about the RoPE scaling when evaluating with lm_eval. Haven't seen this before:

~ litgpt evaluate /teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final --tasks mmlu 
{'access_token': None,
 'batch_size': 1,
 'checkpoint_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final'),
 'device': None,
 'dtype': None,
 'force_conversion': False,
 'limit': None,
 'num_fewshot': None,
 'out_dir': None,
 'save_filepath': None,
 'seed': 1234,
 'tasks': 'mmlu'}
{'checkpoint_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final'),
 'output_dir': PosixPath('/teamspace/studios/this_studio/out/finetune/qlora-llama3.1-8b/final/evaluate')}
2024-07-23:20:12:02,098 INFO     [huggingface.py:170] Using device 'cuda'
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/litgpt", line 8, in <module>
    sys.exit(main())
  File "/teamspace/studios/this_studio/litgpt2/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
  File "/teamspace/studios/this_studio/litgpt2/litgpt/eval/evaluate.py", line 106, in convert_and_evaluate
    model = HFLM(pretrained=str(out_dir.resolve()), device=device, batch_size=batch_size, dtype=dtype)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 196, in __init__
    self._get_config(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 470, in _get_config
    self._config = transformers.AutoConfig.from_pretrained(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

They must have changed something in the Llama 3.1 RoPE. I guess I have to buckle up and read the 92-page paper tonight.
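
For reference, here is a sketch of how the llama3 rope_type in that rope_scaling dict is commonly described as rescaling the RoPE inverse frequencies. This is an assumption based on public descriptions of the scheme, not something implemented in this PR:

import math

import torch


def llama3_scale_inv_freq(
    inv_freq: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_position_embeddings: int = 8192,
) -> torch.Tensor:
    # Wavelength of each RoPE frequency component.
    wavelen = 2 * math.pi / inv_freq
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor
    # Long wavelengths (low frequencies) are scaled down by `factor`;
    # short wavelengths (high frequencies) are left untouched.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # Wavelengths in between are smoothly interpolated between the two regimes.
    smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    scaled = torch.where(medium, (1 - smooth) / factor * inv_freq + smooth * inv_freq, scaled)
    return scaled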

@rasbt rasbt marked this pull request as ready for review July 23, 2024 20:23
@rasbt rasbt requested review from Andrei-Aksionov and removed request for williamFalcon July 23, 2024 21:48
@Andrei-Aksionov (Collaborator) commented:
I'm not sure whether we need to add tests for version 3.1 by adding it here:

litgpt/tests/test_model.py

Lines 208 to 217 in 5ff6343

@pytest.mark.parametrize(
"ours_kwargs",
[
{"name": "Llama-2-7b-hf"},
{"name": "CodeLlama-7b-hf"},
{"name": "Llama-2-70b-chat-hf", "n_query_groups": 1},
{"name": "Llama-3-8B"},
{"name": "Llama-3-8B-Instruct"},
],
)

There are no architectural changes, and this test overrides some of the params anyway.
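
If we do decide to add them, a minimal sketch of the extended parametrization could look like the following; the "Llama-3.1-*" config names are assumptions based on the naming pattern for Llama 3, and the test function is just a placeholder:

import pytest


@pytest.mark.parametrize(
    "ours_kwargs",
    [
        {"name": "Llama-2-7b-hf"},
        {"name": "CodeLlama-7b-hf"},
        {"name": "Llama-2-70b-chat-hf", "n_query_groups": 1},
        {"name": "Llama-3-8B"},
        {"name": "Llama-3-8B-Instruct"},
        # Hypothetical additions for the 3.1 checkpoints:
        {"name": "Llama-3.1-8B"},
        {"name": "Llama-3.1-8B-Instruct"},
    ],
)
def test_llama_config_overrides(ours_kwargs):
    # Placeholder body: the real test compares litgpt outputs against the HF
    # reference implementation using these config overrides.
    ...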

@Andrei-Aksionov (Collaborator) commented:
As for the RoPE issue during evaluation: either something changed in this particular part of the Llama 3.1 architecture, or it's something in the eval that is specific to this model, since the eval test uses pythia-14m as the model and I don't see any failures on CI.

Or the test is incorrect.

@rasbt (Collaborator, Author) commented on Jul 24, 2024

I've now read the complete paper and couldn't find anything in particular about the RoPE scaling in Llama 3.1. I also did some research online, and it seems there's nothing special about it. And when I tried it, it works fine with the standard Llama 3 RoPE (see the discussion via https://news.ycombinator.com/item?id=41053201).

HF transformers may have added something RoPE-specific to the Llama 3 model, which causes lm_eval to fail. I guess we have to wait for an Evaluation Harness update here, but this shouldn't hold up the PR.

@rasbt rasbt merged commit fd71063 into main Jul 24, 2024
9 checks passed
@rasbt rasbt deleted the llama3.1-small branch July 24, 2024 13:59