Support for LLaMA #841

Merged · 12 commits merged into EleutherAI:main on May 2, 2023
Conversation

zphang (Contributor) commented on Mar 18, 2023

No description provided.

zphang requested a review from a team as a code owner on March 18, 2023

Review threads: megatron/model/transformer.py (2, resolved); megatron/neox_arguments/neox_args.py (2, outdated, resolved)
StellaAthena (Member) commented

@zphang where are we on this?

zphang (Contributor, Author) commented on Apr 12, 2023

Other than the above note on LLaMAMLP, I can incorporate the necessary changes and update the PR.

zphang (Contributor, Author) commented on Apr 18, 2023

Addressed comments. Please take another look. For further additions (e.g. guide to tuning a LLaMA model), I can do a separate PR.

StellaAthena (Member) commented

I'm confused by the "make_vocab_size_divisible_by" argument in the config file. Can you explain what it does? We generally assume that the data has been preprocessed, including pre-tokenized, and that's the point at which it would make sense to have such a thing (and we do).

Otherwise, this looks good to me. @Quentin-Anthony, any thoughts?

zphang (Contributor, Author) commented on Apr 20, 2023

make_vocab_size_divisible_by is a pre-existing NeoX argument. If I understand correctly, NeoX determines the vocab size from the tokenizer and then pads it to a multiple of make_vocab_size_divisible_by for computational-efficiency reasons. Setting it to 1 effectively disables the padding. The feature isn't LLaMA-specific, but I think this makes sense as a default in the LLaMA configs.
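For illustration, a minimal sketch of the rounding involved (the exact formula is an assumption; the real logic lives in the NeoX/Megatron code and also folds in the model-parallel size):

```python
def pad_vocab_size(vocab_size: int,
                   make_vocab_size_divisible_by: int,
                   model_parallel_size: int = 1) -> int:
    """Round the tokenizer's vocab size up to the nearest multiple of
    make_vocab_size_divisible_by * model_parallel_size (sketch only)."""
    multiple = make_vocab_size_divisible_by * model_parallel_size
    return ((vocab_size + multiple - 1) // multiple) * multiple

# LLaMA's tokenizer has 32,000 tokens, already a round number, so with
# make_vocab_size_divisible_by = 1 no extra embedding rows are added.
print(pad_vocab_size(32000, 1))    # 32000
print(pad_vocab_size(32000, 128))  # 32000
print(pad_vocab_size(50257, 128))  # 50304 (a GPT-2-style vocab gets padded)
```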

Quentin-Anthony (Member) commented

> I'm confused by the "make_vocab_size_divisible_by" argument in the config file. Can you explain what it does? We generally assume that the data has been preprocessed, including pre-tokenized, and that's the point at which it would make sense to have such a thing (and we do).
>
> Otherwise, this looks good to me. @Quentin-Anthony, any thoughts?

Looking good to me. I'm gonna run some tests then merge.

Quentin-Anthony self-assigned this on Apr 21, 2023
Quentin-Anthony added the merge-queue label (this PR is next on the queue to merge) on Apr 21, 2023
CLAassistant commented on Apr 23, 2023

CLA assistant check
All committers have signed the CLA.

StellaAthena self-requested a review on April 25, 2023

StellaAthena (Member) commented

@zphang We need you to sign the CLA before merging this PR :)

zphang (Contributor, Author) commented on Apr 25, 2023

Signed!

Quentin-Anthony dismissed StellaAthena's stale review on May 2, 2023: "all comments appear resolved now"

Quentin-Anthony (Member) commented

Works for all of my tests. Merging.

Quentin-Anthony merged commit 299b68c into EleutherAI:main on May 2, 2023
DaoD commented on May 4, 2023

@zphang Thanks for your work! It seems that there is no params.json in the llama checkpoint. Where can I get it? Thanks!

wiio12 commented on May 4, 2023

@zphang Hi, thank you for your work! Can you provide a README explaining how to run LLaMA in NeoX?

wiio12 commented on May 4, 2023

> @zphang Thanks for your work! It seems that there is no params.json in the llama checkpoint. Where can I get it? Thanks!

Hi @DaoD, the params.json file can be found in the official checkpoint (it requires filling out the Google form).

DaoD commented on May 4, 2023

> > @zphang Thanks for your work! It seems that there is no params.json in the llama checkpoint. Where can I get it? Thanks!
>
> Hi @DaoD, the params.json file can be found in the official checkpoint (it requires filling out the Google form).

Thanks so much!

DaoD commented on May 4, 2023

> @zphang Hi, thank you for your work! Can you provide a README explaining how to run LLaMA in NeoX?

Yes. I have converted the checkpoint and tried to use the 6-7B config to load it, but there are some missing keys and unexpected keys in the state dict. Could you please provide a README for using it?

wiio12 commented on May 4, 2023

Hi @DaoD, I think LLaMA should be loaded with the llama/7B.yml config file.

I posted an issue that discusses problems I came across when loading LLaMA in NeoX.

DaoD commented on May 5, 2023

@zphang I think we need a new convert_sequential_to_hf.py to convert the obtained model into HF style. Have you done something about this? Thanks!

borghives commented on May 5, 2023

@zphang In tools/convert_raw_llama_weights_to_neox.py, how does one convert the LLaMA tokenizer?

  • I have tried using the documented flag in the tool, "--model_size tokenizer_only" (got the error "assert model_size in NUM_SHARDS" when running convert_raw_llama_weights_to_neox.py).
  • I have tried using the Hugging Face LLaMA tokenizer.json file (got an error about a missing "merge-file" when trying to run generate).

DaoD commented on May 6, 2023

> @zphang In tools/convert_raw_llama_weights_to_neox.py, how does one convert the LLaMA tokenizer?
>
>   • I have tried using the documented flag in the tool, "--model_size tokenizer_only" (got the error "assert model_size in NUM_SHARDS" when running convert_raw_llama_weights_to_neox.py).
>   • I have tried using the Hugging Face LLaMA tokenizer.json file (got an error about a missing "merge-file" when trying to run generate).

I think you do not need to convert the LLaMA tokenizer. Just set

"tokenizer_type": "SPMTokenizer",
"vocab-file": "/llama-7b-hf/tokenizer.model",

The tokenizer.model can be obtained from this link.
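As a quick sanity check that tokenizer.model is the SentencePiece model that SPMTokenizer expects, a minimal sketch (assuming the sentencepiece package and a hypothetical local path):

```python
import sentencepiece as spm

# Hypothetical path; point this at the tokenizer.model from the LLaMA release.
sp = spm.SentencePieceProcessor(model_file="/llama-7b-hf/tokenizer.model")

print(sp.vocab_size())            # LLaMA's tokenizer should report 32000
ids = sp.encode("Hello, world!")  # encode text to token ids
print(ids)
print(sp.decode(ids))             # should round-trip back to the original text
```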

DaoD commented on May 6, 2023

I found another problem. The eod token id of the SentencePieceTokenizer (eos_token_id=0) is different from that of the original LlamaTokenizer (eos_token_id=1), which may cause some problems in training and inference.
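One way to see the discrepancy is to print the special-token ids from both sides. A hedged sketch (assuming the sentencepiece and transformers packages and hypothetical local paths; the exact ids you see depend on the tokenizer versions involved):

```python
import sentencepiece as spm
from transformers import LlamaTokenizer

# Hypothetical paths to the same underlying SentencePiece model.
sp = spm.SentencePieceProcessor(model_file="/llama-7b-hf/tokenizer.model")
hf_tok = LlamaTokenizer.from_pretrained("/llama-7b-hf")

# Raw SentencePiece special-token ids.
print("sentencepiece:", {"unk": sp.unk_id(), "bos": sp.bos_id(),
                         "eos": sp.eos_id(), "pad": sp.pad_id()})

# Hugging Face LlamaTokenizer special-token ids.
print("transformers :", {"bos": hf_tok.bos_token_id,
                         "eos": hf_tok.eos_token_id,
                         "pad": hf_tok.pad_token_id})

# If the end-of-document id NeoX uses does not match the id the model was
# trained or converted with, data packing and generation stopping can both
# misbehave.
```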

wiio12 commented on May 8, 2023

> I found another problem. The eod token id of the SentencePieceTokenizer (eos_token_id=0) is different from that of the original LlamaTokenizer (eos_token_id=1), which may cause some problems in training and inference.

Hi @DaoD, have you tried to do inference with this model? Can you generate reasonable text with it?

DaoD commented on May 8, 2023

> > I found another problem. The eod token id of the SentencePieceTokenizer (eos_token_id=0) is different from that of the original LlamaTokenizer (eos_token_id=1), which may cause some problems in training and inference.
>
> Hi @DaoD, have you tried to do inference with this model? Can you generate reasonable text with it?

Yes, but I do not use the inference code provided by gpt-neox. I just convert the model into Hugging Face format and use the HF generation function. It seems correct.
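For anyone taking the same route, a minimal generation sketch against the Hugging Face API (the checkpoint path is hypothetical and assumed to contain both the converted weights and the tokenizer files):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

path = "/path/to/converted-llama-7b-hf"  # hypothetical location of the converted checkpoint

tokenizer = LlamaTokenizer.from_pretrained(path)
model = LlamaForCausalLM.from_pretrained(path, torch_dtype=torch.float16).to("cuda")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```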

wiio12 commented on May 8, 2023

> > > I found another problem. The eod token id of the SentencePieceTokenizer (eos_token_id=0) is different from that of the original LlamaTokenizer (eos_token_id=1), which may cause some problems in training and inference.
> >
> > Hi @DaoD, have you tried to do inference with this model? Can you generate reasonable text with it?
>
> Yes, but I do not use the inference code provided by gpt-neox. I just convert the model into Hugging Face format and use the HF generation function. It seems correct.

Thx! This helps a lot :)

Quentin-Anthony removed the merge-queue label on May 11, 2023
sxthunder commented

Hello, can you convert a gpt-neox LLaMA model to HF format?
