
The saved embed_tokens is empty #21

Closed · merlinarer opened this issue Nov 20, 2023 · 5 comments

merlinarer commented Nov 20, 2023

Hello, I am trying to run this code with LLaMA-1 7B, but I find that the saved embed_tokens is empty and fails to load after training. Have you encountered this problem?

(Pdb) param_name
'model.embed_tokens.weight'
(Pdb) param
tensor([], dtype=torch.bfloat16)
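
A quick way to see what actually got written to disk is to open the saved safetensors file(s) and print the shape of the embedding tensor. The sketch below assumes a standard Hugging Face checkpoint layout; the checkpoint path is hypothetical.

# Sketch: inspect the saved embedding weights without loading the full model.
# Requires the `safetensors` package; the directory name is an assumption.
import glob
from safetensors import safe_open

ckpt_dir = "output/llama-7b-finetuned"  # hypothetical path

for path in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            if "embed_tokens" in name:
                t = f.get_tensor(name)
                # A healthy LLaMA-7B embedding should be roughly [32000+, 4096],
                # not the empty tensor shown above.
                print(path, name, tuple(t.shape), t.dtype)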
AkariAsai (Owner) commented

I haven't seen this issue on my side, so I am not sure if I can help here... I had some issues a while ago when I was using LLaMA-1 and adding special tokens, as back then (April or May) HF Transformers' LLaMA-1 support was somewhat unstable. Using Llama 2 instead, or upgrading HF Transformers, might help, although I am not sure...

merlinarer (Author) commented

Solved! It seems the given scripts save both the split safetensors shards (model-00001-of-00003.safetensors ...) and a consolidated model.safetensors, which leads to the loading error. Deleting model.safetensors solves it.
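
For reference, a rough sketch of the workaround (the checkpoint path is hypothetical): if the checkpoint directory contains both the sharded files with their index and a stray consolidated model.safetensors, removing the consolidated file lets from_pretrained fall back to the shards.

# Sketch of the workaround, assuming the standard HF sharded checkpoint layout.
import os

ckpt_dir = "output/llama-7b-finetuned"  # hypothetical path
stray = os.path.join(ckpt_dir, "model.safetensors")
index = os.path.join(ckpt_dir, "model.safetensors.index.json")

# Only remove the consolidated file if the sharded checkpoint (index + shards) exists.
if os.path.exists(stray) and os.path.exists(index):
    os.remove(stray)

# Afterwards, loading should pick up the shards, e.g.:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(ckpt_dir)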

merlinarer (Author) commented

BTW, I cannot find the 13B training script. I tried modifying the 7B script for 13B and hit a CUDA OOM even with batch size 1 on each 80GB device. Maybe I should reduce the input_length?

AkariAsai (Owner) commented

Yes we reduce the input_length for 13B due to the OOM issue, as mentioned in the paper. Let me upload the final 13B script and push it.
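
For anyone reproducing this, reducing the input length usually just means truncating at tokenization time. The sketch below is illustrative only; the length 1024 and the model path are placeholders, not the settings from the released 13B script.

# Illustrative sketch: truncate training inputs to a shorter maximum length
# to reduce activation memory for the 13B run. 1024 is an arbitrary example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-13b")  # hypothetical path
batch = tokenizer(
    ["an example training instance ..."],
    max_length=1024,   # reduced from the 7B setting
    truncation=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # at most (1, 1024) after truncation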

AkariAsai (Owner) commented

I uploaded the 13B training script: script_finetune_13b.sh

This issue was closed.