
Art/llama3 support #1

Closed

Conversation

@nivibilla commented Apr 29, 2024

Hey,

I was having a look at generalising this to Llama 3 70B as well. I found that if we pre-convert the safetensors version to a PyTorch pickle version, the model converts fine given the correct config.

And with your tokeniser changes, generation should be good too. Please have a look.

I've converted and uploaded llama-3-8b-instruct and llama-3-70b-instruct in the native pytorch_model_n_of_n.bin format for testing.
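
For reference, here's a minimal sketch of that pre-conversion step, assuming Hugging Face-style sharded checkpoints (the `model-*.safetensors` glob and the output shard names are illustrative, not the exact files used):

```python
# Hypothetical sketch: convert sharded .safetensors weights into
# PyTorch pickle shards (pytorch_model-xxxxx-of-xxxxx.bin).
# The shard glob and output naming are assumptions, not a gpt-fast script.
import glob
import torch
from safetensors.torch import load_file

shards = sorted(glob.glob("model-*.safetensors"))
for i, shard in enumerate(shards, start=1):
    state_dict = load_file(shard)  # returns a plain dict[str, torch.Tensor]
    out = f"pytorch_model-{i:05d}-of-{len(shards):05d}.bin"
    torch.save(state_dict, out)    # native PyTorch pickle format
    print(f"wrote {out} ({len(state_dict)} tensors)")
```

A complete converter would presumably also rewrite the weight-map index (`model.safetensors.index.json` to `pytorch_model.bin.index.json`) so loaders can locate each tensor's shard.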

@nivibilla (Author)

This extends the PR to support both Llama 3 8B and 70B.

@Artyom17 (Collaborator)

It is great to see that the 70B model works as well! Thanks for that! A couple of notes:

  1. In your converted llama-3-8b-instruct and llama-3-70b-instruct, the name of the sub-dir "original" is misspelled (it is "orignal" in your case), so the conversion script complains that it can't find 'tokenizer.model'.
  2. I'm afraid we can't make gpt-fast dependent on a third-party model.

Here is what I think should be done:

  1. My original PR gets landed first (I really hope it happens soon).
  2. It would be nice if HuggingFace adopted your conversion and released those .bin files properly, or
  3. we need to add a .safetensors-to-.bin conversion script to gpt-fast.
  4. Finally, you create a proper PR in the gpt-fast repo that adds 70B model support.

What do you think?

@nivibilla closed this by deleting the head repository Apr 29, 2024