Add Llama 3.1 405B config #1622
Conversation
What about the main README and tutorials/download_model_weights.md?
I think updating the README can be done last, once we have all the pieces in. What should I update in tutorials/download_model_weights.md?
Sebastian is working on the 8B and 70B variants, so he will add only those to the table.
There is a table with supported models and an example output from the download command.
Btw, what do you think about the FP8 version (https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-FP8)? Could that be something we could support on H100 hardware?
We can't load these FP8 checkpoints directly (afaik). I think we have to do a bit of tinkering to load things fast.
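A minimal sketch of what that tinkering could look like: dequantize the FP8 shards back to bf16 before running the usual conversion. It assumes the shards store `torch.float8_e4m3fn` weights with companion scale tensors keyed as `<weight>_scale`; the key layout and scale shapes are assumptions, not verified against the actual repo.

```python
import torch
from safetensors.torch import load_file


def dequantize_fp8_shard(path: str) -> dict:
    """Load one safetensors shard and upcast any FP8 weights to bf16."""
    tensors = load_file(path)
    out = {}
    for name, tensor in tensors.items():
        if name.endswith("weight_scale"):
            # Scale tensors are folded into their weight below.
            continue
        if tensor.dtype == torch.float8_e4m3fn:
            weight = tensor.to(torch.bfloat16)
            scale = tensors.get(name + "_scale")  # assumed key naming
            if scale is not None:
                # Per-channel scales may need a reshape to broadcast correctly.
                weight = weight * scale.to(torch.bfloat16)
            out[name] = weight
        else:
            out[name] = tensor
    return out
```

Whether that ends up faster (or cheaper) than just using the bf16 checkpoint is exactly the open question here.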
There was a new LitServe release 35 min ago. Could be related.
@rasbt
Sounds reasonable, thanks |
Adds the config for Llama 3.1 405B.
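For context, the new entry in litgpt's `config.py` would look roughly like the sketch below: the field names mirror the existing Llama 3 entries and the hyperparameters come from the published 405B model card, so treat it as illustrative rather than the exact merged diff (the Llama 3.1 RoPE-scaling parameters are left out).

```python
# Sketch of the 405B config entry, following the pattern of the other Llama 3 configs.
llama_3_1_405b = dict(
    name="Llama-3.1-405B",
    hf_config=dict(org="meta-llama", name="Meta-Llama-3.1-405B"),
    block_size=131072,           # 128k context window
    vocab_size=128000,
    padded_vocab_size=128256,
    n_layer=126,
    n_head=128,
    n_embd=16384,
    n_query_groups=8,            # grouped-query attention
    rotary_percentage=1.0,
    parallel_residual=False,
    bias=False,
    norm_class_name="RMSNorm",
    mlp_class_name="LLaMAMLP",
    intermediate_size=53248,
    rope_base=500000,
)
```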
Did a basic test last night on 8xH100 with both tensor-parallel (TP) and sequential generation.
Here are some of the outputs: