Conversation

@wenxindongwork
  1. Llama3 tokenizers don't have a default pad_token id; this PR makes the pad token fall back to other special tokens (see the first sketch after this list).
  2. Added Llama3 FSDP to PredefinedShardingStrategy (see the second sketch after this list).
  3. Refactored the llama3 loading and saving tests into the same test suites as the gemma models.
  4. Fixed some bugs in the model-weights similarity metric calculation (an illustrative metric is sketched third, after this list).
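
The fallback itself isn't quoted in this conversation, so the following is a minimal sketch of the idea, assuming a Hugging Face-style tokenizer; the model name and the fallback order are illustrative, not necessarily what the PR implements:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Llama3 tokenizers ship without a pad token; fall back to another
# special token so padded batching works. The fallback order here is
# an assumption.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = (
        tokenizer.eos_token or tokenizer.bos_token or tokenizer.unk_token
    )
```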
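
The PredefinedShardingStrategy constructor isn't shown here either, so rather than guess its API, the JAX sketch below only illustrates the kind of FSDP layout such a strategy would map Llama3 parameters to; the mesh axis name and parameter names are hypothetical:

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# FSDP in JAX terms: a single mesh axis spanning all devices, with each
# weight matrix sharded along one dimension of that axis. The axis name
# "fsdp" and the parameter names below are illustrative.
mesh = Mesh(np.array(jax.devices()), axis_names=("fsdp",))
llama3_fsdp_layout = {
    "token_embedding": NamedSharding(mesh, P("fsdp", None)),
    "mlp_kernel": NamedSharding(mesh, P(None, "fsdp")),
}
```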
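
The metric itself and the bugs fixed aren't described beyond that line; as a point of reference, a common choice for comparing the same tensor across two checkpoints is per-tensor cosine similarity, sketched below (not the PR's actual implementation):

```python
import numpy as np

def weight_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two weight tensors, flattened.

    Illustrative only: the PR's real metric is not shown in this
    conversation.
    """
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0
```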

TODO:

  • MaxText Llama3.1 model saving is failing.

@wenxindongwork merged commit 5957e12 into main on Mar 10, 2025
1 check passed