update rope_max_timescale to 1M for qwen3-30b-a3b-base to match HF #4039
Merged
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Branch updated from 831ade9 to 6ba34f4.
🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
This Pull Request correctly updates the rope_max_timescale (and its Hugging Face counterpart rope_theta) for the qwen3-30b-a3b-base model to align with the Hugging Face configuration. The change ensures consistency between the MaxText model configuration and the checkpoint conversion utilities.
🔍 General Feedback
- The changes are focused and follow the established patterns in the codebase for model configuration and HF mapping.
- I noticed that while the model has been added to `hf_model_configs.py` and `param_mapping.py`, it is currently missing from `src/maxtext/checkpoint_conversion/utils/hf_shape.py`. This omission will likely cause the `to_huggingface` conversion script to fail for this specific model variant. I have added an inline comment suggesting this addition.
- Overall, the PR is high quality and addresses the requirement of matching HF configurations.
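For readers less familiar with the parameter: `rope_max_timescale` (Hugging Face's `rope_theta`) is the base of the geometric progression of rotary position frequencies. The sketch below assumes the standard RoPE formulation, in which channel pair k of a head with dimension d rotates at frequency 1 / θ^(2k/d). The function names and the head dimension of 128 are illustrative assumptions, not values taken from this PR or from MaxText's implementation. It compares the wavelength of the slowest-rotating pair under the old (10M) and new (1M) timescales.

```python
import math


def rope_inv_freq(head_dim: int, max_timescale: float) -> list[float]:
    """Inverse frequencies for standard RoPE: 1 / theta**(2k / head_dim)."""
    # One frequency per channel pair k = 0 .. head_dim/2 - 1; the values fall
    # geometrically from 1.0 toward 1 / max_timescale.
    return [1.0 / max_timescale ** (2 * k / head_dim) for k in range(head_dim // 2)]


def longest_wavelength(head_dim: int, max_timescale: float) -> float:
    """Wavelength, in tokens, of the slowest-rotating channel pair."""
    return 2 * math.pi / min(rope_inv_freq(head_dim, max_timescale))


HEAD_DIM = 128  # hypothetical head dimension, chosen only for illustration

old = longest_wavelength(HEAD_DIM, 10_000_000)  # previous MaxText value
new = longest_wavelength(HEAD_DIM, 1_000_000)   # value after this PR
print(f"rope_max_timescale=10M: slowest wavelength ~{old:,.0f} tokens")
print(f"rope_max_timescale=1M:  slowest wavelength ~{new:,.0f} tokens")
```

A different timescale changes the rotation applied at every nonzero position, so a checkpoint trained with one value does not receive the positional encoding it expects when run with another.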
Branch updated from 6ba34f4 to 659d5b1.
vlad-karp approved these changes on Jun 2, 2026.
ajkv-google approved these changes on Jun 2, 2026.
gagika reviewed on Jun 2, 2026.
Branch updated from e4d28d6 to 9f5c120.
…F configuration
- Set `rope_max_timescale` to 1,000,000 in the `qwen3-30b-a3b-base.yml` config.
- Update `qwen3_30b_a3b_base_config` in `hf_model_configs.py`: `eos_token_id=151645`, `max_position_embeddings=40960`.
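One way to guard against the kind of drift this commit fixes is a small check that a MaxText value and its Hugging Face counterpart agree. The sketch below is hypothetical: the dicts stand in for the real `qwen3-30b-a3b-base.yml` file and the `hf_model_configs.py` entry, and `KEY_MAP` and `find_mismatches` are illustrative names, not existing MaxText utilities.

```python
# Hypothetical stand-ins for the two configs touched by this PR; the real
# values live in qwen3-30b-a3b-base.yml and hf_model_configs.py.
MAXTEXT_YAML = {"rope_max_timescale": 1_000_000}
HF_CONFIG = {
    "rope_theta": 1_000_000.0,
    "eos_token_id": 151645,
    "max_position_embeddings": 40960,
}

# MaxText key -> Hugging Face key (only the pair discussed in this PR).
KEY_MAP = {"rope_max_timescale": "rope_theta"}


def find_mismatches(maxtext_cfg, hf_cfg, key_map):
    """Return (maxtext_key, maxtext_value, hf_key, hf_value) for each disagreement."""
    mismatches = []
    for mt_key, hf_key in key_map.items():
        mt_val, hf_val = maxtext_cfg.get(mt_key), hf_cfg.get(hf_key)
        # A missing key on either side also counts as a mismatch.
        if mt_val is None or hf_val is None or float(mt_val) != float(hf_val):
            mismatches.append((mt_key, mt_val, hf_key, hf_val))
    return mismatches


print(find_mismatches(MAXTEXT_YAML, HF_CONFIG, KEY_MAP))  # []
```

Running the same check with the previous value (`{"rope_max_timescale": 10_000_000}`) reports the single `rope_max_timescale` / `rope_theta` mismatch.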
Branch updated from 9f5c120 to 4720c3e.
Description
This PR updates the `rope_max_timescale` for the `qwen3-30b-a3b-base` model configuration from `10,000,000` to `1,000,000`.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.