fix: read rope config from rope_parameters across all models #1400
Merged
Conversation
Extract rope_theta, rope_scaling, and partial_rotary_factor from config.rope_parameters (the newer HuggingFace format) via a shared get_rope_config helper. This fixes Qwen3MoE crashing with KeyError when rope_parameters exists but lacks rope_theta, and ensures YaRN scaling parameters are propagated to RotaryEmbedding instead of being silently hardcoded to defaults.

Models updated: qwen3_moe, qwen3_next, gpt_oss, glm4_moe, minimax_m2, step3p5, glm4_moe_lite, deepseek_v3, deepseek_v32, mistral3.

Closes #1398

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
refactor: simplify get_rope_config to read only from rope_parameters

In transformers v5 all rope configuration lives in config.rope_parameters; config.rope_theta no longer exists as a top-level attribute. Remove all fallback paths and read rope_theta, partial_rotary_factor, and scaling fields directly from rope_parameters. Update mock configs in glm4_moe_lite, minimax_m2, and step3p5 tests to include rope_parameters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
revert: restore mistral3/model.py to main version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
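For illustration, a minimal sketch of the shape such a helper can take, assuming `rope_parameters` is a plain dict as in transformers v5. The actual implementation in `components/models/common/utils.py` may differ, and the default values here are assumptions, not the repo's:

```python
from typing import Any


def get_rope_config(config: Any) -> dict[str, Any]:
    """Read rope settings from config.rope_parameters (transformers v5 layout)."""
    params = config.rope_parameters  # v5: no top-level config.rope_theta anymore
    return {
        # .get() avoids the KeyError hit when rope_parameters exists
        # but omits rope_theta (the Qwen3MoE crash described above)
        "rope_theta": params.get("rope_theta", 10000.0),
        "partial_rotary_factor": params.get("partial_rotary_factor", 1.0),
        # remaining keys (rope_type, factor, beta_slow, beta_fast,
        # original_max_position_embeddings, ...) describe rope scaling
        "rope_scaling": {
            k: v
            for k, v in params.items()
            if k not in ("rope_theta", "partial_rotary_factor")
        },
    }
```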
Contributor (Author)
/ok to test 277c9c7
akoumpa approved these changes on Feb 27, 2026
linnanwang pushed a commit that referenced this pull request on Apr 24, 2026
* fix: read rope config from rope_parameters across all models

  Extract rope_theta, rope_scaling, and partial_rotary_factor from config.rope_parameters (the newer HuggingFace format) via a shared get_rope_config helper. This fixes Qwen3MoE crashing with KeyError when rope_parameters exists but lacks rope_theta, and ensures YaRN scaling parameters are propagated to RotaryEmbedding instead of being silently hardcoded to defaults.

  Models updated: qwen3_moe, qwen3_next, gpt_oss, glm4_moe, minimax_m2, step3p5, glm4_moe_lite, deepseek_v3, deepseek_v32, mistral3.

  Closes #1398

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Hemil Desai <hemild@nvidia.com>

* refactor: simplify get_rope_config to read only from rope_parameters

  In transformers v5 all rope configuration lives in config.rope_parameters; config.rope_theta no longer exists as a top-level attribute. Remove all fallback paths and read rope_theta, partial_rotary_factor, and scaling fields directly from rope_parameters. Update mock configs in glm4_moe_lite, minimax_m2, and step3p5 tests to include rope_parameters.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Hemil Desai <hemild@nvidia.com>

* revert: restore mistral3/model.py to main version

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary

- Adds a `get_rope_config(config)` helper in `components/models/common/utils.py` that extracts `rope_theta`, `rope_scaling`, and `partial_rotary_factor` directly from `config.rope_parameters`
- Propagates the YaRN scaling fields (`factor`, `beta_slow`, `beta_fast`, `original_max_position_embeddings`) from `rope_parameters` to `RotaryEmbedding` instead of hardcoding defaults
- Adds tests for `get_rope_config` and updates mock configs in existing tests to include `rope_parameters`

Models updated

`qwen3_moe`, `qwen3_next`, `gpt_oss`, `glm4_moe`, `minimax_m2`, `step3p5`, `glm4_moe_lite`, `deepseek_v3`, `deepseek_v32`, `mistral3`

What was wrong

- Qwen3MoE crashed with KeyError when `rope_parameters` exists but lacks `rope_theta`
- `RotaryEmbedding` was instantiated with `scaling_factor=1.0`, `ntk_alpha=1.0`, etc., so configured rope scaling never activated

Closes #1398
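To make the fix concrete, here is a self-contained sketch of the propagation, with `RotaryEmbeddingStub` standing in for the real `RotaryEmbedding` (its constructor arguments and the fallback defaults are assumptions for illustration, not the repo's exact API):

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class RotaryEmbeddingStub:
    """Stand-in for RotaryEmbedding; only the scaling fields matter here."""
    base: float
    scaling_factor: float
    beta_slow: float
    beta_fast: float
    original_max_position_embeddings: int


# A YaRN-style config in the transformers v5 rope_parameters layout.
config = SimpleNamespace(
    rope_parameters={
        "rope_type": "yarn",
        "rope_theta": 1_000_000.0,
        "factor": 4.0,
        "beta_slow": 1.0,
        "beta_fast": 32.0,
        "original_max_position_embeddings": 32768,
    }
)

params = config.rope_parameters
rotary = RotaryEmbeddingStub(
    base=params.get("rope_theta", 10000.0),
    scaling_factor=params.get("factor", 1.0),  # previously hardcoded to 1.0
    beta_slow=params.get("beta_slow", 1.0),
    beta_fast=params.get("beta_fast", 32.0),
    original_max_position_embeddings=params.get(
        "original_max_position_embeddings", 4096
    ),
)
assert rotary.scaling_factor == 4.0  # the configured YaRN factor now takes effect
```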
Test plan
- `uv run pytest tests/unit_tests/models/ -q` — 1100 passed, 7 pre-existing failures (TE/flash_attn not installed), 0 regressions
- `Qwen/Qwen3-30B-A3B` + `rope_scaling` to confirm YaRN activates end-to-end

🤖 Generated with Claude Code
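As a rough illustration of the mock-config updates mentioned above, a v5-style mock has to carry `rope_parameters` itself. Field names and values here are assumed for the sketch; the real mocks live in the `glm4_moe_lite`, `minimax_m2`, and `step3p5` unit tests:

```python
from types import SimpleNamespace


def make_mock_config() -> SimpleNamespace:
    """Minimal mock config for a rope-consuming model under transformers v5."""
    return SimpleNamespace(
        hidden_size=64,
        num_attention_heads=4,
        max_position_embeddings=2048,
        # Without this dict, anything reading config.rope_parameters fails,
        # since v5 no longer exposes a top-level rope_theta attribute.
        rope_parameters={"rope_type": "default", "rope_theta": 10000.0},
    )


def test_mock_config_exposes_rope_parameters():
    cfg = make_mock_config()
    assert cfg.rope_parameters["rope_theta"] == 10000.0
```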