
fix: resolve rope_theta from rope_parameters in DeepseekV32Bridge #1734

Merged
zhuzilin merged 1 commit into THUDM:main from stevewx:fix/deepseek-v32-rope-theta
Mar 22, 2026

Conversation

Contributor

@stevewx stevewx commented Mar 17, 2026

Summary

  • DeepseekV3Bridge._build_config() expects hf_config.rope_theta as a top-level attribute, but the RotaryEmbeddingConfigMixin in transformers 5.x moves it into the rope_parameters dict
  • Adds an __init__ to DeepseekV32Bridge that resolves rope_theta from rope_parameters when it is not available as a top-level attribute (no-op on transformers 4.x)
  • Same fix pattern as GLM4MoELiteBridge in glm4moe_lite.py
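The resolution logic described above could be sketched roughly as follows. This is an illustration, not the actual bridge code: the helper name `resolve_rope_theta`, the default value, and the SimpleNamespace config stand-ins are all assumptions.

```python
from types import SimpleNamespace

def resolve_rope_theta(hf_config, default=10000.0):
    """Return rope_theta from the config, falling back to the
    rope_parameters dict used by transformers 5.x. (Hypothetical
    helper; the default value is an assumption.)"""
    theta = getattr(hf_config, "rope_theta", None)
    if theta is None:
        rope_params = getattr(hf_config, "rope_parameters", None) or {}
        theta = rope_params.get("rope_theta", default)
    return theta

# transformers 4.x style: rope_theta is a top-level attribute
cfg_v4 = SimpleNamespace(rope_theta=500000.0)
# transformers 5.x style: rope_theta lives inside rope_parameters
cfg_v5 = SimpleNamespace(rope_parameters={"rope_theta": 500000.0})

print(resolve_rope_theta(cfg_v4))  # 500000.0
print(resolve_rope_theta(cfg_v5))  # 500000.0
```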

Test plan

  • Verified transformers 4.x sets config.rope_theta directly (fix is no-op)
  • Verified transformers 5.x stores it in config.rope_parameters dict (fix resolves it)
  • Successfully converted GLM-5 744B checkpoint with this fix applied as a runtime patch

🤖 Generated with Claude Code

transformers 5.x RotaryEmbeddingConfigMixin.convert_rope_params_to_dict()
consumes rope_theta from __init__ kwargs and stores it inside the
rope_parameters dict, never setting self.rope_theta as a top-level
attribute. DeepseekV3Bridge._build_config() expects hf_config.rope_theta
directly, causing AttributeError during checkpoint conversion.

Add __init__ to DeepseekV32Bridge that resolves rope_theta from the
rope_parameters dict when it's not available as a top-level attribute.
Same fix pattern as GLM4MoELiteBridge in glm4moe_lite.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
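The __init__-based fix pattern the commit message describes could look roughly like this. The base-class behavior is simulated here with a stand-in class; the real DeepseekV3Bridge signature and internals are not shown in this PR, so everything below is an assumption for illustration.

```python
from types import SimpleNamespace

class FakeDeepseekV3Bridge:
    # Stand-in for the real base class: it reads rope_theta directly,
    # which would raise AttributeError on a transformers 5.x config.
    def __init__(self, hf_config):
        self.rope_theta = hf_config.rope_theta

class DeepseekV32Bridge(FakeDeepseekV3Bridge):
    def __init__(self, hf_config):
        # Promote rope_theta out of the rope_parameters dict when it is
        # missing as a top-level attribute (no-op on transformers 4.x).
        if not hasattr(hf_config, "rope_theta"):
            params = getattr(hf_config, "rope_parameters", None) or {}
            if "rope_theta" in params:
                hf_config.rope_theta = params["rope_theta"]
        super().__init__(hf_config)

# 5.x-style config: resolved from rope_parameters
bridge = DeepseekV32Bridge(
    SimpleNamespace(rope_parameters={"rope_theta": 10000.0})
)
print(bridge.rope_theta)  # 10000.0
```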
@zhuzilin zhuzilin merged commit d269838 into THUDM:main Mar 22, 2026

2 participants