
[model] Add support for GLM4.7 Flash#1460

Merged

zhuzilin merged 1 commit into main from feature/glm47_flash on Jan 20, 2026

Conversation


zhuzilin (Contributor) commented Jan 20, 2026

Until sglang in the slime image supports Glm4MoeLiteForCausalLM, please change "architectures" to "DeepseekV32ForCausalLM" and "model_type" to "deepseek_v3" in the model's config.json, and add "moe_layer_freq": 1.
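A minimal sketch of that config patch as a script. The checkpoint path and the original field values shown in the comments are hypothetical; the three keys being written come from the PR description above.

```python
import json


def patch_glm47_flash_config(config: dict) -> dict:
    """Temporary workaround from the PR description: route GLM-4.7 Flash
    through the DeepSeek-V3.2 code path until sglang in the slime image
    supports Glm4MoeLiteForCausalLM."""
    config["architectures"] = ["DeepseekV32ForCausalLM"]
    config["model_type"] = "deepseek_v3"
    config["moe_layer_freq"] = 1
    return config


# Example usage (path is hypothetical):
# with open("GLM-4.7-Flash/config.json") as f:
#     cfg = json.load(f)
# with open("GLM-4.7-Flash/config.json", "w") as f:
#     json.dump(patch_glm47_flash_config(cfg), f, indent=2)
```

Patching a copy of the checkpoint (rather than the original download) makes it easy to revert once native Glm4MoeLiteForCausalLM support lands.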

zhuzilin merged commit fe0cc35 into main on Jan 20, 2026
2 checks passed
zhuzilin deleted the feature/glm47_flash branch on January 20, 2026 at 01:52
sxthunder commented

I modified the config.json of GLM-4.7, changing "architectures" to "DeepseekV3ForCausalLM" and "model_type" to "deepseek_v3", but it still failed with AttributeError: 'DeepseekV3Config' object has no attribute 'rope_theta'.
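One plausible workaround for that AttributeError, sketched here as an assumption rather than a confirmed fix: DeepseekV3Config expects a rope_theta field, so the patched config.json may need to carry one explicitly. The default value below is a guess; copy the actual rotary base from the original GLM-4.7 config if it stores it under a different key.

```python
def ensure_rope_theta(config: dict, default_theta: float = 10000.0) -> dict:
    """Hypothetical workaround: make sure the patched config carries the
    rope_theta field that DeepseekV3Config reads. The default of 10000.0
    is an assumption, not a value taken from the GLM-4.7 checkpoint."""
    config.setdefault("rope_theta", default_theta)
    return config
```

Whether this resolves the error depends on where DeepseekV3Config looks for the value, so verify against the original checkpoint's rotary settings before relying on it.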


ifififa commented Feb 10, 2026

@zhuzilin Hi, can we convert GLM-4.7-Flash from the Hugging Face format to the Megatron format with --mtp-num-layers 1?
When I set --mtp-num-layers 1 in the conversion script, I hit this error:

[rank7]: Traceback (most recent call last):
[rank7]:   File "/root/slime/tools/convert_hf_to_torch_dist.py", line 145, in <module>
[rank7]:     main()
[rank7]:   File "/root/slime/tools/convert_hf_to_torch_dist.py", line 119, in main
[rank7]:     bridge.load_weights(model, hf_model_path, memory_efficient=True)
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/bridge.py", line 172, in load_weights
[rank7]:     k: self._weight_name_mapping_mcore_to_hf(v)
[rank7]:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/models/deepseek_v3.py", line 300, in _weight_name_mapping_mcore_to_hf
[rank7]:     return self._convert_mtp_param(mcore_weights_name)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/models/deepseek_v3.py", line 361, in _convert_mtp_param
[rank7]:     assert self.config.num_layers == 61, "only support 61 layers for now"
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: AssertionError: only support 61 layers for now
[rank7]:[W209 12:07:19.756675305 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
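The failing assertion in the traceback shows why: mbridge's MTP weight-name conversion is hard-coded to the full 61-layer DeepSeek-V3 depth, so any model with a different layer count will trip it when --mtp-num-layers is set. A small pre-flight check along these lines (the function name and the use of the HF num_hidden_layers key are illustrative, not part of mbridge) can catch this before launching a distributed conversion job:

```python
def supports_mtp_conversion(config: dict) -> bool:
    """Mirror the guard in mbridge/models/deepseek_v3.py shown in the
    traceback above: _convert_mtp_param asserts num_layers == 61, so MTP
    conversion (--mtp-num-layers) only works for 61-layer checkpoints.
    Other depths must be converted without the flag."""
    return config.get("num_hidden_layers") == 61
```

Until that assertion is relaxed upstream, the practical workaround is to run the conversion without --mtp-num-layers for checkpoints with a different depth.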

TideDra pushed a commit to t2vg/slime that referenced this pull request Feb 13, 2026
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
