
[model] Add support for GLM4.7 Flash#1460

Merged

zhuzilin merged 1 commit into main from feature/glm47_flash on Jan 20, 2026

Conversation


zhuzilin (Contributor) commented Jan 20, 2026

Until sglang in the slime image supports Glm4MoeLiteForCausalLM, please change "architectures" to "DeepseekV32ForCausalLM" and "model_type" to "deepseek_v3" in the model's config.json, and add "moe_layer_freq": 1.
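A minimal sketch of that config patch as a script. The checkpoint path and the original field values shown in the comments are hypothetical; the three keys being written come from the PR description above.

```python
import json


def patch_glm47_flash_config(config: dict) -> dict:
    """Temporary workaround from the PR description: route GLM-4.7 Flash
    through the DeepSeek-V3.2 code path until sglang in the slime image
    supports Glm4MoeLiteForCausalLM."""
    config["architectures"] = ["DeepseekV32ForCausalLM"]
    config["model_type"] = "deepseek_v3"
    config["moe_layer_freq"] = 1
    return config


# Example usage (path is hypothetical):
# with open("GLM-4.7-Flash/config.json") as f:
#     cfg = json.load(f)
# with open("GLM-4.7-Flash/config.json", "w") as f:
#     json.dump(patch_glm47_flash_config(cfg), f, indent=2)
```

Patching a copy of the checkpoint (rather than the original download) makes it easy to revert once native Glm4MoeLiteForCausalLM support lands.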

zhuzilin merged commit fe0cc35 into main on Jan 20, 2026
2 checks passed
zhuzilin deleted the feature/glm47_flash branch on January 20, 2026 at 01:52
sxthunder commented

I modified the config.json of GLM-4.7, changing "architectures" to "DeepseekV3ForCausalLM" and "model_type" to "deepseek_v3", but it still failed with AttributeError: 'DeepseekV3Config' object has no attribute 'rope_theta'.
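One plausible workaround for that AttributeError, sketched here as an assumption rather than a confirmed fix: DeepseekV3Config expects a rope_theta field, so the patched config.json may need to carry one explicitly. The default value below is a guess; copy the actual rotary base from the original GLM-4.7 config if it stores it under a different key.

```python
def ensure_rope_theta(config: dict, default_theta: float = 10000.0) -> dict:
    """Hypothetical workaround: make sure the patched config carries the
    rope_theta field that DeepseekV3Config reads. The default of 10000.0
    is an assumption, not a value taken from the GLM-4.7 checkpoint."""
    config.setdefault("rope_theta", default_theta)
    return config
```

Whether this resolves the error depends on where DeepseekV3Config looks for the value, so verify against the original checkpoint's rotary settings before relying on it.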


ifififa commented Feb 10, 2026

@zhuzilin Hi, can we convert GLM-4.7-Flash from the Hugging Face format to the Megatron format with --mtp-num-layers 1?
When I set --mtp-num-layers 1 in the conversion script, I hit this error:

[rank7]: Traceback (most recent call last):
[rank7]:   File "/root/slime/tools/convert_hf_to_torch_dist.py", line 145, in <module>
[rank7]:     main()
[rank7]:   File "/root/slime/tools/convert_hf_to_torch_dist.py", line 119, in main
[rank7]:     bridge.load_weights(model, hf_model_path, memory_efficient=True)
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/core/bridge.py", line 172, in load_weights
[rank7]:     k: self._weight_name_mapping_mcore_to_hf(v)
[rank7]:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/models/deepseek_v3.py", line 300, in _weight_name_mapping_mcore_to_hf
[rank7]:     return self._convert_mtp_param(mcore_weights_name)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/usr/local/lib/python3.12/dist-packages/mbridge/models/deepseek_v3.py", line 361, in _convert_mtp_param
[rank7]:     assert self.config.num_layers == 61, "only support 61 layers for now"
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: AssertionError: only support 61 layers for now
[rank7]:[W209 12:07:19.756675305 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
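The failing assertion in the traceback shows why: mbridge's MTP weight-name conversion is hard-coded to the full 61-layer DeepSeek-V3 depth, so any model with a different layer count will trip it when --mtp-num-layers is set. A small pre-flight check along these lines (the function name and the use of the HF num_hidden_layers key are illustrative, not part of mbridge) can catch this before launching a distributed conversion job:

```python
def supports_mtp_conversion(config: dict) -> bool:
    """Mirror the guard in mbridge/models/deepseek_v3.py shown in the
    traceback above: _convert_mtp_param asserts num_layers == 61, so MTP
    conversion (--mtp-num-layers) only works for 61-layer checkpoints.
    Other depths must be converted without the flag."""
    return config.get("num_hidden_layers") == 61
```

Until that assertion is relaxed upstream, the practical workaround is to run the conversion without --mtp-num-layers for checkpoints with a different depth.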

TideDra pushed a commit to t2vg/slime that referenced this pull request Feb 13, 2026
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
