[Klaud Cold] Add glm5-fp4-mi355x-sglang (off + mtp) recipes#1493
functionstackx wants to merge 2 commits into
New GLM-5 MXFP4 family for MI355X using model amd/GLM-5-MXFP4 and lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517. TP=4 (conc 4..128) and TP=8 (conc 4..8), 1k1k + 8k1k — search-space shape mirrors the existing glm5-fp8-mi355x-sglang-mtp recipe. Launch scripts mirror the glm5-fp8-mi355x sglang ones (same NSA/tilelang backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4 quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1 and adds the standard EAGLE knobs + --use-chat-template on the bench client per AGENTS.md. Note: this is the glm5/MXFP4 path (not glm5.1). The existing glm5.1-fp4-mi355x-sglang recipe stays untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
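The search space described above can be enumerated with a short sketch. This is illustrative only: the PR states the ranges (TP=4 with conc 4..128, TP=8 with conc 4..8, on 1k1k and 8k1k) but not the step, so the power-of-two stepping below is an assumption, and the function names `conc_range` and `search_space` are hypothetical helpers, not part of the recipe config.

```shell
#!/usr/bin/env bash
# Sketch only: list the (sequence length, TP, concurrency) points named in the PR.
# Assumption: concurrency doubles at each step; the PR only gives the range endpoints.
set -eu

# Print every power-of-two multiple of $1 up to and including $2.
conc_range() {
  c=$1
  max=$2
  while [ "$c" -le "$max" ]; do
    echo "$c"
    c=$((c * 2))
  done
}

# Emit one "seq tp=N conc=M" line per benchmark point.
search_space() {
  for seq in 1k1k 8k1k; do
    for c in $(conc_range 4 128); do echo "$seq tp=4 conc=$c"; done
    for c in $(conc_range 4 8); do echo "$seq tp=8 conc=$c"; done
  done
}

search_space
```

Under the doubling assumption this yields six TP=4 points and two TP=8 points per sequence-length pair.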
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018032673
Closing — mis-named recipe ( |
LGTM — standard GLM-5 MXFP4 recipe addition mirroring the existing glm5-fp8-mi355x-sglang (off + MTP) shape on the canonical v0.5.12-rocm720-mi35x image.
Overview
This PR adds two new SGLang ROCm recipes for GLM-5 MXFP4 on MI355X (off + MTP/EAGLE variants), with corresponding launch scripts and a perf-changelog entry. Touches .github/configs/amd-master.yaml, two new benchmarks/single_node/glm5_fp4_mi355x*.sh scripts, and perf-changelog.yaml.
Security risks
None. These are benchmark configs and launch scripts that run inside the benchmark harness. No auth, crypto, permissions, or external IO surface changes.
Level of scrutiny
Low. This is a mechanical recipe addition that mirrors the existing glm5-fp8-mi355x-sglang / glm5-fp8-mi355x-sglang-mtp shape — same TP/concurrency search-space pattern, same NSA/tilelang backend stack, same EAGLE knobs, same canonical v0.5.12-rocm720-mi35x-20260517 image used by other live mi355x recipes (per PR description, the merged #1440/#1441/#1443/#1444 bumps).
Other factors
The bug hunting system found no issues. The launch scripts closely mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh with only the model and (for MTP) the standard EAGLE flags + --use-chat-template per AGENTS.md. The MTP variant correctly sets SGLANG_ENABLE_SPEC_V2=1. The full-sweep CI label will exercise both matrices end-to-end before merge.
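As a rough illustration of the MTP settings the review refers to, the sketch below prints (and does not run) a launch command carrying them. The environment variable and the three `--speculative-*` values come from the PR description. The `sglang.launch_server` entry point, `--model-path`, `--tp`, and `--speculative-algorithm EAGLE` are assumptions about how the script wires them in, and the NSA/tilelang and other backend flags from the real launch scripts are not reproduced. `mtp_launch_cmd` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: print a launch command with the MTP/EAGLE settings from the PR.
# Assumptions: entry point, --model-path, --tp and --speculative-algorithm are inferred,
# not copied from glm5_fp4_mi355x_mtp.sh; backend flags are intentionally omitted.
set -eu

# $1 = tensor-parallel size (4 or 8 in this recipe family).
mtp_launch_cmd() {
  tp=$1
  # printf reuses the '%s ' format for each argument, joining them with spaces.
  printf '%s ' \
    "SGLANG_ENABLE_SPEC_V2=1" \
    "python3 -m sglang.launch_server" \
    "--model-path amd/GLM-5-MXFP4" \
    "--tp $tp" \
    "--speculative-algorithm EAGLE" \
    "--speculative-num-steps 3" \
    "--speculative-eagle-topk 1" \
    "--speculative-num-draft-tokens 4"
  printf '\n'
}

mtp_launch_cmd 4
```

Note that `--use-chat-template` is applied on the bench client rather than the server, so it does not appear in this server-side command.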
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018035453
Summary
Adds a new GLM-5 MXFP4 SGLang ROCm recipe family for MI355X, both off and MTP/EAGLE variants in one PR (grouped per project convention).
Recipes
- glm5-fp4-mi355x-sglang
- glm5-fp4-mi355x-sglang-mtp
Model / image
- amd/GLM-5-MXFP4 (same model the abandoned #1254 attempt was targeting).
- lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517 (the canonical mi35x v0.5.12 dated-nightly tag; same tag used by the merged mi355x bump PRs #1440/#1441/#1443/#1444).
Search-space
TP=4 / conc 4..128 and TP=8 / conc 4..8, on both 1k1k and 8k1k. This mirrors the shape of glm5-fp8-mi355x-sglang-mtp that's already on main.
Launch scripts
Mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh (same NSA/tilelang backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4 quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1 and the standard EAGLE knobs (--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4) plus --use-chat-template per AGENTS.md.
Why a new PR vs reviving #1254
#1254 targeted an older image (v0.5.10.post1-rocm700-mi35x-20260428) on rocm700 base; that tag is now stale. Cleaner to land fresh on the current canonical v0.5.12-rocm720 image used by all other live mi355x recipes, and #1254 can be closed without losing work since the model name + recipe shape carry over.
Test plan
- bash -n syntax passes on both launch scripts.
🤖 Generated with Claude Code
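The syntax check in the test plan could be reproduced along these lines. This is a minimal sketch: `bash -n` parses a script without executing it, the glob is taken from the review overview (`benchmarks/single_node/glm5_fp4_mi355x*.sh`), and `syntax_check` is a hypothetical helper, not something the PR ships.

```shell
#!/usr/bin/env bash
# Sketch only: run `bash -n` (parse, do not execute) over a list of scripts.
set -eu

# Return 0 if every given script parses cleanly, non-zero otherwise.
syntax_check() {
  rc=0
  for script in "$@"; do
    if bash -n "$script"; then
      echo "ok: $script"
    else
      echo "syntax error: $script" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Glob taken from the review overview; files may be absent in a given checkout,
# in which case the unmatched pattern is skipped.
for f in benchmarks/single_node/glm5_fp4_mi355x*.sh; do
  [ -e "$f" ] || continue
  syntax_check "$f"
done
```

A passing `bash -n` only shows the scripts parse; it says nothing about whether the server launches or the flags are accepted, which is what the full-sweep CI run covers.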