[Klaud Cold] Add glm5-fp4-mi355x-sglang (off + mtp) recipes by functionstackx · Pull Request #1493 · SemiAnalysisAI/InferenceX

functionstackx · 2026-05-18T06:47:54Z

Summary

Adds a new GLM-5 MXFP4 SGLang ROCm recipe family for MI355X, both off and MTP/EAGLE variants in one PR (grouped per project convention).

Recipes

glm5-fp4-mi355x-sglang
glm5-fp4-mi355x-sglang-mtp

Model / image

Model: amd/GLM-5-MXFP4 (same model the abandoned #1254 attempt was targeting).
Image: lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517 (the canonical mi35x v0.5.12 dated-nightly tag, the same tag used by the merged mi355x image-bump PRs #1440 for glm5-fp8-mi355x-sglang and mtp, #1441 for glm5.1-fp4-mi355x-sglang, #1443 for qwen3.5-bf16-mi355x-sglang and mtp, and #1444 for qwen3.5-fp8-mi355x-sglang and mtp).

Search-space

TP=4 / conc 4..128 and TP=8 / conc 4..8, on both 1k1k and 8k1k — mirrors the shape of glm5-fp8-mi355x-sglang-mtp that's already on main.
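
As a rough illustration of how large that search-space is, the sketch below enumerates it. It is not taken from the repo's YAML: the PR gives only the concurrency endpoints, so stepping by powers of two is an assumption, as is reading 1k1k and 8k1k as 1024/1024 and 8192/1024 input/output tokens. The names `doubling_range`, `TP_RANGES`, `SEQ_SHAPES`, and `build_matrix` are hypothetical.

```python
from itertools import product


def doubling_range(lo, hi):
    """Return lo, 2*lo, 4*lo, ... up to and including hi.

    Assumption: the sweep doubles concurrency at each step; the PR
    description only states the endpoints (e.g. 4..128).
    """
    values = []
    c = lo
    while c <= hi:
        values.append(c)
        c *= 2
    return values


# (tp, conc_lo, conc_hi) as stated in the PR description.
TP_RANGES = [(4, 4, 128), (8, 4, 8)]

# Assumption: "1k1k" / "8k1k" mean input/output lengths in tokens.
SEQ_SHAPES = {"1k1k": (1024, 1024), "8k1k": (8192, 1024)}


def build_matrix():
    """Expand every (shape, tp, conc) combination into one dict per run."""
    matrix = []
    for (name, (isl, osl)), (tp, lo, hi) in product(SEQ_SHAPES.items(), TP_RANGES):
        for conc in doubling_range(lo, hi):
            matrix.append(
                {"shape": name, "isl": isl, "osl": osl, "tp": tp, "conc": conc}
            )
    return matrix


if __name__ == "__main__":
    for point in build_matrix():
        print(point)
```

Under those assumptions each recipe expands to 16 points: six TP=4 and two TP=8 concurrency levels per sequence shape.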

Launch scripts

Mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh (same NSA/tilelang backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4 quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1 and the standard EAGLE knobs (--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4) plus --use-chat-template per AGENTS.md.
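
The difference between the two variants can be sketched as follows. This is a hedged illustration, not the contents of the launch scripts: it only assembles the environment and argument lists and launches nothing. The helper name `build_launch` is hypothetical, `--speculative-algorithm EAGLE` is assumed rather than quoted from the PR, and the NSA/tilelang, fused-decode-mla, and quick-reduce settings are omitted because the PR does not spell out their exact flags.

```python
def build_launch(model, tp, mtp=False):
    """Return (env, server_argv, bench_extra_args) for one recipe variant.

    Only the settings named in the PR description are modeled; the real
    scripts set additional backend options that are not reproduced here.
    """
    env = {}
    server_argv = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model,
        "--tp-size", str(tp),
    ]
    bench_extra_args = []

    if mtp:
        # Environment switch the PR says the MTP variant sets.
        env["SGLANG_ENABLE_SPEC_V2"] = "1"
        server_argv += [
            # Assumption: the algorithm is selected as EAGLE; the PR only
            # refers to "the standard EAGLE knobs".
            "--speculative-algorithm", "EAGLE",
            "--speculative-num-steps", "3",
            "--speculative-eagle-topk", "1",
            "--speculative-num-draft-tokens", "4",
        ]
        # Per the PR, this flag goes on the benchmark client, not the server.
        bench_extra_args.append("--use-chat-template")

    return env, server_argv, bench_extra_args


if __name__ == "__main__":
    # Model id as written in this PR description; nothing is launched.
    for variant in (False, True):
        env, argv, extra = build_launch("amd/GLM-5-MXFP4", tp=4, mtp=variant)
        print("mtp" if variant else "off", env, " ".join(argv), extra)
```

Keeping the MTP additions in a single branch makes it easy to check that the off variant stays identical to the baseline apart from the model.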

Why a new PR vs reviving #1254

#1254 targeted an older image (v0.5.10.post1-rocm700-mi35x-20260428) on rocm700 base; that tag is now stale. Cleaner to land fresh on the current canonical v0.5.12-rocm720 image used by all other live mi355x recipes — and #1254 can be closed without losing work since the model name + recipe shape carry over.
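
To make the staleness comparison concrete, here is a small sketch that splits the two tags quoted above into their parts and orders them. The tag layout (`v<version>[.postN]-rocm<NNN>-<arch>-<YYYYMMDD>`) is inferred from these two tags only and is an assumption, and `parse_tag` and `is_newer` are hypothetical helpers rather than anything in the repo.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    version: tuple  # (major, minor, patch, post)
    rocm: str       # e.g. "720"
    arch: str       # e.g. "mi35x"
    date: str       # YYYYMMDD


# Assumed layout, inferred from the two tags mentioned in this PR.
_TAG_RE = re.compile(
    r"^v(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?-rocm(\d+)-([a-z0-9]+)-(\d{8})$"
)


def parse_tag(tag: str) -> ImageTag:
    """Split an image tag into version, ROCm base, arch, and build date."""
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized tag layout: {tag}")
    major, minor, patch, post, rocm, arch, date = m.groups()
    version = (int(major), int(minor), int(patch), int(post or 0))
    return ImageTag(version, rocm, arch, date)


def is_newer(a: str, b: str) -> bool:
    """True if tag a has a later (version, date) pair than tag b."""
    ta, tb = parse_tag(a), parse_tag(b)
    return (ta.version, ta.date) > (tb.version, tb.date)


if __name__ == "__main__":
    old = "v0.5.10.post1-rocm700-mi35x-20260428"
    new = "v0.5.12-rocm720-mi35x-20260517"
    print(parse_tag(old))
    print(parse_tag(new))
    print(is_newer(new, old))
```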

Test plan

YAML loads; bash -n syntax passes on both launch scripts.
full-sweep-enabled sweep finishes green for both off + mtp matrices.

🤖 Generated with Claude Code

New GLM-5 MXFP4 family for MI355X using model amd/GLM-5-MXFP4 and lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517.

TP=4 (conc 4..128) and TP=8 (conc 4..8), 1k1k + 8k1k. The search-space shape mirrors the existing glm5-fp8-mi355x-sglang-mtp recipe.

Launch scripts mirror the glm5-fp8-mi355x sglang ones (same NSA/tilelang backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4 quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1 and adds the standard EAGLE knobs + --use-chat-template on the bench client per AGENTS.md.

Note: this is the glm5/MXFP4 path (not glm5.1). The existing glm5.1-fp4-mi355x-sglang recipe stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T06:48:02Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-18T06:48:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018032673
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26018032673

functionstackx · 2026-05-18T06:48:34Z

Closing: the recipe is mis-named (glm5-fp4-* pointing at model amd/GLM-5-MXFP4, which doesn't exist). The correct MXFP4 path for MI355X is the existing glm5.1-fp4-mi355x-sglang family (model amd/GLM-5.1-MXFP4); only the MTP sibling needs to be added, which is being done in a fresh PR. cc @functionstackx

claude

LGTM — standard GLM-5 MXFP4 recipe addition mirroring the existing glm5-fp8-mi355x-sglang (off + MTP) shape on the canonical v0.5.12-rocm720-mi35x image.

Extended reasoning...

Overview

This PR adds two new SGLang ROCm recipes for GLM-5 MXFP4 on MI355X (off + MTP/EAGLE variants), with corresponding launch scripts and a perf-changelog entry. Touches .github/configs/amd-master.yaml, two new benchmarks/single_node/glm5_fp4_mi355x*.sh scripts, and perf-changelog.yaml.

Security risks

None. These are benchmark configs and launch scripts that run inside the benchmark harness. No auth, crypto, permissions, or external IO surface changes.

Level of scrutiny

Low. This is a mechanical recipe addition that mirrors the existing glm5-fp8-mi355x-sglang / glm5-fp8-mi355x-sglang-mtp shape — same TP/concurrency search-space pattern, same NSA/tilelang backend stack, same EAGLE knobs, same canonical v0.5.12-rocm720-mi35x-20260517 image used by other live mi355x recipes (per PR description, the merged #1440/#1441/#1443/#1444 bumps).

Other factors

The bug hunting system found no issues. The launch scripts closely mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh, differing only in the model and, for MTP, the standard EAGLE flags + --use-chat-template per AGENTS.md. The MTP variant correctly sets SGLANG_ENABLE_SPEC_V2=1. The full-sweep CI label will exercise both matrices end-to-end before merge.

github-actions · 2026-05-18T23:42:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26018035453
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26018035453

functionstackx requested a review from a team May 18, 2026 06:47

functionstackx added the full-sweep-enabled label May 18, 2026

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners May 18, 2026 06:47

github-project-automation Bot added this to InferenceMAX Board May 18, 2026

chore: fill pr-link for #1493

63d7c02

functionstackx closed this May 18, 2026

functionstackx deleted the add-glm5-fp4-mi355x-sglang branch May 18, 2026 06:48

github-project-automation Bot moved this to Done in InferenceMAX Board May 18, 2026

claude Bot reviewed May 18, 2026

functionstackx wants to merge 2 commits into main from add-glm5-fp4-mi355x-sglang
