Add MI355X config: glm5-fp8-sglang-mtp by functionstackx · Pull Request #1086 · SemiAnalysisAI/InferenceX

functionstackx · 2026-04-18T02:48:18Z

Summary

Adds glm5-fp8-mi355x-sglang-mtp config mirroring the existing glm5-fp8-mi355x-sglang non-MTP recipe, plus a new benchmarks/single_node/glm5_fp8_mi355x_mtp.sh launch script.
Adds EAGLE speculative decoding flags on top of the non-MTP script: --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4.
Sets SGLANG_ENABLE_SPEC_V2=1 in the env before sglang.launch_server (required for GLM-5 MTP).
Passes --use-chat-template to run_benchmark_serving per the AGENTS.md requirement for all MTP scripts.
Search space rows carry spec-decoding: mtp so the runner picks up the _mtp.sh variant.
perf-changelog.yaml diff is append-only (no modifications to any existing line).

Test plan

YAML parses for both master config and perf-changelog.
bash -n benchmarks/single_node/glm5_fp8_mi355x_mtp.sh — bash syntax OK.
git diff perf-changelog.yaml shows only additions.
python3 utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml — emits 10 entries (2 ISL/OSL × 5 concurrencies) with spec-decoding=mtp.
CI sweep passes on MI355X.

🤖 Generated with Claude Code

claude

LGTM — straightforward MTP config addition mirroring existing patterns; the stale image name in the comment is flagged inline.

Extended reasoning...

Overview

This PR adds a new glm5-fp8-mi355x-sglang-mtp entry to amd-master.yaml, a matching benchmarks/single_node/glm5_fp8_mi355x_mtp.sh launch script, and an append-only perf-changelog.yaml entry. Three files touched, all additive.

Security risks

None. This is a benchmark configuration and shell script that runs inside a pre-defined container. No auth, crypto, or permission-sensitive paths are involved.

Level of scrutiny

Low. The change follows a well-worn pattern — identical structure to dsr1-fp8-b200-sglang-mtp, qwen3.5-fp8-b200-sglang-mtp, and other MTP siblings already in the repo. The EAGLE speculative-decoding flags, SGLANG_ENABLE_SPEC_V2=1, and --use-chat-template are consistent with existing MTP scripts.

Other factors

The one bug found (stale image name in a comment) is a documentation nit with zero functional impact — the actual image is correctly specified in the YAML and the pip install workaround runs unconditionally regardless. The inline comment already covers this. No outstanding reviewer threads exist.

claude · 2026-04-18T02:56:58Z

+# GLM-5 requires transformers with glm_moe_dsa model type support.
+# However, the Image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 doesn't provide this support.
+python3 -m pip install -U --no-cache-dir \
+  "git+https://github.com/huggingface/transformers.git@6ed9ee36f608fd145168377345bfc4a5de12e1e2"


🟡 The comment on lines 18-19 of glm5_fp8_mi355x_mtp.sh references image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219, but the script actually runs against lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413 (as defined in the YAML config). Update the comment to reference the correct image so maintainers can accurately evaluate whether the pinned transformers pip install is still required.

Extended reasoning...

What the bug is: Lines 18–19 of the new benchmarks/single_node/glm5_fp8_mi355x_mtp.sh contain the comment:

# However, the Image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 doesn't provide this support.

This comment explains why the script unconditionally runs pip install -U … transformers@<commit>. The stated reason is that a specific image lacks GLM-5 glm_moe_dsa model type support.

The specific code path that triggers it: A developer reading the script to decide whether the workaround is still necessary will look at lines 18–21 to understand the rationale. The comment identifies rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 as the deficient image.

Why existing code doesn't prevent it: The comment was copy-pasted from benchmarks/single_node/glm5_fp8_mi355x.sh (the non-MTP script), which itself already had a stale comment after PR #1023 upgraded the non-MTP image from that old rocm/sgl-dev image to lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413 without updating the explanatory comment. The MTP script was then created by copying the non-MTP script and inheriting the already-stale comment.

What the impact would be: The actual image used—as defined by glm5-fp8-mi355x-sglang-mtp in amd-master.yaml—is lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413. A maintainer reading the comment might:

Assume the pip install workaround is still needed because of a limitation in the old v0.5.8 image, and not think to re-check it against the newer v0.5.10rc0 image.

Conversely, if they do notice the discrepancy, they waste time tracing the history to figure out which image the comment was meant to describe.
The pip install runs regardless, so there is no functional impact—this is purely a documentation accuracy issue.

How to fix it: Update the comment to reference the image actually in use. For example:

# GLM-5 requires transformers with glm_moe_dsa model type support. # However, the image lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413 doesn't provide this support.

The same fix should also be applied to the identical stale comment in glm5_fp8_mi355x.sh (the non-MTP script), which was introduced in PR #1023.

Step-by-step proof:

Open .github/configs/amd-master.yaml and look at the glm5-fp8-mi355x-sglang-mtp key → image: lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413.

Open benchmarks/single_node/glm5_fp8_mi355x_mtp.sh, line 19 → # However, the Image rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219 doesn't provide this support.

The image name in the comment (rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219) is entirely different from the image actually pulled (lmsysorg/sglang-rocm:v0.5.10rc0-rocm720-mi35x-20260413) — different registry, different tag, different date.

Check benchmarks/single_node/glm5_fp8_mi355x.sh line 19 → identical stale comment, confirming the MTP script was copy-pasted from there without updating the comment.

Mirrors the existing glm5-fp8-mi355x-sglang non-MTP recipe and adds EAGLE speculative decoding (num-steps=3, eagle-topk=1, num-draft-tokens=4) via the standard spec-decoding=mtp suffix. SGLANG_ENABLE_SPEC_V2=1 is set before launching the server as required for GLM-5 MTP. Script also passes --use-chat-template to run_benchmark_serving, as required by AGENTS.md for all MTP configs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx requested a review from a team April 18, 2026 02:48

github-project-automation Bot added this to InferenceMAX Board Apr 18, 2026

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners April 18, 2026 02:48

functionstackx added the sweep-enabled label Apr 18, 2026

claude Bot reviewed Apr 18, 2026

View reviewed changes

functionstackx force-pushed the claude/add-glm5-fp8-mi355x-mtp branch from 1f01c5a to fa2f438 Compare April 18, 2026 03:07

functionstackx removed the sweep-enabled label Apr 18, 2026

functionstackx force-pushed the claude/add-glm5-fp8-mi355x-mtp branch from fa2f438 to c698215 Compare April 18, 2026 06:03

functionstackx merged commit ca59324 into main Apr 18, 2026
3 checks passed

functionstackx deleted the claude/add-glm5-fp8-mi355x-mtp branch April 18, 2026 06:03

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 18, 2026

functionstackx mentioned this pull request Apr 23, 2026

[AMD/ROCM] GLM5.1 FP8 MTP Support on MI355X #1122

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MI355X config: glm5-fp8-sglang-mtp#1086

Add MI355X config: glm5-fp8-sglang-mtp#1086
functionstackx merged 1 commit intomainfrom
claude/add-glm5-fp8-mi355x-mtp

functionstackx commented Apr 18, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

functionstackx commented Apr 18, 2026

Summary

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant