
[Klaud Cold] Add glm5-fp4-mi355x-sglang (off + mtp) recipes #1493

Closed
functionstackx wants to merge 2 commits into main from add-glm5-fp4-mi355x-sglang

Conversation

@functionstackx
Collaborator

Summary

Adds a new GLM-5 MXFP4 SGLang ROCm recipe family for MI355X, both off and MTP/EAGLE variants in one PR (grouped per project convention).

Recipes

  • glm5-fp4-mi355x-sglang
  • glm5-fp4-mi355x-sglang-mtp

Model / image

  • Model: amd/GLM-5-MXFP4
  • Image: lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517

Search-space

TP=4 / conc 4..128 and TP=8 / conc 4..8, on both 1k1k and 8k1k — mirrors the shape of glm5-fp8-mi355x-sglang-mtp that's already on main.
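As a rough illustration of the search-space shape described above, the sweep points could be enumerated as in the sketch below. The function name `enumerate_search_space` is hypothetical, and the doubling of concurrency between steps is an assumption: the description only gives the 4..128 and 4..8 bounds, not the step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the (sequence shape, TP, concurrency) sweep points.
# Assumption: concurrency doubles at each step; the PR states only the bounds.
enumerate_search_space() {
  local seq tp max conc
  for seq in 1k1k 8k1k; do
    for tp in 4 8; do
      # TP=4 sweeps concurrency 4..128; TP=8 sweeps only 4..8.
      if [ "$tp" -eq 4 ]; then max=128; else max=8; fi
      conc=4
      while [ "$conc" -le "$max" ]; do
        echo "$seq tp=$tp conc=$conc"
        conc=$((conc * 2))
      done
    done
  done
}

enumerate_search_space
```

Under the doubling assumption this gives six TP=4 points and two TP=8 points per sequence shape. The real matrix lives in the YAML config and may use a different step.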

Launch scripts

Mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh (same NSA/tilelang backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4 quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1 and the standard EAGLE knobs (--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4) plus --use-chat-template per AGENTS.md.
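To make the MTP additions concrete, here is a minimal sketch of how the environment variable and EAGLE flags listed above might be assembled into a server command. It is not the actual launch script: `build_mtp_cmd`, the `MODEL` and `TP` placeholders, and the `--speculative-algorithm EAGLE` line are assumptions added for illustration, and the NSA/tilelang, fused-decode-mla, and quick-reduce settings are omitted because the description does not spell out their flags.

```shell
#!/usr/bin/env bash
# Sketch only: MODEL and TP are placeholders, not values from the real scripts.
MODEL="${MODEL:-your-org/your-model}"
TP="${TP:-4}"

# Hypothetical helper that prints the command instead of running it.
build_mtp_cmd() {
  local -a cmd=(
    env SGLANG_ENABLE_SPEC_V2=1          # env var named in the PR description
    python3 -m sglang.launch_server
    --model-path "$MODEL"
    --tp-size "$TP"
    --speculative-algorithm EAGLE        # assumption: not stated in the PR text
    --speculative-num-steps 3            # EAGLE knobs as listed in the PR
    --speculative-eagle-topk 1
    --speculative-num-draft-tokens 4
  )
  printf '%s ' "${cmd[@]}"
  echo
}

build_mtp_cmd
```

The `--use-chat-template` flag is left out on purpose: per the commit message it is applied on the bench client, not on the server command.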

Why a new PR vs reviving #1254

#1254 targeted an older image (v0.5.10.post1-rocm700-mi35x-20260428) on rocm700 base; that tag is now stale. Cleaner to land fresh on the current canonical v0.5.12-rocm720 image used by all other live mi355x recipes — and #1254 can be closed without losing work since the model name + recipe shape carry over.

Test plan

  • YAML loads; bash -n syntax passes on both launch scripts.
  • full-sweep-enabled sweep finishes green for both off + mtp matrices.
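The first test-plan item could be run locally with something like the sketch below. The helper name `check_scripts` is hypothetical, and the commented YAML check assumes PyYAML is installed; neither is taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper: syntax-check launch scripts without running them.
check_scripts() {
  local script status=0
  for script in "$@"; do
    # bash -n parses the file and reports syntax errors but executes nothing.
    if bash -n "$script"; then
      echo "ok: $script"
    else
      echo "syntax error: $script" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage (paths as named in this PR):
#   check_scripts benchmarks/single_node/glm5_fp4_mi355x*.sh
# YAML load check (assumes PyYAML is available):
#   python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' .github/configs/amd-master.yaml
```

A passing `bash -n` only proves the scripts parse; the full-sweep run in the second item is what exercises them end to end.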

🤖 Generated with Claude Code

New GLM-5 MXFP4 family for MI355X using model amd/GLM-5-MXFP4 and
lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517. TP=4 (conc 4..128)
and TP=8 (conc 4..8), 1k1k + 8k1k — search-space shape mirrors the
existing glm5-fp8-mi355x-sglang-mtp recipe.

Launch scripts mirror the glm5-fp8-mi355x sglang ones (same NSA/tilelang
backend stack, ROCm fused-decode-mla disabled, quick-reduce INT4
quantization). MTP variant additionally sets SGLANG_ENABLE_SPEC_V2=1
and adds the standard EAGLE knobs + --use-chat-template on the bench
client per AGENTS.md.

Note: this is the glm5/MXFP4 path (not glm5.1). The existing
glm5.1-fp4-mi355x-sglang recipe stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@functionstackx
Collaborator Author

Closing: the recipe is mis-named (glm5-fp4-* with model amd/GLM-5-MXFP4, which doesn't exist). The correct MXFP4 path for MI355X is the existing glm5.1-fp4-mi355x-sglang family (model amd/GLM-5.1-MXFP4); only the MTP sibling needs to be added, which is being done in a fresh PR. cc @functionstackx

@functionstackx deleted the add-glm5-fp4-mi355x-sglang branch on May 18, 2026 06:48
Contributor

@claude claude Bot left a comment

LGTM — standard GLM-5 MXFP4 recipe addition mirroring the existing glm5-fp8-mi355x-sglang (off + MTP) shape on the canonical v0.5.12-rocm720-mi35x image.

Extended reasoning...

Overview

This PR adds two new SGLang ROCm recipes for GLM-5 MXFP4 on MI355X (off + MTP/EAGLE variants), with corresponding launch scripts and a perf-changelog entry. Touches .github/configs/amd-master.yaml, two new benchmarks/single_node/glm5_fp4_mi355x*.sh scripts, and perf-changelog.yaml.

Security risks

None. These are benchmark configs and launch scripts that run inside the benchmark harness. No auth, crypto, permissions, or external IO surface changes.

Level of scrutiny

Low. This is a mechanical recipe addition that mirrors the existing glm5-fp8-mi355x-sglang / glm5-fp8-mi355x-sglang-mtp shape — same TP/concurrency search-space pattern, same NSA/tilelang backend stack, same EAGLE knobs, same canonical v0.5.12-rocm720-mi35x-20260517 image used by other live mi355x recipes (per PR description, the merged #1440/#1441/#1443/#1444 bumps).

Other factors

The bug hunting system found no issues. The launch scripts closely mirror glm5_fp8_mi355x.sh / glm5_fp8_mi355x_mtp.sh with only the model and (for MTP) the standard EAGLE flags + --use-chat-template per AGENTS.md. The MTP variant correctly sets SGLANG_ENABLE_SPEC_V2=1. The full-sweep CI label will exercise both matrices end-to-end before merge.
