[NV] GLM5 FP8 B200 SGLang disagg config#1372
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @Ankur-singh's task in 3m 0s — Review of PR #1372
LGTM — no blocking issues found. Validated:
Thanks for the contribution @Ankur-singh and NV team! Any chance you can run +viz @Oseltamivir
LGTM — config-only addition following the established disagg pattern; the inline nits are cosmetic.
Extended reasoning...
Overview
This PR adds the glm5-fp8-b200-dynamo-sglang entry to .github/configs/nvidia-master.yaml, vendors 15 new SGLang STP recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ (1k1k and 8k1k, low-latency + max-throughput variants), and appends a corresponding entry to perf-changelog.yaml. It follows the same shape as the pre-existing dsv4-fp4-gb300-dynamo-sglang block.
Security risks
None. No code, scripts, or auth/permission paths touched — only declarative YAML for benchmark configuration. The container image (lmsysorg/sglang:v0.5.11-cu130) is already used in the repo for other GLM-5 entries.
Level of scrutiny
Low. This is a mechanical benchmark-config addition that follows established patterns, was validated with the repo's own generate_sweep_configs.py test-config tool, and is gated behind the sweep-enabled label/CI rather than affecting production code paths.
Other factors
The two findings are both flagged as nits and would not block a merge: (1) the perf-changelog references /scratch/fsw/models/GLM-5-FP8 while the launcher maps the alias to /lustre/fsw/models/GLM-5-FP8 — documentation only, alias resolves correctly at runtime; (2) seven *_maxtpt_*.yaml recipes carry a name: of ...hightpt_N instead of ...maxtpt_N, but the runner rewrites name: at submit time so it never reaches the benchmarking layer. Both are worth a quick follow-up but do not warrant blocking the merge.
```yaml
description:
  - "Add GLM-5 FP8 B200 Dynamo SGLang disaggregated multi-node coverage using lmsysorg/sglang:v0.5.11-cu130"
  - "Vendor the 1k1k and 8k1k STP low-latency and max-throughput srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/"
  - "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"
```
🟡 perf-changelog entry says the glm5-fp8 alias is mapped to /scratch/fsw/models/GLM-5-FP8, but the actual mapping in runners/launch_b200-dgxc.sh:46 is /lustre/fsw/models/GLM-5-FP8 (consistent with every other B200 model alias under /lustre/fsw/models/). Documentation-only mismatch — the recipes resolve the path correctly via the launcher at runtime — but the changelog (and PR description) will mislead anyone using it to find where the model is staged on the B200 DGXC host. Suggest fixing the path to /lustre/fsw/models/GLM-5-FP8.
Extended reasoning...
What is wrong
`perf-changelog.yaml:2451` adds the line:

```yaml
- "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"
```

but the launcher that actually resolves the `glm5-fp8` alias on B200 maps it to a different path. From `runners/launch_b200-dgxc.sh:46`:

```bash
elif [[ $MODEL_PREFIX == "glm5" && $PRECISION == "fp8" ]]; then
  export MODEL_PATH="/lustre/fsw/models/GLM-5-FP8"
```

The header comment in the same script even notes that `MODEL_PATH` is overridden with pre-downloaded paths "on the shared Lustre tree", and every other B200 model alias in that `elif` chain uses `/lustre/fsw/models/*`. The string `/scratch/fsw/models` does not appear anywhere else in the repo.
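To make the alias-resolution pattern concrete, the `elif` chain can be sketched as a standalone function. This is an illustrative sketch only: `resolve_model_path` is a made-up helper name, and the real launcher sets `MODEL_PATH` inline rather than through a function.

```shell
#!/usr/bin/env bash
# Minimal sketch of the launcher's alias -> path mapping (illustrative,
# not the actual structure of runners/launch_b200-dgxc.sh).
resolve_model_path() {
  local model_prefix=$1 precision=$2
  if [[ $model_prefix == "glm5" && $precision == "fp8" ]]; then
    echo "/lustre/fsw/models/GLM-5-FP8"   # shared Lustre tree, not /scratch
  else
    echo "unknown alias: $model_prefix/$precision" >&2
    return 1
  fi
}

resolve_model_path glm5 fp8
```

With this shape, a documentation path like `/scratch/fsw/models/...` can never match what the launcher actually exports, which is exactly the mismatch flagged above.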
Step-by-step proof
1. A user reads the new changelog entry for `glm5-fp8-b200-dynamo-sglang` and wants to stage the model locally before launching.
2. They follow the changelog and look for `/scratch/fsw/models/GLM-5-FP8` — this path does not exist on the B200 DGXC host (nothing else in the repo references `/scratch/fsw/models`).
3. The benchmark itself still works because the recipes set `model.path: glm5-fp8` and `launch_b200-dgxc.sh` resolves that alias to `/lustre/fsw/models/GLM-5-FP8` at runtime — so there is no functional regression.
4. But the user troubleshooting "where is the model on disk?" is sent to the wrong filesystem.
Impact
Documentation only — no runtime effect. The PR description repeats the same wrong path, so both the changelog and the merge commit message will be misleading for traceability.
Fix
Change /scratch/fsw/models/GLM-5-FP8 to /lustre/fsw/models/GLM-5-FP8 in perf-changelog.yaml:2451 (and ideally in the PR description as well, though that does not block the merge).
```diff
@@ -0,0 +1,101 @@
+name: b200-fp8-glm5_1k1k_hightpt_0
```
🟡 All 7 new max-throughput recipe YAMLs have a name: field that says hightpt while the filenames (and CONFIG_FILE=...maxtpt_X.yaml references in the master config) say maxtpt — e.g. 1k1k_stp_maxtpt_0.yaml -> name: b200-fp8-glm5_1k1k_hightpt_0. The corresponding _lowlat_ recipes correctly match (lowlat in both filename and name). This is purely cosmetic — the runner rewrites name: at submit time via sed -i "s/^name:.*/name: ..." — but worth fixing so the committed YAMLs stay self-consistent with the filename and the perf-changelog/master-config references.
Extended reasoning...
What the bug is
All seven max-throughput recipes added in this PR have a name: field that uses the token hightpt even though the filename uses maxtpt:
| File | `name:` field |
|---|---|
| `1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml` | `b200-fp8-glm5_1k1k_hightpt_0` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_1.yaml` | `b200-fp8-glm5_1k1k_hightpt_1` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_2.yaml` | `b200-fp8-glm5_1k1k_hightpt_2` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_3.yaml` | `b200-fp8-glm5_1k1k_hightpt_3` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_0.yaml` | `b200-fp8-glm5_8k1k_hightpt_0` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_1.yaml` | `b200-fp8-glm5_8k1k_hightpt_1` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_2.yaml` | `b200-fp8-glm5_8k1k_hightpt_2` |
The corresponding _lowlat_ files are all consistent (lowlat in both the filename and name:).
Why this is purely cosmetic
The slurm runners override `name:` at job-submit time. For example, `runners/launch_b200-dgxc.sh` does roughly:

```bash
sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_FILE"
```

so by the time the recipe is read by anything downstream, the in-tree `name:` value has been replaced. That means `hightpt` never reaches the benchmarking layer and the run completes correctly.
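As a self-contained illustration of that submit-time rewrite — the temp file and the `RUNNER_NAME` value below are made up for the demo, not taken from the runner:

```shell
# Demonstrate the runner-style name rewrite on a throwaway recipe file.
tmp=$(mktemp)
printf 'name: b200-fp8-glm5_1k1k_hightpt_0\nother: value\n' > "$tmp"

RUNNER_NAME="ci-run-42"   # hypothetical runner name for the demo
sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$tmp"

head -n1 "$tmp"   # name: "ci-run-42"
```

Only the `name:` line is touched; every other key in the recipe survives the rewrite, which is why the stale `hightpt` token is cosmetic.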
Why it's still worth fixing
The master config and perf-changelog uniformly use the maxtpt token:
- `.github/configs/nvidia-master.yaml` references `CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml`, etc.
- The token `hightpt` does not appear anywhere else in `benchmarks/multi_node/srt-slurm-recipes/` — it is unique to these new GLM-5 files.

So anyone grepping the committed tree for the recipe by filename (`maxtpt_0`) will find a `name:` that disagrees with the filename, and anyone grepping by name (`hightpt_0`) will not find the matching config-key reference. It is a small but real traceability papercut.
Step-by-step proof
1. Open `benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml`. Line 1 reads `name: b200-fp8-glm5_1k1k_hightpt_0`.
2. Open the matching low-latency file `1k1k_stp_lowlat_0.yaml`. Line 1 reads `name: b200-fp8-glm5_1k1k_lowlat_0` — the token matches the filename.
3. In `.github/configs/nvidia-master.yaml`, the `glm5-fp8-b200-dynamo-sglang` entry references the file as `CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml` — using `maxtpt`, not `hightpt`.
4. `grep -r hightpt benchmarks/multi_node/srt-slurm-recipes/` returns only the 7 files added in this PR; no other recipe in the tree uses that token.
5. Because the runner rewrites `name:` before the recipe is consumed downstream, no behavioural test will fail — the inconsistency only shows up when a human reads the YAML.
How to fix
Replace `hightpt` with `maxtpt` on the first line of each of the seven max-throughput recipes so the `name:` token matches the filename and the rest of the codebase:

```bash
sed -i 's/_hightpt_/_maxtpt_/' \
  benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/{1k1k,8k1k}/disagg/stp/*_maxtpt_*.yaml
```
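A quick follow-up consistency check one could run after the rename, sketched here against sample files in a temp directory rather than the real tree (filenames and contents are fabricated for the demo):

```shell
# Verify each max-throughput recipe's name: token matches its filename token.
dir=$(mktemp -d)
printf 'name: b200-fp8-glm5_1k1k_maxtpt_0\n'  > "$dir/1k1k_stp_maxtpt_0.yaml"
printf 'name: b200-fp8-glm5_1k1k_hightpt_1\n' > "$dir/1k1k_stp_maxtpt_1.yaml"  # mismatch on purpose

for f in "$dir"/*_maxtpt_*.yaml; do
  # Extract the maxtpt_N token from the filename and require it in name:.
  token=$(basename "$f" .yaml | grep -o 'maxtpt_[0-9]*')
  if grep -q "name:.*${token}\$" "$f"; then
    echo "OK   $(basename "$f")"
  else
    echo "BAD  $(basename "$f")"
  fi
done
```

On the real tree, every recipe should print OK once the sed fix above has been applied.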
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25819176223
The new glm5-fp8-b200-dynamo-sglang config references recipes/sglang/glm5/b200-fp8/... paths inside the cloned srt-slurm repo, but srt-slurm@sa-submission-q2-2026 does not ship those recipes. Add a parallel branch to the existing dsv4 overlay so the vendored recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ are copied into the clone before srtctl apply runs.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25825808625
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25825860956
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25827569200
Hi @Oseltamivir, trying to get this config to work. Will add the full-sweep-enabled label once I get a successful run😁 |
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25827673208
Summary

- `benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/`
- `glm5-fp8-b200-dynamo-sglang`

Validation

- `uv run --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys glm5-fp8-b200-dynamo-sglang`
- `git diff --check`