[NV] GLM5 FP8 B200 SGLang disagg config#1372
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Claude finished @Ankur-singh's task in 3m 0s — Review of PR #1372
LGTM — no blocking issues found. Validated:
Thanks for the contribution @Ankur-singh and NV team! Any chance you can run +viz @Oseltamivir
LGTM — config-only addition following the established disagg pattern; the inline nits are cosmetic.
Extended reasoning...
Overview
This PR adds the glm5-fp8-b200-dynamo-sglang entry to .github/configs/nvidia-master.yaml, vendors 15 new SGLang STP recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ (1k1k and 8k1k, low-latency + max-throughput variants), and appends a corresponding entry to perf-changelog.yaml. It follows the same shape as the pre-existing dsv4-fp4-gb300-dynamo-sglang block.
Security risks
None. No code, scripts, or auth/permission paths touched — only declarative YAML for benchmark configuration. The container image (lmsysorg/sglang:v0.5.11-cu130) is already used in the repo for other GLM-5 entries.
Level of scrutiny
Low. This is a mechanical benchmark-config addition that follows established patterns, was validated with the repo's own generate_sweep_configs.py test-config tool, and is gated behind the sweep-enabled label/CI rather than affecting production code paths.
Other factors
The two findings are both flagged as nits and would not block a merge: (1) the perf-changelog references /scratch/fsw/models/GLM-5-FP8 while the launcher maps the alias to /lustre/fsw/models/GLM-5-FP8 — documentation only, alias resolves correctly at runtime; (2) seven *_maxtpt_*.yaml recipes carry a name: of ...hightpt_N instead of ...maxtpt_N, but the runner rewrites name: at submit time so it never reaches the benchmarking layer. Both are worth a quick follow-up but do not warrant blocking the merge.
```yaml
description:
  - "Add GLM-5 FP8 B200 Dynamo SGLang disaggregated multi-node coverage using lmsysorg/sglang:v0.5.11-cu130"
  - "Vendor the 1k1k and 8k1k STP low-latency and max-throughput srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/"
  - "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"
```
🟡 perf-changelog entry says the glm5-fp8 alias is mapped to /scratch/fsw/models/GLM-5-FP8, but the actual mapping in runners/launch_b200-dgxc.sh:46 is /lustre/fsw/models/GLM-5-FP8 (consistent with every other B200 model alias under /lustre/fsw/models/). Documentation-only mismatch — the recipes resolve the path correctly via the launcher at runtime — but the changelog (and PR description) will mislead anyone using it to find where the model is staged on the B200 DGXC host. Suggest fixing the path to /lustre/fsw/models/GLM-5-FP8.
Extended reasoning...
What is wrong
`perf-changelog.yaml:2451` adds the line:

```yaml
- "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"
```

but the launcher that actually resolves the `glm5-fp8` alias on B200 maps it to a different path. From `runners/launch_b200-dgxc.sh:46`:

```bash
elif [[ $MODEL_PREFIX == "glm5" && $PRECISION == "fp8" ]]; then
  export MODEL_PATH="/lustre/fsw/models/GLM-5-FP8"
```

The header comment in the same script even notes that `MODEL_PATH` is overridden with pre-downloaded paths "on the shared Lustre tree", and every other B200 model alias in that `elif` chain uses `/lustre/fsw/models/*`. The string `/scratch/fsw/models` does not appear anywhere else in the repo.
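To make the alias-resolution pattern concrete, the `elif` chain can be sketched as a standalone function. This is an illustrative sketch only: `resolve_model_path` is a made-up helper name, and the real launcher sets `MODEL_PATH` inline rather than through a function.

```shell
#!/usr/bin/env bash
# Minimal sketch of the launcher's alias -> path mapping (illustrative,
# not the actual structure of runners/launch_b200-dgxc.sh).
resolve_model_path() {
  local model_prefix=$1 precision=$2
  if [[ $model_prefix == "glm5" && $precision == "fp8" ]]; then
    echo "/lustre/fsw/models/GLM-5-FP8"   # shared Lustre tree, not /scratch
  else
    echo "unknown alias: $model_prefix/$precision" >&2
    return 1
  fi
}

resolve_model_path glm5 fp8
```

With this shape, a documentation path like `/scratch/fsw/models/...` can never match what the launcher actually exports, which is exactly the mismatch flagged above.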
Step-by-step proof
1. A user reads the new changelog entry for `glm5-fp8-b200-dynamo-sglang` and wants to stage the model locally before launching.
2. They follow the changelog and look for `/scratch/fsw/models/GLM-5-FP8` — this path does not exist on the B200 DGXC host (nothing else in the repo references `/scratch/fsw/models`).
3. The benchmark itself still works because the recipes set `model.path: glm5-fp8` and `launch_b200-dgxc.sh` resolves that alias to `/lustre/fsw/models/GLM-5-FP8` at runtime — so there is no functional regression.
4. But the user troubleshooting "where is the model on disk?" is sent to the wrong filesystem.
Impact
Documentation only — no runtime effect. The PR description repeats the same wrong path, so both the changelog and the merge commit message will be misleading for traceability.
Fix
Change /scratch/fsw/models/GLM-5-FP8 to /lustre/fsw/models/GLM-5-FP8 in perf-changelog.yaml:2451 (and ideally in the PR description as well, though that does not block the merge).
```diff
@@ -0,0 +1,101 @@
+name: b200-fp8-glm5_1k1k_hightpt_0
```
🟡 All 7 new max-throughput recipe YAMLs have a name: field that says hightpt while the filenames (and CONFIG_FILE=...maxtpt_X.yaml references in the master config) say maxtpt — e.g. 1k1k_stp_maxtpt_0.yaml -> name: b200-fp8-glm5_1k1k_hightpt_0. The corresponding _lowlat_ recipes correctly match (lowlat in both filename and name). This is purely cosmetic — the runner rewrites name: at submit time via sed -i "s/^name:.*/name: ..." — but worth fixing so the committed YAMLs stay self-consistent with the filename and the perf-changelog/master-config references.
Extended reasoning...
What the bug is
All seven max-throughput recipes added in this PR have a name: field that uses the token hightpt even though the filename uses maxtpt:
| File | `name:` field |
|---|---|
| `1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml` | `b200-fp8-glm5_1k1k_hightpt_0` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_1.yaml` | `b200-fp8-glm5_1k1k_hightpt_1` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_2.yaml` | `b200-fp8-glm5_1k1k_hightpt_2` |
| `1k1k/disagg/stp/1k1k_stp_maxtpt_3.yaml` | `b200-fp8-glm5_1k1k_hightpt_3` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_0.yaml` | `b200-fp8-glm5_8k1k_hightpt_0` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_1.yaml` | `b200-fp8-glm5_8k1k_hightpt_1` |
| `8k1k/disagg/stp/8k1k_stp_maxtpt_2.yaml` | `b200-fp8-glm5_8k1k_hightpt_2` |
The corresponding _lowlat_ files are all consistent (lowlat in both the filename and name:).
Why this is purely cosmetic
The slurm runners override `name:` at job-submit time. For example, `runners/launch_b200-dgxc.sh` does roughly:

```bash
sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_FILE"
```

so by the time the recipe is read by anything downstream, the in-tree `name:` value has been replaced. That means `hightpt` never reaches the benchmarking layer and the run completes correctly.
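As a self-contained illustration of that submit-time rewrite — the temp file and the `RUNNER_NAME` value below are made up for the demo, not taken from the runner:

```shell
# Demonstrate the runner-style name rewrite on a throwaway recipe file.
tmp=$(mktemp)
printf 'name: b200-fp8-glm5_1k1k_hightpt_0\nother: value\n' > "$tmp"

RUNNER_NAME="ci-run-42"   # hypothetical runner name for the demo
sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$tmp"

head -n1 "$tmp"   # name: "ci-run-42"
```

Only the `name:` line is touched; every other key in the recipe survives the rewrite, which is why the stale `hightpt` token is cosmetic.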
Why it's still worth fixing
The master config and perf-changelog uniformly use the maxtpt token:
- `.github/configs/nvidia-master.yaml` references `CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml`, etc.
- The token `hightpt` does not appear anywhere else in `benchmarks/multi_node/srt-slurm-recipes/` — it is unique to these new GLM-5 files.

So anyone grepping the committed tree for the recipe by filename (`maxtpt_0`) will find a `name:` that disagrees with the filename, and anyone grepping by name (`hightpt_0`) will not find the matching config-key reference. It is a small but real traceability papercut.
Step-by-step proof
1. Open `benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml`. Line 1 reads `name: b200-fp8-glm5_1k1k_hightpt_0`.
2. Open the matching low-latency file `1k1k_stp_lowlat_0.yaml`. Line 1 reads `name: b200-fp8-glm5_1k1k_lowlat_0` — the token matches the filename.
3. In `.github/configs/nvidia-master.yaml`, the `glm5-fp8-b200-dynamo-sglang` entry references the file as `CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml` — using `maxtpt`, not `hightpt`.
4. `grep -r hightpt benchmarks/multi_node/srt-slurm-recipes/` returns only the 7 files added in this PR; no other recipe in the tree uses that token.
5. Because the runner rewrites `name:` before the recipe is consumed downstream, no behavioural test will fail — the inconsistency only shows up when a human reads the YAML.
How to fix
Replace `hightpt` with `maxtpt` on the first line of each of the seven max-throughput recipes so the `name:` token matches the filename and the rest of the codebase:

```bash
sed -i 's/_hightpt_/_maxtpt_/' \
  benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/{1k1k,8k1k}/disagg/stp/*_maxtpt_*.yaml
```
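A quick follow-up consistency check one could run after the rename, sketched here against sample files in a temp directory rather than the real tree (filenames and contents are fabricated for the demo):

```shell
# Verify each max-throughput recipe's name: token matches its filename token.
dir=$(mktemp -d)
printf 'name: b200-fp8-glm5_1k1k_maxtpt_0\n'  > "$dir/1k1k_stp_maxtpt_0.yaml"
printf 'name: b200-fp8-glm5_1k1k_hightpt_1\n' > "$dir/1k1k_stp_maxtpt_1.yaml"  # mismatch on purpose

for f in "$dir"/*_maxtpt_*.yaml; do
  # Extract the maxtpt_N token from the filename and require it in name:.
  token=$(basename "$f" .yaml | grep -o 'maxtpt_[0-9]*')
  if grep -q "name:.*${token}\$" "$f"; then
    echo "OK   $(basename "$f")"
  else
    echo "BAD  $(basename "$f")"
  fi
done
```

On the real tree, every recipe should print OK once the sed fix above has been applied.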
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25819176223
The new glm5-fp8-b200-dynamo-sglang config references recipes/sglang/glm5/b200-fp8/... paths inside the cloned srt-slurm repo, but srt-slurm@sa-submission-q2-2026 does not ship those recipes. Add a parallel branch to the existing dsv4 overlay so the vendored recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ are copied into the clone before srtctl apply runs.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25825808625
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25825860956
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25827569200
Hi @Oseltamivir, trying to get this config to work. Will add the full-sweep-enabled label once I get a successful run😁 |
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25827673208
Summary

- `benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/`
- `glm5-fp8-b200-dynamo-sglang`

Validation

- `uv run --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys glm5-fp8-b200-dynamo-sglang`
- `git diff --check`