[NV] GLM5 FP8 B200 SGLang disagg config #1372

Open

Ankur-singh wants to merge 6 commits into main from nv/glm5-fp8-b200-dynamo-sglang-disagg-upstream

Conversation

@Ankur-singh
Collaborator

Summary

  • add the GLM-5 FP8 B200 Dynamo SGLang disaggregated config to .github/configs/nvidia-master.yaml (shape sketched below)
  • vendor 1k1k and 8k1k STP recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/
  • append perf changelog entry for glm5-fp8-b200-dynamo-sglang
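
For orientation, a rough illustrative sketch of the new entry's shape — not the literal schema, which mirrors the existing dsv4-fp4-gb300-dynamo-sglang block; num-worker and conc-list are the key names cross-checked in the review below:

glm5-fp8-b200-dynamo-sglang:
  # same container image as the repo's other GLM-5 FP8 entries
  image: lmsysorg/sglang:v0.5.11-cu130
  # one sweep point per vendored recipe, e.g.:
  #   CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml
  # with num-worker matching the recipe's worker counts and conc-list its concurrency values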

Validation

  • uv run --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys glm5-fp8-b200-dynamo-sglang
  • YAML/schema check for recipes and perf changelog
  • git diff --check

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your single-node PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Ankur-singh Ankur-singh changed the title Add GLM5 B200 Dynamo SGLang disagg config [NV] GLM5 B200 SGLang disagg config May 13, 2026
@Ankur-singh Ankur-singh marked this pull request as ready for review May 13, 2026 18:41
@Ankur-singh Ankur-singh requested a review from a team May 13, 2026 18:41
@claude
Contributor

claude Bot commented May 13, 2026

Claude finished @Ankur-singh's task in 3m 0s


Review of PR #1372

  • Read CLAUDE.md and repo guidelines
  • Review nvidia-master.yaml changes (model prefix, image accessibility, config structure)
  • Review perf-changelog.yaml (entry present + appended at end)
  • Review STP recipe YAML files for correctness and consistency with master config

LGTM — no blocking issues found.

Validated:

  • Image: lmsysorg/sglang:v0.5.11-cu130 is publicly accessible on Docker Hub ✅
  • Model prefix: glm5 is consistent with existing configs in the repo ✅
  • Perf changelog: entry appended at the end of perf-changelog.yaml with correct PR link ✅
  • Recipe ↔ master config consistency: all 15 recipe files match their master config counterparts — prefill_workers/decode_workers match num-worker, concurrency values match conc-list, ISL/OSL values match ✅
  • All CONFIG_FILE references in the master config have corresponding recipe files ✅
  • STP-only recipes (no MTP/speculative decoding) — no --use-chat-template requirement ✅

@functionstackx functionstackx changed the title [NV] GLM5 B200 SGLang disagg config [NV] GLM5 FP8 B200 SGLang disagg config May 13, 2026
@functionstackx
Contributor

thanks for the contribution @Ankur-singh and NV team! any chance you can run with the full-sweep-enabled label instead of sweep-enabled?

+viz @Oseltamivir

Contributor

@claude claude Bot left a comment

LGTM — config-only addition following the established disagg pattern; the inline nits are cosmetic.

Extended reasoning...

Overview

This PR adds the glm5-fp8-b200-dynamo-sglang entry to .github/configs/nvidia-master.yaml, vendors 15 new SGLang STP recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ (1k1k and 8k1k, low-latency + max-throughput variants), and appends a corresponding entry to perf-changelog.yaml. It follows the same shape as the pre-existing dsv4-fp4-gb300-dynamo-sglang block.
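
For reference, a minimal sketch of one vendored recipe, assuming the field layout implied by the checks below; the worker counts shown are hypothetical placeholders:

name: b200-fp8-glm5_1k1k_lowlat_0   # first line of each recipe; rewritten by the runner at submit time
model:
  path: glm5-fp8                    # B200 alias, resolved by runners/launch_b200-dgxc.sh
prefill_workers: 2                  # hypothetical value; cross-checked against num-worker in the master config
decode_workers: 1                   # hypothetical value; likewise cross-checked
# concurrency list and ISL/OSL settings follow, matching conc-list and the 1k1k/8k1k shapes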

Security risks

None. No code, scripts, or auth/permission paths touched — only declarative YAML for benchmark configuration. The container image (lmsysorg/sglang:v0.5.11-cu130) is already used in the repo for other GLM-5 entries.

Level of scrutiny

Low. This is a mechanical benchmark-config addition that follows established patterns, was validated with the repo's own generate_sweep_configs.py test-config tool, and is gated behind the sweep-enabled label/CI rather than affecting production code paths.

Other factors

The two findings are both flagged as nits and would not block a merge: (1) the perf-changelog references /scratch/fsw/models/GLM-5-FP8 while the launcher maps the alias to /lustre/fsw/models/GLM-5-FP8 — documentation only, alias resolves correctly at runtime; (2) seven *_maxtpt_*.yaml recipes carry a name: of ...hightpt_N instead of ...maxtpt_N, but the runner rewrites name: at submit time so it never reaches the benchmarking layer. Both are worth a quick follow-up but do not warrant blocking the merge.

Comment thread: perf-changelog.yaml
description:
- "Add GLM-5 FP8 B200 Dynamo SGLang disaggregated multi-node coverage using lmsysorg/sglang:v0.5.11-cu130"
- "Vendor the 1k1k and 8k1k STP low-latency and max-throughput srt-slurm recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/"
- "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"
Contributor

🟡 perf-changelog entry says the glm5-fp8 alias is mapped to /scratch/fsw/models/GLM-5-FP8, but the actual mapping in runners/launch_b200-dgxc.sh:46 is /lustre/fsw/models/GLM-5-FP8 (consistent with every other B200 model alias under /lustre/fsw/models/). Documentation-only mismatch — the recipes resolve the path correctly via the launcher at runtime — but the changelog (and PR description) will mislead anyone using it to find where the model is staged on the B200 DGXC host. Suggest fixing the path to /lustre/fsw/models/GLM-5-FP8.

Extended reasoning...

What is wrong

perf-changelog.yaml:2451 adds the line:

- "Use the B200 local model alias glm5-fp8 mapped to /scratch/fsw/models/GLM-5-FP8"

but the launcher that actually resolves the glm5-fp8 alias on B200 maps it to a different path. From runners/launch_b200-dgxc.sh:46:

elif [[ $MODEL_PREFIX == "glm5" && $PRECISION == "fp8" ]]; then
    export MODEL_PATH="/lustre/fsw/models/GLM-5-FP8"

The header comment in the same script even notes that MODEL_PATH is overridden with pre-downloaded paths "on the shared Lustre tree", and every other B200 model alias in that elif chain uses /lustre/fsw/models/*. The string /scratch/fsw/models does not appear anywhere else in the repo.

Step-by-step proof

  1. A user reads the new changelog entry for glm5-fp8-b200-dynamo-sglang and wants to stage the model locally before launching.
  2. They follow the changelog and look for /scratch/fsw/models/GLM-5-FP8 — this path does not exist on the B200 DGXC host (nothing else in the repo references /scratch/fsw/models).
  3. The benchmark itself still works because the recipes set model.path: glm5-fp8 (see the excerpt after this list) and launch_b200-dgxc.sh resolves that alias to /lustre/fsw/models/GLM-5-FP8 at runtime — so there is no functional regression.
  4. But the user troubleshooting "where is the model on disk?" is sent to the wrong filesystem.
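
Concretely, each recipe pins only the alias; a minimal excerpt, assuming the nested layout implied by model.path:

model:
  path: glm5-fp8   # resolved to /lustre/fsw/models/GLM-5-FP8 by launch_b200-dgxc.sh at runtime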

Impact

Documentation only — no runtime effect. The PR description repeats the same wrong path, so both the changelog and the merge commit message will be misleading for traceability.

Fix

Change /scratch/fsw/models/GLM-5-FP8 to /lustre/fsw/models/GLM-5-FP8 in perf-changelog.yaml:2451 (and ideally in the PR description as well, though that does not block the merge).
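
With that one-word change, the corrected changelog line reads:

- "Use the B200 local model alias glm5-fp8 mapped to /lustre/fsw/models/GLM-5-FP8"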

@@ -0,0 +1,101 @@
name: b200-fp8-glm5_1k1k_hightpt_0
Contributor

🟡 All 7 new max-throughput recipe YAMLs have a name: field that says hightpt while the filenames (and CONFIG_FILE=...maxtpt_X.yaml references in the master config) say maxtpt — e.g. 1k1k_stp_maxtpt_0.yaml -> name: b200-fp8-glm5_1k1k_hightpt_0. The corresponding _lowlat_ recipes correctly match (lowlat in both filename and name). This is purely cosmetic — the runner rewrites name: at submit time via sed -i "s/^name:.*/name: ..." — but worth fixing so the committed YAMLs stay self-consistent with the filename and the perf-changelog/master-config references.

Extended reasoning...

What the bug is

All seven max-throughput recipes added in this PR have a name: field that uses the token hightpt even though the filename uses maxtpt:

File                                      name: field
1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml    b200-fp8-glm5_1k1k_hightpt_0
1k1k/disagg/stp/1k1k_stp_maxtpt_1.yaml    b200-fp8-glm5_1k1k_hightpt_1
1k1k/disagg/stp/1k1k_stp_maxtpt_2.yaml    b200-fp8-glm5_1k1k_hightpt_2
1k1k/disagg/stp/1k1k_stp_maxtpt_3.yaml    b200-fp8-glm5_1k1k_hightpt_3
8k1k/disagg/stp/8k1k_stp_maxtpt_0.yaml    b200-fp8-glm5_8k1k_hightpt_0
8k1k/disagg/stp/8k1k_stp_maxtpt_1.yaml    b200-fp8-glm5_8k1k_hightpt_1
8k1k/disagg/stp/8k1k_stp_maxtpt_2.yaml    b200-fp8-glm5_8k1k_hightpt_2

The corresponding _lowlat_ files are all consistent (lowlat in both the filename and name:).

Why this is purely cosmetic

The slurm runners override name: at job-submit time. For example, runners/launch_b200-dgxc.sh does roughly:

sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_FILE"

so by the time the recipe is read by anything downstream, the in-tree name: value has been replaced. That means hightpt never reaches the benchmarking layer and the run completes correctly.

Why it's still worth fixing

The master config and perf-changelog uniformly use the maxtpt token:

  • .github/configs/nvidia-master.yaml references CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml etc.
  • The token hightpt does not appear anywhere else in benchmarks/multi_node/srt-slurm-recipes/ — it is unique to these new GLM-5 files.

So anyone grepping the committed tree for the recipe by filename (maxtpt_0) will find a name: that disagrees with the filename, and anyone grepping by name (hightpt_0) will not find the matching config-key reference. It is a small but real traceability papercut.

Step-by-step proof

  1. Open benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml. Line 1 reads name: b200-fp8-glm5_1k1k_hightpt_0.
  2. Open the matching low-latency file 1k1k_stp_lowlat_0.yaml. Line 1 reads name: b200-fp8-glm5_1k1k_lowlat_0 — token matches the filename.
  3. In .github/configs/nvidia-master.yaml, the glm5-fp8-b200-dynamo-sglang entry references the file as CONFIG_FILE=recipes/sglang/glm5/b200-fp8/1k1k/disagg/stp/1k1k_stp_maxtpt_0.yaml — using maxtpt, not hightpt.
  4. grep -r hightpt benchmarks/multi_node/srt-slurm-recipes/ returns only the 7 files added in this PR; no other recipe in the tree uses that token.
  5. Because the runner rewrites name: before the recipe is consumed downstream, no behavioural test will fail — the inconsistency only shows up when a human reads the YAML.

How to fix

Replace hightpt with maxtpt on the first line of each of the seven max-throughput recipes so the name: token matches the filename and the rest of the codebase:

sed -i 's/_hightpt_/_maxtpt_/' \
  benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/{1k1k,8k1k}/disagg/stp/*_maxtpt_*.yaml
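
After the rewrite, the first line of e.g. 1k1k_stp_maxtpt_0.yaml reads:

name: b200-fp8-glm5_1k1k_maxtpt_0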

@github-actions
Contributor

The new glm5-fp8-b200-dynamo-sglang config references recipes/sglang/glm5/b200-fp8/... paths inside the cloned srt-slurm repo, but srt-slurm@sa-submission-q2-2026 does not ship those recipes. Add a parallel branch to the existing dsv4 overlay so the vendored recipes under benchmarks/multi_node/srt-slurm-recipes/sglang/glm5/b200-fp8/ are copied into the clone before srtctl apply runs.

@Ankur-singh
Collaborator Author

> thanks for the contribution @Ankur-singh and NV team! any chance you can run with the full-sweep-enabled label instead of sweep-enabled?
>
> +viz @Oseltamivir

Hi @Oseltamivir, trying to get this config to work. Will add the full-sweep-enabled label once I get a successful run😁
