2 changes: 1 addition & 1 deletion .github/configs/nvidia-master.yaml
@@ -8759,7 +8759,7 @@ dsv4-fp4-gb300-dynamo-vllm:
dp-attn: true

dsv4-fp4-gb300-dynamo-sglang:
- image: lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647
+ image: lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd

🟡 Missing perf-changelog.yaml entry for this image bump. The immediately-preceding PR #1492 (20260518 → 20260519 bump of this same dsv4-fp4-gb300-dynamo-sglang config-key) added an explicit entry under that key, and other recent image-bump PRs (#1411, #1444, #1475) followed the same convention. Consider adding a parallel entry to keep the changelog consistent (also worth noting the SGLANG_OPT_FP8_WO_A_GEMM=0 removal, which is a functional change worth recording).

Extended reasoning...

What's missing

This PR bumps the SGLang image for the dsv4-fp4-gb300-dynamo-sglang config-key (in .github/configs/nvidia-master.yaml:8762) from nightly-dev-cu13-20260519-dbac4647 to nightly-dev-cu13-20260520-425dffbd and, alongside that, removes the SGLANG_OPT_FP8_WO_A_GEMM=0 workaround from six disagg-gb300-*.yaml recipes (PR description: "fixed in 0520 nightly via sgl-project/sglang#25805"). It does not add an entry to perf-changelog.yaml.

Why this is a convention break

The immediately-preceding PR for this same config-key — #1492 (commit 80c944e, 20260518 → 20260519) — added an explicit entry to perf-changelog.yaml at lines 3020–3024:

Update SGLang image from nightly-dev-cu13-20260518-c67b2870 to nightly-dev-cu13-20260519-dbac4647

The same pattern shows up across other recent image-bump PRs (#1411, #1444, #1475).

The current PR (fa55687) modifies 7 files (.github/configs/nvidia-master.yaml + six disagg-gb300-*.yaml recipes) but does not touch perf-changelog.yaml at all.

Step-by-step proof

  1. git show 80c944e --stat for PR #1492 ("Update dpskv4 GB300 non-MTP disagg SGLang image to nightly-20260519") shows perf-changelog.yaml | 14 ++++++++, i.e. the 20260518→20260519 bump added a changelog entry.
  2. perf-changelog.yaml lines 3012–3024 still contain that entry under dsv4-fp4-gb300-dynamo-sglang.
  3. git show fa55687 --stat for the current PR lists 7 modified files: .github/configs/nvidia-master.yaml plus the six disagg-gb300-*.yaml recipes. perf-changelog.yaml is not in the list.
  4. The PR performs exactly the same kind of change as PR #1492 (a sequential nightly bump of the same key), plus an extra functional change (removing SGLANG_OPT_FP8_WO_A_GEMM=0 from the prefill and decode environments in 6 recipes), which is arguably even more worth recording.
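
The comparison in steps 1 to 3 amounts to asking whether a commit's changed-file list touches the config directory without also touching perf-changelog.yaml. A minimal sketch of that rule follows; the function name and the idea of enforcing it automatically are hypothetical, not something the repository is known to do, and the input is assumed to be the list of paths reported by git show --stat.

```python
def missing_changelog(changed_files,
                      changelog="perf-changelog.yaml",
                      config_prefix=".github/configs/"):
    """Return True when a config file changed but the changelog did not.

    `changed_files` is a list of repo-relative paths, e.g. the paths
    listed by `git show --stat` for one commit. The rule itself is an
    assumption drawn from the convention described in this review.
    """
    touches_config = any(p.startswith(config_prefix) for p in changed_files)
    return touches_config and changelog not in changed_files


# Hypothetical file lists shaped like the two commits compared above.
with_entry = [".github/configs/nvidia-master.yaml", "perf-changelog.yaml"]
without_entry = [".github/configs/nvidia-master.yaml", "recipe.yaml"]

print(missing_changelog(with_entry))     # changelog present
print(missing_changelog(without_entry))  # changelog absent
```

The helper only flags the pattern; whether a given bump really needs an entry is still a reviewer's call.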

Impact

This is a documentation/observability concern, not a runtime bug — the recipes themselves will run fine. The missed entry only affects the historical perf-tracking trail for this config-key. Given that the previous bump (one day earlier, same author) did add the entry, this looks more like an oversight than an intentional skip.

Suggested fix

Add a perf-changelog.yaml entry under dsv4-fp4-gb300-dynamo-sglang along the lines of:

Update SGLang image from nightly-dev-cu13-20260519-dbac4647 to nightly-dev-cu13-20260520-425dffbd; remove SGLANG_OPT_FP8_WO_A_GEMM=0 workaround (fixed upstream in sgl-project/sglang#25805).
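
As an illustration of how such a description line could be produced consistently, here is a small sketch that parses the two image tags and builds the "Update SGLang image from ... to ..." sentence. The tag layout (nightly-dev-cu13-YYYYMMDD-sha) is inferred only from the two tags quoted in this PR, and both helper names are hypothetical.

```python
import re
from datetime import date

# Tag layout assumed from the two tags in this PR: nightly-dev-cu13-YYYYMMDD-<sha>
TAG = re.compile(r"^nightly-dev-cu13-(\d{4})(\d{2})(\d{2})-([0-9a-f]+)$")


def parse_tag(tag):
    """Split a nightly tag into (build date, short commit hash)."""
    m = TAG.match(tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag}")
    year, month, day, sha = m.groups()
    return date(int(year), int(month), int(day)), sha


def bump_description(old_tag, new_tag):
    """Build the changelog sentence, refusing bumps that go backwards in time."""
    old_date, _ = parse_tag(old_tag)
    new_date, _ = parse_tag(new_tag)
    if new_date <= old_date:
        raise ValueError("new tag is not newer than old tag")
    return f"Update SGLang image from {old_tag} to {new_tag}"


print(bump_description("nightly-dev-cu13-20260519-dbac4647",
                       "nightly-dev-cu13-20260520-425dffbd"))
```

Any additional notes, such as the removed SGLANG_OPT_FP8_WO_A_GEMM=0 workaround, would still be written by hand.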

model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: gb300-cw
@@ -33,7 +33,7 @@ name: "disagg-gb300-10p1d-dep4-dep16-14-c8192"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

dynamo:
@@ -94,7 +94,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -119,7 +118,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.
Comment on lines 118 to 121

🟡 The PR removes the SGLANG_OPT_FP8_WO_A_GEMM: "0" line in each decode_environment block but leaves the trailing comment # is single-node only and corrupts results in 2-node decode setups. behind. The orphan now reads as a subject-less sentence fragment beginning with "is" — the line above it is unrelated (SGLANG_REQUEST_STATE_WAIT_TIMEOUT in 5 yamls, SGLANG_OPT_SWA_RELEASE_LEAF_LOCK_AFTER_WINDOW in the tp4-tp4 yaml). Nit — the comment should be dropped along with the env var. Affects all 6 modified YAMLs (10p1d, 12p1d, 1p1d-dep, 1p1d-tp4-tp4, 4p1d, 8p1d).

Extended reasoning...

What the bug is

The PR removes SGLANG_OPT_FP8_WO_A_GEMM: "0" from both the prefill_environment and decode_environment blocks because the underlying issue was fixed upstream (sgl-project/sglang#25805). However, in the decode_environment block of every modified YAML the trailing comment immediately below the removed line — # is single-node only and corrupts results in 2-node decode setups. — is left behind.

Step-by-step proof (using disagg-gb300-10p1d-dep4-dep16-14-c8192.yaml)

Pre-PR snippet (decode_environment, ending lines):

    SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
    SGLANG_OPT_FP8_WO_A_GEMM: "0"
    # is single-node only and corrupts results in 2-node decode setups.

Post-PR snippet (lines 117-119 of the new file):

    SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
    # is single-node only and corrupts results in 2-node decode setups.

  sglang_config:

The comment now dangles directly under SGLANG_REQUEST_STATE_WAIT_TIMEOUT, which is a request-state timeout that has nothing to do with single-node/2-node decode correctness. A reader encountering this hunk has no way to know what "is single-node only" refers to — the subject of the sentence has been deleted.

Why this is a nit, not a functional bug

YAML treats # as a line comment, so parsing still succeeds and runtime behavior is unaffected. The damage is purely to future readability: the next person bumping the image (or chasing a decode-correctness issue) will see an unattributed warning about 2-node decode corruption and have no way to act on it. It also occurs in the same hunks the PR is already touching, so the cleanup cost is essentially zero.

Addressing the refutation

One verifier dismissed this as duplicate-of-already-refuted and "long-standing leftover commentary style." That framing is not quite right:

  1. The comment is not pre-existing in its current orphaned form — before this PR it was the trailing description of a real SGLANG_OPT_FP8_WO_A_GEMM: "0" line directly above it. The PR is what creates the orphan by removing the subject.
  2. A second verifier noted that the comment originated as the second line of a 2-line comment block about SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 (still visible intact in sibling files like disagg-low-latency-1p1d-tp4-tp4-mtp.yaml). Either way, in these modified files it is now dangling.
  3. The PR is the natural moment to clean it up: the change is mechanically aligned with the diff hunks (just extend the - to include the comment line), no broader refactoring needed.

How to fix

In each of the 6 modified files, also remove the trailing # is single-node only and corrupts results in 2-node decode setups. line in decode_environment (lines ~120 in the 5 dep yamls, ~111 in disagg-gb300-1p1d-tp4-tp4-2-c1.yaml). No other change needed.
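
To find the same leftover across all six files without reading each hunk by eye, a heuristic scan along the following lines could help. It is only a sketch: the function name and the list of "continuation" words are hypothetical, and it flags a comment line that starts with such a word when the line directly above it is not itself a comment.

```python
import re

# Words that suggest a comment continues a sentence started on a previous line.
# The list is an illustrative assumption, not an exhaustive rule.
CONTINUATION = re.compile(r"^\s*#\s+(is|are|was|were|and|or|but|which|that)\b")


def find_orphan_comments(text):
    """Return (line number, comment) pairs for comments that look orphaned."""
    hits = []
    prev_is_comment = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        # Flag only the first line of a comment run; a continuation word on a
        # later line of an intact multi-line comment is fine.
        if is_comment and not prev_is_comment and CONTINUATION.match(line):
            hits.append((lineno, stripped))
        prev_is_comment = is_comment
    return hits


post_pr = (
    '    SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"\n'
    "    # is single-node only and corrupts results in 2-node decode setups.\n"
    "\n"
    "  sglang_config:\n"
)
for lineno, comment in find_orphan_comments(post_pr):
    print(lineno, comment)
```

Run against the post-PR snippet above, the scan reports the dangling line; an intact two-line comment block is left alone because its second line follows another comment.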


sglang_config:
@@ -33,7 +33,7 @@ name: "disagg-gb300-12p1d-dep4-dep12-15-c21504"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

dynamo:
@@ -94,7 +94,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -119,7 +118,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.

sglang_config:
@@ -33,7 +33,7 @@ name: "disagg-gb300-1p1d-dep4-dep16-5-c1024"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

dynamo:
@@ -94,7 +94,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -119,7 +118,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.

sglang_config:
@@ -33,7 +33,7 @@ name: "disagg-gb300-1p1d-tp4-tp4-2-c1"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

# See ../1k1k/disagg-gb200-1p1d-dep8-tep8.yaml for the dynamo pin
@@ -93,7 +93,6 @@ backend:
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
SGLANG_OPT_SWA_RELEASE_LEAF_LOCK_AFTER_WINDOW: "1"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -110,7 +109,6 @@ backend:
SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT: "100000"
SGLANG_DISAGGREGATION_WAITING_TIMEOUT: "100000"
SGLANG_OPT_SWA_RELEASE_LEAF_LOCK_AFTER_WINDOW: "1"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.

sglang_config:
@@ -33,7 +33,7 @@ name: "disagg-gb300-4p1d-dep4-dep16-8-c1024"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

dynamo:
@@ -94,7 +94,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -119,7 +118,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.

sglang_config:
@@ -33,7 +33,7 @@ name: "disagg-gb300-8p1d-dep4-dep16-12-c4096"

model:
path: "deepseek-v4-pro"
- container: "lmsysorg/sglang:nightly-dev-cu13-20260519-dbac4647"
+ container: "lmsysorg/sglang:nightly-dev-cu13-20260520-425dffbd"
precision: "fp4"

dynamo:
@@ -94,7 +94,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"

decode_environment:
PYTHONUNBUFFERED: "1"
@@ -119,7 +118,6 @@ backend:
SGLANG_LOG_FORWARD_ITERS: "1"
SGLANG_LOG_MS: "1"
SGLANG_REQUEST_STATE_WAIT_TIMEOUT: "60"
- SGLANG_OPT_FP8_WO_A_GEMM: "0"
# is single-node only and corrupts results in 2-node decode setups.

sglang_config:
9 changes: 8 additions & 1 deletion perf-changelog.yaml
@@ -3035,7 +3035,14 @@
- "Bump ATOM image to rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511"
- "TP=4 shows +3.2% to +16.3% throughput improvement across 1k1k and 8k1k workloads (concurrency 4-256)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1411


+ - config-keys:
+ - dsv4-fp4-gb300-dynamo-sglang
+ description:
+ - "Update SGLang image from nightly-dev-cu13-20260519-dbac4647 to nightly-dev-cu13-20260520-425dffbd for all non-MTP disagg configs"
+ - "Remove SGLANG_OPT_FP8_WO_A_GEMM=0 workaround (topk_v2 crash fixed upstream in sgl-project/sglang#25805)"
+ pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1528


- config-keys:
- qwen3.5-fp4-b300-sglang