[NVIDIA] Fix vllm & sglang b200 updated containers by kedarpotdar-nv · Pull Request #4 · SemiAnalysisAI/InferenceX

kedarpotdar-nv · 2025-09-03T23:17:16Z

No description provided.

Modify GB200 runs to use test partition

…DSA state-index path

amd-master.yaml
- Image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0402 -> lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523 (matches qwen3.5-fp8-mi355x-sglang-disagg; the older 0.5.9 image is no longer the reference build for hybrid-attention disagg models on MI355X.)
- Scenarios: collapse the four legacy "top/middle/bottom/small-scale" search-spaces per ISL into a single 1P+1D TP=8 EP=1 dp-attn=false entry with the standard conc-list [8, 16, 32, 64, 128, 256, 512] for both 1k1k and 8k1k. dp-attn=false avoids the fused_moe_triton/layer.py:209 shared-slot assertion that --enable-dp-attention + --moe-a2a-backend mori triggers for GLM-5 (256 routed + 1 shared expert; (256-1) % 8 = 7 != 0). The collapsed layout mirrors the qwen3.5-fp8-mi355x-sglang-disagg shape so the same CI matrix-expansion logic applies to both.

patches/mori_conn.py
- Add patch #4: rank + length normalization in MoriKVReceiver._send_swa_dsa_state, immediately before the group_concurrent_contiguous call. For GLM-5 (single DSA component), upstream hands dst_state_indices as a 2-D (1, N) array while src_state_indices is 1-D length 1; the existing [:common_len] slice operates only on the outer axis, leaving the rank mismatched. np.diff then produces (1, N-1) vs (0,), which can't broadcast and crashes with "operands could not be broadcast together with shapes (1,12) (0,)". The fix ravels both indices to 1-D and re-truncates to common length so np.diff outputs compatible 1-D arrays. One-shot log gates the warning to once per receiver class.
- Verified end-to-end:
    glm5-fp8-mi355x-sglang-disagg gsm8k flexible-extract = 0.9704 +/- 0.0047
    glm5-fp8-mi355x-sglang-disagg gsm8k strict-match = 0.9712 +/- 0.0046
    qwen3.5-fp8-mi355x-sglang-disagg gsm8k (regression) = 0.9780 +/- 0.004
  Patch #4 fires zero times on the Qwen3.5 Mamba path (it lives inside _send_swa_dsa_state, never called for Mamba); patches #1-#3 behavior is unchanged.

patches/README.md
- Document patch #4 alongside the existing three. Cross-link the full bug analysis at scripts/sglang_disagg/docs_glm5/01-bug-analysis.md and the gsm8k verification at scripts/sglang_disagg/docs_glm5/02-fix-and-verification.md.
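
The rank mismatch described for patch #4 can be reproduced in isolation with NumPy. The sketch below is not the code from patches/mori_conn.py: the helper name normalize_state_indices is hypothetical, N = 13 is chosen only so that the diff shape matches the (1,12) in the quoted error, and a plain subtraction stands in for whatever comparison group_concurrent_contiguous performs. It only illustrates why slicing the outer axis leaves the shapes incompatible and why ravelling before truncating resolves it.

```python
import numpy as np


def normalize_state_indices(src, dst):
    """Hypothetical helper: flatten both index arrays to 1-D, then trim to a common length."""
    src = np.asarray(src).ravel()
    dst = np.asarray(dst).ravel()
    common_len = min(src.size, dst.size)
    return src[:common_len], dst[:common_len]


# Shapes as described in the commit message: src is 1-D of length 1,
# dst is 2-D with shape (1, N). N = 13 is an assumption for illustration.
src_state_indices = np.array([0])
dst_state_indices = np.arange(13).reshape(1, 13)

# Before the fix: [:common_len] only slices the outer axis, so dst stays 2-D.
common_len = min(len(src_state_indices), len(dst_state_indices))
bad_src = src_state_indices[:common_len]   # shape (1,)
bad_dst = dst_state_indices[:common_len]   # shape (1, 13)

try:
    # np.diff yields shapes (1, 12) and (0,), which cannot broadcast.
    np.diff(bad_dst) - np.diff(bad_src)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# After the fix: both arrays are 1-D and equally long, so the diffs line up.
src_1d, dst_1d = normalize_state_indices(src_state_indices, dst_state_indices)
fixed_shapes = (np.diff(src_1d).shape, np.diff(dst_1d).shape)
```

With these inputs the unpatched path raises a ValueError, while the normalized arrays both produce empty 1-D diffs that can be combined safely.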

kedarpotdar-nv added 6 commits September 3, 2025 09:21

fix vllm launch

4f9ee5e

re-enable dsr1 and update image ID to re-fetch

7e6577b

rollback dsr1

22c9710

fix dsr1, remove 70b

492de4c

readd 70b

2e21fe9

re-add other tests

594bc88

kimbochen merged commit 75ec29c into main Sep 4, 2025

kimbochen deleted the fix-vllm-b200 branch September 4, 2025 00:42

claude-code-infmax Bot mentioned this pull request Jan 17, 2026

[NVIDIA] fix: update ep metadata in gb200 dynamo sglang configs to match comments #486

Merged

jthomson04 pushed a commit to jthomson04/InferenceMAX that referenced this pull request Jan 21, 2026

Merge pull request SemiAnalysisAI#4 from NVIDIA/test-runner-gb200

853761f

Modify GB200 runs to use test partition

claude-code-infmax Bot mentioned this pull request Jan 21, 2026

[NV] Update DSR1 GB200 FP4 Disagg Submission #510

Merged

cquil11 added the NVIDIA label Apr 8, 2026

cquil11 changed the title ~~Fix vllm & sglang b200 updated containers~~ [NVIDIA] Fix vllm & sglang b200 updated containers Apr 8, 2026

claude Bot mentioned this pull request May 18, 2026

[Klaud Cold] Add qwen3.5-fp8-mi325x-sglang-mtp recipe #1484

Merged

2 tasks

Oseltamivir added a commit that referenced this pull request May 26, 2026

T4 retrigger #4: runner pool freed

2f6aa0c

claude Bot mentioned this pull request May 28, 2026

short term patch: GLM-5 disagg: port MoRI conn.py overlay to fix PD startup crash #1578

Merged

4 tasks

cursor Bot mentioned this pull request May 28, 2026

[MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated #1572

Merged

4 tasks
