dsv4-b300-sglang: retry by yhyang201 · Pull Request #1174 · SemiAnalysisAI/InferenceX

yhyang201 · 2026-04-26T10:51:42Z

Summary

Set SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 for the DP-attention path (conc=512) in the DSV4 B300 SGLang benchmark script

Test plan

Verify conc=512 DP-attention run picks up the new env var (visible in the SGLANG_* env dump in server.log)
Verify non-DP runs (conc=1, 32) are unaffected
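The first check in the test plan could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper name `env_var_in_log` is hypothetical, and it assumes the SGLANG_* env dump in server.log prints settings as plain `NAME=value` text.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): succeeds if the log contains NAME=value.
# Assumes the env dump writes settings as plain NAME=value text; adjust the
# pattern if the real server.log uses a different layout.
env_var_in_log() {
    local log_file="$1" name="$2" value="$3"
    # Require a non-identifier character (or line start) before the name and a
    # non-digit (or line end) after the value, so that "=10" does not match "=1".
    grep -qE "(^|[^A-Za-z0-9_])${name}=${value}([^0-9]|\$)" "$log_file"
}

# Example usage (the server.log path is an assumption):
# env_var_in_log server.log SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN 1 && echo "env var picked up"
```

Running the same helper against the server.log of a conc=1 or conc=32 run and expecting it to fail would cover the second check.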

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-26T10:51:52Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude

LGTM, single env var addition gated to the existing DP-attention branch.

Extended reasoning...

Overview

This PR adds a single line to benchmarks/single_node/dsv4_fp4_b300_sglang.sh that exports SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 inside the existing if [ "${DP_ATTENTION}" = "true" ] branch. It sits alongside the other SGLANG_OPT_* env vars already set for that path.
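The shape of the change described above can be sketched as follows. Only the conditional and the export come from the PR description; the wrapper function `configure_swa_env` is hypothetical and exists only to make the snippet self-contained, and the other SGLANG_OPT_* exports in the real branch are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: the wrapper function is hypothetical, and the other exports and
# the server launch command from the real script are omitted.
configure_swa_env() {
    if [ "${DP_ATTENTION}" = "true" ]; then
        # Set only on the DP-attention path (conc=512 in this benchmark).
        export SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
    fi
}

# Example usage:
# DP_ATTENTION=true
# configure_swa_env
# env | grep '^SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN='
```

Because the export sits inside the conditional, a run with DP_ATTENTION set to anything other than "true" never sees the variable.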

Security risks

None. This is a benchmark shell script that toggles an SGLang runtime tuning flag; it does not affect auth, networking, secrets, or production code paths.

Level of scrutiny

Minimal. This is a benchmark recipe tweak scoped to one config (DP-attention / conc=512). The change is mechanical and follows the established pattern of export SGLANG_OPT_*=... lines in the same conditional block. Non-DP runs are unaffected because the export is gated by the conditional.

Other factors

No bugs were found by the bug hunting system. The PR is small, self-contained, and trivially reversible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


* sglang dsv4 mtp
* knob-driven recipe selection
* self-contained mtp config; recipe via dp-attn
* add mtp_1 (1/1/2) variant
* knob-driven recipe selection
* pin sglang image to mega_moe-capable digest
* drop mtp_1 knob; align with PR #1158 image digest
* update nvidia-master.yaml
* fix: restore trailing newline in perf-changelog.yaml
* fix: remove --use-chat-template and floor --max-running-requests at 8

  The tokenizer for DSv4-Pro has no chat_template set, so --use-chat-template causes benchmark_serving.py to crash with ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh. Also add a floor of 8 to --max-running-requests to match the base script and avoid too-low values at low concurrency.
* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

  Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174 entries) and append the MTP config entry for PR #1166.
* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

  This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

  Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1184

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: keep only the original #1143 entry, drop new entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@Qiaolin-Yu

…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

  Squashes the cumulative changes from #1158 and #1174 into a single commit on top of the #1184 baseline. Excludes the iterative --max-running-requests floor from #1173.

  - Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
  - Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn conc=512 for both 1k1k and 8k1k
  - Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4) vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
  - DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1185

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>

dsv4-b300-sglang: enable SWA_EVICT_DROP_PAGE_MARGIN for DP-attention

ef0ec6c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yhyang201 requested a review from a team April 26, 2026 10:51

github-project-automation Bot added this to InferenceMAX Board Apr 26, 2026

yhyang201 changed the title ~~dsv4-b300-sglang: enable SWA_EVICT_DROP_PAGE_MARGIN for DP-attention~~ dsv4-b300-sglang: retry Apr 26, 2026

claude Bot reviewed Apr 26, 2026


yhyang201 and others added 2 commits April 26, 2026 18:53

perf-changelog: add dsv4-b300-sglang entry for #1174

d96eb71

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

perf-changelog: update description for #1174

807b11a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Qiaolin-Yu approved these changes Apr 26, 2026


Qiaolin-Yu merged commit 82bc500 into main Apr 26, 2026
17 checks passed

Qiaolin-Yu deleted the dsv4-b300-sglang-evict-drop-margin branch April 26, 2026 10:56

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 26, 2026

claude Bot mentioned this pull request Apr 26, 2026

dsv4-b300-sglang: add conc=2048 recipe & MTP benchmark #1176

Closed

3 tasks

cquil11 mentioned this pull request Apr 26, 2026

dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline #1184

Merged

2 tasks

cquil11 mentioned this pull request Apr 26, 2026

[co-authored with sglang community maintainers leads at radixark] [NVIDIA][SGLang][redo PR] B300 DeepSeek v4 FP4 SGLang: recipe-per-CONC split + DP-attn SWA tweak #1185

Merged

3 tasks

dsv4-b300-sglang: retry #1174
Qiaolin-Yu merged 3 commits into main from dsv4-b300-sglang-evict-drop-margin


2 participants
