dsv4-b300-sglang: retry #1174

Merged
Qiaolin-Yu merged 3 commits into main from
dsv4-b300-sglang-evict-drop-margin
Apr 26, 2026

@yhyang201
Collaborator

Summary

  • Set SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 for the DP-attention path (conc=512) in the DSV4 B300 SGLang benchmark script

Test plan

  • Verify conc=512 DP-attention run picks up the new env var (visible in the SGLANG_* env dump in server.log)
  • Verify non-DP runs (conc=1, 32) are unaffected
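Both checks in the test plan can be scripted against server.log. The sketch below assumes the SGLANG_* env dump prints one NAME=VALUE pair per line; that format, and the helper name check_env_in_log, are assumptions for illustration rather than part of the benchmark tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assumes the SGLANG_* env dump in server.log prints
# NAME=VALUE pairs (the exact dump format is an assumption).

# check_env_in_log LOG_FILE VAR_NAME EXPECTED_VALUE
# Succeeds if LOG_FILE contains VAR_NAME=EXPECTED_VALUE as a whole token,
# so "=1" does not match "=10" and "X${VAR_NAME}" does not match.
# EXPECTED_VALUE is used inside an ERE, so it should not contain regex metacharacters.
check_env_in_log() {
  local log_file="$1" var_name="$2" expected="$3"
  grep -Eq "(^|[^A-Za-z0-9_])${var_name}=${expected}([^A-Za-z0-9_]|$)" "$log_file"
}

# Example usage (log paths are hypothetical):
#   check_env_in_log conc512/server.log SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN 1 \
#     && echo "DP-attention run picked up the flag"
#   ! check_env_in_log conc32/server.log SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN 1 \
#     && echo "non-DP run unaffected"
```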

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a PR there first, before we can merge your PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Failures are often just flakes that re-running the failed jobs will fix; if you re-run them, you are responsible for confirming that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the relevant company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

yhyang201 changed the title from "dsv4-b300-sglang: enable SWA_EVICT_DROP_PAGE_MARGIN for DP-attention" to "dsv4-b300-sglang: retry" on Apr 26, 2026
Contributor

@claude claude Bot left a comment


LGTM, single env var addition gated to the existing DP-attention branch.

Extended reasoning...

Overview

This PR adds a single line to benchmarks/single_node/dsv4_fp4_b300_sglang.sh that exports SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 inside the existing if [ "${DP_ATTENTION}" = "true" ] branch. It sits alongside the other SGLANG_OPT_* env vars already set for that path.
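A minimal sketch of the gating this Overview describes, not the script itself. DP_ATTENTION and the env var come from the PR; apply_dp_attention_env is a hypothetical wrapper added only so the branch can be exercised in isolation, and the :-false default is an added assumption so the test also works under set -u.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the DP-attention branch described above.
# Only the conditional and the exported variable come from the PR.
apply_dp_attention_env() {
  if [ "${DP_ATTENTION:-false}" = "true" ]; then
    # The other SGLANG_OPT_* exports for this path would sit alongside this one.
    export SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
  fi
  # Non-DP runs fall through without exporting anything.
}
```

Because the export lives inside the conditional, TP-only runs (conc=1, 32) never see the variable, which is the property the review relies on when it calls the change self-contained.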

Security risks

None. This is a benchmark shell script that toggles an SGLang runtime tuning flag; it does not affect auth, networking, secrets, or production code paths.

Level of scrutiny

Minimal. This is a benchmark recipe tweak scoped to one config (DP-attention / conc=512). The change is mechanical and follows the established pattern of export SGLANG_OPT_*=... lines in the same conditional block. Non-DP runs are unaffected because the export is gated by the conditional.

Other factors

No bugs were found by the bug hunting system. The PR is small, self-contained, and trivially reversible.

yhyang201 and others added 2 commits April 26, 2026 18:53
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Qiaolin-Yu merged commit 82bc500 into main on Apr 26, 2026
17 checks passed
Qiaolin-Yu deleted the dsv4-b300-sglang-evict-drop-margin branch on April 26, 2026 10:56
yhyang201 added a commit to Qiaolin-Yu/InferenceX that referenced this pull request Apr 26, 2026
Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174
entries) and append the MTP config entry for PR SemiAnalysisAI#1166.
Qiaolin-Yu pushed a commit that referenced this pull request Apr 26, 2026
* sglang dsv4 mtp

* knob-driven recipe selection

* self-contained mtp config; recipe via dp-attn

* add mtp_1 (1/1/2) variant

* knob-driven recipe selection

* pin sglang image to mega_moe-capable digest

* drop mtp_1 knob; align with PR #1158 image digest

* update nvidia-master.yaml

* fix: restore trailing newline in perf-changelog.yaml

* fix: remove --use-chat-template and floor --max-running-requests at 8

The tokenizer for DSv4-Pro has no chat_template set, so
--use-chat-template causes benchmark_serving.py to crash with
ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh.

Also add a floor of 8 to --max-running-requests to match the
base script and avoid too-low values at low concurrency.

* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174
entries) and append the MTP config entry for PR #1166.

* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
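Two of the server-side knobs in the squashed #1166 message above, the floor of 8 on --max-running-requests and the EAGLE (3, 1, 4) tuple, can be sketched as follows. Deriving the base value of --max-running-requests from CONC is an assumption (the commit only states the floor), and build_mtp_server_args and SERVER_ARGS are hypothetical names. The steps + 1 check is also an added assumption: both (3, 1, 4) and the reverted (4, 1, 5) satisfy draft tokens = steps + 1, which is consistent with single-chain (topk=1) drafting.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: builds SGLang server args for the MTP recipe.
# build_mtp_server_args CONC [STEPS] [TOPK] [DRAFT_TOKENS]
# Assumption: the base value of --max-running-requests is CONC; only the
# floor of 8 comes from the commit message.
build_mtp_server_args() {
  local conc="$1"
  local steps="${2:-3}" topk="${3:-1}" draft_tokens="${4:-4}"  # EAGLE (3, 1, 4)
  local max_running=$(( conc > 8 ? conc : 8 ))                 # floor at 8

  # Assumed sanity check: with topk=1 the draft is a single chain,
  # so we expect draft_tokens == steps + 1.
  if [ "$topk" -eq 1 ] && [ "$draft_tokens" -ne $(( steps + 1 )) ]; then
    echo "unexpected EAGLE params: ($steps, $topk, $draft_tokens)" >&2
    return 1
  fi

  # Global array so callers can pass it as "${SERVER_ARGS[@]}".
  # Note: --use-chat-template belongs to the benchmark client and is
  # intentionally not passed anywhere, per the commit above.
  SERVER_ARGS=(
    --max-running-requests "$max_running"
    --speculative-num-steps "$steps"
    --speculative-eagle-topk "$topk"
    --speculative-num-draft-tokens "$draft_tokens"
  )
}
```

Keeping the tuple as defaulted parameters makes a revert like the one above a one-line change rather than three separate flag edits.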
cquil11 added a commit that referenced this pull request Apr 26, 2026
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174),
and changelog retriggers (#1178) on top of the original #1143 entry.
Restores the script and config block to their #1143 state and clears
all prior dsv4-fp4-b300-sglang changelog entries to start fresh.

The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1184

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: keep only the original #1143 entry, drop new entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

Squashes the cumulative changes from #1158 and #1174 into a single
commit on top of the #1184 baseline. Excludes the iterative
--max-running-requests floor from #1173.

- Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
- Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn
  conc=512 for both 1k1k and 8k1k
- Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4)
  vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
- DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1185

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
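The search space and DP_ATTENTION dispatch listed in the #1185 commit message above can be summarized as a lookup from CONC to a recipe. This is an illustrative sketch only: the TP/EP/DP values and the flashinfer_mxfp4 vs deepep split come from the commit message, while select_recipe, MOE_PATH, and the case-based structure are hypothetical. The same mapping applies to both 1k1k and 8k1k.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the #1185 search space: CONC -> TP / EP / DP_ATTENTION.
# Values are from the commit message; names and structure are illustrative.
select_recipe() {
  case "$1" in
    1)   TP=8; EP=1; DP_ATTENTION=false ;;  # TP-only, low latency
    32)  TP=4; EP=1; DP_ATTENTION=false ;;  # TP-only
    512) TP=4; EP=4; DP_ATTENTION=true  ;;  # DP-attention
    *)   echo "no recipe for CONC=$1" >&2; return 1 ;;
  esac
  if [ "$DP_ATTENTION" = "true" ]; then
    # DP-attn path: deepep, plus prefill-delayer and mega_moe env vars
    # (and SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1), not modeled here.
    MOE_PATH="deepep"
  else
    MOE_PATH="flashinfer_mxfp4"
  fi
}
```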