dsv4-b300-sglang: retry #1174
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If a re-run is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
LGTM, single env var addition gated to the existing DP-attention branch.
Overview
This PR adds a single line to benchmarks/single_node/dsv4_fp4_b300_sglang.sh that exports SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 inside the existing if [ "${DP_ATTENTION}" = "true" ] branch. It sits alongside the other SGLANG_OPT_* env vars already set for that path.
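The gating described in this overview can be sketched as follows. This is a minimal illustration, not the actual contents of dsv4_fp4_b300_sglang.sh: the function name apply_dp_attention_env is hypothetical (the real script is described as inlining the conditional), and the other SGLANG_OPT_* exports in that branch are omitted because the review does not list them.

```shell
# Hypothetical wrapper around the conditional described in the review.
# Only the DP-attention branch exports the SWA eviction flag.
apply_dp_attention_env() {
    if [ "${DP_ATTENTION}" = "true" ]; then
        # The single line this PR is described as adding.
        export SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
    fi
}

# Run each case in a subshell so the export does not leak between cases.
( DP_ATTENTION=true;  apply_dp_attention_env
  echo "dp-attn: ${SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN:-unset}" )
( DP_ATTENTION=false; apply_dp_attention_env
  echo "tp-only: ${SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN:-unset}" )
```

Because the export lives inside the conditional, a run with DP_ATTENTION set to anything other than "true" never sees the variable unless the caller's environment already defines it.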
Security risks
None. This is a benchmark shell script that toggles an SGLang runtime tuning flag; it does not affect auth, networking, secrets, or production code paths.
Level of scrutiny
Minimal. This is a benchmark recipe tweak scoped to one config (DP-attention / conc=512). The change is mechanical and follows the established pattern of export SGLANG_OPT_*=... lines in the same conditional block. Non-DP runs are unaffected because the export is gated by the conditional.
Other factors
No bugs were found by the bug hunting system. The PR is small, self-contained, and trivially reversible.
Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174 entries) and append the MTP config entry for PR SemiAnalysisAI#1166.
* sglang dsv4 mtp
* knob-driven recipe selection
* self-contained mtp config; recipe via dp-attn
* add mtp_1 (1/1/2) variant
* knob-driven recipe selection
* pin sglang image to mega_moe-capable digest
* drop mtp_1 knob; align with PR #1158 image digest
* update nvidia-master.yaml
* fix: restore trailing newline in perf-changelog.yaml
* fix: remove --use-chat-template and floor --max-running-requests at 8

  The tokenizer for DSv4-Pro has no chat_template set, so --use-chat-template causes benchmark_serving.py to crash with ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh. Also add a floor of 8 to --max-running-requests to match the base script and avoid too-low values at low concurrency.
* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

  Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174 entries) and append the MTP config entry for PR #1166.
* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

  This reverts the EAGLE spec params back to (3, 1, 4):
    --speculative-num-steps 3
    --speculative-eagle-topk 1
    --speculative-num-draft-tokens 4

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
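The --max-running-requests floor mentioned in the commit list above might look roughly like this. It is only a sketch: the commit message does not say how the value is derived, so the assumption here is that it follows the benchmark concurrency, and the helper name floor_max_running_requests is hypothetical.

```shell
# Hypothetical helper: clamp a requested value to a minimum of 8.
# Assumes the argument is a non-negative integer (for example, the
# benchmark concurrency); how the real script derives it is not stated.
floor_max_running_requests() {
    conc="$1"
    floor=8
    if [ "${conc}" -lt "${floor}" ]; then
        echo "${floor}"
    else
        echo "${conc}"
    fi
}

# Low concurrency is raised to the floor; higher values pass through.
echo "conc=1   -> $(floor_max_running_requests 1)"
echo "conc=32  -> $(floor_max_running_requests 32)"
echo "conc=512 -> $(floor_max_running_requests 512)"
```

The result would then be passed as the argument to --max-running-requests when launching the server.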
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

  Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1184

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: keep only the original #1143 entry, drop new entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

  Squashes the cumulative changes from #1158 and #1174 into a single commit on top of the #1184 baseline. Excludes the iterative --max-running-requests floor from #1173.

  - Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
  - Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn conc=512 for both 1k1k and 8k1k
  - Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4) vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
  - DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1185

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
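The search space listed in the #1185 commit message can be enumerated as a quick sanity check. This is an illustrative sketch only: the function name list_search_space and the "TP EP DP_ATTENTION CONC" field layout are assumptions, and the actual config file format is not shown in this conversation.

```shell
# Hypothetical enumeration of the search space from the commit message:
# three recipes, each run for both the 1k1k and 8k1k sequence settings.
list_search_space() {
    for seq in 1k1k 8k1k; do
        # Each recipe string is "TP EP DP_ATTENTION CONC" (assumed layout).
        for recipe in "8 1 false 1" "4 1 false 32" "4 4 true 512"; do
            # Split the recipe string into positional parameters.
            set -- ${recipe}
            echo "${seq} tp=$1 ep=$2 dp_attention=$3 conc=$4"
        done
    done
}

list_search_space
```

Only the TP4/EP4 conc=512 entries set dp_attention=true, so those are the only points where the script's DP-attention branch, and with it SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1, would apply.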
Summary
Set SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 for the DP-attention path (conc=512) in the DSV4 B300 SGLang benchmark script
Test plan
🤖 Generated with Claude Code