Add B300 config: dsr1-fp8-sglang-mtp#1059
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If a re-run is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
If additional help is needed, PR authors can reach out to core maintainers over Slack.
SGLANG_ENABLE_SPEC_V2=1

EVAL_CONTEXT_ARGS=""
if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
fi
# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

set -x
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server \
🔴 In dsr1_fp8_b300_mtp.sh line 61, SGLANG_ENABLE_SPEC_V2=1 is a plain shell variable assignment that is never exported or prefixed inline to the python3 command, so the SGLang server subprocess will not see this variable and will silently skip the EAGLE Speculative V2 optimization. Fix by adding it as an inline prefix: SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ... — exactly the pattern used correctly in qwen3.5_fp8_b300_mtp.sh.
Extended reasoning...
The bug: plain shell variable is not inherited by child processes
In Bash, a variable assignment like SGLANG_ENABLE_SPEC_V2=1 on its own line (without export or inline-prefix syntax) creates a shell-local variable, unless that name was already exported. It is NOT added to the environment of any child process. To pass a variable to a subprocess, either export it before the command (export VAR=value) or use the inline-prefix syntax (VAR=value command args...); auto-export via set -a has the same effect as export. This is standard POSIX/Bash behavior.
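The difference is easy to reproduce in isolation. The sketch below is only an illustration: DEMO_FLAG is a hypothetical variable name, and `sh -c` stands in for the server process; neither comes from the PR.

```shell
# DEMO_FLAG is a hypothetical name used only for this demonstration.
unset DEMO_FLAG

# 1) Plain assignment: visible to this shell, not to child processes.
DEMO_FLAG=1
plain=$(sh -c 'echo "${DEMO_FLAG:-unset}"')

# 2) Inline prefix: placed in the environment of that one command only.
unset DEMO_FLAG
prefixed=$(DEMO_FLAG=1 sh -c 'echo "${DEMO_FLAG:-unset}"')

# 3) export: every child started afterwards inherits the variable.
export DEMO_FLAG=1
exported=$(sh -c 'echo "${DEMO_FLAG:-unset}"')

echo "plain=$plain prefixed=$prefixed exported=$exported"
# prints: plain=unset prefixed=1 exported=1
```

Case 1 mirrors the standalone assignment flagged in this review; cases 2 and 3 are the two fixes.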
The code path that triggers the bug
In benchmarks/single_node/dsr1_fp8_b300_mtp.sh (the new file introduced by this PR), line 61 contains:
SGLANG_ENABLE_SPEC_V2=1
This is a standalone assignment. Then at line 72, the server is launched as:
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
SGLANG_ENABLE_SPEC_V2 is absent from the inline prefix, so it is not in the subprocess environment.
Why existing code does not prevent it
There is no set -a (auto-export) or export SGLANG_ENABLE_SPEC_V2=1 anywhere in the script. The variable simply sits in the shell's local namespace and is discarded when the shell exits. A superficial code review might miss this because the variable name is present in the file — the error is in how it is propagated, not whether it is set.
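For reference, `set -a` (allexport) marks every variable assigned while it is active for export. A small illustration, again with a hypothetical DEMO_FLAG and `sh -c` standing in for the server process:

```shell
# DEMO_FLAG is a hypothetical name used only for this demonstration.
unset DEMO_FLAG

set -a          # allexport on: assignments from here on are marked for export
DEMO_FLAG=1
set +a          # allexport off: DEMO_FLAG stays exported

auto=$(sh -c 'echo "${DEMO_FLAG:-unset}"')
echo "auto=$auto"
# prints: auto=1
```

Because the script under review uses neither this nor export for SGLANG_ENABLE_SPEC_V2, the plain assignment never reaches the server.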
Impact on benchmark results
SGLANG_ENABLE_SPEC_V2 enables the SGLang Speculative V2 algorithm, which is the optimized EAGLE speculative decoding path for DeepSeek R1 MTP. Without this flag the server falls back to the less-optimized V1 path, potentially yielding lower throughput. Since this is an MTP benchmark config, the results would not reflect the intended configuration and would not be comparable to other configs (such as dsr1-fp8-b200-sglang-mtp) where the flag does take effect via the correct pattern.
How to fix it
Merge the SGLANG_ENABLE_SPEC_V2=1 into the inline prefix on the python3 command, removing the standalone assignment on line 61:
# Remove standalone line 61:
# SGLANG_ENABLE_SPEC_V2=1
# Change line 72 from:
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
# to:
SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
This is exactly the pattern used correctly in benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh (line 34) and benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh.
Step-by-step proof
- Shell executes SGLANG_ENABLE_SPEC_V2=1: this sets the variable in the shell's local environment only.
- Shell executes PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...: Bash creates a child process with an environment that includes PYTHONNOUSERSITE=1 (because it is an inline prefix) plus the already-exported variables (e.g., SGLANG_ENABLE_JIT_DEEPGEMM via export on line 28), but NOT SGLANG_ENABLE_SPEC_V2 because it was never exported.
- Inside the Python process, os.environ.get('SGLANG_ENABLE_SPEC_V2') returns None, so SGLang behaves as if the flag was never set.
- Contrast with qwen3.5_fp8_b300_mtp.sh: SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 ... Both variables appear as inline prefixes, so both are present in the child process environment.
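One way to catch this class of mistake before launch is to check the child-visible environment rather than the shell variable. The helper below is a hypothetical sketch, not part of the PR's scripts. It relies on `env`, which runs as a child process and therefore lists only exported variables; DEMO_FLAG is again a made-up name.

```shell
# require_exported NAME: succeed only if NAME would be inherited by a child process.
# Hypothetical helper for illustration; not part of the benchmark scripts.
require_exported() {
    if env | grep -q "^$1="; then
        return 0
    fi
    echo "error: $1 is not exported; child processes will not see it" >&2
    return 1
}

# Demonstration with a hypothetical variable name.
unset DEMO_FLAG
DEMO_FLAG=1
if require_exported DEMO_FLAG 2>/dev/null; then before=ok; else before=missing; fi

export DEMO_FLAG
if require_exported DEMO_FLAG; then after=ok; else after=missing; fi

echo "before export: $before, after export: $after"
# prints: before export: missing, after export: ok
```

A call such as `require_exported SGLANG_ENABLE_SPEC_V2 || exit 1` before the server start would fail fast on the bug described here. It does not apply to variables passed via the inline prefix, since those never enter the shell's own environment.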
- "EAGLE speculative decoding with MTP, TP=8, concurrency 4-512 for 1k1k and 8k1k"
- "At the time of submission, https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1 does not have a B300-specific recipe, so this reuses the existing DSR1 FP8 B200 SGLang MTP recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059
🟡 The new dsr1-fp8-b300-sglang-mtp entry in perf-changelog.yaml uses a placeholder XXXX in its pr-link field instead of the actual PR number 1059. This makes the changelog entry untraceable back to its source PR; the correct link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1059.
Extended reasoning...
What the bug is and how it manifests
The last entry added to perf-changelog.yaml in this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. The XXXX is a template placeholder that was never replaced with the actual PR number. The PR number (1059) was known at the time of submission, as it is this very PR.
The specific code path that triggers it
The entry appears at lines 1413–1415 of perf-changelog.yaml (the last 12 lines of the file as committed in b7d1595). The pr-link field reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
Why existing code doesn't prevent it
There is no automated validation in the CI workflows that checks pr-link values for placeholder text. The changelog is maintained by convention, so a typo or forgotten substitution passes all checks silently.
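Such a check could be a few lines of shell. The sketch below is hypothetical: the function name and the sample file are made up for illustration, and the pattern assumes placeholders are runs of three or more X characters at the end of a pr-link URL. It would flag every such entry, including older ones, so a real check would likely need an allowlist or a one-time cleanup first.

```shell
# check_pr_links FILE: print pr-link lines that still end in an XXX-style placeholder
# and return non-zero if any are found. Hypothetical helper, not an existing CI job.
check_pr_links() {
    if grep -nE 'pr-link:.*/pull/X{3,}[[:space:]]*$' "$1"; then
        return 1
    fi
    return 0
}

# Self-contained demonstration on a throwaway sample file.
sample=$(mktemp)
printf '%s\n' \
    '  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1049' \
    '  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX' > "$sample"

if check_pr_links "$sample" >/dev/null; then status=clean; else status=placeholder-found; fi
echo "$status"
rm -f "$sample"
```

In CI, the same function could be pointed at perf-changelog.yaml and the job failed on a non-zero return.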
What the impact would be
Anyone reading the changelog entry for dsr1-fp8-b300-sglang-mtp cannot navigate directly to the PR that introduced it. Traceability from changelog to PR is broken, making it harder to audit the history of benchmark changes or understand the rationale behind this configuration.
How to fix it
Replace XXXX with 1059:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059
Step-by-step proof
- This PR (Add B300 config: dsr1-fp8-sglang-mtp, #1059) adds the dsr1-fp8-b300-sglang-mtp entry to perf-changelog.yaml.
- Commit b7d1595 is the merge commit for this PR.
- Running tail -20 perf-changelog.yaml on HEAD confirms the final pr-link value is .../pull/XXXX.
- Every other resolved entry in the file uses a real PR number (e.g., /pull/1049, /pull/1048, /pull/1035).
- While a handful of other entries also use XXX or XXXX (for PRs whose numbers may have been unknown at submission time), this entry's PR number is definitively known: it is 1059.
- The fix is a one-line change: replace XXXX with 1059.
At the time of submission, the SGLang DSR1 cookbook (https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1) does not have a B300-specific recipe, so this config reuses the existing DSR1 FP8 B200 SGLang MTP recipe as-is until B300-specific tuning is available.
Image bumped to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ee5fabc to 70b8774
Summary
- Adds the dsr1-fp8-b300-sglang-mtp benchmark config and the corresponding benchmarks/single_node/dsr1_fp8_b300_mtp.sh launch script
- Bumps the image from v0.5.9-cu130 to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs
- b300, same TP=8 / concurrency 4-512 search-space and same MTP knobs (SPECULATIVE_NUM_STEPS=2, SPECULATIVE_DRAFT_TOKENS=3, SPECULATIVE_EAGLE_TOPK=1) as B200
Test plan
- Run the dsr1-fp8-b300-sglang-mtp single-node benchmark on a B300 node and confirm the SGLang server starts with EAGLE MTP, the benchmark completes, and a result file is produced
🤖 Generated with Claude Code