Add B300 config: dsr1-fp8-sglang-mtp#1059

Merged
functionstackx merged 2 commits into main from claude/add-dsr1-fp8-b300-sglang-mtp on Apr 17, 2026

@functionstackx
Contributor

Summary

  • Add dsr1-fp8-b300-sglang-mtp benchmark config and the corresponding benchmarks/single_node/dsr1_fp8_b300_mtp.sh launch script
  • At the time of submission, the SGLang DSR1 cookbook does not have a B300-specific recipe, so this reuses the existing DSR1 FP8 B200 SGLang MTP (EAGLE speculative decoding) recipe as-is until B300-specific tuning is available
  • Image bumped from B200's v0.5.9-cu130 to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs
  • Runner: b300, with the same TP=8 / concurrency 4-512 search space and the same MTP knobs (SPECULATIVE_NUM_STEPS=2, SPECULATIVE_DRAFT_TOKENS=3, SPECULATIVE_EAGLE_TOPK=1) as B200
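This PR page does not show how those MTP knobs are wired into the server command. A minimal sketch, assuming they map onto SGLang's --speculative-algorithm, --speculative-num-steps, --speculative-eagle-topk and --speculative-num-draft-tokens server arguments (the actual wiring in dsr1_fp8_b300_mtp.sh is not shown here, and SPEC_ARGS and MODEL are hypothetical names):

```bash
#!/usr/bin/env bash
# Sketch only: the knob values come from the PR description; how the real
# dsr1_fp8_b300_mtp.sh turns them into server flags is an assumption.
SPECULATIVE_NUM_STEPS=2
SPECULATIVE_DRAFT_TOKENS=3
SPECULATIVE_EAGLE_TOPK=1

# Collect the speculative-decoding flags in an array so each flag and value
# stays a separate word when appended to the launch command.
SPEC_ARGS=(
  --speculative-algorithm EAGLE
  --speculative-num-steps "$SPECULATIVE_NUM_STEPS"
  --speculative-eagle-topk "$SPECULATIVE_EAGLE_TOPK"
  --speculative-num-draft-tokens "$SPECULATIVE_DRAFT_TOKENS"
)

printf '%s\n' "${SPEC_ARGS[*]}"
# A launch would then append them, e.g. (MODEL is a placeholder):
#   python3 -m sglang.launch_server --model-path "$MODEL" --tp-size 8 "${SPEC_ARGS[@]}"
```

Using an array rather than a flat string keeps quoting correct if any value ever contains spaces.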

Note: the URL cited in the request pointed to the Kimi-K2.5 cookbook page; it was replaced with the DSR1 cookbook URL, which is the correct reference for this config.

Test plan

  • CI config validation passes
  • Run the dsr1-fp8-b300-sglang-mtp single-node benchmark on a B300 node and confirm that the SGLang server starts with EAGLE MTP, the benchmark completes, and the result file is produced

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the main branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment on benchmarks/single_node/dsr1_fp8_b300_mtp.sh, lines +61 to +72
SGLANG_ENABLE_SPEC_V2=1

EVAL_CONTEXT_ARGS=""
if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
fi
# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

set -x
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server \

🔴 In dsr1_fp8_b300_mtp.sh line 61, SGLANG_ENABLE_SPEC_V2=1 is a plain shell variable assignment that is never exported or prefixed inline to the python3 command, so the SGLang server subprocess will not see this variable and will silently run without the EAGLE Speculative V2 optimization. Fix by adding it as an inline prefix, SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ..., which is exactly the pattern used correctly in qwen3.5_fp8_b300_mtp.sh.

Extended reasoning...

The bug: plain shell variable is not inherited by child processes

In Bash, a variable assignment like SGLANG_ENABLE_SPEC_V2=1 on its own line (without export or inline-prefix syntax) creates a shell-local variable. Unless the variable was already exported (for example, inherited from the parent environment), it is NOT added to the environment of any child process. To reach a subprocess, a variable must either carry the export attribute (export VAR=value, declare -x, or an assignment made while set -a is active) or be given as an inline prefix, VAR=value command args ..., which applies to that one command only. This is standard POSIX/Bash behavior.
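The claim is easy to check by comparing what a child process sees under each form of assignment. A minimal sketch: DEMO_FLAG is a throwaway name, and sh -c stands in for the python3 server process.

```bash
#!/usr/bin/env bash
# Which assignments reach a child process? DEMO_FLAG is a throwaway name;
# sh -c stands in for "python3 -m sglang.launch_server".

# Child command: print DEMO_FLAG as seen in its own environment, or "unset".
report='printf "%s\n" "${DEMO_FLAG-unset}"'

# 1. Plain assignment: shell-local, not passed to children.
unset DEMO_FLAG
DEMO_FLAG=1
plain=$(sh -c "$report")

# 2. Inline prefix: placed in the environment of this one command only.
inline=$(DEMO_FLAG=1 sh -c "$report")

# 3. export: marks the variable for all later child processes.
export DEMO_FLAG
exported=$(sh -c "$report")

# 4. set -a: assignments made while it is active are auto-exported.
unset DEMO_FLAG
set -a
DEMO_FLAG=1
set +a
autoexported=$(sh -c "$report")

printf 'plain=%s inline=%s export=%s set-a=%s\n' "$plain" "$inline" "$exported" "$autoexported"
# prints: plain=unset inline=1 export=1 set-a=1
```

Case 1 is the pattern on line 61 of the reviewed script; cases 2 to 4 are the ways it could be fixed.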

The code path that triggers the bug

In benchmarks/single_node/dsr1_fp8_b300_mtp.sh (the new file introduced by this PR), line 61 contains:

SGLANG_ENABLE_SPEC_V2=1

This is a standalone assignment. Then at line 72, the server is launched as:

PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...

SGLANG_ENABLE_SPEC_V2 is absent from the inline prefix, so it is not in the subprocess environment.

Why existing code does not prevent it

There is no set -a (auto-export) or export SGLANG_ENABLE_SPEC_V2=1 anywhere in the script. The variable simply sits in the shell's local namespace and is discarded when the shell exits. A superficial code review might miss this because the variable name is present in the file; the error is in how it is propagated, not whether it is set.
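Since this kind of mistake slips past reading, a crude automated check could help. A hypothetical sketch (unexported_sglang_vars is an invented name, not something this PR or repository is shown to contain): it lists SGLANG_* variables assigned as a standalone statement but never exported with export NAME. It is only a grep heuristic, so set -a, declare -x, sourced files and multi-line constructs are not understood.

```bash
#!/usr/bin/env bash
# Hypothetical review aid: print SGLANG_* names that a script assigns as a
# standalone statement (e.g. "SGLANG_ENABLE_SPEC_V2=1" alone on a line) but
# never exports with "export NAME". Heuristic grep only.
unexported_sglang_vars() {
  local script=$1 name
  # Standalone assignments: optional indent, NAME=value, nothing after it.
  for name in $(grep -E '^[[:space:]]*SGLANG_[A-Z0-9_]+=[^[:space:]]*[[:space:]]*$' "$script" \
                  | sed -E 's/^[[:space:]]*([A-Z0-9_]+)=.*/\1/' | sort -u); do
    # Report the name only if no "export NAME" line exists anywhere.
    if ! grep -qE "^[[:space:]]*export[[:space:]]+${name}([=[:space:]]|$)" "$script"; then
      echo "$name"
    fi
  done
}
```

Run as unexported_sglang_vars benchmarks/single_node/dsr1_fp8_b300_mtp.sh; if the script matches the review's description, it should flag SGLANG_ENABLE_SPEC_V2. Inline-prefix usage (the suggested fix) is not a standalone assignment, so it is not reported.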

Impact on benchmark results

SGLANG_ENABLE_SPEC_V2 enables the SGLang Speculative V2 algorithm, which is the optimized EAGLE speculative decoding path for DeepSeek R1 MTP. Without this flag the server falls back to the less-optimized V1 path, potentially yielding lower throughput. Since this is an MTP benchmark config, the results would not reflect the intended configuration and would not be comparable to other configs (such as dsr1-fp8-b200-sglang-mtp) where the flag does take effect via the correct pattern.

How to fix it

Merge the SGLANG_ENABLE_SPEC_V2=1 into the inline prefix on the python3 command, removing the standalone assignment on line 61:

# Remove standalone line 61:
# SGLANG_ENABLE_SPEC_V2=1

# Change line 72 from:
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
# to:
SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...

This is exactly the pattern used correctly in benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh (line 34) and benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh.

Step-by-step proof

  1. Shell executes SGLANG_ENABLE_SPEC_V2=1: this sets a shell-local (unexported) variable only.
  2. Shell executes PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...: Bash creates a child process whose environment includes PYTHONNOUSERSITE=1 (because it is an inline prefix) plus the already-exported variables (e.g., SGLANG_ENABLE_JIT_DEEPGEMM via export on line 28), but NOT SGLANG_ENABLE_SPEC_V2, because it was never exported.
  3. Inside the Python process, os.environ.get('SGLANG_ENABLE_SPEC_V2') returns None, so SGLang behaves as if the flag was never set.
  4. Contrast with qwen3.5_fp8_b300_mtp.sh: SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 ...; both variables appear as inline prefixes, so both are present in the child process environment.
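Step 2 hinges on the export attribute, which Bash can report directly: declare -p prints declare -x for exported variables and declare -- for shell-local ones. A small sketch replaying steps 1 and 2 (is_exported is a hypothetical helper, and the values are placeholders; only the exported status matters):

```bash
#!/usr/bin/env bash
# Replay steps 1 and 2 using Bash's own view of the export attribute.

# Succeeds if the named variable is set and carries the export (-x) attribute.
is_exported() {
  local decl re='^declare -[a-zA-Z]*x'
  decl=$(declare -p "$1" 2>/dev/null) || return 1
  [[ $decl =~ $re ]]
}

unset SGLANG_ENABLE_JIT_DEEPGEMM SGLANG_ENABLE_SPEC_V2
export SGLANG_ENABLE_JIT_DEEPGEMM=1   # like the export on line 28 of the script
SGLANG_ENABLE_SPEC_V2=1               # like the standalone assignment on line 61

for name in SGLANG_ENABLE_JIT_DEEPGEMM SGLANG_ENABLE_SPEC_V2; do
  if is_exported "$name"; then
    echo "$name: exported, visible to python3"
  else
    echo "$name: not exported, invisible to python3"
  fi
done
```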

Comment on perf-changelog.yaml, lines +1413 to +1415
- "EAGLE speculative decoding with MTP, TP=8, concurrency 4-512 for 1k1k and 8k1k"
- "At the time of submission, https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1 does not have a B300-specific recipe, so this reuses the existing DSR1 FP8 B200 SGLang MTP recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059

🟡 The new dsr1-fp8-b300-sglang-mtp entry in perf-changelog.yaml uses a placeholder XXXX in its pr-link field instead of the actual PR number 1059. This makes the changelog entry untraceable back to its source PR; the correct link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1059.

Extended reasoning...

What the bug is and how it manifests

The last entry added to perf-changelog.yaml in this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. The XXXX is a template placeholder that was never replaced with the actual PR number. The PR number (1059) was known at the time of submission, as it is this very PR.

The specific code path that triggers it

The entry is the last one in perf-changelog.yaml (the last 12 lines of the file as committed in b7d1595); the commented range, lines 1413–1415, ends with its pr-link field, which reads:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

Why existing code doesn't prevent it

There is no automated validation in the CI workflows that checks pr-link values for placeholder text. The changelog is maintained by convention, so a typo or forgotten substitution passes all checks silently.
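Such a check could look something like the following hypothetical guard (check_pr_links is an invented name, not an existing InferenceX workflow step). It flags pr-link values ending in a run of X characters. The review also notes that a handful of existing entries still use XXX or XXXX, so those would need to be resolved or exempted before a check like this could gate CI.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard: fail when any pr-link still ends in a placeholder
# run of X characters (e.g. .../pull/XXX or .../pull/XXXX).
check_pr_links() {
  local changelog=$1
  # grep -n prints each offending line with its line number.
  if grep -nE 'pr-link:[[:space:]]*https://github\.com/[^[:space:]]+/pull/X+[[:space:]]*$' "$changelog"; then
    echo "error: placeholder pr-link in $changelog" >&2
    return 1
  fi
}

# Example CI usage:
#   check_pr_links perf-changelog.yaml || exit 1
```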

What the impact would be

Anyone reading the changelog entry for dsr1-fp8-b300-sglang-mtp cannot navigate directly to the PR that introduced it. Traceability from changelog to PR is broken, making it harder to audit the history of benchmark changes or understand the rationale behind this configuration.

How to fix it

Replace XXXX with 1059:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059

Step-by-step proof

  1. PR Add B300 config: dsr1-fp8-sglang-mtp #1059 adds the dsr1-fp8-b300-sglang-mtp entry to perf-changelog.yaml.
  2. Commit b7d1595 is the revision of the file this review refers to; it is not the PR's merge commit, which is f8543f9.
  3. Running tail -20 perf-changelog.yaml on HEAD confirms the final pr-link value is .../pull/XXXX.
  4. Every other resolved entry in the file uses a real PR number (e.g., /pull/1049, /pull/1048, /pull/1035, etc.).
  5. While a handful of other entries also use XXX or XXXX (for PRs whose numbers may have been unknown at submission time), this entry's PR number is definitively known: it is 1059.
  6. The fix is a one-token change: replace XXXX with 1059.

functionstackx and others added 2 commits April 17, 2026 07:25
At the time of submission, the SGLang DSR1 cookbook
(https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1) does
not have a B300-specific recipe, so this config reuses the existing
DSR1 FP8 B200 SGLang MTP recipe as-is until B300-specific tuning is
available. Image bumped to v0.5.10.post1-cu130 to match the standard
B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
functionstackx force-pushed the claude/add-dsr1-fp8-b300-sglang-mtp branch from ee5fabc to 70b8774 on April 17, 2026 11:26
functionstackx merged commit f8543f9 into main on Apr 17, 2026
5 checks passed
functionstackx deleted the claude/add-dsr1-fp8-b300-sglang-mtp branch on April 17, 2026 11:26
claude bot mentioned this pull request on Apr 20, 2026