Add B300 config: dsr1-fp8-sglang-mtp by functionstackx · Pull Request #1059 · SemiAnalysisAI/InferenceX

functionstackx · 2026-04-17T08:45:36Z

Summary

Add dsr1-fp8-b300-sglang-mtp benchmark config and the corresponding benchmarks/single_node/dsr1_fp8_b300_mtp.sh launch script
At the time of submission, the SGLang DSR1 cookbook does not have a B300-specific recipe, so this reuses the existing DSR1 FP8 B200 SGLang MTP (EAGLE speculative decoding) recipe as-is until B300-specific tuning is available
Image bumped from B200's v0.5.9-cu130 to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs
Runner: b300, with the same TP=8, concurrency 4-512 search space and the same MTP knobs (SPECULATIVE_NUM_STEPS=2, SPECULATIVE_DRAFT_TOKENS=3, SPECULATIVE_EAGLE_TOPK=1) as B200
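The MTP knobs above correspond to SGLang's EAGLE speculative-decoding server flags. Below is a minimal sketch, assuming the launch script forwards them as --speculative-algorithm EAGLE, --speculative-num-steps, --speculative-eagle-topk and --speculative-num-draft-tokens. The helper build_spec_args is hypothetical and is not code from dsr1_fp8_b300_mtp.sh. The warning encodes our understanding that with a top-k of 1 the draft is a single chain, so the draft-token count should equal the step count plus one. The PR's values satisfy this (2 + 1 = 3).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): map the MTP knobs to SGLang flags.
build_spec_args() {
    # Fail loudly if any knob is missing.
    local steps="${SPECULATIVE_NUM_STEPS:?SPECULATIVE_NUM_STEPS is not set}"
    local draft="${SPECULATIVE_DRAFT_TOKENS:?SPECULATIVE_DRAFT_TOKENS is not set}"
    local topk="${SPECULATIVE_EAGLE_TOPK:?SPECULATIVE_EAGLE_TOPK is not set}"

    # Assumption: with topk=1 the draft is one chain, so draft tokens = steps + 1.
    if [ "$topk" -eq 1 ] && [ "$draft" -ne $((steps + 1)) ]; then
        printf 'warning: topk=1 expects %d draft tokens, got %s\n' \
            $((steps + 1)) "$draft" >&2
    fi

    # One line of flags, meant to be appended to the sglang.launch_server command.
    printf '%s\n' "--speculative-algorithm EAGLE --speculative-num-steps $steps --speculative-eagle-topk $topk --speculative-num-draft-tokens $draft"
}

# Usage with the PR's values (the B200 recipe reused on B300):
SPECULATIVE_NUM_STEPS=2 SPECULATIVE_DRAFT_TOKENS=3 SPECULATIVE_EAGLE_TOPK=1 build_spec_args
```

With the PR's values, this prints the four flags and emits no warning. A mismatched draft-token count still produces flags but warns on stderr.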

Note: the URL cited in the request was the Kimi-K2.5 cookbook page; it was replaced with the DSR1 cookbook URL, which is the correct reference for this config.

Test plan

CI config validation passes
Run dsr1-fp8-b300-sglang-mtp single-node benchmark on a B300 node and confirm the SGLang server starts with EAGLE MTP, benchmark completes, and result file is produced

🤖 Generated with Claude Code

github-actions · 2026-04-17T08:45:46Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-04-17T08:52:32Z

+SGLANG_ENABLE_SPEC_V2=1
+
+EVAL_CONTEXT_ARGS=""
+if [ "${EVAL_ONLY}" = "true" ]; then
+    setup_eval_context
+    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
+fi
+# Start GPU monitoring (power, temperature, clocks every second)
+start_gpu_monitor
+
+set -x
+PYTHONNOUSERSITE=1 python3 -m sglang.launch_server \


🔴 In dsr1_fp8_b300_mtp.sh line 61, SGLANG_ENABLE_SPEC_V2=1 is a plain shell variable assignment that is never exported or prefixed inline to the python3 command, so the SGLang server subprocess will not see this variable and will silently skip the EAGLE Speculative V2 optimization. Fix by adding it as an inline prefix: SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ... — exactly the pattern used correctly in qwen3.5_fp8_b300_mtp.sh.

Extended reasoning...

The bug: plain shell variable is not inherited by child processes

In Bash, a variable assignment like SGLANG_ENABLE_SPEC_V2=1 on its own line (without export or inline-prefix syntax) creates a shell-local variable. It is NOT added to the environment of any child process. The usual ways to pass an environment variable to a subprocess in Bash are export VAR=value (or set -a) before the command, or the inline-prefix syntax VAR=value command args... This is standard POSIX/Bash behavior.
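Here is a minimal, runnable demonstration of the three cases. To keep it self-contained, sh -c stands in for python3 -m sglang.launch_server, and DEMO_FLAG is a hypothetical variable name. The child process simply reports whether the variable is in its environment.

```shell
#!/usr/bin/env bash
# Start from a clean slate so an inherited DEMO_FLAG cannot skew the results.
unset DEMO_FLAG

# Each case runs in a ( subshell ) so no state leaks between cases.
# "sh -c" stands in for the SGLang server process and reports what it sees.

plain_assignment() (
    DEMO_FLAG=1                                     # shell-local, never exported
    sh -c 'printf "%s\n" "${DEMO_FLAG:-unset}"'
)

inline_prefix() (
    DEMO_FLAG=1 sh -c 'printf "%s\n" "${DEMO_FLAG:-unset}"'   # this child only
)

exported() (
    export DEMO_FLAG=1                              # every later child
    sh -c 'printf "%s\n" "${DEMO_FLAG:-unset}"'
)

plain_assignment   # prints: unset
inline_prefix      # prints: 1
exported           # prints: 1
```

The first case matches the bug in dsr1_fp8_b300_mtp.sh. The second case matches the inline-prefix fix.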

The code path that triggers the bug

In benchmarks/single_node/dsr1_fp8_b300_mtp.sh (the new file introduced by this PR), line 61 contains:

SGLANG_ENABLE_SPEC_V2=1

This is a standalone assignment. Then at line 72, the server is launched as:

PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...

SGLANG_ENABLE_SPEC_V2 is absent from the inline prefix, so it is not in the subprocess environment.

Why existing code does not prevent it

There is no set -a (auto-export) or export SGLANG_ENABLE_SPEC_V2=1 anywhere in the script. The variable simply sits in the shell's local namespace and is discarded when the shell exits. A superficial code review might miss this because the variable name is present in the file — the error is in how it is propagated, not whether it is set.

Impact on benchmark results

SGLANG_ENABLE_SPEC_V2 enables the SGLang Speculative V2 algorithm, which is the optimized EAGLE speculative decoding path for DeepSeek R1 MTP. Without this flag the server falls back to the less-optimized V1 path, potentially yielding lower throughput. Since this is an MTP benchmark config, the results would not reflect the intended configuration and would not be comparable to other configs (such as dsr1-fp8-b200-sglang-mtp) where the flag does take effect via the correct pattern.

How to fix it

Merge the SGLANG_ENABLE_SPEC_V2=1 into the inline prefix on the python3 command, removing the standalone assignment on line 61:

# Remove the standalone assignment on line 61:
#   SGLANG_ENABLE_SPEC_V2=1
# Change line 72 from:
#   PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
# to:
SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...

This is exactly the pattern used correctly in benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh (line 34) and benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh.
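The export form mentioned earlier is an equally valid fix. As a defensive extra, a launch script could also verify that the flag actually reaches the environment before starting the server. The sketch below shows this. The require_exported helper is hypothetical and is not an existing InferenceX function. It relies on printenv, which only sees exported variables.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not an existing InferenceX helper): succeed only if every
# named variable is exported, i.e. visible to child processes.
require_exported() {
    local name
    for name in "$@"; do
        # printenv reports only the exported environment, not shell-local variables.
        if ! printenv "$name" >/dev/null; then
            printf 'error: %s is not exported; the server will not see it\n' "$name" >&2
            return 1
        fi
    done
}

# Alternative fix: export once near the top of the script ...
export SGLANG_ENABLE_SPEC_V2=1
# ... then check before launching the server as before.
require_exported SGLANG_ENABLE_SPEC_V2 || exit 1
# PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...
```

Had this guard been placed after the original standalone assignment, it would have stopped the script instead of letting the benchmark silently run without Spec V2.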

Step-by-step proof

Shell executes SGLANG_ENABLE_SPEC_V2=1, which sets a shell-local variable only; it is not placed in the environment passed to child processes.

Shell executes PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ... — Bash creates a child process with an environment that includes PYTHONNOUSERSITE=1 (because it is an inline prefix) plus the already-exported variables (e.g., SGLANG_ENABLE_JIT_DEEPGEMM via export on line 28), but NOT SGLANG_ENABLE_SPEC_V2 because it was never exported.

Inside the Python process, os.environ.get('SGLANG_ENABLE_SPEC_V2') returns None, so SGLang behaves as if the flag was never set.

Contrast with qwen3.5_fp8_b300_mtp.sh: SGLANG_ENABLE_SPEC_V2=1 PYTHONNOUSERSITE=1 python3 ... — both variables appear as inline prefixes, both are present in the child process environment.

claude · 2026-04-17T08:52:33Z

+    - "EAGLE speculative decoding with MTP, TP=8, concurrency 4-512 for 1k1k and 8k1k"
+    - "At the time of submission, https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1 does not have a B300-specific recipe, so this reuses the existing DSR1 FP8 B200 SGLang MTP recipe as-is"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059


🟡 The new dsr1-fp8-b300-sglang-mtp entry in perf-changelog.yaml uses a placeholder XXXX in its pr-link field instead of the actual PR number 1059. This makes the changelog entry untraceable back to its source PR; the correct link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1059.

Extended reasoning...

What the bug is and how it manifests

The last entry added to perf-changelog.yaml in this PR sets pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX. The XXXX is a template placeholder that was never replaced with the actual PR number. The PR number (1059) was known at the time of submission, as it is this very PR.

The specific code path that triggers it

The entry appears at lines 1413–1415 of perf-changelog.yaml (the last 12 lines of the file as committed in b7d1595). The pr-link field reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

Why existing code doesn't prevent it

There is no automated validation in the CI workflows that checks pr-link values for placeholder text. The changelog is maintained by convention, so a typo or forgotten substitution passes all checks silently.
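For illustration, the missing validation could be a short grep run in CI. This is a hypothetical sketch: the check_pr_links name, the regular expression, and how it would be wired into a workflow are all assumptions. It flags any pr-link whose PR number is a run of X characters rather than digits. A handful of existing entries already use XXX or XXXX, so those would need to be filled in (or allow-listed) before such a check could be enforced.

```shell
#!/usr/bin/env bash
# Hypothetical CI check: report pr-link entries that still end in a
# placeholder such as /pull/XXX or /pull/XXXX instead of a PR number.
check_pr_links() {
    local file="${1:-perf-changelog.yaml}"
    # A missing file must not pass silently (grep would return 2).
    [ -f "$file" ] || { printf 'error: %s not found\n' "$file" >&2; return 2; }
    # -n: show line numbers; -E: extended regex; trailing whitespace tolerated.
    if grep -nE 'pr-link:[[:space:]]*https?://[^[:space:]]*/pull/X+[[:space:]]*$' "$file"; then
        printf 'error: placeholder pr-link found in %s\n' "$file" >&2
        return 1
    fi
    return 0
}
```

Run it as check_pr_links perf-changelog.yaml. The check prints any offending lines with their line numbers and returns non-zero if it finds any.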

What the impact would be

Anyone reading the changelog entry for dsr1-fp8-b300-sglang-mtp cannot navigate directly to the PR that introduced it. Traceability from changelog to PR is broken, making it harder to audit the history of benchmark changes or understand the rationale behind this configuration.

How to fix it

Replace XXXX with 1059:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1059

Step-by-step proof

PR Add B300 config: dsr1-fp8-sglang-mtp #1059 adds the dsr1-fp8-b300-sglang-mtp entry to perf-changelog.yaml.

Commit b7d1595 is the PR commit under review.

Running tail -20 perf-changelog.yaml on HEAD confirms the final pr-link value is .../pull/XXXX.

Every other resolved entry in the file uses a real PR number (e.g., /pull/1049, /pull/1048, /pull/1035, etc.).

While a handful of other entries also use XXX or XXXX (for PRs whose numbers may have been unknown at submission time), this entry's PR number is definitively known: it is 1059.

The fix is a single-token change: replace XXXX with 1059.

At the time of submission, the SGLang DSR1 cookbook (https://cookbook.sglang.io/autoregressive/DeepSeek/DeepSeek-R1) does not have a B300-specific recipe, so this config reuses the existing DSR1 FP8 B200 SGLang MTP recipe as-is until B300-specific tuning is available. Image bumped to v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx requested a review from a team April 17, 2026 08:45

functionstackx requested review from jgangani and kedarpotdar-nv as code owners April 17, 2026 08:45

github-project-automation bot added this to InferenceMAX Board Apr 17, 2026

claude bot reviewed Apr 17, 2026

functionstackx added the sweep-enabled label Apr 17, 2026

functionstackx and others added 2 commits April 17, 2026 07:25

Fill in PR link for dsr1-fp8-b300-sglang-mtp changelog entry

70b8774

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx force-pushed the claude/add-dsr1-fp8-b300-sglang-mtp branch from ee5fabc to 70b8774 April 17, 2026 11:26

functionstackx merged commit f8543f9 into main Apr 17, 2026
5 checks passed

functionstackx deleted the claude/add-dsr1-fp8-b300-sglang-mtp branch April 17, 2026 11:26

github-project-automation bot moved this to Done in InferenceMAX Board Apr 17, 2026

claude bot mentioned this pull request Apr 20, 2026

Add B300 config: kimi-k2.5-fp4-vllm #1100

Merged
