[NV] Update: sglang v2 Qwen3.5 h200 MTP #1017

Merged
Oseltamivir merged 12 commits into main from nv/qwen35_h200_v2
Apr 14, 2026

@hshrivastava-droid
Collaborator

@hshrivastava-droid hshrivastava-droid commented Apr 8, 2026

Summary

Enable SGLang speculative decoding v2 (SGLANG_ENABLE_SPEC_V2=1) for the Qwen3.5 FP8 H200 MTP benchmark configuration.

Changes

  • benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh: Set SGLANG_ENABLE_SPEC_V2=1 environment variable on the sglang.launch_server command to enable the v2 speculative decoding engine for EAGLE-based multi-token prediction
  • perf-changelog.yaml: Added changelog entry for qwen3.5-fp8-h200-sglang-mtp documenting the spec v2 enablement

Context

The Qwen3.5 FP8 H200 MTP benchmark uses EAGLE speculative decoding (3 speculative steps, 4 draft tokens, topk=1). This PR enables SGLang's v2 speculative decoding implementation via the SGLANG_ENABLE_SPEC_V2=1 flag, which is expected to improve MTP performance.
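As a rough illustration of the change described above, the sketch below assembles a launch command with the environment variable set. It is not the contents of benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh: `MODEL_PATH`, `PORT`, and `DRY_RUN` are hypothetical placeholders, and the mapping of "3 speculative steps, 4 draft tokens, topk=1" onto SGLang's `--speculative-*` flags is an assumption based on the description. By default it only prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders; the real script's model path and port are not shown in this PR.
MODEL_PATH="${MODEL_PATH:-/path/to/qwen3.5-fp8-checkpoint}"
PORT="${PORT:-30000}"

# Build the command as an array so arguments survive quoting intact.
launch_cmd=(
  env SGLANG_ENABLE_SPEC_V2=1          # the flag this PR enables
  python3 -m sglang.launch_server
  --model-path "$MODEL_PATH"
  --port "$PORT"
  --speculative-algorithm EAGLE        # EAGLE-based MTP, per the PR context
  --speculative-num-steps 3            # assumed mapping of "3 speculative steps"
  --speculative-eagle-topk 1           # assumed mapping of "topk=1"
  --speculative-num-draft-tokens 4     # assumed mapping of "4 draft tokens"
)

# DRY_RUN=1 (the default here) prints the command instead of starting a server.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf '%s ' "${launch_cmd[@]}"
  printf '\n'
else
  exec "${launch_cmd[@]}"
fi
```

Running with `DRY_RUN=0` would start the server, which requires an SGLang install and a suitable GPU. Setting the variable through `env` scopes it to the server process only, which matches the description of setting it "on the sglang.launch_server command".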

@github-actions
Contributor

github-actions bot commented Apr 8, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Comment thread perf-changelog.yaml Outdated
@hshrivastava-droid
Collaborator Author

@claude add PR description

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 9, 2026

Claude finished @hshrivastava-droid's task in 38s

Adding PR Description

  • Gather context from changed files and diff
  • Write and update PR description

PR description has been added. It covers:

  • Summary: Enabling SGLang speculative decoding v2 for the Qwen3.5 FP8 H200 MTP benchmark
  • Changes: The two modified files and what each change does
  • Context: Background on the EAGLE speculative decoding config and the purpose of the SGLANG_ENABLE_SPEC_V2=1 flag

Collaborator

@jgangani jgangani left a comment

LGTM

@hshrivastava-droid hshrivastava-droid changed the title from "[WIP][NV] Update: sglang v2 Qwen3.5 h200 MTP" to "[NV] Update: sglang v2 Qwen3.5 h200 MTP" on Apr 13, 2026
Contributor

@functionstackx functionstackx left a comment

cookbook plz 😭

Comment thread benchmarks/single_node/qwen3.5_fp8_h200_mtp.sh
faradawn added a commit to faradawn/sgl-cookbook that referenced this pull request Apr 14, 2026
Enable SGLang speculative decoding v2 engine for the Qwen3.5 FP8 H200 + MTP combination, as validated in SemiAnalysisAI/InferenceX#1017.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
@hshrivastava-droid
Collaborator Author

SGLang cookbook PR: sgl-project/sgl-cookbook#240

@hshrivastava-droid
Collaborator Author

@functionstackx - could you please review this?

Collaborator

@Oseltamivir Oseltamivir left a comment

lgtm

@Oseltamivir Oseltamivir merged commit 6cb8291 into main Apr 14, 2026
4 checks passed
@Oseltamivir Oseltamivir deleted the nv/qwen35_h200_v2 branch April 14, 2026 20:19
6 participants