
Update Qwen3.5 FP8 B200 SGLang #1027

Merged
hshrivastava-droid merged 17 commits into main from nv/qwen3.5-fp8-b200-sglang-update on Apr 30, 2026


@Ankur-singh (Collaborator)

Summary

  • Search-space: tp8 conc=4, tp4 conc=4-256 (both seq-lens)
  • Updated server launch args to match the latest recipe

…r args

Update search-space to tp8 conc=4 and tp4 conc=4-256. Update benchmark
script with new server launch args: enable-symm-mem, prefill/chunked 16384,
mem-fraction-static 0.8, stream-interval 50, tokenizer-path.
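To make the sweep notation concrete, the search space above can be expanded into individual benchmark points with a minimal Python sketch. It assumes concurrency doubles from the start value to the end value, which this PR does not state, and it uses hypothetical labels for the two sequence-length configurations. It is not how nvidia-master.yaml is actually parsed.

```python
from itertools import product

# Search-space entries from this PR as (tp, conc_start, conc_end).
SEARCH_SPACE = [(8, 4, 4), (4, 4, 256)]

def concurrency_points(start, end):
    """Concurrency values from start to end inclusive.

    Assumes a doubling step; the real step rule lives in the CI harness
    and is not shown in this PR.
    """
    values = []
    conc = start
    while conc <= end:
        values.append(conc)
        conc *= 2
    return values

def expand(search_space, seq_lens):
    """Return one (seq_len, tp, conc) tuple per benchmark point."""
    points = []
    for seq_len, (tp, start, end) in product(seq_lens, search_space):
        for conc in concurrency_points(start, end):
            points.append((seq_len, tp, conc))
    return points

# Hypothetical labels: the PR only says "both seq-lens".
points = expand(SEARCH_SPACE, ["seq-len-A", "seq-len-B"])
print(len(points))  # 16: (1 tp8 point + 7 tp4 points) x 2 seq-lens
```

Under the doubling assumption, the tp8 entry contributes a single point per sequence length while tp4 covers 4, 8, ..., 256.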
@github-actions (Contributor)

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Ankur-singh changed the title from [WIP] Update Qwen3.5 FP8 B200 SGLang: tp8 conc=4, tp4 conc=4-256 to [WIP] Update Qwen3.5 FP8 B200 SGLang on Apr 13, 2026
@Ankur-singh (Collaborator, Author)

@claude update perf-changelog.yaml file

@Klaud-Cold (Collaborator) commented Apr 13, 2026

Claude finished @Ankur-singh's task in 56s.


Updated perf-changelog.yaml

  • Gather context on PR changes
  • Read current perf-changelog.yaml
  • Add changelog entry
  • Commit and push

Added a perf-changelog.yaml entry for qwen3.5-fp8-b200-sglang covering:

  • Search-space update (tp8 conc=4, tp4 conc=4-256; ep removed)
  • New server args (--enable-symm-mem, --disable-radix-cache, --tokenizer-path)
  • Tuning changes (prefill 32768→16384, mem-fraction 0.82→0.8, stream-interval 30→50)
  • Removed env vars and conditional --enable-flashinfer-allreduce-fusion
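The tuning and flag changes in that entry can be summarized as an old-vs-new settings diff. This is a minimal Python sketch, not the benchmark script: the `prefill` key stands in for the prefill/chunked-prefill token budget (its exact flag names are not given here), the --tokenizer-path value is a placeholder, and the removed env vars are omitted because they are not named.

```python
# Settings named in the changelog entry. "prefill" is a stand-in label for
# the prefill/chunked-prefill token budget, not a real flag name.
OLD = {
    "prefill": 32768,
    "--mem-fraction-static": 0.82,
    "--stream-interval": 30,
    "--enable-flashinfer-allreduce-fusion": True,  # conditional in the old script
}
NEW = {
    "prefill": 16384,
    "--mem-fraction-static": 0.8,
    "--stream-interval": 50,
    "--enable-symm-mem": True,
    "--disable-radix-cache": True,
    "--tokenizer-path": "<path>",  # placeholder; the real value is not shown
}

def diff_args(old, new):
    """Return (added, removed, changed) between two settings dicts."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    return added, removed, changed

added, removed, changed = diff_args(OLD, NEW)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```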

Commit: 2b15d3d

github-actions Bot and others added 2 commits April 13, 2026 18:53
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
@claude Bot (Contributor) left a comment

This PR is still marked [WIP] and also removes the ep field from the search-space YAML while the benchmark script still requires EP_SIZE as a mandatory env var; worth a human look before merging.

Extended reasoning...

Overview

Two files changed: the nvidia-master.yaml CI config (search-space parameters for qwen3.5-fp8-b200-sglang) and benchmarks/single_node/qwen3.5_fp8_b200.sh (server launch args). The changes simplify the server invocation and narrow the tp8 concurrency sweep to a single point (4→4) while widening tp4 (4–256).

Security Risks

No security-sensitive code is touched. No auth, crypto, or permissions changes.

Level of Scrutiny

Moderate. This is benchmark/CI config code, not production logic, but changes to the search-space and server launch args affect the validity of benchmark results and CI correctness. The [WIP] title explicitly signals the author considers it unfinished.

Other Factors

The ep key is removed from both search-space entries in the YAML, yet the benchmark script still lists EP_SIZE in its check_env_vars call (mandatory) and passes it as --expert-parallel-size=$EP_SIZE. If EP_SIZE is not injected by the CI harness through another path, jobs will fail at startup. This potential misalignment, combined with the [WIP] designation, warrants a human review before merging.
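The failure mode described above can be illustrated with a small Python sketch. Here `job_env` is a hypothetical stand-in for however the CI harness turns a search-space entry into environment variables, and `missing_env_vars` only approximates a mandatory-variable check like the script's check_env_vars. Only EP_SIZE is confirmed as mandatory by the review; TP and CONC are assumed names.

```python
import os

# Only EP_SIZE is confirmed as mandatory by the review; TP and CONC are assumed names.
REQUIRED = ["TP", "CONC", "EP_SIZE"]

def job_env(entry):
    """Hypothetical mapping from a search-space entry to job environment variables.

    EP_SIZE is only exported when the entry carries an "ep" key.
    """
    env = {"TP": str(entry["tp"]), "CONC": str(entry["conc"])}
    if "ep" in entry:
        env["EP_SIZE"] = str(entry["ep"])
    return env

def missing_env_vars(required, env=None):
    """Return the required names that are unset or empty (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Entry without "ep", as after this PR's YAML change: EP_SIZE is missing.
print(missing_env_vars(REQUIRED, job_env({"tp": 4, "conc": 64})))  # ['EP_SIZE']
# Entry that still carries "ep": nothing is missing.
print(missing_env_vars(REQUIRED, job_env({"tp": 4, "conc": 64, "ep": 1})))  # []
```

If the harness supplies EP_SIZE through some other path, such as a default, the first case would not fail; that is the point the review asks a human to confirm.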

Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from a645df8 to 5d6a8e4 on April 13, 2026 21:53
Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from ae3066b to 8c79915 on April 15, 2026 19:11
hshrivastava-droid force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from 95d9314 to 8c79915 on April 27, 2026 21:40
hshrivastava-droid changed the title from [WIP] Update Qwen3.5 FP8 B200 SGLang to Update Qwen3.5 FP8 B200 SGLang on Apr 29, 2026
@hshrivastava-droid (Collaborator) commented Apr 29, 2026

@Klaud-Cold - add PR description

@hshrivastava-droid (Collaborator)

@functionstackx - could you please review.

@Ankur-singh (Collaborator, Author)

@functionstackx (Contributor) left a comment

lgtm, just needs a cookbook update

@jgangani (Collaborator) left a comment

LGTM

@hshrivastava-droid (Collaborator)

SGLang cookbook recipe: sgl-project/sgl-cookbook#262

@functionstackx (Contributor) left a comment

lgtm

hshrivastava-droid merged commit 5152941 into main on Apr 30, 2026
13 of 32 checks passed
hshrivastava-droid deleted the nv/qwen3.5-fp8-b200-sglang-update branch on April 30, 2026 19:34

6 participants