(ci)[SGLang-ATOM]: Add Qwen3.5 cases for ci, nightly and benchmark by zhuyuhua-v · Pull Request #777 · ROCm/ATOM

zhuyuhua-v · 2026-05-13T12:19:43Z

Motivation

Add Qwen3.5 coverage to ATOM SGLang CI / nightly accuracy / benchmark flows.
Align Qwen3.5 SGLang launch args with validated local commands.
Add Qwen3.5-397B-A17B-FP8 TP4 / TP8 benchmark cases.
Rotate scheduled benchmark groups to avoid running all benchmark cases every night.
Update recipes/atom_sglang/Qwen3_5.md with server, benchmark, and GSM8K commands.

ATOM SGLang CI / Nightly / Benchmark Scope

CI

Item	Value
Workflow	`.github/workflows/atom-sglang-test.yaml`
Trigger	PR to `main`, non-draft, non-closed
Purpose	PR-level SGLang GSM8K accuracy smoke validation

Model	Weight	Runner	TP	Threshold
DeepSeek-R1-FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-4`	4	`0.91`
DeepSeek-R1-FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-4`	4	`0.91`
Qwen3.5-35B-A3B-FP8 TP2	`Qwen/Qwen3.5-35B-A3B-FP8`	`linux-atom-mi35x-4`	2	`0.76`

Nightly Accuracy

Item	Value
Workflow	`.github/workflows/atom-sglang-accuracy-validation.yaml`
Trigger	Nightly `18:00 UTC` / Beijing `02:00`, or manual dispatch
Task	`gsm8k`
Metric	`results.gsm8k["exact_match,flexible-extract"]`
Few-shot	`3`
LM Eval concurrency	`65`
LM Eval retries	`1`
SGLang ref	`v0.5.10`

Model	Weight	Runner	TP	Threshold
DeepSeek-R1-FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-4`	4	`0.91`
DeepSeek-R1-FP8 TP8	`deepseek-ai/DeepSeek-R1-0528`	`linux-atom-mi35x-8`	8	`0.93`
DeepSeek-R1-FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-4`	4	`0.91`
DeepSeek-R1-FP4 TP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`linux-atom-mi35x-8`	8	`0.93`
Qwen3.5-35B-A3B-FP8 TP2	`Qwen/Qwen3.5-35B-A3B-FP8`	`linux-atom-mi35x-4`	2	`0.76`
Qwen3.5-35B-A3B TP2	`Qwen/Qwen3.5-35B-A3B`	`linux-atom-mi35x-4`	2	`0.83`
Qwen3.5-397B-A17B-FP8 TP4	`Qwen/Qwen3.5-397B-A17B-FP8`	`linux-atom-mi35x-4`	4	`0.83`
Qwen3.5-397B-A17B-FP8 TP8	`Qwen/Qwen3.5-397B-A17B-FP8`	`linux-atom-mi35x-8`	8	`0.83`
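
The pass/fail check implied by the Metric and Threshold columns can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the `check_gsm8k` helper and the sample score are hypothetical, the layout assumes a parsed lm-eval results file with per-task metrics under a top-level `results` key, and it assumes a score equal to the threshold counts as a pass.

```python
def check_gsm8k(parsed: dict, threshold: float) -> bool:
    """Return True when the GSM8K flexible-extract score meets the threshold."""
    # Metric path taken from the table above:
    # results.gsm8k["exact_match,flexible-extract"]
    score = parsed["results"]["gsm8k"]["exact_match,flexible-extract"]
    # Assumption: a score exactly at the threshold passes.
    return score >= threshold


# Hypothetical score for illustration only, not a measured result.
sample = {"results": {"gsm8k": {"exact_match,flexible-extract": 0.80}}}

print(check_gsm8k(sample, 0.76))  # compared against the Qwen3.5-35B-A3B-FP8 threshold
print(check_gsm8k(sample, 0.83))  # compared against the Qwen3.5-397B-A17B-FP8 threshold
```

Because each case carries its own threshold, the same score can pass one case and fail another.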

Server Args

Model Family	Default Server Args	Extra Args	Env
DeepSeek	`--trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache`	`--tensor-parallel-size <tp>`; EP case adds `--expert-parallel-size 8`	`AITER_QUICK_REDUCE_QUANTIZATION=INT4`, `SGLANG_AITER_FP8_PREFILL_ATTN=0`, `SGLANG_USE_AITER=1`, `ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1`
Qwen3.5	empty via `SGLANG_DEFAULT_SERVER_ARGS=`	`--tensor-parallel-size <tp> --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache`	`SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models`, `ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0`
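
How the default and extra args combine can be sketched in Python. This is an illustration under stated assumptions, not a port of `.github/scripts/atom_sglang_test.sh`: the `build_server_args` helper is hypothetical, and falling back to the DeepSeek defaults when `SGLANG_DEFAULT_SERVER_ARGS` is unset is an assumption. The point it shows is the difference between an unset variable and one set to the empty string, which is how the Qwen3.5 row avoids inheriting any default flags.

```python
import shlex

# DeepSeek defaults copied from the Server Args table above.
DEEPSEEK_DEFAULTS = (
    "--trust-remote-code --kv-cache-dtype fp8_e4m3 "
    "--mem-fraction-static 0.8 --page-size 1 --disable-radix-cache"
)


def build_server_args(env: dict, extra_args: str) -> list:
    """Combine default and per-model server args into one token list."""
    # Assumption: an unset variable falls back to the DeepSeek defaults,
    # while a variable set to "" (the Qwen3.5 case) contributes no flags.
    defaults = env.get("SGLANG_DEFAULT_SERVER_ARGS", DEEPSEEK_DEFAULTS)
    return shlex.split(defaults) + shlex.split(extra_args)


qwen_args = build_server_args(
    {"SGLANG_DEFAULT_SERVER_ARGS": ""},
    "--tensor-parallel-size 2 --mem-fraction-static 0.9 "
    "--reasoning-parser qwen3 --disable-radix-cache",
)
deepseek_args = build_server_args({}, "--tensor-parallel-size 4")

print(" ".join(qwen_args))
print(" ".join(deepseek_args))
```

Clearing the defaults for Qwen3.5 also keeps `--mem-fraction-static` from appearing twice with different values (0.8 from the defaults and 0.9 from the extra args).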

Nightly Benchmark

Item	Value
Workflow	`.github/workflows/atom-sglang-benchmark.yaml`
Trigger	Weekday `15:00 UTC` / Beijing `23:00`, or manual dispatch
Manual selection	Checkbox-based model selection
Param override	`param_lists`
Dashboard upload	`publish_to_dashboard`

Schedule Group	Beijing Day	Models	Default Case Count
`A-DEEPSEEK`	Monday / Wednesday	5 DeepSeek benchmark models	`5 × 10 = 50`
`B-QWEN35`	Tuesday / Thursday	2 Qwen3.5-397B benchmark models	`2 × 10 = 20`
`C-ALL`	Friday	All benchmark models	`7 × 10 = 70`
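
The weekday rotation in the table above can be sketched as a lookup keyed on the Beijing weekday. This is a minimal sketch, not the workflow's actual selection logic: `schedule_group` and `GROUP_BY_WEEKDAY` are hypothetical names, and returning `None` on weekends is an assumption based on the weekday-only trigger. The input must be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Beijing time is UTC+8 all year (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))

# datetime.weekday(): Monday is 0, Sunday is 6.
GROUP_BY_WEEKDAY = {
    0: "A-DEEPSEEK",  # Monday
    1: "B-QWEN35",    # Tuesday
    2: "A-DEEPSEEK",  # Wednesday
    3: "B-QWEN35",    # Thursday
    4: "C-ALL",       # Friday
}


def schedule_group(now_utc: datetime) -> Optional[str]:
    """Map a timezone-aware UTC timestamp to the group for that Beijing day."""
    beijing_day = now_utc.astimezone(BEIJING).weekday()
    # Assumption: no scheduled group runs on Saturday or Sunday.
    return GROUP_BY_WEEKDAY.get(beijing_day)


# 2026-05-13 15:00 UTC is 23:00 on a Wednesday in Beijing.
print(schedule_group(datetime(2026, 5, 13, 15, 0, tzinfo=timezone.utc)))
```

At the scheduled 15:00 UTC trigger the Beijing date and the UTC date coincide, so the conversion only matters if the schedule is moved to 16:00 UTC or later.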

ISL	OSL	Concurrency	Random Range Ratio
1024	1024	`4, 8, 16, 32, 64`	`0.8`
8192	1024	`4, 8, 16, 32, 64`	`0.8`
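
The default case counts follow from this sweep: two ISL/OSL pairs times five concurrency levels gives ten cases per model. The sketch below reproduces that arithmetic; the dictionary keys and variable names are illustrative and are not taken from `sglang_benchmark_models.json`.

```python
from itertools import product

SEQ_LENS = [(1024, 1024), (8192, 1024)]  # (ISL, OSL) pairs from the table above
CONCURRENCY = [4, 8, 16, 32, 64]
RANDOM_RANGE_RATIO = 0.8

# Model counts per schedule group, from the Schedule Group table.
GROUP_MODEL_COUNTS = {"A-DEEPSEEK": 5, "B-QWEN35": 2, "C-ALL": 7}


def cases_per_model() -> list:
    """Expand the sweep into one dict per (ISL, OSL, concurrency) combination."""
    return [
        {"isl": isl, "osl": osl, "concurrency": conc,
         "random_range_ratio": RANDOM_RANGE_RATIO}
        for (isl, osl), conc in product(SEQ_LENS, CONCURRENCY)
    ]


per_model = len(cases_per_model())
totals = {group: count * per_model for group, count in GROUP_MODEL_COUNTS.items()}

print(per_model)  # 10
print(totals)     # {'A-DEEPSEEK': 50, 'B-QWEN35': 20, 'C-ALL': 70}
```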

Model	Weight	Runner	Serve Args
DeepSeek-R1-0528 FP8 TP8	`deepseek-ai/DeepSeek-R1-0528`	`atom-mi355-8gpu-aac-runner`	`--trust-remote-code --tensor-parallel-size 8`
DeepSeek-R1-0528 FP8 TP4	`deepseek-ai/DeepSeek-R1-0528`	`atom-mi355-8gpu-aac-runner`	`--trust-remote-code --tensor-parallel-size 4`
DeepSeek-R1-0528-MXFP4 FP4 TP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`atom-mi355-8gpu-aac-runner`	`--trust-remote-code --tensor-parallel-size 8`
DeepSeek-R1-0528-MXFP4 FP4 TP4	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`atom-mi355-8gpu-aac-runner`	`--trust-remote-code --tensor-parallel-size 4`
DeepSeek-R1-0528-MXFP4 FP4 TP8 EP8	`amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4`	`atom-mi355-8gpu-aac-runner`	`--trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8`
Qwen3.5-397B-A17B-FP8 TP4	`Qwen/Qwen3.5-397B-A17B-FP8`	`atom-mi355-8gpu-aac-runner`	`--tensor-parallel-size 4 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache`
Qwen3.5-397B-A17B-FP8 TP8	`Qwen/Qwen3.5-397B-A17B-FP8`	`atom-mi355-8gpu-aac-runner`	`--tensor-parallel-size 8 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache`

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Copilot

Pull request overview

Adds Qwen3.5 model coverage across ATOM’s SGLang CI smoke tests, nightly GSM8K accuracy validation, and scheduled benchmark runs, aligning runtime/server args and introducing benchmark rotation to reduce nightly load.

Changes:

Add Qwen3.5 model entries to CI (atom-sglang-test) and nightly accuracy validation matrices with Qwen-specific server args/env.
Add Qwen3.5-397B benchmark model definitions and manual selectors; rotate scheduled benchmark runs by weekday group.
Update the Qwen3.5 recipe with revised launch and GSM8K lm-eval settings.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`recipes/atom_sglang/Qwen3_5.md`	Updates documented server/accuracy commands for Qwen3.5.
`.github/workflows/atom-sglang-test.yaml`	Adds a Qwen3.5 CI GSM8K smoke case (TP2) with Qwen env/args.
`.github/workflows/atom-sglang-benchmark.yaml`	Adds Qwen benchmark toggles and weekday schedule-based model group rotation.
`.github/workflows/atom-sglang-accuracy-validation.yaml`	Adds Qwen3.5 nightly accuracy cases and manual toggles.
`.github/scripts/atom_sglang_test.sh`	Introduces configurable `SGLANG_DEFAULT_SERVER_ARGS` to support model-family-specific defaults.
`.github/benchmark/sglang_models_accuracy.json`	Adds Qwen3.5 accuracy model configs/thresholds for nightly tracking.
`.github/benchmark/sglang_benchmark_models.json`	Adds Qwen3.5 benchmark models and `nightly_group` metadata for rotation.

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

add qwen3.5 cases for sglang-atom

9b331c9

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Copilot AI review requested due to automatic review settings May 13, 2026 12:19

Copilot started reviewing on behalf of zhuyuhua-v May 13, 2026 12:21

Copilot AI reviewed May 13, 2026

Comment thread recipes/atom_sglang/Qwen3_5.md

Comment thread recipes/atom_sglang/Qwen3_5.md

Comment thread recipes/atom_sglang/Qwen3_5.md

wanzhenchn reviewed May 14, 2026

Comment thread .github/benchmark/sglang_benchmark_models.json Outdated

zhuyuhua-v added 2 commits May 14, 2026 03:56

remove redundant env

44becfa

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Merge branch 'main' into yuhua/sgl-qwen-test-cases

da60199

Copilot AI review requested due to automatic review settings May 14, 2026 08:57

Copilot started reviewing on behalf of zhuyuhua-v May 14, 2026 08:58

Copilot AI reviewed May 14, 2026

Comment thread recipes/atom_sglang/Qwen3_5.md

zejunchen-zejun approved these changes May 15, 2026

valarLip approved these changes May 15, 2026

valarLip merged commit f993245 into main May 15, 2026
65 of 75 checks passed

valarLip deleted the yuhua/sgl-qwen-test-cases branch May 15, 2026 03:29

valarLip merged 3 commits into main from yuhua/sgl-qwen-test-cases

6 participants
