(ci)[SGLang-ATOM]: Add Qwen3.5 cases for ci, nightly and benchmark#777

Merged
valarLip merged 3 commits into main from yuhua/sgl-qwen-test-cases on May 15, 2026

@zhuyuhua-v
Collaborator

Motivation

  • Add Qwen3.5 coverage to ATOM SGLang CI / nightly accuracy / benchmark flows.
  • Align Qwen3.5 SGLang launch args with validated local commands.
  • Add Qwen3.5-397B-A17B-FP8 TP4 / TP8 benchmark cases.
  • Rotate scheduled benchmark groups to avoid running all benchmark cases every night.
  • Update recipes/atom_sglang/Qwen3_5.md with server, benchmark, and GSM8K commands.

ATOM SGLang CI / Nightly / Benchmark Scope

CI

| Item | Value |
| --- | --- |
| Workflow | `.github/workflows/atom-sglang-test.yaml` |
| Trigger | PR to main, non-draft, non-closed |
| Purpose | PR-level SGLang GSM8K accuracy smoke validation |

| Model | Weight | Runner | TP | Threshold |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 4 | 0.91 |
| DeepSeek-R1-FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 4 | 0.91 |
| Qwen3.5-35B-A3B-FP8 TP2 | Qwen/Qwen3.5-35B-A3B-FP8 | linux-atom-mi35x-4 | 2 | 0.76 |

Nightly Accuracy

| Item | Value |
| --- | --- |
| Workflow | `.github/workflows/atom-sglang-accuracy-validation.yaml` |
| Trigger | Nightly 18:00 UTC / Beijing 02:00, or manual dispatch |
| Task | gsm8k |
| Metric | `results.gsm8k["exact_match,flexible-extract"]` |
| Few-shot | 3 |
| LM Eval concurrency | 65 |
| LM Eval retries | 1 |
| SGLang ref | v0.5.10 |

| Model | Weight | Runner | TP | Threshold |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 4 | 0.91 |
| DeepSeek-R1-FP8 TP8 | deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-8 | 8 | 0.93 |
| DeepSeek-R1-FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 4 | 0.91 |
| DeepSeek-R1-FP4 TP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-8 | 8 | 0.93 |
| Qwen3.5-35B-A3B-FP8 TP2 | Qwen/Qwen3.5-35B-A3B-FP8 | linux-atom-mi35x-4 | 2 | 0.76 |
| Qwen3.5-35B-A3B TP2 | Qwen/Qwen3.5-35B-A3B | linux-atom-mi35x-4 | 2 | 0.83 |
| Qwen3.5-397B-A17B-FP8 TP4 | Qwen/Qwen3.5-397B-A17B-FP8 | linux-atom-mi35x-4 | 4 | 0.83 |
| Qwen3.5-397B-A17B-FP8 TP8 | Qwen/Qwen3.5-397B-A17B-FP8 | linux-atom-mi35x-8 | 8 | 0.83 |
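
As a rough illustration of how a per-model threshold from the table could be applied to the listed metric, here is a minimal Python sketch. It assumes the lm-eval results file is JSON with a top-level `results` object keyed by task name; the function name `gsm8k_passes`, the sample score, and the use of `>=` for the comparison are assumptions for illustration, not the workflow's actual check.

```python
import json

# Metric key as listed in the nightly accuracy table.
METRIC = "exact_match,flexible-extract"


def gsm8k_passes(results_json: str, threshold: float) -> bool:
    """Return True when the GSM8K score meets the threshold (assumed >=)."""
    data = json.loads(results_json)
    score = data["results"]["gsm8k"][METRIC]
    return score >= threshold


# Hypothetical results payload; 0.84 is a made-up score, not a measured one.
sample = json.dumps({"results": {"gsm8k": {METRIC: 0.84}}})

print(gsm8k_passes(sample, 0.83))  # compared against the Qwen3.5-397B threshold
print(gsm8k_passes(sample, 0.91))  # compared against the DeepSeek TP4 threshold
```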

Server Args

| Model Family | Default Server Args | Extra Args | Env |
| --- | --- | --- | --- |
| DeepSeek | `--trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache` | `--tensor-parallel-size <tp>`; EP case adds `--expert-parallel-size 8` | `AITER_QUICK_REDUCE_QUANTIZATION=INT4`, `SGLANG_AITER_FP8_PREFILL_ATTN=0`, `SGLANG_USE_AITER=1`, `ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1` |
| Qwen3.5 | empty via `SGLANG_DEFAULT_SERVER_ARGS=` | `--tensor-parallel-size <tp> --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache` | `SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models`, `ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0` |
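
The split between default and extra args can be sketched as follows. This is a hedged Python illustration, not the logic of `.github/scripts/atom_sglang_test.sh`: it assumes that an unset `SGLANG_DEFAULT_SERVER_ARGS` falls back to the DeepSeek defaults, that a set-but-empty value drops them entirely, and that the DeepSeek defaults are the flags listed before `--tensor-parallel-size`. The helper name `build_server_args` is hypothetical.

```python
import shlex

# Assumed fallback defaults, copied from the DeepSeek row above.
DEEPSEEK_DEFAULTS = (
    "--trust-remote-code --kv-cache-dtype fp8_e4m3 "
    "--mem-fraction-static 0.8 --page-size 1 --disable-radix-cache"
)


def build_server_args(env: dict, extra_args: str) -> list:
    """Combine default and per-model extra args into one argument list.

    An unset variable uses the fallback defaults; a variable set to the
    empty string yields no default args at all (the Qwen3.5 case).
    """
    default_args = env.get("SGLANG_DEFAULT_SERVER_ARGS", DEEPSEEK_DEFAULTS)
    return shlex.split(default_args) + shlex.split(extra_args)


# Qwen3.5: defaults emptied, all flags come from the extra args.
qwen = build_server_args(
    {"SGLANG_DEFAULT_SERVER_ARGS": ""},
    "--tensor-parallel-size 2 --mem-fraction-static 0.9 "
    "--reasoning-parser qwen3 --disable-radix-cache",
)

# DeepSeek: variable unset, so the fallback defaults are prepended.
deepseek = build_server_args({}, "--tensor-parallel-size 4")

print(qwen)
print(deepseek)
```

Distinguishing "unset" from "set to empty" matters here: treating an empty value as unset would silently re-apply the DeepSeek flags (for example `--mem-fraction-static 0.8`) to Qwen3.5.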

Nightly Benchmark

| Item | Value |
| --- | --- |
| Workflow | `.github/workflows/atom-sglang-benchmark.yaml` |
| Trigger | Weekday 15:00 UTC / Beijing 23:00, or manual dispatch |
| Manual selection | Checkbox-based model selection |
| Param override | `param_lists` |
| Dashboard upload | `publish_to_dashboard` |

| Schedule Group | Beijing Day | Models | Default Case Count |
| --- | --- | --- | --- |
| A-DEEPSEEK | Monday / Wednesday | 5 DeepSeek benchmark models | 5 × 10 = 50 |
| B-QWEN35 | Tuesday / Thursday | 2 Qwen3.5-397B benchmark models | 2 × 10 = 20 |
| C-ALL | Friday | All benchmark models | 7 × 10 = 70 |

| ISL | OSL | Concurrency | Random Range Ratio |
| --- | --- | --- | --- |
| 1024 | 1024 | 4, 8, 16, 32, 64 | 0.8 |
| 8192 | 1024 | 4, 8, 16, 32, 64 | 0.8 |

| Model | Weight | Runner | Serve Args |
| --- | --- | --- | --- |
| DeepSeek-R1-0528 FP8 TP8 | deepseek-ai/DeepSeek-R1-0528 | atom-mi355-8gpu-aac-runner | `--trust-remote-code --tensor-parallel-size 8` |
| DeepSeek-R1-0528 FP8 TP4 | deepseek-ai/DeepSeek-R1-0528 | atom-mi355-8gpu-aac-runner | `--trust-remote-code --tensor-parallel-size 4` |
| DeepSeek-R1-0528-MXFP4 FP4 TP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | `--trust-remote-code --tensor-parallel-size 8` |
| DeepSeek-R1-0528-MXFP4 FP4 TP4 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | `--trust-remote-code --tensor-parallel-size 4` |
| DeepSeek-R1-0528-MXFP4 FP4 TP8 EP8 | amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | `--trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8` |
| Qwen3.5-397B-A17B-FP8 TP4 | Qwen/Qwen3.5-397B-A17B-FP8 | atom-mi355-8gpu-aac-runner | `--tensor-parallel-size 4 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache` |
| Qwen3.5-397B-A17B-FP8 TP8 | Qwen/Qwen3.5-397B-A17B-FP8 | atom-mi355-8gpu-aac-runner | `--tensor-parallel-size 8 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache` |
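
The weekday rotation and the default case counts above can be reproduced with a short Python sketch. The group names, model counts, ISL/OSL pairs, and concurrency levels come from this description; the function names and the lookup-table layout are assumptions for illustration and do not mirror how the workflow or `sglang_benchmark_models.json` implements the rotation. Weekdays follow Python's `date.weekday()` convention (Monday is 0) and refer to the Beijing day.

```python
from itertools import product

# Beijing weekday (Monday=0) -> scheduled benchmark group.
GROUP_BY_WEEKDAY = {
    0: "A-DEEPSEEK",
    1: "B-QWEN35",
    2: "A-DEEPSEEK",
    3: "B-QWEN35",
    4: "C-ALL",
}

# Number of benchmark models in each group, as stated in the schedule table.
MODELS_PER_GROUP = {"A-DEEPSEEK": 5, "B-QWEN35": 2, "C-ALL": 7}

# Default sweep: two (ISL, OSL) shapes crossed with five concurrency levels.
SHAPES = [(1024, 1024), (8192, 1024)]
CONCURRENCY = [4, 8, 16, 32, 64]


def group_for_weekday(weekday: int):
    """Return the scheduled group, or None on days with no scheduled run."""
    return GROUP_BY_WEEKDAY.get(weekday)


def default_case_count(group: str) -> int:
    """Models in the group times the per-model sweep size."""
    cases_per_model = len(list(product(SHAPES, CONCURRENCY)))
    return MODELS_PER_GROUP[group] * cases_per_model


for day in range(7):
    group = group_for_weekday(day)
    if group is not None:
        print(day, group, default_case_count(group))
```

Each model runs 2 × 5 = 10 default cases, which is where the 50, 20, and 70 totals in the schedule table come from.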

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Copilot AI review requested due to automatic review settings May 13, 2026 12:19
Contributor

Copilot AI left a comment

Pull request overview

Adds Qwen3.5 model coverage across ATOM’s SGLang CI smoke tests, nightly GSM8K accuracy validation, and scheduled benchmark runs, aligning runtime/server args and introducing benchmark rotation to reduce nightly load.

Changes:

  • Add Qwen3.5 model entries to CI (atom-sglang-test) and nightly accuracy validation matrices with Qwen-specific server args/env.
  • Add Qwen3.5-397B benchmark model definitions and manual selectors; rotate scheduled benchmark runs by weekday group.
  • Update the Qwen3.5 recipe with revised launch and GSM8K lm-eval settings.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `recipes/atom_sglang/Qwen3_5.md` | Updates documented server/accuracy commands for Qwen3.5. |
| `.github/workflows/atom-sglang-test.yaml` | Adds a Qwen3.5 CI GSM8K smoke case (TP2) with Qwen env/args. |
| `.github/workflows/atom-sglang-benchmark.yaml` | Adds Qwen benchmark toggles and weekday schedule-based model group rotation. |
| `.github/workflows/atom-sglang-accuracy-validation.yaml` | Adds Qwen3.5 nightly accuracy cases and manual toggles. |
| `.github/scripts/atom_sglang_test.sh` | Introduces configurable `SGLANG_DEFAULT_SERVER_ARGS` to support model-family-specific defaults. |
| `.github/benchmark/sglang_models_accuracy.json` | Adds Qwen3.5 accuracy model configs/thresholds for nightly tracking. |
| `.github/benchmark/sglang_benchmark_models.json` | Adds Qwen3.5 benchmark models and `nightly_group` metadata for rotation. |

Comment thread recipes/atom_sglang/Qwen3_5.md
Comment thread recipes/atom_sglang/Qwen3_5.md
Comment thread recipes/atom_sglang/Qwen3_5.md
Comment thread .github/benchmark/sglang_benchmark_models.json Outdated
Copilot AI review requested due to automatic review settings May 14, 2026 08:57
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread recipes/atom_sglang/Qwen3_5.md
@valarLip valarLip merged commit f993245 into main May 15, 2026
65 of 75 checks passed
@valarLip valarLip deleted the yuhua/sgl-qwen-test-cases branch May 15, 2026 03:29
6 participants