(ci)[SGLang-ATOM]: Add Qwen3.5 cases for ci, nightly and benchmark #777
Merged
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Pull request overview
Adds Qwen3.5 model coverage across ATOM’s SGLang CI smoke tests, nightly GSM8K accuracy validation, and scheduled benchmark runs, aligning runtime/server args and introducing benchmark rotation to reduce nightly load.
Changes:
- Add Qwen3.5 model entries to CI (`atom-sglang-test`) and nightly accuracy validation matrices with Qwen-specific server args/env.
- Add Qwen3.5-397B benchmark model definitions and manual selectors; rotate scheduled benchmark runs by weekday group.
- Update the Qwen3.5 recipe with revised launch and GSM8K lm-eval settings.
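One way to picture the model-family-specific defaults is the small Python sketch below. It is not the logic of `atom_sglang_test.sh`: the function name `resolve_server_args` is hypothetical, and using the DeepSeek flags from the PR description as the fallback when `SGLANG_DEFAULT_SERVER_ARGS` is unset is an assumption.

```python
import shlex

# Assumption: when SGLANG_DEFAULT_SERVER_ARGS is unset, fall back to the
# DeepSeek-style flags listed in the PR description.
FALLBACK_SERVER_ARGS = (
    "--trust-remote-code --kv-cache-dtype fp8_e4m3 "
    "--mem-fraction-static 0.8 --page-size 1 --disable-radix-cache"
)


def resolve_server_args(env, extra_args=""):
    """Hypothetical helper: family default (or fallback) plus per-case extras."""
    default = env.get("SGLANG_DEFAULT_SERVER_ARGS") or FALLBACK_SERVER_ARGS
    return shlex.split(default) + shlex.split(extra_args)


# Qwen case: the family default replaces the fallback entirely.
qwen_env = {
    "SGLANG_DEFAULT_SERVER_ARGS": (
        "--tensor-parallel-size 4 --mem-fraction-static 0.9 "
        "--reasoning-parser qwen3 --disable-radix-cache"
    )
}
qwen_args = resolve_server_args(qwen_env)

# DeepSeek EP case: fallback defaults plus the per-case parallelism flags.
deepseek_args = resolve_server_args(
    {}, "--tensor-parallel-size 8 --expert-parallel-size 8"
)

print(" ".join(qwen_args))
print(" ".join(deepseek_args))
```

Under this reading, a Qwen case never inherits `--kv-cache-dtype fp8_e4m3` or `--page-size 1`, which is the point of making the default configurable per family.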
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `recipes/atom_sglang/Qwen3_5.md` | Updates documented server/accuracy commands for Qwen3.5. |
| `.github/workflows/atom-sglang-test.yaml` | Adds a Qwen3.5 CI GSM8K smoke case (TP2) with Qwen env/args. |
| `.github/workflows/atom-sglang-benchmark.yaml` | Adds Qwen benchmark toggles and weekday schedule-based model group rotation. |
| `.github/workflows/atom-sglang-accuracy-validation.yaml` | Adds Qwen3.5 nightly accuracy cases and manual toggles. |
| `.github/scripts/atom_sglang_test.sh` | Introduces configurable `SGLANG_DEFAULT_SERVER_ARGS` to support model-family-specific defaults. |
| `.github/benchmark/sglang_models_accuracy.json` | Adds Qwen3.5 accuracy model configs/thresholds for nightly tracking. |
| `.github/benchmark/sglang_benchmark_models.json` | Adds Qwen3.5 benchmark models and `nightly_group` metadata for rotation. |
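As a rough illustration of how a per-model accuracy threshold can gate a nightly run, the sketch below reads the metric the PR description names, `results.gsm8k["exact_match,flexible-extract"]`, from an lm-eval style results document. The function name, the `>=` comparison, and the sample score are assumptions for illustration; the workflow's real check may differ.

```python
import json

# Metric key quoted in the PR description.
METRIC = "exact_match,flexible-extract"


def gsm8k_passes(results_json: str, threshold: float) -> bool:
    """Hypothetical gate: pass when the GSM8K score is at least the threshold."""
    data = json.loads(results_json)
    score = data["results"]["gsm8k"][METRIC]
    return score >= threshold


# Made-up sample result, only to exercise the function.
sample = json.dumps({"results": {"gsm8k": {METRIC: 0.84}}})

print(gsm8k_passes(sample, 0.83))  # threshold listed for Qwen/Qwen3.5-35B-A3B
print(gsm8k_passes(sample, 0.91))  # threshold listed for the DeepSeek 4-GPU case
```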
wanzhenchn reviewed May 14, 2026
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
zejunchen-zejun approved these changes May 15, 2026
valarLip approved these changes May 15, 2026
Motivation
`recipes/atom_sglang/Qwen3_5.md` with server, benchmark, and GSM8K commands.

ATOM SGLang CI / Nightly / Benchmark Scope

CI

- `.github/workflows/atom-sglang-test.yaml`
- main, non-draft, non-closed

| Model | Runner | Threshold |
|---|---|---|
| `deepseek-ai/DeepSeek-R1-0528` | `linux-atom-mi35x-4` | 0.91 |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `linux-atom-mi35x-4` | 0.91 |
| `Qwen/Qwen3.5-35B-A3B-FP8` | `linux-atom-mi35x-4` | 0.76 |

Nightly Accuracy

- `.github/workflows/atom-sglang-accuracy-validation.yaml`
- 18:00 UTC / Beijing 02:00, or manual dispatch
- `gsm8k`
- `results.gsm8k["exact_match,flexible-extract"]`
- 3651v0.5.10

| Model | Runner | Threshold |
|---|---|---|
| `deepseek-ai/DeepSeek-R1-0528` | `linux-atom-mi35x-4` | 0.91 |
| `deepseek-ai/DeepSeek-R1-0528` | `linux-atom-mi35x-8` | 0.93 |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `linux-atom-mi35x-4` | 0.91 |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `linux-atom-mi35x-8` | 0.93 |
| `Qwen/Qwen3.5-35B-A3B-FP8` | `linux-atom-mi35x-4` | 0.76 |
| `Qwen/Qwen3.5-35B-A3B` | `linux-atom-mi35x-4` | 0.83 |
| `Qwen/Qwen3.5-397B-A17B-FP8` | `linux-atom-mi35x-4` | 0.83 |
| `Qwen/Qwen3.5-397B-A17B-FP8` | `linux-atom-mi35x-8` | 0.83 |

Server Args

- `--trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache`
- `--tensor-parallel-size <tp>`; EP case adds `--expert-parallel-size 8`
- `AITER_QUICK_REDUCE_QUANTIZATION=INT4`, `SGLANG_AITER_FP8_PREFILL_ATTN=0`, `SGLANG_USE_AITER=1`, `ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1`

- `SGLANG_DEFAULT_SERVER_ARGS=--tensor-parallel-size <tp> --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache`
- `SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models`, `ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0`

Nightly Benchmark

- `.github/workflows/atom-sglang-benchmark.yaml`
- 15:00 UTC / Beijing 23:00, or manual dispatch
- `param_lists`
- `publish_to_dashboard`

- A-DEEPSEEK: 5 × 10 = 50
- B-QWEN35: 2 × 10 = 20
- C-ALL: 7 × 10 = 70

- 4, 8, 16, 32, 64; 0.8
- 4, 8, 16, 32, 64; 0.8

| Model | Runner | Server args |
|---|---|---|
| `deepseek-ai/DeepSeek-R1-0528` | `atom-mi355-8gpu-aac-runner` | `--trust-remote-code --tensor-parallel-size 8` |
| `deepseek-ai/DeepSeek-R1-0528` | `atom-mi355-8gpu-aac-runner` | `--trust-remote-code --tensor-parallel-size 4` |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `atom-mi355-8gpu-aac-runner` | `--trust-remote-code --tensor-parallel-size 8` |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `atom-mi355-8gpu-aac-runner` | `--trust-remote-code --tensor-parallel-size 4` |
| `amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4` | `atom-mi355-8gpu-aac-runner` | `--trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8` |
| `Qwen/Qwen3.5-397B-A17B-FP8` | `atom-mi355-8gpu-aac-runner` | `--tensor-parallel-size 4 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache` |
| `Qwen/Qwen3.5-397B-A17B-FP8` | `atom-mi355-8gpu-aac-runner` | `--tensor-parallel-size 8 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache` |
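The group totals above (50, 20, 70) can be reproduced with a short Python sketch. It assumes each benchmark entry carries a `nightly_group` field with the value `A-DEEPSEEK` or `B-QWEN35`, that `C-ALL` selects every entry, and that the `10` in each product is a fixed number of cases per model. The dictionary layout and function names are hypothetical, and the weekday-to-group mapping is not shown because the description does not state it.

```python
# Assumption: the "× 10" in the description is a per-model case count.
CASES_PER_MODEL = 10

# Hypothetical entries mirroring the seven benchmark rows in the description.
MODELS = [
    {"model": "deepseek-ai/DeepSeek-R1-0528", "tp": 8, "nightly_group": "A-DEEPSEEK"},
    {"model": "deepseek-ai/DeepSeek-R1-0528", "tp": 4, "nightly_group": "A-DEEPSEEK"},
    {"model": "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", "tp": 8, "nightly_group": "A-DEEPSEEK"},
    {"model": "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", "tp": 4, "nightly_group": "A-DEEPSEEK"},
    {"model": "amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4", "tp": 8, "ep": 8, "nightly_group": "A-DEEPSEEK"},
    {"model": "Qwen/Qwen3.5-397B-A17B-FP8", "tp": 4, "nightly_group": "B-QWEN35"},
    {"model": "Qwen/Qwen3.5-397B-A17B-FP8", "tp": 8, "nightly_group": "B-QWEN35"},
]


def select_models(models, group):
    """Return the entries scheduled for a group; C-ALL is assumed to mean all."""
    if group == "C-ALL":
        return list(models)
    return [m for m in models if m["nightly_group"] == group]


def case_count(models, group):
    """Number of benchmark cases a scheduled run of this group would launch."""
    return len(select_models(models, group)) * CASES_PER_MODEL


for group in ("A-DEEPSEEK", "B-QWEN35", "C-ALL"):
    print(group, case_count(MODELS, group))
```

Read this way, rotating between the groups keeps most scheduled runs at 20 or 50 cases rather than the full 70.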