[https://nvbugs/5910749][https://nvbugs/5995486][test] Fix Qwen3 skip softmax attention CI tests #12789
Conversation
… softmax attention CI tests

- Update threshold values (`thr_prefill`/`thr_decode`) to match blog-calibrated values for Qwen3-30B-A3B-Instruct-2507
- Add `fp8kv` parametrize dimension to both test methods; `KvCacheConfig` now accepts `dtype="fp8"` when `fp8kv=True`
- Rename `test_skip_softmax_attention_2gpus` -> `test_skip_softmax_attention_4gpus`, switch to TP=4/EP=4, add `@skip_pre_hopper`, move to post-merge CI only
- Remove 2-GPU pre-merge entries from `l0_dgx_h100.yml`
- `l0_b200.yml`: replace 2 old entries with single `target_sparsity_0.9-fp8kv=True`
- `l0_dgx_h200.yml` and `l0_dgx_b200.yml`: add 4 post-merge 4-GPU entries each (sparsity 0.5/0.9 × fp8kv False/True)
- Update `longbench_v1.yaml` reference accuracy values measured on B200 with updated thresholds; add 3 new FP8 KV cache entries

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
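The `KvCacheConfig` threading described in the commit message can be sketched as follows. This is a hedged stand-in: `KvCacheConfigSketch` and `make_kv_cache_config` are hypothetical names modeling only the `dtype` field, not TensorRT-LLM's actual class.

```python
from dataclasses import dataclass


@dataclass
class KvCacheConfigSketch:
    # Hypothetical stand-in for TensorRT-LLM's KvCacheConfig; only the
    # dtype field relevant to the fp8kv flag is modeled here.
    dtype: str = "auto"


def make_kv_cache_config(fp8kv: bool) -> KvCacheConfigSketch:
    # Mirrors the pattern from the PR description:
    # dtype="fp8" when the fp8kv parametrize dimension is True,
    # otherwise the default "auto".
    return KvCacheConfigSketch(dtype="fp8" if fp8kv else "auto")


if __name__ == "__main__":
    for fp8kv in (False, True):
        print(fp8kv, make_kv_cache_config(fp8kv).dtype)
```

Keeping the ternary at the config-construction site (rather than two separate config objects) is what lets a single `fp8kv` parametrize flag cover both KV-cache dtypes in each test method.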
/bot run

PR_Github #42044 [ run ] triggered by Bot. Commit:
📝 Walkthrough

The PR updates accuracy reference values for the Qwen3-30B model under the BF16 KV-cache configuration and introduces new FP8 KV-cache reference entries with corresponding accuracy values. Test methods are parametrized with an `fp8kv` dimension.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 4948-4950: The function signature for
test_skip_softmax_attention_4gpus exceeds yapf's expected wrapping and must be
reformatted to follow the 80-char layout; run yapf (or pre-commit) to rewrap the
parameter list of test_skip_softmax_attention_4gpus so parameters
(target_sparsity, thr_prefill, thr_decode, fp8kv) are each on their own or
appropriately wrapped lines per yapf rules and commit the formatted change to
keep release checks green.
- Around line 4935-4948: The test test_skip_softmax_attention_4gpus must locally
skip when fewer than 4 visible GPUs are present; add a guard at the start of the
function (or a pytest.skipif decorator) that checks torch.cuda.device_count() <
4 and calls pytest.skip("requires 4 GPUs") so the test self-skips on smaller
hosts; ensure torch and pytest are imported if not already and place the check
inside test_skip_softmax_attention_4gpus before any code that assumes
tensor_parallel_size=4 / moe_expert_parallel_size=4.
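The self-skip guard asked for above can be sketched without a GPU host. This is a hedged stand-in: `visible_gpu_count` is a hypothetical helper standing in for `torch.cuda.device_count()`, so the sketch stays runnable anywhere.

```python
import os


def visible_gpu_count() -> int:
    """Hypothetical stand-in for torch.cuda.device_count(): counts the
    devices exposed via CUDA_VISIBLE_DEVICES (0 when unset or empty)."""
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return sum(1 for d in devices.split(",") if d.strip())


# Inside test_skip_softmax_attention_4gpus, before any code that assumes
# tensor_parallel_size=4 / moe_expert_parallel_size=4, the guard would be:
#
#     if visible_gpu_count() < 4:
#         pytest.skip("requires 4 GPUs")

if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
    print(visible_gpu_count())  # 2 visible devices -> the test would self-skip
```

In the PR itself this was ultimately handled with `@pytest.mark.skip_less_device(4)` (see the follow-up commit below), which achieves the same self-skip through the suite's existing marker rather than an inline check.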
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 600d8841-f924-4a80-8b85-7ace1a449cb8
📒 Files selected for processing (6)
- tests/integration/defs/accuracy/references/longbench_v1.yaml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/integration/test_lists/test-db/l0_b200.yml
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tests/integration/test_lists/test-db/l0_dgx_h200.yml
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
…: add skip_less_device(4) and fix yapf formatting

- Add `@pytest.mark.skip_less_device(4)` to `test_skip_softmax_attention_4gpus` so direct test selection skips gracefully on <4-GPU hosts
- Fix yapf parameter wrapping on the method signature

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
/bot run
… comments from test-db YAMLs

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
PR_Github #42044 [ run ] completed with state

/bot kill

/bot run --disable-fail-fast

PR_Github #42048 [ run ] triggered by Bot. Commit:

PR_Github #42049 [ kill ] triggered by Bot. Commit:

PR_Github #42048 [ run ] completed with state

PR_Github #42049 [ kill ] completed with state

PR_Github #42050 [ run ] triggered by Bot. Commit:

PR_Github #42050 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #42138 [ run ] triggered by Bot. Commit:

PR_Github #42138 [ run ] completed with state

/bot run --disable-fail-fast --reuse-test

PR_Github #42164 [ run ] triggered by Bot. Commit:

PR_Github #42164 [ run ] completed with state

/bot run --disable-fail-fast --reuse-test --post-merge

PR_Github #42263 [ run ] triggered by Bot. Commit:

PR_Github #42263 [ run ] completed with state

/bot run --reuse-test

PR_Github #42345 [ run ] triggered by Bot. Commit:

PR_Github #42345 [ run ] completed with state

/bot run --reuse-test

PR_Github #42446 [ run ] triggered by Bot. Commit:

PR_Github #42446 [ run ] completed with state

/bot run --reuse-test

PR_Github #42491 [ run ] triggered by Bot. Commit:

PR_Github #42491 [ run ] completed with state

/bot run --reuse-test

PR_Github #42658 [ run ] triggered by Bot. Commit:

PR_Github #42658 [ run ] completed with state

/bot run --reuse-test

PR_Github #42709 [ run ] triggered by Bot. Commit:

PR_Github #42709 [ run ] completed with state
Summary
Fixes two P0 CI bugs for the Qwen3-30B-A3B-Instruct-2507 skip softmax attention tests:
- `test_skip_softmax_attention[target_sparsity_0.9]` on B200
- `test_skip_softmax_attention_2gpus` causing OOM in CI

Changes:
- Update `thr_prefill`/`thr_decode` to blog-calibrated values for Qwen3-30B-A3B-Instruct-2507 (sparsity 0.5: 587.18/16.52; sparsity 0.9: 18471.56/852.20)
- Add `fp8kv=[False, True]` parametrize dimension; thread `KvCacheConfig(dtype="fp8" if fp8kv else "auto")` into both test methods
- Rename `test_skip_softmax_attention_2gpus` → `test_skip_softmax_attention_4gpus`, switch to TP=4/EP=4, add `@skip_pre_hopper`, move entirely to post-merge CI
- Remove 2-GPU pre-merge entries from `l0_dgx_h100.yml`; `l0_b200.yml` reduced to one entry (`target_sparsity_0.9-fp8kv=True`); `l0_dgx_h200.yml` and `l0_dgx_b200.yml` gain 4 post-merge 4-GPU entries each
- Update `longbench_v1.yaml` with values measured on B200 using updated thresholds; add 3 new FP8 KV cache reference entries

Test plan
- All combinations ({0.0, 0.5, 0.9} × {fp8kv=False, fp8kv=True}) run on B200 with `TRTLLM_ACCURACY_NO_REFERENCE=1` to capture new reference values
- No remaining references to `test_skip_softmax_attention_2gpus` in YAML/test files
- Verified `parametrize_with_ids` output format (fp8kv=True/False)

Summary by CodeRabbit
Release Notes
New Features
Tests
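The four post-merge YAML entries per platform come from the sparsity × fp8kv cross-product described above. A minimal sketch of the implied test IDs, assuming the `target_sparsity_<s>-fp8kv=<bool>` format inferred from the `l0_b200.yml` entry `target_sparsity_0.9-fp8kv=True` (the real IDs are produced by the suite's `parametrize_with_ids` helper):

```python
from itertools import product

# Assumed ID format, inferred from the l0_b200.yml entry
# "target_sparsity_0.9-fp8kv=True"; the real format is produced by
# the test suite's parametrize_with_ids helper.
sparsities = [0.5, 0.9]
fp8kv_values = [False, True]
ids = [
    f"target_sparsity_{s}-fp8kv={kv}"
    for s, kv in product(sparsities, fp8kv_values)
]

if __name__ == "__main__":
    for test_id in ids:
        print(test_id)  # four IDs, one per post-merge YAML entry
```

Enumerating the grid this way is a quick check that each new `l0_dgx_h200.yml`/`l0_dgx_b200.yml` line maps to exactly one parametrized case.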