[https://nvbugs/5910749][https://nvbugs/5995486][test] Fix Qwen3 skip softmax attention CI tests#12789

Merged
bobboli merged 3 commits into NVIDIA:main from bobboli:fix/qwen3-skip-softmax-attention-ci
Apr 10, 2026

Conversation

@bobboli
Collaborator

@bobboli bobboli commented Apr 7, 2026

Summary

Fixes two P0 CI bugs for the Qwen3-30B-A3B-Instruct-2507 skip softmax attention tests:

  • nvbug 5910749: TIMEOUT in test_skip_softmax_attention[target_sparsity_0.9] on B200
  • nvbug 5995486: Memory leak in test_skip_softmax_attention_2gpus causing OOM in CI

Changes:

  • Threshold update: sync thr_prefill/thr_decode to blog-calibrated values for Qwen3-30B-A3B-Instruct-2507 (sparsity 0.5: 587.18/16.52; sparsity 0.9: 18471.56/852.20)
  • FP8 KV cache: add fp8kv=[False, True] parametrize dimension; thread KvCacheConfig(dtype="fp8" if fp8kv else "auto") into both test methods
  • 2GPU → 4GPU: rename test_skip_softmax_attention_2gpus → test_skip_softmax_attention_4gpus, switch to TP=4/EP=4, add @skip_pre_hopper, move entirely to post-merge CI
  • CI YAML: remove 2-GPU pre-merge entries from l0_dgx_h100.yml; l0_b200.yml reduced to one entry (target_sparsity_0.9-fp8kv=True); l0_dgx_h200.yml and l0_dgx_b200.yml gain 4 post-merge 4-GPU entries each
  • Reference accuracy: update longbench_v1.yaml with values measured on B200 using updated thresholds; add 3 new FP8 KV cache reference entries
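
The fp8kv parametrization and threshold wiring described above can be sketched as follows. This is a minimal sketch, not the PR's actual code: `KvCacheConfig` here is a simplified stand-in for the real class in `tensorrt_llm.llmapi` (which takes more arguments), and the accuracy-harness body is elided.

```python
from dataclasses import dataclass

import pytest


# Stand-in for tensorrt_llm.llmapi.KvCacheConfig; the real class accepts
# additional fields (memory fraction, block size, etc.).
@dataclass
class KvCacheConfig:
    dtype: str = "auto"


@pytest.mark.parametrize("fp8kv", [False, True], ids=lambda v: f"fp8kv={v}")
@pytest.mark.parametrize(
    "target_sparsity,thr_prefill,thr_decode",
    [
        (0.5, 587.18, 16.52),     # blog-calibrated thresholds, sparsity 0.5
        (0.9, 18471.56, 852.20),  # blog-calibrated thresholds, sparsity 0.9
    ],
)
def test_skip_softmax_attention(target_sparsity, thr_prefill, thr_decode, fp8kv):
    # Thread the KV-cache dtype through to the LLM under test:
    # "fp8" when the fp8kv dimension is enabled, "auto" otherwise.
    kv_cache_config = KvCacheConfig(dtype="fp8" if fp8kv else "auto")
    assert kv_cache_config.dtype in ("fp8", "auto")
    # ... run the accuracy harness with kv_cache_config (elided)
```

With ids rendered as `fp8kv=False`/`fp8kv=True`, the generated test ids line up with the CI YAML entry names (e.g. `target_sparsity_0.9-fp8kv=True`).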

Test plan

  • All 6 accuracy combinations ({0.0, 0.5, 0.9} × {fp8kv=False, fp8kv=True}) run on B200 with TRTLLM_ACCURACY_NO_REFERENCE=1 to capture new reference values
  • Verified no remaining references to test_skip_softmax_attention_2gpus in YAML/test files
  • CI YAML entries validated against parametrize_with_ids output format (fp8kv=True/False)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added FP8 KV-cache quantization support with benchmark accuracy measurements for the Qwen3-30B model across multiple sparsity configurations.
  • Tests

    • Enhanced test coverage with additional quantization variants and expanded multi-GPU testing configurations. Updated and validated accuracy baselines.

… softmax attention CI tests

- Update threshold values (thr_prefill/thr_decode) to match blog-calibrated
  values for Qwen3-30B-A3B-Instruct-2507
- Add fp8kv parametrize dimension to both test methods; KvCacheConfig now
  accepts dtype="fp8" when fp8kv=True
- Rename test_skip_softmax_attention_2gpus -> test_skip_softmax_attention_4gpus,
  switch to TP=4/EP=4, add @skip_pre_hopper, move to post-merge CI only
- Remove 2-GPU pre-merge entries from l0_dgx_h100.yml
- l0_b200.yml: replace 2 old entries with single target_sparsity_0.9-fp8kv=True
- l0_dgx_h200.yml and l0_dgx_b200.yml: add 4 post-merge 4-GPU entries each
  (sparsity 0.5/0.9 x fp8kv False/True)
- Update longbench_v1.yaml reference accuracy values measured on B200 with
  updated thresholds; add 3 new FP8 KV cache entries

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli bobboli requested a review from a team as a code owner April 7, 2026 03:11
@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42044 [ run ] triggered by Bot. Commit: d0c9547 Link to invocation

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The PR updates accuracy reference values for the Qwen3-30B model under BF16 KV-cache configuration and introduces new FP8 KV-cache reference entries with corresponding accuracy values. Test methods are parametrized with an fp8kv boolean flag, threshold values are adjusted, and a multi-GPU test configuration is renamed from 2GPUs to 4GPUs. Test lists across multiple hardware platforms are updated to reflect the new test parametrization.

Changes

Cohort / File(s) / Summary

  • Reference Accuracy Updates (tests/integration/defs/accuracy/references/longbench_v1.yaml): Updated BF16 KV-cache accuracy values for Qwen3-30B-A3B-Instruct-2507 across target_sparsity variants (0.0, 0.5, 0.9). Added new FP8 KV-cache reference entries with corresponding accuracy values for the same target_sparsity values.
  • Test Method Updates (tests/integration/defs/accuracy/test_llm_api_pytorch.py): Added an fp8kv boolean parameter to test_skip_softmax_attention with conditional KV-cache dtype configuration (fp8 when enabled, auto otherwise). Renamed test_skip_softmax_attention_2gpus to test_skip_softmax_attention_4gpus with tensor/expert parallelism updated from 2 to 4. Updated threshold values for all target_sparsity cases.
  • B200 Test List (tests/integration/test_lists/test-db/l0_b200.yml): Removed the target_sparsity_0.5 and 0.9 test entries; added a target_sparsity_0.9 entry with fp8kv=True.
  • DGX H100 Test List (tests/integration/test_lists/test-db/l0_dgx_h100.yml): Removed two multi-GPU test entries (test_skip_softmax_attention_2gpus) for target_sparsity 0.5 and 0.9.
  • DGX B200 & H200 Test Lists (tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h200.yml): Added four test entries each for test_skip_softmax_attention_4gpus covering target_sparsity 0.5/0.9 with fp8kv False/True combinations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title references two NVBugs IDs and clearly describes the main change: fixing Qwen3 skip softmax attention CI tests. It directly matches the PR's objective of resolving two P0 CI bugs.
  • Description check ✅ Passed: The PR description includes all key sections: a comprehensive summary of changes, specific threshold values, a test plan with verification steps, and clear resolution of two P0 CI issues. All required template sections are adequately addressed.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 4948-4950: The function signature for
test_skip_softmax_attention_4gpus exceeds yapf's expected wrapping and must be
reformatted to follow the 80-char layout; run yapf (or pre-commit) to rewrap the
parameter list of test_skip_softmax_attention_4gpus so parameters
(target_sparsity, thr_prefill, thr_decode, fp8kv) are each on their own or
appropriately wrapped lines per yapf rules and commit the formatted change to
keep release checks green.
- Around line 4935-4948: The test test_skip_softmax_attention_4gpus must locally
skip when fewer than 4 visible GPUs are present; add a guard at the start of the
function (or a pytest.skipif decorator) that checks torch.cuda.device_count() <
4 and calls pytest.skip("requires 4 GPUs") so the test self-skips on smaller
hosts; ensure torch and pytest are imported if not already and place the check
inside test_skip_softmax_attention_4gpus before any code that assumes
tensor_parallel_size=4 / moe_expert_parallel_size=4.
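
A minimal sketch of the guard this comment suggests (the helper name `require_gpus` is illustrative, not from the PR; note that a later commit in this PR instead used a skip_less_device marker):

```python
import pytest


def require_gpus(needed: int, available: int) -> None:
    """Skip the calling test when fewer than `needed` GPUs are visible."""
    if available < needed:
        pytest.skip(f"requires {needed} GPUs")


def test_skip_softmax_attention_4gpus():
    # Deferred import so this sketch loads even without CUDA/torch present.
    import torch
    require_gpus(4, torch.cuda.device_count())
    # ... continue with tensor_parallel_size=4 / moe_expert_parallel_size=4
```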
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 600d8841-f924-4a80-8b85-7ace1a449cb8

📥 Commits

Reviewing files that changed from the base of the PR and between ac3dbf1 and d0c9547.

📒 Files selected for processing (6)
  • tests/integration/defs/accuracy/references/longbench_v1.yaml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
  • tests/integration/test_lists/test-db/l0_dgx_h200.yml
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml

…: add skip_less_device(4) and fix yapf formatting

- Add @pytest.mark.skip_less_device(4) to test_skip_softmax_attention_4gpus
  so direct test selection skips gracefully on <4-GPU hosts
- Fix yapf parameter wrapping on method signature
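
As a sketch, the resulting decorator usage looks roughly like this. skip_less_device is a custom marker handled by the repo's test infrastructure (not a built-in pytest marker), and the test body below is elided:

```python
import pytest


@pytest.mark.skip_less_device(4)  # custom marker: self-skip on <4-GPU hosts
def test_skip_softmax_attention_4gpus():
    ...  # body elided; runs with tensor_parallel_size=4 / moe_expert_parallel_size=4
```

Compared with an in-body device_count check, the marker lets the skip happen at collection time, so direct test selection on smaller hosts skips gracefully.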

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run

… comments from test-db YAMLs

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #42044 [ run ] completed with state FAILURE. Commit: d0c9547
/LLM/main/L0_MergeRequest_PR pipeline #32889 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot kill

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42048 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42049 [ kill ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42048 [ run ] completed with state ABORTED. Commit: 077b91e

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42049 [ kill ] completed with state SUCCESS. Commit: 077b91e
Successfully killed previous jobs for commit 077b91e

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42050 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42050 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32893 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42138 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42138 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32972 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42164 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42164 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32992 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 8, 2026

/bot run --disable-fail-fast --reuse-test --post-merge

@tensorrt-cicd
Collaborator

PR_Github #42263 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42263 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33066 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli bobboli enabled auto-merge (squash) April 8, 2026 13:45
@bobboli
Collaborator Author

bobboli commented Apr 8, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42345 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42345 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33131 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 9, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42446 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42446 [ run ] completed with state FAILURE. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33212 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 9, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42491 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42491 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33239 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 10, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42658 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42658 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33368 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 10, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42709 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42709 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33403 completed with status: 'SUCCESS'

CI Report

Link to invocation

@bobboli bobboli merged commit 5e1a98e into NVIDIA:main Apr 10, 2026
4 of 5 checks passed
