[https://nvbugs/5910749][https://nvbugs/5995486][test] Fix Qwen3 skip softmax attention CI tests#12789

Merged
bobboli merged 3 commits into NVIDIA:main from bobboli:fix/qwen3-skip-softmax-attention-ci
Apr 10, 2026

Conversation

@bobboli
Collaborator

@bobboli bobboli commented Apr 7, 2026

Summary

Fixes two P0 CI bugs for the Qwen3-30B-A3B-Instruct-2507 skip softmax attention tests:

  • nvbug 5910749: TIMEOUT in test_skip_softmax_attention[target_sparsity_0.9] on B200
  • nvbug 5995486: Memory leak in test_skip_softmax_attention_2gpus causing OOM in CI

Changes:

  • Threshold update: sync thr_prefill/thr_decode to blog-calibrated values for Qwen3-30B-A3B-Instruct-2507 (sparsity 0.5: 587.18/16.52; sparsity 0.9: 18471.56/852.20)
  • FP8 KV cache: add fp8kv=[False, True] parametrize dimension; thread KvCacheConfig(dtype="fp8" if fp8kv else "auto") into both test methods
  • 2GPU → 4GPU: rename test_skip_softmax_attention_2gpus → test_skip_softmax_attention_4gpus, switch to TP=4/EP=4, add @skip_pre_hopper, move entirely to post-merge CI
  • CI YAML: remove 2-GPU pre-merge entries from l0_dgx_h100.yml; l0_b200.yml reduced to one entry (target_sparsity_0.9-fp8kv=True); l0_dgx_h200.yml and l0_dgx_b200.yml gain 4 post-merge 4-GPU entries each
  • Reference accuracy: update longbench_v1.yaml with values measured on B200 using updated thresholds; add 3 new FP8 KV cache reference entries
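
The fp8kv parametrization and threshold wiring described above can be sketched as follows. This is a minimal sketch, not the PR's actual code: `KvCacheConfig` here is a simplified stand-in for the real class in `tensorrt_llm.llmapi` (which takes more arguments), and the accuracy-harness body is elided.

```python
from dataclasses import dataclass

import pytest


# Stand-in for tensorrt_llm.llmapi.KvCacheConfig; the real class accepts
# additional fields (memory fraction, block size, etc.).
@dataclass
class KvCacheConfig:
    dtype: str = "auto"


@pytest.mark.parametrize("fp8kv", [False, True], ids=lambda v: f"fp8kv={v}")
@pytest.mark.parametrize(
    "target_sparsity,thr_prefill,thr_decode",
    [
        (0.5, 587.18, 16.52),     # blog-calibrated thresholds, sparsity 0.5
        (0.9, 18471.56, 852.20),  # blog-calibrated thresholds, sparsity 0.9
    ],
)
def test_skip_softmax_attention(target_sparsity, thr_prefill, thr_decode, fp8kv):
    # Thread the KV-cache dtype through to the LLM under test:
    # "fp8" when the fp8kv dimension is enabled, "auto" otherwise.
    kv_cache_config = KvCacheConfig(dtype="fp8" if fp8kv else "auto")
    assert kv_cache_config.dtype in ("fp8", "auto")
    # ... run the accuracy harness with kv_cache_config (elided)
```

With ids rendered as `fp8kv=False`/`fp8kv=True`, the generated test ids line up with the CI YAML entry names (e.g. `target_sparsity_0.9-fp8kv=True`).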

Test plan

  • All 6 accuracy combinations ({0.0, 0.5, 0.9} × {fp8kv=False, fp8kv=True}) run on B200 with TRTLLM_ACCURACY_NO_REFERENCE=1 to capture new reference values
  • Verified no remaining references to test_skip_softmax_attention_2gpus in YAML/test files
  • CI YAML entries validated against parametrize_with_ids output format (fp8kv=True/False)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added FP8 KV-cache quantization support with benchmark accuracy measurements for the Qwen3-30B model across multiple sparsity configurations.
  • Tests

    • Enhanced test coverage with additional quantization variants and expanded multi-GPU testing configurations. Updated and validated accuracy baselines.

… softmax attention CI tests

- Update threshold values (thr_prefill/thr_decode) to match blog-calibrated
  values for Qwen3-30B-A3B-Instruct-2507
- Add fp8kv parametrize dimension to both test methods; KvCacheConfig now
  accepts dtype="fp8" when fp8kv=True
- Rename test_skip_softmax_attention_2gpus -> test_skip_softmax_attention_4gpus,
  switch to TP=4/EP=4, add @skip_pre_hopper, move to post-merge CI only
- Remove 2-GPU pre-merge entries from l0_dgx_h100.yml
- l0_b200.yml: replace 2 old entries with single target_sparsity_0.9-fp8kv=True
- l0_dgx_h200.yml and l0_dgx_b200.yml: add 4 post-merge 4-GPU entries each
  (sparsity 0.5/0.9 x fp8kv False/True)
- Update longbench_v1.yaml reference accuracy values measured on B200 with
  updated thresholds; add 3 new FP8 KV cache entries

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli bobboli requested a review from a team as a code owner April 7, 2026 03:11
@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42044 [ run ] triggered by Bot. Commit: d0c9547 Link to invocation

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The PR updates accuracy reference values for the Qwen3-30B model under BF16 KV-cache configuration and introduces new FP8 KV-cache reference entries with corresponding accuracy values. Test methods are parametrized with an fp8kv boolean flag, threshold values are adjusted, and a multi-GPU test configuration is renamed from 2GPUs to 4GPUs. Test lists across multiple hardware platforms are updated to reflect the new test parametrization.

Changes

Cohort / File(s) / Summary

  • Reference Accuracy Updates (tests/integration/defs/accuracy/references/longbench_v1.yaml): Updated BF16 KV-cache accuracy values for Qwen3-30B-A3B-Instruct-2507 across target_sparsity variants (0.0, 0.5, 0.9). Added new FP8 KV-cache reference entries with corresponding accuracy values for the same target_sparsity values.
  • Test Method Updates (tests/integration/defs/accuracy/test_llm_api_pytorch.py): Added an fp8kv boolean parameter to test_skip_softmax_attention with conditional KV-cache dtype configuration (fp8 when enabled, auto otherwise). Renamed test_skip_softmax_attention_2gpus to test_skip_softmax_attention_4gpus with tensor/expert parallelism updated from 2 to 4. Updated threshold values for all target_sparsity cases.
  • B200 Test List (tests/integration/test_lists/test-db/l0_b200.yml): Removed the target_sparsity_0.5 and 0.9 test entries; added a target_sparsity_0.9 entry with fp8kv=True.
  • DGX H100 Test List (tests/integration/test_lists/test-db/l0_dgx_h100.yml): Removed two multi-GPU test entries (test_skip_softmax_attention_2gpus) for target_sparsity 0.5 and 0.9.
  • DGX B200 & H200 Test Lists (tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h200.yml): Added four test entries each for test_skip_softmax_attention_4gpus covering target_sparsity 0.5/0.9 with fp8kv False/True combinations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title references two NVBugs IDs and clearly describes the main change: fixing Qwen3 skip softmax attention CI tests. It directly matches the PR's objective of resolving two P0 CI bugs.
  • Description check ✅ Passed: The PR description includes all key sections: a comprehensive summary of changes, specific threshold values, a test plan with verification steps, and clear resolution of two P0 CI issues. All required template sections are adequately addressed.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 4948-4950: The function signature for
test_skip_softmax_attention_4gpus exceeds yapf's expected wrapping and must be
reformatted to follow the 80-char layout; run yapf (or pre-commit) to rewrap the
parameter list of test_skip_softmax_attention_4gpus so parameters
(target_sparsity, thr_prefill, thr_decode, fp8kv) are each on their own or
appropriately wrapped lines per yapf rules and commit the formatted change to
keep release checks green.
- Around line 4935-4948: The test test_skip_softmax_attention_4gpus must locally
skip when fewer than 4 visible GPUs are present; add a guard at the start of the
function (or a pytest.skipif decorator) that checks torch.cuda.device_count() <
4 and calls pytest.skip("requires 4 GPUs") so the test self-skips on smaller
hosts; ensure torch and pytest are imported if not already and place the check
inside test_skip_softmax_attention_4gpus before any code that assumes
tensor_parallel_size=4 / moe_expert_parallel_size=4.
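
A minimal sketch of the guard this comment suggests (the helper name `require_gpus` is illustrative, not from the PR; note that a later commit in this PR instead used a skip_less_device marker):

```python
import pytest


def require_gpus(needed: int, available: int) -> None:
    """Skip the calling test when fewer than `needed` GPUs are visible."""
    if available < needed:
        pytest.skip(f"requires {needed} GPUs")


def test_skip_softmax_attention_4gpus():
    # Deferred import so this sketch loads even without CUDA/torch present.
    import torch
    require_gpus(4, torch.cuda.device_count())
    # ... continue with tensor_parallel_size=4 / moe_expert_parallel_size=4
```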
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 600d8841-f924-4a80-8b85-7ace1a449cb8

📥 Commits

Reviewing files that changed from the base of the PR and between ac3dbf1 and d0c9547.

📒 Files selected for processing (6)
  • tests/integration/defs/accuracy/references/longbench_v1.yaml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml
  • tests/integration/test_lists/test-db/l0_dgx_h200.yml
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml

…: add skip_less_device(4) and fix yapf formatting

- Add @pytest.mark.skip_less_device(4) to test_skip_softmax_attention_4gpus
  so direct test selection skips gracefully on <4-GPU hosts
- Fix yapf parameter wrapping on method signature
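
As a sketch, the resulting decorator usage looks roughly like this. skip_less_device is a custom marker handled by the repo's test infrastructure (not a built-in pytest marker), and the test body below is elided:

```python
import pytest


@pytest.mark.skip_less_device(4)  # custom marker: self-skip on <4-GPU hosts
def test_skip_softmax_attention_4gpus():
    ...  # body elided; runs with tensor_parallel_size=4 / moe_expert_parallel_size=4
```

Compared with an in-body device_count check, the marker lets the skip happen at collection time, so direct test selection on smaller hosts skips gracefully.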

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run

… comments from test-db YAMLs

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #42044 [ run ] completed with state FAILURE. Commit: d0c9547
/LLM/main/L0_MergeRequest_PR pipeline #32889 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot kill

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42048 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42049 [ kill ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42048 [ run ] completed with state ABORTED. Commit: 077b91e

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42049 [ kill ] completed with state SUCCESS. Commit: 077b91e
Successfully killed previous jobs for commit 077b91e

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42050 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42050 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32893 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42138 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42138 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32972 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 7, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42164 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42164 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #32992 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 8, 2026

/bot run --disable-fail-fast --reuse-test --post-merge

@tensorrt-cicd
Collaborator

PR_Github #42263 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42263 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33066 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli bobboli enabled auto-merge (squash) April 8, 2026 13:45
@bobboli
Collaborator Author

bobboli commented Apr 8, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42345 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42345 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33131 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 9, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42446 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42446 [ run ] completed with state FAILURE. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33212 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 9, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42491 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42491 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33239 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 10, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42658 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42658 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33368 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 10, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42709 [ run ] triggered by Bot. Commit: 077b91e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42709 [ run ] completed with state SUCCESS. Commit: 077b91e
/LLM/main/L0_MergeRequest_PR pipeline #33403 completed with status: 'SUCCESS'

CI Report

Link to invocation

@bobboli bobboli merged commit 5e1a98e into NVIDIA:main Apr 10, 2026
4 of 5 checks passed
