
[https://nvbugs/6094066][fix] Skip Qwen3 skip-softmax on low-memory GPUs#13581

Merged
xxi-nv merged 1 commit into NVIDIA:main from xxi-nv:dev-xxi-bug6094066-skip-softmax
Apr 30, 2026

Conversation

@xxi-nv
Collaborator

@xxi-nv xxi-nv commented Apr 28, 2026

Description

  • Skip TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention on pre-Blackwell GPUs.
  • Add a skip_less_device_memory(140000) guard so low-memory GPUs do not try this Qwen3 30B skip-softmax case.
  • Remove the now-stale waives.txt entry for the same target test so B200 can run it instead of being waived.
  • Use a 140GB requirement based on BF16 model weights plus runtime overhead from CUDA/NCCL/cuBLAS, MoE activation peaks, MoE GEMM profiling workspace, warmup, and KV overlap.
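
The 140GB threshold can be sanity-checked with a back-of-envelope calculation. Only the BF16 weight size follows directly from the ~30.5B-parameter model; the lump-sum overhead figure below is an illustrative assumption standing in for the CUDA/NCCL/cuBLAS, MoE activation, profiling-workspace, warmup, and KV items listed above:

```python
# Back-of-envelope check of the 140GB requirement described above.
# The overhead figure is an illustrative assumption, not a measured value.
def estimate_required_gib(num_params: float, bytes_per_param: int,
                          overhead_gib: float) -> float:
    """Weights-in-memory estimate plus a lump-sum runtime overhead, in GiB."""
    weights_gib = num_params * bytes_per_param / 1024**3
    return weights_gib + overhead_gib

# Qwen3-30B-A3B-Instruct-2507: ~30.5B parameters, BF16 = 2 bytes per parameter.
weights_only = estimate_required_gib(30.5e9, 2, 0.0)    # ~56.8 GiB of weights
with_overhead = estimate_required_gib(30.5e9, 2, 75.0)  # assumed runtime overhead

print(f"weights: {weights_only:.1f} GiB, total: {with_overhead:.1f} GiB")
```

With an assumed ~75 GiB of runtime overhead, the total lands close to the 140000 MB guard value, which is why GPUs below that threshold are skipped.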

Validation

  • git diff --check origin/main...HEAD
  • git commit -s --amend --no-edit pre-commit hooks passed after the waives.txt cleanup
  • Computelab B200: the target test reached model execution and used about 60GB of GPU memory before hitting environment issues unrelated to this skip marker (flashinfer/NVRTC cuda.h include setup); no OOM was observed.
  • OCI GB200 batch (qos=short, 2h): source build/import succeeded on GB200, and pytest collection selected the target test rather than skipping it under the 140GB guard. The test then failed at model loading because the OCI model shards for Qwen3/Qwen3-30B-A3B-Instruct-2507 are Git LFS pointer files, not real safetensors; no OOM was observed.
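
The Git LFS failure mode above is easy to detect up front: an un-fetched LFS pointer file begins with the LFS spec line rather than binary tensor data. A minimal check (the sample byte strings are illustrative):

```python
# A Git LFS pointer file starts with the spec version line; a real
# safetensors shard starts with a binary length-prefixed header.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(first_bytes: bytes) -> bool:
    """True if the file content begins like a Git LFS pointer file."""
    return first_bytes.startswith(LFS_MAGIC)

pointer = b"version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 123\n"
real_shard = b"\x40\x1f\x00\x00\x00\x00\x00\x00{"  # illustrative binary header

print(is_lfs_pointer(pointer), is_lfs_pointer(real_shard))
```

Running such a check before model loading would turn the late "failed at model loading" error into an immediate, descriptive one.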

NVBug: https://nvbugs/6094066

@xxi-nv xxi-nv requested a review from a team as a code owner April 28, 2026 23:53
@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

This change updates a test's hardware targeting configuration, replacing Hopper-specific hardware gating with Blackwell hardware gating and adding a device-memory threshold constraint for lower-memory devices.

Changes

Cohort / File(s): Hardware Gating Configuration — tests/integration/defs/accuracy/test_llm_api_pytorch.py
Summary: Updated the test_skip_softmax_attention test to use a pre-Blackwell skip decorator instead of the Hopper one, and added a device-memory threshold (140000 MB) to skip the test on lower-memory devices.
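
A minimal sketch of the gating logic this change describes. The real test uses TensorRT-LLM's own helpers (e.g. skip_less_device_memory); the function, SM constant, and example device values here are only illustrative:

```python
# Sketch of the skip decision: skip on pre-Blackwell GPUs, and on
# Blackwell GPUs whose memory is below the 140000 MB threshold.
# Blackwell GPUs report compute capability 10.x (SM100+); Hopper is SM90.
BLACKWELL_MIN_SM = 100

def should_skip(sm_version: int, device_memory_mib: int,
                min_sm: int = BLACKWELL_MIN_SM,
                min_memory_mib: int = 140000) -> bool:
    """Return True when the device fails either hardware gate."""
    return sm_version < min_sm or device_memory_mib < min_memory_mib

# Illustrative devices: an 80GB-class SM90 GPU is skipped,
# a 180GB-class SM100 GPU runs the test.
print(should_skip(90, 81559), should_skip(100, 183359))
```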

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 5 passed
  • Title check — Passed: The PR title accurately and specifically describes the main change (adding skip markers for a Qwen3 test on low-memory GPUs) and follows the required format with NVBugs ticket and [fix] type.
  • Docstring Coverage — Passed: No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Linked Issues check — Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — Passed: Check skipped because no linked issues were found for this pull request.
  • Description check — Passed: The PR description provides a clear explanation of the changes, the rationale for the 140GB threshold, and comprehensive validation across multiple environments.



@xxi-nv xxi-nv force-pushed the dev-xxi-bug6094066-skip-softmax branch from b4174a5 to af09721 on April 29, 2026 00:55
@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46015 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@xxi-nv xxi-nv enabled auto-merge (squash) April 29, 2026 02:14
@xxi-nv xxi-nv requested a review from xinhe-nv April 29, 2026 02:14
@tensorrt-cicd
Collaborator

PR_Github #46015 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36165 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46141 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46141 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36267 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46228 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46228 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36340 completed with status: 'SUCCESS'

CI Report

Link to invocation

@xxi-nv xxi-nv merged commit 468b34d into NVIDIA:main Apr 30, 2026
9 checks passed
@xxi-nv xxi-nv deleted the dev-xxi-bug6094066-skip-softmax branch April 30, 2026 09:19
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026