
[https://nvbugs/6094066][fix] Skip Qwen3 skip-softmax on low-memory GPUs#13581

Merged
xxi-nv merged 1 commit into NVIDIA:main from xxi-nv:dev-xxi-bug6094066-skip-softmax
Apr 30, 2026

Conversation

@xxi-nv
Collaborator

@xxi-nv xxi-nv commented Apr 28, 2026

Description

  • Skip TestQwen3_30B_A3B_Instruct_2507::test_skip_softmax_attention on pre-Blackwell GPUs.
  • Add a skip_less_device_memory(140000) guard so low-memory GPUs do not try this Qwen3 30B skip-softmax case.
  • Remove the now-stale waives.txt entry for the same target test so B200 can run it instead of being waived.
  • Use a 140GB requirement based on BF16 model weights plus runtime overhead from CUDA/NCCL/cuBLAS, MoE activation peaks, MoE GEMM profiling workspace, warmup, and KV overlap.
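
The 140GB threshold can be sanity-checked with a back-of-envelope calculation. Only the BF16 weight size follows directly from the ~30.5B-parameter model; the lump-sum overhead figure below is an illustrative assumption standing in for the CUDA/NCCL/cuBLAS, MoE activation, profiling-workspace, warmup, and KV items listed above:

```python
# Back-of-envelope check of the 140GB requirement described above.
# The overhead figure is an illustrative assumption, not a measured value.
def estimate_required_gib(num_params: float, bytes_per_param: int,
                          overhead_gib: float) -> float:
    """Weights-in-memory estimate plus a lump-sum runtime overhead, in GiB."""
    weights_gib = num_params * bytes_per_param / 1024**3
    return weights_gib + overhead_gib

# Qwen3-30B-A3B-Instruct-2507: ~30.5B parameters, BF16 = 2 bytes per parameter.
weights_only = estimate_required_gib(30.5e9, 2, 0.0)    # ~56.8 GiB of weights
with_overhead = estimate_required_gib(30.5e9, 2, 75.0)  # assumed runtime overhead

print(f"weights: {weights_only:.1f} GiB, total: {with_overhead:.1f} GiB")
```

With an assumed ~75 GiB of runtime overhead, the total lands close to the 140000 MB guard value, which is why GPUs below that threshold are skipped.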

Validation

  • git diff --check origin/main...HEAD
  • git commit -s --amend --no-edit pre-commit hooks passed after the waives.txt cleanup
  • Computelab B200: the target test reached model execution and used about 60GB of GPU memory before hitting environment issues unrelated to this skip marker (flashinfer/NVRTC cuda.h include setup); no OOM was observed.
  • OCI GB200 batch (qos=short, 2h): source build/import succeeded on GB200, and pytest collection selected the target test rather than skipping it under the 140GB guard. The test then failed at model loading because the OCI model shards for Qwen3/Qwen3-30B-A3B-Instruct-2507 are Git LFS pointer files, not real safetensors; no OOM was observed.
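
The Git LFS failure mode above is easy to detect up front: an un-fetched LFS pointer file begins with the LFS spec line rather than binary tensor data. A minimal check (the sample byte strings are illustrative):

```python
# A Git LFS pointer file starts with the spec version line; a real
# safetensors shard starts with a binary length-prefixed header.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(first_bytes: bytes) -> bool:
    """True if the file content begins like a Git LFS pointer file."""
    return first_bytes.startswith(LFS_MAGIC)

pointer = b"version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 123\n"
real_shard = b"\x40\x1f\x00\x00\x00\x00\x00\x00{"  # illustrative binary header

print(is_lfs_pointer(pointer), is_lfs_pointer(real_shard))
```

Running such a check before model loading would turn the late "failed at model loading" error into an immediate, descriptive one.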

NVBug: https://nvbugs/6094066

@xxi-nv xxi-nv requested a review from a team as a code owner April 28, 2026 23:53
@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

This change updates a test's hardware targeting configuration, replacing Hopper-specific hardware gating with Blackwell hardware gating and adding a device-memory threshold constraint for lower-memory devices.

Changes

Cohort / File(s): Hardware Gating Configuration — tests/integration/defs/accuracy/test_llm_api_pytorch.py
Summary: Updated the test_skip_softmax_attention test to use a pre-Blackwell skip decorator instead of the Hopper one, and added a device-memory threshold (140000 MB) to skip the test on lower-memory devices.
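
A minimal sketch of the gating logic this change describes. The real test uses TensorRT-LLM's own helpers (e.g. skip_less_device_memory); the function, SM constant, and example device values here are only illustrative:

```python
# Sketch of the skip decision: skip on pre-Blackwell GPUs, and on
# Blackwell GPUs whose memory is below the 140000 MB threshold.
# Blackwell GPUs report compute capability 10.x (SM100+); Hopper is SM90.
BLACKWELL_MIN_SM = 100

def should_skip(sm_version: int, device_memory_mib: int,
                min_sm: int = BLACKWELL_MIN_SM,
                min_memory_mib: int = 140000) -> bool:
    """Return True when the device fails either hardware gate."""
    return sm_version < min_sm or device_memory_mib < min_memory_mib

# Illustrative devices: an 80GB-class SM90 GPU is skipped,
# a 180GB-class SM100 GPU runs the test.
print(should_skip(90, 81559), should_skip(100, 183359))
```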

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 5 passed
  • Title check — Passed: The PR title accurately and specifically describes the main change (adding skip markers for a Qwen3 test on low-memory GPUs) and follows the required format with NVBugs ticket and [fix] type.
  • Docstring Coverage — Passed: No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Linked Issues check — Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — Passed: Check skipped because no linked issues were found for this pull request.
  • Description check — Passed: The PR description provides a clear explanation of the changes, the rationale for the 140GB threshold, and comprehensive validation across multiple environments.



@xxi-nv xxi-nv force-pushed the dev-xxi-bug6094066-skip-softmax branch from b4174a5 to af09721 on April 29, 2026 00:55
@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46015 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@xxi-nv xxi-nv enabled auto-merge (squash) April 29, 2026 02:14
@xxi-nv xxi-nv requested a review from xinhe-nv April 29, 2026 02:14
@tensorrt-cicd
Collaborator

PR_Github #46015 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36165 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46141 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46141 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36267 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xxi-nv
Collaborator Author

xxi-nv commented Apr 29, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46228 [ run ] triggered by Bot. Commit: af09721 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46228 [ run ] completed with state SUCCESS. Commit: af09721
/LLM/main/L0_MergeRequest_PR pipeline #36340 completed with status: 'SUCCESS'

CI Report

Link to invocation

@xxi-nv xxi-nv merged commit 468b34d into NVIDIA:main Apr 30, 2026
9 checks passed
@xxi-nv xxi-nv deleted the dev-xxi-bug6094066-skip-softmax branch April 30, 2026 09:19
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026