[https://nvbugs/6094108][fix] Fix Qwen3-30B-A3B NVFP4 tep4 CUTLASS MoE test OOM on B300 by tensorrt-cicd · Pull Request #13349 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-04-22T20:54:33Z

Summary

Fix for NVBugs 6094108: [TensorRT-LLM][main]: TestQwen3_30B_A3B::test_nvfp4[tep4_latency_moe_cutlass-torch_compile=False] is failure
Root cause: The Qwen3-30B-A3B NVFP4 test with tp=4/ep=4 and the CUTLASS MoE backend ran out of GPU memory on GB300, which surfaced as a segfault during MPI finalize. The default KV cache allocation (free_gpu_memory_fraction=0.9) consumed too much GPU memory, leaving insufficient space for model weights and activation buffers during execution.
Fix: Pass an explicit KvCacheConfig(free_gpu_memory_fraction=0.8) to cap the KV cache at 80% of free GPU memory. This resolves the OOM without reducing max_batch_size from 32. A previous repair attempt also halved max_batch_size to 16 alongside the memory-fraction change; that inadvertently altered scheduler batching behavior and degraded GSM8K accuracy from 85.52 to 75.25. This fix keeps the original batch size to preserve accuracy.
Automated fix generated by repair-bot
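To see why a 0.1 change in the fraction matters, here is a simplified Python model of the memory split. It assumes the KV cache takes `free_gpu_memory_fraction` of whatever GPU memory is free when the cache is sized, and that every other runtime allocation (MoE workspaces, NCCL buffers, CUDA graph captures, activations) must fit in the remainder. The helper names `split_free_memory` and `fits`, and the 100 GiB / 15 GiB figures, are hypothetical and chosen only for illustration; they are not measurements from the failing machine.

```python
GIB = 1024 ** 3


def split_free_memory(free_bytes: int, fraction: float) -> tuple[int, int]:
    """Return (kv_cache_bytes, headroom_bytes) under a simplified model.

    Assumption: the KV cache is sized as `fraction` of the memory that is
    free at sizing time; everything else must fit in the headroom.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    kv_cache = round(free_bytes * fraction)
    return kv_cache, free_bytes - kv_cache


def fits(free_bytes: int, fraction: float, other_bytes: int) -> bool:
    """True if the non-KV allocations fit in the headroom left by the KV cache."""
    _, headroom = split_free_memory(free_bytes, fraction)
    return headroom >= other_bytes


if __name__ == "__main__":
    free = 100 * GIB   # hypothetical free memory at KV cache sizing time
    other = 15 * GIB   # hypothetical MoE workspace + NCCL + CUDA graph needs
    for frac in (0.9, 0.8):
        kv, headroom = split_free_memory(free, frac)
        print(f"fraction={frac}: kv={kv / GIB:.1f} GiB, "
              f"headroom={headroom / GIB:.1f} GiB, fits={fits(free, frac, other)}")
```

Under this model, lowering the fraction from 0.9 to 0.8 doubles the headroom (from 10% to 20% of free memory) while the batch size stays untouched, which is the trade-off the fix relies on.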

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6094108

Summary by CodeRabbit

Tests
- Set an explicit KV cache memory fraction (0.8) in the Qwen3-30B-A3B NVFP4 accuracy test to reduce GPU memory pressure during testing.

…OM on B300

The test_nvfp4[tep4_latency_moe_cutlass] variant OOMs on B300 GPUs with the default KV cache memory fraction of 0.9, because the CUTLASS MoE NVFP4 backend with EP4 + CUDA graphs requires significant GPU memory for MoE workspaces, NCCL buffers, and CUDA graph captures, leaving insufficient headroom.

Add KvCacheConfig(free_gpu_memory_fraction=0.8) to reduce the KV cache allocation and prevent OOM. This matches the pattern used by other multi-GPU MoE tests in the same file.

Verified: MMLU accuracy 79.459 (threshold 77.713) and GSM8K accuracy 85.52 (threshold 80.227) both pass on a 4-GPU B300 configuration.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
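The reported scores can be checked against their thresholds with a few lines of Python. This sketch assumes a simple pass rule (a score passes if it is at least its threshold); the real accuracy harness under tests/integration/defs/accuracy may apply a different statistical criterion. It also assumes the earlier max_batch_size=16 attempt was judged against the same GSM8K threshold of 80.227, which the PR does not state explicitly. The names `THRESHOLDS`, `RUNS`, and `failing_tasks` are hypothetical.

```python
# Scores and thresholds as quoted in the PR description and commit message.
THRESHOLDS = {"MMLU": 77.713, "GSM8K": 80.227}

RUNS = {
    "max_batch_size=32 (this PR)": {"MMLU": 79.459, "GSM8K": 85.52},
    # Earlier repair attempt: only its GSM8K score was reported.
    "max_batch_size=16 (earlier attempt)": {"GSM8K": 75.25},
}


def failing_tasks(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the tasks whose score falls below threshold.

    Assumed rule: a task passes when score >= threshold.
    """
    return [task for task, score in scores.items() if score < thresholds[task]]


if __name__ == "__main__":
    for name, scores in RUNS.items():
        failed = failing_tasks(scores, THRESHOLDS)
        status = "PASS" if not failed else "FAIL: " + ", ".join(failed)
        print(f"{name}: {status}")
```

Under that assumed rule, the 75.25 GSM8K score from the halved batch size would fall about 5 points short of the threshold, which is why the fix keeps max_batch_size at 32.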

coderabbitai · 2026-04-22T20:57:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1e9bb79a-0178-4869-acc4-7d5e473d83c2

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8bd87 and d1c9c7b.

📒 Files selected for processing (1)

tests/integration/defs/accuracy/test_llm_api_pytorch.py

📝 Walkthrough

An NVFP4 accuracy test is updated to explicitly configure KV cache settings on LLM construction, replacing the prior default behavior with a KvCacheConfig specifying free_gpu_memory_fraction=0.8.

Changes

Cohort / File(s)	Summary
Test KV Cache Configuration `tests/integration/defs/accuracy/test_llm_api_pytorch.py`	Updated LLM initialization to explicitly pass `KvCacheConfig(free_gpu_memory_fraction=0.8)` instead of relying on default KV cache behavior.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the specific issue (Qwen3-30B-A3B NVFP4 test OOM on B300), includes a proper NVBugs reference, and uses the correct '[fix]' type tag.
Description check	✅ Passed	The PR description follows the template structure with clear Summary, Test plan, and Links sections. It explains the root cause, the fix implementation, and why a previous approach was avoided.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

rosenrodt · 2026-05-08T07:22:18Z

/bot run

tensorrt-cicd · 2026-05-08T07:27:49Z

PR_Github #47345 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

rosenrodt · 2026-05-08T07:34:22Z

/bot kill

rosenrodt · 2026-05-08T07:35:38Z

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Post-Merge-1,DGX_B200-4_GPUs-PyTorch-Post-Merge-2"

tensorrt-cicd · 2026-05-08T07:40:31Z

PR_Github #47348 [ kill ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-08T07:41:00Z

PR_Github #47349 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-08T07:41:15Z

PR_Github #47348 [ kill ] completed with state ABORTED. Commit: d1c9c7b

Link to invocation

tensorrt-cicd · 2026-05-08T11:53:52Z

PR_Github #47349 [ run ] completed with state SUCCESS. Commit: d1c9c7b
/LLM/main/L0_MergeRequest_PR pipeline #37285 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

rosenrodt · 2026-05-10T06:19:14Z

@StanleySun639 The CI stages DGX_B200-4_GPUs-PyTorch-Post-Merge-1 and DGX_B200-4_GPUs-PyTorch-Post-Merge-2 passed. Since this PR only affects these stages, I think we can skip the remaining stages and merge.

I do not have permission to merge, so I will leave the decision to you. Thanks!

rosenrodt · 2026-05-18T06:43:31Z

/bot run

tensorrt-cicd · 2026-05-18T06:50:11Z

PR_Github #48847 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-18T08:29:48Z

PR_Github #48847 [ run ] completed with state SUCCESS. Commit: d1c9c7b
/LLM/main/L0_MergeRequest_PR pipeline #38601 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

rosenrodt · 2026-05-18T09:05:49Z

/bot run

tensorrt-cicd · 2026-05-18T09:11:16Z

PR_Github #48881 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-18T12:39:31Z

PR_Github #48881 [ run ] completed with state SUCCESS. Commit: d1c9c7b
/LLM/main/L0_MergeRequest_PR pipeline #38631 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

rosenrodt · 2026-05-19T04:48:11Z

/bot run

tensorrt-cicd · 2026-05-19T04:54:26Z

PR_Github #49088 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-19T08:05:41Z

PR_Github #49088 [ run ] completed with state SUCCESS. Commit: d1c9c7b
/LLM/main/L0_MergeRequest_PR pipeline #38806 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

rosenrodt · 2026-05-19T14:30:33Z

/bot run

tensorrt-cicd · 2026-05-19T14:37:48Z

PR_Github #49210 [ run ] triggered by Bot. Commit: d1c9c7b Link to invocation

tensorrt-cicd · 2026-05-19T18:19:19Z

PR_Github #49210 [ run ] completed with state SUCCESS. Commit: d1c9c7b
/LLM/main/L0_MergeRequest_PR pipeline #38884 completed with status: 'SUCCESS'

CI Report

Link to invocation

tensorrt-cicd requested a review from a team as a code owner April 22, 2026 20:54

github-actions Bot assigned tensorrt-cicd Apr 22, 2026

StanleySun639 approved these changes Apr 27, 2026

View reviewed changes

StanleySun639 merged commit d724c68 into NVIDIA:main May 20, 2026
8 checks passed

tensorrt-cicd commented Apr 22, 2026 • edited by coderabbitai Bot

3 participants