[https://nvbugs/6086538][fix] suppress misleading skip-softmax FMHA warning in generation by bobboli · Pull Request #13157 · NVIDIA/TensorRT-LLM

bobboli · 2026-04-17T16:52:15Z

Summary

Suppress the FMHA "Consider using numInstsQ = 2" INFO log for skip-softmax generation kernels.

The warning is misleading for this path because skip-softmax generation intentionally uses a 1x1 numInsts configuration, while the existing INFO suggests 2x1 would be preferable.

Validation

- Rebuilt `build_wheel_targets` successfully.
- Verified from local MiniMax-M2 repro logs that the warning spam coincides with generation warmup, not context/prefill warmup.
- Kept the change scoped to generation skip-softmax so that context warnings remain intact.

Summary by CodeRabbit

Chores
- Suppressed a misleading informational FMHA tuning log for generation kernels that skip softmax when possible; context kernels still emit it.

… generation Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

coderabbitai · 2026-04-17T16:56:15Z

📝 Walkthrough

A flag, `isGenerationSkipSoftmax`, was introduced to gate an informational tuning check in an FMHA kernel traits header. The check now runs only when the kernel is not a generation skip-softmax kernel, so the recommendation is no longer emitted for generation kernels that can skip softmax.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| FMHA Kernel Configuration `cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/trtllmGen_fmha_export/KernelTraits.h` | Introduced `isGenerationSkipSoftmax` flag derived from `options.mSkipsSoftmaxWhenPossible` and context kernel status; guarded informational tuning check to skip recommendation for generation kernels that can skip softmax. |
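
The guard described in the walkthrough can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual `KernelTraits.h` code. Only `isGenerationSkipSoftmax` and `options.mSkipsSoftmaxWhenPossible` come from the PR, and the flag is modeled here as a function. The `FmhaOptions` struct, the `mIsContextKernel` and `mNumInstsQ` fields, the `shouldLogNumInstsQHint` helper, and the assumption that the hint fires whenever a kernel uses a single Q instance are all illustrative.

```cpp
// Hypothetical, simplified model of the guard; not the real KernelTraits.h.
struct FmhaOptions {
    bool mSkipsSoftmaxWhenPossible; // option named in the PR walkthrough
    bool mIsContextKernel;          // assumed name for the "context kernel status"
    int mNumInstsQ;                 // assumed name for the Q-side instance count
};

// True only for generation (non-context) kernels that may skip softmax.
constexpr bool isGenerationSkipSoftmax(FmhaOptions const& options) {
    return options.mSkipsSoftmaxWhenPossible && !options.mIsContextKernel;
}

// Decides whether the "Consider using numInstsQ = 2" INFO hint should be logged.
// Assumption: the hint is tied to single-instance Q configurations.
constexpr bool shouldLogNumInstsQHint(FmhaOptions const& options) {
    return options.mNumInstsQ == 1 && !isGenerationSkipSoftmax(options);
}

// Generation skip-softmax kernel using 1 Q instance: hint suppressed.
static_assert(!shouldLogNumInstsQHint(FmhaOptions{true, false, 1}),
              "generation skip-softmax should not log the hint");
// Context kernel with skip-softmax enabled: hint still logged.
static_assert(shouldLogNumInstsQHint(FmhaOptions{true, true, 1}),
              "context kernels keep the hint");
// Generation kernel without skip-softmax: hint still logged.
static_assert(shouldLogNumInstsQHint(FmhaOptions{false, false, 1}),
              "non-skip-softmax generation kernels keep the hint");
```

In this model the hint is dropped only for the generation skip-softmax case, which matches the PR's stated scope of leaving context warnings intact.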

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description explains the issue and solution clearly, includes validation steps performed, and identifies the scope of changes. However, it lacks a dedicated 'Test Coverage' section explicitly listing relevant tests. |
| Title check | ✅ Passed | The title directly addresses the main change: suppressing a misleading FMHA skip-softmax warning in generation kernels, which aligns with the code modifications that guard the tuning check. |

bobboli · 2026-04-19T12:56:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-19T13:03:23Z

PR_Github #44187 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

tensorrt-cicd · 2026-04-19T17:23:07Z

PR_Github #44187 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34613 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

bobboli · 2026-04-19T17:54:17Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-19T18:01:21Z

PR_Github #44210 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

tensorrt-cicd · 2026-04-19T20:34:14Z

PR_Github #44210 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34634 completed with status: 'FAILURE'

bobboli · 2026-04-20T03:11:47Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T03:17:40Z

PR_Github #44277 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

tensorrt-cicd · 2026-04-20T03:46:59Z

PR_Github #44277 [ run ] completed with state FAILURE. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34698 completed with status: 'FAILURE'

bobboli · 2026-04-20T04:54:03Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T05:01:02Z

PR_Github #44295 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

tensorrt-cicd · 2026-04-20T05:12:36Z

PR_Github #44295 [ run ] completed with state FAILURE. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34716 completed with status: 'FAILURE'

bobboli · 2026-04-20T08:54:53Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T09:00:38Z

PR_Github #44416 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

tensorrt-cicd · 2026-04-20T10:58:49Z

PR_Github #44416 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34828 completed with status: 'SUCCESS'

CI Report

Link to invocation

[NVBUG-6086538][fix] suppress misleading skip-softmax FMHA warning in…

e9a7416

… generation Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

github-actions bot assigned bobboli Apr 17, 2026

[NVBUG-6086538][refactor] tighten skip-softmax generation warning guard

6cd4cfb

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

[NVBUG-6086538][refactor] inline single-inst warning guard

10effe6

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli requested a review from Tom-Zheng April 20, 2026 09:17

bobboli changed the title ~~[NVBUG-6086538][fix] suppress misleading skip-softmax FMHA warning in generation~~ [https://nvbugs/6086538][fix] suppress misleading skip-softmax FMHA warning in generation Apr 20, 2026

bobboli enabled auto-merge (squash) April 20, 2026 16:31
