[https://nvbugs/6086538][fix] suppress misleading skip-softmax FMHA warning in generation#13157

Open
bobboli wants to merge 3 commits into NVIDIA:main from bobboli:bo/6086538-suppress-skip-softmax-generation-warning

Conversation

@bobboli
Collaborator

@bobboli bobboli commented Apr 17, 2026

Summary

Suppress the FMHA "Consider using numInstsQ = 2" INFO log for skip-softmax generation kernels.

The message is misleading for this path: skip-softmax generation intentionally uses a 1x1 numInsts configuration, whereas the existing INFO log suggests that 2x1 would be preferable.

Validation

  • Rebuilt build_wheel_targets successfully
  • Verified from local MiniMax-M2 repro logs that the warning spam coincides with generation warmup, not context/prefill warmup
  • Kept the change scoped to generation skip-softmax so context warnings remain intact

Summary by CodeRabbit

  • Chores
    • Suppresses a misleading informational tuning message for generation kernels that skip softmax when possible; the message is still emitted for context kernels.

… generation

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Apr 17, 2026

📝 Walkthrough

A configuration flag, isGenerationSkipSoftmax, was introduced to gate an informational tuning check in an FMHA kernel traits header. The check now runs only when the kernel is not a skip-softmax generation kernel, so the recommendation is no longer emitted for those kernels.
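
The gating described in this walkthrough could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in KernelTraits.h: only `mSkipsSoftmaxWhenPossible`, `isGenerationSkipSoftmax`, and the message text come from this PR, while `KernelOptionsSketch`, `mIsContextKernel`, `mNumInstsQ`, `wouldSuggestTwoInstsQ`, and `tuningHint` are hypothetical names, and the condition that triggers the original check is a placeholder.

```cpp
#include <string>

// Hypothetical stand-in for the kernel options. Only mSkipsSoftmaxWhenPossible
// is named in the PR; the other fields are assumptions made for illustration.
struct KernelOptionsSketch
{
    bool mSkipsSoftmaxWhenPossible = false;
    bool mIsContextKernel = false; // assumed name for the "context kernel status"
    int mNumInstsQ = 1;            // assumed name for the Q instruction count
};

// Placeholder for the condition under which the existing tuning check fires.
// The real condition is not shown in the PR conversation.
inline bool wouldSuggestTwoInstsQ(KernelOptionsSketch const& options)
{
    return options.mNumInstsQ == 1;
}

// Returns the informational hint, or an empty string when no hint applies.
inline std::string tuningHint(KernelOptionsSketch const& options)
{
    // Generation kernels that skip softmax intentionally use a 1x1 numInsts
    // configuration, so the 2x1 suggestion does not apply to them.
    bool const isGenerationSkipSoftmax
        = options.mSkipsSoftmaxWhenPossible && !options.mIsContextKernel;

    if (!isGenerationSkipSoftmax && wouldSuggestTwoInstsQ(options))
    {
        return "Consider using numInstsQ = 2";
    }
    return std::string();
}
```

In this sketch a context kernel with `mSkipsSoftmaxWhenPossible` set still receives the hint, which matches the stated goal of keeping context warnings intact.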

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| FMHA Kernel Configuration: cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/trtllmGen_fmha_export/KernelTraits.h | Introduced isGenerationSkipSoftmax flag derived from options.mSkipsSoftmaxWhenPossible and context kernel status; guarded informational tuning check to skip recommendation for generation kernels that can skip softmax. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description explains the issue and solution clearly, includes validation steps performed, and identifies the scope of changes. However, it lacks a dedicated 'Test Coverage' section explicitly listing relevant tests. |
| Title check | ✅ Passed | The title directly addresses the main change: suppressing a misleading FMHA skip-softmax warning in generation kernels, which aligns with the code modifications that guard the tuning check. |


Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
@bobboli
Collaborator Author

bobboli commented Apr 19, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44187 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44187 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34613 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 19, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #44210 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44210 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34634 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 20, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #44277 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44277 [ run ] completed with state FAILURE. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34698 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 20, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #44295 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44295 [ run ] completed with state FAILURE. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34716 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@bobboli
Collaborator Author

bobboli commented Apr 20, 2026

/bot run --disable-fail-fast --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #44416 [ run ] triggered by Bot. Commit: 10effe6 Link to invocation

@bobboli bobboli requested a review from Tom-Zheng April 20, 2026 09:17
@tensorrt-cicd
Collaborator

PR_Github #44416 [ run ] completed with state SUCCESS. Commit: 10effe6
/LLM/main/L0_MergeRequest_PR pipeline #34828 completed with status: 'SUCCESS'

CI Report

Link to invocation

@bobboli bobboli changed the title from "[NVBUG-6086538][fix] suppress misleading skip-softmax FMHA warning in generation" to "[https://nvbugs/6086538][fix] suppress misleading skip-softmax FMHA warning in generation" Apr 20, 2026
@bobboli bobboli enabled auto-merge (squash) April 20, 2026 16:31

2 participants