
[https://nvbugs/5945047][fix] Fix cluster launch enablement for SM120 GPUs in allReduce fusion #13169

Merged
ziyixiong-nv merged 2 commits into NVIDIA:main from ziyixiong-nv:repair-bot-bug5945047
May 13, 2026

Conversation

@ziyixiong-nv
Collaborator

@ziyixiong-nv ziyixiong-nv commented Apr 17, 2026

Summary

  • Fix for NVBugs 5945047: [TensorRT-LLM][L0][Post-Merge][main] Test failed: test_eagle3_4gpus[v2_kv_cache-cutlass-one_model-overlap_scheduler]
  • Root cause: The allReduce fusion kernel unconditionally enabled cluster launch for all GPUs with SM >= 90. However, workstation Blackwell GPUs (SM120/SM121) do not have cluster launch hardware support, causing a CUDA illegal instruction error when the kernel attempted to use cluster launch on these architectures.
  • Fix: Added an architecture range check (SM >= 90 && SM < 120) to gate cluster launch enablement, ensuring it is only used on Hopper (SM90) and datacenter Blackwell (SM100/SM103) GPUs that actually support the feature. The same condition is applied to both the cluster_size selection and the cudaLaunchAttribute configuration.
  • Automated fix generated by repair-bot
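
As a rough illustration of the gate described in the Fix bullet above, the range check and its effect on the cluster size can be sketched in plain host-side C++. The helper names `supportsClusterLaunch` and `selectClusterSize`, the `preferredClusterSize` parameter, and the fallback value of 1 are assumptions made for this sketch; they are not taken from `allReduceFusionKernels.cu`.

```cpp
#include <cassert>

// Hypothetical helper: true only for the SM range this PR treats as
// cluster-capable (Hopper SM90 and datacenter Blackwell SM100/SM103).
constexpr bool supportsClusterLaunch(int sm)
{
    return sm >= 90 && sm < 120;
}

// Hypothetical helper: use the preferred cluster size when clusters are
// enabled, otherwise fall back to 1 (assumed fallback, not from the source).
constexpr int selectClusterSize(int sm, int preferredClusterSize)
{
    return supportsClusterLaunch(sm) ? preferredClusterSize : 1;
}

// Compile-time checks for the architectures named in the summary.
static_assert(supportsClusterLaunch(90), "SM90 keeps cluster launch");
static_assert(supportsClusterLaunch(103), "SM103 keeps cluster launch");
static_assert(!supportsClusterLaunch(120), "SM120 is gated off");
static_assert(!supportsClusterLaunch(121), "SM121 is gated off");
```

Sharing one predicate between the cluster size and the launch attribute setup keeps the two decisions from drifting apart, which is the point of applying "the same condition" in both places.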

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough


The change refines CUDA cluster launch configuration logic in the all-reduce fusion kernel launcher. The SM architecture check is narrowed from SM >= 90 to SM >= 90 && SM < 120, causing cluster-related launch attributes to be disabled on workstation Blackwell (SM120/SM121) architectures while preserving support for earlier architectures.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CUDA Cluster Launch Configuration: cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu | Refined SM architecture check to narrow cluster launch support from SM >= 90 to SM >= 90 && SM < 120. Updated cluster_size and cfg.numAttrs assignments to depend on the new supports_cluster condition instead of the broader SM check, disabling cluster attributes on unsupported architectures. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the specific bug fix (NVBugs 5945047) and the primary change (fixing cluster launch enablement for SM120 GPUs in allReduce fusion). |
| Description check | ✅ Passed | The PR description covers all key sections: summary with root cause analysis, detailed fix explanation, test plan with verified checkmarks, and relevant bug link. All required template sections are addressed. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu (1)

693-700: ⚠️ Potential issue | 🔴 Critical

Fix critical launch attribute configuration on SM120/SM121: programmatic stream serialization must not be disabled when kernels use cudaGridDependencySynchronize() or cudaTriggerProgrammaticLaunchCompletion().

At line 700, setting cfg.numAttrs = 0 when supports_cluster is false disables cudaLaunchAttributeProgrammaticStreamSerialization set at lines 693–694. However, the kernels in this file call cudaGridDependencySynchronize() (lines 460, 539) and cudaTriggerProgrammaticLaunchCompletion() (lines 463, 521, 590), which require this attribute to be present in the launch configuration. On SM120/SM121 (where cluster launch is unsupported but programmatic launch is available), cfg.numAttrs should be 1 (programmatic serialization only) rather than 0.

Suggested fix
-    cfg.numAttrs = supports_cluster ? 2 : 0;
+    bool const supports_programmatic_launch = (SM >= 90);
+    cfg.numAttrs = supports_programmatic_launch ? (supports_cluster ? 2 : 1) : 0;
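
To make the reviewer's suggestion concrete, here is a minimal host-side sketch of the attribute count it proposes. It does not call the CUDA runtime: `LaunchAttr`, `LaunchAttrPlan`, and `planLaunchAttrs` are hypothetical stand-ins for the real `cudaLaunchAttribute` array and `cfg.numAttrs`, and the ordering (programmatic serialization first, cluster dimension second) is an assumption based on the review text. The thread does not show whether this suggestion was adopted in the merged change.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two launch attributes named in the review.
enum class LaunchAttr
{
    ProgrammaticStreamSerialization,
    ClusterDimension
};

// Hypothetical model of the attribute array plus the count handed to the launch.
struct LaunchAttrPlan
{
    LaunchAttr attrs[2];
    unsigned numAttrs;
};

inline LaunchAttrPlan planLaunchAttrs(int sm)
{
    bool const supportsProgrammaticLaunch = (sm >= 90);
    bool const supportsCluster = (sm >= 90 && sm < 120);

    // Programmatic serialization sits at index 0 so that a count of 1
    // passes it alone and drops only the cluster attribute.
    LaunchAttrPlan plan{{LaunchAttr::ProgrammaticStreamSerialization, LaunchAttr::ClusterDimension}, 0u};
    if (supportsProgrammaticLaunch)
    {
        plan.numAttrs = supportsCluster ? 2u : 1u;
    }
    return plan;
}
```

Under these assumptions, SM90 through SM103 pass both attributes, SM120/SM121 pass only the programmatic serialization attribute, and architectures below SM90 pass none.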
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu`
around lines 693 - 700, The launch-attribute setup currently zeroes cfg.numAttrs
when supports_cluster is false, which inadvertently drops the required
cudaLaunchAttributeProgrammaticStreamSerialization used by kernels that call
cudaGridDependencySynchronize() and cudaTriggerProgrammaticLaunchCompletion();
change the logic so cfg.numAttrs is set to 2 when supports_cluster is true and
to 1 otherwise, while still only populating the cluster attribute
(cudaLaunchAttributeClusterDimension) when supports_cluster is true so the
programmatic serialization attribute
(cudaLaunchAttributeProgrammaticStreamSerialization) is always passed for
SM120/SM121; update the block that assigns attribute[], cfg.attrs and
cfg.numAttrs accordingly (look for variables named attribute, cfg,
supports_cluster and the attribute ids
cudaLaunchAttributeProgrammaticStreamSerialization /
cudaLaunchAttributeClusterDimension).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 98a94691-9314-4566-aa8e-7dee0fcec9a9

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and 8146a29.

📒 Files selected for processing (1)
  • cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu

@ziyixiong-nv ziyixiong-nv changed the title from "[https://nvbugs/5945047][fix] [TensorRT-LLM][L0][Post-Merge][main] Test failed:" to "[https://nvbugs/5945047][fix] Fix cluster launch on SM120 in allReduce fusion kernels" Apr 18, 2026
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44254 [ run ] triggered by Bot. Commit: 8146a29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44254 [ run ] completed with state FAILURE. Commit: 8146a29
/LLM/main/L0_MergeRequest_PR pipeline #34674 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44338 [ run ] triggered by Bot. Commit: 8146a29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44338 [ run ] completed with state FAILURE. Commit: 8146a29
/LLM/main/L0_MergeRequest_PR pipeline #34755 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 8146a29 to 9f905f4 Compare April 22, 2026 06:11
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44910 [ run ] triggered by Bot. Commit: 9f905f4 Link to invocation

@ziyixiong-nv ziyixiong-nv changed the title from "[https://nvbugs/5945047][fix] Fix cluster launch on SM120 in allReduce fusion kernels" to "[https://nvbugs/5945047][fix] Fix cluster launch enablement for SM120 GPUs in allReduce fusion" Apr 22, 2026
@tensorrt-cicd
Collaborator

PR_Github #44910 [ run ] completed with state SUCCESS. Commit: 9f905f4
/LLM/main/L0_MergeRequest_PR pipeline #35242 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 9f905f4 to 7fcbe29 Compare April 28, 2026 01:26
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45814 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45814 [ run ] completed with state SUCCESS. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36001 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45859 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45859 [ run ] completed with state SUCCESS. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36037 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45917 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45917 [ run ] completed with state FAILURE. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36079 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 7fcbe29 to 7af0f5c Compare April 29, 2026 00:22
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46000 [ run ] triggered by Bot. Commit: 7af0f5c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46000 [ run ] completed with state FAILURE. Commit: 7af0f5c
/LLM/main/L0_MergeRequest_PR pipeline #36150 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47452 [ run ] triggered by Bot. Commit: d0e0dad Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47452 [ run ] completed with state SUCCESS. Commit: d0e0dad
/LLM/main/L0_MergeRequest_PR pipeline #37373 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from d0e0dad to 0c6a75b Compare May 10, 2026 07:13
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47572 [ run ] triggered by Bot. Commit: 0c6a75b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47572 [ run ] completed with state SUCCESS. Commit: 0c6a75b
/LLM/main/L0_MergeRequest_PR pipeline #37483 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47589 [ run ] triggered by Bot. Commit: 0c6a75b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47589 [ run ] completed with state SUCCESS. Commit: 0c6a75b
/LLM/main/L0_MergeRequest_PR pipeline #37499 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 0c6a75b to 16d64d1 Compare May 11, 2026 00:00
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47626 [ run ] triggered by Bot. Commit: 16d64d1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47626 [ run ] completed with state SUCCESS. Commit: 16d64d1
/LLM/main/L0_MergeRequest_PR pipeline #37531 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47668 [ run ] triggered by Bot. Commit: 16d64d1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47668 [ run ] completed with state FAILURE. Commit: 16d64d1
/LLM/main/L0_MergeRequest_PR pipeline #37566 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…e fusion kernels

The allreduce fusion kernel launcher used `SM >= 90` to enable CUDA
cluster launch, which incorrectly included SM120 (workstation Blackwell
/ RTX PRO 6000). SM120 does not support cluster launch, causing
`cudaErrorInvalidArgument` that surfaced as "RMSNorm failed with error
code invalid argument" during Eagle3 inference.

Fix the condition to `SM >= 90 && SM < 120`, matching the existing
pattern in `is_lamport_supported()` which already correctly excludes
SM120.

Signed-off-by: Ziyi Xiong <219238287+ziyixiong-nv@users.noreply.github.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
@ziyixiong-nv ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 16d64d1 to 42e9227 Compare May 11, 2026 08:01
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47694 [ run ] triggered by Bot. Commit: 42e9227 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47694 [ run ] completed with state SUCCESS. Commit: 42e9227
/LLM/main/L0_MergeRequest_PR pipeline #37589 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47911 [ run ] triggered by Bot. Commit: 42e9227 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47911 [ run ] completed with state SUCCESS. Commit: 42e9227
/LLM/main/L0_MergeRequest_PR pipeline #37758 completed with status: 'SUCCESS'

CI Report

Link to invocation

@ziyixiong-nv ziyixiong-nv requested a review from hyukn May 13, 2026 01:12
@ziyixiong-nv ziyixiong-nv merged commit 9d12d1e into NVIDIA:main May 13, 2026
6 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
… GPUs in allReduce fusion (NVIDIA#13169)

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>


3 participants