[https://nvbugs/5945047][fix] Fix cluster launch enablement for SM120 GPUs in allReduce fusion by ziyixiong-nv · Pull Request #13169 · NVIDIA/TensorRT-LLM

ziyixiong-nv · 2026-04-17T22:56:46Z

Summary

Fix for NVBugs 5945047: [TensorRT-LLM][L0][Post-Merge][main] Test failed: test_eagle3_4gpus[v2_kv_cache-cutlass-one_model-overlap_scheduler]
Root cause: The allReduce fusion kernel unconditionally enabled cluster launch for all GPUs with SM >= 90. However, workstation Blackwell GPUs (SM120/SM121) do not have cluster launch hardware support, causing a CUDA illegal instruction error when the kernel attempted to use cluster launch on these architectures.
Fix: Added an architecture range check (SM >= 90 && SM < 120) to gate cluster launch enablement, ensuring it is only used on Hopper (SM90) and datacenter Blackwell (SM100/SM103) GPUs that actually support the feature. The same condition is applied to both the cluster_size selection and the cudaLaunchAttribute configuration.
Automated fix generated by repair-bot
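
The gate described in the fix can be sketched on the host side as follows. This is a minimal illustration, not the code in `allReduceFusionKernels.cu`: the helper names `supports_cluster_launch` and `pick_cluster_size` are hypothetical, the compute capability is assumed to be encoded as major * 10 + minor (so SM120 is 120), and falling back to a cluster size of 1 on unsupported architectures is an assumption.

```cpp
// Host-only sketch of the architecture gate; no CUDA headers are required.
// `sm` is assumed to be major * 10 + minor, e.g. 90, 100, 103, 120, 121.

// Hypothetical helper: true only for the range the PR enables (SM >= 90 && SM < 120).
constexpr bool supports_cluster_launch(int sm)
{
    return sm >= 90 && sm < 120;
}

// Hypothetical helper: use the preferred cluster size only where the gate allows it.
// The fallback value of 1 (no clustering) is an assumption for illustration.
constexpr int pick_cluster_size(int sm, int preferred)
{
    return supports_cluster_launch(sm) ? preferred : 1;
}

// Compile-time checks of the ranges named in the PR description.
static_assert(supports_cluster_launch(90), "Hopper is inside the gate");
static_assert(supports_cluster_launch(103), "SM103 is inside the gate");
static_assert(!supports_cluster_launch(120), "SM120 is outside the gate");
static_assert(!supports_cluster_launch(121), "SM121 is outside the gate");
```

Keeping the predicate in one place means the `cluster_size` selection and the launch-attribute setup cannot drift apart, which is the property the fix relies on.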

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/5945047

coderabbitai · 2026-04-17T23:01:30Z

📝 Walkthrough

The change refines CUDA cluster launch configuration logic in the all-reduce fusion kernel launcher. The SM architecture check is narrowed from SM >= 90 to SM >= 90 && SM < 120, causing cluster-related launch attributes to be disabled on workstation Blackwell (SM120/SM121) architectures while preserving support for earlier architectures.

Changes

Cohort / File(s)	Summary
CUDA Cluster Launch Configuration `cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu`	Refined SM architecture check to narrow cluster launch support from `SM >= 90` to `SM >= 90 && SM < 120`. Updated `cluster_size` and `cfg.numAttrs` assignments to depend on the new `supports_cluster` condition instead of the broader SM check, disabling cluster attributes on unsupported architectures.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the specific bug fix (NVBugs 5945047) and the primary change (fixing cluster launch enablement for SM120 GPUs in allReduce fusion).
Description check	✅ Passed	The PR description covers all key sections: summary with root cause analysis, detailed fix explanation, test plan with verified checkmarks, and relevant bug link. All required template sections are addressed.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu (1)
693-700: ⚠️ Potential issue | 🔴 Critical

Fix critical launch attribute configuration on SM120/SM121: programmatic stream serialization must not be disabled when kernels use cudaGridDependencySynchronize() or cudaTriggerProgrammaticLaunchCompletion().

At line 700, setting cfg.numAttrs = 0 when supports_cluster is false disables cudaLaunchAttributeProgrammaticStreamSerialization set at lines 693–694. However, the kernels in this file call cudaGridDependencySynchronize() (lines 460, 539) and cudaTriggerProgrammaticLaunchCompletion() (lines 463, 521, 590), which require this attribute to be present in the launch configuration. On SM120/SM121 (where cluster launch is unsupported but programmatic launch is available), cfg.numAttrs should be 1 (programmatic serialization only) rather than 0.
Suggested fix
-    cfg.numAttrs = supports_cluster ? 2 : 0;
+    bool const supports_programmatic_launch = (SM >= 90);
+    cfg.numAttrs = supports_programmatic_launch ? (supports_cluster ? 2 : 1) : 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu`
around lines 693 - 700, The launch-attribute setup currently zeroes cfg.numAttrs
when supports_cluster is false, which inadvertently drops the required
cudaLaunchAttributeProgrammaticStreamSerialization used by kernels that call
cudaGridDependencySynchronize() and cudaTriggerProgrammaticLaunchCompletion();
change the logic so cfg.numAttrs is set to 2 when supports_cluster is true and
to 1 otherwise, while still only populating the cluster attribute
(cudaLaunchAttributeClusterDimension) when supports_cluster is true so the
programmatic serialization attribute
(cudaLaunchAttributeProgrammaticStreamSerialization) is always passed for
SM120/SM121; update the block that assigns attribute[], cfg.attrs and
cfg.numAttrs accordingly (look for variables named attribute, cfg,
supports_cluster and the attribute ids
cudaLaunchAttributeProgrammaticStreamSerialization /
cudaLaunchAttributeClusterDimension).

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and 8146a29.

📒 Files selected for processing (1)

cpp/tensorrt_llm/kernels/communicationKernels/allReduceFusionKernels.cu

ziyixiong-nv · 2026-04-20T02:18:55Z

/bot run

tensorrt-cicd · 2026-04-20T02:26:30Z

PR_Github #44254 [ run ] triggered by Bot. Commit: 8146a29 Link to invocation

tensorrt-cicd · 2026-04-20T03:33:07Z

PR_Github #44254 [ run ] completed with state FAILURE. Commit: 8146a29
/LLM/main/L0_MergeRequest_PR pipeline #34674 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

ziyixiong-nv · 2026-04-20T05:57:00Z

/bot run

tensorrt-cicd · 2026-04-20T06:02:39Z

PR_Github #44338 [ run ] triggered by Bot. Commit: 8146a29 Link to invocation

tensorrt-cicd · 2026-04-20T06:11:08Z

PR_Github #44338 [ run ] completed with state FAILURE. Commit: 8146a29
/LLM/main/L0_MergeRequest_PR pipeline #34755 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-22T06:11:13Z

/bot run

tensorrt-cicd · 2026-04-22T06:17:11Z

PR_Github #44910 [ run ] triggered by Bot. Commit: 9f905f4 Link to invocation

tensorrt-cicd · 2026-04-22T14:28:29Z

PR_Github #44910 [ run ] completed with state SUCCESS. Commit: 9f905f4
/LLM/main/L0_MergeRequest_PR pipeline #35242 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-28T01:26:59Z

/bot run

tensorrt-cicd · 2026-04-28T01:34:15Z

PR_Github #45814 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

tensorrt-cicd · 2026-04-28T05:04:20Z

PR_Github #45814 [ run ] completed with state SUCCESS. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36001 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-28T05:11:19Z

/bot run

tensorrt-cicd · 2026-04-28T05:18:31Z

PR_Github #45859 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

tensorrt-cicd · 2026-04-28T09:21:11Z

PR_Github #45859 [ run ] completed with state SUCCESS. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36037 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-28T09:41:32Z

/bot run

tensorrt-cicd · 2026-04-28T09:48:40Z

PR_Github #45917 [ run ] triggered by Bot. Commit: 7fcbe29 Link to invocation

tensorrt-cicd · 2026-04-28T14:03:45Z

PR_Github #45917 [ run ] completed with state FAILURE. Commit: 7fcbe29
/LLM/main/L0_MergeRequest_PR pipeline #36079 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-29T00:23:08Z

/bot run

tensorrt-cicd · 2026-04-29T00:29:50Z

PR_Github #46000 [ run ] triggered by Bot. Commit: 7af0f5c Link to invocation

tensorrt-cicd · 2026-04-29T01:55:05Z

PR_Github #46000 [ run ] completed with state FAILURE. Commit: 7af0f5c
/LLM/main/L0_MergeRequest_PR pipeline #36150 completed with status: 'FAILURE'

ziyixiong-nv · 2026-04-29T02:18:12Z

/bot run

ziyixiong-nv · 2026-05-09T00:15:17Z

/bot run

tensorrt-cicd · 2026-05-09T00:20:34Z

PR_Github #47452 [ run ] triggered by Bot. Commit: d0e0dad Link to invocation

tensorrt-cicd · 2026-05-09T01:37:32Z

PR_Github #47452 [ run ] completed with state SUCCESS. Commit: d0e0dad
/LLM/main/L0_MergeRequest_PR pipeline #37373 completed with status: 'FAILURE'

ziyixiong-nv · 2026-05-10T07:13:58Z

/bot run

tensorrt-cicd · 2026-05-10T07:20:02Z

PR_Github #47572 [ run ] triggered by Bot. Commit: 0c6a75b Link to invocation

tensorrt-cicd · 2026-05-10T09:08:55Z

PR_Github #47572 [ run ] completed with state SUCCESS. Commit: 0c6a75b
/LLM/main/L0_MergeRequest_PR pipeline #37483 completed with status: 'FAILURE'

ziyixiong-nv · 2026-05-10T11:28:26Z

/bot run

tensorrt-cicd · 2026-05-10T11:34:41Z

PR_Github #47589 [ run ] triggered by Bot. Commit: 0c6a75b Link to invocation

tensorrt-cicd · 2026-05-10T16:13:40Z

PR_Github #47589 [ run ] completed with state SUCCESS. Commit: 0c6a75b
/LLM/main/L0_MergeRequest_PR pipeline #37499 completed with status: 'FAILURE'

ziyixiong-nv · 2026-05-11T00:01:07Z

/bot run

tensorrt-cicd · 2026-05-11T00:07:36Z

PR_Github #47626 [ run ] triggered by Bot. Commit: 16d64d1 Link to invocation

tensorrt-cicd · 2026-05-11T05:04:40Z

PR_Github #47626 [ run ] completed with state SUCCESS. Commit: 16d64d1
/LLM/main/L0_MergeRequest_PR pipeline #37531 completed with status: 'FAILURE'

ziyixiong-nv · 2026-05-11T05:21:36Z

/bot run

tensorrt-cicd · 2026-05-11T05:27:01Z

PR_Github #47668 [ run ] triggered by Bot. Commit: 16d64d1 Link to invocation

tensorrt-cicd · 2026-05-11T07:43:00Z

PR_Github #47668 [ run ] completed with state FAILURE. Commit: 16d64d1
/LLM/main/L0_MergeRequest_PR pipeline #37566 completed with status: 'FAILURE'

…e fusion kernels

The allreduce fusion kernel launcher used `SM >= 90` to enable CUDA cluster launch, which incorrectly included SM120 (workstation Blackwell / RTX PRO 6000). SM120 does not support cluster launch, causing `cudaErrorInvalidArgument` that surfaced as "RMSNorm failed with error code invalid argument" during Eagle3 inference.

Fix the condition to `SM >= 90 && SM < 120`, matching the existing pattern in `is_lamport_supported()` which already correctly excludes SM120.

Signed-off-by: Ziyi Xiong <219238287+ziyixiong-nv@users.noreply.github.com>

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

ziyixiong-nv · 2026-05-11T08:02:14Z

/bot run

tensorrt-cicd · 2026-05-11T08:08:13Z

PR_Github #47694 [ run ] triggered by Bot. Commit: 42e9227 Link to invocation

tensorrt-cicd · 2026-05-11T13:11:18Z

PR_Github #47694 [ run ] completed with state SUCCESS. Commit: 42e9227
/LLM/main/L0_MergeRequest_PR pipeline #37589 completed with status: 'FAILURE'

ziyixiong-nv · 2026-05-12T06:32:19Z

/bot run

tensorrt-cicd · 2026-05-12T06:37:56Z

PR_Github #47911 [ run ] triggered by Bot. Commit: 42e9227 Link to invocation

tensorrt-cicd · 2026-05-12T15:18:02Z

PR_Github #47911 [ run ] completed with state SUCCESS. Commit: 42e9227
/LLM/main/L0_MergeRequest_PR pipeline #37758 completed with status: 'SUCCESS'

… GPUs in allReduce fusion (NVIDIA#13169) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

github-actions Bot assigned ziyixiong-nv Apr 17, 2026

coderabbitai Bot reviewed Apr 17, 2026

ziyixiong-nv changed the title ~~[https://nvbugs/5945047][fix] [TensorRT-LLM][L0][Post-Merge][main] Test failed:~~ [https://nvbugs/5945047][fix] Fix cluster launch on SM120 in allReduce fusion kernels Apr 18, 2026

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 8146a29 to 9f905f4 Compare April 22, 2026 06:11

ziyixiong-nv changed the title ~~[https://nvbugs/5945047][fix] Fix cluster launch on SM120 in allReduce fusion kernels~~ [https://nvbugs/5945047][fix] Fix cluster launch enablement for SM120 GPUs in allReduce fusion Apr 22, 2026

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 9f905f4 to 7fcbe29 Compare April 28, 2026 01:26

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 7fcbe29 to 7af0f5c Compare April 29, 2026 00:22

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from d0e0dad to 0c6a75b Compare May 10, 2026 07:13

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 0c6a75b to 16d64d1 Compare May 11, 2026 00:00

ziyixiong-nv added 2 commits May 11, 2026 16:01

Remove the waiver

42e9227

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

ziyixiong-nv force-pushed the repair-bot-bug5945047 branch from 16d64d1 to 42e9227 Compare May 11, 2026 08:01

ziyixiong-nv requested a review from hyukn May 13, 2026 01:12

hyukn approved these changes May 13, 2026

ziyixiong-nv merged commit 9d12d1e into NVIDIA:main May 13, 2026
6 checks passed
