[TRTLLM-10004][chore] Enable NCCL symmetric zero-copy by default by nv-lschneider · Pull Request #14472 · NVIDIA/TensorRT-LLM

nv-lschneider · 2026-05-22T21:53:39Z

Summary by CodeRabbit

Release Notes

Bug Fixes
- Updated default behavior for NCCL symmetric zero-copy operations to be enabled by default. Users can disable this behavior by setting an environment variable to 0 if needed.

Description

This PR enables NCCL symmetric zero-copy AllReduce by default for the PyTorch distributed path.

Previously, TLLM_NCCL_SYMMETRIC_ZERO_COPY defaulted to disabled unless explicitly set. This change flips the default so the zero-copy path is active by default, while preserving the existing opt-out behavior via:

TLLM_NCCL_SYMMETRIC_ZERO_COPY=0
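
As a rough illustration of the default flip described above, the sketch below reads the variable with a fallback of "1", so an unset variable enables the path and an explicit 0 disables it. The helper name `read_zero_copy_flag` and the exact comparison (`== "1"`) are assumptions for illustration; the actual expression in `tensorrt_llm/_torch/distributed/ops.py` may differ.

```python
import os


def read_zero_copy_flag(environ=None):
    """Hypothetical helper: return True when the zero-copy path should be active.

    Assumption: the flag is treated as enabled only when the value is "1",
    and the fallback for an unset variable is "1" (the new default).
    """
    env = os.environ if environ is None else environ
    return env.get("TLLM_NCCL_SYMMETRIC_ZERO_COPY", "1") == "1"


# Unset variable: enabled by default after this change.
enabled_by_default = read_zero_copy_flag({})

# Explicit opt-out, as documented in the PR description.
disabled_by_opt_out = read_zero_copy_flag({"TLLM_NCCL_SYMMETRIC_ZERO_COPY": "0"})
```

Passing a plain dict instead of `os.environ` keeps the sketch testable without touching the real process environment.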

Internal E2E throughput sweeps showed improvement on dense FP8 Llama models:

nvidia/Llama-3.3-70B-Instruct-FP8
nvidia/Llama-3.1-405B-Instruct-FP8

The same sweeps did not show E2E throughput regression on the other measured models:

Qwen/Qwen2.5-72B-Instruct
mistralai/Mixtral-8x22B-Instruct-v0.1
deepseek-ai/DeepSeek-R1

Test Coverage

CI has been run with this setting before and will be repeated.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES (https://github.com/NVIDIA/TensorRT-LLM/blob/main/CODING_GUIDELINES.md) to the best of your knowledge.
Test cases are provided for new code paths (see test instructions (https://github.com/NVIDIA/TensorRT-LLM/tree/main/tests#1-how-does-the-ci-work))
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS (https://github.com/NVIDIA/TensorRT-LLM/blob/main/.github/CODEOWNERS) updated if ownership changes
Documentation updated as needed
Update tava architecture diagram (https://github.com/NVIDIA/TensorRT-LLM/blob/main/.github/tava_architecture_diagram.md) if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-05-22T21:56:29Z

📝 Walkthrough

The NCCL symmetric zero-copy feature flag _NCCL_SYMMETRIC_ZERO_COPY is changed from disabled to enabled by default. When the TLLM_NCCL_SYMMETRIC_ZERO_COPY environment variable is unset, the flag now defaults to "1" instead of "0". The surrounding comment is updated accordingly to document the new default behavior.

Changes

Symmetric Zero Copy Feature Default

| Layer / File(s) | Summary |
| --- | --- |
| Feature flag default switch `tensorrt_llm/_torch/distributed/ops.py` | The `_NCCL_SYMMETRIC_ZERO_COPY` module-level flag defaults to enabled ("1") when the environment variable is unset, with the comment updated to reflect that the feature is now on by default and can be disabled by setting the env var to `0`. |
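
Because the summary describes `_NCCL_SYMMETRIC_ZERO_COPY` as a module-level flag, it is presumably evaluated once when the module is imported, so an opt-out would need to be in the environment before that import. The sketch below demonstrates this with a stand-in module built from a string; `MODULE_SOURCE`, `load_stand_in_module`, and the flag expression are hypothetical and are not taken from the TensorRT-LLM source.

```python
import os
import types

# Hypothetical stand-in for a module that reads the flag at import time.
MODULE_SOURCE = '''
import os
_NCCL_SYMMETRIC_ZERO_COPY = (
    os.environ.get("TLLM_NCCL_SYMMETRIC_ZERO_COPY", "1") == "1"
)
'''


def load_stand_in_module():
    """Execute MODULE_SOURCE in a fresh module object, mimicking an import."""
    module = types.ModuleType("stand_in_ops")
    exec(MODULE_SOURCE, module.__dict__)
    return module


_saved = os.environ.get("TLLM_NCCL_SYMMETRIC_ZERO_COPY")
try:
    # Opt out BEFORE the module is loaded: the flag is read as disabled.
    os.environ["TLLM_NCCL_SYMMETRIC_ZERO_COPY"] = "0"
    ops = load_stand_in_module()

    # Changing the variable afterwards does not affect the already-loaded flag.
    os.environ["TLLM_NCCL_SYMMETRIC_ZERO_COPY"] = "1"
    flag_after_late_change = ops._NCCL_SYMMETRIC_ZERO_COPY
finally:
    # Restore the original environment so the sketch has no side effects.
    if _saved is None:
        os.environ.pop("TLLM_NCCL_SYMMETRIC_ZERO_COPY", None)
    else:
        os.environ["TLLM_NCCL_SYMMETRIC_ZERO_COPY"] = _saved
```

In practice this suggests exporting the variable in the launching shell or setting it at the very top of the entry script, before anything imports the distributed ops module.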

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly describes the main change: enabling NCCL symmetric zero-copy by default, which aligns perfectly with the code modification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description adequately explains the change (enabling NCCL symmetric zero-copy by default), provides rationale (E2E throughput improvements), documents the opt-out mechanism, and covers testing and checklist items. |


Tabrizian · 2026-05-22T22:54:54Z

/bot run --disable-fail-fast

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

nv-lschneider · 2026-05-22T22:55:50Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-05-22T23:00:28Z

PR_Github #49998 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

tensorrt-cicd · 2026-05-22T23:01:54Z

PR_Github #49999 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

tensorrt-cicd · 2026-05-23T06:40:05Z

PR_Github #49999 [ run ] completed with state SUCCESS. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39564 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

nv-lschneider · 2026-05-26T13:13:43Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-05-26T13:20:02Z

PR_Github #50343 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

tensorrt-cicd · 2026-05-26T17:25:53Z

PR_Github #50343 [ run ] completed with state FAILURE. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39870 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

nv-lschneider · 2026-05-26T22:31:52Z

/bot run --add-multi-gpu --disable-fail-fast

tensorrt-cicd · 2026-05-26T22:37:58Z

PR_Github #50397 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

tensorrt-cicd · 2026-05-26T23:37:04Z

PR_Github #50397 [ run ] completed with state SUCCESS. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39921 completed with status: 'SUCCESS'

CI Report

Link to invocation

nv-lschneider requested a review from a team as a code owner May 22, 2026 21:53

nv-lschneider requested review from HuiGao-NV and suyoggupta May 22, 2026 21:53

github-actions Bot assigned nv-lschneider May 22, 2026

nv-lschneider requested review from Tabrizian and hyukn May 22, 2026 21:57

Tabrizian approved these changes May 22, 2026

View reviewed changes

Enable NCCL symmetric zero-copy by default

9bba6d7

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

nv-lschneider force-pushed the lschneider/activate-nccl-symm-zero-copy-allreduce branch from 6095b5e to 9bba6d7 Compare May 22, 2026 22:55

hyukn approved these changes May 29, 2026

View reviewed changes

Tabrizian merged commit 3d56a4e into NVIDIA:main May 29, 2026
7 checks passed
