[TRTLLM-10004][chore] Enable NCCL symmetric zero-copy by default #14472

Merged
Tabrizian merged 1 commit into NVIDIA:main from nv-lschneider:lschneider/activate-nccl-symm-zero-copy-allreduce
May 29, 2026

Conversation

@nv-lschneider
Collaborator

@nv-lschneider nv-lschneider commented May 22, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • NCCL symmetric zero-copy operations are now enabled by default. Users can disable them by setting the TLLM_NCCL_SYMMETRIC_ZERO_COPY environment variable to 0.

Description

This PR enables NCCL symmetric zero-copy AllReduce by default for the PyTorch distributed path.

Previously, TLLM_NCCL_SYMMETRIC_ZERO_COPY defaulted to disabled unless explicitly set. This change flips the default so the zero-copy path is active by default, while preserving the existing opt-out behavior via:

TLLM_NCCL_SYMMETRIC_ZERO_COPY=0
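
As an illustration of the opt-out, the following is a minimal sketch of launching a workload with the variable set to 0. The helper names `env_with_zero_copy_disabled` and `run_with_zero_copy_disabled` are hypothetical and not part of TensorRT-LLM; only the variable name comes from this PR. The sketch sets the variable in the child's environment before the process starts, on the assumption that a module-level flag is read once at import time.

```python
import os
import subprocess
import sys

ENV_VAR = "TLLM_NCCL_SYMMETRIC_ZERO_COPY"  # name taken from the PR description


def env_with_zero_copy_disabled(base_env=None):
    """Return a copy of the environment with the opt-out value set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = "0"
    return env


def run_with_zero_copy_disabled(cmd):
    """Run cmd in a child process that sees TLLM_NCCL_SYMMETRIC_ZERO_COPY=0."""
    return subprocess.run(
        cmd,
        env=env_with_zero_copy_disabled(),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for a real launch command: the child only echoes the value it sees.
    probe = [sys.executable, "-c", f"import os; print(os.environ[{ENV_VAR!r}])"]
    print(run_with_zero_copy_disabled(probe).stdout.strip())
```

Copying the environment rather than mutating `os.environ` keeps the opt-out scoped to the launched process, which can be convenient when comparing runs with and without the zero-copy path.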

Internal E2E throughput sweeps showed improvement on dense FP8 Llama models:

  • nvidia/Llama-3.3-70B-Instruct-FP8
  • nvidia/Llama-3.1-405B-Instruct-FP8

The same sweeps did not show E2E throughput regression on the other measured models:

  • Qwen/Qwen2.5-72B-Instruct
  • mistralai/Mixtral-8x22B-Instruct-v0.1
  • deepseek-ai/DeepSeek-R1

Test Coverage

CI has been run with this setting before and will be repeated.

PR Checklist

Please review the following before submitting your PR:

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

The NCCL symmetric zero-copy feature flag _NCCL_SYMMETRIC_ZERO_COPY is changed from disabled to enabled by default. When the TLLM_NCCL_SYMMETRIC_ZERO_COPY environment variable is unset, the flag now defaults to "1" instead of "0". The surrounding comment is updated accordingly to document the new default behavior.
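
The default flip described above can be sketched as follows. This is not the code from ops.py, which is not shown in this conversation; the function name `zero_copy_enabled` and the comparison against the string "1" are assumptions made for illustration.

```python
import os

ENV_VAR = "TLLM_NCCL_SYMMETRIC_ZERO_COPY"


def zero_copy_enabled(environ, default="1"):
    """Hypothetical parse: the flag is on only when the value is exactly "1".

    `default` is the value used when the variable is unset. The walkthrough
    says this PR changes it from "0" to "1".
    """
    return environ.get(ENV_VAR, default) == "1"


# Variable unset: behavior before and after the change.
before = zero_copy_enabled({}, default="0")
after = zero_copy_enabled({})

# An explicit setting takes precedence over the default in both cases.
opted_out = zero_copy_enabled({ENV_VAR: "0"})

# Analogue of a module-level flag evaluated once from the real environment.
_FLAG_SKETCH = zero_copy_enabled(os.environ)
```

Under this sketch only the unset case changes behavior; a deployment that already sets the variable explicitly sees no difference.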

Changes

Symmetric Zero Copy Feature Default

| Layer / File(s) | Summary |
| --- | --- |
| Feature flag default switch: tensorrt_llm/_torch/distributed/ops.py | The _NCCL_SYMMETRIC_ZERO_COPY module-level flag defaults to enabled ("1") when the environment variable is unset, with the comment updated to reflect that the feature is now on by default and can be disabled by setting the env var to 0. |

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly describes the main change: enabling NCCL symmetric zero-copy by default, which aligns perfectly with the code modification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description adequately explains the change (enabling NCCL symmetric zero-copy by default), provides rationale (E2E throughput improvements), documents the opt-out mechanism, and covers testing and checklist items. |

@nv-lschneider nv-lschneider requested review from Tabrizian and hyukn May 22, 2026 21:57
@Tabrizian
Member

/bot run --disable-fail-fast

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
@nv-lschneider nv-lschneider force-pushed the lschneider/activate-nccl-symm-zero-copy-allreduce branch from 6095b5e to 9bba6d7 Compare May 22, 2026 22:55
@nv-lschneider
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #49998 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #49999 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #49999 [ run ] completed with state SUCCESS. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39564 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@nv-lschneider
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #50343 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50343 [ run ] completed with state FAILURE. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39870 completed with status: 'FAILURE'

@nv-lschneider
Collaborator Author

/bot run --add-multi-gpu --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #50397 [ run ] triggered by Bot. Commit: 9bba6d7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50397 [ run ] completed with state SUCCESS. Commit: 9bba6d7
/LLM/main/L0_MergeRequest_PR pipeline #39921 completed with status: 'SUCCESS'

CI Report

Link to invocation

@Tabrizian Tabrizian merged commit 3d56a4e into NVIDIA:main May 29, 2026
7 checks passed