
[https://nvbugs/6074014][fix] Min-reduce available host memory to ensure that all ranks agree about whether prefetch is enabled#13161

Merged
achartier merged 1 commit into NVIDIA:main from dhansen-nvidia:enable_prefetch_fix
Apr 21, 2026

Conversation

@dhansen-nvidia
Collaborator

@dhansen-nvidia dhansen-nvidia commented Apr 17, 2026

Summary by CodeRabbit

  • Bug Fixes
  • Fixed inconsistent prefetch decisions during weight loading, where local ranks could independently reach different prefetching choices. Available host memory is now synchronized across all local ranks, so every rank makes the same prefetch decision and weight loading behaves reliably in multi-device configurations.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if the PR contains a significant design change.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ure that all ranks agree about whether prefetch is enabled

Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
@dhansen-nvidia dhansen-nvidia requested a review from a team as a code owner April 17, 2026 19:20
@dhansen-nvidia dhansen-nvidia requested a review from brb-nv April 17, 2026 19:20
@dhansen-nvidia
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d1edf0ae-e5bb-4e40-8ecb-05176dd12ffd

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and 299b34b.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py

📝 Walkthrough

Added MPI-aware memory detection to weight loader prefetch logic. Introduces a static method that computes available host memory with collective synchronization across local ranks when multi-device mode is enabled, replacing direct per-rank memory queries for consistent prefetch decisions.

Changes

Cohort / File(s): Memory-aware prefetch logic — tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
Summary: Added a _get_local_available_host_memory() static method that performs an MPI allreduce (minimum) on available host memory when ENABLE_MULTI_DEVICE is enabled. Updated load_weights() to compare the total prefetch size against 90% of the synchronized value instead of per-rank local memory, ensuring consistent prefetch decisions across local ranks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Description check — ⚠️ Warning: The PR description is the template with uncompleted sections; the 'Description', 'Test Coverage', and required checklist items are not filled in — only the template placeholder comments remain. Resolution: complete the Description section explaining the issue and solution, add Test Coverage details listing relevant tests, and provide substantive checklist answers beyond the template.
Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Title check — ✅ Passed: The title is clear and specific, accurately describing the main change: min-reducing available host memory to ensure consistent prefetch decisions across ranks.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@tensorrt-cicd
Collaborator

PR_Github #44065 [ run ] triggered by Bot. Commit: 299b34b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44065 [ run ] completed with state SUCCESS. Commit: 299b34b
/LLM/main/L0_MergeRequest_PR pipeline #34496 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@dhansen-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44131 [ run ] triggered by Bot. Commit: 299b34b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44131 [ run ] completed with state SUCCESS. Commit: 299b34b
/LLM/main/L0_MergeRequest_PR pipeline #34557 completed with status: 'SUCCESS'

CI Report

Link to invocation

Collaborator

@brb-nv brb-nv left a comment


LGTM.

@achartier achartier merged commit 96bb8b7 into NVIDIA:main Apr 21, 2026
9 of 10 checks passed
