[None][fix] Fix bugs related with nemotron-nas model by Wanli-Jiang · Pull Request #13968 · NVIDIA/TensorRT-LLM

Wanli-Jiang · 2026-05-11T05:39:46Z

Bugs to be resolved:

Summary by CodeRabbit

New Features
- Added support for Nemotron NAS language model architecture with proper weight mapping.
Bug Fixes
- Fixed KV cache transfer computation to correctly handle empty memory pools.
Tests
- Re-enabled Nemotron NAS integration and LoRA tests for improved test coverage.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-05-11T05:43:28Z

📝 Walkthrough

Walkthrough

This PR introduces a new Hugging Face weight mapper for the Nemotron NAS model (DeciLMForCausalLM) that handles per-layer KV-head configurations, exports it from the checkpoints module, enables previously-waived tests, and includes a defensive fix for KV cache transfer buffer handling with zero-sized buffers.

Changes

Nemotron NAS Weight Mapper & Test Enablement

Layer / File(s)	Summary
Weight Mapper Export `tensorrt_llm/_torch/models/checkpoints/__init__.py`	`NemotronNasHfWeightMapper` is imported from the `hf` module and added to the public `__all__` export list.
Mapper Implementation `tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py`	`NemotronNasHfWeightMapper` class overrides `apply_callbacks` to handle per-layer `num_key_value_heads` by inferring the layer index from module names. Helper methods `_layer_idx_from_breakdown` and `_duplicate_kv_weights_for_layer` extract the layer index and duplicate `k_proj`/`v_proj` weights (including `weight_scale` when NVFP4 quantization is active) using per-layer KV-head counts.
Test Waiver Removal & Parameter Adjustment `tests/integration/test_lists/waives.txt`, `tests/unittest/llmapi/test_llm_pytorch.py`	Test waiver entries for Nemotron NAS tests are removed, and `test_nemotron_nas_lora` is configured to generate fewer tokens (`max_tokens` reduced from 10 to 3).

KV Cache Transfer Manager Empty Buffer Handling

Layer / File(s)	Summary
Empty Buffer Guard `cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp`	The guard clause in `computeBlockTransferBytes` is expanded to skip pools when `pool.primaryPtr` is null OR has zero size, preventing transfer byte-count calculations for empty buffers.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The pull request description is largely incomplete, with only NVBugs tickets listed and all required sections left blank or as template placeholders.	Complete the Description section explaining the issue and solution, list relevant test cases under Test Coverage, and ensure all PR checklist items are properly addressed and explained.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title mentions 'nemotron-nas model' and 'fix bugs', which directly aligns with the primary changes: adding NemotronNasHfWeightMapper, fixing KV cache transfer logic, and resolving related test issues.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/unittest/llmapi/test_llm_pytorch.py (1)
620-620: LGTM — tightening max_tokens here is a reasonable stabilization for test_nemotron_nas_lora.

QA note: this is unittest-only scope (tests/unittest/...), so no tests/integration/test_lists/qa/* update is needed in this PR.

As per coding guidelines tests/**: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/llmapi/test_llm_pytorch.py` at line 620, This change tightens
SamplingParams(max_tokens=3, add_special_tokens=False) in the unit test
test_nemotron_nas_lora and is limited to tests/** unit scope; update the PR
description (or PR body) to explicitly state that the change only touches
unittest scope (tests/unittest/...), that QA list updates under
tests/integration/... are unnecessary, and therefore no QA list changes are
required for this PR so reviewers and QA know it's out-of-scope for integration
QA updates.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/__init__.py`:
- Line 10: Update this modified file to include the required NVIDIA source-file
header at the top of the file: add or update the standard copyright block (with
the current year and NVIDIA ownership) before any imports/definitions so the
file (which imports NemotronNasHfWeightMapper) conforms to the project's header
policy for Python files; ensure the header format matches other files in the
repo (same wording and year) and place it above the existing "from
.hf.nemotron_nas_weight_mapper import NemotronNasHfWeightMapper" line.

In `@tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py`:
- Line 1: Add the repository-required NVIDIA copyright header to the top of the
new Python module nemotron_nas_weight_mapper.py (module:
tensorrt_llm._torch.models.checkpoints.hf.nemotron_nas_weight_mapper); insert
the standard NVIDIA header block used across the repo (with the appropriate
year) as the very first lines of the file before any imports (e.g., before "from
torch import nn") so the file complies with the coding guidelines for new .py
files.

---

Nitpick comments:
In `@tests/unittest/llmapi/test_llm_pytorch.py`:
- Line 620: This change tightens SamplingParams(max_tokens=3,
add_special_tokens=False) in the unit test test_nemotron_nas_lora and is limited
to tests/** unit scope; update the PR description (or PR body) to explicitly
state that the change only touches unittest scope (tests/unittest/...), that QA
list updates under tests/integration/... are unnecessary, and therefore no QA
list changes are required for this PR so reviewers and QA know it's out-of-scope
for integration QA updates.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec64ed26-ce60-4e44-9be0-f35606a37157

📥 Commits

Reviewing files that changed from the base of the PR and between ee1fb68 and 6b641f1.

📒 Files selected for processing (5)

cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
tensorrt_llm/_torch/models/checkpoints/__init__.py
tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py
tests/integration/test_lists/waives.txt
tests/unittest/llmapi/test_llm_pytorch.py

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

Wanli-Jiang · 2026-05-11T05:46:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-11T05:51:36Z

PR_Github #47669 [ run ] triggered by Bot. Commit: 6b641f1 Link to invocation

tensorrt-cicd · 2026-05-11T16:26:48Z

PR_Github #47669 [ run ] completed with state SUCCESS. Commit: 6b641f1
/LLM/main/L0_MergeRequest_PR pipeline #37569 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Wanli-Jiang · 2026-05-11T22:03:37Z

/bot run --disable-fail-fast

Wanli-Jiang · 2026-05-12T01:59:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-12T02:04:58Z

PR_Github #47825 [ run ] triggered by Bot. Commit: 67e0a3a Link to invocation

tensorrt-cicd · 2026-05-12T11:22:45Z

PR_Github #47825 [ run ] completed with state SUCCESS. Commit: 67e0a3a
/LLM/main/L0_MergeRequest_PR pipeline #37710 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

* https://nvbugs/6070857 * https://nvbugs/6140399 * https://nvbugs/5648522 * https://nvbugs/6163032 Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang · 2026-05-13T04:47:57Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-13T04:53:50Z

PR_Github #48101 [ run ] triggered by Bot. Commit: 1b44911 Link to invocation

tensorrt-cicd · 2026-05-13T15:07:03Z

PR_Github #48101 [ run ] completed with state FAILURE. Commit: 1b44911
/LLM/main/L0_MergeRequest_PR pipeline #37931 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang · 2026-05-14T08:58:15Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-14T09:03:56Z

PR_Github #48343 [ run ] triggered by Bot. Commit: be7b388 Link to invocation

tensorrt-cicd · 2026-05-14T21:27:28Z

PR_Github #48343 [ run ] completed with state SUCCESS. Commit: be7b388
/LLM/main/L0_MergeRequest_PR pipeline #38151 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Wanli-Jiang · 2026-05-15T04:45:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T04:51:40Z

PR_Github #48524 [ run ] triggered by Bot. Commit: be7b388 Link to invocation

tensorrt-cicd · 2026-05-15T23:28:38Z

PR_Github #48524 [ run ] completed with state FAILURE. Commit: be7b388
/LLM/main/L0_MergeRequest_PR pipeline #38317 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang · 2026-05-16T03:32:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-16T03:38:11Z

PR_Github #48668 [ run ] triggered by Bot. Commit: f00123d Link to invocation

tensorrt-cicd · 2026-05-16T10:46:46Z

PR_Github #48668 [ run ] completed with state SUCCESS. Commit: f00123d
/LLM/main/L0_MergeRequest_PR pipeline #38449 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Wanli-Jiang · 2026-05-16T13:00:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-16T13:06:04Z

PR_Github #48686 [ run ] triggered by Bot. Commit: f00123d Link to invocation

tensorrt-cicd · 2026-05-16T16:21:06Z

PR_Github #48686 [ run ] completed with state ABORTED. Commit: f00123d

Link to invocation

Wanli-Jiang · 2026-05-18T02:58:16Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T03:04:45Z

PR_Github #48815 [ run ] triggered by Bot. Commit: f00123d Link to invocation

tensorrt-cicd · 2026-05-18T05:46:17Z

PR_Github #48815 [ run ] completed with state SUCCESS. Commit: f00123d
/LLM/main/L0_MergeRequest_PR pipeline #38577 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Wanli-Jiang · 2026-05-18T05:50:06Z

/bot skip --comment "all related tests were passed"

tensorrt-cicd · 2026-05-18T06:01:49Z

PR_Github #48842 [ skip ] triggered by Bot. Commit: f00123d Link to invocation

tensorrt-cicd · 2026-05-18T06:13:13Z

PR_Github #48842 [ skip ] completed with state SUCCESS. Commit: f00123d
Skipping testing for commit f00123d

Link to invocation

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang requested review from a team as code owners May 11, 2026 05:39

Wanli-Jiang requested a review from syuoni May 11, 2026 05:39

github-actions Bot assigned Wanli-Jiang May 11, 2026

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/models/checkpoints/__init__.py

Comment thread tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py

Wanli-Jiang mentioned this pull request May 11, 2026

[https://nvbugs/6140399][fix] Added NemotronNASHfWeightMapper registered for DeciLMForCausalLM that overri #13851

Closed

2 tasks

syuoni approved these changes May 11, 2026

View reviewed changes

Wanli-Jiang force-pushed the user/williamj/fix-nemotraon-nas branch from 6b641f1 to 67e0a3a Compare May 12, 2026 01:59

[None][fix] Fix bugs related with nemotron-nas model

1b44911

* https://nvbugs/6070857 * https://nvbugs/6140399 * https://nvbugs/5648522 * https://nvbugs/6163032 Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang force-pushed the user/williamj/fix-nemotraon-nas branch from 67e0a3a to 1b44911 Compare May 13, 2026 04:47

Wanli-Jiang requested a review from lowsfer May 13, 2026 04:50

lowsfer approved these changes May 13, 2026

View reviewed changes

Merge branch 'main' into user/williamj/fix-nemotraon-nas

be7b388

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Merge branch 'main' into user/williamj/fix-nemotraon-nas

f00123d

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang enabled auto-merge (squash) May 18, 2026 05:54

Wanli-Jiang merged commit 7095183 into NVIDIA:main May 18, 2026
11 checks passed

KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 19, 2026

[None][fix] Fix bugs related with nemotron-nas model (NVIDIA#13968)

70df15f

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Conversation

Wanli-Jiang commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bugs to be resolved:

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Wanli-Jiang commented May 11, 2026

Uh oh!

tensorrt-cicd commented May 11, 2026

Uh oh!

tensorrt-cicd commented May 11, 2026

Uh oh!

Wanli-Jiang commented May 11, 2026

Uh oh!

Wanli-Jiang commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

Wanli-Jiang commented May 13, 2026

Uh oh!

tensorrt-cicd commented May 13, 2026

Uh oh!

tensorrt-cicd commented May 13, 2026

Uh oh!

Wanli-Jiang commented May 14, 2026

Uh oh!

tensorrt-cicd commented May 14, 2026

Uh oh!

tensorrt-cicd commented May 14, 2026

Uh oh!

Wanli-Jiang commented May 15, 2026

Uh oh!

tensorrt-cicd commented May 15, 2026

Uh oh!

tensorrt-cicd commented May 15, 2026

Uh oh!

Wanli-Jiang commented May 16, 2026

Uh oh!

tensorrt-cicd commented May 16, 2026

Uh oh!

tensorrt-cicd commented May 16, 2026

Uh oh!

Wanli-Jiang commented May 16, 2026

Uh oh!

tensorrt-cicd commented May 16, 2026

Uh oh!

tensorrt-cicd commented May 16, 2026

Uh oh!

Wanli-Jiang commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

Wanli-Jiang commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Wanli-Jiang commented May 11, 2026 •

edited

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading