Skip to content

[None][fix] Fix bugs related with nemotron-nas model#13968

Merged
Wanli-Jiang merged 3 commits into
NVIDIA:mainfrom
Wanli-Jiang:user/williamj/fix-nemotraon-nas
May 18, 2026
Merged

[None][fix] Fix bugs related with nemotron-nas model#13968
Wanli-Jiang merged 3 commits into
NVIDIA:mainfrom
Wanli-Jiang:user/williamj/fix-nemotraon-nas

Conversation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator

@Wanli-Jiang Wanli-Jiang commented May 11, 2026

Bugs to be resolved:

Summary by CodeRabbit

  • New Features

    • Added support for Nemotron NAS language model architecture with proper weight mapping.
  • Bug Fixes

    • Fixed KV cache transfer computation to correctly handle empty memory pools.
  • Tests

    • Re-enabled Nemotron NAS integration and LoRA tests for improved test coverage.

Review Change Stack

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@Wanli-Jiang Wanli-Jiang requested review from a team as code owners May 11, 2026 05:39
@Wanli-Jiang Wanli-Jiang requested a review from syuoni May 11, 2026 05:39
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

📝 Walkthrough

Walkthrough

This PR introduces a new Hugging Face weight mapper for the Nemotron NAS model (DeciLMForCausalLM) that handles per-layer KV-head configurations, exports it from the checkpoints module, enables previously-waived tests, and includes a defensive fix for KV cache transfer buffer handling with zero-sized buffers.

Changes

Nemotron NAS Weight Mapper & Test Enablement

Layer / File(s) Summary
Weight Mapper Export
tensorrt_llm/_torch/models/checkpoints/__init__.py
NemotronNasHfWeightMapper is imported from the hf module and added to the public __all__ export list.
Mapper Implementation
tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py
NemotronNasHfWeightMapper class overrides apply_callbacks to handle per-layer num_key_value_heads by inferring the layer index from module names. Helper methods _layer_idx_from_breakdown and _duplicate_kv_weights_for_layer extract the layer index and duplicate k_proj/v_proj weights (including weight_scale when NVFP4 quantization is active) using per-layer KV-head counts.
Test Waiver Removal & Parameter Adjustment
tests/integration/test_lists/waives.txt, tests/unittest/llmapi/test_llm_pytorch.py
Test waiver entries for Nemotron NAS tests are removed, and test_nemotron_nas_lora is configured to generate fewer tokens (max_tokens reduced from 10 to 3).

KV Cache Transfer Manager Empty Buffer Handling

Layer / File(s) Summary
Empty Buffer Guard
cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
The guard clause in computeBlockTransferBytes is expanded to skip pools when pool.primaryPtr is null OR has zero size, preventing transfer byte-count calculations for empty buffers.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description is largely incomplete, with only NVBugs tickets listed and all required sections left blank or as template placeholders. Complete the Description section explaining the issue and solution, list relevant test cases under Test Coverage, and ensure all PR checklist items are properly addressed and explained.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title mentions 'nemotron-nas model' and 'fix bugs', which directly aligns with the primary changes: adding NemotronNasHfWeightMapper, fixing KV cache transfer logic, and resolving related test issues.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unittest/llmapi/test_llm_pytorch.py (1)

620-620: LGTM — tightening max_tokens here is a reasonable stabilization for test_nemotron_nas_lora.

QA note: this is unittest-only scope (tests/unittest/...), so no tests/integration/test_lists/qa/* update is needed in this PR.

As per coding guidelines tests/**: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/llmapi/test_llm_pytorch.py` at line 620, This change tightens
SamplingParams(max_tokens=3, add_special_tokens=False) in the unit test
test_nemotron_nas_lora and is limited to tests/** unit scope; update the PR
description (or PR body) to explicitly state that the change only touches
unittest scope (tests/unittest/...), that QA list updates under
tests/integration/... are unnecessary, and therefore no QA list changes are
required for this PR so reviewers and QA know it's out-of-scope for integration
QA updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/__init__.py`:
- Line 10: Update this modified file to include the required NVIDIA source-file
header at the top of the file: add or update the standard copyright block (with
the current year and NVIDIA ownership) before any imports/definitions so the
file (which imports NemotronNasHfWeightMapper) conforms to the project's header
policy for Python files; ensure the header format matches other files in the
repo (same wording and year) and place it above the existing "from
.hf.nemotron_nas_weight_mapper import NemotronNasHfWeightMapper" line.

In `@tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py`:
- Line 1: Add the repository-required NVIDIA copyright header to the top of the
new Python module nemotron_nas_weight_mapper.py (module:
tensorrt_llm._torch.models.checkpoints.hf.nemotron_nas_weight_mapper); insert
the standard NVIDIA header block used across the repo (with the appropriate
year) as the very first lines of the file before any imports (e.g., before "from
torch import nn") so the file complies with the coding guidelines for new .py
files.

---

Nitpick comments:
In `@tests/unittest/llmapi/test_llm_pytorch.py`:
- Line 620: This change tightens SamplingParams(max_tokens=3,
add_special_tokens=False) in the unit test test_nemotron_nas_lora and is limited
to tests/** unit scope; update the PR description (or PR body) to explicitly
state that the change only touches unittest scope (tests/unittest/...), that QA
list updates under tests/integration/... are unnecessary, and therefore no QA
list changes are required for this PR so reviewers and QA know it's out-of-scope
for integration QA updates.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec64ed26-ce60-4e44-9be0-f35606a37157

📥 Commits

Reviewing files that changed from the base of the PR and between ee1fb68 and 6b641f1.

📒 Files selected for processing (5)
  • cpp/tensorrt_llm/batch_manager/kvCacheTransferManager.cpp
  • tensorrt_llm/_torch/models/checkpoints/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/hf/nemotron_nas_weight_mapper.py
  • tests/integration/test_lists/waives.txt
  • tests/unittest/llmapi/test_llm_pytorch.py
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Comment thread tensorrt_llm/_torch/models/checkpoints/__init__.py
@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47669 [ run ] triggered by Bot. Commit: 6b641f1 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47669 [ run ] completed with state SUCCESS. Commit: 6b641f1
/LLM/main/L0_MergeRequest_PR pipeline #37569 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@Wanli-Jiang Wanli-Jiang force-pushed the user/williamj/fix-nemotraon-nas branch from 6b641f1 to 67e0a3a Compare May 12, 2026 01:59
@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47825 [ run ] triggered by Bot. Commit: 67e0a3a Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47825 [ run ] completed with state SUCCESS. Commit: 67e0a3a
/LLM/main/L0_MergeRequest_PR pipeline #37710 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

* https://nvbugs/6070857
* https://nvbugs/6140399
* https://nvbugs/5648522
* https://nvbugs/6163032

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang Wanli-Jiang force-pushed the user/williamj/fix-nemotraon-nas branch from 67e0a3a to 1b44911 Compare May 13, 2026 04:47
@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@Wanli-Jiang Wanli-Jiang requested a review from lowsfer May 13, 2026 04:50
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48101 [ run ] triggered by Bot. Commit: 1b44911 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48101 [ run ] completed with state FAILURE. Commit: 1b44911
/LLM/main/L0_MergeRequest_PR pipeline #37931 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48343 [ run ] triggered by Bot. Commit: be7b388 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48343 [ run ] completed with state SUCCESS. Commit: be7b388
/LLM/main/L0_MergeRequest_PR pipeline #38151 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48524 [ run ] triggered by Bot. Commit: be7b388 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48524 [ run ] completed with state FAILURE. Commit: be7b388
/LLM/main/L0_MergeRequest_PR pipeline #38317 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48668 [ run ] triggered by Bot. Commit: f00123d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48668 [ run ] completed with state SUCCESS. Commit: f00123d
/LLM/main/L0_MergeRequest_PR pipeline #38449 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48686 [ run ] triggered by Bot. Commit: f00123d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48686 [ run ] completed with state ABORTED. Commit: f00123d

Link to invocation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48815 [ run ] triggered by Bot. Commit: f00123d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48815 [ run ] completed with state SUCCESS. Commit: f00123d
/LLM/main/L0_MergeRequest_PR pipeline #38577 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@Wanli-Jiang
Copy link
Copy Markdown
Collaborator Author

/bot skip --comment "all related tests were passed"

@Wanli-Jiang Wanli-Jiang enabled auto-merge (squash) May 18, 2026 05:54
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48842 [ skip ] triggered by Bot. Commit: f00123d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #48842 [ skip ] completed with state SUCCESS. Commit: f00123d
Skipping testing for commit f00123d

Link to invocation

@Wanli-Jiang Wanli-Jiang merged commit 7095183 into NVIDIA:main May 18, 2026
11 checks passed
KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 19, 2026
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants