
[TRTLLM-11768][fix] Config updates to enable NVFP4#12776

Merged
2ez4bz merged 1 commit into NVIDIA:main from 2ez4bz:dev-nano-v3-fp4
Apr 7, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Apr 6, 2026

Summary by CodeRabbit

Bug Fixes

  • Quantization configuration handling for the Nemotron Nano VL V2 model now normalizes quantization settings during initialization so that module name patterns align with the inner language model's namespace. This lets the configuration be applied correctly throughout the model hierarchy and improves stability during quantization.

Description

  • Why?

The Nemotron Nano VL model checkpoints for NVFP4 could not be loaded
into TRT-LLM.

  • What?

Makes the necessary config parsing changes to fix this.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz requested a review from a team as a code owner April 6, 2026 18:32
@2ez4bz 2ez4bz requested a review from Wanli-Jiang April 6, 2026 18:32
@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Added quantization configuration normalization to NemotronH_Nano_VL_V2 initialization. A new static method removes the "language_model." prefix from quantization module name patterns to align with the inner LLM module namespace before model construction.

Changes

Cohort / File(s) Summary
Quantization Configuration Normalization
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Added _update_config_for_quantization static method that normalizes quantization settings by removing "language_model." prefix from exclude_modules entries and quant_config_dict keys. Integrated into __init__ before LLM model construction with temporary config unfreezing to allow mutations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ❓ Inconclusive — The description explains the problem (NVFP4 checkpoints couldn't load) and the solution (config parsing changes), but the Test Coverage section is empty with only a comment placeholder. Resolution: provide specific test cases that validate NVFP4 checkpoint loading works correctly after the config changes.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The PR title clearly describes the main change: enabling NVFP4 support through config updates. It follows the required format with JIRA ticket and type indicator.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Around lines 1487-1492: the code unconditionally sets `llm_model_config._frozen = True` after remapping `quant_config_dict`, which can change caller-visible state and leave the config in the wrong frozen state if an exception occurs. Fix: save the original value (`orig = llm_model_config._frozen`), set `llm_model_config._frozen = False` before modifying `llm_model_config.quant_config_dict`, perform the remap in a `try` block, and restore `llm_model_config._frozen = orig` in a `finally` block so the original frozen state is preserved even on exceptions (referencing `llm_model_config`, `_frozen`, `quant_config_dict`, and `_LM_PREFIX`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 49b54381-a07b-4e0b-b61e-e95a519c7cc0

📥 Commits

Reviewing files that changed from the base of the PR and between d0c8c5b and 5a3ca0b.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py

@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fp4 branch from 5a3ca0b to 991aa38 Compare April 6, 2026 18:39
* Why?

The Nemotron Nano VL model checkpoints for NVFP4 could not be loaded
into TRT-LLM.

* What?

Makes the necessary config parsing changes to fix this.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fp4 branch from 991aa38 to ab7890f Compare April 6, 2026 18:44
@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

2 similar comments
@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41978 [ run ] triggered by Bot. Commit: ab7890f

@tensorrt-cicd
Collaborator

PR_Github #41978 [ run ] completed with state SUCCESS. Commit: ab7890f
/LLM/main/L0_MergeRequest_PR pipeline #32830 completed with status: 'SUCCESS'


@2ez4bz 2ez4bz merged commit ba5c79c into NVIDIA:main Apr 7, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 8, 2026
