
[TRTLLM-11768][fix] Config updates to enable NVFP4#12776

Merged
2ez4bz merged 1 commit into NVIDIA:main from 2ez4bz:dev-nano-v3-fp4
Apr 7, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Apr 6, 2026

Summary by CodeRabbit

Bug Fixes

  • Quantization configuration handling for the Nemotron Nano VL V2 model now normalizes quantization settings during initialization so that module name patterns align with the inner language model's namespace. This lets the configuration be applied correctly throughout the model hierarchy and improves stability during quantization.

Description

  • Why?

The Nemotron Nano VL model checkpoints for NVFP4 could not be loaded
into TRT-LLM.

  • What?

Makes the necessary config parsing changes to fix this.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz requested a review from a team as a code owner April 6, 2026 18:32
@2ez4bz 2ez4bz requested a review from Wanli-Jiang April 6, 2026 18:32
@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Added quantization configuration normalization to NemotronH_Nano_VL_V2 initialization. A new static method removes the "language_model." prefix from quantization module name patterns to align with the inner LLM module namespace before model construction.

Changes

Cohort / File(s) Summary
Quantization Configuration Normalization
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Added _update_config_for_quantization static method that normalizes quantization settings by removing "language_model." prefix from exclude_modules entries and quant_config_dict keys. Integrated into __init__ before LLM model construction with temporary config unfreezing to allow mutations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ❓ Inconclusive — The description explains the problem (NVFP4 checkpoints couldn't load) and the solution (config parsing changes), but the Test Coverage section is empty with only a comment placeholder. Resolution: provide specific test cases that validate NVFP4 checkpoint loading works correctly after the config changes.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The PR title clearly describes the main change: enabling NVFP4 support through config updates. It follows the required format with JIRA ticket and type indicator.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Around lines 1487-1492: the code unconditionally sets `llm_model_config._frozen = True` after remapping `quant_config_dict`, which can change caller-visible state and leave the config in the wrong frozen state if an exception occurs. Fix: save the original value (`orig = llm_model_config._frozen`), set `llm_model_config._frozen = False` before modifying `llm_model_config.quant_config_dict`, perform the remap in a `try` block, and restore `llm_model_config._frozen = orig` in a `finally` block so the original frozen state is preserved even on exceptions (referencing `llm_model_config`, `_frozen`, `quant_config_dict`, and `_LM_PREFIX`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 49b54381-a07b-4e0b-b61e-e95a519c7cc0

📥 Commits

Reviewing files that changed from the base of the PR and between d0c8c5b and 5a3ca0b.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py

@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fp4 branch from 5a3ca0b to 991aa38 Compare April 6, 2026 18:39
* Why?

The Nemotron Nano VL model checkpoints for NVFP4 could not be loaded
into TRT-LLM.

* What?

Makes the necessary config parsing changes to fix this.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fp4 branch from 991aa38 to ab7890f Compare April 6, 2026 18:44
@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

2 similar comments
@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41978 [ run ] triggered by Bot. Commit: ab7890f

@tensorrt-cicd
Collaborator

PR_Github #41978 [ run ] completed with state SUCCESS. Commit: ab7890f
/LLM/main/L0_MergeRequest_PR pipeline #32830 completed with status: 'SUCCESS'


@2ez4bz 2ez4bz merged commit ba5c79c into NVIDIA:main Apr 7, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 8, 2026
