
[None][fix] Qwen3.5 dense weight loading #13090

Merged
amukkara merged 3 commits into NVIDIA:main from amukkara:qwen3.5-dense
Apr 30, 2026

Conversation

@amukkara
Collaborator

@amukkara amukkara commented Apr 15, 2026

Summary by CodeRabbit

  • New Features

    • Added support for Qwen/Qwen3.5-4B model with improved checkpoint handling
    • Enhanced weight mapping for ModelOpt FP8 checkpoints with automatic key remapping
  • Tests

    • Established accuracy benchmark of 81.0% for Qwen3.5-4B on GSM8K task
    • Added BF16 precision tests with optimized memory and batch configuration

Description

Minor fixes to the weight remap logic to handle BF16 (original HF) and blockwise FP8 (ModelOpt) checkpoints.

Test Coverage

Added new test: accuracy/test_llm_api_pytorch.py::TestQwen3_5_4B::test_bf16

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@amukkara amukkara marked this pull request as ready for review April 15, 2026 23:33
@amukkara amukkara requested review from a team as code owners April 15, 2026 23:33
@amukkara amukkara requested a review from symphonylyh April 15, 2026 23:33
@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

Adds support for Qwen3.5-4B model with preprocessing for ModelOpt FP8 checkpoints, including weight key remapping and tensor dimension squeezing. Includes GSM8K accuracy reference and corresponding test class with integration test entries.

Changes

  • Weight Mapper Preprocessing (tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py): Added detection and preprocessing for ModelOpt FP8 checkpoints, including weight_scale -> weight_scale_inv key remapping and singleton-dimension squeezing. Added a conditional skip of FP8 attention dequantization for ModelOpt checkpoints, and MLP weight key remapping for non-MoE models that inserts .mlp. between the layer prefix and the projection names.
  • Accuracy Benchmarking (tests/integration/defs/accuracy/references/gsm8k.yaml): Added a GSM8K accuracy reference entry for Qwen/Qwen3.5-4B with an 81.0 baseline.
  • Test Class & Integration (tests/integration/defs/accuracy/test_llm_api_pytorch.py, tests/integration/test_lists/qa/llm_function_core.txt, tests/integration/test_lists/test-db/l0_h100.yml): Added a TestQwen3_5_4B test class with a test_bf16 method using KvCacheConfig and CudaGraphConfig, with chat template and GSM8K evaluation. Registered the test in both the QA core and H100 test lists.
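The preprocessing described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual mapper code: the function name, key layout, and regex are assumptions based on the change summary.

```python
import re

# Hypothetical key pattern for dense (non-MoE) MLP projections; the real
# pattern in qwen3_5_weight_mapper.py may differ.
_DENSE_MLP_PATTERN = re.compile(
    r"(model\.layers\.\d+\.)(gate_proj|up_proj|down_proj)")


def preprocess_weights(weights):
    """Sketch: remap ModelOpt FP8 scale keys and insert `.mlp.` for dense checkpoints."""
    out = {}
    for key, tensor in weights.items():
        # ModelOpt exports block scales as `weight_scale`; the loader
        # expects `weight_scale_inv`, with singleton dims squeezed out.
        if key.endswith("weight_scale"):
            key = key[: -len("weight_scale")] + "weight_scale_inv"
            tensor = tensor.squeeze()
        # Dense checkpoints store projections directly under the layer
        # prefix; insert `.mlp.` so keys match the model definition.
        key = _DENSE_MLP_PATTERN.sub(r"\1mlp.\2", key)
        out[key] = tensor
    return out
```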

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 12.50%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title clearly summarizes the main change (a fix for Qwen3.5 dense weight loading), matching the primary purpose of the weight mapper modifications.
  • Description check: ✅ Passed. The description briefly explains the fix (handling BF16 and blockwise FP8 checkpoints) and lists the added test coverage (TestQwen3_5_4B::test_bf16), with the PR checklist marked complete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)

284-286: Avoid recompiling the dense-MLP regex on every call.

Move _DENSE_MLP_PATTERN to a class-level compiled constant to reduce repeated regex compilation overhead in hot paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around
lines 284 - 286, The regex _DENSE_MLP_PATTERN is being compiled inside a hot
path; move its compilation out of the function scope and into a module- or
class-level constant (e.g., define _DENSE_MLP_PATTERN = re.compile(... ) at
top-level or as a class attribute in the weight mapper) and update any
functions/methods that currently recompile it to reference that single
precompiled symbol instead (search for uses of "_DENSE_MLP_PATTERN" in
qwen3_5_weight_mapper and replace the local compilation with the shared
constant).
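The suggested fix is the standard precompile pattern: move the `re.compile` call out of the hot path to module (or class) level. A minimal illustration with an assumed pattern, not the mapper's actual regex:

```python
import re

# Anti-pattern flagged by the review: compiling the regex on every call.
def remap_slow(key: str) -> str:
    pattern = re.compile(r"(layers\.\d+\.)(gate_proj|up_proj|down_proj)")
    return pattern.sub(r"\1mlp.\2", key)


# Suggested fix: compile once at import time and reuse the shared constant.
_DENSE_MLP_PATTERN = re.compile(r"(layers\.\d+\.)(gate_proj|up_proj|down_proj)")

def remap_fast(key: str) -> str:
    return _DENSE_MLP_PATTERN.sub(r"\1mlp.\2", key)
```

Both functions produce identical results; the precompiled version simply avoids repeated compilation overhead when called per weight key.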
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 60-70: The ModelOpt detection in _preprocess_modelopt_ckpt is too
broad and can flip true for unrelated "weight_scale" keys and also overwrites
remapped entries; change the logic to first skip visual/irrelevant tensors (same
filter used elsewhere in this file) before testing for modelopt keys, only set
is_modelopt_ckpt when you actually perform the "weight_scale" ->
"weight_scale_inv" remap for a tensor that matched the ModelOpt pattern, and
avoid clobbering by checking if the remapped key already exists (keep the
existing tensor or prefer the original variant deterministically). Apply the
same guarded logic to the other analogous block later in the file (the similar
code at the other occurrence).
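One way to implement the guarded detection this comment asks for is sketched below. All names and the `visual.` filter are hypothetical stand-ins for the file's actual helpers:

```python
def remap_modelopt_scales(weights):
    """Sketch: flag a ModelOpt checkpoint only when a remap actually happens,
    skip irrelevant (e.g. visual) tensors first, and never clobber a key."""
    is_modelopt_ckpt = False
    out = dict(weights)
    for key in list(out):
        if key.startswith("visual."):   # skip irrelevant tensors first
            continue
        if key.endswith(".weight_scale"):
            new_key = key[: -len("weight_scale")] + "weight_scale_inv"
            if new_key in out:          # keep the existing tensor
                continue
            out[new_key] = out.pop(key)
            is_modelopt_ckpt = True     # set only when a remap occurred
    return out, is_modelopt_ckpt
```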

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 5716-5720: EXTRA_EVALUATOR_KWARGS is a mutable class-level dict
causing RUF012; replace it with a method or property that constructs and returns
a fresh dict each time (e.g., rename to extra_evaluator_kwargs() or an `@property`
extra_evaluator_kwargs) so callers receive a new dict instead of shared state;
update all usages to call the method/property and ensure the returned dict
contains the same keys apply_chat_template, fewshot_as_multiturn, and
chat_template_kwargs with enable_thinking=False.
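The RUF012 fix amounts to replacing the shared mutable class attribute with a property that builds a fresh dict on each access. A sketch under assumed values (only enable_thinking=False is stated in the comment; the other values are illustrative):

```python
class TestQwen3_5_4B:
    # Instead of a mutable class-level EXTRA_EVALUATOR_KWARGS dict shared
    # across all tests, construct a new dict on every access so no caller
    # can mutate shared state.
    @property
    def extra_evaluator_kwargs(self) -> dict:
        return {
            "apply_chat_template": True,        # assumed value
            "fewshot_as_multiturn": True,       # assumed value
            "chat_template_kwargs": {"enable_thinking": False},
        }
```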


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 28725b5f-4065-4ac8-8af4-4dc81eea4e69

📥 Commits

Reviewing files that changed from the base of the PR and between 51f7956 and b0fcba5.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/qa/llm_function_core.txt
  • tests/integration/test_lists/test-db/l0_h100.yml

@amukkara amukkara requested a review from a team as a code owner April 17, 2026 19:46
@amukkara amukkara requested review from moraxu and tijyojwad April 17, 2026 19:46
@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44068 [ run ] triggered by Bot. Commit: a0f6438 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44068 [ run ] completed with state SUCCESS. Commit: a0f6438
/LLM/main/L0_MergeRequest_PR pipeline #34498 completed with status: 'SUCCESS'

CI Report

Link to invocation

Collaborator

@xinhe-nv xinhe-nv left a comment


When my comment is resolved, approve this PR.

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45429 [ run ] triggered by Bot. Commit: b29269d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45429 [ run ] completed with state SUCCESS. Commit: b29269d
/LLM/main/L0_MergeRequest_PR pipeline #35662 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@amukkara amukkara force-pushed the qwen3.5-dense branch 2 times, most recently from 734d967 to 74dd397 Compare April 28, 2026 16:11
@amukkara
Collaborator Author

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@amukkara
Collaborator Author

/bot run --stage-list "DGX_H100-PyTorch-1"

@amukkara amukkara enabled auto-merge (squash) April 28, 2026 16:15
@tensorrt-cicd
Collaborator

PR_Github #45958 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@amukkara
Collaborator Author

/bot kill

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45980 [ kill ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45958 [ run ] completed with state ABORTED. Commit: 74dd397

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45980 [ kill ] completed with state SUCCESS. Commit: 74dd397
Successfully killed previous jobs for commit 74dd397

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45981 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45981 [ run ] completed with state FAILURE. Commit: 74dd397
/LLM/main/L0_MergeRequest_PR pipeline #36132 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46006 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46006 [ run ] completed with state FAILURE. Commit: 74dd397
/LLM/main/L0_MergeRequest_PR pipeline #36155 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46173 [ run ] triggered by Bot. Commit: f8ace08 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46173 [ run ] completed with state SUCCESS. Commit: f8ace08
/LLM/main/L0_MergeRequest_PR pipeline #36293 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@amukkara
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46237 [ run ] triggered by Bot. Commit: f8ace08 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46237 [ run ] completed with state SUCCESS. Commit: f8ace08
/LLM/main/L0_MergeRequest_PR pipeline #36347 completed with status: 'SUCCESS'

CI Report

Link to invocation

@amukkara amukkara merged commit f74ec9a into NVIDIA:main Apr 30, 2026
6 checks passed
@amukkara amukkara deleted the qwen3.5-dense branch April 30, 2026 19:09
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>

8 participants