
[None][fix] Qwen3.5 dense weight loading #13090

Merged
amukkara merged 3 commits into NVIDIA:main from amukkara:qwen3.5-dense
Apr 30, 2026

Conversation

@amukkara
Collaborator

@amukkara amukkara commented Apr 15, 2026

Summary by CodeRabbit

  • New Features

    • Added support for Qwen/Qwen3.5-4B model with improved checkpoint handling
    • Enhanced weight mapping for ModelOpt FP8 checkpoints with automatic key remapping
  • Tests

    • Established accuracy benchmark of 81.0% for Qwen3.5-4B on GSM8K task
    • Added BF16 precision tests with optimized memory and batch configuration

Description

Minor fixes to the weight remap logic to handle BF16 (original HF) and blockwise FP8 (ModelOpt) checkpoints.

Test Coverage

Added new test: accuracy/test_llm_api_pytorch.py::TestQwen3_5_4B::test_bf16

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@amukkara amukkara marked this pull request as ready for review April 15, 2026 23:33
@amukkara amukkara requested review from a team as code owners April 15, 2026 23:33
@amukkara amukkara requested a review from symphonylyh April 15, 2026 23:33
@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

Adds support for Qwen3.5-4B model with preprocessing for ModelOpt FP8 checkpoints, including weight key remapping and tensor dimension squeezing. Includes GSM8K accuracy reference and corresponding test class with integration test entries.

Changes

  • Weight Mapper Preprocessing (tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py): Added detection and preprocessing for ModelOpt FP8 checkpoints, including weight_scale -> weight_scale_inv key remapping and singleton-dimension squeezing. Added a conditional skip of FP8 attention dequantization for ModelOpt checkpoints, and MLP weight key remapping for non-MoE models that inserts .mlp. between the layer prefix and the projection names.
  • Accuracy Benchmarking (tests/integration/defs/accuracy/references/gsm8k.yaml): Added a GSM8K accuracy reference entry for Qwen/Qwen3.5-4B with an 81.0 baseline.
  • Test Class & Integration (tests/integration/defs/accuracy/test_llm_api_pytorch.py, tests/integration/test_lists/qa/llm_function_core.txt, tests/integration/test_lists/test-db/l0_h100.yml): Added a TestQwen3_5_4B test class with a test_bf16 method using KvCacheConfig and CudaGraphConfig, with chat template and GSM8K evaluation. Registered the test in both the QA core and H100 test lists.
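The preprocessing described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual mapper code: the function name, key layout, and regex are assumptions based on the change summary.

```python
import re

# Hypothetical key pattern for dense (non-MoE) MLP projections; the real
# pattern in qwen3_5_weight_mapper.py may differ.
_DENSE_MLP_PATTERN = re.compile(
    r"(model\.layers\.\d+\.)(gate_proj|up_proj|down_proj)")


def preprocess_weights(weights):
    """Sketch: remap ModelOpt FP8 scale keys and insert `.mlp.` for dense checkpoints."""
    out = {}
    for key, tensor in weights.items():
        # ModelOpt exports block scales as `weight_scale`; the loader
        # expects `weight_scale_inv`, with singleton dims squeezed out.
        if key.endswith("weight_scale"):
            key = key[: -len("weight_scale")] + "weight_scale_inv"
            tensor = tensor.squeeze()
        # Dense checkpoints store projections directly under the layer
        # prefix; insert `.mlp.` so keys match the model definition.
        key = _DENSE_MLP_PATTERN.sub(r"\1mlp.\2", key)
        out[key] = tensor
    return out
```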

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 12.50%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title clearly summarizes the main change (a fix for Qwen3.5 dense weight loading), matching the primary purpose of the weight mapper modifications.
  • Description check: ✅ Passed. The description briefly explains the fix (handling BF16 and blockwise FP8 checkpoints) and lists the added test coverage (TestQwen3_5_4B::test_bf16), with the PR checklist marked complete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)

284-286: Avoid recompiling the dense-MLP regex on every call.

Move _DENSE_MLP_PATTERN to a class-level compiled constant to reduce repeated regex compilation overhead in hot paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around
lines 284 - 286, The regex _DENSE_MLP_PATTERN is being compiled inside a hot
path; move its compilation out of the function scope and into a module- or
class-level constant (e.g., define _DENSE_MLP_PATTERN = re.compile(... ) at
top-level or as a class attribute in the weight mapper) and update any
functions/methods that currently recompile it to reference that single
precompiled symbol instead (search for uses of "_DENSE_MLP_PATTERN" in
qwen3_5_weight_mapper and replace the local compilation with the shared
constant).
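The suggested fix is the standard precompile pattern: move the `re.compile` call out of the hot path to module (or class) level. A minimal illustration with an assumed pattern, not the mapper's actual regex:

```python
import re

# Anti-pattern flagged by the review: compiling the regex on every call.
def remap_slow(key: str) -> str:
    pattern = re.compile(r"(layers\.\d+\.)(gate_proj|up_proj|down_proj)")
    return pattern.sub(r"\1mlp.\2", key)


# Suggested fix: compile once at import time and reuse the shared constant.
_DENSE_MLP_PATTERN = re.compile(r"(layers\.\d+\.)(gate_proj|up_proj|down_proj)")

def remap_fast(key: str) -> str:
    return _DENSE_MLP_PATTERN.sub(r"\1mlp.\2", key)
```

Both functions produce identical results; the precompiled version simply avoids repeated compilation overhead when called per weight key.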
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 60-70: The ModelOpt detection in _preprocess_modelopt_ckpt is too
broad and can flip true for unrelated "weight_scale" keys and also overwrites
remapped entries; change the logic to first skip visual/irrelevant tensors (same
filter used elsewhere in this file) before testing for modelopt keys, only set
is_modelopt_ckpt when you actually perform the "weight_scale" ->
"weight_scale_inv" remap for a tensor that matched the ModelOpt pattern, and
avoid clobbering by checking if the remapped key already exists (keep the
existing tensor or prefer the original variant deterministically). Apply the
same guarded logic to the other analogous block later in the file (the similar
code at the other occurrence).
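One way to implement the guarded detection this comment asks for is sketched below. All names and the `visual.` filter are hypothetical stand-ins for the file's actual helpers:

```python
def remap_modelopt_scales(weights):
    """Sketch: flag a ModelOpt checkpoint only when a remap actually happens,
    skip irrelevant (e.g. visual) tensors first, and never clobber a key."""
    is_modelopt_ckpt = False
    out = dict(weights)
    for key in list(out):
        if key.startswith("visual."):   # skip irrelevant tensors first
            continue
        if key.endswith(".weight_scale"):
            new_key = key[: -len("weight_scale")] + "weight_scale_inv"
            if new_key in out:          # keep the existing tensor
                continue
            out[new_key] = out.pop(key)
            is_modelopt_ckpt = True     # set only when a remap occurred
    return out, is_modelopt_ckpt
```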

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 5716-5720: EXTRA_EVALUATOR_KWARGS is a mutable class-level dict
causing RUF012; replace it with a method or property that constructs and returns
a fresh dict each time (e.g., rename to extra_evaluator_kwargs() or an `@property`
extra_evaluator_kwargs) so callers receive a new dict instead of shared state;
update all usages to call the method/property and ensure the returned dict
contains the same keys apply_chat_template, fewshot_as_multiturn, and
chat_template_kwargs with enable_thinking=False.
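The RUF012 fix amounts to replacing the shared mutable class attribute with a property that builds a fresh dict on each access. A sketch under assumed values (only enable_thinking=False is stated in the comment; the other values are illustrative):

```python
class TestQwen3_5_4B:
    # Instead of a mutable class-level EXTRA_EVALUATOR_KWARGS dict shared
    # across all tests, construct a new dict on every access so no caller
    # can mutate shared state.
    @property
    def extra_evaluator_kwargs(self) -> dict:
        return {
            "apply_chat_template": True,        # assumed value
            "fewshot_as_multiturn": True,       # assumed value
            "chat_template_kwargs": {"enable_thinking": False},
        }
```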


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 28725b5f-4065-4ac8-8af4-4dc81eea4e69

📥 Commits

Reviewing files that changed from the base of the PR and between 51f7956 and b0fcba5.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/qa/llm_function_core.txt
  • tests/integration/test_lists/test-db/l0_h100.yml

@amukkara amukkara requested a review from a team as a code owner April 17, 2026 19:46
@amukkara amukkara requested review from moraxu and tijyojwad April 17, 2026 19:46
@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44068 [ run ] triggered by Bot. Commit: a0f6438 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44068 [ run ] completed with state SUCCESS. Commit: a0f6438
/LLM/main/L0_MergeRequest_PR pipeline #34498 completed with status: 'SUCCESS'

CI Report

Link to invocation

Collaborator

@xinhe-nv xinhe-nv left a comment


When my comment is resolved, approve this PR.

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45429 [ run ] triggered by Bot. Commit: b29269d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45429 [ run ] completed with state SUCCESS. Commit: b29269d
/LLM/main/L0_MergeRequest_PR pipeline #35662 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@amukkara amukkara force-pushed the qwen3.5-dense branch 2 times, most recently from 734d967 to 74dd397 Compare April 28, 2026 16:11
@amukkara
Collaborator Author

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@amukkara
Collaborator Author

/bot run --stage-list "DGX_H100-PyTorch-1"

@amukkara amukkara enabled auto-merge (squash) April 28, 2026 16:15
@tensorrt-cicd
Collaborator

PR_Github #45958 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@amukkara
Collaborator Author

/bot kill

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45980 [ kill ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45958 [ run ] completed with state ABORTED. Commit: 74dd397

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45980 [ kill ] completed with state SUCCESS. Commit: 74dd397
Successfully killed previous jobs for commit 74dd397

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45981 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45981 [ run ] completed with state FAILURE. Commit: 74dd397
/LLM/main/L0_MergeRequest_PR pipeline #36132 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46006 [ run ] triggered by Bot. Commit: 74dd397 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46006 [ run ] completed with state FAILURE. Commit: 74dd397
/LLM/main/L0_MergeRequest_PR pipeline #36155 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
@amukkara
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46173 [ run ] triggered by Bot. Commit: f8ace08 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46173 [ run ] completed with state SUCCESS. Commit: f8ace08
/LLM/main/L0_MergeRequest_PR pipeline #36293 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@amukkara
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46237 [ run ] triggered by Bot. Commit: f8ace08 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46237 [ run ] completed with state SUCCESS. Commit: f8ace08
/LLM/main/L0_MergeRequest_PR pipeline #36347 completed with status: 'SUCCESS'

CI Report

Link to invocation

@amukkara amukkara merged commit f74ec9a into NVIDIA:main Apr 30, 2026
6 checks passed
@amukkara amukkara deleted the qwen3.5-dense branch April 30, 2026 19:09
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>

8 participants