[None][fix] Qwen3.5 dense weight loading #13090
Conversation
📝 Walkthrough
Adds support for the Qwen3.5-4B model with preprocessing for ModelOpt FP8 checkpoints, including weight key remapping and tensor dimension squeezing. Includes a GSM8K accuracy reference and a corresponding test class with integration test entries.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)
284-286: Avoid recompiling the dense-MLP regex on every call. Move `_DENSE_MLP_PATTERN` to a class-level compiled constant to reduce repeated regex compilation overhead in hot paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around lines 284 - 286, The regex _DENSE_MLP_PATTERN is being compiled inside a hot path; move its compilation out of the function scope and into a module- or class-level constant (e.g., define _DENSE_MLP_PATTERN = re.compile(... ) at top-level or as a class attribute in the weight mapper) and update any functions/methods that currently recompile it to reference that single precompiled symbol instead (search for uses of "_DENSE_MLP_PATTERN" in qwen3_5_weight_mapper and replace the local compilation with the shared constant).
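A minimal sketch of the suggested fix, compiling the pattern once at class-definition time; the class name, method, and the regex itself are illustrative, not the actual mapper code:

```python
import re


class Qwen3_5WeightMapper:
    # Compiled once when the class is defined, instead of on every call.
    # The pattern here is a placeholder; the real dense-MLP pattern lives
    # in qwen3_5_weight_mapper.py.
    _DENSE_MLP_PATTERN = re.compile(r"model\.layers\.(\d+)\.mlp\.(gate|up|down)_proj")

    def is_dense_mlp_key(self, key: str) -> bool:
        # Hot path reuses the shared precompiled pattern.
        return self._DENSE_MLP_PATTERN.match(key) is not None
```

Module-level `re.compile` at import time would work equally well; the point is simply that the compilation happens once, not per lookup.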
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 60-70: The ModelOpt detection in _preprocess_modelopt_ckpt is too
broad and can flip true for unrelated "weight_scale" keys and also overwrites
remapped entries; change the logic to first skip visual/irrelevant tensors (same
filter used elsewhere in this file) before testing for modelopt keys, only set
is_modelopt_ckpt when you actually perform the "weight_scale" ->
"weight_scale_inv" remap for a tensor that matched the ModelOpt pattern, and
avoid clobbering by checking if the remapped key already exists (keep the
existing tensor or prefer the original variant deterministically). Apply the
same guarded logic to the other analogous block later in the file (the similar
code at the other occurrence).
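One way to read the suggarded logic above, as a standalone sketch: the function name, the `visual.` prefix filter, and plain-dict weights stand in for the file's actual `_preprocess_modelopt_ckpt` logic and torch tensors.

```python
def preprocess_modelopt_ckpt(weights: dict) -> tuple[dict, bool]:
    """Remap ModelOpt 'weight_scale' keys to 'weight_scale_inv' with guards.

    Sketch only; the real preprocessing in qwen3_5_weight_mapper.py may
    filter and detect differently.
    """
    out = {}
    is_modelopt_ckpt = False
    for key, tensor in weights.items():
        # 1) Skip visual/irrelevant tensors before any ModelOpt detection,
        #    so unrelated "weight_scale" keys cannot flip the flag.
        if key.startswith("visual."):
            out[key] = tensor
            continue
        # 2) Only set the flag when an actual remap is performed.
        if key.endswith(".weight_scale"):
            new_key = key[: -len("weight_scale")] + "weight_scale_inv"
            # 3) Never clobber an existing entry; keep the first one seen
            #    so the result is deterministic.
            if new_key not in out:
                out[new_key] = tensor
                is_modelopt_ckpt = True
            continue
        out[key] = tensor
    return out, is_modelopt_ckpt
```

The same guard structure would apply verbatim at the second occurrence the comment mentions.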
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 5716-5720: EXTRA_EVALUATOR_KWARGS is a mutable class-level dict
causing RUF012; replace it with a method or property that constructs and returns
a fresh dict each time (e.g., rename to extra_evaluator_kwargs() or an `@property`
extra_evaluator_kwargs) so callers receive a new dict instead of shared state;
update all usages to call the method/property and ensure the returned dict
contains the same keys apply_chat_template, fewshot_as_multiturn, and
chat_template_kwargs with enable_thinking=False.
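The RUF012 fix could look roughly like this; the boolean values for the first two keys are assumptions (only `enable_thinking=False` is stated), and the class is stripped down to the property in question:

```python
class TestQwen3_5_4B:
    # A property that builds a fresh dict per access avoids the shared
    # mutable class-level state that Ruff flags as RUF012.
    @property
    def extra_evaluator_kwargs(self) -> dict:
        return {
            "apply_chat_template": True,  # assumed value
            "fewshot_as_multiturn": True,  # assumed value
            "chat_template_kwargs": {"enable_thinking": False},
        }
```

Each caller now receives its own dict, so one test mutating the kwargs can no longer leak into another.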
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 28725b5f-4065-4ac8-8af4-4dc81eea4e69
📒 Files selected for processing (5)
- tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
- tests/integration/defs/accuracy/references/gsm8k.yaml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/integration/test_lists/qa/llm_function_core.txt
- tests/integration/test_lists/test-db/l0_h100.yml
|
/bot run |
|
PR_Github #44068 [ run ] triggered by Bot. Commit: |
|
PR_Github #44068 [ run ] completed with state |
xinhe-nv left a comment:
When my comment is resolved, approve this PR.
|
/bot run |
|
PR_Github #45429 [ run ] triggered by Bot. Commit: |
|
PR_Github #45429 [ run ] completed with state
|
734d967 to 74dd397
|
/bot help |
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with the pull request.

skip
Skip testing for the latest commit on the pull request.

reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
|
/bot run --stage-list "DGX_H100-PyTorch-1" |
|
PR_Github #45958 [ run ] triggered by Bot. Commit: |
|
/bot kill |
|
/bot run |
|
PR_Github #45980 [ kill ] triggered by Bot. Commit: |
|
PR_Github #45958 [ run ] completed with state |
|
PR_Github #45980 [ kill ] completed with state |
|
PR_Github #45981 [ run ] triggered by Bot. Commit: |
|
PR_Github #45981 [ run ] completed with state
|
|
/bot run |
|
PR_Github #46006 [ run ] triggered by Bot. Commit: |
|
PR_Github #46006 [ run ] completed with state
|
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
|
/bot run |
|
PR_Github #46173 [ run ] triggered by Bot. Commit: |
|
PR_Github #46173 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #46237 [ run ] triggered by Bot. Commit: |
|
PR_Github #46237 [ run ] completed with state |
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Description
Minor fixes to the weight remap logic to handle BF16 (original HF) and blockwise FP8 (ModelOpt) checkpoints.
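For illustration, the blockwise-FP8 side of this (the scale-key rename plus the tensor dimension squeezing mentioned in the walkthrough) could be sketched as follows, using NumPy arrays in place of torch tensors; the function name and the 2-D target shape are assumptions:

```python
import numpy as np


def normalize_fp8_scale(key: str, scale: np.ndarray) -> tuple[str, np.ndarray]:
    """Rename a ModelOpt 'weight_scale' entry and drop stray singleton dims.

    Illustrative sketch; the actual remap in this PR lives in
    qwen3_5_weight_mapper.py and operates on torch tensors.
    """
    if key.endswith(".weight_scale"):
        key = key[: -len("weight_scale")] + "weight_scale_inv"
    # Blockwise FP8 scales are assumed 2-D [out_blocks, in_blocks]; squeeze
    # away any singleton dims the exporter may have added.
    while scale.ndim > 2 and 1 in scale.shape:
        scale = np.squeeze(scale, axis=scale.shape.index(1))
    return key, scale
```

The BF16 (original HF) path needs no scale handling at all, which is why the remap has to detect the checkpoint flavor before touching keys.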
Test Coverage
Added new test:
accuracy/test_llm_api_pytorch.py::TestQwen3_5_4B::test_bf16

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.