[None][fix] Use bf16 for LTX-2 FP4 stage 2 #13244
Conversation
47d49d4 to b95bce5 (force-push)
/bot run
📝 Walkthrough
This change extends LoRA delta application to handle FP4 quantization: FP4 (NVFP4) Linear modules are detected, their original quant_method and parameter data are saved, the weights are swapped to BF16 for stage 2 execution, and the original quantized state is restored afterward.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as LoRA Application
    participant Detector as Module Detection
    participant Storage as Parameter Storage
    participant Stage2 as Stage 2 Execution
    participant Restorer as State Restoration
    Client->>Detector: Build module-path mapping
    Detector->>Detector: Scan for NVFP4LinearMethod
    Detector->>Storage: Detect FP4 Linear modules
    Storage->>Storage: Save original quant_method
    Storage->>Storage: Replace with BF16 tensor data
    Storage->>Storage: Swap quant_method to UnquantizedLinearMethod
    Storage-->>Stage2: Execute with BF16 weights
    Note over Stage2: Stage 2 processes with BF16<br/>(FP4 quantization disabled)
    Stage2-->>Restorer: Denoising complete
    Restorer->>Storage: Restore original quant_method
    Restorer->>Storage: Restore original parameter data
    Storage-->>Client: State fully recovered
```
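The save/swap/restore lifecycle in the diagram can be sketched roughly as follows. This is a simplified illustration: the class names, module layout, and `run_stage2_in_bf16` helper are stand-ins inferred from the diagram, not the actual TensorRT-LLM implementation.

```python
class NVFP4LinearMethod:
    """Stand-in for the FP4 quantized linear method."""

class UnquantizedLinearMethod:
    """Stand-in for the plain BF16 (unquantized) linear method."""

class Linear:
    def __init__(self, quant_method):
        self.quant_method = quant_method

def run_stage2_in_bf16(modules, stage2_fn):
    """Temporarily disable FP4 on all FP4 Linear modules, run stage 2
    with BF16 weights, then restore the original quant methods."""
    saved_state = {}
    # 1. Detect FP4 modules and swap their quant method to unquantized (BF16).
    for path, mod in modules.items():
        if isinstance(mod.quant_method, NVFP4LinearMethod):
            saved_state[f"__quant_method__{path}"] = mod.quant_method
            mod.quant_method = UnquantizedLinearMethod()
    try:
        # 2. Stage 2 now executes with BF16 weights.
        result = stage2_fn()
    finally:
        # 3. Restore the original quant methods even if stage 2 raises.
        for key, method in saved_state.items():
            path = key[len("__quant_method__"):]
            modules[path].quant_method = method
    return result
```

The `try`/`finally` ensures the FP4 state is recovered even when stage 2 fails partway through denoising.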
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py`:
- Around lines 518-523: The current loop that handles names starting with
"__quant_method__" silently skips when the target module is missing or
incompatible. Change the handler so that, on encountering a "__quant_method__"
entry, you look up mod via module_dict (as currently); then, if mod is None or
not an instance of the expected Linear class (or does not support
quant_method), raise an explicit error instead of continuing; otherwise set
mod.quant_method = data. Ensure the error message names the offending module
path and the expected type (e.g., Linear) so restoration fails fast rather
than leaving modules on UnquantizedLinearMethod with mismatched packed weights.
- Around lines 471-483: The code mutates param.data to bf16 before verifying
that the parent Linear (module_dict.get(base)) exists, which can leave the
tensor in an incompatible layout if the lookup fails. Change the logic in
pipeline_ltx2_two_stages.py to first resolve and validate linear_mod (ensure
isinstance(linear_mod, Linear)) and raise an exception or abort if it is not
found; only then set param.data = bf16, update
saved_state[f"__quant_method__{base}"], and set linear_mod.quant_method =
UnquantizedLinearMethod(). Do not keep the current warn-and-continue behavior
(logger.warning), because it leaves the model in a broken state.
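A minimal sketch of both fixes described above, under assumptions: the `Linear`, `Param`, and `UnquantizedLinearMethod` classes, the `saved_state`/`module_dict` layouts, and the function names are stand-ins inferred from the review comments, not the actual pipeline_ltx2_two_stages.py code.

```python
QM_PREFIX = "__quant_method__"

class UnquantizedLinearMethod:
    """Stand-in for the BF16 (unquantized) linear method."""

class Linear:
    def __init__(self, quant_method="nvfp4"):
        self.quant_method = quant_method

class Param:
    def __init__(self, data):
        self.data = data

def swap_to_bf16(name, param, bf16_data, module_dict, saved_state):
    """Fix for lines 471-483: validate the parent Linear BEFORE mutating
    param.data, so a failed lookup cannot leave the tensor in an
    incompatible layout."""
    base = name.rsplit(".", 1)[0]  # "blocks.0.qkv.weight" -> "blocks.0.qkv"
    linear_mod = module_dict.get(base)
    if not isinstance(linear_mod, Linear):
        raise RuntimeError(f"No Linear module at '{base}' for param '{name}'")
    saved_state[QM_PREFIX + base] = linear_mod.quant_method
    param.data = bf16_data  # mutate only after validation succeeded
    linear_mod.quant_method = UnquantizedLinearMethod()

def restore_quant_methods(saved_state, module_dict):
    """Fix for lines 518-523: fail fast on a missing or incompatible module
    instead of silently skipping, which would leave it on
    UnquantizedLinearMethod with mismatched packed weights."""
    for name, data in saved_state.items():
        if not name.startswith(QM_PREFIX):
            continue
        path = name[len(QM_PREFIX):]
        mod = module_dict.get(path)
        if not isinstance(mod, Linear):
            raise RuntimeError(
                f"Cannot restore quant_method for '{path}': expected Linear, "
                f"got {type(mod).__name__}")
        mod.quant_method = data
```

Raising before any mutation means a bad module path aborts cleanly, with the tensor layout and quant methods untouched, rather than leaving the model half-converted.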
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e5920e3c-e051-490d-bbec-07975ddac2ea
📒 Files selected for processing (1)
tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py
PR_Github #44784 [ run ] triggered by Bot. Commit:
PR_Github #44784 [ run ] completed with state
/bot run
PR_Github #44961 [ run ] triggered by Bot. Commit:
3b6f04c to 9f39104 (force-push)
/bot run
PR_Github #45356 [ run ] triggered by Bot. Commit:
PR_Github #45356 [ run ] completed with state
chang-l left a comment:

Thanks Yibin.
Can you also include some perf data implications in the PR description for the record? Also, do we have 2-stage LoRA w/ NVFP4 E2E accuracy protected in CI?

How do we test the 2-stage implementation? For example, we are saying that "FP4 stage 2" accuracy is not good enough, and we switch to BF16 in this PR. How do we prove that this PR improves the accuracy?
244b19a to 94d0223 (force-push)
/bot run
PR_Github #46037 [ run ] triggered by Bot. Commit:
PR_Github #46037 [ run ] completed with state
/bot run
PR_Github #46155 [ run ] triggered by Bot. Commit:
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
15664e7 to 258c8a0 (force-push)
/bot kill
PR_Github #46160 [ kill ] triggered by Bot. Commit:
PR_Github #46155 [ run ] completed with state
PR_Github #46160 [ kill ] completed with state
/bot run --disable-fail-fast
PR_Github #46172 [ run ] triggered by Bot. Commit:
PR_Github #46172 [ run ] completed with state
/bot run
PR_Github #46297 [ run ] triggered by Bot. Commit:
PR_Github #46297 [ run ] completed with state
Description
Image Quality
See the linked videos for a BF16 vs. FP4 stage 2 comparison on both dynamic FP4 and static FP4. BF16 stage 2 is visually better.
https://drive.google.com/drive/folders/1aPXUjUXioV5UXOaUooDQj0NpdRNHDHDJ?usp=sharing
Perf Summary
With a 10-second video at 1536x1024 resolution, here is the perf comparison table.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.