[5571471] Fix quantization logic for residual branches with different backbones #425
Conversation
Walkthrough
Changes adjust non-residual detection in ONNX quantization graph logic to avoid early classification when backbone ops differ; they also add a ConvTranspose→Conv residual test model builder (duplicated) and a unit test that verifies INT8 quantization and Add-node dequantization behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller as quantization flow
    participant G as graph_utils.build_non_residual_input_map
    participant B1 as backbone1
    participant B2 as backbone2
    Note over G: Revised decision: don't mark non-residual solely because backbone ops differ
    Caller->>G: Provide node inputs and candidate backbones
    G->>B1: Inspect existence, op type, path length (d1)
    G->>B2: Inspect existence, op type, path length (d2)
    alt both backbones exist AND ops equal
        G->>G: Classify input as non-residual
    else both backbones exist AND ops differ
        G->>G: Compare d1 vs d2 and continue analysis (may mark non-residual based on path lengths)
    else any missing backbone
        G->>G: Continue analysis / treat as non-residual depending on other rules
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- modelopt/onnx/quantization/graph_utils.py (1 hunks)
- tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
- tests/unit/onnx/test_quantize_int8.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/onnx/test_quantize_int8.py (1)
tests/_test_utils/onnx_quantization/lib_test_models.py (3)
- SimpleMLP (75-91)
- build_convtranspose_conv_residual_model (377-556)
- export_as_onnx (109-127)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
modelopt/onnx/utils.py (1)
- check_model (557-569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (2)
modelopt/onnx/quantization/graph_utils.py (1)
529-534: LGTM! Correct fix for residual branch handling.
The change correctly restricts non-residual input determination to only the case where both backbones are the same instance. This allows proper distance-based analysis when backbone types differ (e.g., ConvTranspose vs Conv), fixing the bug where Q/DQ nodes were incorrectly added to all Add inputs instead of only the residual branch.
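To make the described decision concrete, here is a minimal, hypothetical sketch of the revised rule. It is not the actual graph_utils code; the names `backbone1`/`backbone2` and `d1`/`d2` are stand-ins for the values the review and the sequence diagram refer to, and the tie-breaking convention of the real path-length comparison lives in `build_non_residual_input_map`.

```python
from typing import Optional

def pick_non_residual_input(backbone1, backbone2, d1: int, d2: int) -> Optional[str]:
    """Illustrative sketch of the fixed decision, not the real implementation.

    Before the fix, differing backbone op types could short-circuit the
    classification; now the early exit applies only when both inputs
    trace back to the same backbone node.
    """
    if backbone1 is None or backbone2 is None:
        return None  # missing backbone: defer to the other rules
    if backbone1 is backbone2:
        return "input_0"  # shared backbone: safe to classify immediately
    # Different backbones (e.g., ConvTranspose vs Conv): fall through to
    # the distance-based analysis. Which comparison wins is a convention
    # of the real code; here we arbitrarily treat the shorter path as
    # the non-residual (main) branch.
    return "input_0" if d1 <= d2 else "input_1"
```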
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
377-556: Ignore duplicate function suggestion.
The function build_convtranspose_conv_residual_model() is only defined once in lib_test_models.py; there is no duplicate to remove. Likely an incorrect or invalid review comment.
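For readers who want to reproduce the shape of this test model, the following is a self-contained sketch of a ConvTranspose→Conv residual graph. It is not the actual build_convtranspose_conv_residual_model() helper (which spans lines 377-556), just a minimal topology in which the two Add inputs have different backbone op types.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def build_ct_conv_residual_sketch() -> onnx.ModelProto:
    x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 8, 16, 16])
    y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 8, 16, 16])
    # 1x1 kernels keep spatial dimensions, so both branches stay Add-compatible.
    w_ct = numpy_helper.from_array(np.ones((8, 8, 1, 1), dtype=np.float32), "w_ct")
    w_cv = numpy_helper.from_array(np.ones((8, 8, 1, 1), dtype=np.float32), "w_cv")
    nodes = [
        helper.make_node("ConvTranspose", ["x", "w_ct"], ["ct_out"], name="ct"),
        helper.make_node("Conv", ["ct_out", "w_cv"], ["cv_out"], name="cv"),
        # Residual Add: one input arrives straight from the ConvTranspose,
        # the other through the Conv, so the two backbones differ in op type.
        helper.make_node("Add", ["ct_out", "cv_out"], ["y"], name="add"),
    ]
    graph = helper.make_graph(
        nodes, "ct_conv_residual", [x], [y], initializer=[w_ct, w_cv]
    )
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    onnx.checker.check_model(model)
    return model
```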
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
- Coverage   73.38%   73.37%   -0.01%
==========================================
  Files         180      180
  Lines       17934    17934
==========================================
- Hits        13160    13159       -1
- Misses       4774     4775       +1
```

☔ View full report in Codecov by Sentry.
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Force-pushed 5f86ecf to c2083fa
Squashed commit messages:
- cleanup
- Update modelopt/onnx/autocast/precisionconverter.py (Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>)
- [5571471] Fix quantization logic for residual branches with different backbones (#425)
- modify casts; testing
- Add support for tensor scales
- Generalize & automate skipping inputs; only skip index 2 for bfloat16
- bugfixes
Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>
Signed-off-by: aboubezari <126983138+aboubezari@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
What does this PR do?
Type of change: Bug fix
Overview: Q/DQ nodes were being added to all inputs of Add nodes, whereas they should only be added on the residual branch. This was due to incorrect logic in cases where the op type of the root node differed from the op type of the non-residual branch node. This PR fixes that.
Usage
Testing
Added a unit test.
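As a hedged illustration of the kind of assertion such a test can make (the helper names below are hypothetical, not the actual test_quantize_int8.py code): after INT8 quantization, exactly one input of each residual Add should be produced by a DequantizeLinear node.

```python
import onnx

def dq_input_count(model: onnx.ModelProto, node: onnx.NodeProto) -> int:
    """Count how many inputs of `node` are produced by DequantizeLinear."""
    producers = {out: n for n in model.graph.node for out in n.output}
    return sum(
        inp in producers and producers[inp].op_type == "DequantizeLinear"
        for inp in node.input
    )

def assert_residual_add_quantization(quantized: onnx.ModelProto) -> None:
    adds = [n for n in quantized.graph.node if n.op_type == "Add"]
    assert adds, "expected at least one Add node"
    for add in adds:
        # After the fix, Q/DQ lands only on the residual branch, so
        # exactly one Add input should come from a DequantizeLinear.
        assert dq_input_count(quantized, add) == 1
```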
Before your PR is "Ready for review"
Additional Information
Confirmed no regressions in: ConvNext, EfficientViT, EfficientNet, MobileNet, ResNet, ResNext, ViT.
Summary by CodeRabbit
- Bug Fixes
- Tests
- Chores