
Conversation

@gcunhase (Contributor) commented Oct 10, 2025

What does this PR do?

Type of change: Bug fix

Overview: Q/DQ nodes were being added to all inputs of Add nodes, whereas they should only be added to the residual branch. The cause was incorrect logic in cases where the op type of the root node differed from the op type of the non-residual branch node. This PR fixes that.

Usage

$ python -m modelopt.onnx.quantization --onnx_path=${MODEL_NAME}.onnx

Testing

Added a unit test.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Confirmed no regressions in: ConvNext, EfficientViT, EfficientNet, MobileNet, ResNet, ResNext, ViT.

Summary by CodeRabbit

  • Bug Fixes

    • Improved residual-path detection during ONNX INT8 quantization to avoid misclassification when backbone operations differ, enabling correct quantization across more model topologies.
  • Tests

    • Added end-to-end test coverage for ConvTranspose→Conv residual models to verify INT8 quantization and correct handling of residual additions.
  • Chores

    • Expanded test utilities to construct complex residual graphs used in validation.

@gcunhase gcunhase requested a review from a team as a code owner October 10, 2025 21:08
@gcunhase gcunhase requested a review from ajrasane October 10, 2025 21:08
@coderabbitai coderabbitai bot (Contributor) commented Oct 10, 2025

Walkthrough

Changes adjust non-residual detection in ONNX quantization graph logic to avoid early classification when backbone ops differ; add a ConvTranspose→Conv residual test model builder (duplicated) and a unit test that verifies INT8 quantization and Add-node dequantization behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Quantization graph utils<br>`modelopt/onnx/quantization/graph_utils.py` | Removed the branch that treated differing backbone op types as an immediate non-residual case; when both backbones exist but their ops differ, the code now proceeds to compare path lengths (d1 vs d2) and to further analysis instead of classifying early. |
| Test model builders (duplicated)<br>`tests/_test_utils/onnx_quantization/lib_test_models.py` | Added `build_convtranspose_conv_residual_model()`, which constructs a ConvTranspose → Relu → Conv → BatchNormalization → Add → Relu ONNX model with initializers and model setup (see the hypothetical sketch after this table). The function was appended twice, creating two identical definitions. |
| Unit tests<br>`tests/unit/onnx/test_quantize_int8.py` | Added imports for `build_convtranspose_conv_residual_model` and `save_onnx`, plus a new test `test_convtranspose_conv_residual_int8` that saves the model, runs INT8 quantization (`high_precision_dtype="fp16"`), asserts the output file exists, checks that Conv and ConvTranspose nodes are quantized, and verifies each Add node has exactly one quantized input (by inspecting DequantizeLinear inputs). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Caller as quantization flow
  participant G as graph_utils.build_non_residual_input_map
  participant B1 as backbone1
  participant B2 as backbone2

  Note over G: Revised decision: don't mark non-residual solely because backbone ops differ
  Caller->>G: Provide node inputs and candidate backbones
  G->>B1: Inspect existence, op type, path length (d1)
  G->>B2: Inspect existence, op type, path length (d2)
  alt both backbones exist AND ops equal
    G->>G: Classify input as non-residual
  else both backbones exist AND ops differ
    G->>G: Compare d1 vs d2 and continue analysis (may mark non-residual based on path lengths)
  else any missing backbone
    G->>G: Continue analysis / treat as non-residual depending on other rules
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hop through graphs where tensors wind,
I sniff two backbones, then clear my mind.
When ops differ I pause, I pry,
I measure paths before I cry.
A twitch of whiskers—tests applaud the find.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly summarizes the core fix, adjusting quantization logic for residual branches whose backbones differ, matching the main change without extraneous detail. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f86ecf and c2083fa.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/graph_utils.py (1 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_quantize_int8.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/_test_utils/onnx_quantization/lib_test_models.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-14T16:22:38.493Z
Learnt from: gcunhase
PR: NVIDIA/TensorRT-Model-Optimizer#425
File: tests/unit/onnx/test_quantize_int8.py:88-89
Timestamp: 2025-10-14T16:22:38.493Z
Learning: In ModelOpt's ONNX quantization pipeline, when `--direct_io_types` is not enabled (the default behavior), QuantizeLinear (Q) nodes are always inserted before DequantizeLinear (DQ) nodes. This means that in tests checking for DQ nodes, the inputs will always have at least one input node (the Q node), so checking `inp.inputs` for emptiness before accessing `inp.inputs[0]` is unnecessary in the default quantization workflow.

Applied to files:

  • tests/unit/onnx/test_quantize_int8.py
📚 Learning: 2025-09-22T21:14:46.675Z
Learnt from: gcunhase
PR: NVIDIA/TensorRT-Model-Optimizer#354
File: modelopt/onnx/quantization/qdq_utils.py:874-944
Timestamp: 2025-09-22T21:14:46.675Z
Learning: In ModelOpt's ONNX quantization pipeline, DequantizeLinear nodes always have 3 inputs (data, scale, zero_point) and zero-points are always provided as initializers, not as Constant nodes. This means the `remove_graph_input_q` function can safely assume `dq_node[0].input[2]` exists and is an initializer.

Applied to files:

  • tests/unit/onnx/test_quantize_int8.py
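A brief, hedged illustration of the two learnings above; the model path and variable names are hypothetical, not taken from the test code:

```python
# In the default ModelOpt workflow (per the learnings above), a QuantizeLinear
# always precedes each DequantizeLinear, and every DequantizeLinear carries
# (data, scale, zero_point) with the zero-point stored as an initializer.
import onnx

model = onnx.load("model.quant.onnx")  # hypothetical path
initializers = {init.name for init in model.graph.initializer}
producers = {out: n for n in model.graph.node for out in n.output}
for node in model.graph.node:
    if node.op_type == "DequantizeLinear":
        assert len(node.input) == 3, "expected (data, scale, zero_point)"
        assert node.input[2] in initializers, "zero-point should be an initializer"
        q = producers.get(node.input[0])
        assert q is not None and q.op_type == "QuantizeLinear"
```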
🧬 Code graph analysis (1)
tests/unit/onnx/test_quantize_int8.py (1)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
  • build_convtranspose_conv_residual_model (377-556)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (2)
modelopt/onnx/quantization/graph_utils.py (1)

530-540: LGTM! Path length comparison correctly handles different backbone types.

The removal of the early classification when backbone1.op != backbone2.op is the core fix for this PR. Now when both inputs have backbones of different types (e.g., ConvTranspose vs Conv), the code correctly uses path length to determine which branch is the residual connection. The longer path to the lowest common ancestor is treated as the main computation path (non-residual, no quantization), while the shorter path is treated as the residual/skip connection and gets quantized. This aligns with the PR objective of adding Q/DQ nodes only to the residual branch.
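For clarity, a hypothetical condensation of the decision as described above and in the walkthrough, not the actual graph_utils.py source:

```python
# Hypothetical condensation of the revised logic in
# build_non_residual_input_map; the names (backbone1/backbone2, d1/d2)
# follow the walkthrough above, not the real implementation.

def pick_non_residual_input(backbone1, backbone2, d1, d2):
    """Return the index (0 or 1) of the Add input on the main computation path."""
    if backbone1 is None or backbone2 is None:
        return None  # decided later by other rules
    # Old (buggy) behavior: `if backbone1.op != backbone2.op` short-circuited
    # here, so an Add fed by a ConvTranspose branch and a Conv branch was
    # misclassified and Q/DQ landed on both inputs.
    # Fixed behavior: fall through to the path-length comparison. The longer
    # path to the lowest common ancestor is the main computation path (left
    # unquantized); the shorter path is the residual/skip branch that gets Q/DQ.
    return 0 if d1 >= d2 else 1
```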

tests/unit/onnx/test_quantize_int8.py (1)

65-91: LGTM! Test effectively validates the fix for different backbone types.

This test correctly verifies the core bug fix: when an Add node has residual branches with different backbone operations (ConvTranspose vs Conv), only the residual branch should be quantized. The test builds a model with this pattern, runs INT8 quantization, and asserts that exactly one input of each Add node is quantized through DequantizeLinear. This directly validates that the fix prevents Q/DQ nodes from being incorrectly added to both inputs.

Note: The direct access to inp.inputs[0] at line 88 is safe per the codebase's quantization pipeline, where QuantizeLinear nodes always precede DequantizeLinear nodes in the default workflow.

Based on learnings
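A condensed, hypothetical sketch of the Add-input assertion described above, assuming the graph is inspected with onnx_graphsurgeon (as the `inp.inputs` access implies); the actual test in test_quantize_int8.py differs in detail:

```python
# Hypothetical sketch of the Add-input check; the quantized-model path is
# illustrative, not the actual test code.
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.quant.onnx"))  # hypothetical path
for node in graph.nodes:
    if node.op != "Add":
        continue
    # An Add input counts as quantized when its producer is a DequantizeLinear
    # node; the fix guarantees exactly one such input per residual Add.
    dq_inputs = [
        inp for inp in node.inputs
        if inp.inputs and inp.inputs[0].op == "DequantizeLinear"
    ]
    assert len(dq_inputs) == 1, f"{node.name}: expected exactly one quantized input"
```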



@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08fb23f and 2fe7547.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/graph_utils.py (1 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_quantize_int8.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/onnx/test_quantize_int8.py (1)
tests/_test_utils/onnx_quantization/lib_test_models.py (3)
  • SimpleMLP (75-91)
  • build_convtranspose_conv_residual_model (377-556)
  • export_as_onnx (109-127)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (2)
modelopt/onnx/quantization/graph_utils.py (1)

529-534: LGTM! Correct fix for residual branch handling.

The change correctly restricts non-residual input determination to only the case where both backbones are the same instance. This allows proper distance-based analysis when backbone types differ (e.g., ConvTranspose vs Conv), fixing the bug where Q/DQ nodes were incorrectly added to all Add inputs instead of only the residual branch.

tests/_test_utils/onnx_quantization/lib_test_models.py (1)

377-556: Ignore duplicate function suggestion

The function build_convtranspose_conv_residual_model() is only defined once in lib_test_models.py; there is no duplicate to remove.

Likely an incorrect or invalid review comment.

codecov bot commented Oct 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.37%. Comparing base (46a9e49) to head (c2083fa).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
- Coverage   73.38%   73.37%   -0.01%     
==========================================
  Files         180      180              
  Lines       17934    17934              
==========================================
- Hits        13160    13159       -1     
- Misses       4774     4775       +1     
```


Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase gcunhase force-pushed the dev/gcunhasergio/res_add_diff_backbone_5571471 branch from 5f86ecf to c2083fa Compare October 14, 2025 18:21
@gcunhase gcunhase enabled auto-merge (squash) October 14, 2025 18:22
@gcunhase gcunhase merged commit 4df4091 into NVIDIA:main Oct 14, 2025
27 checks passed
aboubezari added a commit to aboubezari/TensorRT-Model-Optimizer that referenced this pull request Oct 21, 2025
Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>

cleanup

Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>

Update modelopt/onnx/autocast/precisionconverter.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: aboubezari <126983138+aboubezari@users.noreply.github.com>

[5571471] Fix quantization logic for residual branches with different backbones (NVIDIA#425)

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>

modify casts; testing

Add support for tensor scales

Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>

Generalize & automate skipping inputs; only skip index 2 for bfloat16

Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>

bugfixes

Signed-off-by: Ali Boubezari <aboubezari@nuro.ai>
galagam pushed a commit that referenced this pull request Oct 22, 2025