Layout transform: Fix-up QDQ units and add constant folding #20685

adrianlizarraga · 2024-05-15T05:56:57Z

Description

Problem 1: Broken Transpose QDQ unit

Layout transform's specialized cost function aggressively pushes down transposes with channel-first or channel-last perms. This can lead to a situation where a channel-first/last Transpose gets stuck after being pushed through an Unsqueeze node that makes the Transpose's perm no longer channel-first/last. At this point, the specialized cost function defers to the default cost function, which does not see a need to continue pushing this Transpose node. This breaks the QDQ node units for both the Unsqueeze and the Transpose: DQ -> Unsqueeze -> Transpose -> Q.
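To illustrate how a perm can stop being channel-first/last, here is a minimal Python sketch. It is not the optimizer's code: the helper names are hypothetical, and it assumes the simplest case of an Unsqueeze with axes=[0] on a rank-3 input.

```python
def channel_first_to_last_perm(rank):
    # Rank 4 example: NCHW -> NHWC is [0, 2, 3, 1].
    return [0] + list(range(2, rank)) + [1]


def channel_last_to_first_perm(rank):
    # Rank 4 example: NHWC -> NCHW is [0, 3, 1, 2].
    return [0, rank - 1] + list(range(1, rank - 1))


def is_channel_perm(perm):
    # True if perm is exactly a channel-first or channel-last perm for its rank.
    rank = len(perm)
    return perm in (channel_first_to_last_perm(rank),
                    channel_last_to_first_perm(rank))


def push_through_leading_unsqueeze(perm):
    # Assumption: the Unsqueeze uses axes=[0].
    # Transpose(perm) -> Unsqueeze(axes=[0]) is equivalent to
    # Unsqueeze(axes=[0]) -> Transpose([0] + [p + 1 for p in perm]).
    return [0] + [p + 1 for p in perm]


perm = channel_first_to_last_perm(3)            # [0, 2, 1]
pushed = push_through_leading_unsqueeze(perm)   # [0, 1, 3, 2]

print(is_channel_perm(perm))    # the original perm is a channel perm
print(is_channel_perm(pushed))  # the pushed perm is neither channel-first nor channel-last
```

Under these assumptions the rank-4 perm `[0, 1, 3, 2]` matches neither `[0, 2, 3, 1]` nor `[0, 3, 1, 2]`, which is the point at which the specialized cost function would stop treating the Transpose as a channel-first/last transpose.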

The transpose optimizer should insert a Q -> DQ pair between the Unsqueeze and Transpose nodes to fix both QDQ node units: DQ -> Unsqueeze -> Q[new] -> DQ[new] -> Transpose -> Q
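The fix-up can be pictured on a toy linear chain of op types. This is a sketch only: the real optimizer works on graph nodes with quantization parameters, and `insert_qdq_between` is a hypothetical helper, not an onnxruntime function.

```python
def insert_qdq_between(chain, producer, consumer):
    """Return a copy of `chain` with a Q -> DQ pair inserted wherever
    `producer` is immediately followed by `consumer`."""
    fixed = []
    for i, op in enumerate(chain):
        fixed.append(op)
        next_op = chain[i + 1] if i + 1 < len(chain) else None
        if op == producer and next_op == consumer:
            # Close the producer's QDQ unit and open a new one for the consumer.
            fixed.extend(["Q", "DQ"])
    return fixed


broken = ["DQ", "Unsqueeze", "Transpose", "Q"]
fixed = insert_qdq_between(broken, "Unsqueeze", "Transpose")
print(" -> ".join(fixed))
```

After the insertion, each of Unsqueeze and Transpose sits between its own DQ and Q, so both form valid QDQ node units.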

Problem 2: Inserted Squeeze/Transpose nodes should be constant folded when possible.

The transpose optimizer inserts Squeeze (and Transpose) ops between an initializer and a DQ to counteract the effect of Unsqueezing that initializer if it is consumed by multiple nodes. This results in a graph where the inserted nodes are not in valid node units:

Original graph where two Mul nodes share a common initializer input:

Resulting graph after transpose optimization without constant folding:

Here, the circled Transpose and Squeeze nodes operate on a quantized integer type but are not in valid QDQ node units. The solution is to run constant folding, which results in:
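As a rough sketch of the idea, constant folding applies the inserted ops to the initializer's data ahead of time so that the DQ consumes the initializer directly. The example below uses nested Python lists instead of real tensors; the helper names, the op order (Squeeze, then Transpose), and the shapes are illustrative assumptions, not taken from the PR.

```python
def squeeze_axis0(value):
    # Drop a leading dimension of size 1: [[...]] -> [...]
    if len(value) != 1:
        raise ValueError("leading dimension must have size 1")
    return value[0]


def transpose_2d(value):
    # Swap the two axes of a 2-D nested list.
    return [list(row) for row in zip(*value)]


# Ops this toy folder knows how to evaluate on constant data.
FOLDABLE = {"Squeeze": squeeze_axis0, "Transpose": transpose_2d}


def fold_constant_chain(initializer, ops):
    """Evaluate the leading foldable ops on `initializer`.

    Returns the folded initializer and the ops that remain in the graph."""
    i = 0
    while i < len(ops) and ops[i] in FOLDABLE:
        initializer = FOLDABLE[ops[i]](initializer)
        i += 1
    return initializer, ops[i:]


# Quantized initializer of shape (1, 2, 3), followed by the inserted ops.
weights = [[[1, 2, 3], [4, 5, 6]]]
folded, remaining = fold_constant_chain(weights, ["Squeeze", "Transpose", "DQ", "Mul"])
print(folded)     # data with shape (3, 2)
print(remaining)  # only DQ and Mul are left in the chain
```

With the Squeeze and Transpose folded into the initializer, no quantized-integer op is left outside a QDQ node unit.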

Motivation and Context

Improve the layout transformation to allow more models to run on EPs that prefer the channel-last layout.

Co-authored-by: Scott McKay <skottmckay@gmail.com>

adrianlizarraga added 2 commits May 14, 2024 22:51

First attempt at fixing stuck Transpose after layout transform: DQ -> Unsqueeze -> Transpose -> Q

28a41d0

Add mini constant-folding to transpose optimizer

aa9a08d

github-advanced-security bot found potential problems May 16, 2024

onnxruntime/test/testdata/make_layout_transform_const_fold_inserted_squeezes.py Fixed

adrianlizarraga added 3 commits May 15, 2024 23:18

Run linter

11221a5

Rename const folding test model and script

29a8106

Move constant folding to a separate function

46a446a

adrianlizarraga changed the title ~~Layout transform: Fix QDQ unit for Transpose stuck after Unsqueeze~~ Layout transform: Fix QDQ units and add constant folding May 16, 2024

adrianlizarraga marked this pull request as ready for review May 16, 2024 08:16

adrianlizarraga requested review from skottmckay and edgchen1 May 16, 2024 08:17