
[Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose #9336

Merged
junrushao merged 36 commits into apache:main from AndrewZhaoLuo:aluo/qnn/conv2d-transpose-fixes on Nov 12, 2021

Conversation

AndrewZhaoLuo
Contributor

@AndrewZhaoLuo AndrewZhaoLuo commented Oct 21, 2021

The old kernel_layout convention was incorrect for conv2d_transpose: it mixed up I and O.

E.g. for a conv with 5 input channels, 10 output channels, and a kernel height and width of 3, the IOHW weight would be [10, 5, 3, 3] instead of [5, 10, 3, 3]. Likewise, for OIHW we would have [5, 10, 3, 3] instead of [10, 5, 3, 3]. This is extremely confusing, and in several places within the codebase people note that the "convention" for "O" and "I" is flipped.

I believe this is simply wrong: I and O should refer to the input and output channels of the convolution, nothing else.

Why was there confusion here in the first place? The actual TOPI implementations assume IOHW kernel inputs, and we layout-transform to achieve this. For regular convs we are used to OIHW layouts, which is probably where the confusion came from. Indeed, in PyTorch conv2d_transpose weights follow the IOHW convention:

>>> nn.ConvTranspose2d(10, 30, 5, groups=2).state_dict()['weight'].shape
torch.Size([10, 15, 5, 5])

While conv2d weights follow OIHW:

>>> nn.Conv2d(10, 30, 5, groups=2).state_dict()['weight'].shape
torch.Size([30, 5, 5, 5])

This has caused me a great deal of confusion. The current code only works because people mix up the layout convention manually, e.g. the layout in the code is said to be OIHW, but the tensor's shape and the calculation itself follow IOHW.

This is my attempt at untying the knot and making it much more consistent.
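
For illustration only (not code from this PR), here is a minimal Relay sketch of the corrected convention, using the example shapes above: with kernel_layout="IOHW", a transposed conv taking 5 input channels to 10 output channels declares its weight as [5, 10, 3, 3].

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 5, 32, 32), dtype="float32")     # NCHW input, 5 channels
weight = relay.var("weight", shape=(5, 10, 3, 3), dtype="float32")  # IOHW: I=5, O=10
out = relay.nn.conv2d_transpose(
    data, weight, channels=10, kernel_size=(3, 3), kernel_layout="IOHW"
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
print(relay.transform.InferType()(mod))  # inferred output: Tensor[(1, 10, 34, 34), float32]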

TODOs:
Make pyxir remove this line and re-enable the skipped test for vitas.py: https://github.com/Xilinx/pyxir/blob/master/python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380

@AndrewZhaoLuo AndrewZhaoLuo changed the title [WIP] Fix inconsistent kernel layout conventions for conv2d_transpose [Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose Oct 21, 2021
@AndrewZhaoLuo AndrewZhaoLuo marked this pull request as ready for review October 22, 2021 00:01
@mbrookhart
Contributor

mbrookhart commented Oct 22, 2021

@masahi Perhaps you or someone you know has a better sense of framework conventions here than I do? I'm not sure if there's a common representation here.

@AndrewZhaoLuo
Contributor Author

AndrewZhaoLuo commented Oct 22, 2021

I think in general, I = the dimension that changes when the input channels change, and O = the dimension that changes when the output channels change. In this operator that is not the case (and it probably isn't for the 1D and 3D cases either, but I will change those if this gets good traction).

That said, I can understand why this convention might have ended up so weird.

conv2d_transpose can be seen as the gradient of conv2d.

So 'I' and 'O' represent the input and output channels of the imaginary conv2d this operator is the gradient of. However, in conv2d_transpose we don't really have that context, and flipping the meanings of I and O only causes confusion.
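
As a quick sanity check of this view (purely illustrative, not part of this PR), PyTorch's transposed conv stores its weight with exactly the shape of the conv2d it can be seen as the gradient of:

import torch.nn as nn

# A 10 -> 30 channel ConvTranspose2d stores its weight in IOHW, so it has the same
# [10, 30, 5, 5] shape as the 30 -> 10 channel Conv2d (OIHW) whose gradient it can be viewed as.
fwd = nn.Conv2d(30, 10, kernel_size=5)
bwd = nn.ConvTranspose2d(10, 30, kernel_size=5)
assert tuple(fwd.weight.shape) == tuple(bwd.weight.shape) == (10, 30, 5, 5)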

…aoLuo/tvm into aluo/qnn/conv2d-transpose-fixes

* 'aluo/qnn/conv2d-transpose-fixes' of github.com:AndrewZhaoLuo/tvm: (27 commits)
  fix keras
  jostle
  more frontend tests fixes
  remove f2qi for later pr
  remove formatting only change
  undo just formatting changes
  lint
  add todo
  fix fake quantization pass
  fix qnn conv2d transpose tests
  fix vitis tests
  change layouts for conv2d_transpose too
  fix bug with layout transform
  add test
  lint
  make pytorch tests pass
  fix a lot of initial tests
  Disable Hexagon TestConv2dPackedFilter test (apache#9344)
  BUG: Look through on_device annotations when looking for shape constants (apache#9345)
  [Community] @ganler -> Reviewer (apache#9346)
  ...
Contributor

@comaniac comaniac left a comment


The clarification makes sense to me, so I approved first.
It would be better if @masahi could help take a final look.

Comment on lines +243 to +249
@pytest.mark.skip(
reason="I and O used to be mixed up in kernel layouts in TVM."
"This is fixed, but vitis needs to adopt the new convention."
"To change, simply remove this line:"
"https://github.com/Xilinx/pyxir/blob/bef661d6d77adcdbd2cf4163f2cf3a1d31d40406/"
"python/pyxir/frontend/tvm/relay_tools/relay_l2_convolution.py#L380"
)
Contributor

cc @jtuyls

Contributor

@jtuyls jtuyls Nov 11, 2021


Thanks, this kernel layout switch makes a lot of sense to me. I will fix this in the Vitis flow and then we can remove the test skip.

@masahi
Member

masahi commented Nov 11, 2021

Is PR #9465 related?

@masahi masahi self-assigned this Nov 11, 2021
@junrushao
Member

@Lyken17 is this related to your fix? Would be great if you could take a look :-) Thanks a lot!

@Lyken17
Contributor

Lyken17 commented Nov 11, 2021

@junrushao1994 Yes, definitely related. We both noticed that the conv2d_transpose weight layout should be IOHW rather than OIHW (which is used in conv2d). This is indeed confusing and leads to several wrong shape checks. I will go through the code in detail later today.

@@ -1044,7 +1044,7 @@ bool Conv2DTransposeRel(const Array<Type>& types, int num_inputs, const Attrs& a
   if (data == nullptr) return false;

   static const Layout kNCHW("NCHW");
-  static const Layout kOIHW("OIHW");
+  static const Layout kIOHW("IOHW");
Contributor


Does the layout variable affect later calculations, or is it just an alias for the layout?

Contributor

@Lyken17 Lyken17 left a comment


LGTM. This PR fixes the wrongly adopted OIHW in the current topi.nn.conv2d_transpose as well as many related shape claims / definitions.

@junrushao junrushao merged commit 6159b8e into apache:main Nov 12, 2021
@junrushao
Member

Thank you all for the huge effort this week!

AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Nov 12, 2021
* main: (119 commits)
  [Topi][Op][PyTorch][Vitas] Fix inconsistent kernel layout conventions for conv2d_transpose (apache#9336)
  Fix repository URL in ubuntu_install_rocm.sh (apache#9425)
  Add LLVM-13 installation to Docker setup (apache#9498)
  [Relay] Use target_host determined at Relay level instead of recalculating it (apache#9499)
  Arm(R) Ethos(TM)-U NPU BinaryElementwise operators support (apache#9442)
  [COMMUNITY] Junru's and Wuwei's PGP key for ASF release (apache#9488)
  Add default for split op (apache#9489)
  [HOTFIX][TARGET] Change LOG in compilation config to DLOG (apache#9486)
  Fixed some warnings about lambda's closures that are bigger than necessary (apache#9481)
  [Support] Add libinfo into the runtime build (apache#9310)
  Change Call with TIRCallAttrs to call_lowered op (apache#9312)
  [ETHOSN] Streamline Ethos(TM)-N cross-compile rpc usage (apache#9477)
  [CMSIS-NN] Assert correct amount of CMSIS-NN artifacts in MLF (apache#9480)
  [MicroTVM][PyTest] Explicitly skip MicroTVM unittests. (apache#9335)
  [microNPU] Replace ICHECK with diagnostic context in type inference (apache#9470)
  Better host handling in CompilationConfig & debug printing (apache#9460)
  [AOT][Tests] Use pre-built libraries in Reference System tests (apache#9271)
  [TIR] Add type hint for TIR  (apache#9432)
  [TVMC] Add test for quantized pytorch model (apache#9467)
  [CMSIS-NN] Convert CMSIS-NN to use Target Hooks (apache#9397)
  ...
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
… for conv2d_transpose (apache#9336)

* fix a lot of initial tests

* make pytorch tests pass

* lint

* add test

* fix bug with layout transform

* change layouts for conv2d_transpose too

* fix vitis tests

* fix qnn conv2d transpose tests

* fix fake quantization pass

* add todo

* lint

* undo just formatting changes

* remove formatting only change

* remove f2qi for later pr

* more frontend tests fixes

* fix a lot of initial tests

* make pytorch tests pass

* lint

* add test

* fix bug with layout transform

* change layouts for conv2d_transpose too

* fix vitis tests

* fix qnn conv2d transpose tests

* fix fake quantization pass

* add todo

* lint

* undo just formatting changes

* remove formatting only change

* remove f2qi for later pr

* more frontend tests fixes

* jostle

* fix keras

* fix another frontend test

* fix things

* jostle ci
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
zotanika added a commit to zotanika/incubator-tvm that referenced this pull request Jan 14, 2022
masahi pushed a commit that referenced this pull request Jan 17, 2022
* [Caffe Frontend] supporting group > 1 cases for Deconv op

- Handling group > 1 cases, assuming group == output channels
- Simply decomposed into Relay split, conv2d_transposed, and multi-leveled concatenate ops
- Added some test cases

Signed-off-by: zotanika <zotanika@gmail.com>

* [Caffe Frontend] amending a test case for Deconv op

Signed-off-by: zotanika <zotanika@gmail.com>

* explicit importing tvm.testing

* changing split axis to 0, according to PR #9336
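
For reference, a rough sketch (a hypothetical helper, not the actual Caffe frontend code) of the decomposition this commit describes, assuming the new IOHW kernel layout so the weight is split along axis 0:

from tvm import relay

def grouped_deconv(data, weight, groups, channels_per_group, kernel_size):
    # Split the NCHW input along its channel axis and the IOHW weight along I (axis 0),
    # run one conv2d_transpose per group, then concatenate the results back on channels.
    data_slices = relay.split(data, groups, axis=1)
    weight_slices = relay.split(weight, groups, axis=0)
    outs = [
        relay.nn.conv2d_transpose(
            data_slices[i],
            weight_slices[i],
            channels=channels_per_group,
            kernel_size=kernel_size,
            kernel_layout="IOHW",
        )
        for i in range(groups)
    ]
    return relay.concatenate(outs, axis=1)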
yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022