
Add dot product support for quantized convolution. #6445

Merged: 10 commits merged into apache:master from the conv2d_dot_prod branch on Oct 6, 2020

Conversation

@giuseros (Contributor)

High level description of the submission

We added two new intrinsics in topi/arm_cpu/tensor_intrin.py:

- mmla4x4: computes a matrix multiplication between a tile A(4,4) and a tile B(4,4)
- mmla16x4: computes a matrix multiplication between a tile A(rows,4) and a tile B(4,16)

We then used those intrinsics in two separate strategies, added in topi/arm_cpu/conv2d_int8.py, with the schedules implemented in topi/arm_cpu/conv2d_gemm.py. In particular:

- schedule_conv2d_gemm, when accelerated, packs matrix A, computes the GEMM, and unpacks the resulting matrix. It uses the mmla4x4 intrinsic.
- schedule_conv2d_gemm_hybrid does no packing on A and C, which stay in their native form. It uses the mmla16x4 intrinsic.

Please note that, due to limitations of `tensorize`, we need to pad matrix A in both cases whenever its dimensions are not a multiple of the tile shape (see the sketch below).
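
For intuition, here is a minimal NumPy sketch (an illustration only, not the TVM code added in this PR) of the per-tile computation behind mmla4x4 and of the zero-padding that `tensorize` forces on A:

```python
import numpy as np

def mmla4x4_ref(a_tile, b_tile, c_tile):
    """C(4,4) += A(4,4) @ B(4,4), accumulating int8 products into int32."""
    return c_tile + a_tile.astype("int32") @ b_tile.astype("int32")

def pad_to_tile(a, tile_rows=4, tile_k=4):
    """Zero-pad A so its dimensions become multiples of the tile shape."""
    m, k = a.shape
    pad_m = (-m) % tile_rows
    pad_k = (-k) % tile_k
    # Zero rows/columns add nothing to the dot products, so the padded
    # GEMM produces the same values as the original one.
    return np.pad(a, ((0, pad_m), (0, pad_k)))

A = np.random.randint(-128, 128, size=(5, 6)).astype("int8")
A_padded = pad_to_tile(A)  # shape (8, 8): now tileable into 4x4 blocks
```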

RFC

This PR is based on the following RFC: https://discuss.tvm.apache.org/t/rfc-accelerate-quantized-convolution-through-dot-product/7873

Change-Id: Id0d818d84ffc458c6dad7983fd350a0f3d5db395

@giuseros force-pushed the conv2d_dot_prod branch 2 times, most recently from f7b86dc to 995c6ab on September 16, 2020 17:24
@ZihengJiang (Contributor)

@anijain2305 @FrozenGene Would you please have a look at this?

@u99127 (Contributor) commented Sep 17, 2020

Can you see why CI is failing, @giuseros?

@giuseros (Contributor, Author)

@u99127, I am on it. It is strange that the command I ran locally didn't catch this.

@giuseros (Contributor, Author)

Mmmm, the last failure seems like a tolerance issue. Let me retrigger the CI.

@mbaret (Contributor) left a comment:

Broadly this looks good, although I find some of the naming confusing, and combining the dot-product and no-dot-product strategies could become difficult to maintain as more intrinsics get added.

Also, on testing: I think we want some tests that at least exercise the compilation route here and check against either some known TIR or assembly (until we get Arm CI).

(Review threads, since resolved, on: python/tvm/relay/op/strategy/arm_cpu.py, python/tvm/topi/arm_cpu/arm_utils.py, python/tvm/topi/arm_cpu/conv2d_alter_op.py, python/tvm/topi/arm_cpu/conv2d_gemm.py, python/tvm/topi/arm_cpu/conv2d_int8.py, python/tvm/topi/arm_cpu/tensor_intrin.py)
@giuseros (Contributor, Author)

Hi @mbaret,
Thanks for the review!

I addressed the comments and added compilation tests to verify the compilation flow with dot-product; a sketch of that kind of test follows.
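
A minimal sketch of such a compilation-only check (the shapes, layouts, and target string here are illustrative assumptions, not the exact test added in this PR):

```python
# Build a quantized conv2d for an AArch64 target with the dot-product
# extension and check that compilation succeeds end to end.
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 56, 56, 64), dtype="int8")
weight = relay.var("weight", shape=(3, 3, 64, 64), dtype="int8")
out = relay.nn.conv2d(
    data,
    weight,
    kernel_size=(3, 3),
    padding=(1, 1),
    channels=64,
    data_layout="NHWC",
    kernel_layout="HWIO",
    out_dtype="int32",  # conv2d accumulates in int32; requantize casts later
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# "+dotprod" enables the Armv8.2-A sdot/udot instructions that the new
# intrinsics target; without Arm CI we only assert that the build succeeds.
target = "llvm -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod"
with tvm.transform.PassContext(opt_level=3):
    relay.build(mod, target=target)
```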

@FrozenGene (Member) left a comment:

LGTM.

Giuseppe Rossini added 8 commits September 29, 2020 14:50:

- Add dot product support for quantized convolution (Change-Id: Id0d818d84ffc458c6dad7983fd350a0f3d5db395); the full message repeats the PR description above
- Add back nhwc_spatial_pack strategy as default (Change-Id: I8b1826a7ae1d742956296e8d157da19955a4942c)
- Fix linting through Black (Change-Id: Ic74ef5461a90bca9f4d4980a214137e384d5f923)
- Fix python linting (Change-Id: I5fb8a2ae4467a87bd3470f6b3753c074f9b7cc78)
- Addressing review comments (Change-Id: I284b1f2c121051e672f548d6c6ee2a3267854e31)
- Fix black linting issues (Change-Id: I1813b0226b536aedee0dce9eeeba27aa2d95518b)
- Fixing failing test and adding tests for dot-product compilation (Change-Id: Ic040722abd5538fccb85af4de922394c939e7000)
- Fixing linting and review comments (Change-Id: If09e3baa514c85dc78d3c27c2ac2fa2e01773d89)
(Review threads, since resolved, on python/tvm/topi/arm_cpu/tensor_intrin.py)
New commit: Fixing black linting and address comments (Change-Id: I857b28b6f9b23307d8c1eebc509de6ad2783c756)
@u99127 (Contributor) left a comment:

LGTM, thanks.

@mbaret (Contributor) left a comment:

final few comments

(Review threads, since resolved, on python/tvm/topi/arm_cpu/tensor_intrin.py)
New commit: Address review comments (Change-Id: I63d1a639d4a72abeb33148fd2868cd356ef84122)
@mbaret (Contributor) left a comment:

Thanks for your patience :) LGTM

@giuseros (Contributor, Author) commented Oct 5, 2020

Hi @mbaret,
Thank you for the careful review!

@FrozenGene, @anijain2305, should we merge this in?

@FrozenGene (Member) left a comment:

last comment

@@ -445,7 +443,7 @@ def gemm_quantized(M, N, K, unroll, interleave, in_type, out_type):
     )

     c_buffer = tvm.tir.decl_buffer(
-        C.shape, dtype=out_type, name="c_buffer", offset_factor=1, strides=[te.var("sc"), 1]
+        C.shape, dtype="int32", name="c_buffer", offset_factor=1, strides=[te.var("sc"), 1]
@FrozenGene (Member):

What is the reason for changing this?

@giuseros (Contributor, Author) commented Oct 6, 2020:

Hi @FrozenGene, the problem is the following: in quantized conv2d we do a conv2d and then a requantization (these are two different Relay operators). Conv2d goes from int8 to int32, and requantization goes from int32 to int8. So, in theory, this would work with out_type.

However, in some tests (pre-existing to my changes, which I run internally) I noticed that the (conv2d) out_type is set to int8 (or uint8). In this case the intrinsic still needs to produce an int32 value, and the cast to int8 (or uint8) needs to happen at a later stage.

This change basically says: no matter the out_type, the intrinsic will produce an int32 result. If we want the output to be int8 (which would be wrong, but some tests do it to simplify the testing), the conversion needs to happen later.
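
To make the arithmetic concrete, a tiny NumPy illustration (a simplified example of mine, not code from the PR) of why the accumulator must be int32 regardless of out_type:

```python
import numpy as np

a = np.full(64, 127, dtype="int8")
b = np.full(64, 127, dtype="int8")

ok = np.sum(a.astype("int32") * b.astype("int32"))  # 64 * 127 * 127 = 1032256
bad = a * b  # elementwise int8 multiply wraps around: 127 * 127 -> 1
# 'bad' sums to 64, nothing like the true dot product; the int32 result is
# only narrowed to int8/uint8 later, by the separate requantize operator.
```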

@FrozenGene (Member):

Got it.

@FrozenGene FrozenGene merged commit 8369adc into apache:master Oct 6, 2020
@FrozenGene (Member)

Thanks everyone. Merged now.

@giuseros (Contributor, Author) commented Oct 6, 2020

Thanks @FrozenGene !

@giuseros giuseros deleted the conv2d_dot_prod branch October 6, 2020 14:00
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 13, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 14, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 15, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 16, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Oct 19, 2020