[QNN] Add hardswish int8 impl using table lookup #11700

Merged
AndrewZhaoLuo merged 13 commits into apache:main from zhaoyang-star:hardswish_int8
Jun 28, 2022

Conversation

@zhaoyang-star
Contributor

@zhaoyang-star zhaoyang-star commented Jun 14, 2022

Registered the hardswish unary elementwise op

@AndrewZhaoLuo @mbrookhart

@AndrewZhaoLuo AndrewZhaoLuo self-requested a review June 16, 2022 20:47
@AndrewZhaoLuo
Contributor

I'll take a look today or tomorrow

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 20, 2022

I'll take a look today or tomorrow

So far, two GPU CI tasks and the Hexagon tasks have failed. Could you please help figure out the errors? Thanks in advance.
Failed task commands on GPU (they pass on CPU):

python3 tests/scripts/ci.py gpu --tests tests/python/frontend/pytorch/test_fx_quant.py::test_deeplab_v3
python3 tests/scripts/ci.py gpu --tests tests/python/frontend/pytorch/qnn_test.py::test_quantized_module

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment

LGTM, the hexagon was probably just flaky so you can rerun CI with an empty commit:

git commit -m 'jostle ci' --allow-empty and git push

As for the other errors, see the comment

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 23, 2022

LGTM, the hexagon was probably just flaky so you can rerun CI with an empty commit:

git commit -m 'jostle ci' --allow-empty and git push

As for the other errors, see the comment

Thanks for your kind help. I have fixed the typo and recommitted with git commit -m 'jostle ci' --allow-empty, but the Hexagon tasks still fail. The error log:

LLVM ERROR: Cannot select: 0x451c698: i32 = fp_to_fp16 0x44eba68

It seems CI doesn't skip the Hexagon tasks? @AndrewZhaoLuo

@AndrewZhaoLuo
Contributor

I'm not sure, maybe @mehrdadh or @cconvey knows something?

Maybe you just need to rebase on main and try again? Not sure

@mehrdadh
Member

@zhaoyang-star I checked our CI on TVM main and I don't see this error. Can you reproduce the error locally using the ci_hexagon docker image? It might be related to your PR.

@zhaoyang-star
Contributor Author

@mehrdadh Thanks for your quick reply. Maybe it is my PR that caused the failure. @AndrewZhaoLuo
I will double-check the PR and try to reproduce the error locally.

@zhaoyang-star
Contributor Author

I have fixed the error and triggered CI again, but the CI status has been pending for a few hours.
Is there something wrong with the way I submitted the PR? Thanks @AndrewZhaoLuo

@AndrewZhaoLuo
Contributor

AndrewZhaoLuo commented Jun 27, 2022

Nah, it's just flaky -- I think there was some planned work with the CI machines so you got hit with the outage. Just jostle one more time and let's get this merged!

Edit: #11914 <-- due to this.

@AndrewZhaoLuo
Contributor

Also apparently you can do this:

@AndrewZhaoLuo
Contributor

@tvm-bot rerun

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 28, 2022

One more question:
Theoretically, a lookup table (LUT) should accelerate the QNN op: the original implementation is dequantize -> fp32 compute -> quantize, while the LUT is created once and we just index the table during inference. But when I run an int8 YOLOv5 PyTorch model, which has 57 quantized::hardswish ops, the inference time is almost the same as before. Is there something I misunderstand, or something wrong with the code? @AndrewZhaoLuo
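The two paths described above can be sketched in plain NumPy; the helper names here are hypothetical for illustration, not TVM's actual implementation. Since an int8 input has only 256 possible values, the whole dequantize -> hardswish -> quantize chain can be precomputed once into a 256-entry table:

```python
import numpy as np

def hardswish_int8_lut(scale_in, zp_in, scale_out, zp_out):
    """Build a 256-entry table mapping every int8 input to its
    quantized hardswish output (hypothetical helper name)."""
    q = np.arange(-128, 128, dtype=np.int32)       # all possible int8 values
    x = scale_in * (q - zp_in)                     # dequantize
    y = x * np.clip(x + 3.0, 0.0, 6.0) / 6.0       # hardswish in fp32
    out = np.round(y / scale_out) + zp_out         # requantize
    return np.clip(out, -128, 127).astype(np.int8)

def apply_lut(data_int8, lut):
    # At inference time the op is just an index: shift the int8
    # values into [0, 255] and gather from the precomputed table.
    return lut[data_int8.astype(np.int32) + 128]
```

With identity scales (scale 0.1, zero point 0), an input value of 50 dequantizes to 5.0, where hardswish saturates to the identity, so the table maps it back to 50; values at or below -3.0 map to the output zero point.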

@AndrewZhaoLuo
Contributor

Hmm, do you have a profiler report?

I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

@AndrewZhaoLuo AndrewZhaoLuo merged commit 97b3076 into apache:main Jun 28, 2022
@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 30, 2022

Hmm, do you have a profiler report?

I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

Based on tests/python/frontend/pytorch/test_fx_quant.py, I replaced all relu ops with hswish in resnet50.

  • only one CPU core used
  • int8 models benchmarked

| Quantized Model | Inference Time (msec) |
| --- | --- |
| resnet50 (relu) | 1149 |
| resnet50 (hswish) w/o LUT | 1210 |
| resnet50 (hswish, LUT) | 1171 |

About a 3% speedup from the LUT. I also tried a YOLOv5 model with hswish, which gets about a 9% speedup from the LUT.

@AndrewZhaoLuo
Contributor

Hmm, yeah, this makes sense. I would expect the LUT to be slower than ReLU, since it requires more memory accesses.

I suspect the activation functions just don't take much time. ReLU is probably close to the fastest you can go. You could estimate the upper bound on the speedup by removing all activations.

Still technically a little bit of improvement!
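For context on why ReLU is hard to beat: under the usual affine int8 scheme x = scale * (q - zp) with scale > 0, ReLU needs no table and no float math at all, because max(x, 0) in the real domain is exactly max(q, zp) in the quantized domain. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def relu_int8(q, zp):
    # relu(scale * (q - zp)) == scale * (max(q, zp) - zp) when scale > 0,
    # so quantized ReLU is a single integer max: no table, no extra
    # memory traffic beyond the tensor itself.
    return np.maximum(q, zp).astype(np.int8)
```

By contrast, a LUT op must touch both the tensor and the 256-byte table, which is why a gain only shows up for activations that are expensive in the dequantize/fp32/requantize path.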

blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
* v1

* [QNN] Add hardswish int8 impl using table lookup

* format

* format

* fix

* fix utest

* fix ci error

* jostle ci

* triggle ci

* remote nn

* jostle ci

* fix
@zhaoyang-star
Contributor Author

Hmm, do you have a profiler report?
I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

Based on tests/python/frontend/pytorch/test_fx_quant.py, I replaced all relu ops with hswish in resnet50.

  • only one CPU core used
  • int8 models benchmarked

| Quantized Model | Inference Time (msec) |
| --- | --- |
| resnet50 (relu) | 1149 |
| resnet50 (hswish) w/o LUT | 1210 |
| resnet50 (hswish, LUT) | 1171 |

About a 3% speedup from the LUT. I also tried a YOLOv5 model with hswish, which gets about a 9% speedup from the LUT.

Maybe there was something wrong with how I created the resnet50 with hswish.
I used a quantized YOLOv5s that contains hswish, and the perf improved by 50.2% ^_^

| Quantized Model | Inference Time (msec) |
| --- | --- |
| YOLOv5s (hswish) w/o LUT | 18.88 |
| YOLOv5s (hswish, LUT) | 12.57 |

masahi pushed a commit to masahi/tvm that referenced this pull request Jul 15, 2022
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023

3 participants