[QNN] Add hardswish int8 impl using table lookup #11700

Merged
AndrewZhaoLuo merged 13 commits into apache:main from zhaoyang-star:hardswish_int8
Jun 28, 2022

Conversation

@zhaoyang-star
Contributor

@zhaoyang-star zhaoyang-star commented Jun 14, 2022

Registered the hardswish unary elementwise op

@AndrewZhaoLuo @mbrookhart

@AndrewZhaoLuo AndrewZhaoLuo self-requested a review June 16, 2022 20:47
@AndrewZhaoLuo
Contributor

I'll take a look today or tomorrow

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 20, 2022

I'll take a look today or tomorrow

So far, two GPU CI tasks and the Hexagon tasks have failed. Could you please help figure out the errors? Thanks in advance.
Failed task commands on GPU (they pass on CPU):

python3 tests/scripts/ci.py gpu --tests tests/python/frontend/pytorch/test_fx_quant.py::test_deeplab_v3
python3 tests/scripts/ci.py gpu --tests tests/python/frontend/pytorch/qnn_test.py::test_quantized_module

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment

LGTM, the hexagon was probably just flaky so you can rerun CI with an empty commit:

git commit -m 'jostle ci' --allow-empty and git push

As for the other errors, see the comment

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 23, 2022

LGTM, the hexagon was probably just flaky so you can rerun CI with an empty commit:

git commit -m 'jostle ci' --allow-empty and git push

As for the other errors, see the comment

Thanks for your kind help. I have fixed the typo and recommitted with git commit -m 'jostle ci' --allow-empty, but the Hexagon tasks still fail. The error log:

LLVM ERROR: Cannot select: 0x451c698: i32 = fp_to_fp16 0x44eba68

It seems CI doesn't skip the Hexagon tasks? @AndrewZhaoLuo

@AndrewZhaoLuo
Contributor

I'm not sure, maybe @mehrdadh or @cconvey knows something?

Maybe you just need to rebase on main and try again? Not sure

@mehrdadh
Member

@zhaoyang-star I checked our CI on TVM main and I don't see this error. Can you reproduce the error locally using the ci_hexagon docker image? It might be related to your PR.

@zhaoyang-star
Contributor Author

@mehrdadh Thanks for your quick reply. Maybe it is my PR that caused the failure. @AndrewZhaoLuo
I will double-check the PR and try to reproduce the error locally.

@zhaoyang-star
Contributor Author

I have fixed the error and triggered CI again, but the CI status has been pending for a few hours.
Is there something wrong with the way I submitted the PR? Thanks @AndrewZhaoLuo

@AndrewZhaoLuo
Contributor

AndrewZhaoLuo commented Jun 27, 2022

Nah, it's just flaky -- I think there was some planned work with the CI machines so you got hit with the outage. Just jostle one more time and let's get this merged!

Edit: #11914 <-- due to this.

@AndrewZhaoLuo
Contributor

Also apparently you can do this:

@AndrewZhaoLuo
Contributor

@tvm-bot rerun

@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 28, 2022

One more question:
Theoretically, a lookup table (LUT) should accelerate the QNN op: the original implementation is dequantize -> fp32 compute -> quantize, while the LUT is created once and we just index the table during inference. But when I run an int8 YOLOv5 PyTorch model, which has 57 quantized::hardswish ops, the inference time is almost the same as before. Is there something I misunderstand, or something wrong with the code? @AndrewZhaoLuo
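The two paths described above can be sketched in plain NumPy; the helper names here are hypothetical for illustration, not TVM's actual implementation. Since an int8 input has only 256 possible values, the whole dequantize -> hardswish -> quantize chain can be precomputed once into a 256-entry table:

```python
import numpy as np

def hardswish_int8_lut(scale_in, zp_in, scale_out, zp_out):
    """Build a 256-entry table mapping every int8 input to its
    quantized hardswish output (hypothetical helper name)."""
    q = np.arange(-128, 128, dtype=np.int32)       # all possible int8 values
    x = scale_in * (q - zp_in)                     # dequantize
    y = x * np.clip(x + 3.0, 0.0, 6.0) / 6.0       # hardswish in fp32
    out = np.round(y / scale_out) + zp_out         # requantize
    return np.clip(out, -128, 127).astype(np.int8)

def apply_lut(data_int8, lut):
    # At inference time the op is just an index: shift the int8
    # values into [0, 255] and gather from the precomputed table.
    return lut[data_int8.astype(np.int32) + 128]
```

With identity scales (scale 0.1, zero point 0), an input value of 50 dequantizes to 5.0, where hardswish saturates to the identity, so the table maps it back to 50; values at or below -3.0 map to the output zero point.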

@AndrewZhaoLuo
Contributor

Hmm, do you have a profiler report?

I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

@AndrewZhaoLuo AndrewZhaoLuo merged commit 97b3076 into apache:main Jun 28, 2022
@zhaoyang-star
Contributor Author

zhaoyang-star commented Jun 30, 2022

Hmm, do you have a profiler report?

I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

Based on tests/python/frontend/pytorch/test_fx_quant.py, I replaced all relu ops with hswish in resnet50.

  • only one CPU core used
  • int8 models benchmarked

| Quantized Model | Inference Time (msec) |
| --- | --- |
| resnet50 (relu) | 1149 |
| resnet50 (hswish) w/o LUT | 1210 |
| resnet50 (hswish, LUT) | 1171 |

About a 3% speedup from the LUT. I also tried a YOLOv5 model with hswish, which gets about a 9% speedup from the LUT.

@AndrewZhaoLuo
Contributor

Hmm, yeah, this makes sense. I would expect the LUT to be slower than ReLU, since it requires more memory accesses.

I suspect the activation functions just don't take much time. ReLU is probably close to the fastest you can go. You could estimate the upper bound on the speedup by removing all activations.

Still technically a little bit of improvement!
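For context on why ReLU is hard to beat: under the usual affine int8 scheme x = scale * (q - zp) with scale > 0, ReLU needs no table and no float math at all, because max(x, 0) in the real domain is exactly max(q, zp) in the quantized domain. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def relu_int8(q, zp):
    # relu(scale * (q - zp)) == scale * (max(q, zp) - zp) when scale > 0,
    # so quantized ReLU is a single integer max: no table, no extra
    # memory traffic beyond the tensor itself.
    return np.maximum(q, zp).astype(np.int8)
```

By contrast, a LUT op must touch both the tensor and the 256-byte table, which is why a gain only shows up for activations that are expensive in the dequantize/fp32/requantize path.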

blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
* v1

* [QNN] Add hardswish int8 impl using table lookup

* format

* format

* fix

* fix utest

* fix ci error

* jostle ci

* triggle ci

* remote nn

* jostle ci

* fix
@zhaoyang-star
Contributor Author

Hmm, do you have a profiler report?
I am curious since I would expect runtimes to be better vs dq - fp32 - q. Do you have a repo to reproduce?

Based on tests/python/frontend/pytorch/test_fx_quant.py, I replaced all relu ops with hswish in resnet50.

  • only one CPU core used
  • int8 models benchmarked

| Quantized Model | Inference Time (msec) |
| --- | --- |
| resnet50 (relu) | 1149 |
| resnet50 (hswish) w/o LUT | 1210 |
| resnet50 (hswish, LUT) | 1171 |

About a 3% speedup from the LUT. I also tried a YOLOv5 model with hswish, which gets about a 9% speedup from the LUT.

Maybe there was something wrong with how I created the resnet50 with hswish.
I used a quantized YOLOv5s that contains hswish, and the perf improved by 50.2% ^_^

| Quantized Model | Inference Time (msec) |
| --- | --- |
| YOLOv5s (hswish) w/o LUT | 18.88 |
| YOLOv5s (hswish, LUT) | 12.57 |

masahi pushed a commit to masahi/tvm that referenced this pull request Jul 15, 2022
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023

3 participants