-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Relay] Convert a fake quantized or QAT graph into QNN ops #8126
[Relay] Convert a fake quantized or QAT graph into QNN ops #8126
Conversation
Very nice!! cc @anijain2305 @electriclilies I wonder if "quantize" is the best verb for saying "rewrite fake quantized graphs into real integer-quantized ones". But I don't have a better alternative either. |
@masahi @anijain2305 Any thoughts on naming? I still don't love what I have, but I agree with Masa that I haven't been able to come up with anything better... |
I think the repetition of the word quantize is in |
ad608d4
to
2d52eed
Compare
Hmm, that's an interesting idea. To throw other random thoughts out: Any thoughts? Rebased to get around a weird threading bug in another CI test, if people have a naming preference I can refactor while the CI runs to make sure the pass it working. |
@mbrookhart I think Also, I did a quick google search and I think that the term affine space is used when talking about quantization in physics, but I didn't see any references to it in computer science literature. The only thing that comes up if you search "affine space" is stuff about vector spaces, and if you search "quantization affine space" you get physics papers. So I think if we do use the term affine space, we should be careful to explain what we mean by it in code comments and documentation since it's not a term that is commonly used. |
I'm a physicist, that must be why that term makes so much more sense to me :D |
But I'm happy to use |
I also prefer |
Thanks, this is nice addition and improves framework coverage very nicely. I agree that |
Awesome, thanks everyone, I'll refactor to that name. |
Refactor done. Thanks! |
1451a1b
to
92952bb
Compare
@masahi @electriclilies Please approve explicitly when you get a chance. And we can land this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, one nitpick is that you use the word "affine" in a bunch of the documentation and some of the internal names without defining it, which I do think is important since it is a term that isn't used in computer science often. I think at least defining it in the documentation of AffineType and in the documentation of register_fake_quantization_to_integer
would be good.
I don't think it's super important though so you could add the definitions in a later PR.
@mbrookhart Feel free to merge the PR as you decide if want to address Lily's comments in this or next PR. All good from my side. |
@electriclilies Thanks for the suggestions, I added a definition for completeness, I don't want to confuse users. I think that it is a fairly common term in the quantization literature, though, see, for instance, |
4617b8c
to
c150e4f
Compare
Thanks @masahi @anijain2305 @electriclilies |
* Convert a fake quantized or QAT graph into qnn ops * fix pylint * fix typos * use an identify function for some ops * rename the pass from quantize_fake_quantization to fake_quantization_to_integer * add definition for affine
* Convert a fake quantized or QAT graph into qnn ops * fix pylint * fix typos * use an identify function for some ops * rename the pass from quantize_fake_quantization to fake_quantization_to_integer * add definition for affine
Recently, we discovered that tf2onnx is exporting some int8 graphs as fake quantized/QAT models in ONNX, i.e, int8 ops are exported as dequantize->op->quantize.
This PR introduces a pass to convert those graphs into direct int8 ops inside relay. I've tested correctness of the resulting models on Inceptionv1 and ssd-mobilenet-v1 from the tensorflow lite model zoo imported via ONNX. Follow up work will analyze further models for more operations to include in this pass.
cc @AndrewZhaoLuo @masahi @jwfromm