Scale Parameter with Gradient #12

Closed
thuako opened this issue Apr 5, 2021 · 3 comments

Comments

@thuako

thuako commented Apr 5, 2021

Hi, I want to mix your HAWQ-v3 with QNN methods that implement custom gradients for the scale parameters, like PACT, QIL, and LSQ.

I wonder why you didn't try those scale parameters with gradients.

Is there any problem with training, or something else?

I would appreciate your reply.

@Zhen-Dong
Owner

Zhen-Dong commented Apr 26, 2021

Hi, thanks a lot for your interest.
We use the standard quantizer without optimizing the quantization range (aka the scale parameter), because we think it makes the algorithm more general. Otherwise, it's hard to tell whether the accuracy improvement comes from our method or from the clipping/learnable quantizer methods.
Since these methods are orthogonal, I think combining them would not cause problems. Though the gain from adding gradient-based clipping methods may not be significant, since we use 4/8-bit precision, which is higher than binary/ternary, where a well-chosen quantization range is key.
Hope these are helpful.
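
For readers landing here, a minimal sketch of the distinction discussed above (this is an illustrative example, not code from the HAWQ-v3 repository): a standard symmetric quantizer with a fixed scale, versus an LSQ-style quantizer whose scale is a learnable parameter trained through a straight-through estimator (STE). The class and function names below are hypothetical.

```python
# Illustrative sketch, not HAWQ-v3 code: fixed-scale vs. learnable-scale quantization.
import torch
import torch.nn as nn

def quantize_fixed(x, scale, num_bits=4):
    """Standard symmetric uniform quantizer with a fixed (non-learnable) scale."""
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized (fake-quantized) value

class LearnableScaleQuantizer(nn.Module):
    """LSQ-style quantizer: the step size is an nn.Parameter and receives
    gradients through a straight-through estimator on the rounding op."""
    def __init__(self, num_bits=4, init_scale=0.1):
        super().__init__()
        self.qmax = 2 ** (num_bits - 1) - 1
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        s = self.scale.abs() + 1e-8          # keep the step size positive
        q = torch.clamp(x / s, -self.qmax - 1, self.qmax)
        # STE: forward pass uses round(q), backward treats rounding as identity
        q = (q.round() - q).detach() + q
        return q * s

# Usage: the fixed version relies on a pre-computed scale (e.g. from min/max
# statistics), while the learnable version updates its scale by backprop.
x = torch.randn(8, 16)
print(quantize_fixed(x, scale=0.1).shape)
print(LearnableScaleQuantizer()(x).shape)
```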

@thuako
Author

thuako commented May 25, 2021

Thank you for the reply :)

thuako closed this as completed May 25, 2021
@gihwan-kim

@Zhen-Dong

> Hi, thanks a lot for your interest. We use the standard quantizer without optimizing the quantization range (aka the scale parameter), because we think it makes the algorithm more general. Otherwise, it's hard to tell whether the accuracy improvement comes from our method or from the clipping/learnable quantizer methods. Since these methods are orthogonal, I think combining them would not cause problems. Though the gain from adding gradient-based clipping methods may not be significant, since we use 4/8-bit precision, which is higher than binary/ternary, where a well-chosen quantization range is key. Hope these are helpful.

Is it okay to use the same scaling factor even if the input data is different?
I can't find any code that calculates the scaling factor in the quantized model.
It only uses fixed scaling factors.

I think your method uses a fixed scaling factor for the input data and for the inputs of the layers in the TVM Relay code.
But if it uses a fixed scaling factor for the input data, I think it will hurt accuracy or give the same inference result.
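
For context, here is a rough sketch of what static (calibration-based) activation quantization usually looks like; the function names are hypothetical and not taken from the HAWQ-v3 or TVM code. The scale is computed once from calibration data and then reused unchanged for every inference input, which is the fixed-scaling-factor behaviour described above.

```python
# Illustrative sketch (hypothetical names): static activation quantization,
# where one fixed scale is calibrated offline and reused for every input.
import torch

def calibrate_scale(calibration_batches, num_bits=8):
    """Pick one fixed scale from the maximum absolute activation seen during calibration."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(batch.abs().max().item() for batch in calibration_batches)
    return max_abs / qmax

def quantize_with_fixed_scale(x, scale, num_bits=8):
    """Apply the pre-computed scale to a new input; the scale itself never changes."""
    qmax = 2 ** (num_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# The same fixed scale is applied to every new input at inference time.
calib = [torch.randn(4, 16) for _ in range(10)]
s = calibrate_scale(calib)
out = quantize_with_fixed_scale(torch.randn(4, 16), s)
```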
