
support dynamic quantization of pytorch #3058

Open
zaobao opened this issue Apr 2, 2024 · 0 comments
Labels
enhancement New feature or request

Comments


zaobao commented Apr 2, 2024

Description

Support dynamic quantization: weights are quantized ahead of time, while activations are read and stored in floating point and quantized on the fly for compute.
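In PyTorch, this mode is exposed through `torch.ao.quantization.quantize_dynamic`. A minimal sketch of what the requested feature would wrap (the `TinyNet` model here is a made-up example, not part of this issue):

```python
import torch
import torch.nn as nn

# A small float model; any module containing nn.Linear layers works.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().eval()

# Dynamic quantization: weights are converted to int8 once, ahead of time;
# activations stay in floating point and are quantized on the fly at compute time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 16))
```

Only the module types listed in the second argument (here `nn.Linear`) are replaced with dynamically quantized equivalents; the rest of the model runs unchanged.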

Will this change the current API?

  • Model.quantize

Who will benefit from this enhancement?
Today, PyTorch supports the following backends for running quantized operators efficiently:

  • x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via the x86 backend, optimized by fbgemm and onednn (see the details in the RFC)

  • ARM CPUs (typically found in mobile/embedded devices), via qnnpack

  • (early prototype) support for NVIDIA GPUs via TensorRT through fx2trt (to be open sourced)
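The backend actually used is selected through PyTorch's quantized engine setting; a short sketch of how a caller can inspect and pick one (the choice of `fbgemm` here is illustrative):

```python
import torch

# The quantized engines compiled into this PyTorch build are exposed at
# runtime; typical entries are 'fbgemm'/'x86' on x86 and 'qnnpack' on ARM.
engines = torch.backends.quantized.supported_engines
print(engines)

# Select an engine explicitly before running quantized operators.
if "fbgemm" in engines:
    torch.backends.quantized.engine = "fbgemm"
```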

References

@zaobao zaobao added the enhancement New feature or request label Apr 2, 2024