
support dynamic quantization of pytorch #3058

Open
zaobao opened this issue Apr 2, 2024 · 0 comments
Labels
enhancement New feature or request

Comments


zaobao commented Apr 2, 2024

Description

Support dynamic quantization: weights are quantized ahead of time, while activations are read and stored in floating point and quantized on the fly for compute.
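In PyTorch, this mode is exposed through `torch.ao.quantization.quantize_dynamic`. A minimal sketch of what the requested feature would wrap (the `TinyNet` model here is a made-up example, not part of this issue):

```python
import torch
import torch.nn as nn

# A small float model; any module containing nn.Linear layers works.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().eval()

# Dynamic quantization: weights are converted to int8 once, ahead of time;
# activations stay in floating point and are quantized on the fly at compute time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 16))
```

Only the module types listed in the second argument (here `nn.Linear`) are replaced with dynamically quantized equivalents; the rest of the model runs unchanged.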

Will this change the current API?

  • Model.quantize

Who will benefit from this enhancement?
Today, PyTorch supports the following backends for running quantized operators efficiently:

  • x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via the x86 backend, optimized by fbgemm and onednn (see the details in the RFC)

  • ARM CPUs (typically found in mobile/embedded devices), via qnnpack

  • (early prototype) support for NVIDIA GPUs via TensorRT through fx2trt (to be open sourced)
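The backend actually used is selected through PyTorch's quantized engine setting; a short sketch of how a caller can inspect and pick one (the choice of `fbgemm` here is illustrative):

```python
import torch

# The quantized engines compiled into this PyTorch build are exposed at
# runtime; typical entries are 'fbgemm'/'x86' on x86 and 'qnnpack' on ARM.
engines = torch.backends.quantized.supported_engines
print(engines)

# Select an engine explicitly before running quantized operators.
if "fbgemm" in engines:
    torch.backends.quantized.engine = "fbgemm"
```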

References

@zaobao zaobao added the enhancement New feature or request label Apr 2, 2024