
Inference on a model prequantized in pytorch #770

Closed
chrschinab opened this issue Sep 7, 2020 · 2 comments

@chrschinab

Hello,
I want to run inference on a ResNet50 model prequantized to INT8 in PyTorch. Is there a way to deploy this model in TensorRT? As far as I understand, only a quantized ONNX model is accepted by TensorRT, but PyTorch currently does not support conversion to quantized ONNX.
Is it also possible to deploy a model prequantized in TFLite with TensorRT?
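(For context, "prequantized to INT8 in PyTorch" here means something like the eager-mode post-training static quantization flow; a minimal sketch, where `calib_loader` is a hypothetical DataLoader providing a few representative batches:)

```python
import torch
import torchvision

# Eager-mode post-training static quantization of ResNet50; calib_loader is a
# hypothetical DataLoader yielding a few representative calibration batches.
model = torchvision.models.quantization.resnet50(pretrained=True).eval()
model.fuse_model()                                  # fuse Conv+BN+ReLU blocks
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)     # insert observers

with torch.no_grad():
    for images, _ in calib_loader:                  # calibration pass
        model(images)

torch.quantization.convert(model, inplace=True)     # weights/activations -> INT8
```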

@mk-nvidia

A future release of TensorRT will be able to import a prequantized model from PyTorch. These models can be quantized using the toolkit we released earlier.
We don’t yet support deploying a prequantized model from TFLite.
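(The toolkit referred to is presumably the pytorch-quantization package shipped in the TensorRT repository. A rough sketch of its calibration flow, paraphrased from its examples; `calib_loader` is a hypothetical DataLoader, and the exact calibrator setup may differ from what is shown here:)

```python
import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn so Conv/Linear layers are created with quantizers,
# then build the model as usual.
quant_modules.initialize()
model = torchvision.models.resnet50(pretrained=True).cuda().eval()

# Calibrate: disable quantization, enable calibrators, run a few batches,
# then load the collected amax values and re-enable quantization.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()

with torch.no_grad():
    for images, _ in calib_loader:      # hypothetical calibration DataLoader
        model(images.cuda())

for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()
```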

@skyw

skyw commented Nov 17, 2020

Hi,

PyTorch does support exporting (fake) quantized models to ONNX. Per-channel support hasn't been merged to master yet, though, because it depends on ONNX opset 13. See pytorch/pytorch#42835.
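(For anyone following along, a rough sketch of what that export path can look like, assuming a per-tensor qconfig, since per-channel weight fake-quant needs opset 13 per the PR above, and a PyTorch build with fake-quant ONNX export support:)

```python
import torch
import torchvision
from torch.quantization import (QConfig, FakeQuantize,
                                MovingAverageMinMaxObserver, prepare_qat)

model = torchvision.models.resnet50(pretrained=True)

# Per-tensor fake-quant for both weights and activations; per-channel export
# would need ONNX opset 13 (see pytorch/pytorch#42835).
model.qconfig = QConfig(
    activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                      quant_min=0, quant_max=255),
    weight=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                  quant_min=-128, quant_max=127,
                                  dtype=torch.qint8,
                                  qscheme=torch.per_tensor_symmetric))
model = prepare_qat(model.train())      # insert FakeQuantize modules

# ... fine-tune / calibrate here so the observers pick up ranges ...

model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_fakequant.onnx", opset_version=12)
```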
