
Inference on a model prequantized in pytorch #770

Closed
chrschinab opened this issue Sep 7, 2020 · 2 comments

@chrschinab

Hello,
I want to run inference on a ResNet50 model prequantized to INT8 in PyTorch. Is there a way to deploy this model in TensorRT? As far as I understand, only a quantized ONNX model is accepted by TensorRT, but PyTorch currently does not support conversion to quantized ONNX.
Is it also possible to deploy a model prequantized in TFLite with TensorRT?
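(For context, "prequantized to INT8 in PyTorch" here means something like the eager-mode post-training static quantization flow; a minimal sketch, where `calib_loader` is a hypothetical DataLoader providing a few representative batches:)

```python
import torch
import torchvision

# Eager-mode post-training static quantization of ResNet50; calib_loader is a
# hypothetical DataLoader yielding a few representative calibration batches.
model = torchvision.models.quantization.resnet50(pretrained=True).eval()
model.fuse_model()                                  # fuse Conv+BN+ReLU blocks
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)     # insert observers

with torch.no_grad():
    for images, _ in calib_loader:                  # calibration pass
        model(images)

torch.quantization.convert(model, inplace=True)     # weights/activations -> INT8
```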

@mk-nvidia

A future release of TensorRT will be able to import a prequantized model from PyTorch. These models can be quantized using the toolkit we released earlier.
We don’t yet support deploying a prequantized model from TFLite.
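(The toolkit referred to is presumably the pytorch-quantization package shipped in the TensorRT repository. A rough sketch of its calibration flow, paraphrased from its examples; `calib_loader` is a hypothetical DataLoader, and the exact calibrator setup may differ from what is shown here:)

```python
import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn so Conv/Linear layers are created with quantizers,
# then build the model as usual.
quant_modules.initialize()
model = torchvision.models.resnet50(pretrained=True).cuda().eval()

# Calibrate: disable quantization, enable calibrators, run a few batches,
# then load the collected amax values and re-enable quantization.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()

with torch.no_grad():
    for images, _ in calib_loader:      # hypothetical calibration DataLoader
        model(images.cuda())

for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()
```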

@skyw

skyw commented Nov 17, 2020

Hi,

PyTorch does support exporting (fake) quantized models to ONNX. Per-channel support hasn't been merged to master yet, though, because it depends on ONNX opset 13. See pytorch/pytorch#42835.
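(For anyone following along, a rough sketch of what that export path can look like, assuming a per-tensor qconfig, since per-channel weight fake-quant needs opset 13 per the PR above, and a PyTorch build with fake-quant ONNX export support:)

```python
import torch
import torchvision
from torch.quantization import (QConfig, FakeQuantize,
                                MovingAverageMinMaxObserver, prepare_qat)

model = torchvision.models.resnet50(pretrained=True)

# Per-tensor fake-quant for both weights and activations; per-channel export
# would need ONNX opset 13 (see pytorch/pytorch#42835).
model.qconfig = QConfig(
    activation=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                      quant_min=0, quant_max=255),
    weight=FakeQuantize.with_args(observer=MovingAverageMinMaxObserver,
                                  quant_min=-128, quant_max=127,
                                  dtype=torch.qint8,
                                  qscheme=torch.per_tensor_symmetric))
model = prepare_qat(model.train())      # insert FakeQuantize modules

# ... fine-tune / calibrate here so the observers pick up ranges ...

model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_fakequant.onnx", opset_version=12)
```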
