
PyTorch to TFLite with int8 quantization #274

Closed
Tongsen99 opened this issue Dec 22, 2023 · 4 comments
Labels
question Further information is requested

Comments

@Tongsen99

converter = TFLiteConverter(
model,
dummy_input,
tflite_path=output_path,
strict_symmetric_check=True,
quantize_target_type='int8',
    quantize_input_output_type='int8',
    fuse_quant_dequant=True,
)
In the call above, does `dummy_input` need to be of type int8? If I set it to int8, I get `RuntimeError: expected scalar type Char but found Float`; if I set it to float32, I get `AssertionError: Unsupported types: input_type: float32, quantize_input_type: int8`.
How can I resolve this and fully convert a PyTorch model to TFLite with int8 quantization (including the inputs and outputs)?

@peterjc123 peterjc123 added the question Further information is requested label Dec 22, 2023
@peterjc123
Collaborator

If you need static quantization (QAT or PTQ), please follow the sample code under examples/quantization (passing a floating-point model directly to TFLiteConverter will not give you a statically quantized model). For dynamic quantization, see examples/converter/dynamic.py.

@Tongsen99
Author

I am using exactly the code from examples/converter/dynamic.py, just with a different classification model, hoping to get an int8-quantized TFLite model. I changed the dummy input that gets passed in to `dummy_input = torch.ones((1, 3, 64, 64))`, but it raises an error whether I make dummy_input int8 or float32.

@peterjc123
Collaborator

peterjc123 commented Dec 23, 2023

There is a series of further parameters below that point in dynamic.py; you did not copy all of them.
Also, the two parameters below are for full quantization; do not pass them for dynamic quantization:
quantize_input_output_type='int8',
fuse_quant_dequant=True,

@peterjc123
Collaborator

peterjc123 commented Dec 23, 2023

> I am using exactly the code from examples/converter/dynamic.py, just with a different classification model, hoping to get an int8-quantized TFLite model. I changed the dummy input that gets passed in to `dummy_input = torch.ones((1, 3, 64, 64))`, but it raises an error whether I make dummy_input int8 or float32.

If by "int8 quantization" you mean a fully quantized model, you need to use qat.py or post.py under examples/quantization.
For a comparison of the different quantization techniques, see the TFLite documentation:

https://www.tensorflow.org/lite/performance/post_training_quantization?hl=zh-cn

In short, dynamic quantization is weight-only quantization, while static quantization quantizes both the weights and the activations.
Full quantization is faster and works on NPUs and other accelerator chips.
Alternatively, you can install netron, run our sample code, and open the generated models to see the difference between the two quantization modes.

Update: I re-read your first post. Since you want int8 inputs and outputs, what you need is definitely full quantization; please follow the full-quantization code samples. Thanks.
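To make the weight-only vs. full-quantization distinction above concrete, here is a toy pure-Python sketch of the arithmetic (no TFLite involved; all helper names are made up for illustration — real kernels use per-channel scales, zero points, and fused requantization):

```python
def quantize_int8(x, scale):
    """Symmetric int8 quantization: round(x / scale), clamped to [-127, 127]."""
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize(q, scale):
    return q * scale

weight, activation = 0.5, 1.25
w_scale = 0.01  # chosen offline from the weight range
a_scale = 0.02  # full quantization also fixes an activation scale offline

# Dynamic (weight-only) quantization: weights are stored as int8, but they
# are dequantized at runtime and the multiply happens in float.
w_q = quantize_int8(weight, w_scale)
dynamic_result = dequantize(w_q, w_scale) * activation

# Full (static) quantization: activations are quantized too, so the
# multiply-accumulate runs entirely on integers; only the final rescale
# uses the two scales. This is what NPUs and int8 DSP kernels exploit.
a_q = quantize_int8(activation, a_scale)
full_result = (w_q * a_q) * (w_scale * a_scale)

print(round(dynamic_result, 4))  # → 0.625 (activation untouched, no extra error)
print(round(full_result, 4))     # → 0.62  (small extra error from quantizing the activation)
```

The extra rounding error in the second result is the usual accuracy trade-off of full quantization, which calibration (PTQ) or training (QAT) tries to minimize.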
