The speed of lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite is very slow #3
Hi Author,

Have you ever tried "lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite", the INT8 quantized version? Its inference speed is very slow compared to the float version. My guess is that the quantize/dequantize operations in the model are the root cause, but I am not sure.

Comments
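One quick way to check that guess is to dump the model's operator list and look for QUANTIZE/DEQUANTIZE nodes. A minimal sketch, assuming a recent TensorFlow build that ships `tf.lite.experimental.Analyzer` (older versions do not have it):

```python
# Minimal sketch: print the operator breakdown of the INT8 TFLite model.
# tf.lite.experimental.Analyzer is assumed to be available (TF 2.8+).
import tensorflow as tf

tf.lite.experimental.Analyzer.analyze(
    model_path="lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite"
)
```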
The TensorFlow Lite runtime is not optimized for x86_64 (amd64) CPUs; INT8 inference on a Raspberry Pi (aarch64) is about 10 times faster.
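For reference, latency can be measured directly with the stock interpreter. A minimal sketch; the float16 filename is an assumption based on the TF Hub naming, so adjust the paths as needed:

```python
# Minimal latency sketch comparing the INT8 and float MoveNet Thunder models.
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, runs=50):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Random data of the right shape/dtype; real images are not needed
    # for a pure latency measurement.
    dummy = np.random.randint(0, 256, size=inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000  # ms per inference

print("int8  :", benchmark("lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite"), "ms")
print("float :", benchmark("lite-model_movenet_singlepose_thunder_tflite_float16_4.tflite"), "ms")
```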
Already done. There is also a script for the conversion, so you don't have to do any unnecessary work. If there is a reason why you really want to inverse quantize the INT8 model, you need to use tensorflow-onnx.
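For what it's worth, recent tensorflow-onnx releases can read a TFLite file directly. A minimal sketch, assuming a tf2onnx version that exposes `convert.from_tflite` (check your installed version):

```python
# Sketch: convert the INT8 TFLite model straight to ONNX with tf2onnx.
# tf2onnx.convert.from_tflite is assumed to exist in the installed version.
import tf2onnx

model_proto, _ = tf2onnx.convert.from_tflite(
    "lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite",
    opset=13,
    output_path="thunder_int8.onnx",
)
```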
Big thanks for your suggestions. Actually, the website you mentioned is not what I really need. As you guessed, I intend to inverse quantize the INT8 model to a float32 model. The reason is that the official float model has very bad accuracy after INT8 quantization. Hence, I am hoping to get better INT8 quantization accuracy this way: inverse quantize the INT8 model to float32, then redo the INT8 quantization. Moreover, I need to fine-tune the thunder model based on such a float32 one.

I tried the approach you suggested. Due to issues I could not overcome (some ops are not supported by the related tools), I only succeeded in converting to ONNX float32. Based on the ONNX model, I read each dequantize layer to get its scale and zero_point data, and then inverse quantized to float32 using the standard formula, float = scale * (int_value - zero_point).

I finally applied all the extracted weights/biases to the thunder model and saved it as a .pth model. The model runs inference normally, but the results are very bad. I have no idea what the reason could be.
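In case it helps others, here is a rough sketch of the extraction step described above. It assumes the weights feed `DequantizeLinear` nodes as initializers, and that quantization is per-tensor (per-channel scales would need reshaping along the quantized axis before broadcasting); the ONNX filename is hypothetical:

```python
# Sketch: recover float32 weights from an INT8 ONNX model using
#   float = scale * (int_value - zero_point)
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("thunder_int8.onnx")  # path is an assumption
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

float_weights = {}
for node in model.graph.node:
    if node.op_type != "DequantizeLinear":
        continue
    if node.input[0] not in inits:
        continue  # dequantizes an activation, not a stored weight
    q = inits[node.input[0]].astype(np.int32)
    scale = inits[node.input[1]]
    zp = inits[node.input[2]].astype(np.int32) if len(node.input) > 2 else 0
    float_weights[node.output[0]] = (scale * (q - zp)).astype(np.float32)

print(f"recovered {len(float_weights)} float32 weight tensors")
```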
@PINTO0309 @liamsun2019 |
Sorry to bother you, but it's not that easy:

```python
TF_PATH = "./my_tf_model.pb"
tf_rep = prepare(onnx_model)  # creating a TensorflowRep object
```

where simplified.onnx is the model simplified by onnxsim, using the above model.onnx as the input. I cannot find any similar information by googling.
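For context, the snippet above looks like the usual onnx-tf flow. A minimal sketch of the whole sequence, assuming onnx and onnx-tf are installed (note that newer onnx-tf versions write a SavedModel directory rather than a single .pb):

```python
# Sketch of the full onnx-tf conversion the snippet above comes from.
# "simplified.onnx" is the onnxsim output mentioned above.
import onnx
from onnx_tf.backend import prepare

TF_PATH = "./my_tf_model.pb"  # newer onnx-tf writes a SavedModel directory here
onnx_model = onnx.load("simplified.onnx")
tf_rep = prepare(onnx_model)   # TensorflowRep object
tf_rep.export_graph(TF_PATH)
```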