
The speed of lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite is very slow #3

Closed
liamsun2019 opened this issue Nov 24, 2021 · 7 comments

Comments

@liamsun2019

Hi Author,

Have you ever tried "lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite", the int8-quantized version? Its inference speed is very slow compared to the float version. I suspect the quantize/dequantize operations in the model are the root cause, but I am not sure.
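For reference, this is roughly how I compare the two models on my machine (a minimal sketch; the float model filename and the timing loop are simplified placeholders):

    import time
    import numpy as np
    import tensorflow as tf

    def benchmark(model_path, runs=50):
        # Load the tflite model and allocate its tensors.
        interpreter = tf.lite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        inp = interpreter.get_input_details()[0]

        # Feed dummy data with the expected shape and dtype.
        interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))
        interpreter.invoke()  # warm-up

        start = time.perf_counter()
        for _ in range(runs):
            interpreter.invoke()
        print(model_path, (time.perf_counter() - start) / runs * 1000, 'ms per inference')

    benchmark('lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite')
    benchmark('lite-model_movenet_singlepose_thunder_3.tflite')  # float version, filename assumed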

@PINTO0309

The TensorFlow Lite runtime is not optimized for x86_64 (amd64) CPUs; INT8 inference on a Raspberry Pi (aarch64) is about 10 times faster.

@liamsun2019
Author

Got it, that should be the reason. Another question: I reviewed the structure of lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite, which is a quantization-aware trained model. I intend to extract the weight and bias values and convert them back to float. For the weights, the quantization type is int8, while the biases appear to be int32. For instance:
[screenshot]

I have no idea how to convert the biases back to float, since there is only an int8 quantization attribute in the corresponding layer, such as:
[screenshot]
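For reference, this is roughly how I am reading the tensors and their quantization parameters out of the tflite file (a minimal sketch; the per-channel reshaping is simplified):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(
        model_path='lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite')
    interpreter.allocate_tensors()

    for detail in interpreter.get_tensor_details():
        params = detail['quantization_parameters']
        scales, zero_points = params['scales'], params['zero_points']
        # NOTE: int32 bias tensors may also carry their own scale here (zero_point 0).
        if scales.size == 0:
            continue  # tensor carries no quantization info
        try:
            q = interpreter.get_tensor(detail['index']).astype(np.float32)
        except ValueError:
            continue  # activation tensor without constant data
        # Reshape scale/zero_point so they broadcast along the quantized axis.
        shape = [1] * q.ndim
        if scales.size > 1:
            shape[params['quantized_dimension']] = scales.size
        r = (q - zero_points.reshape(shape)) * scales.reshape(shape)
        print(detail['name'], detail['dtype'], '->', r.shape)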

Any suggestions? Thanks a lot.

@PINTO0309

PINTO0309 commented Nov 25, 2021

It already exists. There is also a script for the conversion, so you don't have to do any unnecessary work.
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/115_MoveNet

If there is a reason why you really want to inverse quantize the INT8 model, you need to use tensorflow-onnx.
tflite INT8 -> tensorflow-onnx -> onnx Float32 -> tflite Float32
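The last leg of that chain (the Float32 SavedModel exported by onnx-tensorflow back to a Float32 tflite) is just the standard converter. A minimal sketch, assuming the SavedModel was exported to ./saved_model and with an illustrative output filename:

    import tensorflow as tf

    # Convert the Float32 SavedModel back to a Float32 tflite model.
    converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model')
    tflite_model = converter.convert()

    with open('movenet_thunder_float32.tflite', 'wb') as f:
        f.write(tflite_model)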

@liamsun2019
Author

Big thanks for your suggestions. Actually, the website you mentioned is not what I really need. As you guessed, I intend to inverse-quantize the INT8 model to a float32 model. The reason is that the official float model has very bad accuracy after int8 quantization. Hence, I am thinking about getting better int8 quantization accuracy this way, i.e., inverse-quantize the INT8 model to float32, and then do the int8 quantization again. Moreover, I need to fine-tune the thunder model based on such a float32 model.
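For the re-quantization step I have in mind the standard post-training INT8 flow, roughly like the sketch below (the SavedModel path and the representative dataset are placeholders):

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Placeholder: should yield real, preprocessed 256x256 thunder inputs.
        for _ in range(100):
            yield [np.random.rand(1, 256, 256, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model('./thunder_float32_saved_model')
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset

    with open('thunder_int8.tflite', 'wb') as f:
        f.write(converter.convert())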

I tried the approach you suggested:
tflite INT8 -> tensorflow-onnx -> onnx Float32 -> tflite Float32

Due to some issues that I could not overcome (some ops are not supported by the related tools), I only succeeded in converting to ONNX float32. Based on the ONNX model, I checked the dequantize layers to get the scale and zero_point data, and then inverse-quantized to float32 simply using the formula:
R = (Q - Z) * S
where R is the float32 value, Q is the quantized int8 value, and Z and S are the zero point and scale, respectively.

e.g.:
[screenshot]
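Concretely, the extraction looks roughly like this (a sketch; it assumes the quantized data, scale and zero_point of each DequantizeLinear node are stored as graph initializers):

    import numpy as np
    import onnx
    from onnx import numpy_helper

    model = onnx.load('model.onnx')
    inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}

    for node in model.graph.node:
        if node.op_type != 'DequantizeLinear':
            continue
        x_name, s_name = node.input[0], node.input[1]
        if x_name not in inits:
            continue  # this node dequantizes an activation, not a stored weight/bias
        q = inits[x_name].astype(np.float32)
        s = inits[s_name]
        z = inits[node.input[2]].astype(np.float32) if len(node.input) > 2 else 0.0
        # Per-axis quantization: reshape scale/zero_point to broadcast along 'axis'.
        axis = next((a.i for a in node.attribute if a.name == 'axis'), 1)
        if s.ndim == 1 and q.ndim > 1:
            shape = [1] * q.ndim
            shape[axis] = s.shape[0]
            s = s.reshape(shape)
            z = z.reshape(shape) if isinstance(z, np.ndarray) else z
        r = (q - z) * s  # R = (Q - Z) * S
        print(node.name, x_name, inits[x_name].dtype, '->', r.shape)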

I finally applied all of the extracted weights/biases to the thunder model and saved it as a .pth model. The model can run inference normally, but the results are very bad. I have no idea what the reason could be.

@PINTO0309

This error ("some ops are not supported in related tools") is easy to work around. I won't go into details in this discussion because I have explained it so many times here and there that it has become tedious.
PINTO0309/PINTO_model_zoo#150

@Kazuhito00
Owner

@PINTO0309
Thank you very much for your answer.

@liamsun2019
I think the original question has been resolved.
Inverse quantization and post-quantization accuracy improvements are out of scope for this repository, so this issue will be closed.

@liamsun2019
Author

Sorry to bother you, but it's not that easy.

  1. Convert tflite to onnx:
    python3.6 -m tf2onnx.convert --opset 13 --tflite lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite --output model.onnx --inputs-as-nchw serving_default_input:0
    The opset must be set to 13, since per-axis quantization must be supported for some layers. The above command executes normally on my side.

  2. Convert onnx to tf:
    from onnx_tf.backend import prepare
    import onnx

    TF_PATH = "./my_tf_model.pb"      # output path for the exported graph
    ONNX_PATH = "./simplified.onnx"   # onnx model simplified by onnxsim
    onnx_model = onnx.load(ONNX_PATH)

    tf_rep = prepare(onnx_model)      # create a TensorflowRep object
    tf_rep.export_graph(TF_PATH)      # export the TensorFlow graph

where simplified.onnx is the model simplified by onnxsim, using the above model.onnx as the input.
The following error is reported:
onnx_tf_prefix_truediv_1;truediv_1/y1_prequant' is not a valid scope name

I cannot find any similar information by googling.
