The speed of lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite is very slow #3
Hi Author,

Have you ever tried "lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite", the INT8 quantized version? Its inference speed is very slow compared to the float version. My guess is that the quantize/dequantize operations in the model are the root cause, but I am not sure.

Comments
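One quick way to check that guess is to dump the model's operator list and look for QUANTIZE/DEQUANTIZE nodes. A minimal sketch, assuming a recent TensorFlow build that ships `tf.lite.experimental.Analyzer` (older versions do not have it):

```python
# Minimal sketch: print the operator breakdown of the INT8 TFLite model.
# tf.lite.experimental.Analyzer is assumed to be available (TF 2.8+).
import tensorflow as tf

tf.lite.experimental.Analyzer.analyze(
    model_path="lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite"
)
```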
The TensorFlow Lite runtime is not optimized for x86_64 (amd64) CPUs; INT8 inference on a Raspberry Pi (aarch64) is about 10 times faster.
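For reference, latency can be measured directly with the stock interpreter. A minimal sketch; the float16 filename is an assumption based on the TF Hub naming, so adjust the paths as needed:

```python
# Minimal latency sketch comparing the INT8 and float MoveNet Thunder models.
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, runs=50):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    # Random data of the right shape/dtype; real images are not needed
    # for a pure latency measurement.
    dummy = np.random.randint(0, 256, size=inp["shape"]).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000  # ms per inference

print("int8  :", benchmark("lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite"), "ms")
print("float :", benchmark("lite-model_movenet_singlepose_thunder_tflite_float16_4.tflite"), "ms")
```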
Already done. There is also a script for the conversion, so you don't have to do any unnecessary work. If there is a reason why you really want to inverse quantize the INT8 model, you need to use tensorflow-onnx.
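For what it's worth, recent tensorflow-onnx releases can read a TFLite file directly. A minimal sketch, assuming a tf2onnx version that exposes `convert.from_tflite` (check your installed version):

```python
# Sketch: convert the INT8 TFLite model straight to ONNX with tf2onnx.
# tf2onnx.convert.from_tflite is assumed to exist in the installed version.
import tf2onnx

model_proto, _ = tf2onnx.convert.from_tflite(
    "lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite",
    opset=13,
    output_path="thunder_int8.onnx",
)
```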
Big thanks for your suggestions. Actually, the website you mentioned is not what I really need. As you guessed, I intend to inverse quantize the INT8 model to a float32 model. The reason is that the official float model has very bad accuracy after INT8 quantization. Hence, I am hoping to get better INT8 quantization accuracy this way: inverse quantize the INT8 model to float32, then redo the INT8 quantization. Moreover, I need to fine-tune the thunder model based on such a float32 one.

I tried the approach you suggested. Due to issues I could not overcome (some ops are not supported by the related tools), I only succeeded in converting to ONNX float32. Based on the ONNX model, I read each dequantize layer to get its scale and zero_point data, and then inverse quantized to float32 using the standard formula, float = scale * (int_value - zero_point).

I finally applied all the extracted weights/biases to the thunder model and saved it as a .pth model. The model runs inference normally, but the results are very bad. I have no idea what the reason could be.
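In case it helps others, here is a rough sketch of the extraction step described above. It assumes the weights feed `DequantizeLinear` nodes as initializers, and that quantization is per-tensor (per-channel scales would need reshaping along the quantized axis before broadcasting); the ONNX filename is hypothetical:

```python
# Sketch: recover float32 weights from an INT8 ONNX model using
#   float = scale * (int_value - zero_point)
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("thunder_int8.onnx")  # path is an assumption
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

float_weights = {}
for node in model.graph.node:
    if node.op_type != "DequantizeLinear":
        continue
    if node.input[0] not in inits:
        continue  # dequantizes an activation, not a stored weight
    q = inits[node.input[0]].astype(np.int32)
    scale = inits[node.input[1]]
    zp = inits[node.input[2]].astype(np.int32) if len(node.input) > 2 else 0
    float_weights[node.output[0]] = (scale * (q - zp)).astype(np.float32)

print(f"recovered {len(float_weights)} float32 weight tensors")
```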
@PINTO0309 @liamsun2019 |
Sorry to bother you, but it's not that easy:

```python
TF_PATH = "./my_tf_model.pb"
tf_rep = prepare(onnx_model)  # creating a TensorflowRep object
```

where simplified.onnx is the model simplified by onnxsim, using the above model.onnx as the input. I cannot find any similar information by googling.
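For context, the snippet above looks like the usual onnx-tf flow. A minimal sketch of the whole sequence, assuming onnx and onnx-tf are installed (note that newer onnx-tf versions write a SavedModel directory rather than a single .pb):

```python
# Sketch of the full onnx-tf conversion the snippet above comes from.
# "simplified.onnx" is the onnxsim output mentioned above.
import onnx
from onnx_tf.backend import prepare

TF_PATH = "./my_tf_model.pb"  # newer onnx-tf writes a SavedModel directory here
onnx_model = onnx.load("simplified.onnx")
tf_rep = prepare(onnx_model)   # TensorflowRep object
tf_rep.export_graph(TF_PATH)
```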