
Add extra output for inference #23

Closed
liamsun2019 opened this issue Dec 30, 2021 · 44 comments
Labels
question Further information is requested

Comments

@liamsun2019

Hi Author,

I need to add some extra output tensors that are used only for inference. These tensors are not referenced during training; they are only needed for inference after the conversion to tflite. My intention is to move as many operations as possible into forward so as to reduce the post-processing load that would otherwise have to be implemented in C/C++ code.

For instance, matrix ops such as reshape/sigmoid/multiply are better done on the GPU/NPU than on the CPU.

I added some logic in forward to implement this requirement. The training goes well, but the conversion to tflite fails with the following error message:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/functional_modules.py", line 160, in mul
r = ops.quantized.mul(x, y, scale=self.scale, zero_point=self.zero_point)
RuntimeError: Mul operands should have same data type.

Is there any feasible way or workaround for this scenario?
The script is attached. Thanks.
movenet_qat.zip

@liamsun2019
Author

source code snippet:

center = ret[head]
center_max = torch.sigmoid(center)
center_max = self.maxpool(center_max)
center_peaks = (center_max == center).float()
center = center * center_peaks
ret['filtered_hm'] = center

where ret['filtered_hm'] is one of the extra outputs. The error message is presumably related to that.

@liamsun2019
Author

If I change the code as follows:
center = ret[head]
center_max = torch.sigmoid(center)
center_max = self.maxpool(center_max)
ret['hm_hmax'] = center_max

Another error comes up:
assert tensor.q_zero_point() == 128, "As for symmetric quantization, "
AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation.

The script is attached.
movenet_qat.zip

@liamsun2019
Author

The above experiments are based on the recent version.

@liamsun2019
Author

BTW, the following line
center = ret[head]

is better rewritten as:
center = ret[head].clone()

to avoid being overwritten later. It does not affect the experiment results.

@dinghuanghao added the "question (Further information is requested)" label and removed the "help wanted (Extra attention is needed)" label on Dec 30, 2021
@peterjc123
Collaborator

@liamsun2019

I add some logic in forward to implement this requirement and the training goes well but the conversion to tflite fails with following error message:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/functional_modules.py", line 160, in mul
r = ops.quantized.mul(x, y, scale=self.scale, zero_point=self.zero_point)
RuntimeError: Mul operands should have same data type.

This is because the graph rewriter for quantization doesn't properly handle type casting functions like .float(). At this point, you may rewrite it yourself.

The diff to the model that I made to get it to work is shown below.

317c317,318
<         mul_1 = self.float_functional_simple_13.mul(hm_3, float_1)
---
>         fake_dequant_0 = self.fake_dequant_0(hm_3)
>         mul_1 = fake_dequant_0 * float_1
340,341c341,342
<         fake_dequant_0 = self.fake_dequant_0(hm_3)
<         fake_dequant_1 = self.fake_dequant_1(mul_1)
---
>         # fake_dequant_1 = self.fake_dequant_1(mul_1)
>         fake_dequant_1 = mul_1
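Reading the diff as Python, the rewritten section of the generated forward looks roughly like the sketch below (the names hm_3, float_1, fake_dequant_0, fake_dequant_1 and float_functional_simple_13 come from the generated model file and will differ in your copy):

# before: quantized multiply, which requires both operands to be quantized
#   mul_1 = self.float_functional_simple_13.mul(hm_3, float_1)
# after: dequantize the heatmap first, then multiply in floating point
fake_dequant_0 = self.fake_dequant_0(hm_3)
mul_1 = fake_dequant_0 * float_1

# further down, the extra dequantize of mul_1 is no longer needed,
# because mul_1 is already a float tensor at this point
#   fake_dequant_1 = self.fake_dequant_1(mul_1)
fake_dequant_1 = mul_1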

If I changed the codes like following: center = ret[head] center_max = torch.sigmoid(center) center_max = self.maxpool(center_max) ret['hm_hmax'] = center_max

Another error rises up: assert tensor.q_zero_point() == 128, "As for symmetric quantization, " AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation. Attached the script. movenet_qat.zip

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

from movenet_qat import MoveNet_qat
model = MoveNet_qat()

dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)

# QAT prep
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()

# Invoke once
qat_model(dummy_input) 

# Conversion goes here
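For reference, a minimal sketch of the conversion step that would follow, based on the converter usage that appears later in this thread (not a prescribed recipe):

with torch.no_grad():
    qat_model.eval()
    qat_model.cpu()
    qat_model = torch.quantization.convert(qat_model)
    torch.backends.quantized.engine = 'qnnpack'
    converter = TFLiteConverter(qat_model, dummy_input, tflite_path='test.tflite', asymmetric=False)
    converter.convert()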

@liamsun2019
Author

Big Thanks. I'll try it out later and let you know the results.

@liamsun2019
Author

One more question: is there a simple way for forward to output different tensors under different conditions? For instance, I need o1 and o2 for training, while o3 and o4 are for inference. At the stage of conversion to tflite, I just want o3 and o4 to be output. Do I have to manually edit the .py file to achieve this?

@liamsun2019
Author

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

==> Based on my experiments, this error only comes up after I add some extra outputs to the forward operation. If I remove these extra outputs, the error disappears. I set the max epoch to 2 to conduct the experiments.

@liamsun2019
Author

My script is attached.
movenet_qat.zip

@liamsun2019
Author

My guess is that since the added outputs do not participate in training, the tracer (just a guess) might not trace them correctly.

@liamsun2019
Author

Any updates?
^_^

@peterjc123
Collaborator

Any updates? ^_^

Sorry for the late reply; we were working on something else.

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

==> Based on my experiments, this error only comes up after I add some extra outputs to the forward operation. If I remove these extra outputs, the error disappears. I set the max epoch to 2 to conduct the experiments.

As can be seen in
https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/observer.py#L272-L273 and https://github.com/pytorch/pytorch/blob/402f2934bf380964a403d2e139ec529d1f5bac0e/torch/ao/quantization/utils.py#L148-L176, if you don't run inference once, the min/max values of the observers will remain -inf and inf, so the scale and the zero point will be set to 1 and 0 respectively, which leads to the failed asserts in the converter. There's nothing more I can say without the details of your experiment.
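A minimal illustration of that fallback behaviour (a sketch using the observer class linked above; the exact import path may differ between PyTorch versions):

import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
print(obs.calculate_qparams())  # no data observed yet: falls back to scale=1.0, zero_point=0
obs(torch.randn(4, 4))          # observe some data
print(obs.calculate_qparams())  # now a real scale and zero point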

One more question: is there a simple way for forward to output different tensors under different conditions? For instance, I need o1 and o2 for training, while o3 and o4 are for inference. At the stage of conversion to tflite, I just want o3 and o4 to be output. Do I have to manually edit the .py file to achieve this?

This problem is already covered in our FAQ. So the brief answer is yes, because that's how tracing works.
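As a hypothetical illustration of what the manual edit amounts to (the toy module and the names o1..o4 below are placeholders, not taken from the generated movenet file):

import torch
import torch.nn as nn

class ToyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, 3, padding=1)
        self.maxpool = nn.MaxPool2d(3, stride=1, padding=1)

    def forward(self, x):
        o1 = self.conv(x)                 # training output
        o2 = torch.sigmoid(o1)            # training output
        o3 = self.maxpool(o2)             # inference output
        o4 = (o3 == o2).float() * o2      # inference output
        # return o1, o2                   # the return used while QAT training
        return o3, o4                     # the return used for the tflite conversion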

@liamsun2019
Author

Got it, I will try the approach proposed in the FAQ. Big thanks for your help.

@liamsun2019
Author

I followed the method in the FAQ:

  1. Generate the script for inference. (It looks like quantizer.quantize() forces training mode, so I had to hack my code to generate the script.)

  2. QAT train the model and get the qat_last_model.pth

  3. Based on the script for inference, convert to tflite like the following:

     if __name__ == "__main__":
         qat_model = MoveNet_qat()
         qat_model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)

         dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
         with torch.no_grad():
             qat_model.eval()
             qat_model.cpu()
             qat_model(dummy_input.to('cpu'))
             torch.backends.quantized.engine = 'qnnpack'
             converter = TFLiteConverter(qat_model, dummy_input, tflite_path="test.tflite", asymmetric=False)
             converter.convert()

The tflite can be generated, but the weights/biases in it have already been converted to float32. I actually need a quantized tflite whose weights/biases are int8/int32. How can I achieve that?

@peterjc123
Collaborator

peterjc123 commented Jan 6, 2022

@liamsun2019 You need to go through the quantizer again even though your model has already been QAT-rewritten (because a QAT-rewritten model is still a float model, not a quantized one).

qat_model = MoveNet_qat()
qat_model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)

dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
quantizer = QATQuantizer(qat_model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()

@liamsun2019
Author

if __name__ == "__main__":
    model = MoveNet_qat()
    model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)
    dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
    quantizer = QATQuantizer(model, dummy_input, work_dir='./', config={'backend': "qnnpack", 'force_overwrite': False, 'asymmetric': False, 'per_tensor': False, 'rewrite_graph': False})
    qat_model = quantizer.quantize()
    qat_model(dummy_input.to('cpu'))

    with torch.no_grad():
        qat_model.eval()
        qat_model.cpu()
        qat_model = torch.quantization.convert(qat_model)
        torch.backends.quantized.engine = 'qnnpack'
        converter = TFLiteConverter(qat_model, dummy_input, tflite_path="ohyeah.tflite", asymmetric=False)
        converter.convert()

The following error comes up:
assert tensor.q_zero_point() == 128, "As for symmetric quantization, "
AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation.

I do not need to train anymore, but it looks like I have to. Any suggestions?

@peterjc123
Collaborator

@liamsun2019 Would you please share the code of the class MoveNet_qat?

@liamsun2019
Author

Sure, FYR
test.zip

@peterjc123
Collaborator

peterjc123 commented Jan 7, 2022

@liamsun2019 I can reproduce it locally. Looking into it now.

@peterjc123
Collaborator

It seems the problem is with torch.sigmoid. A tensor with a per-tensor symmetric qscheme becomes per-tensor affine after running through this op, so we need to insert a (re)quantize op after it.
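A small demonstration of this behaviour (a sketch; on recent PyTorch versions the quantized sigmoid kernel uses fixed affine output qparams, roughly scale=1/256 and zero_point=0 for quint8, regardless of the symmetric qparams of its input):

import torch

x = torch.randn(1, 4, 8, 8)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=128, dtype=torch.quint8)  # symmetric u8: zp=128
qy = torch.sigmoid(qx)
print(qy.q_scale(), qy.q_zero_point())  # affine output qparams, no longer zp=128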

@liamsun2019
Author

A tensor with the qscheme of per tensor and symmetric will become per tensor and affine after running through this op, so we need to insert a (re)quantize op after it.
==> I apply a per-channel symmetric qscheme instead of per-tensor. So does this issue also exist for the per-channel symmetric qscheme?

@peterjc123
Collaborator

peterjc123 commented Jan 7, 2022

Just wondering what your target platform is. Taking NNAPI as an example, it supports these common qschemes:

  1. ANEURALNETWORKS_TENSOR_QUANT8_ASYMM (uint8)
  2. ANEURALNETWORKS_TENSOR_QUANT8_ASYMM_SIGNED (int8)
  3. ANEURALNETWORKS_TENSOR_QUANT8_SYMM (int8 with zero point=0)
  4. ANEURALNETWORKS_TENSOR_QUANT8_SYMM_PER_CHANNEL (int8 with zero point=0)

For ops that support per-channel quantization (e.g. Conv2D), you should use (4) for the weight and (2) or (3) for the input. As for other ops, you use (2) or (3). But I don't think the support for (2) is broad enough; usually there is only support for (3).

@peterjc123
Collaborator

peterjc123 commented Jan 7, 2022

A tensor with a per-tensor symmetric qscheme becomes per-tensor affine after running through this op, so we need to insert a (re)quantize op after it. ==> I apply a per-channel symmetric qscheme instead of per-tensor. So does this issue also exist for the per-channel symmetric qscheme?

The activations always use the per-tensor qscheme. That's why I asked the previous question. If your target platform supports (2), then we can just lift the limitation. But if it only supports (3), then we need to insert requantize nodes during QAT graph rewriting.

@liamsun2019
Author

For per-channel QAT, our target platform supports asymmetric_affine int8 for activations, and the weights support perchannel_symmetric_affine int8.

@peterjc123
Collaborator

For per-channel QAT, our target platform supports asymmetric_affine int8 for activations, and the weights support perchannel_symmetric_affine int8.

OK, we will work on it.

@peterjc123
Collaborator

@liamsun2019 I've uploaded the related changes. You may have to use the following line instead when defining the converter object.

converter = TFLiteConverter(qat_model, dummy_input, tflite_path="ohyeah.tflite", asymmetric=True, quantize_target_type='int8')

@liamsun2019
Author

OK. Will update and try it out.

@liamsun2019
Author

I tried the recent version and can get the outputs for inference. Thanks a lot.

@peterjc123
Collaborator

@liamsun2019 Looks like this issue is resolved. I'll close it. Please feel free to open a new issue when you encounter new problems. Again, thanks for supporting our project.

@peterjc123
Collaborator

@liamsun2019 FYI, we've decoupled asymmetric and per_tensor in the quantizer so you are now free to do asymmetric per-channel quantization. Please read here for more details.

@liamsun2019
Author

My understanding is that you support asymmetric per-channel QAT now, right?

@peterjc123
Collaborator

peterjc123 commented Jan 10, 2022

My understanding is that you support asymmetric per-channel QAT now, right?

Yes. But to be precise, with config={'asymmetric': True, 'per_tensor': False} it is:

  1. OPs that support per-channel: weight is symmetric int8, per-channel
  2. OPs that support per-channel: activation is asymmetric int8, per-tensor
  3. Other OPs: weight is symmetric int8, per-tensor
  4. Other OPs: activation is asymmetric int8, per-tensor

Previously, with config={'asymmetric': False, 'per_tensor': False}, we had:

  1. OPs that support per-channel: weight is symmetric int8, per-channel
  2. OPs that support per-channel: activation is symmetric int8, per-tensor
  3. Other OPs: weight is symmetric int8, per-tensor
  4. Other OPs: activation is symmetric int8, per-tensor

@liamsun2019
Author

I just tried with config:

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})

and converter:
converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=True)

The following error is output:
assert tensor.q_zero_point() == asym_s8_offset, "As for asymmetric quantization, "
RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Based on the above message, it looks like some PyTorch ops do not support asymmetric per-channel. As you mentioned before, all the activations only support per-tensor QAT, so the asymmetric per-channel scheme probably cannot be applied to my case.

@peterjc123
Collaborator

@liamsun2019 Please try the following configuration.

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})

converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=True, quantize_target_type='int8')

@liamsun2019
Author

Yes, it works this way, and the resulting QAT tflite seems to always hold weights in the int8 data type.

@peterjc123
Collaborator

@liamsun2019 As far as I know, a model with u8 weights or inputs doesn't support per-channel quantization.

@liamsun2019
Author

Got it. But with the recent version, I encountered the following error when doing symmetric per-channel QAT, which works fine with the old version:

WARNING (tinynn.converter.base) Symmetric quantized model with uint8 is unsupported in most backends of TFLite
Traceback (most recent call last):
File "main.py", line 243, in
main_quant(opt)
File "main.py", line 234, in main_quant
converter.convert()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/base.py", line 285, in convert
self.init_operations()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/base.py", line 250, in init_operations
converter.parse(node, attrs, args, self.common_graph)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/quantized.py", line 116, in parse
self.parse_common(graph_converter)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/quantized.py", line 74, in parse_common
weight_tensor = self.create_attr_tensor(weight)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/base.py", line 198, in create_attr_tensor
return tfl.Tensor(tensor, name, has_buffer=True, asymmetric=self.asymmetric, q_type=self.q_type)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/tflite/base.py", line 198, in init
asym_s8_offset = tensor.q_zero_point()
RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I wonder if something has regressed in the latest code. My config is:

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': False, 'per_tensor': False, 'rewrite_graph': True})

converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=False)

I wonder whether the usage has changed. The above config works well with the old version.

@liamsun2019
Author

One thing to point out: I used exactly the same outputs for training and inference in the above experiment.

@liamsun2019
Author

Looks like I need to set quantize_target_type='int8' explicitly, which was not required before.

@peterjc123
Collaborator

peterjc123 commented Jan 11, 2022

Looks like I need to set quantize_target_type='int8' explicitly, which was not required before.

Yes, the changes in 9b656ce are not backward compatible. You now have to set it explicitly when you use per-channel quantization.

@liamsun2019
Author

Thanks. Another question that may not be related to this issue: I notice that 'Dequantize' nodes exist in some QAT tflite models, while the tflite model converted via tinynn does not have them.

[screenshots: graph of the first model and graph of the second model]

I just wonder how this can happen?

@peterjc123
Collaborator

@liamsun2019 Just curious how you got the model? It seems that you used onnx2tf and then the TFLiteConverter from official TF to get the first model.
As for the first model, it was converted via dynamic range quantization, which tries to quantize all the weights and biases. But since Conv2D is not an op that supports this kind of inference (internally these are called hybrid kernels), the weights will be converted back to floating point, which is why you see the Dequantize nodes in the graph. So it only reduces the size of the model.
The second model is converted via quantization-aware training. As you can see, the weights and biases are quantized, so they actually go through the quantized kernels, and you are likely to see a speedup in model inference.
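For context, dynamic range (post-training) quantization with the official TF converter typically looks like the following sketch, where 'saved_model_dir' is a placeholder path:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight-only / dynamic range quantization
tflite_model = converter.convert()

with open('dynamic_range.tflite', 'wb') as f:
    f.write(tflite_model)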

@liamsun2019
Author

Big thanks for your detailed explanation. I made a mistake when introducing the two graphs. In fact, the first graph comes from a tflite model that is not in QAT representation, but I mistakenly assumed it was QAT.
