Add extra output for inference #23
Comments
Source code snippet: center = ret[head], where ret['filtered_hm'] is one of the extra outputs. The error message is supposed to be related to that.
If I change the code like the following, another error rises up:
The above experiments are based on the recent version.
BTW, the following line would be better rewritten (to avoid being overridden). It does not influence the experiment results.
This is because the graph rewriter for quantization doesn't properly handle type casting functions. The diff to the model that I made to get it to work is shown below.
317c317,318
< mul_1 = self.float_functional_simple_13.mul(hm_3, float_1)
---
> fake_dequant_0 = self.fake_dequant_0(hm_3)
> mul_1 = fake_dequant_0 * float_1
340,341c341,342
< fake_dequant_0 = self.fake_dequant_0(hm_3)
< fake_dequant_1 = self.fake_dequant_1(mul_1)
---
> # fake_dequant_1 = self.fake_dequant_1(mul_1)
> fake_dequant_1 = mul_1
The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once; otherwise the zero point will remain zero. For example:
import torch
from tinynn.graph.quantization.quantizer import QATQuantizer
from movenet_qat import MoveNet_qat
model = MoveNet_qat()
dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
# QAT prep
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()
# Invoke once
qat_model(dummy_input)
# Conversion goes here
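For reference, the conversion step itself would look roughly like the sketch below, following the usual TinyNN QAT-to-TFLite flow. This is a sketch only: the output path is a placeholder, and the backend/engine choice assumes qnnpack as used later in this thread.
import torch
from tinynn.converter import TFLiteConverter

# Sketch only: after the forward call above has populated the observers,
# convert the fake-quantized model to an actual quantized model and export it.
with torch.no_grad():
    qat_model.eval()
    qat_model.cpu()
    torch.backends.quantized.engine = 'qnnpack'  # assumed backend
    qat_model = torch.quantization.convert(qat_model)
    converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out/qat_model.tflite')
    converter.convert()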
Big thanks. I'll try it out later and let you know the results.
One more question: is there a simple way for forward to output different tensors under different conditions? For instance, I need o1 and o2 for training, while o3 and o4 are for inference. At the stage of conversion to tflite, I just want o3 and o4 to be output. Do I have to manually edit the .py file to achieve this?
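To make the question concrete, something like the toy sketch below is what I have in mind; the layer, the o1..o4 names, and the ops are placeholders, not my actual model.
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        feat = self.backbone(x)
        if self.training:
            o1, o2 = feat, feat.mean()   # only needed for the training loss
            return o1, o2
        o3 = feat.amax(dim=1)            # only wanted in the exported tflite
        o4 = feat.flatten(1)
        return o3, o4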
"The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error." ==> Based on my experiments, this error only rises up after I add some extra outputs to the forward operation. If I remove these extra outputs, the error disappears. I set the max epoch to 2 to conduct the experiments.
My script is attached.
My simple guess is that since the added outputs do not participate in training, the tracer (just a guess) might not trace them correctly.
Any updates?
Sorry for the late reply; we were working on something else.
As can be seen in our FAQ, the problem is already covered there. So the brief answer is yes, because that's how tracing works.
Got it, I will try the proposed way in the FAQ. Big thanks for your help.
I followed the method in the FAQ:
The tflite model can be generated, but the weights/biases in it are already converted to float32. I actually need a QAT tflite whose weights/biases are int8/int32. How can I achieve this?
@liamsun2019 You need to go through the quantizer again even though your model is already QAT-rewritten (because a QAT-rewritten model is still a float model, not a quantized one). For example:
qat_model = MoveNet_qat()
qat_model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)
dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
quantizer = QATQuantizer(qat_model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()
if __name__ == "__main__":
The following error rises up:
I do not need training anymore, but it looks like I have to. Any suggestions?
@liamsun2019 Would you please share the code of the class?
Sure, FYR.
@liamsun2019 I can reproduce locally. Looking into it now.
It seems the problem is on
A tensor with a per-tensor symmetric qscheme will become per-tensor affine after running through this op, so we need to insert a (re)quantize op after it.
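Conceptually, the fix is something like the sketch below. This is only an illustration of the idea (the real change happens inside our QAT graph rewriting), and the observer/qscheme choices are just the common symmetric per-tensor defaults.
import torch
import torch.nn as nn
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver

# Illustration: wrap an op whose output falls back to per-tensor affine and
# re-apply a symmetric per-tensor fake-quant so the qscheme stays consistent.
class RequantizedOp(nn.Module):
    def __init__(self, op):
        super().__init__()
        self.op = op
        self.requant = FakeQuantize.with_args(
            observer=MovingAverageMinMaxObserver,
            quant_min=-128,
            quant_max=127,
            dtype=torch.qint8,
            qscheme=torch.per_tensor_symmetric,
        )()

    def forward(self, x):
        return self.requant(self.op(x))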
Just wondering what your target platform is. Taking NNAPI as an example, it supports these common qschemes:
For ops that support per-channel quantization (e.g. Conv2D), you should use (4) for the weight and (2) or (3) for the input. As for other ops, you use (2) or (3). But I don't think the support for (2) is broad enough; usually there is only support for (3).
The activations always use the per-tensor qscheme. That's why I asked the previous question. If your target platform has support for (2), then we may just remove the limitation. But if it only supports (3), then we need to insert the requantize nodes during QAT graph rewriting.
For per-channel QAT, our target platform supports asymmetric_affine int8 for activations, and the weights support perchannel_symmetric_affine int8.
OK, we will work on it.
@liamsun2019 I've uploaded the related changes. You may have to use the following line instead for defining the converter object:
converter = TFLiteConverter(qat_model, dummy_input, tflite_path="ohyeah.tflite", asymmetric=True, quantize_target_type='int8')
OK. Will update and try it out.
I tried the recent version and can get the outputs for inference. Thanks a lot.
@liamsun2019 Looks like this issue is resolved. I'll close it. Please feel free to open a new issue when you encounter new problems. Again, thanks for supporting our project.
@liamsun2019 FYI, we've decoupled
My understanding is that you support asym per-channel QAT now, right?
Yes. But actually this should be
Previously, we have
I just tried with this config:
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})
and this converter:
The following error is output:
With the above message, it looks like some PyTorch ops do not support asym per-channel. As you mentioned before, all the activations only support per-tensor QAT, so probably the asym per-channel scheme cannot be applied to my case.
@liamsun2019 Please try the following configuration.
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})
converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=True, quantize_target_type='int8')
Yes, it works this way, and the resulting QAT tflite seems to always hold weights in int8 data type.
@liamsun2019 As far as I know, a model with u8 weights or inputs doesn't support per-channel quantization.
I got it. But based on the recent version, I encountered the following error when doing symmetric per-channel QAT, which was fine with the old version:
WARNING (tinynn.converter.base) Symmetric quantized model with uint8 is unsupported in most backends of TFLite
I wonder if something has degraded the latest code? My config is:
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': False, 'per_tensor': False, 'rewrite_graph': True})
converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=False)
I wonder if the usage has changed or not. The above config works well with the old version.
One thing that needs to be pointed out is that I used exactly the same outputs for training and inference in the above experiment.
Looks like I need to set quantize_target_type='int8' explicitly, which was not a must before.
Yes, the changes in 9b656ce are not backward compatible. You have to do it now when you use per-channel quantization.
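For reference, the symmetric per-channel setup from the earlier comment would then look roughly like this with the new argument added. A sketch only: the import paths follow the TinyNN examples, and the rest is unchanged from the config above.
from tinynn.graph.quantization.quantizer import QATQuantizer
from tinynn.converter import TFLiteConverter

quantizer = QATQuantizer(model, dummy_input, work_dir='out',
                         config={'backend': "qnnpack", 'force_overwrite': True,
                                 'asymmetric': False, 'per_tensor': False,
                                 'rewrite_graph': True})
qat_model = quantizer.quantize()
# ... train / calibrate as before ...
converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite',
                            asymmetric=False, quantize_target_type='int8')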
@liamsun2019 Just curious, how did you get the model? It seems that you used onnx2tf and then the TFLiteConverter from official TF to get the first model.
Big thanks for your detailed illustration. I made a mistake when introducing the two graphs. In fact, the first graph comes from a tflite model that's not in QAT representation, but I mistook it for a QAT one.
Hi Author,
I need to add some extra output tensors which are used for inference. These tensors are not referenced during training; they are just for inference after the conversion to tflite. My naive intention is to put as many operations as possible into forward so as to relieve the load of post-processing, which otherwise has to be implemented in C/C++ code.
For instance, some matrix ops such as reshape/sigmoid/multiply are better done by the GPU/NPU instead of the CPU.
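For illustration, the kind of thing I want to fold into forward is sketched below; the layer and the scalar are placeholders, not the real MoveNet post-processing.
import torch
import torch.nn as nn

class ToyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, 3, padding=1)

    def forward(self, x):
        hm = self.conv(x)
        # Post-processing kept inside forward so it runs on the GPU/NPU:
        hm = torch.sigmoid(hm)
        hm = hm * 2.0                        # example scalar multiply
        return hm.reshape(hm.shape[0], -1)   # example reshape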
I added some logic in forward to implement this requirement, and the training goes well, but the conversion to tflite fails with the following error message:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/functional_modules.py", line 160, in mul
r = ops.quantized.mul(x, y, scale=self.scale, zero_point=self.zero_point)
RuntimeError: Mul operands should have same data type.
Is there any feasible way or workaround for this scenario?
The script is attached. Thanks.
movenet_qat.zip