Quantizing ONNX-compliant model to Q7 for use with CMSIS-NN functions #963

Closed · dtch1997 opened this issue Jul 9, 2020 · 5 comments

dtch1997 commented Jul 9, 2020

This is a question rather than an issue. I don't really know anything about quantization but need to use it for an embedded systems project, so I would be happy to learn from more experienced folks.

I'm attempting to convert an ONNX model into equivalent CMSIS-NN C++ code. I'm following this tutorial: https://developer.arm.com/solutions/machine-learning-on-arm/developer-material/how-to-guides/converting-a-neural-network-for-arm-cortex-m-with-cmsis-nn/single-page

Right now I'm working on quantization. I used the quantization script at https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/quantization to quantize the model into Q7 format. This converts each float32 weight into an int8 weight plus float32 scale and zero-point values, which can be used to (approximately) reconstruct the original weights.

My questions are:

  1. Does CMSIS-NN need the scale and zero-point values in function calls (e.g. arm_convolve_HWC_q7_fast), and if not, then why not?
  2. Does CMSIS-NN actually need the scale and zero-point of intermediate activations? Why (according to the tutorial) do these need to be quantized as well?
  3. Why (according to the tutorial) do I need to "compute" the Q-format? What's wrong with just forcing everything to be Q7?

majianjia commented Jul 10, 2020

To answer your questions

  1. The one you quoted, arm_convolve_HWC_q7_fast(), does not support a zero-point offset and only supports scales that are powers of 2 (done internally by shifting).
    However, the kernels with the _s suffix do support zero-point offsets and arbitrary scales. (These belong to the current and later development, compatible with TensorFlow Lite Micro.)

  2. From my experience, zero points are normally very close to zero (in most cases < 10), and I don't see much difference in accuracy, so I think the zero-point offset is not necessary. Scales are the most important part because they extend the range of Q7 to whatever it needs to be.

  3. You need to compute them to find a balance that provides the maximum resolution while still containing the maximum value in a batch of data (a weight, a bias, or an activation); a sketch of this is given at the end of this comment.
    In fact, the whole quantization is all about this. There are different strategies, such as saturated (contain max(abs(data))) or non-saturated (allow some extreme values to saturate).

When you said 'forcing everything to be Q7', you implicitly assumed the range -128 to 127, which is an int8, or a Q7.0.
For example, if your data are in the range -1 to 1 and you force them into Q7.0, almost the entire -128 to 127 range is wasted. In this case you need a Q0.7 to store them, with -1 represented by -128; now you have 128 levels for -1 to 0.

There are many other Q7 formats (Q3.4, Q1.6, ...). And yes, you need to compute them to see which fits your data best.

However, you can still force all data into one fixed-point format; sometimes this gives fair results in not-so-deep networks.
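
For illustration, here is a minimal sketch of the saturated strategy mentioned above: pick the smallest number of integer bits that contains max(abs(data)), use the remaining bits as fractional bits, and round every value to the nearest representable step. The helper names are illustrative, not part of CMSIS-NN or the ONNX Runtime tooling.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the fractional bits of a Qm.n format (m + n = 7) that still
 * contains max(|data|): the "saturated" strategy described above. */
static int compute_dec_bits(const float *data, size_t len)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float a = fabsf(data[i]);
        if (a > max_abs) max_abs = a;
    }
    if (max_abs <= 0.0f) return 7;          /* all zeros: any format works */

    /* Integer bits needed to contain max_abs; the rest are fractional.
     * Clamped to [0, 7] so the shift below stays valid. */
    int int_bits = (int)ceilf(log2f(max_abs));
    if (int_bits < 0) int_bits = 0;
    if (int_bits > 7) int_bits = 7;
    return 7 - int_bits;
}

/* Round each float to the nearest Q(7-dec_bits).dec_bits value,
 * saturating at the int8 limits. */
static void quantize_q7(const float *in, int8_t *out, size_t len, int dec_bits)
{
    const float scale = (float)(1 << dec_bits);   /* power-of-2 scale */
    for (size_t i = 0; i < len; i++) {
        long q = lroundf(in[i] * scale);
        if (q > 127)  q = 127;
        if (q < -128) q = -128;
        out[i] = (int8_t)q;
    }
}
```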


dtch1997 commented Jul 14, 2020

@majianjia thanks, I think I understand a bit better now. A major source of my confusion was that I thought the Q-format just refers to the width (8-bit or 16-bit), but actually the Q-format includes the position of the decimal point as well. In this light it makes sense to "compute the Q-format", since we want to know where to put the decimal point.

My next question is: Suppose I have a quantized layer, e.g. Conv. The activations and weights are quantized and the scale is a power of two for each. Then how do I convert this to a CMSIS-NN layer? If I use arm_convolve_HWC_q7_fast(), what should I provide for bias_shift and out_shift?


majianjia commented Jul 14, 2020

@dtch1997
This is the most confusing part when people first look at it.
This answer, #327 (comment), is the original answer by the original author of CMSIS-NN.

If you search my ID in the issues, there are other related topics that might help.

A simple explanation is that these two arguments do not stand by themselves; they encode the relation between the Q-formats of the input, output, weights, and bias.

In detail:
Remember that when two Q-format numbers are multiplied, the decimal bits and integer bits of the result are each the sum of those of the two operands:
Q3.4 x Q2.5 = Q5.9 (which is a Q14)
Then cut Q5.9 -> Q5.2 by right-shifting 7 bits and store it in a Q7.

What you can see here is that we only need the decimal bits to do the calculation, because in a Q7 format we always have integer bits + decimal bits = 7.
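
As a standalone check of the arithmetic above (a plain C sketch, not CMSIS-NN code):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t  a = 40;                     /* 2.5  in Q3.4 (2.5  * 2^4 = 40) */
    int8_t  b = 40;                     /* 1.25 in Q2.5 (1.25 * 2^5 = 40) */
    int16_t prod = (int16_t)a * b;      /* Q5.9: 1600 = 3.125 * 2^9 */
    int8_t  out  = (int8_t)(prod >> 7); /* cut Q5.9 -> Q5.2: 12 = 3.0 * 2^2 */

    /* 2.5 * 1.25 = 3.125; the shift truncates it to 3.0 in Q5.2. */
    printf("%.3f\n", out / 4.0);        /* prints 3.000 */
    return 0;
}
```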

A convolutional or fully connected layer can be written as
Output = Input * Weight + Bias;

For example, if you have data like these:
Input data is Q0.7
Weights are Q3.4
Biases are Q1.6
Output data is Q2.5

For the multiplication (Input * Weights):
Output (right) shift = input_decimal + weight_decimal - output_decimal
7+4-5 = 6

For the addition of the bias, both operands must be in the same Q format. The bias is added to the accumulator, which after the multiplication has input_decimal + weight_decimal fractional bits, so the bias has to be shifted up to match. (In most cases biases are small values compared to the accumulator; a bias with more decimal bits than the accumulator would lead to a negative left shift, which needs a workaround.)
Bias (left) shift = input_decimal + weight_decimal - bias_decimal
7 + 4 - 6 = 5

Per the two equations, to determine both arguments you need the Q format of the input, weights, bias, and output.
You can get the Q format of the weights and bias directly from their arrays.
For the input and output, you need to run the model layer by layer on some test data (i.e. some images, or sensor data), then inspect the output of each layer to determine its Q format.
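
Putting the two formulas together for the example Q-formats above (a sketch; the macro names are mine, and the remaining arm_convolve_HWC_q7_fast() arguments are the usual dimension and buffer parameters from CMSIS-NN's arm_nnfunctions.h):

```c
/* Q-formats from the example above */
#define INPUT_DEC   7   /* input   Q0.7 */
#define WEIGHT_DEC  4   /* weights Q3.4 */
#define BIAS_DEC    6   /* biases  Q1.6 */
#define OUTPUT_DEC  5   /* output  Q2.5 */

/* The accumulator after Input * Weight has INPUT_DEC + WEIGHT_DEC
 * fractional bits, so: */
#define BIAS_SHIFT  (INPUT_DEC + WEIGHT_DEC - BIAS_DEC)    /* left shift:  5 */
#define OUT_SHIFT   (INPUT_DEC + WEIGHT_DEC - OUTPUT_DEC)  /* right shift: 6 */
```

These two values are what you pass as the bias_shift and out_shift arguments of arm_convolve_HWC_q7_fast().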

dtch1997 (Author) commented:

Okay, I think I finally understand. The last source of my confusion was that CMSIS-NN quantization is different from other frameworks' quantization.

In TensorFlow or PyTorch, values are quantized by choosing a "scale" and a "zero point". In CMSIS-NN, values are quantized by simply rounding the number to the nearest fixed-point value. @majianjia maybe this could be added to the documentation somewhere.
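
For concreteness, a minimal sketch of the two schemes side by side (the function names are illustrative, not from either library):

```c
#include <math.h>
#include <stdint.h>

static int8_t clamp_s8(long q)
{
    return (int8_t)(q > 127 ? 127 : (q < -128 ? -128 : q));
}

/* Affine quantization (TensorFlow/PyTorch style):
 * scale is an arbitrary float, and zero_point recentres the range. */
int8_t quantize_affine(float x, float scale, int zero_point)
{
    return clamp_s8(lroundf(x / scale) + zero_point);
}

/* Fixed-point quantization (legacy CMSIS-NN q7 kernels):
 * the scale is a power of 2 and there is no zero point. */
int8_t quantize_q7_fixed(float x, int dec_bits)
{
    return clamp_s8(lroundf(x * (float)(1 << dec_bits)));
}
```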

majianjia commented:

You may check the new part of CMSIS-NN, the files whose names end with the _s suffix. They support zero points and scales. I am not an Arm team member, but there are some tutorials explaining these numbers.
