##### Dequantization: What is it?

Dequantization is the process of converting quantized integer values (here, uint8) back to floating-point values after the model has made its predictions. Floating-point values are more precise, but they take up more memory: a 32-bit float occupies four bytes, while a uint8 occupies one. A quantized model therefore runs inference on the uint8 values, and its output is converted back to floating point afterwards. This saves memory while keeping the final results as accurate as the quantization allows at the end of each inference.

In [1]:
import numpy as np
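# Dequantization formula example (these are all example values)
output_data = np.array([130, 128, 140], dtype=np.uint8)
scalar = 0.00390625  # quantization scale (1/256)
zero_point = 128

# Cast to float first: uint8 arithmetic would wrap around for values below zero_point
final_float_values = (output_data.astype(np.float32) - zero_point) * scalar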
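
To see why the dequantized values are only "as accurate as they can be", it can help to run a full round trip. The sketch below is a minimal illustration, not a specific library's implementation: the `quantize` and `dequantize` helpers and the input values are hypothetical, and the scale and zero point simply reuse the example values above.

```python
import numpy as np

# Hypothetical quantization parameters, reusing the example values above
scale = 0.00390625  # 1/256
zero_point = 128


def quantize(x, scale, zero_point):
    # Map floats to the nearest uint8 step, clipping to the valid range
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


def dequantize(q, scale, zero_point):
    # Cast to float before subtracting so values below zero_point stay negative
    return (q.astype(np.float32) - zero_point) * scale


original = np.array([0.0078125, 0.0, -0.25, 0.1], dtype=np.float32)
quantized = quantize(original, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)

# The round-trip error is bounded by half a quantization step
max_error = np.abs(original - restored).max()
print(quantized, restored, max_error)
```

Values that fall exactly on a quantization step (such as -0.25) come back unchanged, while a value such as 0.1 comes back as the nearest step, so the error stays within half of `scale`.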

Softmax is then applied to these output values. It maps them to probabilities between 0 and 1 that sum to 1.0, and multiplying each probability by 100 gives a confidence percentage. This is how the dequantization to confidence value process works. Below is an example of how to use NumPy to get the softmax of an array of raw output data.

**Note**: These values should sum to 1.0, or 100 when converted to percentages. Floating-point rounding can leave the sum very slightly off, but if it is not very close to 1.0, the data is not accurate and something is wrong.

In [5]:
import numpy as np
# Softmax
def softmax(raw_data):
    # Subtract the maximum before exponentiating for numerical stability
    exps = np.exp(raw_data - np.max(raw_data))
    return exps / exps.sum()


logits = np.array([2.0, 1.0, 0.1])
probability = softmax(logits)
print(probability, probability.sum())

[0.65900114 0.24243297 0.09856589] 1.0
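
Putting the two steps together, the whole path from raw uint8 output to confidence percentages might look like the sketch below. It assumes softmax is applied to the dequantized values; the output values, scale, and zero point are the same illustrative numbers used earlier, not values from a real model.

```python
import numpy as np


def softmax(raw_data):
    # Subtract the maximum before exponentiating for numerical stability
    exps = np.exp(raw_data - np.max(raw_data))
    return exps / exps.sum()


# Hypothetical raw uint8 model output and quantization parameters
output_data = np.array([130, 128, 140], dtype=np.uint8)
scale = 0.00390625
zero_point = 128

# Step 1: dequantize (cast to float before subtracting the zero point)
dequantized = (output_data.astype(np.float32) - zero_point) * scale

# Step 2: softmax, then scale to percentages
confidence_percent = softmax(dequantized) * 100.0

# Step 3: pick the class with the highest confidence
best_class = int(np.argmax(confidence_percent))
print(confidence_percent, confidence_percent.sum(), best_class)
```

With these example values the three dequantized scores are very close together, so the percentages come out nearly equal; a real model's output would normally be more spread out.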
