# L2-B - Linear Quantization I: Get the Scale and Zero Point

In this lesson, you will continue learning the fundamentals of linear quantization and implement your own linear quantizer.

Run the next cell to import the functions from the previous lesson(s) of `Linear Quantization I`, so that you can follow along with the video.

- To access the `helper.py` file, you can click `File --> Open...`, on the top left.

In [2]:
import torch

from helper import linear_q_with_scale_and_zero_point, linear_dequantization, plot_quantization_errors

### a dummy tensor to test the implementation
test_tensor=torch.tensor(
    [[191.6, -13.5, 728.6],
     [92.14, 295.5,  -184],
     [0,     684.6, 245.5]]
)

## Finding `Scale` and `Zero Point` for Quantization

In [3]:
q_min = torch.iinfo(torch.int8).min
q_max = torch.iinfo(torch.int8).max

In [4]:
q_min

-128

In [5]:
q_max

127

In [6]:
# .item() converts the 0-dim tensor returned by .min() into a Python float
r_min = test_tensor.min().item()

In [7]:
r_min

-184.0

In [8]:
r_max = test_tensor.max().item()

In [9]:
r_max

728.5999755859375

In [11]:
scale = (r_max - r_min) / (q_max - q_min)

In [12]:
scale

3.578823433670343

In [13]:
zero_point = q_min - (r_min / scale)

In [14]:
zero_point

-76.58645490333825

In [15]:
zero_point = int(round(zero_point))

In [16]:
zero_point

-77
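
The cells above follow from requiring that the real range $[r_{min}, r_{max}]$ map linearly onto the integer range $[q_{min}, q_{max}]$, with $r \approx s\,(q - z)$. A short worked version of that arithmetic for `test_tensor` is given below; the symbols $s$ (scale) and $z$ (zero point) are notation chosen here, and the numbers are rounded for readability.

$$
s = \frac{r_{max} - r_{min}}{q_{max} - q_{min}} = \frac{728.6 - (-184)}{127 - (-128)} = \frac{912.6}{255} \approx 3.5788
$$

$$
z = \mathrm{round}\left(q_{min} - \frac{r_{min}}{s}\right) = \mathrm{round}\left(-128 + \frac{184}{3.5788}\right) \approx \mathrm{round}(-76.59) = -77
$$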

- Now, put all of this in a function.

In [17]:
def get_q_scale_and_zero_point(tensor, dtype=torch.int8):
    
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    r_min, r_max = tensor.min().item(), tensor.max().item()

    scale = (r_max - r_min) / (q_max - q_min)

    zero_point = q_min - (r_min / scale)

    # clip the zero_point to fall in [q_min, q_max]
    if zero_point < q_min:
        zero_point = q_min
    elif zero_point > q_max:
        zero_point = q_max
    else:
        # round and cast to int
        zero_point = int(round(zero_point))
    
    return scale, zero_point
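
The clipping branch comes into play when the tensor's range does not contain zero: the unclipped zero point then falls outside the integer range. Here is a minimal pure-Python sketch of that behaviour, assuming the int8 range. `scale_and_zero_point` is a hypothetical scalar stand-in for the function above, not something provided by `helper.py`.

```python
def scale_and_zero_point(r_min, r_max, q_min=-128, q_max=127):
    """Hypothetical scalar version of the function above (int8 range by default)."""
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = q_min - (r_min / scale)
    # Clamp first, then round: same result as the if/elif/else form above.
    zero_point = int(round(min(max(zero_point, q_min), q_max)))
    return scale, zero_point


_, zp_mixed = scale_and_zero_point(-184.0, 728.6)  # range straddles zero
_, zp_pos = scale_and_zero_point(10.0, 20.0)       # all values positive
_, zp_neg = scale_and_zero_point(-20.0, -10.0)     # all values negative

print(zp_mixed, zp_pos, zp_neg)  # -77 -128 127
```

In the two one-sided cases the zero point is pinned to an end of the integer range, so part of the real range can no longer be represented without saturation.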

- Test the implementation using the `test_tensor` defined earlier.
```Python
[[191.6, -13.5, 728.6],
 [92.14, 295.5,  -184],
 [0,     684.6, 245.5]]
```

In [18]:
new_scale, new_zero_point = get_q_scale_and_zero_point(
    test_tensor)

In [19]:
new_scale

3.578823433670343

In [20]:
new_zero_point

-77
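
As a quick sanity check on these two numbers, the sketch below confirms that the smallest real value lands on `q_min`, the largest on `q_max`, and `0.0` on the zero point. It assumes the quantization rule `q = clamp(round(r / scale + zero_point))`; the helper `to_int8` is a hypothetical name used only for this illustration.

```python
q_min, q_max = -128, 127
r_min, r_max = -184.0, 728.6  # min and max of test_tensor

scale = (r_max - r_min) / (q_max - q_min)
zero_point = int(round(q_min - r_min / scale))


def to_int8(r):
    """Quantize one real value: scale, shift, round, then clamp to int8."""
    return max(q_min, min(q_max, round(r / scale + zero_point)))


print(to_int8(r_min), to_int8(0.0), to_int8(r_max))  # -128 -77 127
```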

## Quantization and Dequantization with Calculated `Scale` and `Zero Point`

- Use the calculated `scale` and `zero_point` with the functions `linear_q_with_scale_and_zero_point` and `linear_dequantization`.

In [None]:
quantized_tensor = linear_q_with_scale_and_zero_point(
    test_tensor, new_scale, new_zero_point)

In [None]:
dequantized_tensor = linear_dequantization(quantized_tensor,
                                           new_scale, new_zero_point)
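
For readers who do not have `helper.py` open, the following is a sketch of what the two helpers plausibly do. The bodies are an assumption based on how the functions are used in this notebook, and the names `sketch_quantize` and `sketch_dequantize` are hypothetical; check `helper.py` for the actual definitions.

```python
import torch


def sketch_quantize(tensor, scale, zero_point, dtype=torch.int8):
    """Assumed behaviour: scale, shift, round, clamp to the dtype range, cast."""
    info = torch.iinfo(dtype)
    shifted = tensor / scale + zero_point
    return torch.round(shifted).clamp(info.min, info.max).to(dtype)


def sketch_dequantize(q_tensor, scale, zero_point):
    """Assumed behaviour: invert the affine map, r ~ scale * (q - zero_point)."""
    return scale * (q_tensor.float() - zero_point)


test_tensor = torch.tensor(
    [[191.6, -13.5, 728.6],
     [92.14, 295.5,  -184],
     [0,     684.6, 245.5]]
)
scale, zero_point = 3.578823433670343, -77  # values computed above

q = sketch_quantize(test_tensor, scale, zero_point)
r_hat = sketch_dequantize(q, scale, zero_point)
print(q)
print(r_hat)
```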

- Plot the quantization error to see what it looks like after using the calculated `scale` and `zero_point`.

In [None]:
plot_quantization_errors(test_tensor, quantized_tensor, 
                         dequantized_tensor)

In [None]:
(dequantized_tensor-test_tensor).square().mean()

The original tensor and the dequantized tensor are very similar.
The quantization error also looks much better: the mean squared error is around 1.5, compared with about 170 earlier, when the scale and zero point were chosen at random.
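
One way to see why the error is small: rounding moves each scaled value by at most half an integer step, so as long as nothing is clamped, each element's error should stay within about `scale / 2`. The pure-Python sketch below checks this on the values of `test_tensor`, assuming the same round-and-clamp rule as before; it is an illustration, not the code from `helper.py`.

```python
values = [191.6, -13.5, 728.6, 92.14, 295.5, -184.0, 0.0, 684.6, 245.5]
q_min, q_max = -128, 127
scale, zero_point = 3.578823433670343, -77  # from the cells above

errors = []
for r in values:
    q = max(q_min, min(q_max, round(r / scale + zero_point)))  # quantize
    r_hat = scale * (q - zero_point)                           # dequantize
    errors.append(r_hat - r)

mse = sum(e * e for e in errors) / len(errors)
max_abs_error = max(abs(e) for e in errors)

print(f"MSE: {mse:.3f}")
print(f"max |error|: {max_abs_error:.3f} (scale / 2 = {scale / 2:.3f})")
```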

### Put Everything Together: Your Own Linear Quantizer

- Now, put everything together to make your own Linear Quantizer.

The `linear_quantization` function takes a tensor (and, optionally, the target `dtype`) and returns the quantized tensor, the scale, and the zero point. It reuses the two functions coded earlier.

First, call `get_q_scale_and_zero_point`, passing the tensor and the `dtype`, to get the scale and the zero point.

Then quantize the tensor with `linear_q_with_scale_and_zero_point`, passing the tensor, the scale, the zero point, and the `dtype`.

In [None]:
def linear_quantization(tensor, dtype=torch.int8):
    # Given the tensor and desired quantized dtype, calculate the scale and zero point
    scale, zero_point = get_q_scale_and_zero_point(tensor, 
                                                   dtype=dtype)
    
    # Given the tensor, scale, zero point and desired quantized dtype, calculate the quantized tensor
    quantized_tensor = linear_q_with_scale_and_zero_point(tensor,
                                                          scale, 
                                                          zero_point, 
                                                          dtype=dtype)
    
    return quantized_tensor, scale, zero_point

- Test your implementation on a random matrix.

In [21]:
r_tensor = torch.randn((4, 4))

In [22]:
r_tensor

tensor([[ 0.7373, -0.5947,  1.1786,  0.6481],
        [ 1.0361, -0.2536, -1.5978,  1.3848],
        [-0.8686, -0.7569, -0.3600, -0.8526],
        [ 0.0768,  1.4115, -0.9894,  0.3041]])

In [None]:
quantized_tensor, scale, zero_point = linear_quantization(r_tensor)

In [None]:
quantized_tensor

In [None]:
scale

In [None]:
zero_point

In [None]:
dequantized_tensor = linear_dequantization(quantized_tensor,
                                           scale, zero_point)

In [None]:
plot_quantization_errors(r_tensor, quantized_tensor,
                         dequantized_tensor)

In [None]:
(dequantized_tensor-r_tensor).square().mean()
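
Because `torch.randn` draws new values on every run, the outputs of the random-matrix test differ from run to run. The sketch below fixes the seed and runs the whole pipeline in one self-contained cell. The functions `get_scale_zp`, `quantize`, and `dequantize` are hypothetical stand-ins written for this example; the last two are assumptions about what the `helper.py` functions do.

```python
import torch


def get_scale_zp(tensor, dtype=torch.int8):
    # Same arithmetic as get_q_scale_and_zero_point above.
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    r_min, r_max = tensor.min().item(), tensor.max().item()
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = int(round(min(max(q_min - r_min / scale, q_min), q_max)))
    return scale, zero_point


def quantize(tensor, scale, zero_point, dtype=torch.int8):
    # Assumed stand-in for linear_q_with_scale_and_zero_point.
    info = torch.iinfo(dtype)
    return torch.round(tensor / scale + zero_point).clamp(info.min, info.max).to(dtype)


def dequantize(q_tensor, scale, zero_point):
    # Assumed stand-in for linear_dequantization.
    return scale * (q_tensor.float() - zero_point)


torch.manual_seed(0)  # fix the seed so reruns give the same tensor
r_tensor = torch.randn((4, 4))

scale, zero_point = get_scale_zp(r_tensor)
q_tensor = quantize(r_tensor, scale, zero_point)
r_hat = dequantize(q_tensor, scale, zero_point)
max_abs_error = (r_hat - r_tensor).abs().max().item()

print(f"scale={scale:.5f}, zero_point={zero_point}, max |error|={max_abs_error:.5f}")
```

With values drawn from a standard normal distribution, the scale is far smaller than for `test_tensor`, so the absolute error shrinks accordingly.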