# Linear Quantization

Linear quantization is a process used to reduce the number of bits required to represent numbers in a model, thereby reducing the model size and speeding up computation. It involves mapping a continuous range of values (like floating-point numbers) to a discrete set of values (like integers). This is particularly useful in deploying models on resource-constrained devices where memory and compute power are limited.

<figure>
    <img src="https://raw.githubusercontent.com/arkeodev/pytorch-tutorial/main/Quantization/images/symmetric_vs_asymetric_linear_quantization.png" width="300" height="400">
    <figcaption>Symmetric vs Asymmetric Linear Quantization</figcaption>
</figure>

### Key Concepts in Linear Quantization

1. **Scale and Zero Point:**
   - **Scale:** This is a factor used to map the floating-point range to the integer range. It determines the granularity of the quantized values.
   - **Zero Point:** This is the integer value that corresponds to the zero value in the floating-point range. It allows the representation of negative numbers when using unsigned integers.

2. **Quantization Formula:**
   $$
   Q_x = \text{round}\left(\frac{x}{\text{scale}} + \text{zero\_point}\right)
   $$
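   Here, $x$ is the original floating-point value, $Q_x$ is the quantized integer value, and `round` rounds the result to the nearest integer. In practice, the result is also clamped to the target integer range, for example $[0, 255]$ for unsigned 8-bit integers.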

3. **Dequantization Formula:**
   $$
   x = \text{scale} \times (Q_x - \text{zero\_point})
   $$
   This formula maps the quantized integer back to an approximation of the original floating-point value; the rounding error introduced during quantization is not recovered.
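The formulas above take the scale and zero point as given. One common way to choose them, which this section does not spell out, is to derive them from the minimum and maximum of the data. The sketch below assumes that min/max scheme and an unsigned 8-bit target range; the helper names `compute_qparams`, `quantize`, and `dequantize` are hypothetical, and the code assumes the input is not all zeros (so the scale is non-zero).

```python
import torch

def compute_qparams(x_min, x_max, q_min=0, q_max=255):
    """Hypothetical helper: derive scale and zero point from a float range."""
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = int(round(q_min - x_min / scale))
    # Keep the zero point inside the integer range.
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point, q_min=0, q_max=255):
    # Quantization formula, followed by clamping to the integer range.
    q = torch.round(x / scale + zero_point)
    return torch.clamp(q, q_min, q_max).to(torch.uint8)

def dequantize(q, scale, zero_point):
    # Dequantization formula; cast to float first to avoid uint8 wrap-around.
    return scale * (q.to(torch.float32) - zero_point)

x = torch.tensor([-1.0, 0.0, 0.5, 3.0])
scale, zero_point = compute_qparams(x.min().item(), x.max().item())
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)

print("scale:", scale, "zero_point:", zero_point)
print("quantized:", q)
print("dequantized:", x_hat)
```

Because the range is widened to include zero, the input value 0.0 maps exactly onto the zero point and dequantizes back to exactly 0.0; the other values come back with a small rounding error.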

### Symmetric Quantization

In symmetric quantization, the quantization range is symmetric around zero. This means that the range of positive and negative values is equal, and the zero point is typically zero. Symmetric quantization is simple and computationally efficient because the quantization and dequantization processes do not require adding a zero point.

- **Formula:**
  $$
  Q_x = \text{round}\left(\frac{x}{\text{scale}}\right)
  $$
  $$
  x = \text{scale} \times Q_x
  $$

- **Advantages:**
  - Easier to implement and understand.
  - Requires fewer calculations (no need for zero point adjustment).
  - Often sufficient for weights in neural networks where the distribution is roughly symmetric around zero.

- **Disadvantages:**
  - Wastes part of the integer range when the data is not symmetric around zero (for example, non-negative activations after a ReLU use only half of a signed range), which reduces precision.
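As a minimal sketch of the symmetric case, the code below picks the scale from the largest absolute value in the tensor and stores the result as signed 8-bit integers. The max-absolute-value rule, the restricted range $[-127, 127]$, and the names `symmetric_qparams` and `symmetric_quantize_int8` are assumptions made for illustration, not something this section prescribes.

```python
import torch

def symmetric_qparams(x, num_bits=8):
    """Hypothetical helper: scale from the largest magnitude in x."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = x.abs().max().item() / q_max
    return scale, q_max

def symmetric_quantize_int8(x, scale, q_max):
    # No zero point: divide by the scale, round, and clamp to [-q_max, q_max].
    q = torch.round(x / scale)
    return torch.clamp(q, -q_max, q_max).to(torch.int8)

weights = torch.tensor([0.42, -1.27, 0.03, 0.9])
scale, q_max = symmetric_qparams(weights)
q_weights = symmetric_quantize_int8(weights, scale, q_max)

# Dequantize and measure the worst-case round-trip error.
recovered = q_weights.to(torch.float32) * scale
max_error = (weights - recovered).abs().max().item()

print("scale:", scale)
print("quantized:", q_weights)
print("max round-trip error:", max_error)
```

With this choice of scale no value is clipped, so the round-trip error of each element is bounded by roughly half the scale.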

### Asymmetric Quantization

In asymmetric quantization, the quantization range is not symmetric around zero, so it can fit data distributions that are skewed or not centered on zero. The zero point is generally a non-zero integer that shifts the integer range to cover the actual minimum and maximum of the data.

- **Formula:**
  $$
  Q_x = \text{round}\left(\frac{x}{\text{scale}} + \text{zero\_point}\right)
  $$
  $$
  x = \text{scale} \times (Q_x - \text{zero\_point})
  $$

- **Advantages:**
  - More flexible and can handle a wider range of data distributions.
  - Better suited for activations in neural networks where data can be non-symmetric and skewed.

- **Disadvantages:**
  - More complex to implement.
  - Requires additional calculations for the zero point adjustment.
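To make the trade-off concrete, the sketch below quantizes the same skewed data both ways with 8 bits and compares the mean round-trip error. The evenly spaced test data, the min/max rule for the asymmetric parameters, and the helper name `fake_quantize` are all assumptions chosen for illustration.

```python
import torch

def fake_quantize(x, scale, zero_point, q_min, q_max):
    """Hypothetical helper: quantize, clamp, then dequantize back to float."""
    q = torch.clamp(torch.round(x / scale + zero_point), q_min, q_max)
    return scale * (q - zero_point)

# Skewed data: mostly positive, with a short negative tail.
activations = torch.linspace(-0.5, 5.5, steps=1000)

# Symmetric, signed 8-bit: zero point fixed at 0, scale from max |x|.
sym_scale = activations.abs().max().item() / 127
sym_hat = fake_quantize(activations, sym_scale, 0, -127, 127)

# Asymmetric, unsigned 8-bit: scale and zero point from min and max.
x_min, x_max = activations.min().item(), activations.max().item()
asym_scale = (x_max - x_min) / 255
asym_zero_point = int(round(-x_min / asym_scale))
asym_hat = fake_quantize(activations, asym_scale, asym_zero_point, 0, 255)

sym_error = (activations - sym_hat).abs().mean().item()
asym_error = (activations - asym_hat).abs().mean().item()

print("symmetric  scale:", sym_scale, "mean error:", sym_error)
print("asymmetric scale:", asym_scale, "zero point:", asym_zero_point,
      "mean error:", asym_error)
```

The symmetric scheme has to cover $[-5.5, 5.5]$ even though the data stops at $-0.5$, so its step size is larger and its average error is higher than in the asymmetric case.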

### Example in PyTorch

Here's how you can implement symmetric and asymmetric quantization in PyTorch:

```python
import torch

# Example tensor
tensor = torch.tensor([1.2123, 2.3535, -1.1674, -2.4335, 0.5444, -0.3590])

# Symmetric Quantization
# Note: these helpers keep the values as floats and do not clamp them
# to an integer range; they only illustrate the formulas.
def symmetric_quantize(tensor, scale):
    return torch.round(tensor / scale)

def symmetric_dequantize(tensor, scale):
    return tensor * scale

scale = 0.1
quantized_tensor_sym = symmetric_quantize(tensor, scale)
dequantized_tensor_sym = symmetric_dequantize(quantized_tensor_sym, scale)

print("Symmetric Quantization")
print("Original tensor:", tensor)
print("Quantized tensor:", quantized_tensor_sym)
print("Dequantized tensor:", dequantized_tensor_sym)

# Asymmetric Quantization
def asymmetric_quantize(tensor, scale, zero_point):
    return torch.round(tensor / scale + zero_point)

def asymmetric_dequantize(tensor, scale, zero_point):
    return scale * (tensor - zero_point)

scale = 0.1
zero_point = 128  # shifts the values into the unsigned range [0, 255]
quantized_tensor_asym = asymmetric_quantize(tensor, scale, zero_point)
dequantized_tensor_asym = asymmetric_dequantize(quantized_tensor_asym, scale, zero_point)

print("\nAsymmetric Quantization")
print("Original tensor:", tensor)
print("Quantized tensor:", quantized_tensor_asym)
print("Dequantized tensor:", dequantized_tensor_asym)
```

```text
Symmetric Quantization
Original tensor: tensor([ 1.2123,  2.3535, -1.1674, -2.4335,  0.5444, -0.3590])
Quantized tensor: tensor([ 12.,  24., -12., -24.,   5.,  -4.])
Dequantized tensor: tensor([ 1.2000,  2.4000, -1.2000, -2.4000,  0.5000, -0.4000])

Asymmetric Quantization
Original tensor: tensor([ 1.2123,  2.3535, -1.1674, -2.4335,  0.5444, -0.3590])
Quantized tensor: tensor([140., 152., 116., 104., 133., 124.])
Dequantized tensor: tensor([ 1.2000,  2.4000, -1.2000, -2.4000,  0.5000, -0.4000])
```
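The hand-written functions above keep the quantized values as floats. For comparison, PyTorch also provides `torch.quantize_per_tensor`, which stores the values in a real quantized dtype. The sketch below reuses the same tensor, scale, and zero point as the asymmetric example; choosing `torch.quint8` as the storage type is an assumption made here to match the zero point of 128.

```python
import torch

tensor = torch.tensor([1.2123, 2.3535, -1.1674, -2.4335, 0.5444, -0.3590])

# Quantize to unsigned 8-bit with the same parameters as the manual example.
q_tensor = torch.quantize_per_tensor(
    tensor, scale=0.1, zero_point=128, dtype=torch.quint8
)

# int_repr() exposes the stored integers; dequantize() maps back to float32.
int_values = q_tensor.int_repr()
recovered = q_tensor.dequantize()

print("Stored integers:", int_values)
print("Dequantized tensor:", recovered)
```

The stored integers should match the values produced by `asymmetric_quantize`, except that they are held as `uint8` rather than as floats.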
