# What is quantized?

"Quantized" in the PyTorch context describes values produced by quantization, a technique that converts floating-point numbers into lower-precision representations, typically 8-bit integers. Quantization is especially useful when deploying neural networks on resource-constrained devices (such as mobile phones or embedded systems) because it:

1.  Reduces Memory Footprint: Lower bit-width numbers use significantly less memory; for example, 8-bit integers need one quarter of the storage of 32-bit floats.

2.  Speeds Up Computations: Lower-precision arithmetic can be faster, especially when the underlying hardware supports it natively.

3.  Potentially Decreases Power Consumption: Lower-precision arithmetic is generally more energy efficient, which is essential in power-constrained environments.
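The memory saving is easy to check directly. This is a minimal sketch that quantizes a float32 tensor to 8-bit integers with `torch.quantize_per_tensor` and compares the raw storage of the two. The scale of 0.05 and zero point of 0 are arbitrary values chosen for illustration, and the count ignores the few bytes needed to store the quantization parameters themselves. Speed and power gains depend on the hardware and kernels, so they are not measured here.

```python
import torch

# A float32 tensor roughly the size of one weight matrix.
x = torch.randn(1024, 1024)

# Quantize to signed 8-bit integers; scale/zero_point are illustrative choices.
q = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)

# int_repr() exposes the underlying int8 storage of the quantized tensor.
raw = q.int_repr()

float_bytes = x.nelement() * x.element_size()    # 4 bytes per element
int8_bytes = raw.nelement() * raw.element_size()  # 1 byte per element

print(float_bytes, int8_bytes, float_bytes / int8_bytes)  # 4194304 1048576 4.0
```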

### How Quantization Works
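Quantization maps a continuous range of floating-point values onto a small, fixed set of integers. The affine scheme used by PyTorch relates a real value `x` and its integer code `q` through two key parameters, `x ≈ scale * (q - zero_point)`: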

-   Scale: A positive floating-point step size. A change of 1 in the integer value corresponds to a change of `scale` in the original floating-point value.

-   Zero Point: The integer that the real value 0.0 maps to. It ensures that zero, which is common in padding and ReLU outputs, is represented exactly in the quantized space.
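The two parameters can be sketched in a few lines. This is a minimal sketch, assuming the common asymmetric min/max scheme over an unsigned 8-bit range `[0, 255]`. `choose_qparams`, `quantize`, and `dequantize` are hypothetical helpers written for illustration, not PyTorch APIs. The only PyTorch calls are `torch.quantize_per_tensor`, `int_repr()`, and `dequantize()`, which are used here to cross-check the hand-written version.

```python
import torch

def choose_qparams(x_min, x_max, qmin=0, qmax=255):
    """Hypothetical helper: affine scale/zero point from an observed range."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must include 0.0
    scale = (x_max - x_min) / (qmax - qmin)           # float step per integer step
    zero_point = int(round(qmin - x_min / scale))     # integer that 0.0 maps to
    zero_point = max(qmin, min(qmax, zero_point))     # keep it inside the int range
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Hypothetical helper: q = clamp(round(x / scale) + zero_point)."""
    q = torch.round(x / scale) + zero_point
    return q.clamp(qmin, qmax).to(torch.uint8)

def dequantize(q, scale, zero_point):
    """Hypothetical helper: x_hat = scale * (q - zero_point)."""
    return (q.float() - zero_point) * scale

x = torch.tensor([-1.0, -0.5, 0.0, 0.7, 1.3, 3.0])
scale, zero_point = choose_qparams(x.min().item(), x.max().item())

q_manual = quantize(x, scale, zero_point)
q_torch = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

print(scale, zero_point)
print(q_manual.tolist())             # [0, 32, 64, 109, 147, 255]
print(q_torch.int_repr().tolist())   # same integer codes from PyTorch
print(dequantize(q_manual, scale, zero_point))
```

The minimum and maximum of the range land on the ends of `[0, 255]`, and 0.0 maps exactly to `zero_point` (64 here), so it dequantizes back to exactly 0.0. Every other value comes back with a rounding error of at most `scale / 2`.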

### Types of Quantization in PyTorch
PyTorch supports several quantization techniques:

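-   Post-Training Static Quantization: Once a model has been trained, both its weights and its activations are converted from floating-point to quantized representations. Activation ranges are estimated ahead of time by running representative calibration data through the model. This method doesn't require retraining.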

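-   Post-Training Dynamic Quantization: Weights are quantized ahead of time, but activations are quantized on the fly during inference, so no calibration data is needed. This is particularly effective for models dominated by large fully connected (`nn.Linear`) or recurrent layers, such as LSTMs and Transformers.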

-   Quantization-Aware Training (QAT): Here, the model is trained (or fine-tuned) with simulated ("fake") quantization in the forward pass so that it learns to tolerate the reduced numerical precision. This usually gives the best accuracy of the three approaches, at the cost of extra training.
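The three workflows can be sketched side by side with PyTorch's eager-mode API in `torch.ao.quantization` (older releases expose the same functions under `torch.quantization`, and recent releases may emit deprecation warnings for this module). This is a minimal sketch, not a recipe. `TinyNet` is a hypothetical toy model, the calibration and training loops use random data as stand-ins for real datasets, and the backend choice (`fbgemm` on x86, otherwise `qnnpack`) is an assumption about the machine it runs on.

```python
import copy

import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub, QuantStub, convert, get_default_qat_qconfig,
    get_default_qconfig, prepare, prepare_qat, quantize_dynamic,
)

# Assumption: use fbgemm if this build supports it, otherwise qnnpack (e.g. ARM).
ENGINE = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = ENGINE
torch.manual_seed(0)


class TinyNet(nn.Module):
    """Hypothetical toy model. The stubs mark where tensors enter/leave int8."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # identity until convert() replaces it
        self.fc1 = nn.Linear(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)
        self.dequant = DeQuantStub()  # converts back to float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)


float_model = TinyNet().eval()
sample = torch.randn(8, 16)

# 1) Dynamic: Linear weights become int8 now; activations are quantized per call.
dynamic_model = quantize_dynamic(copy.deepcopy(float_model), {nn.Linear}, dtype=torch.qint8)

# 2) Static post-training: attach observers, calibrate, then convert.
static_model = copy.deepcopy(float_model)
static_model.qconfig = get_default_qconfig(ENGINE)
static_model = prepare(static_model)
with torch.no_grad():
    for _ in range(16):  # calibration passes record activation ranges
        static_model(torch.randn(32, 16))
static_model = convert(static_model)

# 3) QAT: train with fake quantization in the forward pass, then convert.
qat_model = copy.deepcopy(float_model).train()
qat_model.qconfig = get_default_qat_qconfig(ENGINE)
qat_model = prepare_qat(qat_model)
optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.01)
for _ in range(5):  # stand-in for a real training loop
    inputs, targets = torch.randn(32, 16), torch.randn(32, 4)
    loss = nn.functional.mse_loss(qat_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
qat_model = convert(qat_model.eval())

with torch.no_grad():
    for name, model in [("float", float_model), ("dynamic", dynamic_model),
                        ("static", static_model), ("qat", qat_model)]:
        # The module path shows whether fc1 was swapped for a quantized layer.
        print(name, type(model.fc1).__module__, tuple(model(sample).shape))
```

The static and QAT paths need the `QuantStub`/`DeQuantStub` markers and some data (calibration batches or training steps), while the dynamic path only swaps the listed layer types. That difference mirrors the trade-off between convenience and accuracy described in the list above.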

# We will learn more about quantized tensors later.