# Quantization for Neural Networks

Following the small asymmetric quantization example, this notebook shows how to quantize a neural network (NN).

## Post Training Quantization (PTQ)

PTQ consists of training a regular model and then quantizing it.

To do so, we will use observers to determine the alpha, beta, scale and zero-point factors while simply running inference, just as we did in the f32 to int8 vector quantization example.
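To make the role of an observer concrete, here is a minimal pure-Python sketch. It is not PyTorch's observer API: the class `MinMaxObserver` and the helpers `quantize` and `dequantize` are hypothetical names used only for illustration. It also assumes that alpha and beta denote the largest and smallest observed values, and that the target range is signed int8, i.e. [-128, 127]; the convention in the earlier example may differ.

```python
class MinMaxObserver:
    """Hypothetical observer: tracks the min and max of every value it sees."""

    def __init__(self):
        self.beta = float("inf")    # smallest value seen so far (assumed meaning of beta)
        self.alpha = float("-inf")  # largest value seen so far (assumed meaning of alpha)

    def observe(self, values):
        # Called on each batch during plain inference; nothing is trained here.
        for v in values:
            self.beta = min(self.beta, v)
            self.alpha = max(self.alpha, v)

    def qparams(self, q_min=-128, q_max=127):
        # Widen the range so that 0.0 is always exactly representable.
        alpha = max(self.alpha, 0.0)
        beta = min(self.beta, 0.0)
        scale = (alpha - beta) / (q_max - q_min)
        if scale == 0.0:
            scale = 1.0  # degenerate case: only zeros were observed
        zero = round(q_min - beta / scale)
        return scale, zero


def quantize(values, scale, zero, q_min=-128, q_max=127):
    # Asymmetric quantization: scale, shift by the zero point, then clamp.
    return [min(q_max, max(q_min, round(v / scale) + zero)) for v in values]


def dequantize(q_values, scale, zero):
    # Map the integers back to approximate floating-point values.
    return [scale * (q - zero) for q in q_values]


# "Calibration": feed a few batches through the observer.
observer = MinMaxObserver()
for batch in ([-1.0, 0.5, 2.0], [0.25, 3.0, -0.5]):
    observer.observe(batch)

scale, zero = observer.qparams()
q = quantize([-1.0, 0.0, 3.0], scale, zero)
x_hat = dequantize(q, scale, zero)
print(scale, zero, q, x_hat)
```

The calibration loop only reads values, which is why PTQ can reuse ordinary inference batches to pick the quantization parameters.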

This will be done using PyTorch.

## Quantization Aware Training (QAT)

For this, you will have to wait until the next lecture, where we will use Brevitas, a quantization library built on top of PyTorch, to do QAT.

In [2]:
import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.nn as nn
import matplotlib.pyplot as plt
_ = torch.manual_seed(0)

# NOTE BEFORE GOING FURTHER

At this point in the lecture, students were not yet familiar with PyTorch, so we will not go into the details here. The goal is to get an idea of how we can quantize a NN after training, and of the effect this has on the model's accuracy.

In [3]:
# IMPORT THE DATA
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Data preparation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=100, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)

In [4]:
# DEFINE THE MODEL
# This example will be covered in more detail in the second lecture, alongside a full QAT example in Brevitas
# It is okay not to understand everything here at this point

class SimpleClassifier(nn.Module):
    def __init__(self):
        super(SimpleClassifier, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.model(x)

In [6]:
# DECLARE THE MODEL AND OPTIMIZATION PARAMETERS
import torch.optim as optim

model = SimpleClassifier()
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [7]:
# TRAIN THE MODEL
for epoch in range(5):
    for i, (images, labels) in enumerate(train_loader):
        # Flatten the image
        images = images.reshape(-1, 28*28)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    
    # Loss of the last mini-batch of the epoch (a noisy value, not an epoch average)
    print(f'Epoch [{epoch+1}/5], Loss: {loss.item():.4f}')

print("Training finished!")

Epoch [1/5], Loss: 0.0129
Epoch [2/5], Loss: 0.0082
Epoch [3/5], Loss: 0.0080
Epoch [4/5], Loss: 0.3438
Epoch [5/5], Loss: 0.0090
Training finished!


In [11]:
# Testing loop
import torch
model.eval()
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        # Flatten the image
        data = data.reshape(-1, 28*28)
        output = model(data)
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

accuracy = 100. * correct / len(test_loader.dataset)
print(f'Test Accuracy: {accuracy:.2f}%')

Test Accuracy: 97.28%


# NOW LET'S ANALYSE !

In [13]:
import os

# GET MODEL SIZE
def get_size(model):
    # Serialize the weights to a temporary file and measure its size on disk
    torch.save(model.state_dict(), "model_before_PTQ.p")
    size = os.path.getsize("model_before_PTQ.p") / 1e3  # bytes -> KB
    os.remove("model_before_PTQ.p")
    return size

print("size of the model before PTQ : ", get_size(model), "KB")

# GET MODEL DATA (note : we'll see this is a different API call for QAT using Brevitas)


size of the model before PTQ :  408.789 KB
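As a rough sanity check on this figure, the size can be estimated from the parameter count alone. The sketch below uses the layer shapes of `SimpleClassifier` and assumes 4 bytes per float32 parameter. The int8 number is only a back-of-the-envelope assumption (1 byte per parameter, ignoring stored scales and zero points and any tensors kept in higher precision), not a measured result.

```python
# (in_features, out_features) of the two nn.Linear layers in SimpleClassifier
layers = [(28 * 28, 128), (128, 10)]


def count_parameters(layers):
    # Each linear layer holds in*out weights plus one bias per output.
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


n_params = count_parameters(layers)
fp32_kb = n_params * 4 / 1e3  # float32: 4 bytes per parameter
int8_kb = n_params * 1 / 1e3  # int8: 1 byte per parameter (rough assumption)

print(f"parameters      : {n_params}")
print(f"float32 estimate: {fp32_kb:.2f} KB")
print(f"int8 estimate   : {int8_kb:.2f} KB")
```

The float32 estimate comes out slightly below the measured file size, which is expected because the saved file also contains serialization metadata.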
