# Post-Training Quantization

Let's start by importing the libraries we will need for this tutorial.

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import torch.optim as optim
from torchao.quantization import quantize_, Int8WeightOnlyConfig
from torchao.utils import get_model_size_in_bytes
import argparse
import os
import copy


If you run into issues importing these libraries, check which Python environment the Jupyter kernel is using:

In [2]:
import sys
print(sys.executable)

/usr/local/bin/python3


### Formatting the Training Dataset

We will start by normalizing the MNIST dataset with its pre-calculated mean (0.1307) and standard deviation (0.3081).

In [5]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(
    root='./data', train=True, download=True, transform=transform
)
test_dataset = datasets.MNIST(
    root='./data', train=False, download=True, transform=transform
)
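
As a quick illustration of what the normalization step does to pixel values, here is a minimal sketch in plain PyTorch. It assumes `ToTensor()` has already scaled pixels to the range [0, 1] and applies the same `(x - mean) / std` formula by hand; the three sample pixel values are made up for the example.

```python
import torch

# MNIST statistics used in the transform above.
mean, std = 0.1307, 0.3081

# Hypothetical pixel values after ToTensor(): black, average gray, white.
pixels = torch.tensor([0.0, 0.1307, 1.0])

# Normalize applies (x - mean) / std to every pixel.
normalized = (pixels - mean) / std

print(normalized)
```

A black pixel maps to roughly -0.42 and a white pixel to roughly 2.82, so the inputs end up centered near zero with unit variance.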



Next, we wrap both datasets in data loaders, which will serve them in batches of 64 during training and evaluation.

In [7]:
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
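
To see what a data loader yields without downloading MNIST, the sketch below builds one over synthetic data. The tensors `images` and `labels` and the name `demo_loader` are hypothetical stand-ins; only the shapes mirror MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 130 random "images" with random labels.
images = torch.randn(130, 1, 28, 28)
labels = torch.randint(0, 10, (130,))

demo_loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=False)

# Collect the shape of every batch the loader produces.
batch_shapes = [tuple(data.shape) for data, _ in demo_loader]
print(batch_shapes)
# → [(64, 1, 28, 28), (64, 1, 28, 28), (2, 1, 28, 28)]
```

Note that the last batch is smaller whenever the dataset size is not a multiple of the batch size, which is why the evaluation code later counts samples with `target.size(0)` instead of assuming 64.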

### Neural Network Class

Let's define our neural network class now! We'll keep it simple with two fully connected layers. Each image is 28x28 pixels, so we flatten it into an input vector of dimension 784 (28*28). We want to predict which digit the image shows, and since there are 10 possible digits, the output has dimension 10.

In [8]:
class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(28*28, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.fc1(x)
        x = torch.relu(x)
        x = self.fc2(x)
        return x
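
A quick sanity check of the architecture, sketched with a random dummy batch (the batch of 4 is an arbitrary choice for illustration). It confirms the output shape and counts the parameters, which is useful later when reasoning about model size.

```python
import torch

class Network(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(28 * 28, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # flatten each image to a 784-vector
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = Network()
dummy_batch = torch.randn(4, 1, 28, 28)  # 4 fake grayscale images
logits = model(dummy_batch)

# Weights and biases of both layers: 784*128 + 128 + 128*10 + 10.
num_params = sum(p.numel() for p in model.parameters())

print(logits.shape)  # → torch.Size([4, 10])
print(num_params)    # → 101770
```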

### Training the Model

Now that we have defined our model, let's train it on the training set and then inspect it. We expect the weights to be stored as 32-bit floating point numbers (FP32).

In [9]:
learning_rate = 1e-4
epochs = 200

In [10]:
model_fp32 = Network()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model_fp32.parameters(), lr=learning_rate)

In [11]:
for epoch in range(epochs):
    model_fp32.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model_fp32(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

    if epoch % 10 == 0 or epoch == epochs - 1:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

Epoch [1/200], Loss: 2.1704
Epoch [11/200], Loss: 0.8577
Epoch [21/200], Loss: 0.4357
Epoch [31/200], Loss: 0.6525
Epoch [41/200], Loss: 0.4972
Epoch [51/200], Loss: 0.2630
Epoch [61/200], Loss: 0.5114
Epoch [71/200], Loss: 0.3419
Epoch [81/200], Loss: 0.3347
Epoch [91/200], Loss: 0.2990
Epoch [101/200], Loss: 0.2004
Epoch [111/200], Loss: 0.3688
Epoch [121/200], Loss: 0.4618
Epoch [131/200], Loss: 0.5337
Epoch [141/200], Loss: 0.2206
Epoch [151/200], Loss: 0.4631
Epoch [161/200], Loss: 0.1493
Epoch [171/200], Loss: 0.3395
Epoch [181/200], Loss: 0.3261
Epoch [191/200], Loss: 0.1842
Epoch [200/200], Loss: 0.4990
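
The loss printed above is the loss of the last mini-batch in each epoch, which is why it fluctuates from one printout to the next. A smoother signal is the average loss over the whole epoch. The sketch below shows one way to track it, using a toy linear model and random data as hypothetical stand-ins for the network and MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 256 samples, 20 features, 3 classes.
features = torch.randn(256, 20)
targets = torch.randint(0, 3, (256,))
toy_loader = DataLoader(TensorDataset(features, targets), batch_size=64, shuffle=True)

toy_model = torch.nn.Linear(20, 3)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    toy_model.train()
    running_loss, num_samples = 0.0, 0
    for data, target in toy_loader:
        optimizer.zero_grad()
        loss = criterion(toy_model(data), target)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so a short last batch counts less.
        running_loss += loss.item() * data.size(0)
        num_samples += data.size(0)
    epoch_losses.append(running_loss / num_samples)
    print(f"Epoch [{epoch + 1}/5], Mean loss: {epoch_losses[-1]:.4f}")
```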


In [12]:
print(model_fp32)

Network(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
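
The printed module summary does not show the data type of the weights. A minimal check, assuming PyTorch's default dtype has not been changed, uses a fresh layer with the same shape as `fc1` (the variable `layer` is a stand-in, not part of the notebook):

```python
import torch

# Same shape as fc1 in the Network class.
layer = torch.nn.Linear(28 * 28, 128)

weight_dtype = layer.weight.dtype
bytes_per_weight = layer.weight.element_size()

print(weight_dtype)      # → torch.float32
print(bytes_per_weight)  # → 4
```

The same two attributes can be read from `model_fp32.fc1.weight` in the notebook.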


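Now, let's quantize the model to INT8 and see what happens to the weights. Because `quantize_` modifies the model in place, we quantize a deep copy and keep the FP32 model for comparison.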

In [13]:
model_int8 = copy.deepcopy(model_fp32)
quantize_(model_int8, Int8WeightOnlyConfig())

In [14]:
print(model_int8)

Network(
  (fc1): Linear(in_features=784, out_features=128, weight=AffineQuantizedTensor(shape=torch.Size([128, 784]), block_size=(1, 784), device=cpu, _layout=PlainLayout(), tensor_impl_dtype=torch.int8, quant_min=None, quant_max=None))
  (fc2): Linear(in_features=128, out_features=10, weight=AffineQuantizedTensor(shape=torch.Size([10, 128]), block_size=(1, 128), device=cpu, _layout=PlainLayout(), tensor_impl_dtype=torch.int8, quant_min=None, quant_max=None))
)
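
The `block_size=(1, 784)` in the output indicates one set of quantization parameters per output row. The sketch below illustrates that idea with symmetric per-row INT8 quantization in plain PyTorch. It is an illustration under stated assumptions (symmetric range, scale taken from the row maximum, random stand-in weights), not torchao's actual implementation.

```python
import torch

torch.manual_seed(0)
weight = torch.randn(10, 128)  # random stand-in for fc2.weight

# One scale per output row, chosen so the largest magnitude maps to 127.
scale = weight.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 127.0

# Quantize: divide by the scale, round, and store as 8-bit integers.
q_weight = torch.round(weight / scale).clamp(-128, 127).to(torch.int8)

# Dequantize to see how much precision was lost.
dequantized = q_weight.to(torch.float32) * scale
max_error = (weight - dequantized).abs().max().item()

print(q_weight.dtype, q_weight.element_size())
print(f"max abs error: {max_error:.5f}")
```

Rounding to the nearest integer bounds the error of each weight by half of its row's scale, which is why accuracy usually changes very little.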


### Evaluating the Model

So far, we have trained our model with 32-bit floating point weights and biases and created a quantized version with INT8 weights. Let's compare the test accuracy and the model size of the two!

In [15]:
model_fp32.eval()

fp32_correct = 0
fp32_total = 0

with torch.no_grad():
    for data, target in test_loader:
        output = model_fp32(data)
        predicted = output.argmax(dim=1)
        fp32_total += target.size(0)
        fp32_correct += (predicted == target).sum().item()

In [16]:
print(f"Test Accuracy FP32: {100 * fp32_correct / fp32_total:.2f}%")
print(f"FP32 SIZE: {get_model_size_in_bytes(model_fp32) / 1e6:.2f} MB")

Test Accuracy FP32: 92.70%
FP32 SIZE: 0.41 MB


In [17]:
model_int8.eval()

int8_correct = 0
int8_total = 0

with torch.no_grad():
    for data, target in test_loader:
        output = model_int8(data)
        predicted = output.argmax(dim=1)
        int8_total += target.size(0)
        int8_correct += (predicted == target).sum().item()

In [18]:
print(f"Test Accuracy INT8: {100 * int8_correct / int8_total:.2f}%")
print(f"INT8 SIZE: {get_model_size_in_bytes(model_int8) / 1e6:.2f} MB")

Test Accuracy INT8: 92.68%
INT8 SIZE: 0.10 MB
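
The two evaluation cells differ only in the model they test, so the loop could be factored into a helper. The function name `evaluate` and the toy data below are hypothetical; the sketch checks the helper on a case where the answer is known in advance.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Return top-1 accuracy in percent for `model` over `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loader:
            predicted = model(data).argmax(dim=1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return 100 * correct / total

# Toy check: each input is a one-hot row, and the label is the hot index,
# so an identity "model" must classify every sample correctly.
toy_inputs = torch.eye(4)
toy_labels = torch.arange(4)
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=2)

toy_accuracy = evaluate(torch.nn.Identity(), toy_loader)
print(toy_accuracy)  # → 100.0
```

In the notebook, the same helper could be called as `evaluate(model_fp32, test_loader)` and `evaluate(model_int8, test_loader)`.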


### Results

The comparison shows that accuracy dropped only marginally (92.70% to 92.68%), while the model shrank from 0.41 MB to 0.10 MB. This follows from storing each weight as an 8-bit integer (1 byte) instead of a 32-bit float (4 bytes), a reduction of roughly 4x. The measured sizes match this expectation.
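
The reported sizes can be reproduced with back-of-the-envelope arithmetic from the layer shapes. The INT8 estimate rests on an assumption about storage (INT8 weights, FP32 biases, and one FP32 scale per output row); the exact bookkeeping inside `get_model_size_in_bytes` may differ.

```python
# Layer shapes from the Network class: (out_features, in_features).
layer_shapes = [(128, 784), (10, 128)]

num_weights = sum(out_f * in_f for out_f, in_f in layer_shapes)  # 101632
num_biases = sum(out_f for out_f, _ in layer_shapes)             # 138

# FP32 model: every parameter takes 4 bytes.
fp32_bytes = 4 * (num_weights + num_biases)

# Assumed INT8 layout: 1 byte per weight, 4 bytes per bias,
# plus one 4-byte scale per output row (same count as the biases).
int8_bytes = 1 * num_weights + 4 * num_biases + 4 * num_biases

print(f"FP32: {fp32_bytes / 1e6:.2f} MB")        # → FP32: 0.41 MB
print(f"INT8: {int8_bytes / 1e6:.2f} MB")        # → INT8: 0.10 MB
print(f"Ratio: {fp32_bytes / int8_bytes:.2f}x")  # → Ratio: 3.96x
```

The ratio falls slightly short of 4x because the biases and scales are not stored in 8 bits, but they make up a tiny fraction of this model.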