# Brevitas

Brevitas allows us to train quantized neural networks (NNs). It is a very useful tool, and the tools used later in the course are built on it.

This notebook serves as an introduction to quantizing NNs through the creation (and training) of a fully quantized MNIST classifier.

I opted for a more robust architecture this time to avoid low accuracy.

In [1]:
from torch.nn import Module, Flatten
import torch.nn.functional as F

import brevitas.nn as qnn
from brevitas.quant import Int8Bias

### The model itself.

- As it is a fully quantized model, we introduce a `QuantIdentity` layer to quantize the input (4-bit activation)
- All the data passing through this network stays quantized until the output, as all operations are integer operations
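To make the input quantization step concrete, here is a minimal sketch in plain Python of what a 4-bit quantizer does to a single value: divide by a scale, round, clamp to the integer range, and multiply back. It assumes a signed, symmetric, uniform quantizer with a fixed scale. The helper name `fake_quantize` and the scale `0.5` are made up for illustration; in Brevitas the scale is learned or estimated from data rather than fixed by hand.

```python
def fake_quantize(x, scale, bit_width=4, signed=True):
    """Return (integer code, dequantized value) for one input value.

    Assumption: uniform quantization with a fixed, hand-picked scale.
    """
    if signed:
        q_min, q_max = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1  # [-8, 7] for 4 bits
    else:
        q_min, q_max = 0, 2 ** bit_width - 1                              # [0, 15] for 4 bits
    q = round(x / scale)            # Python rounds halves to the nearest even integer
    q = max(q_min, min(q_max, q))   # clamp to the representable integer range
    return q, q * scale


for value in (1.3, 100.0, -100.0):
    print(value, "->", fake_quantize(value, scale=0.5))
```

Values that fall outside the representable range saturate at the extremes, which is why choosing a good scale matters so much at low bit widths.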

In [2]:
class QuantWeightActBiasLeNet(Module):
    def __init__(self):
        super(QuantWeightActBiasLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.fc1   = qnn.QuantLinear(28*28, 128, bias=True, weight_bit_width=4, bias_quant=Int8Bias)
        self.relu = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(128, 10, bias=True, weight_bit_width=4, bias_quant=Int8Bias)


    def forward(self, x):
        out = self.quant_inp(x)
        out = out.reshape(out.shape[0], -1)
        out = self.relu(self.fc1(out))
        out = self.fc2(out)
        return out

quant_weight_act_bias_lenet = QuantWeightActBiasLeNet()
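The layers pass `return_quant_tensor=True` so that each linear layer receives the scale of its input along with the values. As far as I understand `Int8Bias`, it quantizes the bias with the scale of the accumulator, which is typically the input scale times the weight scale. The sketch below shows that arithmetic in plain Python. The helper `int_linear` and all the numbers are hypothetical; this is not Brevitas code.

```python
def int_linear(q_x, q_w, q_b, s_x, s_w):
    """Integer-only linear layer on quantized codes.

    q_x: input integer codes, q_w: rows of weight integer codes,
    q_b: bias integer codes, assumed to share the accumulator scale s_x * s_w.
    Returns (integer accumulators, dequantized outputs).
    """
    acc = [
        sum(xi * wi for xi, wi in zip(q_x, row)) + b   # pure integer multiply-accumulate
        for row, b in zip(q_w, q_b)
    ]
    s_acc = s_x * s_w                                  # scale of the accumulator
    return acc, [a * s_acc for a in acc]


acc, out = int_linear(
    q_x=[1, -2, 3],
    q_w=[[2, 0, -1], [7, 7, 7]],
    q_b=[4, -1],
    s_x=0.5,
    s_w=0.25,
)
print(acc, out)
```

The floating-point scales only appear once at the very end, which is what makes this kind of network cheap to run on integer hardware.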


### Some inspections

Let's play around with the layers and see what makes them so special!

In [4]:
model = QuantWeightActBiasLeNet()
model

QuantWeightActBiasLeNet(
  (quant_inp): QuantIdentity(
    (input_quant): ActQuantProxyFromInjector(
      (_zero_hw_sentinel): StatelessBuffer()
    )
    (act_quant): ActQuantProxyFromInjector(
      (_zero_hw_sentinel): StatelessBuffer()
      (fused_activation_quant_proxy): FusedActivationQuantProxy(
        (activation_impl): Identity()
        (tensor_quant): RescalingIntQuant(
          (int_quant): IntQuant(
            (float_to_int_impl): RoundSte()
            (tensor_clamp_impl): TensorClamp()
            (delay_wrapper): DelayWrapper(
              (delay_impl): _NoDelay()
            )
          )
          (scaling_impl): ParameterFromRuntimeStatsScaling(
            (stats_input_view_shape_impl): OverTensorView()
            (stats): _Stats(
              (stats_impl): AbsPercentile()
            )
            (restrict_scaling): _RestrictValue(
              (restrict_value_impl): FloatRestrictValue()
            )
            (clamp_scaling): _ClampValue(
            

In [5]:
print(model.fc1.weight)
print(model.fc1.quant_weight())
print(model.fc1.quant_weight().int())
print(model.fc1.quant_weight().int().dtype)

Parameter containing:
tensor([[-0.0326, -0.0227, -0.0050,  ...,  0.0253, -0.0092,  0.0329],
        [ 0.0050, -0.0345,  0.0211,  ...,  0.0196, -0.0355, -0.0124],
        [ 0.0065, -0.0054,  0.0175,  ..., -0.0345,  0.0011, -0.0011],
        ...,
        [ 0.0343, -0.0034, -0.0246,  ...,  0.0229, -0.0110, -0.0022],
        [ 0.0330,  0.0281,  0.0260,  ..., -0.0251,  0.0294, -0.0145],
        [ 0.0013, -0.0068, -0.0140,  ..., -0.0218,  0.0356, -0.0237]],
       requires_grad=True)
QuantTensor(value=tensor([[-0.0306, -0.0204, -0.0051,  ...,  0.0255, -0.0102,  0.0306],
        [ 0.0051, -0.0357,  0.0204,  ...,  0.0204, -0.0357, -0.0102],
        [ 0.0051, -0.0051,  0.0153,  ..., -0.0357,  0.0000, -0.0000],
        ...,
        [ 0.0357, -0.0051, -0.0255,  ...,  0.0204, -0.0102, -0.0000],
        [ 0.0306,  0.0306,  0.0255,  ..., -0.0255,  0.0306, -0.0153],
        [ 0.0000, -0.0051, -0.0153,  ..., -0.0204,  0.0357, -0.0255]],
       grad_fn=<MulBackward0>), scale=tensor(0.0051, grad_fn=<Div
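The printed tensors are consistent with each quantized weight being an integer multiplied by the scale. As a sanity check, the sketch below takes the first row of float weights and the scale `0.0051` from the output above and recomputes the quantized values by hand. The clamp to `[-7, 7]` is an assumption, suggested by the largest printed magnitude (`0.0357 = 7 * 0.0051`).

```python
scale = 0.0051                      # scale printed in the QuantTensor above
int_max = 7                         # assumption: symmetric 4-bit range [-7, 7]

float_weights = [-0.0326, -0.0227, -0.0050, 0.0253, -0.0092, 0.0329]

# Divide by the scale, round to the nearest integer, clamp to the 4-bit range
int_weights = [max(-int_max, min(int_max, round(w / scale))) for w in float_weights]

# Multiply back by the scale to recover the values stored in the QuantTensor
dequantized = [round(q * scale, 4) for q in int_weights]

print(int_weights)
print(dequantized)
```

The dequantized list matches the first row of `quant_weight()` shown above, and the integer list is what `quant_weight().int()` would hold for those entries under the same assumption.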

### Training and testing

The same principles as studied previously apply here.

In [6]:
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Data preparation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)
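A quick note on the `Normalize` transform: `ToTensor` scales pixels to `[0, 1]`, and `Normalize` then applies `(x - mean) / std`. The small sketch below (plain Python, with the same mean and std as in the cell above) shows the resulting input range. The helper name `normalize` is only for illustration.

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081   # same constants as in the transform above


def normalize(pixel):
    """Apply (x - mean) / std to one pixel value in [0, 1]."""
    return (pixel - MNIST_MEAN) / MNIST_STD


darkest = normalize(0.0)     # black background pixel
brightest = normalize(1.0)   # fully white pixel
print(f"input range: [{darkest:.4f}, {brightest:.4f}]")
```

Since the normalized inputs include negative values, the input quantizer has to represent signed numbers, and its 4 bits are spread over this whole range.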



In [7]:
from torch import nn
import torch.optim as optim

# Model, loss function, and optimizer
model = QuantWeightActBiasLeNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [8]:
# Training loop
for epoch in range(5):  # Train for 5 epochs
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    
    # Note: this is the loss of the last mini-batch only, not the epoch average
    print(f'Epoch {epoch+1}, Loss: {loss.item()}')





Epoch 1, Loss: 0.0402638241648674
Epoch 2, Loss: 0.15108023583889008
Epoch 3, Loss: 0.05071258917450905
Epoch 4, Loss: 0.02540937438607216
Epoch 5, Loss: 0.11885809898376465
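The printed losses go up and down because each one is the loss of the last mini-batch of its epoch, which is a noisy measurement. A smoother signal is the average over the whole epoch. Here is a minimal sketch of such an accumulator in plain Python; the class name `RunningAverage` and the sample numbers are made up for illustration.

```python
class RunningAverage:
    """Accumulates a sample-weighted mean of per-batch loss values."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_loss, batch_size):
        # batch_loss is assumed to be the mean loss over the batch
        self.total += batch_loss * batch_size
        self.count += batch_size

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


meter = RunningAverage()
for batch_loss, batch_size in [(0.5, 64), (0.25, 64), (1.0, 32)]:
    meter.update(batch_loss, batch_size)
print(f"Epoch average loss: {meter.mean:.4f}")
```

In the training loop this would amount to calling `meter.update(loss.item(), data.size(0))` after each step and printing `meter.mean` at the end of the epoch.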


In [9]:
# Testing loop
import torch
model.eval()
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

accuracy = 100. * correct / len(test_loader.dataset)
print(f'Test Accuracy: {accuracy:.2f}%')


Test Accuracy: 97.52%


As we can see, the accuracy dropped compared to the last example, but only by about 0.1%.
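To put that small accuracy drop in perspective, here is a rough estimate of how much parameter storage the quantized model needs compared to a 32-bit float version of the same architecture. The layer shapes and bit widths come from the model above. The estimate assumes ideal bit packing and ignores the few scale factors that also need to be stored, so treat it as a back-of-the-envelope figure.

```python
# (in_features, out_features) of the two linear layers in the model
layers = [(28 * 28, 128), (128, 10)]

n_weights = sum(i * o for i, o in layers)   # weight parameters
n_biases = sum(o for _, o in layers)        # bias parameters

fp32_bytes = (n_weights + n_biases) * 32 // 8       # everything stored as float32
quant_bytes = (n_weights * 4 + n_biases * 8) // 8   # 4-bit weights, 8-bit biases

print(f"float32:   {fp32_bytes} bytes")
print(f"quantized: {quant_bytes} bytes")
print(f"ratio:     {fp32_bytes / quant_bytes:.2f}x smaller")
```

Under these assumptions the quantized parameters take roughly one eighth of the space.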

# Learn more

You can learn more about quantizing your model here: [Quant getting started](https://xilinx.github.io/brevitas/getting_started.html)

This documentation will take you from weight-only quantization all the way to full quantization in a simple, lighthearted way.

We will also have a lot of time during the lab, where we'll slow down and look at what's happening. Stay tuned!