# Introduction

In this notebook I aim to do the following things:
- Train a deep learning model on images of chest X-rays.
- Compress the model using quantization.
- Compare the compressed and uncompressed models.

# Dataset

I have used the [Tuberculosis (TB) Chest X-ray Database](https://www.kaggle.com/datasets/tawsifurrahman/tuberculosis-tb-chest-xray-dataset) from Kaggle.

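It contains chest X-ray images of normal (healthy) and TB-infected lungs.

We will now download the dataset from Kaggle into this Colab environment. The Kaggle CLI needs an API token (`kaggle.json`), so we first upload it and move it to `~/.kaggle/` with owner-only permissions.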

In [1]:
!pip -q install kaggle

In [2]:
from google.colab import files
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}

In [6]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

In [7]:
!chmod 600 ~/.kaggle/kaggle.json
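The three shell commands above can also be written in plain Python, which is handy outside a notebook. A minimal sketch; `install_kaggle_token` is a hypothetical helper, not part of the Kaggle package:

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", kaggle_dir=None):
    """Copy an uploaded kaggle.json into ~/.kaggle (or kaggle_dir) and make it owner-only."""
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)      # mkdir -p ~/.kaggle
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(src, dest)                         # cp kaggle.json ~/.kaggle/
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)        # chmod 600 (read/write for owner only)
    return dest
```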

In [8]:
!kaggle datasets download -d tawsifurrahman/tuberculosis-tb-chest-xray-dataset

Downloading tuberculosis-tb-chest-xray-dataset.zip to /content
 97% 646M/663M [00:06<00:00, 111MB/s] 
100% 663M/663M [00:06<00:00, 108MB/s]


In [9]:
!unzip -q tuberculosis-tb-chest-xray-dataset.zip

# Training the Model

Now we train the model. First, we import the dependencies.

In [27]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import ImageFolder
from tqdm import tqdm
from pathlib import Path
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.metrics import accuracy_score

In [11]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

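Next, we prepare the dataset for training: every image is resized to 224×224, converted to a tensor and normalized with the ImageNet channel means and standard deviations, and the dataset is split randomly into training (70%), validation (15%) and test (15%) sets.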

In [12]:
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

In [13]:
full_dataset = ImageFolder(root="/content/TB_Chest_Radiography_Database", transform=transform)
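`ImageFolder` derives the labels from the directory layout: each subfolder of `root` is one class, and the classes are numbered in sorted order. A sketch of that mapping, mirroring torchvision's `find_classes` (`folder_class_to_idx` is a hypothetical name). It shows, for example, that folders named `Normal` and `Tuberculosis` would become labels 0 and 1:

```python
import os


def folder_class_to_idx(root):
    """Map class names to label indices the way ImageFolder does:
    one class per subdirectory, numbered in sorted order; plain files are ignored."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}
```

`full_dataset.class_to_idx` gives the actual mapping for the extracted dataset.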

In [14]:
train_size = int(0.7 * len(full_dataset))
val_size = int(0.15 * len(full_dataset))
test_size = len(full_dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(
    full_dataset, [train_size, val_size, test_size]
)
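The split sizes can be checked by hand: `int()` truncates the training and validation sizes, and the test set gets the remainder. A dataset of 4,200 images, for example, would give 2,940 / 630 / 630 images, i.e. 92 training batches and 20 test batches of 32, which is consistent with the progress bars further down. The helpers `split_sizes` and `num_batches` are hypothetical:

```python
import math


def split_sizes(n, train_frac=0.7, val_frac=0.15):
    """Same arithmetic as the cell above: truncate train/val, test gets the rest."""
    train_size = int(train_frac * n)
    val_size = int(val_frac * n)
    test_size = n - train_size - val_size
    return train_size, val_size, test_size


def num_batches(size, batch_size=32):
    # DataLoader keeps the final, smaller batch by default (drop_last=False)
    return math.ceil(size / batch_size)


train_size, val_size, test_size = split_sizes(4200)
print(train_size, val_size, test_size, num_batches(train_size), num_batches(test_size))
```

`random_split` draws a new permutation every time the notebook runs. Passing `generator=torch.Generator().manual_seed(42)` (any fixed seed works) makes the split reproducible. Without it, reloading `original_model.pt` in a fresh session evaluates on a test set that overlaps the images the model was trained on.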

In [15]:
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=2)

We will now create a custom CNN to train. Any other CNN architecture would also work, but we need to know its layers in detail, because for quantization we later rebuild the same model with quantization stubs and copy the trained weights into it.

In [16]:
class CustomCNN(nn.Module):
    def __init__(self):
        super(CustomCNN, self).__init__()

        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.batch_norm1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.batch_norm2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.batch_norm3 = nn.BatchNorm2d(128)
        self.relu3 = nn.ReLU()
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.batch_norm4 = nn.BatchNorm2d(256)
        self.relu4 = nn.ReLU()
        self.maxpool4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv5 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)
        self.batch_norm5 = nn.BatchNorm2d(512)
        self.relu5 = nn.ReLU()
        self.maxpool5 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)

        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(512 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 2)

    def forward(self, x):
        x = self.maxpool1(self.relu1(self.batch_norm1(self.conv1(x))))
        x = self.maxpool2(self.relu2(self.batch_norm2(self.conv2(x))))
        x = self.maxpool3(self.relu3(self.batch_norm3(self.conv3(x))))
        x = self.maxpool4(self.relu4(self.batch_norm4(self.conv4(x))))
        x = self.maxpool5(self.relu5(self.batch_norm5(self.conv5(x))))
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x
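Why `fc1` expects `512 * 7 * 7` inputs: each of the five blocks keeps the spatial size through its padded 3×3 convolution and halves it with a 2×2 max-pool, so 224 → 112 → 56 → 28 → 14 → 7. The sketch below applies the standard `Conv2d`/`MaxPool2d` output-size formula; `out_size` and `flattened_features` are hypothetical helpers:

```python
def out_size(size, kernel, stride, padding):
    """Output height/width of Conv2d or MaxPool2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size=224, num_blocks=5, channels=512):
    size = input_size
    for _ in range(num_blocks):
        size = out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv, padding 1: size unchanged
        size = out_size(size, kernel=2, stride=2, padding=0)  # 2x2 max-pool, stride 2: size halved
    return channels * size * size, size


print(flattened_features())  # (25088, 7)
```

If the `Resize` in `transform` were changed to 256×256, the same arithmetic gives 8×8 feature maps, so `fc1` would need `512 * 8 * 8` inputs.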

In [17]:
anantaJalil = CustomCNN().to(device)

In [18]:
print(anantaJalil)

CustomCNN(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (batch_norm1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (batch_norm2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu2): ReLU()
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (batch_norm3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu3): ReLU()
  (maxpool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (batch_norm4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=

In [19]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(anantaJalil.parameters(), lr=0.001)

We define a function `train` that trains the model and stops early once the validation loss has not improved for `patience` consecutive epochs.

In [21]:
def train(model, train_loader, val_loader, criterion, optimizer, device, num_epochs=100, patience=3):
    """Train `model` with early stopping on the validation loss and return the per-epoch history."""
    history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    best_val_loss = float('inf')
    counter = 0  # Number of epochs since the validation loss last improved

    for epoch in range(num_epochs):
        # Training pass
        model.train()
        running_loss = 0.0
        train_true_labels_epoch = []
        train_predicted_labels_epoch = []

        for inputs, labels in tqdm(train_loader, desc=f"Epoch {epoch + 1}/{num_epochs}"):
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            train_true_labels_epoch.extend(labels.cpu().numpy())
            train_predicted_labels_epoch.extend(predicted.cpu().numpy())

        # Training metrics for this epoch only: mean batch loss and accuracy
        history['train_loss'].append(running_loss / len(train_loader))
        history['train_acc'].append(accuracy_score(train_true_labels_epoch, train_predicted_labels_epoch))

        # Validation pass
        model.eval()
        total_correct = 0
        total_samples = 0
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, predicted = torch.max(outputs, 1)
                total_samples += labels.size(0)
                total_correct += (predicted == labels).sum().item()
                val_loss += criterion(outputs, labels).item()

        accuracy = total_correct / total_samples
        avg_val_loss = val_loss / len(val_loader)
        history['val_acc'].append(accuracy)
        history['val_loss'].append(avg_val_loss)

        print(f"Epoch {epoch + 1}/{num_epochs}, Validation Accuracy: {accuracy:.4f}, Validation Loss: {avg_val_loss:.4f}")

        # Early stopping: reset the counter on improvement, otherwise count up
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            counter = 0
        else:
            counter += 1

        if counter >= patience:
            print(f"Early stopping at epoch {epoch + 1} due to no improvement in validation loss.")
            break

    return history


We define a function `print_size_of_model` that saves the model's `state_dict` to a temporary file and prints the file size in MB (10^6 bytes).

In [24]:
def print_size_of_model(model):
    torch.save(model.state_dict(), "temp_delme.pth")
    size_kb = os.path.getsize("temp_delme.pth") / 1e3
    size_mb = size_kb / 1e3
    print('Size (MB):', size_mb)
    os.remove('temp_delme.pth')
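`print_size_of_model` writes a temporary file in the working directory. A variant that measures the serialized size in memory with `io.BytesIO` avoids touching the disk. `serialized_size_mb` is a hypothetical helper; it defaults to `pickle.dump` only so that the sketch runs without PyTorch:

```python
import io
import pickle


def serialized_size_mb(obj, save=pickle.dump):
    """Serialize obj into an in-memory buffer and return the size in MB (10**6 bytes)."""
    buffer = io.BytesIO()
    save(obj, buffer)  # same (obj, file) argument order as torch.save
    return buffer.getbuffer().nbytes / 1e6
```

With PyTorch, `serialized_size_mb(anantaJalil.state_dict(), save=torch.save)` should report essentially the same number as `print_size_of_model`, since `torch.save` accepts a file-like object in place of a path.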

To evaluate a model on the test set, we define a function called `test`.

In [25]:
def test(model, test_loader, device, total_iterations: int = None):
    """Print the model's accuracy on test_loader (optionally on the first total_iterations batches only)."""
    test_true_labels = []  # True labels of the evaluated samples
    test_predicted_labels = []  # Predicted labels of the evaluated samples
    model.eval()
    with torch.no_grad():
        for iterations, (inputs, labels) in enumerate(tqdm(test_loader, desc='Testing'), start=1):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            test_true_labels.extend(labels.cpu().numpy())
            test_predicted_labels.extend(predicted.cpu().numpy())
            if total_iterations is not None and iterations >= total_iterations:
                break
    accuracy = accuracy_score(test_true_labels, test_predicted_labels)
    print(f'Accuracy: {accuracy}')
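The notebook imports `confusion_matrix` and `precision_recall_fscore_support`, but `test` only reports accuracy. For a screening task, recall on the TB class (how many TB cases the model catches) matters at least as much, especially if the classes are imbalanced. A minimal pure-Python sketch, assuming label `1` is the TB class (as it would be if the TB folder sorts after the normal one); `binary_metrics` is a hypothetical helper:

```python
def binary_metrics(true_labels, predicted_labels, positive=1):
    """Confusion-matrix counts plus precision, recall and F1 for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # TB predicted as TB
    fp = sum(t != positive and p == positive for t, p in pairs)  # normal predicted as TB
    fn = sum(t == positive and p != positive for t, p in pairs)  # TB missed
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}
```

To use it, `test` would need to return its `test_true_labels` and `test_predicted_labels` lists. scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average='binary', pos_label=1)` computes the same precision, recall and F1.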

Now we train and save the model. If a saved model file already exists, its weights are loaded instead of training again.

In [26]:
MODEL_FILENAME = 'original_model.pt'

In [28]:
if Path(MODEL_FILENAME).exists():
    anantaJalil.load_state_dict(torch.load(MODEL_FILENAME, map_location=device))
    print('Loaded model from disk')
else:
    train(anantaJalil, train_loader, val_loader, criterion, optimizer, device, num_epochs=100, patience=3)
    # Save the model to disk
    torch.save(anantaJalil.state_dict(), MODEL_FILENAME)

Epoch 1/100: 100%|██████████| 92/92 [11:47<00:00,  7.69s/it]


Epoch 1/100, Validation Accuracy: 0.9730, Validation Loss: 0.1869


Epoch 2/100: 100%|██████████| 92/92 [11:45<00:00,  7.67s/it]


Epoch 2/100, Validation Accuracy: 0.9127, Validation Loss: 0.3722


Epoch 3/100: 100%|██████████| 92/92 [11:55<00:00,  7.77s/it]


Epoch 3/100, Validation Accuracy: 0.9905, Validation Loss: 0.0395


Epoch 4/100: 100%|██████████| 92/92 [11:44<00:00,  7.66s/it]


Epoch 4/100, Validation Accuracy: 0.9857, Validation Loss: 0.0337


Epoch 5/100: 100%|██████████| 92/92 [11:43<00:00,  7.64s/it]


Epoch 5/100, Validation Accuracy: 0.9619, Validation Loss: 0.1051


Epoch 6/100: 100%|██████████| 92/92 [11:38<00:00,  7.60s/it]


Epoch 6/100, Validation Accuracy: 0.9952, Validation Loss: 0.0125


Epoch 7/100: 100%|██████████| 92/92 [11:37<00:00,  7.59s/it]


Epoch 7/100, Validation Accuracy: 0.9937, Validation Loss: 0.0155


Epoch 8/100: 100%|██████████| 92/92 [11:39<00:00,  7.61s/it]


Epoch 8/100, Validation Accuracy: 0.9937, Validation Loss: 0.0187


Epoch 9/100: 100%|██████████| 92/92 [11:37<00:00,  7.59s/it]


Epoch 9/100, Validation Accuracy: 0.9841, Validation Loss: 0.0869
Early stopping at epoch 9 due to no improvement in validation loss.


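Training is done. Early stopping ended training after epoch 9, three epochs after the lowest validation loss (0.0125, epoch 6). Because `train` does not restore the best weights, the model saved to `original_model.pt` holds the weights from epoch 9.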
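`train` keeps only the weights of the last epoch. A minimal sketch of an early-stopping helper that also remembers the best state; `EarlyStopping` and its `step` method are hypothetical names (not a PyTorch API), and this variant was not used to produce the results in this notebook. Replaying the validation losses from the log reproduces the stop at epoch 9 and identifies epoch 6 as the best one:

```python
import copy


class EarlyStopping:
    """Stop after `patience` epochs without improvement, keeping a copy of the best state."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None
        self.counter = 0

    def step(self, epoch, val_loss, state=None):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_state = copy.deepcopy(state)  # snapshot, not a live reference
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


# Validation losses printed in the training log above
val_losses = [0.1869, 0.3722, 0.0395, 0.0337, 0.1051, 0.0125, 0.0155, 0.0187, 0.0869]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(epoch, loss, state={"epoch": epoch}):  # stand-in for model.state_dict()
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch, stopper.best_loss)
```

Inside `train`, the early-stopping block could become `if stopper.step(epoch + 1, avg_val_loss, model.state_dict()): break`, followed by `model.load_state_dict(stopper.best_state)` after the loop. The deep copy matters: `state_dict()` returns references to the live tensors, which keep changing as training continues.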

# Testing the Model and Checking Its Size

We now inspect the weights of the first convolutional layer after training, along with their data type.

In [34]:
# Printing the weights matrix of the model before quantization
print('Weights before quantization')
print(anantaJalil.conv1.weight)
print(anantaJalil.conv1.weight.dtype)

Weights before quantization
Parameter containing:
tensor([[[[-1.4485e-01, -1.3680e-01,  1.4645e-01],
          [-3.4057e-02,  4.0066e-02,  1.4653e-01],
          [-3.9360e-02, -3.1954e-02,  1.9848e-01]],

         [[-4.3736e-02, -5.3364e-02, -7.6015e-02],
          [-1.1375e-02, -9.7157e-02,  3.7182e-02],
          [-2.4842e-02, -1.9482e-02,  1.8802e-01]],

         [[-1.3220e-01,  4.8379e-02,  6.2325e-02],
          [ 3.2245e-02, -1.4307e-01, -9.6930e-02],
          [-4.3221e-02,  1.1859e-01, -1.1901e-01]]],


        [[[ 4.6476e-02,  2.8481e-02, -1.4417e-01],
          [ 1.6459e-01, -4.2325e-03, -5.4186e-02],
          [ 1.9815e-02, -3.2094e-02,  1.5083e-01]],

         [[ 1.8349e-01, -1.1144e-01, -1.6888e-01],
          [-4.6052e-02, -1.5621e-01,  5.9176e-02],
          [ 1.5712e-01, -1.4579e-02,  5.5457e-02]],

         [[-8.7788e-03, -3.6236e-02,  2.2953e-02],
          [ 1.8419e-01, -1.1851e-02,  9.2437e-02],
          [-1.8030e-01,  5.1614e-02, -1.5324e-01]]],


        [[[ 1.26

As we can see, the weights are `float32` values. We will later quantize them to `int8`, which should shrink the model to roughly a quarter of its size while, ideally, costing little accuracy.

In [30]:
print('Size of the model before quantization')
print_size_of_model(anantaJalil)

Size of the model before quantization
Size (MB): 31.995810000000002


So the original model `anantaJalil` is about 32.0 MB. Let's test the model and see its performance.
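The 32 MB figure follows from the architecture, since every trainable parameter is a 4-byte `float32`. Counting them by hand (the helpers are hypothetical; `sum(p.numel() for p in anantaJalil.parameters())` should report the same total):

```python
def conv_params(c_in, c_out, k=3):
    return c_out * c_in * k * k + c_out   # weights + bias


def linear_params(n_in, n_out):
    return n_out * n_in + n_out           # weights + bias


channels = [3, 32, 64, 128, 256, 512]
conv_total = sum(conv_params(a, b) for a, b in zip(channels, channels[1:]))
bn_total = sum(2 * c for c in channels[1:])   # BatchNorm2d weight + bias per channel
fc_total = linear_params(512 * 7 * 7, 256) + linear_params(256, 2)

total = conv_total + bn_total + fc_total
print(total, total * 4 / 1e6)   # parameter count and its float32 size in MB
```

About 6.4 million of the roughly 8 million parameters sit in `fc1` (25,088 × 256 weights), so the first fully connected layer dominates the file size. The remaining ≈0.02 MB of the printed 31.996 MB come from the BatchNorm running statistics, which are saved but not trained, and from serialization overhead.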

In [29]:
test(anantaJalil, test_loader, device)

Testing: 100%|██████████| 20/20 [01:07<00:00,  3.37s/it]

Accuracy: 0.9841269841269841





# Quantizing the Model

We now quantize the model with post-training static quantization, in three steps: insert observers, calibrate them on data, and convert the model.

## Inserting Min-Max Observers into the Model

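We create `QuantizedCustomCNN`, which is identical to `CustomCNN` except for a `QuantStub` at the input and a `DeQuantStub` at the output. The stubs mark where tensors are converted from float to quantized and back; the min-max observers themselves are inserted afterwards by `torch.ao.quantization.prepare`.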

In [50]:
class QuantizedCustomCNN(nn.Module):
    def __init__(self):
        super(QuantizedCustomCNN, self).__init__()

        self.quant = torch.quantization.QuantStub()  # Marks where float inputs become quantized tensors

        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.batch_norm1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.batch_norm2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.batch_norm3 = nn.BatchNorm2d(128)
        self.relu3 = nn.ReLU()
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.batch_norm4 = nn.BatchNorm2d(256)
        self.relu4 = nn.ReLU()
        self.maxpool4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv5 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)
        self.batch_norm5 = nn.BatchNorm2d(512)
        self.relu5 = nn.ReLU()
        self.maxpool5 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)

        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(512 * 7 * 7, 256)
        self.fc2 = nn.Linear(256, 2)

        self.dequant = torch.quantization.DeQuantStub()  # Marks where quantized outputs become float again

    def forward(self, x):
        x = self.quant(x)  # float -> quantized (after convert)
        x = self.maxpool1(self.relu1(self.batch_norm1(self.conv1(x))))
        x = self.maxpool2(self.relu2(self.batch_norm2(self.conv2(x))))
        x = self.maxpool3(self.relu3(self.batch_norm3(self.conv3(x))))
        x = self.maxpool4(self.relu4(self.batch_norm4(self.conv4(x))))
        x = self.maxpool5(self.relu5(self.batch_norm5(self.conv5(x))))
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        x = self.dequant(x)  # quantized -> float (after convert)
        return x
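The notebook quantizes each `Conv2d`, `BatchNorm2d` and `ReLU` separately. A common extra step in eager-mode quantization, not done here, is to fuse each conv → batch norm → ReLU block before calling `prepare`; this folds the batch-norm parameters into the convolution weights so fewer intermediate tensors need to be quantized. The sketch only builds the list of module-name groups from the attribute names used in `QuantizedCustomCNN`; `conv_bn_relu_groups` is a hypothetical helper:

```python
def conv_bn_relu_groups(num_blocks=5):
    """Module-name groups [convN, batch_normN, reluN] for each block,
    following the attribute names used in QuantizedCustomCNN."""
    return [[f"conv{i}", f"batch_norm{i}", f"relu{i}"] for i in range(1, num_blocks + 1)]


print(conv_bn_relu_groups())
```

The groups would be passed as `torch.ao.quantization.fuse_modules(anantaJalilJr, conv_bn_relu_groups(), inplace=True)` with the model in `eval()` mode, before `prepare`. Whether fusion would recover part of the accuracy lost below has not been tested in this notebook.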


In [37]:
anantaJalilJr = QuantizedCustomCNN().to(device)
# Copy weights from unquantized model
anantaJalilJr.load_state_dict(anantaJalil.state_dict())
anantaJalilJr.eval()

anantaJalilJr.qconfig = torch.ao.quantization.default_qconfig
anantaJalilJr = torch.ao.quantization.prepare(anantaJalilJr) # Insert observers
anantaJalilJr

QuantizedCustomCNN(
  (quant): QuantStub(
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (conv1): Conv2d(
    3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (batch_norm1): BatchNorm2d(
    32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(
    32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (batch_norm2): BatchNorm2d(
    64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)
  )
  (relu2): ReLU()
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=Fal

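`prepare` has attached a `MinMaxObserver` to the `QuantStub` and to the layers that will be quantized, such as each `Conv2d` and `BatchNorm2d`. The observers are still empty (`min_val=inf, max_val=-inf`) because no data has passed through the model yet. Once data flows through, each observer records the smallest and largest value it sees, and post-training static quantization uses these ranges to choose every layer's scale and zero-point.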
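Conceptually, a min-max observer is a pass-through that remembers the extreme values it has seen. A pure-Python sketch of the idea; `MinMaxTracker` is hypothetical and ignores tensors, dtypes and per-channel variants, and the two calibration batches are made up:

```python
class MinMaxTracker:
    """Toy min-max observer: passes data through unchanged and
    remembers the smallest and largest values seen across all batches."""

    def __init__(self):
        self.min_val = float("inf")    # same empty state as min_val=inf in the printout
        self.max_val = float("-inf")   # same empty state as max_val=-inf

    def __call__(self, batch):
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))
        return batch  # values are not modified


tracker = MinMaxTracker()
for batch in ([0.5, -1.0, 2.0], [3.0, -0.5]):   # two made-up calibration batches
    tracker(batch)
print(tracker.min_val, tracker.max_val)
```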

In [39]:
test(anantaJalilJr, test_loader, device)

Testing: 100%|██████████| 20/20 [01:17<00:00,  3.90s/it]

Accuracy: 0.9841269841269841





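## Calibrating the Model Using the Test Set

Calibration means running representative data through the prepared model so that the observers can record activation ranges. The `test` call above already did this: passing the test set through `anantaJalilJr` filled every observer with statistics. The accuracy is unchanged because, at this stage, the model still computes in `float32` and the observers only record values. Since the test set doubles as calibration data here, the accuracy measured on it after quantization is not fully independent of the quantization step; a subset of the training or validation set would be a cleaner choice.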

In [40]:
print(f'Check statistics of the various layers')
anantaJalilJr

Check statistics of the various layers


QuantizedCustomCNN(
  (quant): QuantStub(
    (activation_post_process): MinMaxObserver(min_val=-2.1179039478302, max_val=2.640000104904175)
  )
  (conv1): Conv2d(
    3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=-3.366504669189453, max_val=3.813620090484619)
  )
  (batch_norm1): BatchNorm2d(
    32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
    (activation_post_process): MinMaxObserver(min_val=-19.95576286315918, max_val=21.162017822265625)
  )
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(
    32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (activation_post_process): MinMaxObserver(min_val=-8.581221580505371, max_val=11.276447296142578)
  )
  (batch_norm2): BatchNorm2d(
    64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
    (activation_post_process): MinMaxObserver(min_val=-16.4288406372

## Quantize the Model Using the Collected Statistics

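The observers now hold the minimum and maximum values seen during calibration. Next, `torch.ao.quantization.convert` uses these statistics to compute a `scale` and `zero_point` for each observed tensor and replaces each float module with its quantized counterpart.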


In [41]:
anantaJalilJr = torch.ao.quantization.convert(anantaJalilJr)

In [42]:
print(f'Check statistics of the various layers')
anantaJalilJr

Check statistics of the various layers


QuantizedCustomCNN(
  (quant): Quantize(scale=tensor([0.0375]), zero_point=tensor([57]), dtype=torch.quint8)
  (conv1): QuantizedConv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.056536413729190826, zero_point=60, padding=(1, 1))
  (batch_norm1): QuantizedBatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu1): ReLU()
  (maxpool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): QuantizedConv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.15635959804058075, zero_point=55, padding=(1, 1))
  (batch_norm2): QuantizedBatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu2): ReLU()
  (maxpool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): QuantizedConv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.4456050395965576, zero_point=59, padding=(1, 1))
  (batch_norm3): QuantizedBatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, 

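Looking at the converted model, every convolutional and fully connected layer now carries a `scale` and a `zero_point`, and the float modules have been replaced by quantized ones: `QuantizedConv2d` for the convolutional layers, `QuantizedBatchNorm2d` for the batch-norm layers and `QuantizedLinear` for the fully connected layers. The `QuantStub` has become a `Quantize` module that converts the float input to `quint8`.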
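The printed `scale` and `zero_point` values follow directly from the calibrated min/max statistics. The sketch assumes that the activation observer in `default_qconfig` uses the reduced unsigned range [0, 127]; this is an assumption, but it reproduces the printed numbers (0.0375/57 for the input, 0.0565/60 for `conv1`, 0.1564/55 for `conv2`). `affine_qparams` is a hypothetical helper that follows MinMaxObserver's per-tensor affine formula, minus its tiny `eps` lower bound on the scale:

```python
def affine_qparams(min_val, max_val, quant_min=0, quant_max=127):
    """Per-tensor affine scale and zero_point in the style of MinMaxObserver.
    quant_min=0, quant_max=127 is the assumed (reduced) quint8 activation range."""
    min_neg = min(min_val, 0.0)   # the range is widened so that it always contains 0.0
    max_pos = max(max_val, 0.0)
    scale = (max_pos - min_neg) / (quant_max - quant_min)
    zero_point = quant_min - round(min_neg / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


# min/max recorded by the observers after calibration (copied from the printout above)
print(affine_qparams(-2.1179039478302, 2.640000104904175))    # QuantStub
print(affine_qparams(-3.366504669189453, 3.813620090484619))  # conv1
print(affine_qparams(-8.581221580505371, 11.276447296142578))  # conv2
```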

In [43]:
# Print the weights matrix of the model after quantization
print('Weights after quantization')
print(torch.int_repr(anantaJalilJr.conv1.weight()))

Weights after quantization
tensor([[[[ -82,  -77,   83],
          [ -19,   23,   83],
          [ -22,  -18,  112]],

         [[ -25,  -30,  -43],
          [  -6,  -55,   21],
          [ -14,  -11,  106]],

         [[ -75,   27,   35],
          [  18,  -81,  -55],
          [ -24,   67,  -67]]],


        [[[  26,   16,  -81],
          [  93,   -2,  -31],
          [  11,  -18,   85]],

         [[ 103,  -63,  -95],
          [ -26,  -88,   33],
          [  89,   -8,   31]],

         [[  -5,  -20,   13],
          [ 104,   -7,   52],
          [-102,   29,  -86]]],


        [[[  71,  -19,  -46],
          [  81,  -14,   25],
          [  -9, -104,  -39]],

         [[  87,  -95,    2],
          [  70,  -71,  -15],
          [  46,   37,   84]],

         [[ -77,   44,   19],
          [  -8,   55,   20],
          [  66,  -15,  -90]]],


        [[[   6,   75,   43],
          [ -99,  -92,   12],
          [ -95,  -86,  -54]],

         [[ -32,  -56,   71],
          [ -42, 

The weights of the first convolutional layer are now stored as 8-bit integers; `torch.int_repr` shows their raw `int8` values.

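When the quantized model runs, its weights are not converted back to `float32`. With post-training static quantization, the weights are stored as `int8`, and the quantized layers (`QuantizedConv2d`, `QuantizedLinear`, ...) compute directly on 8-bit integers, using the calibrated `scale` and `zero_point` values to keep the activations quantized between layers. Only the `DeQuantStub` at the end turns the output back into `float32`. These quantized kernels run on the CPU (for example through the FBGEMM backend on x86), so the quantized model must be evaluated on a CPU device. To compare the stored weights with the original ones, we can still dequantize them to `float32` with `torch.dequantize`, which computes `(q - zero_point) * scale` for each integer `q`.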

In [44]:
print('Original weights: ')
print(anantaJalil.conv1.weight)
print('')
print(f'Dequantized weights: ')
print(torch.dequantize(anantaJalilJr.conv1.weight()))
print('')

Original weights: 
Parameter containing:
tensor([[[[-1.4485e-01, -1.3680e-01,  1.4645e-01],
          [-3.4057e-02,  4.0066e-02,  1.4653e-01],
          [-3.9360e-02, -3.1954e-02,  1.9848e-01]],

         [[-4.3736e-02, -5.3364e-02, -7.6015e-02],
          [-1.1375e-02, -9.7157e-02,  3.7182e-02],
          [-2.4842e-02, -1.9482e-02,  1.8802e-01]],

         [[-1.3220e-01,  4.8379e-02,  6.2325e-02],
          [ 3.2245e-02, -1.4307e-01, -9.6930e-02],
          [-4.3221e-02,  1.1859e-01, -1.1901e-01]]],


        [[[ 4.6476e-02,  2.8481e-02, -1.4417e-01],
          [ 1.6459e-01, -4.2325e-03, -5.4186e-02],
          [ 1.9815e-02, -3.2094e-02,  1.5083e-01]],

         [[ 1.8349e-01, -1.1144e-01, -1.6888e-01],
          [-4.6052e-02, -1.5621e-01,  5.9176e-02],
          [ 1.5712e-01, -1.4579e-02,  5.5457e-02]],

         [[-8.7788e-03, -3.6236e-02,  2.2953e-02],
          [ 1.8419e-01, -1.1851e-02,  9.2437e-02],
          [-1.8030e-01,  5.1614e-02, -1.5324e-01]]],


        [[[ 1.2613e-01, -

Comparing the two, the dequantized weights of the quantized model (converted back from `int8` to `float32`) are very close to the original weights; the small differences are the rounding errors introduced by quantization.
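The weight quantization can be checked by hand. Assuming `default_qconfig` quantizes weights per tensor and symmetrically (zero_point = 0), the printed integers for the first 3×3 kernel are consistent with a scale of roughly 0.00177. That value is inferred from the printed numbers, not read from the model; `anantaJalilJr.conv1.weight().q_scale()` returns the actual one. The helpers `quantize` and `dequantize` below are hypothetical pure-Python versions of the affine mapping:

```python
def quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Affine quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]


def dequantize(q_values, scale, zero_point=0):
    """Inverse mapping: x_hat = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# First 3x3 kernel of conv1 (output channel 0, input channel 0), as printed above
weights = [-1.4485e-01, -1.3680e-01, 1.4645e-01,
           -3.4057e-02, 4.0066e-02, 1.4653e-01,
           -3.9360e-02, -3.1954e-02, 1.9848e-01]
scale = 0.00177  # assumption: inferred from the printed int8 values, not read from the model

q = quantize(weights, scale)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Each dequantized value differs from the original by at most half the scale (about 0.0009 here), which is why the two weight printouts look almost identical.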

In [45]:
print('Size of the model after quantization')
print_size_of_model(anantaJalilJr)

Size of the model after quantization
Size (MB): 8.031906


In [47]:
print('Testing the model after quantization')
test(anantaJalilJr, test_loader, device)

Testing the model after quantization


Testing: 100%|██████████| 20/20 [00:42<00:00,  2.13s/it]

Accuracy: 0.973015873015873





So the quantized model is 8.03 MB, about 4x smaller than the original 32.0 MB model. Test accuracy dropped only slightly, from 98.41% to 97.30% (about 1.1 percentage points), and the quantized model also ran through the test set faster (42 s versus 67 s for the original model).
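The figures in this conclusion can be reproduced from the printed outputs. The test-set size of 630 images is an inference, not something the notebook prints: it is the only size that fits 20 batches of 32 and both printed accuracies (620/630 and 613/630).

```python
fp32_mb, int8_mb = 31.99581, 8.031906                        # sizes printed above (MB)
acc_fp32, acc_int8 = 0.9841269841269841, 0.973015873015873   # accuracies printed above
test_size = 630  # assumption: inferred from 20 test batches and the accuracy fractions

ratio = fp32_mb / int8_mb
drop_pp = (acc_fp32 - acc_int8) * 100                 # drop in percentage points
extra_errors = round((acc_fp32 - acc_int8) * test_size)
print(f"compression {ratio:.2f}x, accuracy drop {drop_pp:.2f} pp, {extra_errors} extra errors")
```

The ratio falls just short of 4 because biases, batch-norm parameters and the quantization parameters stay in `float32`, and each saved file has some fixed serialization overhead.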