# Edge ML Homework: Optimizing `MobileNetV2` with **Quantization** and **Pruning**

## Pruning and Quantization of `MobileNetV2`

This notebook demonstrates a complete workflow for optimizing a `MobileNetV2` model using quantization and pruning. The goal is to reduce the model's size and inference time while maintaining its performance on the ImageNet dataset.


## Instructions
- Using the notebook examples discussed in class, apply model pruning techniques to a quantized `MobileNetV2` model.

- The objective is to perform pruning while preserving (as much as you can) original model accuracy.

- The submission will be a Google Colab notebook file.

- Your notebook must perform the accuracy test of the original quantized model and the pruned model.

- Runtime must use CPU (not GPU).

- Accuracy will be measured on the ImageNet validation directory (this will be provided).

- Image files must be stored on your Google Drive using the following code and data path:

    ```py
    from google.colab import drive
    drive.mount('/content/drive')
    data_path = "/content/drive/My Drive/colab_files/imagenet/"
    ```

- The notebook must be documented using Jupyter Notebook Markdown cells.

- In the documentation, explain how you did the pruning and why you decided on the current approach.

## Approach used




---
### Load and Quantize the Model

A pretrained `mobilenet_v2` model is loaded and kept as `original_model`, the baseline for comparison once results for the quantized and pruned model are available. A deep copy of it is then dynamically quantized.






In [50]:
import torch
import torchvision
import torch.ao.quantization as quantization
import torch.nn.utils.prune as prune
import copy

# Load the pretrained model via the `weights=` argument (replaces the older `pretrained=True`)
original_model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.IMAGENET1K_V1)
reduced_model = copy.deepcopy(original_model)

# Dynamic quantization: only the Linear layers are converted to int8
quantized_model = torch.ao.quantization.quantize_dynamic(
    reduced_model,
    {torch.nn.Linear},
    dtype=torch.qint8)

print("Model converted to quantized form.")

Model converted to quantized form.
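
To make the effect of `quantize_dynamic` with `{torch.nn.Linear}` more concrete, the following is a minimal sketch on a hypothetical toy network (not `MobileNetV2`; the layer sizes are assumptions chosen for illustration). It suggests that only the `Linear` layer is swapped for a quantized counterpart, and that its packed weights are then no longer visible through `parameters()`.

```python
import torch
import torch.nn as nn

# Hypothetical toy network: one conv layer followed by one linear layer.
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 10),
)

# Same call pattern as in the notebook; returns a converted copy by default.
toy_q = torch.ao.quantization.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)


def count_parameters(model):
    """Number of elements reachable through model.parameters()."""
    return sum(p.numel() for p in model.parameters())


conv_unchanged = type(toy_q[0]) is nn.Conv2d       # conv stays a float Conv2d
linear_replaced = type(toy_q[3]) is not nn.Linear  # linear becomes a quantized module

params_before = count_parameters(toy)    # conv (108 + 4) + linear (40 + 10)
params_after = count_parameters(toy_q)   # only the conv parameters remain visible

print(conv_unchanged, linear_replaced, params_before, params_after)
```

A count based on `parameters()` therefore shrinks after dynamic quantization even though the convolutional weights are untouched.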


---
### Evaluation Function

**Purpose**: The `evaluate` function measures a model's performance on a validation dataset by calculating loss, accuracy, and inference time. It also provides real-time feedback by printing progress every 10 batches.
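
The loss in `evaluate` is computed as a log-softmax followed by a negative log-likelihood, which together amount to the usual cross-entropy. As a small worked example in plain Python (the logits and target are made-up values, and the helper names are hypothetical):

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class problem
target = 0

log_probs = log_softmax(logits)
loss = nll(log_probs, target)
# The predicted class is the index with the highest log-probability.
pred = max(range(len(logits)), key=lambda i: log_probs[i])

print(f"loss={loss:.4f}, pred={pred}")
```

Summing this loss over all samples and dividing by the dataset size gives the average test loss that the function reports.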



In [51]:
import time
import numpy as np

def evaluate(model, data_loader, loss_history):
    model.eval()

    total_samples = len(data_loader.dataset)
    correct_samples = 0
    total_loss = 0
    times = []

    with torch.no_grad():
        for i, (data, target) in enumerate(data_loader):
            start_time = time.time()
            output = torch.nn.functional.log_softmax(model(data), dim=1)
            end_time = time.time()
            times.append(1000 * (end_time - start_time))
            loss = torch.nn.functional.nll_loss(output, target, reduction='sum')
            _, pred = torch.max(output, dim=1)
            total_loss += loss.item()
            correct_samples += pred.eq(target).sum().item()  # keep the counter a plain Python int

            # Print progress every 10 batches
            if (i + 1) % 10 == 0:
                print(f"Processed {i + 1} batches out of {len(data_loader)}")

    avg_inference = np.mean(times)
    std_dev_inference = np.std(times)
    min_inference = np.min(times)
    max_inference = np.max(times)

    avg_loss = total_loss / total_samples
    loss_history.append(avg_loss)
    print('\tAverage test loss: ' + '{:.4f}'.format(avg_loss) +
          '\tAccuracy: ' + '{:5}'.format(correct_samples) + '/' +
          '{:5}'.format(total_samples) + ' (' +
          '{:4.2f}'.format(100.0 * correct_samples / total_samples) + '%)' +
          '\tAverage inference time: ' + '{:.4f}ms'.format(avg_inference) + '\n')
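
The function above computes the standard deviation, minimum, and maximum of the per-batch times but prints only the mean. A minimal, standard-library sketch of how such timings could be collected and summarized is shown below; the helper names `time_call_ms` and `summarize_ms` are hypothetical, and the warm-up run is an assumption about good practice rather than something the notebook does.

```python
import statistics
import time


def time_call_ms(fn, *args, warmup=1, repeats=5):
    """Time repeated calls to fn(*args) in milliseconds, after a warm-up."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(1000 * (time.perf_counter() - start))
    return samples


def summarize_ms(samples):
    """Mean, population standard deviation, min and max of the samples."""
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),  # same convention as np.std
        "min": min(samples),
        "max": max(samples),
    }


summary = summarize_ms([10.0, 12.0, 14.0])  # made-up timings
timings = time_call_ms(sum, [1, 2, 3])
print(summary, len(timings))
```

Note that the times recorded in `evaluate` are per batch, so they scale with the batch size of the `DataLoader` being used.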

---
### Load the ImageNet Dataset
**Purpose**: Define the image preprocessing for the validation set (resizing, center cropping, conversion to tensors, and normalization) so the images match the input format the model expects. The ImageNet validation set is loaded into the notebook here to reduce processing time during pruning and fine-tuning.


In [52]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from google.colab import drive
from torch.utils.data import Subset
drive.mount('/content/drive')

data_path = "/content/drive/My Drive/ColabNotebooks/imagenet"
imagenet_val = datasets.ImageNet(
    root=data_path,
    split='val',
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225])
    ])
)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
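
As a worked example of the last two preprocessing steps, the sketch below applies the same arithmetic to a single RGB pixel in plain Python: `ToTensor` scales 8-bit values to the range [0, 1], and `Normalize` then subtracts the per-channel mean and divides by the per-channel standard deviation. The helper name and the pixel value are hypothetical.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in the cell above
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling and the Normalize step to one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]  # [0, 255] -> [0, 1]
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]


mid_gray = normalize_pixel([128, 128, 128])  # hypothetical mid-gray pixel
print([round(v, 4) for v in mid_gray])
```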


---
### Data Loader for Model

**Purpose**: Creates three non-overlapping `Subsets` of the validation set by slicing a random permutation of its indexes generated from a fixed seed.

- **460** samples for training and validation (split **60%-40%**)
- **150** samples for testing.

Three `DataLoaders` are then initialized:

  - **Training DataLoader (`data_loader_train`)**
  - **Validation DataLoader (`data_loader_val`)**
  - **Testing DataLoader (`data_loader_test`)**
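
The split described above can be sketched without PyTorch as slicing one seeded permutation into three consecutive, disjoint ranges. This is only an illustration: it uses Python's `random` module, so the actual indexes differ from those produced by `torch.randperm`, and the helper name `split_indexes` and the total of 50,000 images are assumptions.

```python
import random


def split_indexes(n_total, n_trainval, n_test, train_fraction=0.6, seed=99):
    """Slice one seeded permutation into disjoint train/val/test index lists."""
    order = list(range(n_total))
    random.Random(seed).shuffle(order)
    n_train = int(n_trainval * train_fraction)
    train = order[:n_train]
    val = order[n_train:n_trainval]
    test = order[n_trainval:n_trainval + n_test]
    return train, val, test


train, val, test = split_indexes(n_total=50_000, n_trainval=460, n_test=150)
print(len(train), len(val), len(test))
```

Because the three lists are consecutive slices of a permutation, no index can appear in more than one of them.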



In [53]:
from torch.utils.data import Subset
import torch

max_sample_train = 460
max_sample_test = 150

# Seeded random permutation of all indexes; disjoint slices give non-overlapping subsets
all_indexes = torch.randperm(len(imagenet_val), generator=torch.Generator().manual_seed(99)).tolist()

n_train = int(max_sample_train * 0.6)
train_indexes = all_indexes[:n_train]  # 60% of training samples for training
val_indexes = all_indexes[n_train:max_sample_train]  # 40% of training samples for validation

# Test indexes come from the next slice of the permutation, so they cannot overlap with train/val
test_indexes = all_indexes[max_sample_train:max_sample_train + max_sample_test]

dataset_train_subset = Subset(imagenet_val, train_indexes)
dataset_val_subset = Subset(imagenet_val, val_indexes)
dataset_test_subset = Subset(imagenet_val, test_indexes)

# DataLoaders
data_loader_train = torch.utils.data.DataLoader(
    dataset_train_subset,
    batch_size=2,
    shuffle=True,
    num_workers=2
)

data_loader_val = torch.utils.data.DataLoader(
    dataset_val_subset,
    batch_size=20,
    shuffle=True,
    num_workers=2
)

data_loader_test = torch.utils.data.DataLoader(
    dataset_test_subset,
    batch_size=25,
    shuffle=True,
    num_workers=2
)

# Print DataLoader stats
print(f"Training DataLoader created with {len(data_loader_train.dataset)} samples.")
print(f"Validation DataLoader created with {len(data_loader_val.dataset)} samples.")
print(f"Testing DataLoader created with {len(data_loader_test.dataset)} samples.")


Training DataLoader created with 276 samples.
Validation DataLoader created with 184 samples.
Testing DataLoader created with 150 samples.


---
### Iterative Pruning Function

**Purpose**: Prune the model iteratively, raising the sparsity level step by step and evaluating accuracy at each level to find the highest sparsity the model tolerates. Each iteration applies L1-norm-based unstructured pruning (`prune.l1_unstructured`) to the `weight` tensor of every `Conv2d` layer, zeroing the fraction of weights with the smallest magnitudes (the fraction is the current sparsity level).

**Why?** In my case, structured pruning would have made it harder to observe the accuracy drop reaching its threshold: it removes entire channels or filters at once, which makes the effect of each step less predictable. Random pruning was too inconsistent.
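
The following sketch illustrates the pruning step on a single hypothetical `Conv2d` layer (the layer shape and the sparsity levels are assumptions). It also suggests why repeating "prune, then `prune.remove`" at increasing levels ends at the requested sparsity rather than compounding: once the mask is removed, the weights zeroed earlier are simply the smallest-magnitude entries of the next round.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3)  # hypothetical layer with 8*3*3*3 = 216 weights


def sparsity(module):
    """Fraction of weights in the layer that are exactly zero."""
    w = module.weight
    return (w == 0).sum().item() / w.numel()


achieved = []
for amount in (0.25, 0.5):
    # Zero the `amount` fraction of weights with the smallest absolute value.
    prune.l1_unstructured(conv, name="weight", amount=amount)
    # Drop the mask and keep the zeroed weights in the dense tensor.
    prune.remove(conv, "weight")
    achieved.append(sparsity(conv))

print(achieved)
```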

In [54]:
import torch
import torch.nn.utils.prune as prune
import copy

def iterative_pruning(model, data_loader, step, max_sparsity):
    pruned_model = copy.deepcopy(model)
    sparsity_levels = []

    print("\n--- Iterative Pruning ---\n")
    for sparsity in torch.arange(step, max_sparsity + step, step):
        for name, module in pruned_model.named_modules():
            if isinstance(module, torch.nn.Conv2d):
                prune.l1_unstructured(module, name="weight", amount=sparsity.item())
                prune.remove(module, "weight")
        sparsity_levels.append(sparsity.item())

        # Evaluate pruned model
        loss = []
        print(f"\n--- Sparsity Level: {sparsity.item() * 100:.0f}% ---\n")
        evaluate(pruned_model, data_loader, loss)

    return pruned_model, sparsity_levels

---
### Fine-Tuning Function

**Purpose**: A failed attempt at `Post-Training Pruning`, that is, training the pruned model to recover accuracy. Since then, I have also tried to apply `Quantization Aware Training` `(QAT)`, but after training the evaluation function does not work because a specific backend function does not support that type of quantized model.

I will leave the function here for future testing.


In [55]:
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.nn.utils import clip_grad_norm_

def fine_tune_model(model, train_loader, val_loader, epochs, lr, loss_history=None):
    if loss_history is None:
        loss_history = []

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        model.train()
        total_loss = 0

        # Iterate through the training data
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            total_loss += loss.item()

        print(f"Epoch {epoch + 1}/{epochs}, Loss: {total_loss:.4f}")
        print(f"Learning Rate after epoch {epoch + 1}: {optimizer.param_groups[0]['lr']:.6f}")
        loss_history.append(total_loss)
        print(f"Validation Metrics After Epoch {epoch + 1}:")
        evaluate(model, val_loader, loss_history)
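
One possible contributor to the disappointing fine-tuning results is that `prune.remove` is called before training, so the optimizer is free to move the zeroed weights away from zero again. A common alternative, sketched here on a hypothetical toy model with random data (none of this is the notebook's actual setup), is to keep the pruning mask attached during training and call `prune.remove` only afterwards.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Hypothetical toy model and random data, only to illustrate the mechanics.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 5),
)
conv = model[0]
data = torch.randn(8, 3, 16, 16)
target = torch.randint(0, 5, (8,))

# Prune, but keep the mask attached (no prune.remove yet).
prune.l1_unstructured(conv, name="weight", amount=0.5)
zeros_before = int((conv.weight == 0).sum())

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
model.train()
for _ in range(3):
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()

# Make the pruning permanent only after training has finished.
prune.remove(conv, "weight")
zeros_after = int((conv.weight == 0).sum())

print(zeros_before, zeros_after)
```

While the mask is attached, the effective weight is recomputed as the stored weight times the mask on every forward pass, so the pruned positions stay at zero throughout training.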


---
### Pruning Size Comparison

**Purpose**: Compares the size of the model before and after pruning. The cell iteratively prunes the quantized model at increasing sparsity levels and reports the size of the original and the pruned model, labeled in `megabytes` `(MB)`. In this run, `15%` sparsity looked like a good threshold for pruning this model without a major drop in accuracy.

**Note:** The printed size is the number of elements returned by `parameters()` divided by `1024 ** 2`, not a file size. `prune.remove(module, "weight")` only makes the pruning permanent: the zeroed weights stay in the dense tensor and are still counted, so pruning leaves this figure unchanged. The drop from 3.34 to 2.12 comes from dynamic quantization, because the packed weights of the quantized `Linear` layer are no longer exposed through `parameters()`.
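
To see why unstructured pruning by itself does not shrink a dense model, the sketch below prunes one hypothetical `Conv2d` layer and compares the element count, the number of non-zero values, and the size of the serialized `state_dict` before and after. The helper names and the layer shape are assumptions for illustration.

```python
import io

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def serialized_size_bytes(module):
    """Size of the module's state_dict when saved to an in-memory buffer."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.getbuffer().nbytes


def nonzero_parameters(module):
    """Number of parameter values that are not exactly zero."""
    return sum(int(torch.count_nonzero(p)) for p in module.parameters())


torch.manual_seed(0)
conv = nn.Conv2d(16, 32, kernel_size=3)  # hypothetical layer: 4608 weights + 32 biases

size_before = serialized_size_bytes(conv)
numel_before = sum(p.numel() for p in conv.parameters())

prune.l1_unstructured(conv, name="weight", amount=0.5)
prune.remove(conv, "weight")

size_after = serialized_size_bytes(conv)
numel_after = sum(p.numel() for p in conv.parameters())
nonzero_after = nonzero_parameters(conv)

print(numel_before, numel_after, nonzero_after, size_before, size_after)
```

Counting non-zero values is one way to report the effect of pruning; an actual reduction in storage would need a sparse or compressed representation, which this notebook does not use.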

In [56]:
loss_history = []

pruned_model = copy.deepcopy(quantized_model)

pruned_model, sparsity_levels = iterative_pruning(
    pruned_model, data_loader_val, step=0.05, max_sparsity=0.1
)

original_size = sum(p.numel() for p in original_model.parameters())
pruned_size = sum(p.numel() for p in pruned_model.parameters())
print(f"Original Model Size: {original_size / (1024 ** 2):.2f} MB")
print(f"Pruned Model Size: {pruned_size / (1024 ** 2):.2f} MB")


--- Iterative Pruning ---


--- Sparsity Level: 5% ---

Processed 10 batches out of 10
	Average test loss: 0.8737	Accuracy:   147/  184 (79.89%)	Average inference time: 899.2363ms


--- Sparsity Level: 10% ---

Processed 10 batches out of 10
	Average test loss: 0.9582	Accuracy:   143/  184 (77.72%)	Average inference time: 984.3173ms


--- Sparsity Level: 15% ---

Processed 10 batches out of 10
	Average test loss: 1.1911	Accuracy:   131/  184 (71.20%)	Average inference time: 838.4227ms

Original Model Size: 3.34 MB
Pruned Model Size: 2.12 MB
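
The call above passes `max_sparsity=0.1`, yet the log includes a 15% level. A likely cause is floating-point rounding in `torch.arange(step, max_sparsity + step, step)`: in binary floating point `0.1 + 0.05` is slightly larger than `0.15`, so the range can contain one extra element. A minimal sketch of a more predictable alternative derives the levels from an integer step count instead (the helper name `sparsity_levels` is hypothetical):

```python
def sparsity_levels(step, max_sparsity):
    """Return step, 2*step, ... up to max_sparsity using an integer step count."""
    n_steps = round(max_sparsity / step)
    # Rounding each level hides representation noise such as 0.15000000000000002.
    return [round(k * step, 10) for k in range(1, n_steps + 1)]


levels_10 = sparsity_levels(0.05, 0.10)
levels_15 = sparsity_levels(0.05, 0.15)
print(levels_10, levels_15)
```

With such a helper, reaching 15% would require passing `max_sparsity=0.15` explicitly.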


---
### Fine-tuning Activation

**Purpose**: A failed fine-tuning run. It was supposed to adjust the weights of the pruned model using a new training dataset and a specified number of epochs, allowing the model to adapt to the new data and improve its performance... at least in theory.

In [57]:
loss_history = []

tuned_model = copy.deepcopy(pruned_model)

fine_tune_model(
    tuned_model,
    data_loader_train,
    data_loader_val,
    epochs=5,
    lr=1e-3,
    loss_history=loss_history
)

Epoch 1/5, Loss: 606.7614
Learning Rate after epoch 1: 0.001000
Validation Metrics After Epoch 1:
Processed 10 batches out of 10
	Average test loss: 1.3007	Accuracy:   125/  184 (67.93%)	Average inference time: 878.2687ms

Epoch 2/5, Loss: 618.6003
Learning Rate after epoch 2: 0.001000
Validation Metrics After Epoch 2:
Processed 10 batches out of 10
	Average test loss: 1.1599	Accuracy:   129/  184 (70.11%)	Average inference time: 870.9489ms

Epoch 3/5, Loss: 591.8172
Learning Rate after epoch 3: 0.001000
Validation Metrics After Epoch 3:
Processed 10 batches out of 10
	Average test loss: 1.1663	Accuracy:   135/  184 (73.37%)	Average inference time: 762.3266ms

Epoch 4/5, Loss: 594.2369
Learning Rate after epoch 4: 0.001000
Validation Metrics After Epoch 4:
Processed 10 batches out of 10
	Average test loss: 1.1700	Accuracy:   132/  184 (71.74%)	Average inference time: 867.8891ms

Epoch 5/5, Loss: 607.5869
Learning Rate after epoch 5: 0.001000
Validation Metrics After Epoch 5:
Processed 


---
### Final Evaluation of the Quantized, Pruned and Fine-Tuned Models

**Purpose**: This cell performs the final evaluation of the quantized, pruned, and fine-tuned models on the test dataset. It also compares the sizes of the original, quantized, pruned, and fine-tuned models to assess the impact of quantization and pruning on model storage (MB).

In [58]:
import os

torch.save(quantized_model.state_dict(), "quantized_model.pth")
model_size = os.path.getsize("quantized_model.pth") / (1024 ** 2)
print(f"Quantized Model File Size: {model_size:.2f} MB")

# Final evaluation of the quantized, pruned, and fine-tuned models
print("\n--- Final Metrics for the Quantized Model---\n")
evaluate(quantized_model, data_loader_test, loss_history)
print("\n--- Final Metrics for Quantized & Pruned Model ---\n")
evaluate(pruned_model, data_loader_test, loss_history)
print("\n--- Final Metrics for Quantized, Pruned Model & Fine-Tuned Model ---\n")
evaluate(tuned_model, data_loader_test, loss_history)


# Compare model sizes
original_size = sum(p.numel() for p in original_model.parameters())
quantized_size = sum(p.numel() for p in quantized_model.parameters())
pruned_size = sum(p.numel() for p in pruned_model.parameters())
tuned_size = sum(p.numel() for p in tuned_model.parameters())

print(f"Original Model Size: {original_size / (1024 ** 2):.2f} MB")
print(f"Quantized Model Size: {quantized_size / (1024 ** 2):.2f} MB")
print(f"Pruned Model Size: {pruned_size / (1024 ** 2):.2f} MB")
print(f"Tuned Model Size: {tuned_size / (1024 ** 2):.2f} MB")



Quantized Model File Size: 9.94 MB

--- Final Metrics for the Quantized Model---

	Average test loss: 1.1679	Accuracy:   105/  150 (70.00%)	Average inference time: 1202.4129ms


--- Final Metrics for Quantized & Pruned Model ---

	Average test loss: 1.3917	Accuracy:   104/  150 (69.33%)	Average inference time: 1106.6957ms


--- Final Metrics for Quantized, Pruned Model & Fine-Tuned Model ---

	Average test loss: 1.5561	Accuracy:    96/  150 (64.00%)	Average inference time: 1168.1273ms

Original Model Size: 3.34 MB
Quantized Model Size: 2.12 MB
Pruned Model Size: 2.12 MB
Tuned Model Size: 2.12 MB
