# ESE 5390 Project Code

The provided project starter code walks through a comparative study of three post-training pruning methods:

* **Global Weight Pruning**
* **Layer-wise Weight Pruning**
* **Layer-wise Output Channel Pruning**

Following this starter code, you will further enhance the **efficiency and accuracy** of layer-wise channel pruning by implementing one of the two advanced techniques:

* **Training-Aware Pruning:**
    Apply regularization during training to promote sparsity. Refer to Wen et al.’s _"Learning Structured Sparsity in Deep Neural Networks"_, though you are encouraged to explore other regularization techniques.

* **Iterative Pruning:**
    Repeatedly prune and retrain the network to restore accuracy. You may refer to Han et al.’s _"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization, and Huffman Coding."_ However, feel free to develop innovative variations of this approach.
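
For the training-aware option, one common way to promote channel-level sparsity is a group-lasso penalty in which each output channel of a convolution forms one group. The sketch below is only an illustration of that idea on a toy network; the helper name `group_lasso_penalty`, the toy model, the placeholder task loss, and the coefficient `lam` are all assumptions and not part of the starter code.

```python
import torch
import torch.nn as nn

def group_lasso_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of the L2 norms of every output-channel filter in all Conv2d layers."""
    terms = []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # weight shape: (out_channels, in_channels, kH, kW).
            # Flatten each output channel into one row and take its L2 norm.
            terms.append(module.weight.flatten(1).norm(p=2, dim=1).sum())
    return torch.stack(terms).sum()

torch.manual_seed(0)
toy = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(), nn.Conv2d(4, 2, 3))
x = torch.randn(2, 3, 8, 8)

task_loss = toy(x).pow(2).mean()  # placeholder for the real classification loss
lam = 1e-4                        # hypothetical regularization strength
loss = task_loss + lam * group_lasso_penalty(toy)
loss.backward()                   # gradients now include the sparsity term
```

Because the penalty acts on whole filters rather than individual weights, it tends to push entire channels toward zero, which is the structure that channel pruning removes.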

The grading will reflect how well you understand the literature, design and execute experiments, present your findings, and document your project in a comprehensive technical report.

**Bonus points** are available for extending the project scope through additional explorations (see **Bonus Opportunities**).

## Environment

We strongly recommend using a GPU or TPU to avoid excessive runtimes.

In [None]:
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision
import time
import numpy as np
import copy
import sys
import os
from torchvision import datasets, models, transforms
from pathlib import Path
import random
import matplotlib.pyplot as plt
print("PyTorch Version: ", torch.__version__)
print("Torchvision Version: ", torchvision.__version__)
Path("./data").mkdir(parents=True, exist_ok=True)
data_dir = Path('./data')
net_fn = Path('./net')

# see if gdown is already downloaded
try:
    import gdown
    print("gdown is already installed, skipping installing command to save time......")
except ImportError:
    !conda install -y gdown

# Seed for reproducibility 
torch.manual_seed(0)
np.random.seed(0)

In [None]:
REPOS = [
    [
        "1ZLyAcimq4sdZ0tl5yINudwSD843pAoOJ",
        "1EXwrSw6BWKMC4ovPRUyfuqeObuqsR-R5",
        "1rFIAJ9aLZrRCOeijo5mI0zWiEl3OMJ3j"
    ],
    [
        "1VLe11mOwsetC4IlL3wy8QexTat8XCool",
        "1_QwUcr3gmnjPFbKj0VrfOayVEWjRBS4Z",
        "1fgf9elhD7EhbKJn2NMnQS1iwQXesQZ3g"
    ],
    [
        "1BKC4kCB9sbwfRuhrAJWVEmbLoHx2vJWC",
        "1qhrcYgGKRf3Wt8YDU6aVRhzJDFHeqerP",
        "1JXym04uAoNGGpkBRzIn0yStZmRX0SmNu"
    ]
]

FILES = [
    Path("./data/ILSVRC2012_devkit_t12.tar.gz"),
    Path("./data/ILSVRC2012_devkit_t3.tar.gz"),
    Path("./data/ILSVRC2012_img_val.tar")
]

# Shuffle the order of repos
random.seed(int(time.time()))
random.shuffle(REPOS)

for repo_index, repo in enumerate(REPOS):
    missing_files = [str(file) for file in FILES if not file.is_file()]
    if not missing_files:
        print("All files are present. Skipping further downloads.")
        break

    print(f"Attempting download from repo {repo_index}")
    print(f"Missing files: {missing_files}")

    for file, url in zip(FILES, repo):
        if not file.is_file():
            print(f"Downloading {file}...")
            !gdown "{url}" -O "{str(file)}"

missing_files = [str(file) for file in FILES if not file.is_file()]
if missing_files:
    print(f"Failed to download all files. Still missing: {missing_files}")
    print("Contact the TA team through EdStem with the output of this code block!", file=sys.stderr)
else:
    print("All files successfully downloaded.")

In [None]:
# Create transform to preprocess data
val_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Create validation dataset
val_dataset = datasets.ImageNet('./data', split='val', transform=val_transform)

# Create validation dataloader
val_dataloader = torch.utils.data.DataLoader(val_dataset, shuffle=False, num_workers=2)

print(f'Number of validation images: {len(val_dataset)}')

In [None]:
# Put your reusable functions here.
# You can copy functions from previous labs and tutorials.

## 1. Global Weight Pruning

- Perform pruning using `prune.global_unstructured` on a pretrained VGG16 using various sparsity targets.
- Prune the model in 5% increments, from 5% to 45% of its total weights.
- Save a copy of each pruned model in a dictionary.
- Plot the top-1 accuracy against the percentage of weights pruned, from 0% to 45% (use the original VGG16 for 0%).
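
As a rough illustration of how `prune.global_unstructured` behaves, the sketch below applies it to a small toy network rather than VGG16. The toy model, the 20% amount, and the variable names are assumptions chosen for the example; the VGG16 layer list is still left to you.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 6 * 6, 10)
)

# Collect every (module, parameter name) pair that should compete globally.
parameters_to_prune = [
    (m, "weight") for m in toy.modules() if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Zero the 20% smallest-magnitude weights across all listed layers together.
prune.global_unstructured(
    parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2
)

# Make the pruning permanent: drops weight_orig / weight_mask and the hook.
for module, name in parameters_to_prune:
    prune.remove(module, name)

# The overall sparsity matches the target; per-layer sparsity may differ.
zeros = sum(int((m.weight == 0).sum()) for m, _ in parameters_to_prune)
total = sum(m.weight.numel() for m, _ in parameters_to_prune)
global_sparsity = zeros / total
per_layer = [(m.weight == 0).float().mean().item() for m, _ in parameters_to_prune]
print(f"global sparsity: {global_sparsity:.3f}, per layer: {per_layer}")
```

Note that the target applies to the pooled weights, so individual layers can end up more or less sparse than the global rate.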

In [None]:
orig_model = #TODO: load pretrained model
prune_rate_list = range(5, 50, 5) # List of prune rate to test from 5-45 inclusive with step of 5
global_pruning = {} # Dictionary to store global pruning results
for prune_rate in prune_rate_list:
    model = #TODO: load pretrained model
    parameter_to_prune = (
        (model.features[0], 'weight'), # conv1 of VGG16
        #TODO: Add more layers to prune
    )
    #TODO: Prune model
    #TODO: make the pruning permanent to increase speed
    global_pruning[prune_rate] = {} # Dictionary to store accuracy results and model for each prune rate
    global_pruning[prune_rate]['model'] = model # Copy pruned model to dictionary
    # TODO Run validation on the pruned model
    global_pruning[prune_rate]['top1_acc'] = None # TODO fill with top1 accuracy
    global_pruning[prune_rate]['top1_acc_rel'] = None # Percent accuracy compared to original model
    print(f'Top1 accuracy for prune amount {prune_rate}%: {global_pruning[prune_rate]["top1_acc"]}%')
    print(f'Top1 accuracy (rel) for prune amount {prune_rate}%: {global_pruning[prune_rate]["top1_acc_rel"]}%')

# TODO plot the results

## 2. Layer-wise Weight Pruning

- Perform pruning using `prune.l1_unstructured` on each layer of pretrained VGG16 using various sparsity targets.
- Prune each layer at the same rate, in 5% increments from 5% to 45%.
- Save a copy of each pruned model in a dictionary.
- Plot the top-1 accuracy against the percentage of weights pruned, from 0% to 45% (use the original VGG16 for 0%).
- What do you observe? Why does layer-wise pruning perform better/worse than global pruning?
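
The following sketch shows `prune.l1_unstructured` applied layer by layer to a toy network, so that every layer ends up with roughly the same sparsity. The toy model and the 30% amount are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 6 * 6, 10)
)

layer_sparsity = {}
for name, module in toy.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        # Zero the 30% smallest-magnitude weights of this layer only.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        # module.weight is now weight_orig * weight_mask.
        layer_sparsity[name] = (module.weight == 0).float().mean().item()

print(layer_sparsity)
```

Unlike global pruning, the ranking here is done inside each layer, so a layer with uniformly small weights is not pruned more heavily than the others.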

In [None]:
orig_model = #TODO: load pretrained model
prune_rate_list = range(5, 50, 5) # List of prune rate to test from 5-45 inclusive with step of 5
layer_pruning = {} # Dictionary to store layer pruning results
for prune_rate in prune_rate_list:
    model = #TODO: load pretrained model
    convs_to_prune = () #TODO: Add conv layers to prune
    linears_to_prune = () #TODO: Add linear layers to prune
    #TODO: Prune model
    layer_pruning[prune_rate] = {} # Dictionary to store accuracy results and model for each prune rate
    layer_pruning[prune_rate]['model'] = model # Copy pruned model to dictionary
    # TODO Run validation on the pruned model
    layer_pruning[prune_rate]['top1_acc'] = None # TODO fill with top1 accuracy
    layer_pruning[prune_rate]['top1_acc_rel'] = None # Percent accuracy compared to original model
    print(f'Top1 accuracy for prune amount {prune_rate}%: {layer_pruning[prune_rate]["top1_acc"]}%')
    print(f'Top1 accuracy (rel) for prune amount {prune_rate}%: {layer_pruning[prune_rate]["top1_acc_rel"]}%')

# TODO plot the results

## 3. Layer-wise Output Channel Pruning

- Perform pruning using `prune.ln_structured` on each layer of pretrained VGG16 using various sparsity targets.
- Prune along the output channels for conv layers and along the output dimension for linear layers.
- Do not prune the last linear layer, to preserve the number of predicted classes.
- Prune each layer at the same rate, in 5% increments from 5% to 45%.
- Save a copy of each pruned model in a dictionary.
- Plot the top-1 accuracy against the percentage of weights pruned, from 0% to 45% (use the original VGG16 for 0%).
- What do you observe? Why does layer-wise channel pruning perform better/worse than layer-wise unstructured pruning?
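
A minimal sketch of output-channel pruning with `prune.ln_structured` on two standalone layers is shown below. The layer sizes, the 25% amount, and the choice of the L2 norm (`n=2`) are assumptions for the example; other norms are equally valid choices.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, 3)   # weight shape: (8, 3, 3, 3)
fc = nn.Linear(16, 12)      # weight shape: (12, 16)

# dim=0 is the output dimension for both Conv2d and Linear weights,
# so whole output channels / output neurons are zeroed.
prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)
prune.ln_structured(fc, name="weight", amount=0.25, n=2, dim=0)

# Count output channels whose weights are entirely zero.
zero_channels = int((conv.weight.flatten(1).abs().sum(dim=1) == 0).sum())
zero_neurons = int((fc.weight.abs().sum(dim=1) == 0).sum())
print(f"zeroed conv channels: {zero_channels}, zeroed fc outputs: {zero_neurons}")
```

Only the `weight` tensor is masked in this sketch; the bias entries of the zeroed channels are left untouched.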

In [None]:
orig_model = #TODO: load pretrained model
prune_rate_list = range(5, 50, 5) # List of prune rate to test from 5-45 inclusive with step of 5
channel_pruning = {} # Dictionary to store channel pruning results
for prune_rate in prune_rate_list:
    model = #TODO: load pretrained model
    convs_to_prune = () #TODO: Add conv layers to prune
    linears_to_prune = () #TODO: Add linear layers to prune (except the last linear layer)
    #TODO: Prune model
    channel_pruning[prune_rate] = {} # Dictionary to store accuracy results and model for each prune rate
    channel_pruning[prune_rate]['model'] = model # Copy pruned model to dictionary
    # TODO Run validation on the pruned model
    channel_pruning[prune_rate]['top1_acc'] = None # TODO fill with top1 accuracy
    channel_pruning[prune_rate]['top1_acc_rel'] = None # Percent accuracy compared to original model
    print(f'Top1 accuracy for prune amount {prune_rate}%: {channel_pruning[prune_rate]["top1_acc"]}%')
    print(f'Top1 accuracy (rel) for prune amount {prune_rate}%: {channel_pruning[prune_rate]["top1_acc_rel"]}%')

# TODO plot the results

## 3.5 Layer-wise Output Channel Pruning (Continued)

In this section, we harden the channel pruning by physically removing the zeroed channels:
- Collect the number of non-zero output channels for each layer after `prune.ln_structured`
- Instantiate a new model based on the number of non-zero output channels
- Remember that the input channels of the next layer must match the output channels of the current layer
- Pay special attention to the last convolution layer and first linear layer
- Copy the non-zero weights from the pruned model to the hardened model
- Calculate the run time and model size for the original VGG16 and the hardened VGG16s
- Plot the **relative** top-1 accuracy, run time and model size from 0% to 45% pruned VGG16
- What do you observe? What is the trade-off between accuracy, run time, and model size?
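
The hardening steps above can be sketched on a pair of toy convolutions. All names (`conv1`, `hard1`, `keep`, ...) are hypothetical. The sketch also zeroes the bias of each pruned channel in the masked model; that is an assumption made here so that the masked and hardened outputs match exactly, since `prune.ln_structured` on `weight` leaves the bias alone.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv1 = nn.Conv2d(3, 8, 3, padding=1)
conv2 = nn.Conv2d(8, 4, 3, padding=1)

# Zero half of conv1's output channels, then make the pruning permanent.
prune.ln_structured(conv1, name="weight", amount=0.5, n=2, dim=0)
prune.remove(conv1, "weight")

# Indices of the output channels that still carry non-zero weights.
keep = conv1.weight.flatten(1).abs().sum(dim=1).nonzero().flatten()

# Assumption: drop the bias of pruned channels as well (see lead-in).
with torch.no_grad():
    alive = torch.zeros(conv1.out_channels, dtype=torch.bool)
    alive[keep] = True
    conv1.bias[~alive] = 0.0

# Hardened layers: fewer outputs in conv1, matching fewer inputs in conv2.
hard1 = nn.Conv2d(3, len(keep), 3, padding=1)
hard2 = nn.Conv2d(len(keep), 4, 3, padding=1)
with torch.no_grad():
    hard1.weight.copy_(conv1.weight[keep])
    hard1.bias.copy_(conv1.bias[keep])
    hard2.weight.copy_(conv2.weight[:, keep])  # slice the input-channel axis
    hard2.bias.copy_(conv2.bias)

x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    ref = conv2(torch.relu(conv1(x)))
    out = hard2(torch.relu(hard1(x)))
max_diff = (ref - out).abs().max().item()

def n_params(*mods):
    return sum(p.numel() for m in mods for p in m.parameters())

print(f"max output difference: {max_diff:.2e}")
print(f"parameters: {n_params(conv1, conv2)} -> {n_params(hard1, hard2)}")
```

The same slicing idea carries over to the boundary between the last convolution and the first linear layer. Assuming torchvision's VGG16 layout, each surviving channel there corresponds to a contiguous block of 7 × 7 = 49 input features of the first linear layer.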

In [None]:
prune_rate_list = range(5, 50, 5) # List of prune rate to test from 5-45 inclusive with step of 5
hardened_pruning = {} # Dictionary to store hardened pruning results
for prune_rate in prune_rate_list:
    #TODO: Collect the number of non-zero channels for each layer
    #TODO: Instantiate a new model based on collected number of non-zero channels
    hardened_pruning[prune_rate] = {} # Dictionary to store accuracy results and model for each prune rate
    hardened_pruning[prune_rate]['model'] = model # Copy hardened model to dictionary
    hardened_pruning[prune_rate]['top1_acc'] = None # TODO fill with top1 accuracy
    hardened_pruning[prune_rate]['top1_acc_rel'] = None # TODO Percent accuracy compared to original model
    hardened_pruning[prune_rate]['run_time'] = None # TODO Collect run time of the hardened model
    hardened_pruning[prune_rate]['run_time_rel'] = None # TODO Collect run time (relative) of the hardened model
    hardened_pruning[prune_rate]['model_size'] = None # TODO Collect model size of the hardened model
    hardened_pruning[prune_rate]['model_size_rel'] = None # TODO Collect model size (relative) of the hardened model

In [None]:
#TODO plot the relative accuracy, relative run time, and relative model size

## 4. Training-Aware Pruning or Iterative Pruning
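
For the iterative option, one possible structure alternates a small channel-pruning step with a short retraining phase. The sketch below uses a toy model and random data purely to show the control flow; the number of rounds, the 25% per-round amount, the learning rate, and the step count are all assumptions, and a real experiment would retrain on actual training data.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5),
)
conv = model[0]

# Random stand-in data; replace with a real training set.
x = torch.randn(32, 3, 8, 8)
y = torch.randint(0, 5, (32,))
loss_fn = nn.CrossEntropyLoss()

for round_idx in range(3):
    # Prune 25% of the output channels that are still alive.
    # Repeated calls stack, so earlier masks are preserved.
    prune.ln_structured(conv, name="weight", amount=0.25, n=2, dim=0)

    # Short retraining phase; masked channels stay at zero.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    alive = int((conv.weight.flatten(1).abs().sum(dim=1) != 0).sum())
    print(f"round {round_idx}: {alive} channels alive, loss {loss.item():.3f}")

# Make the accumulated mask permanent before hardening or saving.
prune.remove(conv, "weight")
zero_channels = int((conv.weight.flatten(1).abs().sum(dim=1) == 0).sum())
```

Pruning a little at a time gives the remaining weights a chance to compensate before the next round, which is the motivation behind the prune-and-retrain schedule.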

## 5. (Optional) Bonus Exploration