# Pruning

After learning, neural networks have modified and learned a set of parameters to perform our classification task. However, such parameters are costly to maintain and do not hold the same importance.

Wouldn't it be great could optimize our resource usage by dropping less important values ? This is where pruning comes into play.

Pruning is a technique that cuts off parameters/structures from a model to increase sparcity and decrease overall model size, similar to cutting leafs or branches from bushes and trees. This process can lead to smaller memory consumption with minimal accuracy reduction. Moreover, pruning the network may also provide a speedup since there will be less operations being performed.

The pruning process can be performed during the end of an epoch of training or after training is complete. Experimenting to find out which way works the best is part of the fun !

In [1]:
import sys
print(sys.executable)
print(sys.version)

c:\Users\danie\miniconda3\envs\torch_env\python.exe
3.9.23 | packaged by conda-forge | (main, Jun  4 2025, 17:49:16) [MSC v.1929 64 bit (AMD64)]


In [17]:
from ece662.pruning_helper import test_model, load_model
from ece662.data_utils import get_CINIC10_data
import os

Below we will load a pre-trained model for you to work on. If you prefer, you can save your own model from the previous Tensorflow/Pytorch task and load it here.

In [26]:
#This code may take a while to execute as it is training a network form scratch

data = get_CINIC10_data()
mode = 'torch'#torch or tensorflow

test_data = [data['X_test'],data['y_test']]

#path = os.path.join('/content/drive/My Drive/{}'.format(FOLDERNAME), f"ece662/models/{mode}.model")
path = os.path.join(r"C:\Users\danie\ECE\ECE662\ECE662_repo\assignment1\ece662\models", "torch.model")

model = load_model(path,mode=mode)
test_model(model,test_data,mode=mode)

  model.load_state_dict(torch.load(path))


Test Acc: 0.5151


## Unstructured Pruning

Unstructured Pruning is usually related to the pruning of weights in neural networks. The general idea is to select a set of weights according to a policy and setting them up to zero. 

Common policies are random weight selection or selecting the smallers weights. 
Unstructured Pruning can be performed in one or multiple layers within the same network.

Altough in theory Unstructured Pruning should decrease the number of operations performed during execution there should be explicit support within the framework or hardware to bypass such operations, otherwise it will just operated over zero.

### Perform Pruning

Using the model trained in the previous step using pytorch, perform unstructured pruning in the weights of the model by removing x% of the smallest weights. 

*   Increment global pruning by 10% until reaching total of 80% pruned weights
*   Perform inference at the end of each pruning and observe the impact into the accuracy.


Note: The percentages are related to the entire model, not per layer.



In [27]:
################################################################################
# TODO: Perform unstructured Pruning over the trained model using 3 different  
# prunning percentages.                                
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
import torch.nn.utils.prune as prune
import torch

#Total pruning percentages from 10% up to 80% incremented by 10%
pruning_steps = [0.1 * i for i in range(1, 9)] 

# Get list of all weights to prune (conv and linear layers)
parameters_to_prune = []
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear):
        parameters_to_prune.append((module, 'weight'))

# Pruning incrementally
previous_amount = 0.0
for amount in pruning_steps:
    # Calculate incremental pruning for this step
    incremental_amount = amount - previous_amount
    
    prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=incremental_amount,
    )
    
    print(f'Pruned {int(amount*100)}% of total weights globally.')
    test_model(model, test_data, mode='torch')
    
    previous_amount = amount

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

Pruned 10% of total weights globally.
Test Acc: 0.5114
Pruned 20% of total weights globally.
Test Acc: 0.5091
Pruned 30% of total weights globally.
Test Acc: 0.5003
Pruned 40% of total weights globally.
Test Acc: 0.4978
Pruned 50% of total weights globally.
Test Acc: 0.4951
Pruned 60% of total weights globally.
Test Acc: 0.4902
Pruned 70% of total weights globally.
Test Acc: 0.4905
Pruned 80% of total weights globally.
Test Acc: 0.4918


## Inline Question 1:

What happened with the accuracy as the % of pruning increased ?
Why was that the case?


## Answer: 

[The accuracy decreased as pruning increased. This is because the more we prune, the less data we are using to train our model.]

## Structured Pruning

Structured Pruning consists of removing a bigger chunk of the network parameters at the same time. Instead of removing only a few weights, it is commonplace to remove entire neurons. 

For example, in Convolutional Layers, removing filters can be beneficial to improve performance as it greatly decreases the amount of computation performed. However, some of these changes may affect output dimensions which may be carried over to other parts of the network. Therefore, when performing structured pruning one must always be aware of which parameters are going to be affected.

Using the previously trained model in the CINIC-10, perform Structured Prunning only in the Convolution layers of the DNN.

In [25]:
################################################################################
# TODO: Perform unstrucuted Pruning over the trained model using 3 different  
# prunning percentages.                                
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
pruning_percentages = [0.2, 0.5, 0.8]

# Get parameters to prune once
parameters_to_prune = []
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear):
        parameters_to_prune.append((module, 'weight'))

# Baseline accuracy before pruning
model.eval()
print("Before pruning:")
test_model(model, test_data, mode='torch')

for p in pruning_percentages:
    # Remove previous pruning masks to reset
    for module, name in parameters_to_prune:
        try:
            prune.remove(module, name)
        except ValueError:
            # This happens if prune not applied yet, so ignore
            pass
    
    # Apply global pruning again with new amount
    prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=p,
    )

    model.eval()
    print(f"Pruning {int(p*100)}% of weights globally...")
    test_model(model, test_data, mode='torch')

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

Before pruning:
Test Acc: 0.5211
Pruning 20% of weights globally...
Test Acc: 0.5209
Pruning 50% of weights globally...
Test Acc: 0.4948
Pruning 80% of weights globally...
Test Acc: 0.4242


## Inline Question 2:

What is the difference between performing Structured Pruning vs Dropout ? 
Why would it be beneficial to perform both techniques when developing a Neural Network?


## Answer: 

[Structured pruning permanently removes parts of the network, like whole neurons or filters, to make the model smaller and faster, and it is interesting to note that there is a much larger dropoff in accuracy at 80% structured pruning when compared to unstructured. On the other hand, dropout temporarily turns off random neurons during training to help prevent overfitting. Both techniques would be beneficial to developing a neural network as we prune to make the model smaller and faster and use dropout randomly to prevent overfitting.]
