# 2. Pruning

This notebook demonstrates how to prune a model using the `torch.nn.utils.prune` module and the `torch-pruning` library. Pruning reduces the size of a neural network by removing weights that are deemed unnecessary, which can lead to faster inference and lower memory usage.

There are two types of pruning:
- **Unstructured pruning**: Removes individual weights using an importance metric (e.g., low-magnitude weights are pruned). This yields sparse models with drastically fewer nonzero parameters, but taking advantage of the sparsity during inference requires specialized hardware and/or libraries.
- **Structured pruning**: Removes entire channels or layers, using a metric that scores the importance of a whole channel or layer (e.g., low-magnitude channels are pruned). This leads to a smaller but still regular, dense model that can be used on standard hardware without requiring specialized libraries.

Metrics used for pruning are typically based on the magnitude of weights, gradients, or other statistics that indicate the importance of a weight or a channel.
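To make the difference between the two types concrete, here is a minimal sketch that applies both to a small convolution using `torch.nn.utils.prune`. The layer sizes and the 50% ratio are arbitrary choices for illustration. Note that `ln_structured` only zeroes the selected channels through a mask; it does not physically remove them from the tensor, which is what a library such as `torch-pruning` is used for.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Two identical toy layers (sizes chosen only for illustration).
conv_unstructured = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv_structured = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# Unstructured: mask the 50% of individual weights with the smallest |w|.
prune.l1_unstructured(conv_unstructured, name="weight", amount=0.5)

# Structured: mask the 50% of output channels (dim=0) with the smallest L1 norm (n=1).
prune.ln_structured(conv_structured, name="weight", amount=0.5, n=1, dim=0)

# Fraction of individual weights that are now zero in the unstructured case.
unstructured_sparsity = (conv_unstructured.weight == 0).float().mean().item()

# Number of output channels that are entirely zero in the structured case.
channel_l1 = conv_structured.weight.abs().sum(dim=(1, 2, 3))
zeroed_channels = int((channel_l1 == 0).sum())

print(f"Unstructured sparsity: {unstructured_sparsity:.2f}")
print(f"Zeroed output channels: {zeroed_channels} of {conv_structured.out_channels}")
print(f"Structured weight shape is unchanged: {tuple(conv_structured.weight.shape)}")
```

In the unstructured case the zeros are scattered across the tensor, while in the structured case whole output channels are zeroed, which is what allows them to be dropped to obtain a smaller dense layer.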

The process is as follows:
* A Torch model is loaded.
* A pruning strategy is defined, which specifies how to prune the model (e.g., unstructured or structured pruning, and the importance metric to use).
* The model is pruned using the defined strategy.
* The model is exported in PyTorch format for further optimization or deployment.
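The steps above can be sketched end to end with plain `torch.nn.utils.prune`. This is an illustrative sketch, not the `nnopt` implementation: the tiny model is a stand-in for the loaded network, pruning each `Conv2d` and `Linear` layer separately is an assumption, and the in-memory buffer stands in for a file on disk.

```python
import io

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# 1. "Load" a model (a tiny stand-in; the notebook loads MobileNetV2 instead).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# 2. Define the strategy: L1 unstructured pruning of 70% of each layer's weights.
amount = 0.7
prunable = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]

# 3. Prune. This adds `weight_orig` and `weight_mask` to each module and
#    recomputes `weight` as their product on every forward pass.
for module in prunable:
    prune.l1_unstructured(module, name="weight", amount=amount)

# (Fine-tuning would normally happen here, while the masks are still attached.)

# Make the pruning permanent: fold the mask into `weight` and drop the extras.
for module in prunable:
    prune.remove(module, "weight")

# 4. Export in PyTorch format (an in-memory buffer instead of a file path).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

print(sorted(model.state_dict().keys()))
```

Fine-tuning before calling `prune.remove` keeps the masked weights at zero during training, because the mask is reapplied on every forward pass.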

Two pruning methods are used in this notebook, each applied to two models (image and audio classification):
* L1-magnitude unstructured pruning using `torch.nn.utils.prune`.
* L1-magnitude structured pruning using `torch-pruning`.

# Setup

In [1]:
import numpy as np

from nnopt.model.eval import eval_model
from nnopt.model.prune import l1_unstructured_pruning, calculate_sparsity, l1_structured_pruning, prune_finetune_and_eval

from nnopt.recipes.mobilenetv2_cifar10 import get_mobilenetv2_cifar10_model, get_cifar10_datasets, save_mobilenetv2_cifar10_model, DEVICE, DTYPE

In [2]:
# MobilenetV2 CIFAR-10 model
mobilenetv2_cifar10_baseline = get_mobilenetv2_cifar10_model(version="baseline")

# CIFAR-10 datasets
cifar10_train_dataset, cifar10_val_dataset, cifar10_test_dataset = get_cifar10_datasets()

2025-06-10 15:34:43,429 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading MobileNetV2 model for CIFAR-10 from version: baseline at /home/pbeuran/repos/nnopt/models
2025-06-10 15:34:43,668 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing training and validation datasets...
2025-06-10 15:34:45,154 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing test dataset...


In [3]:
# Evaluate the baseline model on the test set
mobilenetv2_cifar10_accuracy_baseline = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=True,
    dtype=DTYPE
)
print(f"Test accuracy of MobileNetV2 on CIFAR-10 (baseline): {mobilenetv2_cifar10_accuracy_baseline:.2f}")

2025-06-10 15:34:45,320 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:01<00:00,  4.15it/s]
2025-06-10 15:34:46,591 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 41.66it/s, acc=0.9013, cpu=4.4%, gpu_mem=6.6/24.0GB (27.5%), gpu_util=38.0%, loss=0.1680, ram=9.0/30.9GB (34.5%), samples/s=630.1]  

Evaluation Complete: Avg Loss: 0.2823, Accuracy: 0.9013
Throughput: 8425.81 samples/sec | Avg Batch Time: 7.56 ms | Avg Sample Time: 0.12 ms
System Stats: CPU Usage: 11.50% | RAM Usage: 8.7/30.9GB (33.8%) | GPU 0 Util: 38.00% | GPU 0 Mem: 6.6/24.0GB (27.5%)
Test accuracy of MobileNetV2 on CIFAR-10 (baseline): 0.90





# L1 unstructured pruning

In [4]:
# Prune using L1 unstructured pruning, finetune, and evaluate for 0.7 pruning ratio
mobilenetv2_cifar10_07_pruned, mobilenetv2_cifar10_07_pruned_accuracy = prune_finetune_and_eval(
    model=get_mobilenetv2_cifar10_model(version="baseline"),
    train_dataset=cifar10_train_dataset,
    val_dataset=cifar10_val_dataset,
    test_dataset=cifar10_test_dataset,
    pruning_method="l1_unstructured_pruning",
    pruning_amount=0.7,
    batch_size=64,  # Adjust batch size as needed
    num_epochs=3,
    device=DEVICE,
    use_amp=True,
    dtype=DTYPE
)

2025-06-10 15:34:50,372 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading MobileNetV2 model for CIFAR-10 from version: baseline at /home/pbeuran/repos/nnopt/models
2025-06-10 15:34:50,460 - nnopt.model.prune - INFO - Starting pruning with method: l1_unstructured_pruning, amount: 0.70
2025-06-10 15:34:50,460 - nnopt.model.prune - INFO - Applying L1 unstructured pruning with amount: 0.70 for parameter 'weight' in layers: ['Linear', 'Conv2d']
2025-06-10 15:34:50,599 - nnopt.model.prune - INFO - Applied L1 unstructured pruning to 53 layers.
2025-06-10 15:34:50,599 - nnopt.model.prune - INFO - Evaluating the pruned model on the test dataset...
2025-06-10 15:34:50,618 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 13.50it/s]
2025-06-10 15:34:51,067 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 41.91it/s, acc=0.1000, cpu=3.3%, gpu_mem=6.6/24.0GB (27.4%), gpu_util=40.0%, loss=2.342

Evaluation Complete: Avg Loss: 2.3339, Accuracy: 0.1000
Throughput: 7076.34 samples/sec | Avg Batch Time: 9.00 ms | Avg Sample Time: 0.14 ms
System Stats: CPU Usage: 12.50% | RAM Usage: 8.8/30.9GB (33.9%) | GPU 0 Util: 40.00% | GPU 0 Mem: 6.6/24.0GB (27.4%)


Epoch 1/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.34it/s, acc=0.5874, cpu=3.2%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=67.0%, loss=1.1891, ram=9.2/30.9GB (35.4%), samples/s=123.6]  
Epoch 1/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.74it/s, acc=0.6548, cpu=3.0%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=34.0%, loss=1.0651, ram=9.1/30.9GB (35.1%), samples/s=1043.3] 


Epoch 1/3, Train Loss: 1.1647, Train Acc: 0.5874, Train Throughput: 1867.10 samples/s | Val Loss: 0.9890, Val Acc: 0.6548, Val Throughput: 5955.01 samples/s | CPU Usage: 10.90% | RAM Usage: 8.9/30.9GB (34.4%) | GPU 0 Util: 34.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


Epoch 2/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.55it/s, acc=0.6780, cpu=4.0%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=63.0%, loss=0.8514, ram=9.3/30.9GB (35.5%), samples/s=391.2]  
Epoch 2/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.33it/s, acc=0.7038, cpu=5.9%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=32.0%, loss=1.2963, ram=9.3/30.9GB (35.5%), samples/s=1066.6] 


Epoch 2/3, Train Loss: 0.9184, Train Acc: 0.6780, Train Throughput: 1874.34 samples/s | Val Loss: 0.8364, Val Acc: 0.7038, Val Throughput: 6088.15 samples/s | CPU Usage: 11.00% | RAM Usage: 9.1/30.9GB (34.8%) | GPU 0 Util: 32.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


Epoch 3/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.34it/s, acc=0.7075, cpu=2.8%, gpu_mem=9.1/24.0GB (37.9%), gpu_util=68.0%, loss=1.4473, ram=9.2/30.9GB (35.2%), samples/s=393.4]  
Epoch 3/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.55it/s, acc=0.7194, cpu=3.0%, gpu_mem=9.1/24.0GB (37.9%), gpu_util=33.0%, loss=1.4356, ram=9.1/30.9GB (35.0%), samples/s=1031.3] 
2025-06-10 15:36:56,423 - nnopt.model.prune - INFO - Evaluating the pruned and finetuned model on the test dataset...
2025-06-10 15:36:56,426 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Epoch 3/3, Train Loss: 0.8296, Train Acc: 0.7075, Train Throughput: 1873.28 samples/s | Val Loss: 0.7939, Val Acc: 0.7194, Val Throughput: 5899.39 samples/s | CPU Usage: 11.40% | RAM Usage: 8.9/30.9GB (34.4%) | GPU 0 Util: 26.00% | GPU 0 Mem: 9.1/24.0GB (37.9%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 14.02it/s]
2025-06-10 15:36:56,867 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 41.91it/s, acc=0.8507, cpu=2.9%, gpu_mem=9.0/24.0GB (37.7%), gpu_util=39.0%, loss=0.3635, ram=9.1/30.9GB (35.0%), samples/s=2034.5] 
2025-06-10 15:37:00,618 - nnopt.model.prune - INFO - Making pruning permanent by removing reparameterization...
2025-06-10 15:37:00,621 - nnopt.model.prune - INFO - Made pruning permanent for 53 layers.
2025-06-10 15:37:00,622 - nnopt.model.prune - INFO - Removed pruning reparameterization from the pruned finetuned model.


Evaluation Complete: Avg Loss: 0.4270, Accuracy: 0.8507
Throughput: 6761.81 samples/sec | Avg Batch Time: 9.42 ms | Avg Sample Time: 0.15 ms
System Stats: CPU Usage: 12.00% | RAM Usage: 8.9/30.9GB (34.4%) | GPU 0 Util: 39.00% | GPU 0 Mem: 9.0/24.0GB (37.7%)


In [5]:
# Prune using L1 unstructured pruning, finetune, and evaluate for 0.9 pruning ratio
mobilenetv2_cifar10_09_pruned, mobilenetv2_cifar10_09_pruned_accuracy = prune_finetune_and_eval(
    model=get_mobilenetv2_cifar10_model(version="baseline"),
    train_dataset=cifar10_train_dataset,
    val_dataset=cifar10_val_dataset,
    test_dataset=cifar10_test_dataset,
    pruning_method="l1_unstructured_pruning",
    pruning_amount=0.9,
    batch_size=64,  # Adjust batch size as needed
    num_epochs=3,
    device=DEVICE,
    use_amp=True,
    dtype=DTYPE
)

2025-06-10 15:37:00,630 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading MobileNetV2 model for CIFAR-10 from version: baseline at /home/pbeuran/repos/nnopt/models
2025-06-10 15:37:00,720 - nnopt.model.prune - INFO - Starting pruning with method: l1_unstructured_pruning, amount: 0.90
2025-06-10 15:37:00,720 - nnopt.model.prune - INFO - Applying L1 unstructured pruning with amount: 0.90 for parameter 'weight' in layers: ['Linear', 'Conv2d']
2025-06-10 15:37:00,886 - nnopt.model.prune - INFO - Applied L1 unstructured pruning to 53 layers.
2025-06-10 15:37:00,886 - nnopt.model.prune - INFO - Evaluating the pruned model on the test dataset...
2025-06-10 15:37:00,905 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 12.76it/s]
2025-06-10 15:37:01,387 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 41.77it/s, acc=0.1000, cpu=2.9%, gpu_mem=9.1/24.0GB (37.7%), gpu_util=40.0%, loss=2.422

Evaluation Complete: Avg Loss: 2.3474, Accuracy: 0.1000
Throughput: 7079.22 samples/sec | Avg Batch Time: 9.00 ms | Avg Sample Time: 0.14 ms
System Stats: CPU Usage: 10.20% | RAM Usage: 9.0/30.9GB (34.5%) | GPU 0 Util: 40.00% | GPU 0 Mem: 9.1/24.0GB (37.7%)


Epoch 1/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.50it/s, acc=0.2729, cpu=3.7%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=64.0%, loss=2.1162, ram=9.2/30.9GB (35.4%), samples/s=371.0]  
Epoch 1/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.94it/s, acc=0.0914, cpu=3.1%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=30.0%, loss=6.2295, ram=9.2/30.9GB (35.1%), samples/s=1062.9] 


Epoch 1/3, Train Loss: 1.9450, Train Acc: 0.2729, Train Throughput: 1891.24 samples/s | Val Loss: 7.6754, Val Acc: 0.0914, Val Throughput: 6145.71 samples/s | CPU Usage: 11.70% | RAM Usage: 9.0/30.9GB (34.5%) | GPU 0 Util: 27.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


Epoch 2/3 [Training]: 100%|██████████| 704/704 [00:35<00:00, 19.67it/s, acc=0.3773, cpu=6.8%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=64.0%, loss=1.4583, ram=9.2/30.9GB (35.2%), samples/s=408.6]  
Epoch 2/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.64it/s, acc=0.1088, cpu=3.0%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=33.0%, loss=3.6367, ram=9.2/30.9GB (35.2%), samples/s=1017.9] 


Epoch 2/3, Train Loss: 1.7029, Train Acc: 0.3773, Train Throughput: 1884.33 samples/s | Val Loss: 3.1028, Val Acc: 0.1088, Val Throughput: 5987.50 samples/s | CPU Usage: 10.50% | RAM Usage: 9.0/30.9GB (34.5%) | GPU 0 Util: 33.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


Epoch 3/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.48it/s, acc=0.4370, cpu=4.2%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=66.0%, loss=1.6309, ram=9.2/30.9GB (35.2%), samples/s=413.1]  
Epoch 3/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.32it/s, acc=0.0948, cpu=2.9%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=33.0%, loss=5.0449, ram=9.2/30.9GB (35.2%), samples/s=992.5]  
2025-06-10 15:39:05,946 - nnopt.model.prune - INFO - Evaluating the pruned and finetuned model on the test dataset...
2025-06-10 15:39:05,948 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Epoch 3/3, Train Loss: 1.5628, Train Acc: 0.4370, Train Throughput: 1877.92 samples/s | Val Loss: 4.5547, Val Acc: 0.0948, Val Throughput: 5895.16 samples/s | CPU Usage: 12.80% | RAM Usage: 9.0/30.9GB (34.5%) | GPU 0 Util: 33.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 12.54it/s]
2025-06-10 15:39:06,451 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 40.86it/s, acc=0.1066, cpu=0.0%, gpu_mem=9.1/24.0GB (37.8%), gpu_util=45.0%, loss=5.1304, ram=9.2/30.9GB (35.4%), samples/s=2060.8] 
2025-06-10 15:39:10,298 - nnopt.model.prune - INFO - Making pruning permanent by removing reparameterization...
2025-06-10 15:39:10,301 - nnopt.model.prune - INFO - Made pruning permanent for 53 layers.
2025-06-10 15:39:10,301 - nnopt.model.prune - INFO - Removed pruning reparameterization from the pruned finetuned model.


Evaluation Complete: Avg Loss: 4.4006, Accuracy: 0.1066
Throughput: 6510.83 samples/sec | Avg Batch Time: 9.78 ms | Avg Sample Time: 0.15 ms
System Stats: CPU Usage: 11.20% | RAM Usage: 9.0/30.9GB (34.5%) | GPU 0 Util: 45.00% | GPU 0 Mem: 9.1/24.0GB (37.8%)


In [6]:
print(f"Test accuracy of MobileNetV2 on CIFAR-10 (0.7 pruning): {mobilenetv2_cifar10_07_pruned_accuracy:.2f}")
print(f"Test accuracy of MobileNetV2 on CIFAR-10 (0.9 pruning): {mobilenetv2_cifar10_09_pruned_accuracy:.2f}")

Test accuracy of MobileNetV2 on CIFAR-10 (0.7 pruning): 0.85
Test accuracy of MobileNetV2 on CIFAR-10 (0.9 pruning): 0.11
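Alongside accuracy, it can be useful to confirm that the requested pruning ratio was actually reached once the reparameterization has been removed. The notebook imports `calculate_sparsity`, but its implementation is not shown here, so the helper below (`zero_fraction`, a hypothetical name) is only a stand-in sketch applied to a toy model.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def zero_fraction(model: nn.Module) -> float:
    """Fraction of exactly-zero entries across all Conv2d and Linear weights."""
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            zeros += int((module.weight == 0).sum())
            total += module.weight.numel()
    return zeros / total if total else 0.0


torch.manual_seed(0)

# Toy model standing in for the pruned MobileNetV2.
toy = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 10))

for layer in toy:
    if isinstance(layer, nn.Linear):
        prune.l1_unstructured(layer, name="weight", amount=0.9)
        prune.remove(layer, "weight")  # make the pruning permanent

print(f"Sparsity after 0.9 pruning: {zero_fraction(toy):.2f}")
```

The zeros are still stored in dense tensors, so the saved checkpoint does not shrink by itself; the reduction only pays off with a sparse storage format or a sparsity-aware runtime.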
