# Introduction to Model Efficiency

### Why Model Efficiency Matters

In the context of AI, efficiency is not just about speed; it's about making AI accessible and practical for everyday applications. The size and complexity of deep learning models have grown tremendously. While this leads to improved performance, it creates challenges for deployment, especially on mobile devices, embedded systems, or any setting where computational resources or power are limited. Model efficiency techniques aim to address these challenges without sacrificing accuracy.

Growing model complexity poses challenges:
- Storage: Large models consume more storage space.
- Inference Speed: Complex models take longer to process inputs.
- Energy Consumption: Computationally demanding models drain batteries quickly.
- Deployment: Resource-constrained devices (smartphones, IoT) struggle with large models.

## 1. Pruning
Pruning identifies and removes the less important weights of a neural network that have minimal impact on its output. This creates a sparser and more streamlined model. 
Unstructured pruning zero-out the least important weights, while structured pruning can be more aggressive by removing full sections of the network.

Benefits:
- Reduces model size
- Can improve inference speed

<img src="./imgs/pruning.webp" alt="drawing" width="450"/>

PyTorch includes functionality for both structured and unstructured pruning. Here, we'll show an example of unstructured pruning, which removes individual weights in the model. This example demonstrates how to randomly prune 30% of the connections in the first linear layer of the network by setting their weights to zero.

In [1]:
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Define a simple model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()
print("Original model:", model)

# Apply pruning to the first layer
prune.random_unstructured(model.fc1, name='weight', amount=0.3)

# Check the pruned model
print("Pruned model:", model)


Original model: Net(
  (fc1): Linear(in_features=128, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)
Pruned model: Net(
  (fc1): Linear(in_features=128, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)


## 2. Quantization
Quantization simplifies the model's mathematical operations, converting those high-precision calculations into something more manageable and, crucially, faster. It  reduces the storage footprint of models by using less precise data types such as representing model weights and activations using lower-precision numbers (e.g., 8-bit integers instead of 32-bit floating-point)

Benefits:
- Reduces model size
- Accelerates computation

<img src="./imgs/quantization.jpeg" alt="drawing" width="600"/>

Below is an example of quantization on the same simple model as before. This code dynamically quantizes the linear layers of the model to int8 precision, which is particularly useful for reducing model size and speeding up inference for AI applications.

In [6]:
import torch.quantization

import torch
torch.backends.quantized.engine = 'qnnpack'  # For ARM architectures


# Define a simple model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = Net()
model.eval()  # Set the model to evaluation mode

# Specify the model and the sample input size for dynamic quantization
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print("Quantized model:", quantized_model)


Quantized model: Net(
  (fc1): DynamicQuantizedLinear(in_features=128, out_features=64, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  (fc2): DynamicQuantizedLinear(in_features=64, out_features=10, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)


## 3. Knowledge Distillation
Knowledge distillation involves training a smaller (student) model to replicate the behavior of a larger (teacher) model. The teacher model produces "soft labels" (probabilistic outputs). Student model is trained to match the soft labels, not just the original dataset's hard labels.

Benefits:
- Compresses knowledge into a smaller, more efficient model
- Potential for higher accuracy than training the student directly on the dataset

<img src="./imgs/kd.jpeg" alt="drawing" width="600"/>

Below is a simplified example of how to set this up in PyTorch. In this example, we use a pretrained ResNet18 model from torchvision as the teacher model, demonstrating knowledge distillation to a simpler student model on the CIFAR-10 dataset. The ResNet18 model is pretrained on ImageNet, so we'll adapt it to work with CIFAR-10.

The teacher model is set to evaluation mode to ensure it does not update its weights during training.
The loss function (CrossEntropyLoss) and optimizer (SGD with learning rate 0.001 and momentum 0.9) are defined for training the student model.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torchvision.models import resnet18
from torch.utils.data import DataLoader

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

# Define a simpler student model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(8 * 16 * 16, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = x.view(-1, 8 * 16 * 16)
        x = self.fc(x)
        return x

# Adjusting the ResNet model to CIFAR-10's resolution
class ResNetForCIFAR10(nn.Module):
    def __init__(self, pretrained=True):
        super(ResNetForCIFAR10, self).__init__()
        original_model = resnet18(pretrained=pretrained)
        self.features = nn.Sequential(*list(original_model.children())[:-2])
        self.adapted_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        x = self.features(x)
        x = self.adapted_avg_pool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

teacher_model = ResNetForCIFAR10()
teacher_model.eval()  # Set the teacher model to evaluation mode

student_model = SimpleCNN()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(student_model.parameters(), lr=0.001, momentum=0.9)

# Training loop
for epoch in range(2):  # Iterate over 2 epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        if i >= 100:  # Limit the number of batches to speed up training
            break
        inputs, labels = data

        optimizer.zero_grad()
        outputs_student = student_model(inputs)
        with torch.no_grad():
            outputs_teacher = teacher_model(inputs)

        loss = criterion(outputs_student, labels) + nn.KLDivLoss()(nn.functional.log_softmax(outputs_student, dim=1),
                                                                   nn.functional.softmax(outputs_teacher, dim=1))
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 50 == 49:  # Print every 50 mini-batches
            print(f'Epoch: {epoch + 1}, Batch: {i + 1}, Loss: {running_loss / 50}')
            running_loss = 0.0

print('Finished Training Student Model')


Files already downloaded and verified
Epoch: 1, Batch: 50, Loss: 2.2586119604110717
Epoch: 1, Batch: 100, Loss: 2.161914668083191
Epoch: 2, Batch: 50, Loss: 2.073851742744446
Epoch: 2, Batch: 100, Loss: 2.0345424127578737
Finished Training Student Model
