# Lab 5: **Invariance** and **Equivariance** at different layers of a *CNN*

Advanced Topics in Machine Learning -- Spring 2023, UniTS

<a target="_blank" href="https://colab.research.google.com/github/ganselmif/adv-ml-units/blob/main/notebooks/AdvML_UniTS_2023_Lab_05_CNN_Invariance_Equivariance.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>

#### Overview of the *Lab*

In the following *Lab*, we will study the *invariance* and *equivariance* properties of specific layers within a *CNN*.

Recall the definitions of **invariance** and **equivariance** of a function $f$ with respect to a transformation $P_{\alpha}$ (expressed as an operator) parametrized by $\alpha$:

- *Invariance*: $f(P_{\alpha} x) = f(x)\;\;\;\; \forall\alpha$
- *Equivariance*: $f(P_{\alpha} x) = P_{\alpha} f(x)\;\;\;\; \forall\alpha$
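
As a minimal illustration of the two definitions, the sketch below takes $P_{\alpha}$ to be a circular shift of a 1-D signal by $\alpha$ positions. The helper names `shift`, `f_invariant` and `f_equivariant` are hypothetical and chosen only for this example: the sum of the entries does not change under a shift (invariance), while an element-wise square commutes with it (equivariance).

```python
import torch


def shift(x, alpha):
    """Hypothetical P_alpha: circular shift of a 1-D signal by alpha positions."""
    return torch.roll(x, shifts=alpha, dims=0)


def f_invariant(x):
    """Sum of all entries: unchanged by any reordering of the input."""
    return x.sum()


def f_equivariant(x):
    """Element-wise square: acts on each position independently."""
    return x ** 2


x = torch.tensor([1.0, 2.0, 3.0, 4.0])

for alpha in range(len(x)):
    # Invariance: f(P_alpha x) == f(x)
    assert torch.allclose(f_invariant(shift(x, alpha)), f_invariant(x))
    # Equivariance: f(P_alpha x) == P_alpha f(x)
    assert torch.equal(f_equivariant(shift(x, alpha)), shift(f_equivariant(x), alpha))
```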

According to theory, the training of a *CNN* with pooling should lead to a network whose:

- *Convolutional* layers are *equivariant* to translation;
- *Fully Connected* layers are *invariant* to translation.

Due to the specific structure of convolutional layers, it is possible to show that the *equivariance* property gives rise to permuted activations in response to translation of inputs.
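
The permutation claim can be checked in an idealized setting. The sketch below is not the lab's model: it assumes a single randomly initialized convolution with circular padding and a circular shift (`torch.roll`) as the translation, so that no content is lost at the borders. Under those assumptions, the shifted input produces the same activations in shifted positions, and the sorted activations of each channel coincide.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: circular padding, so the convolution commutes exactly with circular shifts.
conv = nn.Conv2d(1, 4, kernel_size=5, padding=2, padding_mode="circular")

x = torch.randn(1, 1, 28, 28)
shifts = (3, -2)  # (rows, columns), arbitrary values chosen for illustration
x_shifted = torch.roll(x, shifts=shifts, dims=(2, 3))

with torch.no_grad():
    y = conv(x)
    y_shifted = conv(x_shifted)

# Equivariance: convolving the shifted input equals shifting the convolved input.
is_equivariant = torch.allclose(
    y_shifted, torch.roll(y, shifts=shifts, dims=(2, 3)), atol=1e-5
)

# The activations of each channel are a permutation of one another,
# so their sorted values match.
same_when_sorted = torch.allclose(
    y.flatten(2).sort(dim=2).values,
    y_shifted.flatten(2).sort(dim=2).values,
    atol=1e-5,
)

print(is_equivariant, same_when_sorted)
```

With zero padding, pooling, or content moving past the image border, the equality is only approximate, which is what the rest of the lab investigates.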

In [4]:
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader

from torchvision import datasets
from torchvision import transforms


The following *CNN* model is given, whose output -- for your convenience -- is a tuple, composed of the actual output of the network, the activation tensor after the second *convolutional* layer, and the activation tensor after the first *fully-connected* layer:

In [2]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv2(x)
        conv2repr = x.clone().detach()
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = x.view(-1, 320)
        x = self.fc1(x)
        fc1repr = x.clone().detach()
        x = F.relu(x)
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x, conv2repr, fc1repr
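
To see where the `320` in `fc1` and the shape of `conv2repr` come from, the following sketch traces the spatial sizes for a 28×28 input. It uses stand-alone layers with the same hyperparameters as `Model` and a dummy all-zero image, so the values are meaningless; only the shapes matter.

```python
import torch
from torch import nn
from torch.nn import functional as F

# Stand-alone layers with the same hyperparameters as `Model` (random weights).
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

x = torch.zeros(1, 1, 28, 28)  # one dummy MNIST-sized image

with torch.no_grad():
    h = F.relu(F.max_pool2d(conv1(x), 2))        # 28 -> 24 (conv) -> 12 (pool)
    conv2_out = conv2(h)                         # 12 -> 8 (conv)
    pooled = F.relu(F.max_pool2d(conv2_out, 2))  # 8 -> 4 (pool)

print(conv2_out.shape)          # torch.Size([1, 20, 8, 8])
print(pooled.flatten(1).shape)  # torch.Size([1, 320]), i.e. 20 * 4 * 4
```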

Taking inspiration from previous *Labs*:

1. Train the model on the (non-augmented) *MNIST* dataset;
2. Prepare a test dataset composed of pairs of mutually translated images;
3. Extract the activations of layers `conv2` and `fc1` and check whether they respect the invariance/equivariance property.

**Hint**: To test for *equivariance*, it may be useful to notice that **sorting** is invariant to permutations!
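
One possible way to use the hint is sketched below. The helper `sorted_activation_gap` is a hypothetical name, not part of the lab material: it sorts the activations of each channel and measures how far apart the two sorted tensors are. A gap of zero means the two activation maps are permutations of each other within every channel.

```python
import torch


def sorted_activation_gap(a, b):
    """Mean absolute difference between per-channel sorted activations.

    a, b: tensors of shape (N, C, H, W), e.g. `conv2repr` for two inputs.
    """
    a_sorted = a.flatten(start_dim=2).sort(dim=2).values
    b_sorted = b.flatten(start_dim=2).sort(dim=2).values
    return (a_sorted - b_sorted).abs().mean().item()


torch.manual_seed(0)
a = torch.randn(2, 20, 8, 8)

# Permute the 64 spatial positions: the values are the same, only their order changes.
perm = torch.randperm(64)
b = a.flatten(2)[:, :, perm].reshape(2, 20, 8, 8)

# Unrelated activations, for comparison.
c = torch.randn(2, 20, 8, 8)

print(sorted_activation_gap(a, b))  # 0.0
print(sorted_activation_gap(a, c) > 0)  # True
```

For a trained network the gap between translated pairs will generally not be exactly zero, so it is more informative to compare it against the gap between unrelated images.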

### 1. Train the model on the (non-augmented) *MNIST* dataset

In [5]:
# Hyperparameters
BATCH_SIZE = 64
EPOCHS = 2
LEARNING_RATE = 0.01
MOMENTUM = 0.0
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [6]:
# Defining transforms
augmentation = transforms.RandomAffine(degrees=(0, 180), translate=None, scale=None)
to_tensor = transforms.ToTensor()
normalization = transforms.Normalize(mean=0.1307, std=0.3081)

# Loading the datasets
train_dataset = datasets.MNIST(
    root="./data",
    train=True,
    transform=to_tensor,
    download=True,
)
test_dataset = datasets.MNIST(
    root="./data",
    train=False,
    transform=to_tensor,
    download=True,
)

train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
model = Model().to(device)

In [None]:
num_epochs = 5  # Number of times the whole (training) dataset is used for training
learning_rate = 0.001

criterion = nn.NLLLoss()  # Loss function; the model already outputs log-probabilities (log_softmax)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs, _, _ = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(
                f"Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item():.4f}"
            )

In [None]:
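model.eval()  # Switch to evaluation mode
total = 0
correct = 0
with torch.no_grad():  # Gradients are not needed for evaluation
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs, _, _ = model(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest log-probability
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
acc = 100 * correct / total
print(f"Test accuracy: {acc:.2f}%")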

### 2. Prepare a test dataset composed of pairs of mutually translated images

In [None]:
# Defining transforms
augmentation = transforms.RandomAffine(degrees=0, translate=(0.1, 0.3), scale=None)  # Translation only, no rotation
to_tensor = transforms.ToTensor()
normalization = transforms.Normalize(mean=0.1307, std=0.3081)

# Loading the datasets
train_dataset = datasets.MNIST(
    root="./data",
    train=True,
    transform=transforms.Compose([augmentation, to_tensor, normalization]),
    download=True,
)
test_dataset = datasets.MNIST(
    root="./data",
    train=False,
    transform=transforms.Compose([augmentation, to_tensor, normalization]),
    download=True,
)

train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)
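
A randomly augmented loader yields translated images but not explicit pairs. As one possible alternative, the sketch below builds a pair from a batch with a known, deterministic shift. The function `translate` and its arguments are hypothetical and introduced only for illustration; it assumes a batch of shape `(N, C, H, W)` and fills the uncovered border with `fill` (0 matches the MNIST background before normalization).

```python
import torch
from torch.nn import functional as F


def translate(images, dx, dy, fill=0.0):
    """Shift a batch (N, C, H, W) by dx pixels to the right and dy pixels down.

    Pixels shifted past the border are dropped; uncovered pixels are set to `fill`.
    """
    _, _, h, w = images.shape
    pad = max(abs(dx), abs(dy))
    padded = F.pad(images, (pad, pad, pad, pad), value=fill)
    top, left = pad - dy, pad - dx
    return padded[:, :, top:top + h, left:left + w]


# Toy check on a 5x5 image with a single bright pixel at row 1, column 1.
images = torch.zeros(1, 1, 5, 5)
images[0, 0, 1, 1] = 1.0

translated = translate(images, dx=2, dy=1)
pair = (images, translated)  # a pair of mutually translated images

print(translated[0, 0])  # the bright pixel is now at row 2, column 3
```

Feeding both elements of such a pair through the model then gives two `conv2repr` tensors, which can be compared after sorting, and two `fc1repr` tensors, which can be compared directly.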