<h1 align="center"><b>AI Lab: Computer Vision and NLP</b></h1>
<h3 align="center">Lessons 19-20: Convolutional Neural Networks</h3>

---

**C**onvolutional **N**eural **N**etworks (**CNN**s) are a specific type of neural network composed of convolutional layers and pooling layers, usually followed by an MLP (a stack of fully connected layers). The convolutional part extracts features from the input, and the MLP uses those features to produce the final output (e.g. the class scores). CNNs are mostly used with images.
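To make "extracting features" more concrete, here is a minimal pure-Python sketch of what one convolutional filter computes on one channel: it slides a small kernel over the image and takes a weighted sum of each patch (no padding, stride 1, no bias). As in `PyTorch`, the kernel is not flipped, so strictly speaking this is a cross-correlation. The function name `conv2d_single_channel` and the vertical-edge kernel are illustrative choices, not part of the lesson; in a real CNN the kernel values are learned.

```python
def conv2d_single_channel(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of one channel.

    Illustrative sketch of what one filter of a convolutional layer computes
    for a single input channel (bias omitted).
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the k_h x k_w patch whose top-left corner is (i, j)
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 3x6 image with a dark-to-bright vertical edge in the middle
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hypothetical hand-made vertical-edge detector
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d_single_channel(image, kernel))  # [[0, 3, 3, 0]]
```

The response is non-zero only where the window covers the edge: this is the kind of local feature that the first convolutional layers learn to detect.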

What are the pooling layers? They are layers that resize (usually downsample) the feature maps, for example by keeping only the maximum value of each small window (max pooling). [RECOVER ALL THEORY]
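As an illustration of downsampling, here is a minimal pure-Python sketch of 2×2 max pooling with stride 2 (non-overlapping windows). The function name `max_pool2d` is our own; in `PyTorch` the corresponding layer would be `nn.MaxPool2d(2)`, which would turn, for example, a 24×24 feature map into a 12×12 one.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling (kernel = stride = size), no padding.

    Rows/columns that do not fill a complete window are dropped, which
    matches nn.MaxPool2d's default behaviour (ceil_mode=False).
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the size x size window and keep its maximum
            window = [
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
print(max_pool2d(feature_map))  # [[4, 5], [6, 7]]
```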

CNNs can be implemented in `PyTorch` as follows. First, we import the packages:

In [2]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn
import torchmetrics
import matplotlib.pyplot as plt

Then, we load the FashionMNIST dataset (downloading it if needed) and check whether an NVIDIA GPU is available:

In [3]:
training_data = datasets.FashionMNIST(
    root="pytorch_datasets",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="pytorch_datasets",
    train=False,
    download=True,
    transform=ToTensor()
)

device = "cuda" if torch.cuda.is_available() else "cpu"

Now, we can proceed to create our CNN:

In [4]:
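class OurCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extractor: two 3x3 convolutions (no padding, stride 1)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 5, 3),  # 1 input channel (gray-scale image),
                                 # 5 output channels, 3x3 kernel: 28x28 -> 26x26
            nn.ReLU(),
            nn.Conv2d(5, 10, 3), # 5 -> 10 channels: 26x26 -> 24x24
            nn.ReLU()
        )
        # Classifier: flattened feature map (10 * 24 * 24 values) -> 10 classes
        self.mlp = nn.Sequential(
            nn.Linear(24 * 24 * 10, 10),
            nn.ReLU(),
            nn.Linear(10, 10)  # raw scores (logits), one per class
        )

    def forward(self, x, debug=False):
        x = self.cnn(x)
        if debug: print(x.shape)
        # Flatten channels, height and width into one vector per image;
        # start_dim=-3 works both with and without a leading batch dimension
        x = torch.flatten(x, start_dim=-3)
        if debug: print(x.shape)
        x = self.mlp(x)
        return x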

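Why do we need to compute the size of the feature map? Consecutive layers must have matching sizes: in an MLP, the output size of a layer must equal the input size of the next layer. The same holds for `nn.Conv2d` layers, in terms of channels: the first `nn.Conv2d` layer has 1 input channel and 5 output channels, so the second layer must have 5 input channels (and it produces 10). The first `nn.Linear` layer, instead, receives the flattened output of the last convolution, so its input size must be the number of values in that feature map. Each 3×3 convolution without padding shrinks each side by 2 pixels (28 → 26 → 24), so the flattened feature map has 10 × 24 × 24 = 5760 values, hence `nn.Linear(24 * 24 * 10, 10)`.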
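The size computation above can be automated with the output-size formula given in the `PyTorch` documentation of `nn.Conv2d`. The helper name `conv2d_output_size` is hypothetical; the formula applies independently to height and width.

```python
def conv2d_output_size(size, kernel_size, padding=0, stride=1, dilation=1):
    """Spatial output size of nn.Conv2d along one dimension.

    floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


side = 28                   # FashionMNIST images are 28x28
for _ in range(2):          # two Conv2d layers with kernel_size=3, no padding
    side = conv2d_output_size(side, kernel_size=3)

out_channels = 10           # output channels of the last Conv2d
flattened = out_channels * side * side
print(side, flattened)      # 24 5760
```

With `padding=1`, a 3×3 convolution keeps the spatial size unchanged (`conv2d_output_size(28, 3, padding=1)` gives 28), which is a common choice when stacking many convolutional layers.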

In [5]:
model = OurCNN().to(device)

epochs = 2
batch_size = 16
learning_rate = 0.0001

# Loss function
loss_fn = nn.CrossEntropyLoss()

# Optimizer. We can use either SGD or AdamW; AdamW often converges faster than plain SGD and needs less learning-rate tuning
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
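The last `nn.Linear` layer of `OurCNN` outputs raw scores (logits) without a softmax: `nn.CrossEntropyLoss` applies log-softmax internally, takes the negative log-probability of the correct class, and by default averages over the batch. Below is a minimal pure-Python sketch of that computation; the function names are our own, not `PyTorch` APIs.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample computed from raw scores (logits).

    Equals log-softmax followed by negative log-likelihood:
    log(sum_k exp(z_k)) - z_target
    """
    # Subtract the maximum for numerical stability (the result is unchanged)
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, targets):
    """Average over the batch, like the default reduction="mean"."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Equal scores over 10 classes: loss = ln(10) whatever the target
print(round(cross_entropy([0.0] * 10, target=3), 4))  # 2.3026
```

This gives a handy sanity check: an untrained 10-class classifier should start with a loss close to ln(10) ≈ 2.30.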

Before starting to train, let's test the model with a random tensor. Unlike `OpenCV` and `numpy`, which store an image as (height, width, channels), `PyTorch` expects the channels first:

$$
\text{tensor} \; = \; \begin{pmatrix}
    \text{number of channels}, & \text{height}, & \text{width}
\end{pmatrix}
$$
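The difference between the two layouts is only the order of the axes. The following pure-Python sketch (the helper name `hwc_to_chw` is hypothetical) reorders a nested-list image from (height, width, channels) to (channels, height, width). With tensors, the same reordering is `tensor.permute(2, 0, 1)`, and `ToTensor()` already performs it (also scaling the values to [0, 1]) when converting a PIL image or a `numpy` array.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from (height, width, channels)
    to (channels, height, width)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2x2 image with 3 channels, stored in HWC order (numpy/OpenCV style)
hwc = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [10, 20, 30]],
]
chw = hwc_to_chw(hwc)
print(chw[0])  # first channel: [[255, 0], [0, 10]]
```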

In [6]:
test_x = torch.rand((1, 28, 28), device=device)  # one random 1x28x28 "image"
test_y = model(test_x, debug=True)

torch.Size([10, 24, 24])


When training, images are processed in batches, so the tensor has a fourth, leading dimension:

$$
\text{tensor} \; = \; \begin{pmatrix}
    \colorbox{#645e0d}{\text{batch size}}, & \text{number of channels}, & \text{height}, & \text{width}
\end{pmatrix}
$$
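Batching is what `torch.utils.data.DataLoader` takes care of. As a rough idea of what it does with `shuffle=False` and `drop_last=False`, here is a simplified pure-Python sketch; `make_batches` is a hypothetical helper, and the real `DataLoader` also stacks the images into one tensor of shape (batch size, channels, height, width) and the labels into a tensor.

```python
def make_batches(samples, batch_size):
    """Group (image, label) pairs into (images, labels) batches.

    The last batch is kept even if it holds fewer than batch_size samples.
    """
    batches = []
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        images = [image for image, _ in chunk]   # leading "batch size" axis
        labels = [label for _, label in chunk]
        batches.append((images, labels))
    return batches


# 5 dummy "images" of shape (1, 2, 2) with integer labels 0, 1, 2, 0, 1
samples = [([[[i, i], [i, i]]], i % 3) for i in range(5)]
batches = make_batches(samples, batch_size=2)
print(len(batches))    # 3
print(batches[-1][1])  # [1]  (the last batch holds a single sample)
```

For instance, the 60,000 FashionMNIST training images with `batch_size = 16` give 3,750 batches per epoch.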

Let's now define the training and testing loops:

In [9]:
# The metric must live on the same device as the predictions
metric = torchmetrics.Accuracy(task="multiclass", num_classes=10).to(device)

def train_loop(dataloader, model, loss_fn, optimizer, epoch=None, debug=True):
    """Trains the model for one epoch
    
    Parameters:
        - `dataloader`: the DataLoader of the training set
        - `model`: the model to train
        - `loss_fn`: the loss function of the model
        - `optimizer`: the optimizer
        - `epoch`: the index of the epoch
        - `debug`: whether to print the loss and the accuracy during training
    """
    size = len(dataloader.dataset)  # number of samples, not of batches

    # Enable training mode (it matters for Dropout and BatchNorm layers)
    model.train()

    # Get the batches from the DataLoader
    for batch, (x, y) in enumerate(dataloader):
        # Move data to the device used
        x = x.to(device)
        y = y.to(device)

        # Compute the prediction and the loss
        pred = model(x)
        loss = loss_fn(pred, y)

        # Backpropagate and adjust the weights
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Accumulate every batch in the metric (returns this batch's accuracy)
        accuracy = metric(pred, y)

        # Print some information
        if batch % 20 == 0:
            loss_value, current = loss.item(), (batch + 1) * len(x)
            if debug: print(f"→ Loss: {loss_value} [Sample {current}/{size}, Epoch {epoch}/{epochs}]")
            if debug: print(f"Accuracy of batch {batch}: {accuracy}")
        
    accuracy = metric.compute()
    print(f"=== The epoch {epoch}/{epochs} has finished training ===")
    if debug: print(f"→ Final accuracy of the epoch: {accuracy}")
    metric.reset()

def test_loop(dataloader, model, loss_fn, debug=True):
    """Evaluates the model, printing its accuracy and average loss"""
    num_batches = len(dataloader)
    test_loss = 0

    # Evaluation mode: Dropout is disabled, BatchNorm uses its running statistics
    model.eval()

    # Disable gradient tracking (no weights are updated during testing)
    with torch.no_grad():
        for index, (x, y) in enumerate(dataloader):
            # Move the data to the device used for testing
            x = x.to(device)
            y = y.to(device)

            # Get the model prediction and accumulate the loss
            pred = model(x)
            test_loss += loss_fn(pred, y).item()

            # Get the accuracy score of the batch
            acc = metric(pred, y)
            if debug: print(f"→ Accuracy for batch {index}: {acc}")
    acc = metric.compute()
    print(f"===    The testing loop has finished    ===")
    if debug: print(f"→ Final testing accuracy of the model: {acc}")
    if debug: print(f"→ Average testing loss: {test_loss / num_batches}")
    metric.reset()
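Both loops rely on the stateful interface of `torchmetrics`: calling `metric(pred, y)` returns the accuracy of the current batch and also accumulates it, `compute()` returns the accuracy over everything accumulated, and `reset()` clears the state. The class below is a hypothetical, simplified pure-Python imitation of that pattern (micro-averaged accuracy, i.e. total correct predictions over total samples), not the `torchmetrics` implementation.

```python
class RunningAccuracy:
    """Simplified stand-in for the update/compute/reset pattern of a metric."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, batch_logits, targets):
        """Accumulate one batch and return that batch's accuracy."""
        batch_correct = 0
        for logits, target in zip(batch_logits, targets):
            # The predicted class is the index of the highest score (argmax)
            predicted = max(range(len(logits)), key=lambda k: logits[k])
            batch_correct += int(predicted == target)
        self.correct += batch_correct
        self.total += len(targets)
        return batch_correct / len(targets)

    def compute(self):
        """Accuracy over all samples seen since the last reset."""
        return self.correct / self.total


running = RunningAccuracy()
print(running.update([[2.0, 0.1], [0.3, 0.9]], [0, 1]))  # 1.0
print(running.update([[0.8, 0.2]], [1]))                  # 0.0
print(round(running.compute(), 4))                        # 0.6667
```

Note that the final value (2 correct out of 3 samples) differs from the plain mean of the two batch accuracies (0.5) when batches have different sizes, which is why the metric accumulates counts rather than per-batch values.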

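We can now train the model. The loops must iterate over `DataLoader`s, not over the datasets themselves: iterating a `Dataset` directly yields one `(image, label)` pair at a time, where the label is a plain Python `int`, so `y.to(device)` fails with `AttributeError: 'int' object has no attribute 'to'`. A `DataLoader` groups the samples into batches of `batch_size` images and turns the labels into tensors:

In [10]:
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for epoch_ind in range(epochs):
    train_loop(train_dataloader, model, loss_fn, optimizer, epoch_ind, debug=False)
    test_loop(test_dataloader, model, loss_fn, debug=False)

print("=== The training has finished ===")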