
# Introduction to Convolutional Neural Networks

This tutorial covers the basic concepts behind Convolutional Neural Networks and their implementation in the PyTorch framework.
Convolutional Neural Networks (CNNs) are a class of feed-forward artificial neural networks. They are commonly applied to 2D visual data, meaning that we can feed images directly into a CNN without flattening them into a 1D vector as we did in the previous tutorial.
CNNs have revolutionised the field of computer vision over the last decade. In 2012, Alex Krizhevsky introduced the AlexNet architecture and won the ImageNet Challenge (one of the most important image classification competitions in the Computer Vision community), reducing the top-5 error by more than 10 percentage points, an incredible improvement at the time. Today, CNNs are used not only for image classification but for many other computer vision tasks.

![](https://cdn-5f733ed3c1ac190fbc56ef88.closte.com/wp-content/uploads/2017/03/alexnet_small-1.png)

The image above is from [cv-tricks' blog](https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/) and shows the proposed AlexNet architecture. It is composed of 5 convolutional layers followed by 3 fully connected layers. Nowadays, we can find much deeper and more complex architectures, which outperform AlexNet on the ImageNet Challenge.







In [None]:
!pip install torchinfo
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
from torch.nn import functional
from torch import optim
from torch.utils import data
import torchinfo
from torchvision import datasets, transforms

In [None]:
# Utility function to control randomness for reproducibility
def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

## CNN Structure

The basic pipeline of a typical CNN takes an image as input and passes it through a stack of convolutional layers that extract a feature representation from it. The final shape of this representation depends on the task the architecture is designed for. For instance, the output of the last layer in a classification problem is a probability vector. Each dimension of the probability vector represents how likely it is that the input image belongs to a specific class. However, the architecture design is up to us, so we could also code a network that outputs a single value for regression problems, or one that generates a new image map for semantic segmentation. Now, let's dig a bit deeper into CNNs and introduce some widely used layers.

### 2D Convolutional Layer

The most common layer in any CNN architecture is the 2D convolutional layer. Convolutional layers are designed to extract features from images, or from previously extracted features. As shown in the following illustration, a 2D convolution applies the same filter across the full image. As a result, 2D convolutional filters exploit the local information present in images, making them a powerful tool for image analysis.

![](https://cdn-images-1.medium.com/max/800/1*Fw-ehcNBR9byHtho-Rxbtw.gif)

Image [source](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1).

Modern deep learning frameworks let us add a convolutional layer to our architectures with a single line of code. Here we explain how 2D convolutions work, since understanding them is needed to comprehend how any CNN operates. The following images and some of the explanations are explored further in [Irhum Shafkat's blog](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1) and in the [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) book. Both are strongly recommended.

To understand 2D convolutions, we need to define first what a kernel is. Kernels are simply matrices of numbers. The numbers on the kernels are the so-called weights, and the weights on the kernels change as we train the network. Hence, network training aims to optimise the network's parameters (update the kernel weights) such that the cost function is minimised.

The 2D convolution operation takes the network's kernels and "slides" them over the input image (like a sliding window), as in the following image from the [PyImageSearch](https://www.pyimagesearch.com/2015/03/23/sliding-windows-for-object-detection-with-python-and-opencv/) blog:

![](https://pyimagesearch.com/wp-content/uploads/2014/10/sliding_window_example.gif)

At each step, the kernel is multiplied element-wise with the input values it currently covers. The results of this element-wise multiplication are added to obtain the output value for that position. The CNN repeats this step for all positions of the sliding window, composing the feature map. The generated feature map can then go through another 2D convolutional layer to create more powerful features.


![](https://cdn-images-1.medium.com/max/800/1*Zx-ZMLKab7VOCQTxdZ1OAw.gif)

The previous image shows the 2D convolution operation. Each new feature value is the sum of the element-wise products between the kernel weights and the input values inside the sliding window. The larger the kernel, the more input elements contribute to each output value. In contrast to fully connected layers, where each new feature value is a weighted sum over **all** input values, 2D convolutions compute features from local areas. In other words, instead of looking at every input component, they consider only values from nearby locations.
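
The sliding-window computation can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions (single channel, no padding, stride 1, no bias); the function name `conv2d_valid` and the toy values are hypothetical and not part of the tutorial. Note that, like deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current window and the kernel, then sum
            window = image[i:i + k_h, j:j + k_w]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3))                          # toy 3x3 kernel that sums each window
feature_map = conv2d_valid(image, kernel)

print(feature_map.shape)  # (3, 3)
print(feature_map[0, 0])  # 54.0
```

Each output value depends only on a 3x3 neighbourhood of the input, which is the locality property described above.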

In the above example, the input image on the left has a size of 5x5 and the resulting feature map is 3x3, showing that the output size is not always equal to the input size. The output size can be computed as:

$O = W - K + 1$,

where $O$ is the output height/width, $W$ is the input height/width and $K$ is the kernel size. The output size depends not only on the input size but also on the kernel size. Check in the following code cell how the output feature map shape changes as you increase the kernel size. In PyTorch, we define the layer by using `Conv2d` from `torch.nn` (documentation [here](https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)).









In [None]:
# Define a simple model with a single 2D convolutional layer
class SimpleConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

# Instantiate and run the model
input = torch.randn((1, 1, 100, 100))  # (batch_size, num_channels, height, width)
model = SimpleConvNet()
output = model(input)

# Compare input and output size
print('Input size: ({:}, {:}, {:})'.format(input.shape[1], input.shape[2], input.shape[3]))
print('Output size: ({:}, {:}, {:})'.format(output.shape[1], output.shape[2], output.shape[3]))

Moreover, the kernel size and the input size are not the only parameters affecting the output size. We are going to introduce two extra elements that change the size of the output map: the padding and the stride.

### Adding Padding to Input Features

In some tasks, such as [image translation](https://arxiv.org/pdf/1611.07004.pdf), we need the output size to be equal to the input size. The solution to that is using padding, where extra edges are added to the input features so that the dimension is not reduced after the convolutional layer. Normally those pixels have $0$ value (termed zero-padding), but depending on the application other methods could be used, e.g., reflection or symmetric padding.
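
To make the difference between padding methods concrete, the sketch below pads a tiny 3x3 map with zeros and with reflection using `torch.nn.functional.pad`. The toy tensor is an assumption chosen for readability. In a `Conv2d` layer the padding method can be selected through its `padding_mode` argument.

```python
import torch
from torch.nn import functional as F

# Toy 3x3 feature map with shape (batch_size, num_channels, height, width)
x = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)

# Pad one pixel on every side; the tuple is (left, right, top, bottom)
zero_padded = F.pad(x, (1, 1, 1, 1), mode='constant', value=0.0)
reflect_padded = F.pad(x, (1, 1, 1, 1), mode='reflect')

print(zero_padded[0, 0])     # 5x5 map with a border of zeros
print(reflect_padded[0, 0])  # 5x5 map whose border mirrors the interior values
```

Both padded maps are 5x5, so a 3x3 kernel applied to either of them produces a 3x3 output, the same size as the original input.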

![](https://cdn-images-1.medium.com/max/800/1*1okwhewf5KCtIPaFib4XaA.gif)

If padding is used, the new output size can be computed by doing:

$O = W - K + 2P+ 1$,

where $P$ is the padding value. To preserve the spatial dimensions, $P$ must be chosen according to the kernel size. Padding can be added to the `Conv2d` layer through its `padding` argument.
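
As a short worked step (added here for illustration), requiring the output size to equal the input size in the formula above gives the padding needed when the kernel moves one pixel at a time:

$O = W \;\Rightarrow\; W - K + 2P + 1 = W \;\Rightarrow\; P = \dfrac{K - 1}{2}$,

so a $3 \times 3$ kernel needs $P = 1$ and a $5 \times 5$ kernel needs $P = 2$. The result is an integer only for odd $K$, which is often cited as one reason why odd kernel sizes are so common.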

In [None]:
# Define model with 'same' padding
class SamePadConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding='same')

    def forward(self, x):
        return self.conv(x)

# Instantiate and run the model
input = torch.randn((1, 1, 100, 100))  # (batch_size, num_channels, height, width)
model = SamePadConvNet()
output = model(input)

# Compare input and output size
print('Input size: ({:}, {:}, {:})'.format(input.shape[1], input.shape[2], input.shape[3]))
print('Output size: ({:}, {:}, {:})'.format(output.shape[1], output.shape[2], output.shape[3]))

### Using Stride in Convolutional Layers

The stride controls how far the sliding window moves between consecutive positions. Instead of always moving one pixel at a time, we can define how many pixels the window shifts before computing the next weighted sum between the kernel's weights and the input features. A stride of 1 means that windows are one pixel apart, so every window is computed. A stride of 2 means that windows are 2 pixels apart, skipping every other window. Strides reduce the number of computations and, consequently, the size of the output map. In practice, as we go deeper into a CNN, the spatial size of the feature map gets smaller while the number of channels increases. We can further reduce the size of the feature map using pooling operations, which we introduce later in this tutorial.

![](https://cdn-images-1.medium.com/max/800/1*BMngs93_rm2_BpJFH2mS0Q.gif)

If strides are used, the new output size can be computed as:

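$O = \left\lfloor \dfrac{W - K + 2P}{S} \right\rfloor + 1$,

where $S$ is the stride value and $\lfloor \cdot \rfloor$ rounds down to the nearest integer, which matters when the stride does not divide $W - K + 2P$ evenly. The stride is set in the layer through the `stride` argument.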
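
The output-size formulas of this section can be collected into one small helper. This is a sketch for checking shapes by hand; the function name `conv_output_size` is hypothetical. It uses integer (floor) division because frameworks round down when the stride does not divide the numerator evenly.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output height (or width) of a convolution: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(5, 3))                         # 3
print(conv_output_size(100, 3, padding=1))            # 100
print(conv_output_size(100, 3, padding=1, stride=2))  # 50
```

The three calls correspond to the 5x5 illustration, a padded 3x3 convolution that preserves a 100x100 input, and the same convolution with a stride of 2.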

In [None]:
# Define model with stride=2 and an explicit padding of 1
class StridedSamePadConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            stride=2,
            padding=1,  # explicit padding; PyTorch rejects padding='same' when stride > 1
        )

    def forward(self, x):
        return self.conv(x)

# Instantiate and run the model
input = torch.randn((1, 1, 100, 100))  # (batch_size, num_channels, height, width)
model = StridedSamePadConvNet()
output = model(input)

# Compare input and output size
print('Input size: ({:}, {:}, {:})'.format(input.shape[1], input.shape[2], input.shape[3]))
print('Output size: ({:}, {:}, {:})'.format(output.shape[1], output.shape[2], output.shape[3]))

## Differences Between Kernel and Filters

The examples above take a single-channel image as input and compute a feature map that also has one channel. However, RGB images and intermediate feature maps have multiple channels. For an RGB image, each 2D convolution needs 3 kernels, one for each colour channel of the image. This group of kernels is called a filter. Thus, a filter is a collection of kernels that produces a single output channel.

As a regular practice when defining deep learning models, we increase the number of filters in each convolutional layer as we go deeper into the model. Because each kernel is multiplied element-wise with one input channel, the number of kernels in each filter must equal the number of channels in the input feature map.

The next figure shows how the convolution is performed when having three input channels. First, one filter uses its three independent kernels to convolve with the RGB channels of the input image:

![](https://cdn-images-1.medium.com/max/1000/1*8dx6nxpUh2JqvYWPadTwMQ.gif)

Next, the processed feature maps are added together to obtain a single channel:

![](https://cdn-images-1.medium.com/max/1000/1*CYB2dyR3EhFs1xNLK8ewiA.gif)

Finally, we add the bias term to obtain the feature map. There is a single bias for the full output channel map. This operation is repeated for all the filters inside the convolutional layer.
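
The relationship between filters, kernels and biases is visible in the parameter shapes of a `Conv2d` layer. The sketch below assumes 3 input channels, 32 filters and 3x3 kernels, values picked only for illustration.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

# Weight shape: (num_filters, kernels_per_filter, kernel_height, kernel_width)
print(conv.weight.shape)  # torch.Size([32, 3, 3, 3])
# One bias per filter, i.e. per output channel
print(conv.bias.shape)    # torch.Size([32])

num_params = sum(p.numel() for p in conv.parameters())
print(num_params)  # 896 = 32 * 3 * 3 * 3 + 32
```

Each of the 32 filters holds 3 kernels (one per input channel), and the parameter count does not depend on the spatial size of the input.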

Now, we show how to use a `Conv2d` layer that takes an input image with 3 channels and generates an output map with 32 channels.


In [None]:
# Define the PyTorch model
class StridedConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels=3,        # 3 input channels (RGB)
            out_channels=32,      # 32 filters
            kernel_size=3,
            stride=2,
            padding=1             # explicit padding; PyTorch rejects padding='same' when stride > 1
        )

    def forward(self, x):
        return self.conv(x)

# Instantiate and run the model
input = torch.randn((1, 3, 100, 100))  # (batch_size, num_channels, height, width)
model = StridedConvNet()
output = model(input)

# Compare input and output size
print('Input size: ({:}, {:}, {:})'.format(input.shape[1], input.shape[2], input.shape[3]))
print('Output size: ({:}, {:}, {:})'.format(output.shape[1], output.shape[2], output.shape[3]))

## Activation Functions

As seen in previous tutorials, a `Linear` layer is usually followed by an activation function. Here we show how to use them after `Conv2d` layers. Activation functions are operators that map feature values to a new set of values; the mapping depends on the function at hand. The main reason for using them is that they add non-linearities, giving the network more expressive power so that it can approximate more complex functions.

*  **Sigmoid Function** maps the output to the range (0, 1). It is widely used in binary classification problems since its output can be interpreted as a probability. `nn.Sigmoid()`:

>![](https://i.ibb.co/Ph8dsTv/sigmoid.png)

*  **Tanh Function** is an S-shaped function like the sigmoid, but its range is (-1, 1). Unlike the sigmoid, which maps inputs close to 0 to values around 0.5, tanh maps them to values around 0. `nn.Tanh()`:

>![](https://i.ibb.co/68g7LpL/tanh.png)

*  **ReLU Function** is the most common activation function in current CNNs, as it generally works better than the rest. Its range is \[0, inf). It sets all negative values to 0 and is therefore computationally cheap. As a drawback, some neurons can *die* during training, meaning that their output is 0 for all available data points and no gradient is propagated through them. `nn.ReLU()`:

>![](https://i.ibb.co/Zd9H8Z4/relu.png)

*  **LeakyReLU Function** is a modified version of ReLU that attempts to solve the dying-neuron problem. While ReLU has zero gradient for negative inputs, Leaky ReLU multiplies them by a small slope instead of setting them to 0. That allows gradients to backpropagate through the network even for negative inputs. `nn.LeakyReLU(negative_slope=0.3)`:

>![](https://i.ibb.co/dmnJ6h1/leakyrelu.png)

*  **Softmax Function** is another widely used activation function for multi-class classification problems and is usually employed as the last activation function in a classification model. It maps every output element to the range (0, 1). However, softmax does not map each input value to a probability independently: it takes an un-normalized vector, $s$, and normalizes it into a probability distribution, $p$. As the output is a probability distribution, the output elements add up to 1. `nn.Softmax(dim=-1)`. The output value $p_i$ is computed as:

> $p_{i} = \dfrac{e^{s_i}}{\sum_{j=1}^{N} e^{s_j}}$
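
The activation functions listed above can be compared side by side on a small vector. In this sketch the input values and the dictionary layout are assumptions made for illustration. Note that the `torch.nn` activations are modules: they are instantiated first and then called on a tensor.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, 0.0, 2.0])

# Instantiate each activation module once, then call it on the input
activations = {
    'sigmoid': nn.Sigmoid(),
    'tanh': nn.Tanh(),
    'relu': nn.ReLU(),
    'leaky_relu': nn.LeakyReLU(negative_slope=0.3),
    'softmax': nn.Softmax(dim=-1),
}

outputs = {name: fn(x) for name, fn in activations.items()}
for name, out in outputs.items():
    print(f'{name:>10}: {out}')
```

ReLU zeroes the negative entry, Leaky ReLU scales it by the slope, and the softmax outputs sum to 1 because the entries are normalized jointly.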

The following example shows the feature maps before and after the ReLU activation function. All negative values are set to 0 by the activation function.




In [None]:
# Define model
class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            stride=1,
            padding='same',
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        conv_out = self.conv(x)
        relu_out = self.relu(conv_out)
        return conv_out, relu_out

# Instantiate model
input = torch.randn((1, 1, 3, 3))
model = ConvNet()
output_conv, output_relu = model(input)

# Print results
print('Output Network without activation function')
print(output_conv[0, 0])  # print only the single channel

print('\nOutput Network after ReLU activation function')
print(output_relu[0, 0])

## Pooling Layer

It is common practice to insert pooling layers between convolutional layers in CNNs. In a standard CNN architecture, the feature maps become progressively smaller to reduce computation and to merge information from different spatial locations. To reduce the feature map size, we can either use a larger stride in the convolutional layers or use pooling layers. Pooling layers slide a window over the feature map and apply an operation that reduces its spatial size. The operation depends on the architecture; max, mean and min pooling are the most typical. Here we explain max pooling, although the others work similarly. Max pooling keeps only the maximum value in a neighbourhood, where the neighbourhood is defined by the kernel size. The next example shows the result of a max pooling layer with a 2x2 kernel and a stride of 2.

![](https://i.ibb.co/Xp454S4/MaxPool.png)

As in convolutional layers, the output size depends on the stride of the pooling layer. However, unlike convolutional layers, pooling layers operate independently on each input channel and do not change the depth of the feature maps. To use max pooling in our model, we use `nn.MaxPool2d` and define the stride and pooling size.
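
A tiny numeric example makes the behaviour of max pooling easy to verify by hand. The 4x4 input below is an assumption chosen so that every 2x2 neighbourhood has an obvious maximum.

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Toy 4x4 map holding the values 0..15, shape (batch_size, num_channels, height, width)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
pooled = pool(x)
print(x[0, 0])
print(pooled[0, 0])  # tensor([[ 5.,  7.], [13., 15.]])

# Pooling acts on each channel separately, so the number of channels is unchanged
multi_channel = torch.randn(1, 32, 100, 100)
print(pool(multi_channel).shape)  # torch.Size([1, 32, 50, 50])
```

Each output value is the largest entry of one non-overlapping 2x2 block, so the height and width are halved while the 32 channels are kept.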

In [None]:
# Define the model
# input: 100x100 image with 3 channels -> (3, 100, 100) tensor (channels first).
# the convolution applies 32 filters of size 3x3 each.
# padding='same' applies zero-padding so the spatial size is preserved
# stride=1 moves the kernel one pixel at a time
class ConvMaxPoolNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3,
            stride=1,
            padding='same',
        )
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(
            kernel_size=2,
            stride=2,
            padding=0
        )

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        return x

# Instantiate and run the model
input = torch.randn((1, 3, 100, 100))  # (batch_size, num_channels, height, width)
model = ConvMaxPoolNet()
output = model(input)

# Compare input and output size
print('Input size: ({:}, {:}, {:})'.format(input.shape[1], input.shape[2], input.shape[3]))
print('Output size: ({:}, {:}, {:})'.format(output.shape[1], output.shape[2], output.shape[3]))

# Example: Classification on MNIST

In this section, we show how to perform image classification when the input data is a 2D image instead of a flat 1D vector.

As discussed above, Convolutional Neural Networks aim to extract and exploit the local relationships in 2D maps. Hence, CNNs are much better suited to images than Multi-layer Perceptron models.

In [None]:
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32)

print(f'Image shape: {train_dataset[0][0].shape}')
print(f"Total number of training samples: {len(train_dataset)}")
print(f"Total number of test samples: {len(test_dataset)}")

Now we can define a model composed of convolutional layers, activation functions, and maxpool operators:

In [None]:
class CNN(nn.Module):
    def __init__(self, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels=input_channels, out_channels=16, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )

    def forward(self, x):
        return self.model(x)

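As explained in the previous tutorial, in a classification problem, the output of the model is a probability vector. Each dimension of the vector indicates how likely it is that the input image belongs to a specific class. In the code below, the model outputs raw scores (logits); `nn.CrossEntropyLoss` applies the softmax internally, so no explicit softmax layer is needed.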

Up to now, the output of the model is a feature map with shape *Batch x Channel x Height' x Width'*, and it needs to be mapped into a vector with shape *Batch x Num Classes*. A common technique is to add a Flatten layer that reshapes the feature map to *Batch x (Channel * Height' * Width')*. Following the Flatten layer, we add a `Linear` (dense) layer that maps the flattened features to the desired output size.
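
The number of features reaching the Flatten layer can be traced by hand. The sketch below follows a 28x28 MNIST image through the two convolution and pooling stages of the `CNN` class above; the helper names `same_conv` and `max_pool` are hypothetical and only mirror the size formulas of this tutorial.

```python
def same_conv(size):
    """A 3x3 convolution with stride 1 and padding='same' keeps the spatial size."""
    return size

def max_pool(size, kernel_size=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel_size) // stride + 1

size = 28                         # MNIST images are 28x28
size = max_pool(same_conv(size))  # first conv + pool block  -> 14
size = max_pool(same_conv(size))  # second conv + pool block -> 7

channels = 16                     # output channels of the last convolution
flattened_features = channels * size * size
print(size, flattened_features)   # 7 784
```

The resulting 784 features match the `16 * 7 * 7` value passed to the classification head in the training cell.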

In [None]:
class ClassificationHead(nn.Module):
    def __init__(self, input_channels):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(input_channels, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc(x)
        return x

Finally, we can train our CNN and check its performance on MNIST.

In [None]:
set_seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Sequential(
    CNN(input_channels=1),
    ClassificationHead(input_channels=16 * 7 * 7)
)
model = model.to(device)
print(torchinfo.summary(model, input_size=(1, 1, 28, 28)))

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)

# Training loop
for epoch in range(10):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    print(f"Epoch [{epoch + 1}/10] - Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc*100:.2f}%")

# Evaluation
model.eval()
correct = 0
total = 0
loss_total = 0

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss_total += loss.item() * inputs.size(0)

        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

avg_loss = loss_total / total
accuracy = correct / total

print()
print(f'Test loss: {avg_loss}, Test accuracy: {accuracy*100:.2f}%')

# Coursework


## Task 1: Classification

At this point, we know what a CNN is, how it works, and the components needed to design one. In this first task, we want you to create a CNN that outperforms the Multi-layer Perceptron model from Tutorial 1. For the first part of the coursework, we train on CIFAR10, a classical dataset for image classification. Note that in these tutorials, we mainly use the official test sets of several standard datasets as our validation data. We do so because it is an easy way to make sure that we all work with the same split and report results on the same data. However, in a proper machine learning setup, your validation set should be separate from the test set, so you can tune the model/parameters on the validation set and then check the final performance on the test set. Thus, even though the variables are named `X_test` and `y_test`, they represent our validation set.
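
For reference, a separate validation set can be carved out of the training data with `torch.utils.data.random_split`. The sketch below is not used in this coursework; it relies on a small synthetic dataset and an 80/20 split, both of which are assumptions made for illustration.

```python
import torch
from torch.utils import data

# Synthetic stand-in for a training set: 100 random "images" with random labels
features = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
full_train = data.TensorDataset(features, labels)

# A seeded generator makes the split reproducible
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = data.random_split(full_train, [80, 20], generator=generator)

print(len(train_subset), len(val_subset))  # 80 20
```

The two subsets are disjoint, so the validation subset can be used for tuning while the official test set stays untouched until the final evaluation.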

Let's first load the dataset and visualise some examples:

In [None]:
# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32)

# Visualize some examples
X_train = train_dataset.data       # NumPy array of shape (50000, 32, 32, 3)
y_train = np.array(train_dataset.targets)

X_test = test_dataset.data
y_test = np.array(test_dataset.targets)

print('Image shape: {0}'.format(X_train.shape[1:]))
print('Total number of training samples: {0}'.format(X_train.shape[0]))
print('Total number of validation samples: {0}'.format(X_test.shape[0]))

N = 5
start_val = 0  # pick an element for the code to plot the following N**2 values
fig, axes = plt.subplots(N, N, figsize=(8, 8))
class_names = train_dataset.classes  # List of class names: ['airplane', 'automobile', ..., 'truck']

for row in range(N):
    for col in range(N):
        idx = start_val + row + N * col
        axes[row, col].imshow(X_train[idx])
        label_idx = y_train[idx]
        axes[row, col].set_title(class_names[label_idx])
        axes[row, col].set_xticks([])
        axes[row, col].set_yticks([])

fig.subplots_adjust(hspace=0.6)
plt.show()

Now, we are ready to define the Multi-layer Perceptron model and train it.

In [None]:
set_seed(42)

# Define the model
class FullyConnectedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(3072, 32),
            nn.ReLU(),
            nn.Linear(32, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        return self.model(x)

# Instantiate model, loss function, and optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = FullyConnectedNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        inputs = inputs.view(inputs.size(0), -1) # Flatten the input for MLP
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    print(f"Epoch [{epoch + 1}/{epochs}] - Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc*100:.2f}%")

# Evaluation
model.eval()
correct = 0
total = 0
loss_total = 0.0
with torch.no_grad():
    for xb, yb in test_loader:
        xb, yb = xb.to(device), yb.to(device)
        xb = xb.view(xb.size(0), -1)
        preds = model(xb)
        loss_total += criterion(preds, yb).item() * xb.size(0)
        predicted = torch.argmax(preds, dim=1)
        correct += (predicted == yb).sum().item()
        total += yb.size(0)

print()
print(f"Validation loss: {loss_total/total:.4f}, Validation accuracy: {correct/total*100:.2f}%")

### Problem Definition

In this exercise, you are asked to test several CNN architectures using the code provided below. Do not modify the optimizer, the loss, or training parameters such as the learning rate; they will be investigated in future tutorials. Focus on the architecture itself: number of convolutional layers, number of filters in every layer, activation functions, pooling operators, among others. Batch Normalization and Dropout layers, which are widely used in CNN architectures, will also be investigated in a future tutorial, so you do not have to discuss them.
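
When comparing design choices, it can help to know how many trainable parameters each candidate architecture has. The helper below is one possible way to count them; the name `count_parameters` and the two toy models are hypothetical and only serve as an illustration.

```python
import torch
from torch import nn

def count_parameters(model):
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Two toy candidates that differ only in the number of filters
small = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding='same'), nn.ReLU())
wide = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding='same'), nn.ReLU())

print(count_parameters(small))  # 448  = 16 * 3 * 3 * 3 + 16
print(count_parameters(wide))   # 1792 = 64 * 3 * 3 * 3 + 64
```

`torchinfo.summary`, used elsewhere in this tutorial, reports the same kind of counts layer by layer.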


**Report**:
*   Present a bar figure with the training and validation accuracies for different design choices. Discuss only the parameters that have a significant influence on the network's performance. Explain any discrepancy between training and validation accuracies.
*   Present a sketch that introduces your best architecture. See some examples on how to display networks in [cv-tricks' blog](https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/).


In [None]:
torch.manual_seed(42)

# The data, split between train and test sets.
# Here we are using the official test set as our validation set; in further
# tutorials, test and validation splits will be explained properly.
# Hence, even though the variables are named `test_dataset` and `test_loader`, they represent our validation set
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# TODO: Define your architecture here
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Second convolutional block
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Third convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Flatten and classification head
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Model().to(device)
print(torchinfo.summary(model, input_size=(1, 3, 32, 32)))
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    # Validation evaluation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:  # avoid shadowing the built-in `input`
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, targets).item() * inputs.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == targets).sum().item()

    val_loss /= len(test_dataset)
    val_acc = 100. * val_correct / len(test_dataset)

    print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc*100:.2f}%, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

In [None]:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Model                                    [1, 10]                   --
├─Sequential: 1-1                        [1, 10]                   --
│    └─Conv2d: 2-1                       [1, 32, 32, 32]           896
│    └─ReLU: 2-2                         [1, 32, 32, 32]           --
│    └─Conv2d: 2-3                       [1, 32, 32, 32]           9,248
│    └─ReLU: 2-4                         [1, 32, 32, 32]           --
│    └─MaxPool2d: 2-5                    [1, 32, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 64, 16, 16]           18,496
│    └─ReLU: 2-7                         [1, 64, 16, 16]           --
│    └─Conv2d: 2-8                       [1, 64, 16, 16]           36,928
│    └─ReLU: 2-9                         [1, 64, 16, 16]           --
│    └─MaxPool2d: 2-10                   [1, 64, 8, 8]             --
│    └─Conv2d: 2-11                      [1, 128, 8, 8]            73,856
│    └─ReLU: 2-12                        [1, 128, 8, 8]            --
│    └─Conv2d: 2-13                      [1, 128, 8, 8]            147,584
│    └─ReLU: 2-14                        [1, 128, 8, 8]            --
│    └─MaxPool2d: 2-15                   [1, 128, 4, 4]            --
│    └─Flatten: 2-16                     [1, 2048]                 --
│    └─Linear: 2-17                      [1, 256]                  524,544
│    └─ReLU: 2-18                        [1, 256]                  --
│    └─Linear: 2-19                      [1, 10]                   2,570
==========================================================================================
Total params: 814,122
Trainable params: 814,122
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 39.28
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 0.92
Params size (MB): 3.26
Estimated Total Size (MB): 4.19
==========================================================================================
Epoch [1/20] - Train Loss: 1.7818, Train Acc: 34.20%, Val Loss: 1.5483, Val Acc: 43.56%
Epoch [2/20] - Train Loss: 1.4937, Train Acc: 45.44%, Val Loss: 1.4951, Val Acc: 44.96%
Epoch [3/20] - Train Loss: 1.3503, Train Acc: 50.94%, Val Loss: 1.2956, Val Acc: 53.00%
Epoch [4/20] - Train Loss: 1.2355, Train Acc: 55.58%, Val Loss: 1.1836, Val Acc: 57.16%
Epoch [5/20] - Train Loss: 1.1257, Train Acc: 59.97%, Val Loss: 1.1547, Val Acc: 58.55%
Epoch [6/20] - Train Loss: 1.0355, Train Acc: 63.25%, Val Loss: 1.0436, Val Acc: 62.33%
Epoch [7/20] - Train Loss: 0.9548, Train Acc: 66.46%, Val Loss: 0.9651, Val Acc: 65.71%
Epoch [8/20] - Train Loss: 0.8818, Train Acc: 68.92%, Val Loss: 0.9674, Val Acc: 66.40%
Epoch [9/20] - Train Loss: 0.8169, Train Acc: 71.36%, Val Loss: 1.0780, Val Acc: 63.07%
Epoch [10/20] - Train Loss: 0.7572, Train Acc: 73.35%, Val Loss: 0.9372, Val Acc: 67.25%
Epoch [11/20] - Train Loss: 0.7011, Train Acc: 75.47%, Val Loss: 0.8301, Val Acc: 71.07%
Epoch [12/20] - Train Loss: 0.6468, Train Acc: 77.39%, Val Loss: 0.8939, Val Acc: 70.20%
Epoch [13/20] - Train Loss: 0.5945, Train Acc: 79.55%, Val Loss: 0.8314, Val Acc: 71.49%
Epoch [14/20] - Train Loss: 0.5429, Train Acc: 81.16%, Val Loss: 0.8385, Val Acc: 72.17%
Epoch [15/20] - Train Loss: 0.4929, Train Acc: 82.77%, Val Loss: 0.9207, Val Acc: 70.50%
Epoch [16/20] - Train Loss: 0.4409, Train Acc: 84.65%, Val Loss: 0.8831, Val Acc: 72.02%
Epoch [17/20] - Train Loss: 0.3908, Train Acc: 86.42%, Val Loss: 0.9211, Val Acc: 71.78%
Epoch [18/20] - Train Loss: 0.3404, Train Acc: 88.24%, Val Loss: 0.9912, Val Acc: 71.78%
Epoch [19/20] - Train Loss: 0.2945, Train Acc: 89.71%, Val Loss: 1.0686, Val Acc: 71.75%
Epoch [20/20] - Train Loss: 0.2499, Train Acc: 91.43%, Val Loss: 1.1387, Val Acc: 71.34%
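
The `Param #` column in the summary above can be reproduced by hand, which is a useful sanity check when designing an architecture. A convolution with a k x k kernel has `k*k*in*out` weights plus `out` biases, and a linear layer has `in*out + out`. The sketch below applies these formulas to the layers of `Model`; the helper names `conv2d_params` and `linear_params` are illustrative and not part of PyTorch.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (k*k*in*out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def linear_params(in_features, out_features):
    """Weights (in*out) plus one bias per output feature."""
    return in_features * out_features + out_features


# (in_channels, out_channels) of the six 3x3 convolutions in `Model`
conv_channels = [(3, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
conv_counts = [conv2d_params(c_in, c_out, 3) for c_in, c_out in conv_channels]

# Classification head: 128 * 4 * 4 -> 256 -> 10
linear_counts = [linear_params(128 * 4 * 4, 256), linear_params(256, 10)]

total_params = sum(conv_counts) + sum(linear_counts)
print(conv_counts)    # [896, 9248, 18496, 36928, 73856, 147584]
print(linear_counts)  # [524544, 2570]
print(total_params)   # 814122
```

The total agrees with the 814,122 parameters reported by `torchinfo`. Note that the first linear layer alone holds about 64% of them.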


In [None]:
# Architecture 2: Deeper network with more convolutional layers and filters
torch.manual_seed(42)

# the data, shuffled and split between train and test sets
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# Architecture 2: Deeper network with 4 convolutional blocks and more filters
class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Second convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Third convolutional block
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Fourth convolutional block
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 4x4 -> 2x2

            # Flatten and classification head
            nn.Flatten(),
            nn.Linear(512 * 2 * 2, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Model2().to(device)
print(torchinfo.summary(model, input_size=(1, 3, 32, 32)))
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    # Validation evaluation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:  # avoid shadowing the built-in `input`
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, targets).item() * inputs.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == targets).sum().item()

    val_loss /= len(test_dataset)
    val_acc = 100. * val_correct / len(test_dataset)

    print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc*100:.2f}%, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

In [None]:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Model2                                   [1, 10]                   --
├─Sequential: 1-1                        [1, 10]                   --
│    └─Conv2d: 2-1                       [1, 64, 32, 32]           1,792
│    └─ReLU: 2-2                         [1, 64, 32, 32]           --
│    └─Conv2d: 2-3                       [1, 64, 32, 32]           36,928
│    └─ReLU: 2-4                         [1, 64, 32, 32]           --
│    └─MaxPool2d: 2-5                    [1, 64, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 128, 16, 16]          73,856
│    └─ReLU: 2-7                         [1, 128, 16, 16]          --
│    └─Conv2d: 2-8                       [1, 128, 16, 16]          147,584
│    └─ReLU: 2-9                         [1, 128, 16, 16]          --
│    └─MaxPool2d: 2-10                   [1, 128, 8, 8]            --
│    └─Conv2d: 2-11                      [1, 256, 8, 8]            295,168
│    └─ReLU: 2-12                        [1, 256, 8, 8]            --
│    └─Conv2d: 2-13                      [1, 256, 8, 8]            590,080
│    └─ReLU: 2-14                        [1, 256, 8, 8]            --
│    └─MaxPool2d: 2-15                   [1, 256, 4, 4]            --
│    └─Conv2d: 2-16                      [1, 512, 4, 4]            1,180,160
│    └─ReLU: 2-17                        [1, 512, 4, 4]            --
│    └─MaxPool2d: 2-18                   [1, 512, 2, 2]            --
│    └─Flatten: 2-19                     [1, 2048]                 --
│    └─Linear: 2-20                      [1, 512]                  1,049,088
│    └─ReLU: 2-21                        [1, 512]                  --
│    └─Linear: 2-22                      [1, 10]                   5,130
==========================================================================================
Total params: 3,379,786
Trainable params: 3,379,786
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 172.93
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 1.90
Params size (MB): 13.52
Estimated Total Size (MB): 15.44
==========================================================================================
Epoch [1/20] - Train Loss: 1.6987, Train Acc: 36.26%, Val Loss: 1.4936, Val Acc: 43.73%
Epoch [2/20] - Train Loss: 1.3339, Train Acc: 51.29%, Val Loss: 1.4664, Val Acc: 48.46%
Epoch [3/20] - Train Loss: 1.1124, Train Acc: 59.82%, Val Loss: 1.0355, Val Acc: 63.52%
Epoch [4/20] - Train Loss: 0.9586, Train Acc: 65.86%, Val Loss: 1.1197, Val Acc: 61.98%
Epoch [5/20] - Train Loss: 0.8366, Train Acc: 70.28%, Val Loss: 0.8436, Val Acc: 70.31%
Epoch [6/20] - Train Loss: 0.7325, Train Acc: 74.08%, Val Loss: 0.8054, Val Acc: 72.47%
Epoch [7/20] - Train Loss: 0.6356, Train Acc: 77.69%, Val Loss: 0.7266, Val Acc: 75.11%
Epoch [8/20] - Train Loss: 0.5474, Train Acc: 80.71%, Val Loss: 0.7370, Val Acc: 75.36%
Epoch [9/20] - Train Loss: 0.4618, Train Acc: 83.74%, Val Loss: 0.7509, Val Acc: 75.17%
Epoch [10/20] - Train Loss: 0.3789, Train Acc: 86.80%, Val Loss: 0.7347, Val Acc: 76.48%
Epoch [11/20] - Train Loss: 0.3016, Train Acc: 89.35%, Val Loss: 0.7891, Val Acc: 76.67%
Epoch [12/20] - Train Loss: 0.2289, Train Acc: 92.06%, Val Loss: 0.8874, Val Acc: 75.52%
Epoch [13/20] - Train Loss: 0.1703, Train Acc: 94.10%, Val Loss: 0.9265, Val Acc: 76.98%
Epoch [14/20] - Train Loss: 0.1239, Train Acc: 95.66%, Val Loss: 1.0212, Val Acc: 77.14%
Epoch [15/20] - Train Loss: 0.0961, Train Acc: 96.55%, Val Loss: 1.0396, Val Acc: 77.00%
Epoch [16/20] - Train Loss: 0.0771, Train Acc: 97.30%, Val Loss: 1.1465, Val Acc: 77.29%
Epoch [17/20] - Train Loss: 0.0680, Train Acc: 97.65%, Val Loss: 1.2966, Val Acc: 76.88%
Epoch [18/20] - Train Loss: 0.0589, Train Acc: 97.88%, Val Loss: 1.2857, Val Acc: 77.00%
Epoch [19/20] - Train Loss: 0.0539, Train Acc: 98.11%, Val Loss: 1.3123, Val Acc: 77.91%
Epoch [20/20] - Train Loss: 0.0513, Train Acc: 98.26%, Val Loss: 1.3358, Val Acc: 77.18%
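
In the log above the validation loss reaches its minimum at epoch 7 and then rises while the training loss keeps falling, which is the usual signature of overfitting. One common remedy is early stopping. The following is a minimal sketch that replays the logged validation losses through a patience rule; the function `early_stopping_epoch` and the value `patience=3` are illustrative choices, not part of the notebook's training loop.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss; if that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses copied from the Model2 log above (epochs 1-20)
model2_val_losses = [
    1.4936, 1.4664, 1.0355, 1.1197, 0.8436, 0.8054, 0.7266, 0.7370, 0.7509, 0.7347,
    0.7891, 0.8874, 0.9265, 1.0212, 1.0396, 1.1465, 1.2966, 1.2857, 1.3123, 1.3358,
]

best_epoch, stop_epoch = early_stopping_epoch(model2_val_losses, patience=3)
print(best_epoch, stop_epoch)  # 7 10
```

In a real training loop one would typically also keep a copy of `model.state_dict()` from the best epoch. Note that in this run the validation accuracy keeps creeping up after epoch 7 (75.11% there, 77.91% at epoch 19), so stopping on loss and stopping on accuracy would pick different checkpoints.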


In [None]:
# Architecture 3: Different activation function (LeakyReLU) and AveragePooling instead of MaxPooling
torch.manual_seed(42)

# the data, shuffled and split between train and test sets
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# Architecture 3: LeakyReLU activation and AveragePooling
class Model3(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Second convolutional block
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Third convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.LeakyReLU(negative_slope=0.1),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Flatten and classification head
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Model3().to(device)
print(torchinfo.summary(model, input_size=(1, 3, 32, 32)))
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    # Validation evaluation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:  # avoid shadowing the built-in `input`
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, targets).item() * inputs.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == targets).sum().item()

    val_loss /= len(test_dataset)
    val_acc = 100. * val_correct / len(test_dataset)

    print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc*100:.2f}%, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

In [None]:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Model3                                   [1, 10]                   --
├─Sequential: 1-1                        [1, 10]                   --
│    └─Conv2d: 2-1                       [1, 32, 32, 32]           896
│    └─LeakyReLU: 2-2                    [1, 32, 32, 32]           --
│    └─Conv2d: 2-3                       [1, 32, 32, 32]           9,248
│    └─LeakyReLU: 2-4                    [1, 32, 32, 32]           --
│    └─AvgPool2d: 2-5                    [1, 32, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 64, 16, 16]           18,496
│    └─LeakyReLU: 2-7                    [1, 64, 16, 16]           --
│    └─Conv2d: 2-8                       [1, 64, 16, 16]           36,928
│    └─LeakyReLU: 2-9                    [1, 64, 16, 16]           --
│    └─AvgPool2d: 2-10                   [1, 64, 8, 8]             --
│    └─Conv2d: 2-11                      [1, 128, 8, 8]            73,856
│    └─LeakyReLU: 2-12                   [1, 128, 8, 8]            --
│    └─Conv2d: 2-13                      [1, 128, 8, 8]            147,584
│    └─LeakyReLU: 2-14                   [1, 128, 8, 8]            --
│    └─AvgPool2d: 2-15                   [1, 128, 4, 4]            --
│    └─Flatten: 2-16                     [1, 2048]                 --
│    └─Linear: 2-17                      [1, 256]                  524,544
│    └─LeakyReLU: 2-18                   [1, 256]                  --
│    └─Linear: 2-19                      [1, 10]                   2,570
==========================================================================================
Total params: 814,122
Trainable params: 814,122
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 39.28
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 0.92
Params size (MB): 3.26
Estimated Total Size (MB): 4.19
==========================================================================================
Epoch [1/20] - Train Loss: 1.7948, Train Acc: 34.37%, Val Loss: 1.6096, Val Acc: 42.57%
Epoch [2/20] - Train Loss: 1.5360, Train Acc: 44.17%, Val Loss: 1.4925, Val Acc: 45.68%
Epoch [3/20] - Train Loss: 1.4067, Train Acc: 49.16%, Val Loss: 1.3493, Val Acc: 51.60%
Epoch [4/20] - Train Loss: 1.3072, Train Acc: 53.24%, Val Loss: 1.2580, Val Acc: 54.53%
Epoch [5/20] - Train Loss: 1.2202, Train Acc: 56.25%, Val Loss: 1.2188, Val Acc: 56.11%
Epoch [6/20] - Train Loss: 1.1493, Train Acc: 58.74%, Val Loss: 1.1683, Val Acc: 58.05%
Epoch [7/20] - Train Loss: 1.0793, Train Acc: 61.86%, Val Loss: 1.1056, Val Acc: 60.22%
Epoch [8/20] - Train Loss: 1.0176, Train Acc: 63.90%, Val Loss: 1.0676, Val Acc: 62.11%
Epoch [9/20] - Train Loss: 0.9606, Train Acc: 66.21%, Val Loss: 1.1422, Val Acc: 59.92%
Epoch [10/20] - Train Loss: 0.9088, Train Acc: 68.07%, Val Loss: 1.0155, Val Acc: 64.27%
Epoch [11/20] - Train Loss: 0.8570, Train Acc: 69.76%, Val Loss: 0.9714, Val Acc: 65.43%
Epoch [12/20] - Train Loss: 0.8103, Train Acc: 71.78%, Val Loss: 0.9732, Val Acc: 65.98%
Epoch [13/20] - Train Loss: 0.7659, Train Acc: 73.17%, Val Loss: 0.9540, Val Acc: 66.96%
Epoch [14/20] - Train Loss: 0.7217, Train Acc: 74.78%, Val Loss: 0.9347, Val Acc: 68.16%
Epoch [15/20] - Train Loss: 0.6816, Train Acc: 75.96%, Val Loss: 0.9464, Val Acc: 67.96%
Epoch [16/20] - Train Loss: 0.6383, Train Acc: 77.48%, Val Loss: 0.9385, Val Acc: 68.15%
Epoch [17/20] - Train Loss: 0.6005, Train Acc: 79.21%, Val Loss: 0.9210, Val Acc: 68.98%
Epoch [18/20] - Train Loss: 0.5599, Train Acc: 80.51%, Val Loss: 0.9586, Val Acc: 69.02%
Epoch [19/20] - Train Loss: 0.5215, Train Acc: 81.79%, Val Loss: 0.9386, Val Acc: 69.42%
Epoch [20/20] - Train Loss: 0.4849, Train Acc: 83.08%, Val Loss: 1.0303, Val Acc: 67.75%
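
The shape comments in the model definitions (`32x32 -> 16x16` and so on) follow from the standard output-size formula for convolution and pooling layers, `floor((n + 2*padding - kernel_size) / stride) + 1`. As a quick check, the sketch below traces the spatial size through three blocks for two downsampling styles: 2x2 pooling with stride 2, and a 3x3 convolution with stride 2 and padding 1. The helper `out_size` is a hypothetical name introduced for illustration, and it assumes dilation 1.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1


size_pool = size_strided = 32  # CIFAR-10 images are 32x32
pool_trace, strided_trace = [size_pool], [size_strided]
for _ in range(3):
    # A 3x3 conv with padding 1 (what padding='same' amounts to at stride 1) keeps the size
    assert out_size(size_pool, kernel_size=3, padding=1) == size_pool
    # 2x2 pooling with stride 2 (MaxPool2d or AvgPool2d)
    size_pool = out_size(size_pool, kernel_size=2, stride=2)
    pool_trace.append(size_pool)
    # 3x3 convolution with stride 2 and padding 1
    size_strided = out_size(size_strided, kernel_size=3, stride=2, padding=1)
    strided_trace.append(size_strided)

print(pool_trace)     # [32, 16, 8, 4]
print(strided_trace)  # [32, 16, 8, 4]
```

Both styles end at 4x4, which is why the first linear layer takes `128 * 4 * 4 = 2048` input features.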


In [None]:
# Architecture 4: Wider network with more filters per layer, larger kernel sizes, fewer layers
torch.manual_seed(42)

# the data, shuffled and split between train and test sets
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# Architecture 4: Wider network with larger kernels and more filters
class Model4(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block - wider with larger kernel
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Second convolutional block - wider with larger kernel
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Third convolutional block - wider
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Flatten and classification head - wider fully connected layers
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Model4().to(device)
print(torchinfo.summary(model, input_size=(1, 3, 32, 32)))
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    # Validation evaluation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:  # avoid shadowing the built-in `input`
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, targets).item() * inputs.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == targets).sum().item()

    val_loss /= len(test_dataset)
    val_acc = 100. * val_correct / len(test_dataset)

    print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc*100:.2f}%, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

In [None]:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Model4                                   [1, 10]                   --
├─Sequential: 1-1                        [1, 10]                   --
│    └─Conv2d: 2-1                       [1, 64, 32, 32]           4,864
│    └─ReLU: 2-2                         [1, 64, 32, 32]           --
│    └─Conv2d: 2-3                       [1, 64, 32, 32]           102,464
│    └─ReLU: 2-4                         [1, 64, 32, 32]           --
│    └─MaxPool2d: 2-5                    [1, 64, 16, 16]           --
│    └─Conv2d: 2-6                       [1, 128, 16, 16]          204,928
│    └─ReLU: 2-7                         [1, 128, 16, 16]          --
│    └─Conv2d: 2-8                       [1, 128, 16, 16]          409,728
│    └─ReLU: 2-9                         [1, 128, 16, 16]          --
│    └─MaxPool2d: 2-10                   [1, 128, 8, 8]            --
│    └─Conv2d: 2-11                      [1, 256, 8, 8]            295,168
│    └─ReLU: 2-12                        [1, 256, 8, 8]            --
│    └─MaxPool2d: 2-13                   [1, 256, 4, 4]            --
│    └─Flatten: 2-14                     [1, 4096]                 --
│    └─Linear: 2-15                      [1, 512]                  2,097,664
│    └─ReLU: 2-16                        [1, 512]                  --
│    └─Linear: 2-17                      [1, 256]                  131,328
│    └─ReLU: 2-18                        [1, 256]                  --
│    └─Linear: 2-19                      [1, 10]                   2,570
==========================================================================================
Total params: 3,248,714
Trainable params: 3,248,714
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 288.38
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 1.71
Params size (MB): 12.99
Estimated Total Size (MB): 14.72
==========================================================================================
Epoch [1/20] - Train Loss: 1.6553, Train Acc: 38.97%, Val Loss: 1.4906, Val Acc: 46.53%
Epoch [2/20] - Train Loss: 1.2792, Train Acc: 53.56%, Val Loss: 1.1883, Val Acc: 57.31%
Epoch [3/20] - Train Loss: 1.0746, Train Acc: 61.47%, Val Loss: 1.0567, Val Acc: 62.10%
Epoch [4/20] - Train Loss: 0.9220, Train Acc: 67.47%, Val Loss: 0.9681, Val Acc: 65.52%
Epoch [5/20] - Train Loss: 0.7958, Train Acc: 71.95%, Val Loss: 0.7979, Val Acc: 72.08%
Epoch [6/20] - Train Loss: 0.6867, Train Acc: 75.95%, Val Loss: 0.7751, Val Acc: 72.70%
Epoch [7/20] - Train Loss: 0.5906, Train Acc: 79.26%, Val Loss: 0.7634, Val Acc: 73.79%
Epoch [8/20] - Train Loss: 0.5006, Train Acc: 82.32%, Val Loss: 0.7546, Val Acc: 74.42%
Epoch [9/20] - Train Loss: 0.4128, Train Acc: 85.42%, Val Loss: 0.7170, Val Acc: 76.33%
Epoch [10/20] - Train Loss: 0.3344, Train Acc: 88.34%, Val Loss: 0.8252, Val Acc: 75.87%
Epoch [11/20] - Train Loss: 0.2577, Train Acc: 91.05%, Val Loss: 0.7800, Val Acc: 76.11%
Epoch [12/20] - Train Loss: 0.1899, Train Acc: 93.37%, Val Loss: 0.8879, Val Acc: 75.89%
Epoch [13/20] - Train Loss: 0.1425, Train Acc: 95.09%, Val Loss: 0.9587, Val Acc: 77.43%
Epoch [14/20] - Train Loss: 0.1066, Train Acc: 96.31%, Val Loss: 1.0793, Val Acc: 77.16%
Epoch [15/20] - Train Loss: 0.0816, Train Acc: 97.18%, Val Loss: 1.1263, Val Acc: 77.70%
Epoch [16/20] - Train Loss: 0.0698, Train Acc: 97.53%, Val Loss: 1.2658, Val Acc: 76.84%
Epoch [17/20] - Train Loss: 0.0590, Train Acc: 98.04%, Val Loss: 1.2653, Val Acc: 77.44%
Epoch [18/20] - Train Loss: 0.0567, Train Acc: 98.06%, Val Loss: 1.3022, Val Acc: 77.84%
Epoch [19/20] - Train Loss: 0.0512, Train Acc: 98.23%, Val Loss: 1.3348, Val Acc: 77.16%
Epoch [20/20] - Train Loss: 0.0447, Train Acc: 98.45%, Val Loss: 1.3622, Val Acc: 77.25%
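
`Model4` swaps 3x3 kernels for 5x5 ones in its first two blocks. A well-known trade-off is that two stacked 3x3 convolutions (stride 1) cover the same 5x5 receptive field as a single 5x5 convolution while using fewer parameters. The sketch below checks both claims for a 64-to-64-channel layer; the helper names are illustrative and not part of PyTorch.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights (k*k*in*out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions: 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)


single_5x5 = conv2d_params(64, 64, 5)
stacked_3x3 = 2 * conv2d_params(64, 64, 3)

print(receptive_field([5]), single_5x5)      # 5 102464
print(receptive_field([3, 3]), stacked_3x3)  # 5 73856
```

The 102,464 figure matches the second `Conv2d` row in the summary above. The stacked variant also applies a second non-linearity between the two layers. Which choice generalises better is an empirical question; in the logs here, `Model2` and `Model4` finish at similar validation accuracy (about 77%).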


In [None]:
# Architecture 5: Strided convolutions instead of pooling, Tanh activation, different layer structure
torch.manual_seed(42)

# the data, shuffled and split between train and test sets
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

# Create DataLoaders
train_loader = data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# Architecture 5: Strided convolutions instead of pooling, Tanh activation
class Model5(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block - using stride=2 instead of pooling
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.Tanh(),

            # Second convolutional block
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.Tanh(),

            # Third convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=2, padding=1),  # 8x8 -> 4x4
            nn.Tanh(),

            # Flatten and classification head
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.Tanh(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Model5().to(device)
print(torchinfo.summary(model, input_size=(1, 3, 32, 32)))
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
epochs = 20

# Training loop
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == targets).sum().item()
        total += inputs.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    # Validation evaluation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    with torch.no_grad():
        for inputs, targets in test_loader:  # avoid shadowing the built-in input()
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, targets).item() * inputs.size(0)
            preds = outputs.argmax(dim=1)
            val_correct += (preds == targets).sum().item()

    val_loss /= len(test_dataset)
    val_acc = 100. * val_correct / len(test_dataset)

    print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc*100:.2f}%, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")

In [None]:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Model5                                   [1, 10]                   --
├─Sequential: 1-1                        [1, 10]                   --
│    └─Conv2d: 2-1                       [1, 32, 32, 32]           896
│    └─Tanh: 2-2                         [1, 32, 32, 32]           --
│    └─Conv2d: 2-3                       [1, 32, 32, 32]           9,248
│    └─Tanh: 2-4                         [1, 32, 32, 32]           --
│    └─Conv2d: 2-5                       [1, 32, 16, 16]           9,248
│    └─Tanh: 2-6                         [1, 32, 16, 16]           --
│    └─Conv2d: 2-7                       [1, 64, 16, 16]           18,496
│    └─Tanh: 2-8                         [1, 64, 16, 16]           --
│    └─Conv2d: 2-9                       [1, 64, 16, 16]           36,928
│    └─Tanh: 2-10                        [1, 64, 16, 16]           --
│    └─Conv2d: 2-11                      [1, 64, 8, 8]             36,928
│    └─Tanh: 2-12                        [1, 64, 8, 8]             --
│    └─Conv2d: 2-13                      [1, 128, 8, 8]            73,856
│    └─Tanh: 2-14                        [1, 128, 8, 8]            --
│    └─Conv2d: 2-15                      [1, 128, 8, 8]            147,584
│    └─Tanh: 2-16                        [1, 128, 8, 8]            --
│    └─Conv2d: 2-17                      [1, 128, 4, 4]            147,584
│    └─Tanh: 2-18                        [1, 128, 4, 4]            --
│    └─Flatten: 2-19                     [1, 2048]                 --
│    └─Linear: 2-20                      [1, 256]                  524,544
│    └─Tanh: 2-21                        [1, 256]                  --
│    └─Linear: 2-22                      [1, 10]                   2,570
==========================================================================================
Total params: 1,007,882
Trainable params: 1,007,882
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 46.37
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 1.03
Params size (MB): 4.03
Estimated Total Size (MB): 5.08
==========================================================================================
Epoch [1/20] - Train Loss: 1.7996, Train Acc: 35.74%, Val Loss: 1.6475, Val Acc: 41.23%
Epoch [2/20] - Train Loss: 1.5387, Train Acc: 45.02%, Val Loss: 1.5375, Val Acc: 44.00%
Epoch [3/20] - Train Loss: 1.3790, Train Acc: 50.60%, Val Loss: 1.3366, Val Acc: 51.98%
Epoch [4/20] - Train Loss: 1.2703, Train Acc: 54.56%, Val Loss: 1.2650, Val Acc: 55.43%
Epoch [5/20] - Train Loss: 1.1814, Train Acc: 58.12%, Val Loss: 1.2141, Val Acc: 56.55%
Epoch [6/20] - Train Loss: 1.1052, Train Acc: 61.25%, Val Loss: 1.1847, Val Acc: 58.13%
Epoch [7/20] - Train Loss: 1.0359, Train Acc: 64.01%, Val Loss: 1.1713, Val Acc: 58.75%
Epoch [8/20] - Train Loss: 0.9663, Train Acc: 66.30%, Val Loss: 1.1310, Val Acc: 60.22%
Epoch [9/20] - Train Loss: 0.9026, Train Acc: 68.88%, Val Loss: 1.1064, Val Acc: 61.37%
Epoch [10/20] - Train Loss: 0.8383, Train Acc: 71.22%, Val Loss: 1.1054, Val Acc: 61.17%
Epoch [11/20] - Train Loss: 0.7754, Train Acc: 73.59%, Val Loss: 1.1049, Val Acc: 61.58%
Epoch [12/20] - Train Loss: 0.7140, Train Acc: 76.08%, Val Loss: 1.0903, Val Acc: 62.10%
Epoch [13/20] - Train Loss: 0.6530, Train Acc: 78.26%, Val Loss: 1.0907, Val Acc: 62.72%
Epoch [14/20] - Train Loss: 0.5925, Train Acc: 80.63%, Val Loss: 1.1203, Val Acc: 62.27%
Epoch [15/20] - Train Loss: 0.5339, Train Acc: 82.91%, Val Loss: 1.1021, Val Acc: 62.85%
Epoch [16/20] - Train Loss: 0.4734, Train Acc: 85.28%, Val Loss: 1.1541, Val Acc: 62.33%
Epoch [17/20] - Train Loss: 0.4177, Train Acc: 87.46%, Val Loss: 1.1619, Val Acc: 62.94%
Epoch [18/20] - Train Loss: 0.3642, Train Acc: 89.55%, Val Loss: 1.2011, Val Acc: 62.10%
Epoch [19/20] - Train Loss: 0.3124, Train Acc: 91.48%, Val Loss: 1.2147, Val Acc: 62.65%
Epoch [20/20] - Train Loss: 0.2647, Train Acc: 93.17%, Val Loss: 1.2609, Val Acc: 62.42%



---
---

## Task 2: Regression

We now turn from classification to regression. The loss function, the output activation, and the dataset all change: instead of producing a vector with the probability of each class, the network outputs a single scalar per image.
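
To make the difference concrete, the sketch below contrasts a classification head with a regression head on the same batch of features. The feature size (16), the target values, and the choice of `nn.L1Loss` are assumptions made for illustration only; the notebook itself trains the regression models with a custom percentage error.

```python
import torch
from torch import nn

torch.manual_seed(0)
features = torch.randn(4, 16)  # 4 samples with 16 hypothetical features each

# Classification head: one logit per class, trained with cross-entropy.
cls_head = nn.Linear(16, 10)
cls_logits = cls_head(features)                        # shape (4, 10)
cls_targets = torch.tensor([3, 1, 0, 7])               # integer class indices
cls_loss = nn.CrossEntropyLoss()(cls_logits, cls_targets)

# Regression head: one unbounded scalar per sample, trained with a distance loss.
reg_head = nn.Linear(16, 1)
reg_preds = reg_head(features).squeeze(1)              # shape (4,)
reg_targets = torch.tensor([0.12, 0.40, 0.25, 0.90])   # hypothetical normalised prices
reg_loss = nn.L1Loss()(reg_preds, reg_targets)

print(cls_logits.shape, reg_preds.shape)
```

Note that the regression head has no softmax or other squashing activation after the last linear layer, so its output can take any real value.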

For this second task, we estimate house prices from images. To get the data, run the following cell, which clones Ahmed and Moustafa’s [repository](https://github.com/emanhamed/Houses-dataset) into the Colab workspace.

In [None]:
!git clone https://github.com/emanhamed/Houses-dataset
%cd /content/Houses-dataset/Houses\ Dataset

This dataset contains four images of each house (kitchen, frontal, bedroom and bathroom) and a set of attributes (number of bedrooms, number of bathrooms, zip code...). For our exercise, we only use the images. We start with the frontal images.

In [None]:
house_section = 'frontal' # select between: kitchen, frontal, bedroom or bathroom
print('We have selected {} images. You can switch to the kitchen, frontal, bedroom or bathroom images by changing house_section variable.'.format(house_section))
images = []
for i_im in range(1, 536):
  image = cv2.imread(str(i_im)+'_'+house_section+'.jpg')
  image = cv2.resize(image, (64, 64))
  images.append(image)

labels = []
with open('HousesInfo.txt', "r") as f:  # the file is closed automatically after reading
  for x in f:
    label = (x).split(' ')[-1].split('\n')[0]  # the price is the last field on each line
    labels.append(label)

# Let's visualize some examples
N=3
start_val = 0 # pick an element for the code to plot the following N**2 values
fig, axes = plt.subplots(N,N)
fig.subplots_adjust(hspace=0.5)  # leave room for the price titles
for row in range(N):
  for col in range(N):
    idx = start_val+row+N*col
    tmp = cv2.cvtColor(images[idx],cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
    axes[row,col].imshow(tmp)
    target = int(labels[idx])
    axes[row,col].set_title(str(target) + '$')
    axes[row,col].set_xticks([])
    axes[row,col].set_yticks([])
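
The label-loading loop above takes the last space-separated field of each line in `HousesInfo.txt` as the price. A minimal sketch of that parsing step on a single line is shown here; the sample line and all of its field values are invented for illustration, and `parse_price` is a helper name introduced only for this example.

```python
def parse_price(line: str) -> float:
    """Return the last whitespace-separated field of an attributes line as a float."""
    return float(line.split()[-1])


# Hypothetical attributes line; the values are made up for illustration only.
sample_line = "3 2 1500 90210 350000\n"
price = parse_price(sample_line)
print(price)
```

Using `split()` without arguments also discards the trailing newline, so no separate newline handling is needed.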

Prepare the dataset for training the model:

In [None]:
# Set seed for reproducibility
set_seed(42)

# Convert to NumPy arrays and normalize
images = np.asarray(images).astype(np.float32) / 255.0  # Normalize pixel values
labels = np.asarray(labels).astype(np.float32)

# Normalize labels
max_price = labels.max()
labels /= max_price

# Shuffle
indices = np.random.permutation(len(images))
images = images[indices]
labels = labels[indices]

# Split into train and validation
split_idx = int(0.8 * len(images))
X_train_np, X_val_np = images[:split_idx], images[split_idx:]
Y_train_np, Y_val_np = labels[:split_idx], labels[split_idx:]

# Convert to PyTorch tensors
# If images are in (N, H, W, C) format (NHWC), convert to (N, C, H, W)
X_train = torch.from_numpy(X_train_np).permute(0, 3, 1, 2)  # NHWC → NCHW
X_val = torch.from_numpy(X_val_np).permute(0, 3, 1, 2)
Y_train = torch.from_numpy(Y_train_np)
Y_val = torch.from_numpy(Y_val_np)

# Create TensorDataset and DataLoader
batch_size = 32
train_loader = data.DataLoader(data.TensorDataset(X_train, Y_train), batch_size=batch_size, shuffle=True)
val_loader = data.DataLoader(data.TensorDataset(X_val, Y_val), batch_size=batch_size, shuffle=False)

# Check shape
print('X_train shape:', X_train.shape)
print('Y_train shape:', Y_train.shape)
print('X_val shape:', X_val.shape)
print('Y_val shape:', Y_val.shape)
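
Because the labels are divided by `max_price`, the network predicts values on a normalised scale. The sketch below, which uses made-up prices and made-up predictions, shows how such outputs could be mapped back to dollars by multiplying by the same `max_price`.

```python
import numpy as np

# Hypothetical prices in dollars, normalised by their maximum as in the cell above.
prices = np.array([250_000.0, 500_000.0, 1_000_000.0], dtype=np.float32)
max_price = prices.max()
normalised = prices / max_price  # positive prices now lie in (0, 1]

# Hypothetical network outputs on the normalised scale.
predictions = np.array([0.30, 0.45, 0.80], dtype=np.float32)

# Multiply by max_price to undo the normalisation.
predicted_dollars = predictions * max_price
mean_abs_error_dollars = np.abs(predicted_dollars - prices).mean()
print(predicted_dollars, mean_abs_error_dollars)
```

Keeping `max_price` around after training is therefore necessary if absolute prices, rather than percentage errors, are to be reported.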

In [None]:
# Architecture 1: Baseline CNN for regression
set_seed(42)

# MAPE
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

class HousePriceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 64x64 -> 32x32
            
            # Second convolutional block
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
            
            # Third convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8
            
            # Fourth convolutional block
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4
            
            # Flatten and regression head; adaptive pooling pins the feature map to 4x4
            nn.AdaptiveAvgPool2d((4, 4)),  # No-op for 64x64 inputs; keeps the Linear layer valid for other sizes
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 1)  # Single output for regression
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HousePriceModel().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    model.train()
    train_loss = 0.0
    train_sample = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = mean_absolute_percentage_error(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
        train_sample += inputs.size(0)

    # Validation
    model.eval()
    val_loss = 0.0
    val_sample = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += mean_absolute_percentage_error(outputs.squeeze(), targets).item() * inputs.size(0)
            val_sample += inputs.size(0)

    print(f"Epoch [{epoch+1}/100] - Train loss: {train_loss/train_sample:.2f}%, Validation loss: {val_loss/val_sample:.2f}%")


In [None]:
# Architecture 2: Deeper network with more filters for regression
set_seed(42)

# MAPE
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

class HousePriceModel2(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 64x64 -> 32x32

            # Second convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Third convolutional block
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Fourth convolutional block
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Flatten and regression head; adaptive pooling pins the feature map to 4x4
            nn.AdaptiveAvgPool2d((4, 4)),  # No-op for 64x64 inputs; keeps the Linear layer valid for other sizes
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 1)  # Single output for regression
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HousePriceModel2().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    model.train()
    train_loss = 0.0
    train_sample = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = mean_absolute_percentage_error(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
        train_sample += inputs.size(0)

    # Validation
    model.eval()
    val_loss = 0.0
    val_sample = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += mean_absolute_percentage_error(outputs.squeeze(), targets).item() * inputs.size(0)
            val_sample += inputs.size(0)

    print(f"Epoch [{epoch+1}/100] - Train loss: {train_loss/train_sample:.2f}%, Validation loss: {val_loss/val_sample:.2f}%")

In [None]:
# Architecture 3: Wider network with larger kernels and AveragePooling
set_seed(42)

# MAPE
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

class HousePriceModel3(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block - wider with larger kernel
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 64x64 -> 32x32

            # Second convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=5, padding='same'),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16

            # Third convolutional block
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8

            # Fourth convolutional block
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4

            # Flatten and regression head; adaptive pooling pins the feature map to 4x4
            nn.AdaptiveAvgPool2d((4, 4)),  # No-op for 64x64 inputs; keeps the Linear layer valid for other sizes
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 1)  # Single output for regression
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HousePriceModel3().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    model.train()
    train_loss = 0.0
    train_sample = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = mean_absolute_percentage_error(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
        train_sample += inputs.size(0)

    # Validation
    model.eval()
    val_loss = 0.0
    val_sample = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += mean_absolute_percentage_error(outputs.squeeze(), targets).item() * inputs.size(0)
            val_sample += inputs.size(0)

    print(f"Epoch [{epoch+1}/100] - Train loss: {train_loss/train_sample:.2f}%, Validation loss: {val_loss/val_sample:.2f}%")

In [None]:
# Architecture 4: LeakyReLU activation and strided convolutions
set_seed(42)

# MAPE
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

class HousePriceModel4(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block - using stride instead of pooling
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1),  # 64x64 -> 32x32
            nn.LeakyReLU(negative_slope=0.1),

            # Second convolutional block
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.LeakyReLU(negative_slope=0.1),

            # Third convolutional block
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.LeakyReLU(negative_slope=0.1),

            # Fourth convolutional block
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1),  # 8x8 -> 4x4
            nn.LeakyReLU(negative_slope=0.1),

            # Flatten and regression head; adaptive pooling pins the feature map to 4x4
            nn.AdaptiveAvgPool2d((4, 4)),  # No-op for 64x64 inputs; keeps the Linear layer valid for other sizes
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 512),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Linear(512, 256),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Linear(256, 1)  # Single output for regression
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HousePriceModel4().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    model.train()
    train_loss = 0.0
    train_sample = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = mean_absolute_percentage_error(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
        train_sample += inputs.size(0)

    # Validation
    model.eval()
    val_loss = 0.0
    val_sample = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += mean_absolute_percentage_error(outputs.squeeze(), targets).item() * inputs.size(0)
            val_sample += inputs.size(0)

    print(f"Epoch [{epoch+1}/100] - Train loss: {train_loss/train_sample:.2f}%, Validation loss: {val_loss/val_sample:.2f}%")

In [None]:
# Evaluate model on different house image types (frontal, kitchen, bedroom, bathroom)
# This creates a table comparing results across all image types

import pandas as pd

# MAPE function
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

# List of all house sections to test
house_sections = ['frontal', 'kitchen', 'bedroom', 'bathroom']

# Store results for each image type
results = []

# Load labels once (same for all image types)
labels = []
with open('HousesInfo.txt', "r") as f:  # the file is closed automatically after reading
    for x in f:
        label = (x).split(' ')[-1].split('\n')[0]  # the price is the last field on each line
        labels.append(label)
labels = np.asarray(labels).astype(np.float32)
max_price = labels.max()
labels /= max_price  # Normalize labels

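# Recreate the shuffle used for the training split, so that no validation house was seen during training
# (assumes set_seed seeds NumPy's global RNG, as in the data preparation cell)
set_seed(42)
indices = np.random.permutation(len(labels))
split_idx = int(0.8 * len(labels))
val_indices = indices[split_idx:]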

# For each house section, load images and evaluate
for house_section in house_sections:
    print(f"\n{'='*60}")
    print(f"Evaluating on {house_section} images")
    print(f"{'='*60}")

    # Load images for this section
    images = []
    for i_im in range(1, 536):
        image = cv2.imread(str(i_im)+'_'+house_section+'.jpg')
        if image is not None:
            image = cv2.resize(image, (64, 64))
            images.append(image)
        else:
            # If image doesn't exist, use zeros (shouldn't happen, but safety check)
            images.append(np.zeros((64, 64, 3), dtype=np.uint8))

    # Convert to NumPy arrays and normalize
    images = np.asarray(images).astype(np.float32) / 255.0

    # Keep only the validation houses, indexing images and labels with the same indices
    # (the labels must follow the same shuffle as the images, otherwise prices and photos are mismatched)
    X_val_np = images[val_indices]
    Y_val_np = labels[val_indices]

    # Convert to PyTorch tensors
    X_val = torch.from_numpy(X_val_np).permute(0, 3, 1, 2)  # NHWC → NCHW
    Y_val = torch.from_numpy(Y_val_np)

    # Create DataLoader
    val_loader = data.DataLoader(data.TensorDataset(X_val, Y_val), batch_size=32, shuffle=False)

    # Evaluate model (assuming you've trained a model - use the best architecture)
    # Replace HousePriceModel with your trained model class
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load or recreate your trained model here
    # For this example, we'll evaluate with a fresh model (you should load your trained weights)
    # model = HousePriceModel().to(device)  # Replace with your best architecture
    # model.load_state_dict(torch.load('best_model.pth'))  # Load trained weights if saved
    # model.eval()

    # For demonstration, we'll just show the structure
    # You need to train your model first, then evaluate here
    print(f"Note: Train your model first, then evaluate on {house_section} images")
    print(f"Validation set size: {len(X_val)} images")

    # Uncomment below after training your model:
    """
    error_total = 0.0
    sample_total = 0

    with torch.no_grad():
        for xb, yb in val_loader:
            xb, yb = xb.to(device), yb.to(device)
            preds = model(xb)
            error_total += mean_absolute_percentage_error(preds.squeeze(), yb).item() * xb.size(0)
            sample_total += xb.size(0)

    mape_error = error_total / sample_total
    results.append({
        'Image Type': house_section,
        'Validation MAPE (%)': f"{mape_error:.2f}"
    })
    print(f"Validation MAPE Error: {mape_error:.2f}%")
    """

# Create results table
if results:
    results_df = pd.DataFrame(results)
    print("\n" + "="*60)
    print("Results Summary Table")
    print("="*60)
    print(results_df.to_string(index=False))
else:
    print("\nTo get results:")
    print("1. Train your model on frontal images first")
    print("2. Save the trained model weights")
    print("3. Uncomment the evaluation code above")
    print("4. Run this cell to get results for all image types")
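
The evaluation cell above assumes that trained weights are saved and later reloaded. A minimal sketch of that round trip follows, using a tiny stand-in network instead of the house-price model; the file name `best_model.pth` and the temporary directory are illustrative assumptions.

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in network; in the notebook this would be the trained house-price model.
model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))

# Hypothetical location for the saved weights.
path = os.path.join(tempfile.mkdtemp(), "best_model.pth")
torch.save(model.state_dict(), path)  # store the weights only, not the class

# Rebuild the same architecture, then load the stored weights into it.
restored = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```

Saving the `state_dict` rather than the whole module means the class definition must be available when loading, which is already the case inside this notebook.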

### Problem Definition

As in the previous task, you are asked to design a CNN architecture, this time one that estimates house prices from the `frontal` house image. Design a new model by changing parameters such as the number of convolutional layers, the activation functions, the strides, or the pooling operators, among others.
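
When changing strides, kernel sizes, or pooling, it helps to track the spatial size of the feature map so that the first `nn.Linear` layer receives the expected number of features. The helper below is a small sketch of the standard output-size formula for convolution and pooling layers with dilation 1; `conv_out_size` is a name introduced here and is not part of PyTorch.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Trace a 64x64 input through four 3x3 convolutions with stride 2 and padding 1.
size = 64
sizes = [size]
for _ in range(4):
    size = conv_out_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)
print(sizes)

# A 2x2 max pooling with stride 2 halves the resolution in the same way.
pooled = conv_out_size(64, kernel_size=2, stride=2)

# Number of inputs to the first linear layer for 512 channels at the final size.
flat_features = 512 * sizes[-1] * sizes[-1]
```

Running the helper over a planned stack of layers before training is a quick way to catch shape mismatches between the convolutional part and the regression head.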

In [None]:
set_seed(42)

# MAPE
def mean_absolute_percentage_error(y_pred, y_true):
    return torch.mean(torch.abs((y_true - y_pred) / (y_true + 1e-12))) * 100

# Architecture 1: Baseline CNN for regression
class HousePriceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # First convolutional block
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 64x64 -> 32x32
            
            # Second convolutional block
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
            
            # Third convolutional block
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8
            
            # Fourth convolutional block
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 8x8 -> 4x4
            
            # Flatten and regression head; adaptive pooling pins the feature map to 4x4
            nn.AdaptiveAvgPool2d((4, 4)),  # No-op for 64x64 inputs; keeps the Linear layer valid for other sizes
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 1)  # Single output for regression
        )

    def forward(self, x):
        return self.model(x)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HousePriceModel().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training loop
for epoch in range(100):
    model.train()
    train_loss = 0.0
    train_sample = 0

    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = mean_absolute_percentage_error(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * inputs.size(0)
        train_sample += inputs.size(0)

    # Validation
    model.eval()
    val_loss = 0.0
    val_sample = 0
    with torch.no_grad():
        for inputs, targets in val_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            val_loss += mean_absolute_percentage_error(outputs.squeeze(), targets).item() * inputs.size(0)
            val_sample += inputs.size(0)

    print(f"Epoch [{epoch+1}/100] - Train loss: {train_loss/train_sample:.2f}%, Validation loss: {val_loss/val_sample:.2f}%")


The metric used to evaluate performance in this problem is the same one used for training the model: the mean absolute percentage error (MAPE). It is defined as $\frac{100}{n} \sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{|y_i|}$, where $y_i$ is the ground truth, $\hat{y}_i$ is the model's estimate, and $n$ is the number of elements in the set being evaluated.
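
As a worked example of this definition, the sketch below computes the metric with NumPy on three made-up prices and estimates. The function name `mape` and the numbers are illustrative assumptions; the notebook's own implementation works on PyTorch tensors.

```python
import numpy as np


def mape(y_pred, y_true):
    """Mean absolute percentage error, in percent."""
    y_pred = np.asarray(y_pred, dtype=np.float64)
    y_true = np.asarray(y_true, dtype=np.float64)
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))


y_true = [100.0, 200.0, 400.0]  # hypothetical ground-truth prices
y_pred = [110.0, 180.0, 400.0]  # hypothetical estimates

# Per-sample relative errors are 10/100, 20/200 and 0/400, i.e. 10%, 10% and 0%.
error = mape(y_pred, y_true)
print(f"{error:.2f}%")
```

One useful reference point: a model that always predicts zero scores exactly 100% under this metric, since every relative error is then 1.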

In [None]:
model.eval()
error_total = 0.0
sample_total = 0

with torch.no_grad():
    for xb, yb in val_loader:
        xb, yb = xb.to(device), yb.to(device)
        preds = model(xb)
        error_total += mean_absolute_percentage_error(preds.squeeze(), yb).item() * xb.size(0)
        sample_total += xb.size(0)

print(f"Predicting house prices - Estimation Error: {error_total/sample_total:.2f}%")

**Report**:


*   Propose a CNN architecture that has an estimation error in the validation set below 75%.
*   Present a figure showing the training and validation loss vs the number of training epochs for different architectural design choices. Discuss the gap between the training and validation loss depending on the proposed architecture.
*   Report a table with results when using any of the other images from the house (kitchen, bedroom, and bathroom).
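
For the requested figure, one possible approach is to append the per-epoch training and validation losses to two lists inside the training loop and plot them afterwards. The sketch below assumes such lists already exist; the numbers are placeholders rather than real results, and `plot_history` is a helper name introduced here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt


def plot_history(train_losses, val_losses, title):
    """Plot per-epoch training and validation loss on shared axes."""
    epochs = range(1, len(train_losses) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_losses, label="Training loss")
    ax.plot(epochs, val_losses, label="Validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("MAPE (%)")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Placeholder values standing in for the per-epoch numbers printed by a training loop.
train_history = [90.0, 70.0, 55.0, 45.0, 38.0]
val_history = [92.0, 78.0, 70.0, 68.0, 69.0]
fig, ax = plot_history(train_history, val_history, "Architecture 1 (illustrative data)")
```

In the training loops above, the values to record each epoch would be `train_loss/train_sample` and `val_loss/val_sample`; calling the helper once per architecture gives directly comparable curves.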