[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1AeVWJn0TWtr4DjV99uZ3ONGRWF0W-FIR?usp=sharing)

Submitted By: Chiara Vega

# Problem Set 1 - Take at home (50 Points)

## PS1.A (15 points)


![](images/autoencoder.png)

Design a neural system that will receive an input image from the CIFAR10 dataset and it will _reconstruct it_ at the output.

To do so the encoder must compress the input image into a vector of size $d$ and the decoder must reconstruct the image from the vector of size $d$.

The spatial dimensions of the encoder's output feature maps are determined by the input height or width (i), kernel size (k), padding (p), and stride (s); the output feature map size (o) is given by

$$o = \frac{i-k+2p}{s}+1$$

The decoder must make use of a number of _transposed convolutional_ layers - each successively upsamples its input feature map. The operation is the inverse of the earlier question on representing convolution as a matrix by vector product:

$$x = C^T y$$
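To make the relation $x = C^T y$ concrete, the sketch below builds the matrix $C$ for a small 1-D convolution (stride 1, no padding) in plain Python and applies its transpose. The helper names `conv_matrix`, `matvec`, and `transpose` are illustrative only. Note that multiplying by $C^T$ restores the input *size*; it does not, in general, recover the original input values.

```python
def conv_matrix(w, n):
    """Rows of C slide the kernel w across an input of length n (stride 1, no padding)."""
    k = len(w)
    return [[w[j - i] if 0 <= j - i < k else 0 for j in range(n)]
            for i in range(n - k + 1)]


def matvec(M, v):
    """Matrix-by-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


w = [1, 2, 3]          # kernel, k = 3
x = [1, 0, 2, 1, 0]    # input, i = 5
C = conv_matrix(w, len(x))

y = matvec(C, x)                 # convolution: 5 values -> 3 values
x_up = matvec(transpose(C), y)   # transposed convolution: 3 values -> 5 values

print(y)       # [7, 7, 4]
print(x_up)    # [7, 21, 39, 29, 12]
```

The output length of 5 agrees with the transposed-convolution size formula for i = 3, k = 3, p = 0, s = 1.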

For a given size of the input (i), kernel (k), padding (p), and stride (s), the size of the output feature map (o) generated is given by

$$o = s(i-1)+k-2p$$
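As a quick sanity check of the two size formulas, the sketch below evaluates them in plain Python. The helper names `conv_out` and `conv_transpose_out` are illustrative and not part of any library, and the integer division assumes the usual floor behavior when the stride does not divide evenly.

```python
def conv_out(i, k, p, s):
    """Output size of a convolution: o = (i - k + 2p) / s + 1 (floored)."""
    return (i - k + 2 * p) // s + 1


def conv_transpose_out(i, k, p, s):
    """Output size of a transposed convolution: o = s(i - 1) + k - 2p."""
    return s * (i - 1) + k - 2 * p


# A 3x3 kernel with padding 1 and stride 1 preserves a 32-pixel side.
print(conv_out(32, k=3, p=1, s=1))            # 32
# A 2x2 window with stride 2 halves it.
print(conv_out(32, k=2, p=0, s=2))            # 16
# A 4x4 transposed kernel with padding 1 and stride 2 doubles an 8-pixel side.
print(conv_transpose_out(8, k=4, p=1, s=2))   # 16
```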

Ultimately the size of the input image is produced at the output of the decoder and this would allow the comparison of the original input image and the decoded image $\tilde x$.  

Your design should include all dimensions of the encoder and decoder CNNs including the number of layers, number of filters and spatial dimensions, the activation functions, the loss function, the optimizer. You need to expose this information via the API (you can use any framework you know).



In [None]:
# type here the python model of the autoencoder

# please note that elements of the model that are typed in without any explanation will receive 0 points

In this section, a convolutional autoencoder is designed for image reconstruction on the CIFAR10 dataset. The model is implemented as a PyTorch `nn.Module` subclass called `Autoencoder`. The **encoder** and the **decoder** are the two fundamental parts of the architecture.

**Encoder**

The encoder's function is to compress the input image into a lower-dimensional representation. It is built from two convolutional layers (`nn.Conv2d`), each followed by a ReLU activation and a max-pooling layer (`nn.MaxPool2d`). The first convolutional layer applies 64 filters of size 3x3, with padding of 1 and stride of 1, to input images with 3 channels (representing RGB color). The ReLU activation introduces non-linearity, which facilitates feature extraction. The subsequent max-pooling layer, with a kernel size of 2x2 and a stride of 2, halves the spatial dimensions of the feature map. The second convolutional layer follows the same pattern with 128 filters. For a 32x32 input, the encoder yields a feature map with 128 channels at 8x8 spatial resolution as its output.
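It can help to count values as well as spatial size. The short arithmetic sketch below uses only the sizes stated above (a 3x32x32 CIFAR10 image and a 128x8x8 encoder output); the variable names are illustrative. It suggests that, although the spatial resolution shrinks by a factor of 4 per side, the feature map holds more numbers than the input image, so the reduction at this stage is spatial rather than in total size.

```python
# Sizes taken from the description above: CIFAR10 input and encoder output.
input_elems = 3 * 32 * 32      # channels x height x width of one image
feature_elems = 128 * 8 * 8    # channels x height x width of the encoder output

print(input_elems)                             # 3072
print(feature_elems)                           # 8192
print(round(feature_elems / input_elems, 2))   # 2.67
```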

**Decoder**

The decoder attempts to reconstruct the original image from this compressed form. It is implemented with transposed convolutional layers (`nn.ConvTranspose2d`), a ReLU activation, and a final sigmoid activation. Its main objective is to upsample the compressed feature map back to the original image size. The first transposed convolutional layer takes the compressed feature map and applies 64 filters of size 4x4 with padding of 1 and stride of 2, doubling the spatial size from 8x8 to 16x16. A ReLU activation then adds non-linearity to the reconstruction path. The second transposed convolutional layer applies 3 filters of size 4x4 with padding of 1 and stride of 2, producing a 3-channel 32x32 output. The final sigmoid activation restricts the decoder's output to the [0, 1] range, which represents the pixel values of the reconstructed image.
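One way to confirm the dimensions described above is to push a dummy tensor through the layers and print the shape after each one. This is a minimal sketch that rebuilds the same layer stack outside the `Autoencoder` class; the zero-filled input is only a stand-in for a CIFAR10 image.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
)

x = torch.zeros(1, 3, 32, 32)   # dummy batch holding one CIFAR10-sized image
with torch.no_grad():
    h = x
    for layer in list(encoder) + list(decoder):
        h = layer(h)
        # Print the layer type and the shape of its output.
        print(f"{layer.__class__.__name__:<16} -> {tuple(h.shape)}")
```

The last printed shape should match the input shape, which is what allows the reconstruction to be compared with the original image.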

**Linear Layer**

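A linear layer (`nn.Linear`) is defined to compress the encoder's flattened 8x8x128 output feature map further into a vector of size 128, so that the input data can be represented more compactly. In the `forward` method of the class below, the encoder output is passed directly to the decoder, so this layer is not part of the reconstruction path.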
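For illustration, here is one way a vector of size $d$ could be placed between the encoder and the decoder. This is a sketch under assumptions, not the model trained in this notebook: the class name `BottleneckAutoencoder`, the `encode` method, and the mirror layer `from_code` (which expands the vector back to a 128x8x8 feature map) are all hypothetical additions.

```python
import torch
import torch.nn as nn


class BottleneckAutoencoder(nn.Module):
    """Hypothetical variant that routes the feature map through a d-dimensional code."""

    def __init__(self, d=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.to_code = nn.Linear(128 * 8 * 8, d)     # feature map -> vector of size d
        self.from_code = nn.Linear(d, 128 * 8 * 8)   # assumed mirror layer: vector -> feature map
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def encode(self, x):
        # Flatten (N, 128, 8, 8) to (N, 8192) before the linear projection.
        return self.to_code(self.encoder(x).flatten(start_dim=1))

    def forward(self, x):
        z = self.encode(x)                          # (N, d)
        h = self.from_code(z).view(-1, 128, 8, 8)   # back to (N, 128, 8, 8)
        return self.decoder(h)


model = BottleneckAutoencoder(d=128)
batch = torch.zeros(2, 3, 32, 32)
print(model.encode(batch).shape)   # torch.Size([2, 128])
print(model(batch).shape)          # torch.Size([2, 3, 32, 32])
```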

**Loss Function and Optimizer**

The Mean Squared Error (MSE) loss function is used to train the autoencoder; the objective is to measure and minimize the discrepancy between the original images and the reconstructed images. Adam, a widely used optimizer, is chosen with a learning rate of 0.001 and a weight decay of 0.001.
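For reference, with $x$ the input image, $\tilde x$ the reconstruction, and $n$ the number of values per image (here $n = 3 \times 32 \times 32$), the per-image mean squared error can be written as

$$\mathcal{L}(x, \tilde x) = \frac{1}{n}\sum_{j=1}^{n} (x_j - \tilde x_j)^2$$

`nn.MSELoss` with its default `reduction='mean'` additionally averages over the images in a batch.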

In [None]:
!pip install optuna

Collecting optuna
  Downloading optuna-3.2.0-py3-none-any.whl (390 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.11.1-py3-none-any.whl (224 kB)
Collecting cmaes>=0.9.1 (from optuna)
  Downloading cmaes-0.10.0-py3-none-any.whl (29 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.2.4-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, cmaes, alembic, optuna
Successfully installed Mako-1.2.4 alembic-1.11.1 cmaes-0.10.0 colorlog-6.7.0 optuna-3.2.0


In [None]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import optuna

In [None]:
# Can use for reproducibility
#torch.manual_seed(0)

In [None]:
# Define the Autoencoder class
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()

        # Encoder: two conv + ReLU + max-pool stages; sizes follow o = (i - k + 2p) / s + 1
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),    # 3x32x32 -> 64x32x32
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 64x32x32 -> 64x16x16
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),  # 64x16x16 -> 128x16x16
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 128x16x16 -> 128x8x8
        )

        # Decoder: two transposed convolutions; sizes follow o = s(i - 1) + k - 2p
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 128x8x8 -> 64x16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),    # 64x16x16 -> 3x32x32
            nn.Sigmoid(),  # restricts outputs to [0, 1]
        )

        # Linear layer mapping the flattened 128x8x8 feature map to a vector of size 128
        # (defined here; forward() below does not call it)
        self.linear = nn.Linear(8 * 8 * 128, 128)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

# Loss function and optimizer
model = Autoencoder()
criterion = nn.MSELoss()  # Mean Squared Error loss for reconstruction
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.001)

## PS1.B (15 points)

Train your PS1.A autoencoder and produce the validation or test loss as a function of the number of epochs. Make sure you explain any preprocessing stages and ensure you show the hyperparameter tuning process in your code.



In [None]:
# Type here the training code for the autoencoder and the code for the prediction of the test set

# please note that elements of the training code that are typed in without any explanation will receive 0 points

With the autoencoder CNN model designed, the subsequent step will be to train the model on the CIFAR10 dataset.

**Data Preprocessing**

Data preprocessing must be completed before training. The `transforms.Compose` function from torchvision is used to define a series of transformations: the images are converted to tensors (`ToTensor`) and the pixel values are then normalized (`Normalize`) to lie within the range [-1, 1].
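The arithmetic behind the normalization step can be sketched in plain Python. `ToTensor` yields values in [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 applies `(v - mean) / std` per channel. The helper names `normalize` and `denormalize` are illustrative; the inverse mapping is included because it is typically needed before displaying an image.

```python
def normalize(v, mean=0.5, std=0.5):
    """Per-channel operation applied by Normalize: (v - mean) / std."""
    return (v - mean) / std


def denormalize(v, mean=0.5, std=0.5):
    """Inverse mapping, back to the [0, 1] range."""
    return v * std + mean


for pixel in (0.0, 0.5, 1.0):   # ToTensor yields values in [0, 1]
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```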

**DataLoader**

After preprocessing, the CIFAR10 training and test sets are loaded with `datasets.CIFAR10` and wrapped in `DataLoader` objects, which serve batches of 64 images for training and testing. The training loader shuffles the data; the test loader does not.

**Hyperparameter Tuning**

Optuna is used for hyperparameter tuning. First, a function called `validation_loss` is defined: it puts the model into evaluation mode with `model.eval()` and iterates over the test DataLoader to compute the average loss, which serves as the validation loss. As in the previous section, the MSE loss function is used. The Optuna study's `objective` function searches for the best combination of learning rate (`lr`) and weight decay (`weight_decay`), both sampled on a log scale. For each trial, the function trains a fresh model for 5 epochs with the Adam optimizer, using the learning rate and weight decay suggested for that trial. At the end of each epoch, the validation loss is calculated and reported to Optuna, which may prune unpromising trials. After tuning, the best hyperparameters are read from `study.best_params`, and a final model is trained with the Adam optimizer using them. Each epoch's training loss is printed during this final training.
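The search ranges for both hyperparameters span several orders of magnitude, which is why they are sampled on a log scale (`log=True`). The sketch below illustrates that idea with plain random sampling from the standard library. It is not Optuna's sampler, and the helper name `sample_log_uniform` is hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
lrs = [sample_log_uniform(1e-5, 1e-2, rng) for _ in range(5)]
for lr in lrs:
    print(f"{lr:.2e}")
```

With this scheme each decade (for example 1e-5 to 1e-4 and 1e-3 to 1e-2) is equally likely to be drawn, whereas uniform sampling on a linear scale would rarely propose the small values.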

**Test Loss Calculation**

To calculate the test loss, the trained model is evaluated on the test set: the MSE loss is accumulated over the test DataLoader and averaged over all test images.

The resulting test loss (MSE) is 0.1437.
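The loops in this notebook accumulate `loss.item() * inputs.size(0)` and divide by the dataset size instead of averaging the per-batch losses directly. The sketch below, using made-up batch losses, shows why this matters when the last batch is smaller (with 10,000 test images and a batch size of 64, the final batch holds 16 images).

```python
# Hypothetical per-batch mean losses and batch sizes; the last batch is smaller.
batch_means = [0.20, 0.10, 0.40]
batch_sizes = [64, 64, 16]

# Naive: average of the batch means, which over-weights the small last batch.
naive = sum(batch_means) / len(batch_means)

# As in the loops here: weight each batch mean by its size, then divide by N.
total = sum(m * n for m, n in zip(batch_means, batch_sizes))
weighted = total / sum(batch_sizes)

print(round(naive, 4))      # 0.2333
print(round(weighted, 4))   # 0.1778
```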

In [None]:
# Data Preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize to range [-1, 1]
])

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:12<00:00, 13724443.64it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [None]:
# Function to calculate the validation loss
def validation_loss(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for inputs, _ in dataloader:
            inputs = inputs.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, inputs)
            total_loss += loss.item() * inputs.size(0)
    model.train()
    return total_loss / len(dataloader.dataset)

In [None]:
epochs = 5

# Define the objective function for Optuna study
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-3, log=True)

    model = Autoencoder()
    model.to(device)

    # Define optimizer and criterion (MSE loss function)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.MSELoss()

    # Training loop
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for inputs, _ in train_loader:
            inputs = inputs.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, inputs)
            total_loss += loss.item() * inputs.size(0)
            loss.backward()
            optimizer.step()

        avg_loss = total_loss / len(train_loader.dataset)
        val_loss = validation_loss(model, test_loader, criterion, device)
        trial.report(val_loss, step=epoch)

        # Prune bad trials
        if trial.should_prune():
            raise optuna.TrialPruned()

    return val_loss

# Perform hyperparameter tuning with Optuna
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=5)

# Get the best hyperparameters
best_params = study.best_params
best_lr = best_params['lr']
best_weight_decay = best_params['weight_decay']

[I 2023-08-03 09:12:53,486] A new study created in memory with name: no-name-7d4e3995-7f78-479c-b232-de08bab72b9f
[I 2023-08-03 09:14:26,138] Trial 0 finished with value: 0.14526146059036255 and parameters: {'lr': 0.0006176026261782298, 'weight_decay': 0.00017266991418482335}. Best is trial 0 with value: 0.14526146059036255.
[I 2023-08-03 09:15:44,172] Trial 1 finished with value: 0.14318454551696777 and parameters: {'lr': 0.0002879948240762162, 'weight_decay': 5.7906721789705734e-05}. Best is trial 1 with value: 0.14318454551696777.
[I 2023-08-03 09:17:02,254] Trial 2 finished with value: 0.15730477073192597 and parameters: {'lr': 1.0990048244125832e-05, 'weight_decay': 9.811819704589201e-05}. Best is trial 1 with value: 0.14318454551696777.
[I 2023-08-03 09:18:22,306] Trial 3 finished with value: 0.14876255395412444 and parameters: {'lr': 3.094919955898236e-05, 'weight_decay': 3.56125386439924e-05}. Best is trial 1 with value: 0.14318454551696777.
[I 2023-08-03 09:19:43,275] Trial 4 

In [None]:
# Train the autoencoder model with the best hyperparameters
model = Autoencoder()
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=best_lr, weight_decay=best_weight_decay)
criterion = nn.MSELoss()

for epoch in range(epochs):
    model.train()
    total_loss = 0.0
    for inputs, _ in train_loader:
        inputs = inputs.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, inputs)
        total_loss += loss.item() * inputs.size(0)
        loss.backward()
        optimizer.step()

    avg_loss = total_loss / len(train_loader.dataset)
    print(f"Epoch {epoch + 1}/{epochs}: Training Loss using Best Hyperparameters: {avg_loss:.4f}")

Epoch 1/5: Training Loss using Best Hyperparameters: 0.1686
Epoch 2/5: Training Loss using Best Hyperparameters: 0.1497
Epoch 3/5: Training Loss using Best Hyperparameters: 0.1482
Epoch 4/5: Training Loss using Best Hyperparameters: 0.1474
Epoch 5/5: Training Loss using Best Hyperparameters: 0.1468


In [None]:
# Produce the test loss
model.eval()
test_loss = 0.0
with torch.no_grad():
    for inputs, _ in test_loader:
        inputs = inputs.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, inputs)
        test_loss += loss.item() * inputs.size(0)

test_loss /= len(test_loader.dataset)
print(f"Test Loss: {test_loss:.4f}")

Test Loss: 0.1437


## PS1.C (20 points)

Add zero-mean Gaussian noise with variance that ranges from 0 to 1 to the images and repeat the training exercise comparing results with PS1.B and explaining why adding noise is a good idea.

Explain here

In [None]:
# add the noise and repeat

This final section examines the results of training with zero-mean Gaussian noise added to the input images. The goal is to compare the model's performance with added noise against its performance without noise (from the previous section).

**Adding Gaussian Noise**

To add Gaussian noise to the input images, the function `add_noise` is defined. It takes two arguments, the input images `inputs` and a noise level `noise_factor`, and adds zero-mean Gaussian noise whose standard deviation is `noise_factor` (0.3 in the runs below). The result is then clipped so that pixel values remain within the range [0, 1].
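The problem statement specifies the noise by its variance, while a noise level used as a multiplier on standard normal samples is a standard deviation. The two are related by a square root, so a standard deviation of 0.3 corresponds to a variance of 0.09. The plain-Python sketch below makes the conversion explicit; the helper `add_gaussian_noise` and its `low`/`high` clamp bounds are hypothetical and operate on a list of values, not on tensors.

```python
import math
import random
import statistics


def add_gaussian_noise(values, variance, rng, low=0.0, high=1.0):
    """Add zero-mean Gaussian noise of the given variance, then clamp to [low, high]."""
    sigma = math.sqrt(variance)   # standard deviation = sqrt(variance)
    return [min(high, max(low, v + rng.gauss(0.0, sigma))) for v in values]


rng = random.Random(0)   # fixed seed only to make the sketch repeatable

# Empirical check: samples drawn with sigma = 0.3 have variance close to 0.09.
samples = [rng.gauss(0.0, 0.3) for _ in range(100_000)]
sample_variance = statistics.pvariance(samples)
print(round(sample_variance, 3))

# Sweep a few variances from the 0-to-1 range on three example pixel values.
for variance in (0.0, 0.09, 0.5, 1.0):
    noisy = add_gaussian_noise([0.0, 0.5, 1.0], variance, rng)
    print(variance, [round(v, 3) for v in noisy])
```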

**Repeating the Process**

The process from the previous section is repeated: the autoencoder is tuned with Optuna and then trained on the noisy inputs, and the losses are recorded.

**Comparison of Test Losses**

The test loss (MSE) with Gaussian noise added is 0.0323, which is smaller than the test loss of 0.1437 obtained without noise. The two numbers are not measured against identical targets, however: in the noisy run the loss is computed against the noisy inputs after they are clipped to [0, 1], whereas in PS1.B the targets lie in [-1, 1], so the size of the gap should be read with that in mind. In general, adding noise makes the model more robust and reduces overfitting: the model is encouraged to learn the important features of the input images instead of copying pixel values, which can lead to improved reconstructions.
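A common formulation of a denoising autoencoder feeds the noisy image to the network and scores the reconstruction against the clean image, whereas the cells in this section compare the output with the noisy input. Below is a minimal sketch of a single clean-target training step. It assumes a tiny stand-in network and random data in [0, 1] so that it runs quickly; the `Autoencoder` class and a CIFAR10 batch would take their place in practice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model so the sketch runs quickly (not the notebook's Autoencoder).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, kernel_size=3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Random stand-in for a batch of images with values in [0, 1].
clean = torch.rand(16, 3, 32, 32)
# Zero-mean Gaussian noise with standard deviation 0.3, clamped to the valid range.
noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0.0, 1.0)

optimizer.zero_grad()
outputs = model(noisy)             # the network only sees the corrupted images
loss = criterion(outputs, clean)   # but is scored against the clean originals
loss.backward()
optimizer.step()
print(f"denoising loss: {loss.item():.4f}")
```

Scoring against the clean image means the network cannot lower its loss by reproducing the noise, which is the usual argument for why noise acts as a regularizer.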

In [None]:
# Function to add Gaussian noise to the inputs
def add_noise(inputs, noise_factor=0.3):
    noisy = inputs + torch.randn_like(inputs) * noise_factor
    noisy = torch.clip(noisy, 0., 1.)
    return noisy

In [None]:
# Define the objective function for Optuna study
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-3, log=True)

    model_noisy = Autoencoder()
    model_noisy.to(device)

    # Define optimizer and criterion (MSE loss function)
    optimizer = optim.Adam(model_noisy.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.MSELoss()

    # Training loop
    for epoch in range(epochs):
        model_noisy.train()
        total_loss = 0.0
        for inputs, _ in train_loader:
            noisy_inputs = add_noise(inputs, noise_factor=0.3)
            noisy_inputs = noisy_inputs.to(device)
            optimizer.zero_grad()
            outputs = model_noisy(noisy_inputs)
            loss = criterion(outputs, noisy_inputs)
            total_loss += loss.item() * noisy_inputs.size(0)
            loss.backward()
            optimizer.step()

        avg_loss = total_loss / len(train_loader.dataset)
        val_loss = validation_loss(model_noisy, test_loader, criterion, device)
        trial.report(val_loss, step=epoch)

        # Prune bad trials
        if trial.should_prune():
            raise optuna.TrialPruned()

    return val_loss

# Perform hyperparameter tuning with Optuna
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
study_noise = optuna.create_study(direction='minimize')
study_noise.optimize(objective, n_trials=5)

# Get the best hyperparameters
best_params_noise = study_noise.best_params
best_lr_noise = best_params_noise['lr']
best_weight_decay_noise = best_params_noise['weight_decay']

[I 2023-08-03 09:22:20,575] A new study created in memory with name: no-name-fb819400-47d6-4050-b968-22bede10e730
[I 2023-08-03 09:23:49,815] Trial 0 finished with value: 0.15506750931739807 and parameters: {'lr': 2.527315894917555e-05, 'weight_decay': 0.00011884404253595709}. Best is trial 0 with value: 0.15506750931739807.
[I 2023-08-03 09:25:16,334] Trial 1 finished with value: 0.14672551219463348 and parameters: {'lr': 0.0002107086947761691, 'weight_decay': 5.124743572408872e-05}. Best is trial 1 with value: 0.14672551219463348.
[I 2023-08-03 09:26:43,182] Trial 2 finished with value: 0.15658902175426484 and parameters: {'lr': 3.8889170964738805e-05, 'weight_decay': 0.00042355218511782087}. Best is trial 1 with value: 0.14672551219463348.
[I 2023-08-03 09:28:10,159] Trial 3 finished with value: 0.15462566313743592 and parameters: {'lr': 0.00241960635034059, 'weight_decay': 0.00033288611852949255}. Best is trial 1 with value: 0.14672551219463348.
[I 2023-08-03 09:29:35,809] Trial 4 

In [None]:
# Train the autoencoder model with the best hyperparameters
model_noisy = Autoencoder()
model_noisy.to(device)
optimizer = optim.Adam(model_noisy.parameters(), lr=best_lr_noise, weight_decay=best_weight_decay_noise)
criterion = nn.MSELoss()

for epoch in range(epochs):
    model_noisy.train()
    total_loss = 0.0
    for inputs, _ in train_loader:  # train on the training split; the average below divides by its size
        noisy_inputs = add_noise(inputs, noise_factor=0.3)
        noisy_inputs = noisy_inputs.to(device)
        optimizer.zero_grad()
        outputs = model_noisy(noisy_inputs)
        loss = criterion(outputs, noisy_inputs)
        total_loss += loss.item() * noisy_inputs.size(0)
        loss.backward()
        optimizer.step()

    avg_loss = total_loss / len(train_loader.dataset)
    print(f"Epoch {epoch + 1}/{epochs}: Training Loss using Best Hyperparameters: {avg_loss:.4f}")

Epoch 1/5: Training Loss using Best Hyperparameters: 0.0149
Epoch 2/5: Training Loss using Best Hyperparameters: 0.0078
Epoch 3/5: Training Loss using Best Hyperparameters: 0.0071
Epoch 4/5: Training Loss using Best Hyperparameters: 0.0067
Epoch 5/5: Training Loss using Best Hyperparameters: 0.0065


In [None]:
# Produce the test loss
model_noisy.eval()
test_loss_noise = 0.0
with torch.no_grad():
    for inputs, _ in test_loader:
        noisy_inputs = add_noise(inputs, noise_factor=0.3)
        noisy_inputs = noisy_inputs.to(device)
        outputs = model_noisy(noisy_inputs)
        loss = criterion(outputs, noisy_inputs)
        test_loss_noise += loss.item() * noisy_inputs.size(0)

test_loss_noise /= len(test_loader.dataset)
print(f"Test Loss: {test_loss_noise:.4f}")

Test Loss: 0.0323


**Comparison of Test Loss Before and After Adding Noise**

In [None]:
from prettytable import PrettyTable

# Compare the test losses
table = PrettyTable()
table.field_names = ["Test Loss", "Test Loss with Noise"]
table.add_row([f"{test_loss:.4f}", f"{test_loss_noise:.4f}"])

print(table)

+-----------+----------------------+
| Test Loss | Test Loss with Noise |
+-----------+----------------------+
|   0.1437  |        0.0323        |
+-----------+----------------------+
