HHU Deep Learning, SS2025, Prof. Dr. Markus Kollmann

Tutoring is done by Nikolas Adaloglou and Felix Michels.

# Assignment 01 - Representation Learning with AutoEncoders

---

Submit the solved notebook (not a zip) with your full name plus the assignment number in the filename, e.g. `max_mustermann_a1.ipynb` for assignment 1. If we feel that you have genuinely tried to solve the exercise and submitted code for each task, you will receive 1 point for this assignment, regardless of the quality of your solution.

If you cannot proceed past some part, you need to ask on RocketChat ahead of the deadline.


## <center> DUE FRIDAY 25.04.2025 2:30 pm </center>

Drop-off link: [https://uni-duesseldorf.sciebo.de/s/UTrFcW2xJVbV4XO](https://uni-duesseldorf.sciebo.de/s/UTrFcW2xJVbV4XO)

---



## Introduction to Autoencoders

An autoencoder (AE) is a neural network that learns to reconstruct its input data by compressing the input into a lower-dimensional representation, which is called a **bottleneck** or latent representation. The basic architecture of an autoencoder consists of two main components: an encoder and a decoder. The encoder takes the input data and maps it to the bottleneck representation. The decoder then takes this bottleneck representation and maps it back to the original input space.

During training, the weights of the encoder and decoder are jointly adjusted to minimize the reconstruction loss between the original input and the reconstructed output. Because the bottleneck has fewer dimensions than the input, the autoencoder cannot simply copy its input: it has to learn a compressed representation that preserves the information needed to reconstruct the input.
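As a worked formula, assuming the mean squared error as the reconstruction loss (a common choice for images with values in [0, 1]; the notebook does not fix the loss at this point), the loss over a batch of $N$ inputs $x_i$ with $D$ values each is given below, where $f_\theta$ denotes the encoder and $g_\phi$ the decoder. This is the quantity that `nn.MSELoss()` computes with its default `reduction="mean"`.

$$
\mathcal{L}_{\text{rec}}(\theta, \phi) = \frac{1}{N D} \sum_{i=1}^{N} \left\lVert x_i - g_\phi\big(f_\theta(x_i)\big) \right\rVert_2^2
$$

The bottleneck representation of an input $x_i$ is then $z_i = f_\theta(x_i)$.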


Autoencoders can be used for representation learning by leveraging the encoder as a feature extractor. The learned features can be more informative and discriminative than the raw input, especially when the input is high-dimensional (e.g. images). Autoencoders can also be used for transfer learning, where the pre-trained encoder is fine-tuned on a new dataset for a different task.


The learned features can be used for downstream tasks such as classification, clustering, or regression.

In this exercise, we will focus on image classification. The bottleneck representation learned by the autoencoder will be used as a feature vector for a classifier.





# Part I. Basic imports


In [None]:
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.utils.data as data
import random
import matplotlib.pyplot as plt
from torchvision import transforms as T
from tqdm.notebook import tqdm

# Getting the data


In [None]:

def load_data(batch_size=128, train_split="unlabeled", test_split="test"):
    # Returns a train and a validation dataloader for the STL10 dataset
    ### START CODE HERE ### (≈ 6 lines of code)
    ### END CODE HERE ###
    return train_dl, val_dl

def imshow(img, i=0, mean=torch.tensor([0.0], dtype=torch.float32), std=torch.tensor([1], dtype=torch.float32)):
    """
    shows an image on the screen. mean of 0 and variance of 1 will show the images unchanged in the screen
    """
    # undoes the normalization
    unnormalize = T.Normalize((-mean / std).tolist(), (1.0 / std).tolist())
    npimg = unnormalize(img).numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

def test_load_data():
    train_dl, val_dl = load_data(batch_size=16)
    for i, (x, y) in enumerate(train_dl):
        print(x.shape, y.shape)
        plt.figure(figsize=(6, 6))
        imshow(x[0,...])
        break
    for i, (x, y) in enumerate(val_dl):
        print(x.shape, y.shape)
        plt.figure(figsize=(6, 6))
        imshow(x[0,...])
        break

test_load_data()

# Build the AutoEncoder

- Given: `in_channels`, `base_channels`, `n_layers`, `kernel_size`, `stride`, `padding`.
- Build a symmetrical autoencoder with convolutions (encoder) and transposed convolutions (decoder).
- Use the ReLU activation for all intermediate layers and a sigmoid for the output layer.
- `base_channels` is the number of output channels of the first conv layer in the encoder.
- Each subsequent encoder layer doubles the number of channels of the previous one.
- Each subsequent decoder layer halves the number of channels of the previous one.
- The implementation should support a varying number of layers (`n_layers`).
- The convolutions and transposed convolutions use the same kernel size, stride and padding.
- Choose the kernel size and padding so that each encoder layer halves the spatial resolution of the feature map.
- Add an `encode(x)` method that returns the bottleneck; the test below calls it.
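The size bookkeeping behind these choices can be checked with the output-size formulas from the PyTorch documentation of `nn.Conv2d` and `nn.ConvTranspose2d`. The sketch below (the helper names are hypothetical) traces the spatial size through `n_layers` encoder and decoder layers. With `kernel_size=3, stride=2, padding=1`, as in the test, the encoder halves 32 down to 2, but the transposed convolutions do not exactly double it back; `kernel_size=4` with the same stride and padding, or an extra `output_padding=1`, restores 32 × 32. The provided test only checks the bottleneck shape, so it would not catch this mismatch.

```python
def conv2d_out(size, kernel_size, stride, padding, dilation=1):
    """Spatial output size of nn.Conv2d (formula from the PyTorch docs)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose2d_out(size, kernel_size, stride, padding, output_padding=0, dilation=1):
    """Spatial output size of nn.ConvTranspose2d (formula from the PyTorch docs)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


def trace_sizes(size, n_layers, kernel_size, stride, padding, output_padding=0):
    """Hypothetical helper: spatial sizes after each encoder and each decoder layer."""
    enc = [size]
    for _ in range(n_layers):
        enc.append(conv2d_out(enc[-1], kernel_size, stride, padding))
    dec = [enc[-1]]
    for _ in range(n_layers):
        dec.append(conv_transpose2d_out(dec[-1], kernel_size, stride, padding, output_padding))
    return enc, dec


print(trace_sizes(32, 4, kernel_size=3, stride=2, padding=1))
# ([32, 16, 8, 4, 2], [2, 3, 5, 9, 17])
print(trace_sizes(32, 4, kernel_size=4, stride=2, padding=1))
# ([32, 16, 8, 4, 2], [2, 4, 8, 16, 32])
print(trace_sizes(32, 4, kernel_size=3, stride=2, padding=1, output_padding=1))
# ([32, 16, 8, 4, 2], [2, 4, 8, 16, 32])
```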

#### Hints:
You can append the layers to `layers = nn.ModuleList()` and then plug them into `nn.Sequential(*layers)`.
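To illustrate the hint, here is the `nn.ModuleList` plus `nn.Sequential` pattern on a hypothetical fully connected stack (`make_mlp` is not part of the assignment) whose width doubles from layer to layer, mirroring the channel doubling in the encoder. It is a sketch of the pattern only, not the autoencoder itself.

```python
import torch
import torch.nn as nn


def make_mlp(in_dim, hidden_dim, n_layers):
    """Hypothetical helper: n_layers Linear+ReLU blocks, doubling the width each time."""
    layers = nn.ModuleList()
    dim_in, dim_out = in_dim, hidden_dim
    for _ in range(n_layers):
        layers.append(nn.Linear(dim_in, dim_out))
        layers.append(nn.ReLU())
        dim_in, dim_out = dim_out, dim_out * 2  # next layer doubles the width
    return nn.Sequential(*layers)  # unpack the collected modules in order


mlp = make_mlp(in_dim=10, hidden_dim=4, n_layers=3)  # widths 10 -> 4 -> 8 -> 16
out = mlp(torch.randn(2, 10))
print(out.shape)  # torch.Size([2, 16])
```

A plain Python list would work here as well, since `nn.Sequential` registers the modules itself. In the autoencoder, remember that the last decoder layer is followed by a sigmoid rather than a ReLU.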


In [None]:
class Autoencoder(nn.Module):
    def __init__(self, in_channels, base_channels, n_layers, kernel_size, stride, padding):
        """Builds the autoencoder model

        Args:
            in_channels: number of input channels (3 for RGB images)
            base_channels: number of channels in the first conv layer of the encoder
            n_layers: number of conv/transp_conv layers in the encoder/decoder
            kernel_size: kernel size for the conv/transp_conv layers
            stride: stride for the conv/transp_conv layers
            padding: padding for the conv/transp_conv layers
        """
        super(Autoencoder, self).__init__()
        # Using nn.ModuleList is optional
        enc_layers = nn.ModuleList()
        dec_layers = nn.ModuleList()
        ### START CODE HERE ### (≈ 2 lines of code)
        ### END CODE HERE ###
        self.encoder = nn.Sequential(*enc_layers)
        self.decoder = nn.Sequential(*dec_layers)

    def forward(self, x):
        bottleneck = self.encoder(x)
        reconstruction = self.decoder(bottleneck)
        return reconstruction

    ### START CODE HERE ###
    ### END CODE HERE ###

def test_Autoencoder():
    base_channels=4
    n_layers = 4
    autoencoder = Autoencoder(base_channels=base_channels, n_layers=n_layers,
                                in_channels=5, kernel_size=3, stride=2, padding=1)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    autoencoder.to(device)
    x = torch.randn(1, 5, 32, 32).to(device)
    reconstruction = autoencoder(x)
    bottleneck = autoencoder.encode(x)
    assert bottleneck.shape == (1, 32 , 2, 2), "Bottleneck shape is not correct"
    print("Autoencoder test passed!")

test_Autoencoder()

# Validate function


First, we will implement the validation function, which we will later use in the train function. The idea is to test our model on the (unseen) validation data at regular intervals during training, e.g. after every epoch (one full pass over the training set).

`Exercise`: Implement the validation function, which reads in data from `val_loader`, runs a forward pass and records the loss. The validation loop typically follows these steps:
- Set the model to evaluation mode with `model.eval()`. This matters because it disables dropout and makes batch normalization use its running statistics instead of the statistics of the current batch.
- Disable gradient tracking (e.g. with `torch.no_grad()`), since no parameters are updated during validation.
- Iterate through the validation loader.
- For each batch, pass the input data through the model to get the predicted outputs.
- Compare the predicted outputs with the targets to calculate a loss value. For the autoencoder, the target is the input image itself.
- Aggregate the loss values across all batches (e.g. by averaging them) to get the validation loss of the epoch.
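The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference solution: `validate_sketch` is a hypothetical name, the target is the input itself (reconstruction), and the usage runs on a tiny fully connected stand-in model with random data. Weighting each batch loss by its batch size makes the result the exact mean over all samples even when the last batch is smaller.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def validate_sketch(model, val_loader, criterion, device):
    """Mean reconstruction loss over val_loader (the target is the input itself)."""
    model.eval()  # dropout off, batch norm uses its running statistics
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, _ in val_loader:  # class labels are ignored for reconstruction
            x = x.to(device)
            reconstruction = model(x)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(reconstruction, x).item() * x.size(0)
            n_samples += x.size(0)
    return total_loss / n_samples


# Toy usage: a tiny fully connected stand-in "autoencoder" on random data
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Linear(8, 2), nn.ReLU(), nn.Linear(2, 8), nn.Sigmoid()).to(device)
toy_data = TensorDataset(torch.rand(60, 8), torch.zeros(60, dtype=torch.long))
val_loader = DataLoader(toy_data, batch_size=16)  # last batch has only 12 samples
val_loss = validate_sketch(toy_model, val_loader, nn.MSELoss(), device)
print(f"validation loss: {val_loss:.4f}")
```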

In [None]:
def validate(model, val_loader, criterion, device):
    ### START CODE HERE ### (≈ 12 lines of code)
    ### END CODE HERE ###
    return val_loss_epoch

# Train one epoch function

The training loop typically follows these steps:
- Set the model to training mode with `model.train()`.
- Iterate through the training dataset in batches.
- Forward propagation: for each batch, pass the input data through the model to get the predicted outputs.
- Compute the loss: calculate the loss value between the predicted outputs and the targets (for the autoencoder, the input itself) using a loss function.
- Backward propagation: reset the old gradients (`optimizer.zero_grad()`) and compute the gradients of the loss with respect to the model parameters (`loss.backward()`).
- Update parameters: update the model parameters based on the gradients (`optimizer.step()`).
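The training steps above can be sketched like this. It is a minimal sketch, assuming a reconstruction objective (target = input) and an epoch loss averaged over samples; `train_one_epoch_sketch`, the stand-in model, the random data and the Adam learning rate are all illustrative assumptions rather than the intended solution.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch_sketch(model, optimizer, train_loader, criterion, device):
    """One pass over train_loader; returns the sample-weighted mean loss."""
    model.train()  # training behaviour for dropout / batch norm
    total_loss, n_samples = 0.0, 0
    for x, _ in train_loader:  # labels are not needed for reconstruction
        x = x.to(device)
        reconstruction = model(x)            # forward pass
        loss = criterion(reconstruction, x)  # the target is the input itself
        optimizer.zero_grad()                # clear gradients from the previous step
        loss.backward()                      # backward pass: compute gradients
        optimizer.step()                     # update the parameters
        total_loss += loss.item() * x.size(0)
        n_samples += x.size(0)
    return total_loss / n_samples


torch.manual_seed(0)
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Linear(8, 2), nn.ReLU(), nn.Linear(2, 8), nn.Sigmoid()).to(device)
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)
toy_data = TensorDataset(torch.rand(64, 8), torch.zeros(64, dtype=torch.long))
train_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
losses = [train_one_epoch_sketch(toy_model, optimizer, train_loader, nn.MSELoss(), device)
          for _ in range(30)]
print(f"first epoch: {losses[0]:.4f}, last epoch: {losses[-1]:.4f}")
```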

In [None]:
def train_one_epoch(model, optimizer, train_loader, criterion, device):
    ### START CODE HERE ### (≈ 12 lines of code)
    ### END CODE HERE ###
    return loss_curr_epoch

# Train code

Combine `train_one_epoch` and `validate` into a single training function.

A dictionary should be returned with the train and validation loss per epoch.

Save the model with the smallest validation loss, e.g. using `save_model` below.
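The control flow of such a training function, with logging and best-checkpoint saving, can be sketched independently of PyTorch. In this hypothetical `fit_sketch`, the per-epoch work is passed in as callables, and the loss values in the usage are made up purely to exercise the logic; they are not real results.

```python
import math


def fit_sketch(num_epochs, train_epoch_fn, val_fn, save_fn):
    """Hypothetical driver: logs losses per epoch and checkpoints on a new best val loss.

    train_epoch_fn() -> float, val_fn() -> float, save_fn(epoch, val_loss) -> None
    """
    dict_log = {"train_loss": [], "val_loss": []}
    best_val_loss = math.inf
    for epoch in range(num_epochs):
        train_loss = train_epoch_fn()
        val_loss = val_fn()
        dict_log["train_loss"].append(train_loss)
        dict_log["val_loss"].append(val_loss)
        if val_loss < best_val_loss:  # only save when validation improves
            best_val_loss = val_loss
            save_fn(epoch, val_loss)
        print(f"Ep {epoch + 1}/{num_epochs}: train {train_loss:.4f}, val {val_loss:.4f}")
    return dict_log


# Dummy loss values, made up purely to exercise the control flow
fake_train = iter([0.050, 0.020, 0.010, 0.008])
fake_val = iter([0.040, 0.030, 0.035, 0.020])
saved_epochs = []
log = fit_sketch(4, lambda: next(fake_train), lambda: next(fake_val),
                 lambda epoch, val_loss: saved_epochs.append(epoch))
print(saved_epochs)  # [0, 1, 3]
```

In the notebook, `train_epoch_fn` would wrap `train_one_epoch(model, optimizer, train_loader, criterion, device)`, `val_fn` would wrap `validate(model, val_loader, criterion, device)`, and `save_fn` would call `save_model(model, path, epoch, optimizer, val_loss)`.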

In [None]:
def save_model(model, path, epoch, optimizer, val_loss):
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': val_loss,
        }, path)

def train(model, optimizer, num_epochs, train_loader, val_loader, criterion, device):
    dict_log = {"train_loss":[], "val_loss":[]}
    ### START CODE HERE ### (≈ 12 lines of code)
        ### END CODE HERE ###
    return dict_log

# Launch training!
Choose the model and hyperparameters, then launch the training.
Training should take less than 1 h (about 26 min with our setup).

In [None]:
### START CODE HERE ###
### END CODE HERE ###

### Expected result
Val Loss $\sim 0.003$

# Loading the best model and visualizing reconstructions

In [None]:
def load_model(model, path):
    # map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint['model_state_dict'])
    print(f"Model {path} is loaded from epoch {checkpoint['epoch']}, loss {checkpoint['loss']}")
    return model

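def get_reconstruction(autoencoder, img_batch, device=None):
    if device is None:  # default to the GPU when one is available
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    autoencoder.eval()
    autoencoder = autoencoder.to(device)
    img_batch = img_batch.to(device)
    with torch.no_grad():  # inference only, no gradients needed
        reconstruction = autoencoder(img_batch)
    return reconstruction.cpu()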

### START CODE HERE ### (≈ 8 lines of code)
### END CODE HERE ###


# Plotting code is provided for you
batch = img_batch.shape[0]
plt.figure(figsize=(14,10))
plt.subplots_adjust(wspace=0.1, hspace=0.3)
for i in range(batch):
    # one row per image, four panels per row
    plt.subplot(batch, 4, i * 4 + 1)
    plt.title("Original Image")
    imshow(img_batch[i, ...])
    plt.subplot(batch, 4, i * 4 + 2)
    imshow(reconstructions[i, ...])
    plt.title("Reconstruction")
    plt.subplot(batch, 4, i * 4 + 3)
    plt.imshow(l1_diff[i, ...], cmap='gray')
    plt.title("L1 Diff")
    plt.subplot(batch, 4, i * 4 + 4)
    plt.imshow(l2_diff[i, ...], cmap='gray')
    plt.title("L2 Diff")
plt.savefig("viz_reconstructions.png")
plt.show()
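The plotting code expects per-pixel difference maps `l1_diff` and `l2_diff` of shape (B, H, W). One possible way to compute them is sketched below, assuming the absolute and squared differences are averaged over the color channels (the reference solution may use a different variant); `diff_maps` is a hypothetical helper, and the usage runs on random 96 × 96 tensors (the STL10 image size) instead of real reconstructions.

```python
import torch


def diff_maps(img_batch, reconstructions):
    """Per-pixel L1 and squared-L2 error maps, averaged over color channels.

    Both inputs: tensors of shape (B, C, H, W); outputs: two tensors of shape (B, H, W).
    """
    diff = img_batch - reconstructions
    l1_diff = diff.abs().mean(dim=1)   # mean absolute error per pixel
    l2_diff = diff.pow(2).mean(dim=1)  # mean squared error per pixel
    return l1_diff, l2_diff


imgs = torch.rand(4, 3, 96, 96)  # random stand-ins for an image batch
recs = torch.rand(4, 3, 96, 96)  # and for its reconstructions
l1_diff, l2_diff = diff_maps(imgs, recs)
print(l1_diff.shape, l2_diff.shape)  # torch.Size([4, 96, 96]) torch.Size([4, 96, 96])
```

For pixel values in [0, 1], every difference has magnitude at most 1, so the squared map is never brighter than the absolute one; it mainly highlights large errors.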

### Expected result

![im1](https://github.com/HHU-MMBS/RepresentationLearning_PUBLIC_2024/raw/main/exercises/week01/figs/viz_reconstructions_solution.png)

# Get the features from the trained encoder

In [None]:
# Provided
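def get_features(model, dataloader, device):
    """Runs `model` (e.g. the trained encoder) over `dataloader`; returns flattened features and labels."""
    model = model.to(device)
    model.eval()  # inference behaviour for dropout / batch norm
    feats, labs = [], []
    with torch.no_grad():  # no gradients needed for feature extraction
        for inp_data, labels in dataloader:
            inp_data = inp_data.to(device)
            features = model(inp_data)
            features = features.cpu().flatten(start_dim=1)
            feats.append(features)
            labs.append(labels.cpu())
    f = torch.cat(feats, dim=0)
    l = torch.cat(labs, dim=0)
    return f, l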

### START CODE HERE ### (≈ 8 lines of code)
### END CODE HERE ###
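The linear evaluation below works with saved features. If you want to persist them to disk between sessions, one option is a dict of tensors written with `torch.save`. The sketch uses random stand-in tensors (not real STL10 features) and a hypothetical file name; in the notebook, the tensors returned by `get_features` would take their place.

```python
import os
import tempfile

import torch

# Random stand-ins for illustration only; in the notebook these would come from get_features(...)
train_feats, train_labels = torch.randn(100, 2048), torch.randint(0, 10, (100,))

path = os.path.join(tempfile.gettempdir(), "stl10_train_features.pt")  # hypothetical file name
torch.save({"features": train_feats, "labels": train_labels}, path)

checkpoint = torch.load(path)  # a dict of plain tensors
assert torch.equal(checkpoint["features"], train_feats)
print(checkpoint["features"].shape, checkpoint["labels"].shape)
```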

# Linear evaluation: Probing

Linear evaluation, or linear probing, is a technique used in deep learning to evaluate the quality of the representations learned by a pre-trained neural network. The basic idea is to freeze the pre-trained network and train only a linear classifier on top of its features, using a downstream task that can be different from the pre-training objective.

The pre-trained network is usually trained on a large dataset with the aim of learning useful representations that can be reused for other tasks. By training a linear classifier on top of the pre-trained network, we can evaluate how well the learned representations capture the relevant information for the new task. If the pre-trained network has learned useful representations, we should be able to achieve good performance on the new task without having to train the entire network from scratch.

Below, you have to use the saved features to train a linear classifier on the training split of STL10.


```python
def train_one_epoch(model, optimizer, train_loader, device): ...
def validate(model, val_loader, device): ...
def linear_eval(model, optimizer, num_epochs, train_loader, val_loader, device): ...
```
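A minimal sketch of a linear probe, assuming the features have already been extracted into tensors. It trains a single `nn.Linear` layer with cross-entropy on top of fixed features and reports accuracy in percent. The helper names, the Adam optimizer, the learning rate and the batch size are assumptions, not the reference solution. The usage runs on synthetic, linearly separable features rather than STL10 features, so its accuracy says nothing about the expected results.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def probe_accuracy(classifier, loader, device):
    """Accuracy in percent of `classifier` on (features, labels) batches."""
    classifier.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for feats, labels in loader:
            feats, labels = feats.to(device), labels.to(device)
            preds = classifier(feats).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


def linear_probe_sketch(train_feats, train_labels, val_feats, val_labels,
                        num_classes, num_epochs=30, lr=1e-2, device="cpu"):
    """Trains one linear layer on fixed features; returns (classifier, train_acc, val_acc)."""
    train_loader = DataLoader(TensorDataset(train_feats, train_labels), batch_size=64, shuffle=True)
    val_loader = DataLoader(TensorDataset(val_feats, val_labels), batch_size=64)
    classifier = nn.Linear(train_feats.shape[1], num_classes).to(device)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(num_epochs):
        classifier.train()
        for feats, labels in train_loader:
            feats, labels = feats.to(device), labels.to(device)
            loss = criterion(classifier(feats), labels)  # logits vs. class indices
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    train_acc = probe_accuracy(classifier, train_loader, device)
    val_acc = probe_accuracy(classifier, val_loader, device)
    return classifier, train_acc, val_acc


# Synthetic, linearly separable stand-in features (NOT real STL10 features)
torch.manual_seed(0)
num_classes, dim, n = 3, 16, 300
labels = torch.randint(0, num_classes, (n,))
feats = 3.0 * torch.eye(num_classes, dim)[labels] + torch.randn(n, dim)
classifier, train_acc, val_acc = linear_probe_sketch(
    feats[:240], labels[:240], feats[240:], labels[240:], num_classes)
print(f"Train: {train_acc:.2f}  Val: {val_acc:.2f}")
```

Only the linear layer's parameters are given to the optimizer, so the encoder that produced the features stays frozen, which is what makes this a probe of the learned representation.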

In [None]:
### START CODE HERE ### (> 20 lines of code)
### END CODE HERE ###

### Expected results (< 1 minute training)
Ep 29/30: Accuracy : Train:67.76 	 Val:46.95

# Conclusion and Bonus reads

That's the end of this exercise. If you reached this point, congratulations!

If you are interested in delving further into this topic, here are some links:

- [Recent Advances in Autoencoder-Based Representation Learning](https://arxiv.org/abs/1812.05069)
- [Understanding Representation Learning With Autoencoder](https://neptune.ai/blog/representation-learning-with-autoencoder)
- [Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations](https://pubmed.ncbi.nlm.nih.gov/30842833/)
