# Autoencoders (AE)

The **MNIST dataset** consists of 60,000 training images (plus 10,000 test images) of handwritten digits, where each grayscale image has size 28×28 pixels.

We will define and train two **autoencoders (AEs)** on the MNIST dataset:
- A simple autoencoder composed of linear layers
- A more complex autoencoder composed of convolutional layers.

<img src="files/figures/mnist.png" width="600px"/>

## 0. Libraries

In [2]:
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

import matplotlib.pyplot as plt

## 1. Load MNIST data

- The following code loads the MNIST training data.
- Note that we do not use a validation set, since we are in an unsupervised setting (the labels are only used later, to color the latent space).

In [None]:
# data
transform = transforms.ToTensor()

mnist_data = datasets.MNIST(root='./data', train=True,
                            download=True,
                            transform=transform)

In [None]:
# dataloader
train_loader = torch.utils.data.DataLoader(dataset=mnist_data, batch_size=64, shuffle=True)

In [None]:
len(train_loader), len(train_loader) * 64, len(mnist_data)  # (938, 60032, 60000): the last batch only holds 32 images

In [None]:
# data format
data_iter = iter(train_loader)
images, labels = next(data_iter)
images.shape, labels.shape

In [None]:
torch.min(images), torch.max(images)

## 2. AE with linear layers

### Model

The following class implements an **autoencoder (AE)** composed of **linear layers**.

- Understand the architecture of this **autoencoder (AE)**.

In [None]:
torch.cuda.is_available()

In [None]:
class Autoencoder_Linear(nn.Module):
    """Implements an autoencoder made of fully connected (linear) layers"""

    def __init__(self):
        """constructor"""

        super().__init__()

        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 12),
            nn.ReLU(),
            nn.Linear(12, 2) # -> N, 2 only!
        )

        self.decoder = nn.Sequential(
            nn.Linear(2, 12),
            nn.ReLU(),
            nn.Linear(12, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid()             # output neurons between 0 and 1
        )

    def forward(self, x):
        """forward pass"""
        
        encoded_data = self.encoder(x)
        decoded_data = self.decoder(encoded_data)

        return decoded_data
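
To check the architecture, here is a minimal sketch that rebuilds the same layer stack (784 → 128 → 64 → 12 → 2 and back) with a hypothetical helper `mlp`, pushes a random batch through it, and counts the parameters. It assumes, as the model does, that each 28×28 image is flattened into a vector of 784 values before entering the encoder.

```python
import torch
import torch.nn as nn


def mlp(dims, final_act=None):
    """Hypothetical helper: Linear layers of widths `dims`, with ReLU between them."""
    layers = []
    for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        layers.append(nn.Linear(d_in, d_out))
        if i < len(dims) - 2:          # no ReLU after the last Linear layer
            layers.append(nn.ReLU())
    if final_act is not None:
        layers.append(final_act)
    return nn.Sequential(*layers)


widths = [28 * 28, 128, 64, 12, 2]          # same widths as Autoencoder_Linear
encoder = mlp(widths)                        # 784 -> 2
decoder = mlp(widths[::-1], nn.Sigmoid())    # 2 -> 784, outputs in (0, 1)

x = torch.rand(64, 1, 28, 28)                # fake batch shaped like MNIST images
x_flat = x.reshape(x.size(0), -1)            # (64, 784): flatten each image
z = encoder(x_flat)                          # (64, 2): latent codes
x_hat = decoder(z)                           # (64, 784): reconstructions

n_params = sum(p.numel() for p in encoder.parameters()) + \
    sum(p.numel() for p in decoder.parameters())
print(z.shape, x_hat.shape, n_params)        # torch.Size([64, 2]) torch.Size([64, 784]) 219866
```

Most of the roughly 220k parameters sit in the first and last layers (784 × 128 weights each), while the bottleneck squeezes 784 pixel values into only 2 numbers, which is what makes the latent space directly plottable later on.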

### Training

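- Define a **loss (`nn.MSELoss`)** and an **optimizer (`torch.optim.Adam`)** with learning rate `lr=1e-3` for this model.

- Implement a **training loop** for this model over `num_epochs = 24` epochs. Since the encoder starts with `nn.Linear(28 * 28, 128)`, each batch of shape (N, 1, 28, 28) must be flattened to (N, 784) before the forward pass, and the reconstruction is compared with the flattened input itself. After each epoch, store the current `epoch`, `inputs`, `outputs` and `train_loss` of the model (in this order) in a list called `outputs_l`, where `inputs` and `outputs` can be, for instance, the last batch of the epoch.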
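
One possible training loop, written as a hypothetical helper `train_autoencoder` (not part of the original notebook), is sketched below. It assumes that each `outputs_l` entry holds the *last* batch of the epoch and that `train_loss` is the mean per-sample MSE over the epoch; the `flatten` flag reshapes images to (N, 784) for `Autoencoder_Linear` and should be set to `False` for a convolutional model.

```python
import torch
import torch.nn as nn


def train_autoencoder(model, loader, num_epochs, lr=1e-3, flatten=True, device="cpu"):
    """Hypothetical helper: train `model` to reconstruct its inputs with MSE.

    Returns `outputs_l`, a list of (epoch, inputs, outputs, train_loss) tuples,
    one per epoch, where inputs/outputs are the last batch of that epoch.
    """
    model = model.to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    outputs_l = []

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, _ in loader:                 # labels are ignored: unsupervised
            images = images.to(device)
            if flatten:                          # linear AE expects (N, 784)
                images = images.reshape(images.size(0), -1)
            recon = model(images)
            loss = criterion(recon, images)      # the target is the input itself

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * images.size(0)

        train_loss = running_loss / len(loader.dataset)   # mean MSE over the epoch
        outputs_l.append((epoch, images.detach().cpu(), recon.detach().cpu(), train_loss))
        print(f"Epoch {epoch + 1}/{num_epochs}, loss = {train_loss:.4f}")

    return outputs_l
```

With the notebook's objects, a call such as `outputs_l = train_autoencoder(Autoencoder_Linear(), train_loader, num_epochs)` (after setting `num_epochs = 24`) would produce the list expected by the plotting cell. Storing CPU copies keeps the plotting code working even when training runs on a GPU.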

### Results

- Plot the training loss to confirm that the model has been trained.

- Run the following code, which displays the inputs and their reconstructions by the autoencoder at epochs $1$, $13$ and $24$ (entries `0`, `num_epochs//2` and `num_epochs-1` of `outputs_l`). The reconstructions should improve as the epochs increase.

In [None]:
fig, axs = plt.subplots(6, 10, 
                        figsize=(10, 7), 
                        layout="constrained",
                        gridspec_kw={'wspace': 0.1, 'hspace': 0.1})

for i, k in enumerate([0, num_epochs//2, num_epochs-1]):

    inputs = outputs_l[k][1].detach().cpu().numpy()    # .cpu() in case training ran on a GPU
    outputs = outputs_l[k][2].detach().cpu().numpy()

    axs[2*i, 0].set_title(f"Epoch {k+1}")
    axs[2*i, 0].set_ylabel("Inputs")

    for j, item in enumerate(inputs):

        if j >= 10: break
        item = item.reshape(-1, 28,28) # for Autoencoder_Linear
        axs[2*i, j].imshow(item[0])
        axs[2*i, j].set_xticks([])
        axs[2*i, j].set_yticks([])

    axs[2*i + 1, 0].set_ylabel("Outputs")
    for j, item in enumerate(outputs):

        if j >= 10: break
        item = item.reshape(-1, 28,28) # for Autoencoder_Linear
        axs[2*i + 1, j].imshow(item[0])
        axs[2*i + 1, j].set_xticks([])
        axs[2*i + 1, j].set_yticks([])

fig.savefig('figures/AE_linear.pdf', format='pdf', bbox_inches='tight')
plt.show()
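
For the first Results bullet (plotting the training loss), a minimal sketch is given below. It assumes `outputs_l` has the `(epoch, inputs, outputs, train_loss)` layout described in the Training section; `plot_train_loss` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def plot_train_loss(outputs_l, title="Training loss"):
    """Plot the per-epoch training loss stored at index 3 of each entry."""
    epochs = [entry[0] + 1 for entry in outputs_l]   # 1-based epoch numbers
    losses = [entry[3] for entry in outputs_l]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, losses, marker="o")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("MSE loss")
    ax.set_title(title)
    return fig, ax
```

Calling `plot_train_loss(outputs_l)` followed by `plt.show()` should give a decreasing curve if the model has been trained.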

## 3. AE with convolutional layers

### Model

The following class implements an **autoencoder (AE)** composed of **convolutional layers**.

- Understand the architecture of this **autoencoder (AE)**.

In [None]:
class Autoencoder_CNN(nn.Module):
    """Implements a CNN autoencoder"""

    def __init__(self):
        """constructor"""

        super().__init__()

        # N, 1, 28, 28
        self.encoder = nn.Sequential(
            # -> N, 16, 14, 14
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            # -> N, 32, 7, 7
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            # -> N, 64, 1, 1
            nn.Conv2d(32, 64, 7)
            )

        # N , 64, 1, 1
        self.decoder = nn.Sequential(
            # -> N, 32, 7, 7
            nn.ConvTranspose2d(64, 32, 7),
            nn.ReLU(),
            # -> N, 16, 14, 14 (N, 16, 13, 13 without output_padding)
            nn.ConvTranspose2d(32, 16, 3, 
                               stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            # -> N, 1, 28, 28 (N, 1, 27, 27 without output_padding)
            nn.ConvTranspose2d(16, 1, 3, 
                               stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
            )

    def forward(self, x):
        """forward function"""

        encoded_data = self.encoder(x)
        decoded_data = self.decoder(encoded_data)

        return decoded_data


# Note: if the encoder uses nn.MaxPool2d, the decoder can use nn.MaxUnpool2d, or adjust kernel_size, stride, etc. to compensate.
# If the inputs are normalized to [-1, +1], use nn.Tanh instead of nn.Sigmoid as the last activation.
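
The shape comments in the class follow from the standard output-size formulas (with dilation 1): $H_{out} = \lfloor (H_{in} + 2p - k)/s \rfloor + 1$ for `nn.Conv2d` and $H_{out} = (H_{in} - 1)s - 2p + k + \text{output\_padding}$ for `nn.ConvTranspose2d`. The sketch below simply evaluates them; `conv_out` and `convT_out` are hypothetical helper names. Because a stride-2 convolution maps both 13 and 14 pixels to 7, its transpose is ambiguous, and `output_padding=1` selects the larger size.

```python
def conv_out(h, k, s=1, p=0):
    """Output size of nn.Conv2d (dilation=1)."""
    return (h + 2 * p - k) // s + 1


def convT_out(h, k, s=1, p=0, op=0):
    """Output size of nn.ConvTranspose2d (dilation=1, output_padding=op)."""
    return (h - 1) * s - 2 * p + k + op


# Encoder: 28 -> 14 -> 7 -> 1
enc = [28]
enc.append(conv_out(enc[-1], k=3, s=2, p=1))
enc.append(conv_out(enc[-1], k=3, s=2, p=1))
enc.append(conv_out(enc[-1], k=7))

# Decoder: 1 -> 7 -> 14 -> 28 (output_padding=1 recovers the missing row/column)
dec = [1]
dec.append(convT_out(dec[-1], k=7))
dec.append(convT_out(dec[-1], k=3, s=2, p=1, op=1))
dec.append(convT_out(dec[-1], k=3, s=2, p=1, op=1))

print(enc, dec)                          # [28, 14, 7, 1] [1, 7, 14, 28]
print(convT_out(7, k=3, s=2, p=1))       # 13 without output_padding
```

Note that the resulting code has shape (N, 64, 1, 1), i.e. this latent space is 64-dimensional rather than 2-dimensional.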

### Training

- Define a **loss** (`nn.MSELoss`) and an **optimizer** (`torch.optim.Adam`) with learning rate `lr=1e-3` for this model.

- Implement a **training loop** for this model over `num_epochs = 12` epochs. Unlike the linear model, the images are fed with their original shape (N, 1, 28, 28), without flattening. After each epoch, store the current `epoch`, `inputs`, `outputs` and `train_loss` of the model (in this order) in a list called `outputs_l`.

### Results

- Plot the training loss to confirm that the model has been trained.

- Run the following code, which displays the inputs and their reconstructions by the autoencoder at epochs $1$, $7$ and $12$. The reconstructions should improve as the epochs increase.

In [None]:
fig, axs = plt.subplots(6, 10, 
                        figsize=(10, 7), 
                        layout="constrained",
                        gridspec_kw={'wspace': 0.1, 'hspace': 0.1})

for i, k in enumerate([0, num_epochs//2, num_epochs-1]):

    inputs = outputs_l[k][1].detach().cpu().numpy()    # .cpu() in case training ran on a GPU
    outputs = outputs_l[k][2].detach().cpu().numpy()

    axs[2*i, 0].set_title(f"Epoch {k+1}")
    axs[2*i, 0].set_ylabel("Inputs")

    for j, item in enumerate(inputs):

        if j >= 10: break
        axs[2*i, j].imshow(item[0])
        axs[2*i, j].set_xticks([])
        axs[2*i, j].set_yticks([])

    axs[2*i + 1, 0].set_ylabel("Outputs")
    for j, item in enumerate(outputs):

        if j >= 10: break
        axs[2*i + 1, j].imshow(item[0])
        axs[2*i + 1, j].set_xticks([])
        axs[2*i + 1, j].set_yticks([])

fig.savefig('figures/AE_cnn.pdf', format='pdf', bbox_inches='tight')
plt.show()

## 4. Visualizing the latent space

- Select about 1000 input images from your dataset, i.e. roughly 100 samples of each digit (0, 1, 2, ..., 9). With shuffled batches of 64, this amounts to about 1000 / 64 ≈ 16 batches, so the per-digit counts will only be approximately equal.

- Compute the **encodings** of these samples in the **2D latent space**. This requires the linear autoencoder (`Autoencoder_Linear`), whose bottleneck has 2 units; the CNN encoder produces 64-dimensional codes of shape (N, 64, 1, 1). Note that the **encoder** part of the **autoencoder** `model` can be selected using the following instruction: `encoder = model.encoder.eval()`

- Plot the encoded data in the 2D latent space, coloring each point by its original label. This gives an idea of how the different digits are distributed in the latent space.
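
A minimal sketch of these steps is given below, assuming a trained linear model (2-dimensional latent space) and a loader yielding `(images, labels)` batches; `encode_samples` and `plot_latent` are hypothetical helper names, not part of the original notebook.

```python
import torch
import matplotlib.pyplot as plt


@torch.no_grad()
def encode_samples(encoder, loader, n_samples=1000, flatten=True):
    """Encode roughly `n_samples` images; return (codes, labels) as NumPy arrays."""
    encoder.eval()
    device = next(encoder.parameters()).device
    codes, labels = [], []
    n = 0
    for images, y in loader:
        if flatten:                                  # linear AE expects (N, 784)
            images = images.reshape(images.size(0), -1)
        codes.append(encoder(images.to(device)).cpu())
        labels.append(y)
        n += images.size(0)
        if n >= n_samples:                           # e.g. 16 batches of 64 = 1024
            break
    return torch.cat(codes).numpy(), torch.cat(labels).numpy()


def plot_latent(codes, labels):
    """Scatter plot of the 2D codes, one color per digit."""
    fig, ax = plt.subplots(figsize=(7, 6))
    sc = ax.scatter(codes[:, 0], codes[:, 1], c=labels, cmap="tab10", s=8)
    fig.colorbar(sc, ax=ax, label="digit")
    ax.set_xlabel("$z_1$")
    ax.set_ylabel("$z_2$")
    return fig, ax
```

With the notebook's objects, this could be used as `codes, labels = encode_samples(model.encoder, train_loader)` followed by `plot_latent(codes, labels)`.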

## 5. Generative capabilities of the decoder

### 5.1: Uniform sampling from the latent space

- Using the 1000 encoded data points from the previous exercise, estimate the minimal and maximal coordinates $x$ and $y$ of your encoded data in the 2D latent space ($x_{min}$, $x_{max}$, $y_{min}$, $y_{max}$).

- Sample 100 2D points from the latent space according to the uniform distribution $\mathcal{U}([x_{min}, x_{max}] \times [y_{min}, y_{max}])$.<br>
For this purpose, you can use the instruction:<br> 
`np.random.uniform(low=xy_min, high=xy_max, size=(100,2))`<br>
where `xy_min` $= (x_{min}, y_{min})$ and `xy_max` $= (x_{max}, y_{max})$.

- Compute the **decoded images** of these 2D points. For this purpose, note that the **decoder** part of the **autoencoder** `model` can be selected using the following instruction: `decoder = model.decoder.eval()`. Store these images in a list called `decoded_samples`.

- At this stage, you have **generated** new images by *sampling* the 2D latent space and *decoding* these samples. Plot these decoded images. Are these images of good quality? Do they resemble the digits of the dataset?
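
A minimal sketch of the uniform sampling, decoding and plotting steps, assuming `codes` is a NumPy array of shape (N, 2) holding the encoded data and that the decoder outputs flat 784-pixel vectors (as `Autoencoder_Linear` does); `sample_uniform_and_decode` and `plot_grid` are hypothetical helper names.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt


@torch.no_grad()
def sample_uniform_and_decode(decoder, codes, n=100):
    """Sample n points uniformly in the bounding box of `codes` and decode them."""
    xy_min = codes.min(axis=0)                  # [x_min, y_min]
    xy_max = codes.max(axis=0)                  # [x_max, y_max]
    z = np.random.uniform(low=xy_min, high=xy_max, size=(n, 2))
    decoder.eval()
    device = next(decoder.parameters()).device
    images = decoder(torch.as_tensor(z, dtype=torch.float32, device=device))
    return z, images.cpu().reshape(-1, 28, 28).numpy()   # (n, 28, 28)


def plot_grid(images, ncols=10):
    """Show the images on a grid without axes."""
    nrows = int(np.ceil(len(images) / ncols))
    fig, axs = plt.subplots(nrows, ncols, figsize=(ncols, nrows), squeeze=False)
    for ax in axs.flat:
        ax.axis("off")
    for ax, img in zip(axs.flat, images):
        ax.imshow(img, cmap="gray")
    return fig
```

For example, `z, decoded_samples = sample_uniform_and_decode(model.decoder, codes)` followed by `plot_grid(decoded_samples)` would display the 100 generated images on a 10 × 10 grid.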

### 5.2 Normal sampling from the latent space

- Using the 1000 encoded data points from the previous exercise, estimate the mean $\mu$ and variance $\sigma^2$ of your encoded data in the 2D latent space.

- Sample 100 2D points from the latent space according to the normal distribution $\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$.<br>
For this purpose, you can use the instruction:<br> 
`np.random.multivariate_normal(mean, cov, size=100)`<br>
where `cov = np.diag(var)` is the diagonal covariance matrix whose diagonal is the variance vector $\sigma^2$.

- Compute the **decoded images** of these 2D points. For this purpose, note that the **decoder** part of the **autoencoder** `model` can be selected using the following instruction: `decoder = model.decoder.eval()`. Store these images in a list called `decoded_samples`.

- At this stage, you have **generated** new images by *sampling* the 2D latent space and *decoding* these samples. Plot these decoded images. Are these images of good quality? Do they resemble the digits of the dataset?
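
A minimal sketch of the Gaussian variant, under the same assumptions as before (`codes` is an (N, 2) NumPy array of encodings and the decoder outputs 784-pixel vectors); `sample_normal_and_decode` is a hypothetical helper name.

```python
import numpy as np
import torch


@torch.no_grad()
def sample_normal_and_decode(decoder, codes, n=100):
    """Fit a diagonal Gaussian to `codes`, sample n latent points, decode them."""
    mean = codes.mean(axis=0)                   # mu, shape (2,)
    var = codes.var(axis=0)                     # sigma^2, shape (2,)
    cov = np.diag(var)                          # diagonal covariance matrix
    z = np.random.multivariate_normal(mean, cov, size=n)   # (n, 2)
    decoder.eval()
    device = next(decoder.parameters()).device
    images = decoder(torch.as_tensor(z, dtype=torch.float32, device=device))
    return z, images.cpu().reshape(-1, 28, 28).numpy()   # (n, 28, 28)
```

The diagonal covariance ignores any correlation between the two latent coordinates; passing the full covariance `np.cov(codes.T)` instead would be a possible alternative.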

**Conclusion:** In both cases, the generated images are somewhat blurry and some digits are over-represented.