[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alexwolson/postdocbootcamp2023/blob/main/lab_2_1_autoencoders.ipynb)


# UofT DSI-CARTE Postdoc Bootcamp
#### Wednesday, July 19, 2023
#### Autoencoders - Lab 1, Day 2
#### Teaching team: Alex Olson, Nakul Upadhya, Shehnaz Islam
##### Lab author: Alex Olson, edited by Jake Mosseri and Shehnaz Islam


Today we are going to learn about a type of deep learning model used for dimensionality reduction and data compression called Autoencoders.

An autoencoder is a special type of neural network with an unusual task: for some input X, all it has to do is return that input X as accurately as possible. But there's a catch, of course! Between the input and the output, the number of nodes in each hidden layer actually gets progressively smaller. This means that in the first half of the network, the network must learn how to represent the input in ever more compact formats. The second half does this in reverse, taking the smallest representation of the input and expanding it back out into the full, original data.

Thus, autoencoders consist of an encoder and a decoder part, which are symmetric in structure. The encoder part compresses the input data into a lower-dimensional representation, while the decoder part tries to reconstruct the original input from this compressed representation.

<img src="https://github.com/lyeskhalil/mlbootcamp/blob/master/img/ae.png?raw=1" alt="autoencoder architecture" width="500"/>

Autoencoders can learn far more expressive compact representations than a linear dimensionality reduction method such as PCA, as we will see here. PCA builds its lower-dimensional representation from linear combinations of the original features, whereas autoencoders can learn non-linear mappings thanks to the non-linear activation functions between their layers. (t-SNE, another common technique, is also non-linear, but it is designed for visualization and does not provide a way to map points back to the original space.)
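
To make "linear" concrete, here is a minimal sketch on random stand-in data (not MNIST; the array `X` and its shape are assumptions for illustration). It checks that scikit-learn's PCA projection and reconstruction each amount to one matrix multiplication plus a mean shift, which is the kind of mapping an autoencoder is not limited to.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in data: 200 samples, 10 features

pca = PCA(n_components=2).fit(X)

# Projection: centre the data, then multiply by the component matrix
Z_manual = (X - pca.mean_) @ pca.components_.T
Z_sklearn = pca.transform(X)

# Reconstruction: multiply back by the components and add the mean
X_rec_manual = Z_manual @ pca.components_ + pca.mean_
X_rec_sklearn = pca.inverse_transform(Z_sklearn)

print(np.allclose(Z_manual, Z_sklearn))
print(np.allclose(X_rec_manual, X_rec_sklearn))
```

Both checks should agree, since there is no non-linearity anywhere in the PCA pipeline.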

Let's first implement PCA and give it an autoencoder's task: first, reduce the dimensionality of a dataset, and then recover the full dimensionality of the original input.

We will use the MNIST dataset today, loaded through a slightly different mechanism to make it easier to use with PyTorch. Run the code below to download the dataset:

In [None]:
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision

import numpy as np

from pathlib import Path

import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# Path parameters
MNIST_PATH = Path("./mnist/")
DOWNLOAD_MNIST = not MNIST_PATH.exists()

In [None]:
train_data = torchvision.datasets.MNIST(
    root="./mnist/",
    train=True,  # this is training data
    transform=torchvision.transforms.ToTensor(),  # Converts a PIL.Image or numpy.ndarray to
    # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
    download=DOWNLOAD_MNIST,  # download it if you don't have it
)

In [None]:
def plot_image(data, index):
    plt.imshow(data.data[index].numpy(), cmap="gray")
    plt.title(f"{data.targets[index]}")
    plt.show()


print(f"Training data size:\t {train_data.data.size()}")  # (60000, 28, 28)
print(f"Training targets size:\t {train_data.targets.size()}")  # (60000)

plot_image(train_data, 2)
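
One detail worth noting: `train_data.data` holds the raw pixel values as unsigned 8-bit integers in [0, 255]. The `ToTensor()` transform is only applied when samples are fetched by indexing or through a `DataLoader`, which is why later cells that read `train_data.data` directly divide by 255. A small sketch with a made-up tensor (`fake_images` is a stand-in, not MNIST) shows that manual flatten-and-scale step:

```python
import torch

# Stand-in for train_data.data: 5 random 28x28 uint8 "images"
fake_images = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)

# Flatten each image to a 784-vector and rescale to [0, 1]
flat = fake_images.view(-1, 28 * 28).type(torch.FloatTensor) / 255.0

print(flat.shape, flat.dtype)
print(flat.min().item(), flat.max().item())
```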

In [None]:
from sklearn.decomposition import PCA

# Reshape the data into 2D array
reshaped_data = train_data.data.numpy().reshape(60000, 28 * 28)

# Perform PCA to reduce the data to 2 dimensions
pca = PCA(n_components=2)
pca.fit(reshaped_data)
data_pca = pca.transform(reshaped_data)

# Transform the data back to its original size
data_pca_inv = pca.inverse_transform(data_pca)

# Plot the image in original dimension
plt.imshow(data_pca_inv[2].reshape(28, 28), cmap="gray")
plt.title(f"{train_data.targets[2]}")
plt.show()

**YOUR TURN**
* How does the image look after dimensionality reduction compared to the input? ______
* Why might it look this way? ______

Let's now move on to building an autoencoder for the same task. Autoencoders are easy networks to build, split into the _encoder_, which 'steps' the data down to the final compact representation, and the _decoder_, which is a mirror image of the encoder.


In [None]:
myEncoder = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.Tanh(),
    nn.Linear(128, 64),
    nn.Tanh(),
    nn.Linear(64, 12),
    nn.Tanh(),
    nn.Linear(12, 2),
)

As you may expect, designing the structure of the encoder is something of an art: it requires balancing performance against the time and data needed to train the network. Here we are using four step-down operations (the linear layers), which take us from the input of size 784 down to just two dimensions at the bottom. Between each pair of step-down layers is a non-linear activation layer.
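
To see the step-down in action, the sketch below rebuilds the same layer sizes under a separate name (`sketch_encoder`, so it does not overwrite `myEncoder`) and pushes a random batch through it one layer at a time, printing the shape after each linear layer. The random input is a stand-in for a batch of flattened images.

```python
import torch
import torch.nn as nn

sketch_encoder = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.Tanh(),
    nn.Linear(128, 64),
    nn.Tanh(),
    nn.Linear(64, 12),
    nn.Tanh(),
    nn.Linear(12, 2),
)

x = torch.rand(4, 28 * 28)  # stand-in batch of 4 flattened images
shapes = []
for layer in sketch_encoder:
    x = layer(x)
    if isinstance(layer, nn.Linear):
        # Record the output shape after each step-down layer
        shapes.append(tuple(x.shape))
        print(f"{layer.in_features:>4} -> {layer.out_features:<4} output shape {tuple(x.shape)}")
```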

**YOUR TURN**

It would of course be possible to go straight from the input size to the final number of dimensions, but we would lose an incredibly important aspect of neural networks in doing so.

* What would we miss out on? ______
* Why is this a problem? ______

In the cell below, build the structure of the decoder layer for our network. Remember, this is a mirror image of our encoder!

In [None]:
myDecoder = nn.Sequential(
    # YOUR CODE HERE
)

Finally, we just need some boilerplate class code to bring the whole thing together into a PyTorch network:

In [None]:
class AutoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(AutoEncoder, self).__init__()

        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded
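
A quick way to check what `forward` returns is to wire the same wrapper around a deliberately tiny, made-up encoder and decoder (8 features down to 2 and back; these sizes are illustrative and are not the lab's architecture). The class is repeated here under the name `TinyAutoEncoder` so the sketch runs on its own:

```python
import torch
import torch.nn as nn


class TinyAutoEncoder(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        encoded = self.encoder(x)        # compact representation
        decoded = self.decoder(encoded)  # attempted reconstruction
        return encoded, decoded


toy = TinyAutoEncoder(nn.Linear(8, 2), nn.Linear(2, 8))
encoded, decoded = toy(torch.rand(3, 8))
print(encoded.shape, decoded.shape)
```

Returning both tensors lets the training loop compute the loss on `decoded` while the later plots reuse `encoded`.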

Now let's create an instance of our network. We also need to define a few other parameters, like the loss function and the optimizer.

For the loss function, we will be using Mean Squared Error, which we covered in the second lab. For the optimizer, let's use an advanced optimizer called Adam:

In [None]:
autoencoder = AutoEncoder(myEncoder, myDecoder)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.005)
loss_func = nn.MSELoss()
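
As a reminder of what `nn.MSELoss` computes, this sketch compares it to the formula written out by hand on two small made-up tensors. With the default `reduction="mean"`, the loss is the average of the squared element-wise differences over every element in the batch.

```python
import torch
import torch.nn as nn

# Made-up values standing in for a reconstruction and its target
reconstruction = torch.tensor([[0.0, 0.5], [1.0, 0.2]])
target = torch.tensor([[0.0, 1.0], [0.5, 0.2]])

by_hand = ((reconstruction - target) ** 2).mean()
built_in = nn.MSELoss()(reconstruction, target)

print(by_hand.item(), built_in.item())
```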

During training we'll use PyTorch's `DataLoader`, which hands us the data in shuffled batches as we go. It's important to remember that for an autoencoder the input and the target are identical, so the digit labels are not used during training.

In [None]:
train_loader = data.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
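
To illustrate what the loader yields without downloading anything, the sketch below builds a `DataLoader` over a made-up `TensorDataset` (random images and labels standing in for MNIST). Each batch arrives as an `(images, labels)` pair; for an autoencoder the labels are ignored and the flattened images serve as both input and target.

```python
import torch
import torch.utils.data as data

# Random stand-ins for MNIST images and labels
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
fake_loader = data.DataLoader(
    data.TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=True
)

x, _ = next(iter(fake_loader))  # discard the labels
b_x = x.view(-1, 28 * 28)       # input: flattened images
b_y = b_x                       # target: the very same tensor

print(x.shape, b_x.shape, len(fake_loader))
```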

Our training function is going to show us the recovered images at the end of each epoch, to help us get an idea of how the training process is going. Beforehand, we will just plot out five of the digits so we can compare our autoencoder's output to what the target looks like:

In [None]:
def plot_images(view_data, n_images=5):
    f, a = plt.subplots(1, n_images, figsize=(5, 2))
    for i in range(n_images):
        a[i].imshow(np.reshape(view_data.data.numpy()[i], (28, 28)), cmap="gray")
        a[i].set_xticks(())
        a[i].set_yticks(())


N_TEST_IMG = 5
view_data = (
    train_data.data[:N_TEST_IMG].view(-1, 28 * 28).type(torch.FloatTensor) / 255.0
)
plot_images(view_data, N_TEST_IMG)

OK, now we're ready to train! Before running this code, make sure you understand what is happening. The comments should help. Once you feel comfortable, run the cell! It will take a little while to complete, but you will get progress updates as it goes.

In [None]:
def plot_decoded_images(autoencoder, view_data, n_images=5):
    # Helper function to plot decoded images
    f, a = plt.subplots(1, n_images, figsize=(5, 2))
    _, decoded_data = autoencoder(view_data)
    for i in range(n_images):
        a[i].clear()
        a[i].imshow(np.reshape(decoded_data.data.numpy()[i], (28, 28)), cmap="gray")
        a[i].set_xticks(())
        a[i].set_yticks(())
    plt.draw()
    plt.pause(0.1)

In [None]:
n_epochs = 25

for epoch in range(n_epochs):
    for step, (x, _) in enumerate(train_loader):
        b_x = x.view(-1, 28 * 28)  # batch x, shape (batch, 28*28)
        b_y = x.view(-1, 28 * 28)  # batch y, shape (batch, 28*28)

        encoded, decoded = autoencoder(b_x)

        loss = loss_func(decoded, b_y)  # calculate error
        optimizer.zero_grad()  # reset the gradients, otherwise they accumulate across batches
        loss.backward()  # compute gradients and backpropagate loss
        optimizer.step()  # apply gradients to update our parameters

    # Note: this reports the loss of the last batch in the epoch, not an epoch average
    print(f"Epoch: {epoch} | train loss: {loss.item():.4f}")
    plot_decoded_images(autoencoder, view_data, N_TEST_IMG)

The third image in the row is our autoencoder's recovered version of the number 4 we looked at with PCA. How does it look? Could it be better?

We can also plot the encoded data of the autoencoder, to see how it represents the data in the 2-dimensional encoded space:
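
One detail of the training loop worth isolating is `optimizer.zero_grad()`. PyTorch adds each new gradient to whatever is already stored in `.grad`, so skipping the reset would mix gradients from earlier batches into the current update. The sketch uses a single made-up parameter `w` to show this:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()        # gradient of sum(w^2) is 2w

(w ** 2).sum().backward()     # second backward pass without resetting
accumulated = w.grad.clone()  # now holds the sum of both gradients

w.grad.zero_()                # clear the stored gradient, the role zero_grad() plays in the loop
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```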

In [None]:
def plot_encoded_data(X, Y, values):
    plt.figure(figsize=(16, 10))
    plt.scatter(
        X,
        Y,
        c=values,
        edgecolor="none",
        alpha=0.5,
        cmap="nipy_spectral",
    )
    plt.xlabel("component 1")
    plt.ylabel("component 2")
    plt.colorbar()


view_data = train_data.data.view(-1, 28 * 28).type(torch.FloatTensor) / 255.0

# Get encoded data
encoded_data, _ = autoencoder(view_data)

# Get the X, Y dimensions and values
X, Y = (
    encoded_data.data[:, 0].numpy(),
    encoded_data.data[:, 1].numpy(),
)  # 2 Dimensions of encoded data
values = train_data.targets.numpy()

# Plot encoded data
plot_encoded_data(X, Y, values)
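
The cell above runs all 60,000 images through the network in one call while PyTorch tracks gradients, which can use a lot of memory. A possible alternative, sketched here with a small untrained stand-in encoder and random data (both assumptions for illustration), is to encode in chunks inside `torch.no_grad()`:

```python
import torch
import torch.nn as nn

# Untrained stand-in encoder and random stand-in images
stand_in_encoder = nn.Sequential(nn.Linear(28 * 28, 12), nn.Tanh(), nn.Linear(12, 2))
images = torch.rand(1000, 28 * 28)

stand_in_encoder.eval()
chunks = []
with torch.no_grad():  # no gradient bookkeeping needed for plotting
    for chunk in torch.split(images, 256):
        chunks.append(stand_in_encoder(chunk))
codes = torch.cat(chunks)

print(codes.shape, codes.requires_grad)
```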

Now we are going to plot how the decoder's output changes as we move around the encoded (representation) space. We will step through a 9×9 grid of values between -1 and 1 in each of the two encoded dimensions and decode every point to see what the autoencoder has learned.

In [None]:
from torch import Tensor


def plot_decoded_image(ax, autoencoder, encoded_vector):
    decoded_image = autoencoder.decoder(encoded_vector).detach().numpy().reshape(28, 28)
    ax.imshow(decoded_image, cmap="gray")
    ax.set_xticks(())
    ax.set_yticks(())


f, a = plt.subplots(9, 9, figsize=(12, 12))
for i, v in enumerate(np.linspace(1, -1, 9)):
    for j, k in enumerate(np.linspace(-1, 1, 9)):
        plot_decoded_image(a[i, j], autoencoder, Tensor([k, v]))
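
The nested loop decodes one latent point at a time. Another way to lay out the same grid, sketched below with an untrained stand-in decoder (`stand_in_decoder` is hypothetical and much smaller than the lab's), is to stack all 81 coordinate pairs into one tensor and decode them in a single batch:

```python
import numpy as np
import torch
import torch.nn as nn

stand_in_decoder = nn.Sequential(nn.Linear(2, 12), nn.Tanh(), nn.Linear(12, 28 * 28))

xs = np.linspace(-1, 1, 9)  # first latent dimension, left to right
ys = np.linspace(1, -1, 9)  # second latent dimension, top to bottom
grid_x, grid_y = np.meshgrid(xs, ys)  # each has shape (9, 9)

# One row per grid cell: (first dimension, second dimension)
latent = torch.tensor(
    np.stack([grid_x.ravel(), grid_y.ravel()], axis=1), dtype=torch.float32
)

with torch.no_grad():
    images = stand_in_decoder(latent).reshape(9, 9, 28, 28)

print(latent.shape, images.shape)
```

Here `images[i, j]` corresponds to the subplot `a[i, j]` in the loop above.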

Let's utilize the autoencoder's learned embedding space to classify digits. We'll convert input images into embedded features using the encoder of the autoencoder, then pass them to a classifier with their corresponding labels for training. You have the flexibility to choose a classifier such as logistic regression or others. Part of the code is provided below.

**YOUR TURN:**
* Implement a classifier of your choice to train on the input embeddings and labels. You can use a simple classifier such as logistic regression, or try another classifier of your choice. _________
* What is the prediction accuracy of the classifier on the test set? _________

In [None]:
def preprocess_data(data, start, end):
    """Preprocesses the data by reshaping and normalizing it, and returns the labels."""
    processed_data = (
        data.data[start:end].view(-1, 28 * 28).type(torch.FloatTensor) / 255.0
    )
    labels = data.targets[start:end].numpy()
    return processed_data, labels


n_samples_train = 100
n_samples_test = 10

# Preprocess the train and test sets
train_set, train_labels = preprocess_data(train_data, 0, n_samples_train)
test_set, test_labels = preprocess_data(
    train_data, n_samples_train, n_samples_train + n_samples_test
)

# Obtain embedded representations of train set images using the encoder part of the autoencoder
embedded_train = autoencoder.encoder(train_set).detach().numpy()

In [None]:
# YOUR CODE HERE
# Train your chosen classifier using embedded features and corresponding labels

In [None]:
# YOUR CODE HERE
# Compute the prediction accuracy of the classifier on the test set