Optional: Autoencoder (AE)
================


The Autoencoder is an important neural network architecture that can be trained unsupervised (no labels required) and can learn more sophisticated representations than linear Principal Component Analysis (PCA).
Autoencoders are used for many applications, such as:

- Image compression
- Feature learning
- Image denoising or reconstruction
- Unsupervised pretraining of Neural Networks

A vanilla Autoencoder consists of an encoder, a latent representation, and a decoder.
In this setup, the full-resolution image is fed to the encoder, which transforms the input into a lower-dimensional latent representation vector. The dimensionality of this vector determines how strongly the initial information is compressed. The decoder then tries to reconstruct the initial image from the latent representation vector. We train the network by comparing the reconstructed image to the original image and deriving a loss from the difference. Therefore, we can train these networks without any supervision in the form of labels.

The architecture of an autoencoder is illustrated in the following image:
![autoencoder.png](https://blog.keras.io/img/ae/autoencoder_schema.jpg)

We can test an autoencoder by taking a trained decoder and randomly sampling a bottleneck vector. By feeding this vector into the decoder, we can generate "new" dataset examples.

Goal of this notebook
========

Until now, you have always had a very structured notebook to follow. However, you will not have this luxury in your upcoming projects beyond this class. To handle new tasks, it is important that you learn how to deal with new frameworks and data. To complete this notebook you have to:
1. Deal with an unknown dataset.
2. Learn how to use the PyTorch `DataLoader` class to efficiently manipulate your data and prepare it for training and testing.
3. Check out the PyTorch neural network class (`nn.Module`).
4. Write your own solver for a different task than classification.
5. Understand the basic principles of Autoencoders.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.autograd import Variable

#torch.set_default_tensor_type('torch.FloatTensor')
#set up default cuda device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## 1. Data: MNIST

[MNIST](http://yann.lecun.com/exdb/mnist/), developed by Yann LeCun et al., is one of the oldest datasets around and is still widely used as an introduction to machine learning. It contains grayscale images of handwritten single digits of size `28x28x1` and was originally used for postal code classification.

Don't forget to check out the dataset code for our current dataset, CIFAR10, which we provided in `data_utils.py` in the `exercise_code` folders of our exercises. You can use it as a template to implement a similar pipeline for MNIST.

Complete the following steps:
1. Download the dataset either from the [official homepage](http://yann.lecun.com/exdb/mnist/) or any other source and try to load it into Python so that you can print the contents as numpy arrays, similar to how we started with the CIFAR10 dataset. As before, you should consider writing a dataset visualization.
2. Use the official [documentation](http://pytorch.org/docs/data.html) to familiarize yourself with the `Dataset` class. It offers more sophisticated methods to load data, which we don't require for MNIST but which are good practice for bigger datasets later on. Write a dataset class for your MNIST images and think about which transformations you would like to use.
3. Use the `DataLoader` class to finish the data loader that we will use for training. It offers sampling and loading procedures which can easily be multithreaded.

In the end, your dataloader should return a tensor of shape `(batch_size, 28, 28)` or similar that contains normalized values between `-1` and `1`.
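
One possible shape for steps 2 and 3 is sketched below. It is a minimal sketch, not the reference solution: the class name `MNISTImages` is hypothetical, and random `uint8` data stands in for the real MNIST array so that the snippet runs without a download. It assumes the images are already loaded as a numpy array of shape `(N, 28, 28)` with values in `[0, 255]`.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class MNISTImages(Dataset):
    """Hypothetical dataset wrapping a uint8 array of shape (N, 28, 28)."""

    def __init__(self, images):
        # Scale pixel values from [0, 255] to [-1, 1].
        self.images = torch.from_numpy(images.astype(np.float32)) / 127.5 - 1.0

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # No label is returned: the autoencoder only needs the image.
        return self.images[idx]


# Placeholder data standing in for the real MNIST array (assumption).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(64, 28, 28), dtype=np.uint8)

train_data = MNISTImages(fake_images)
train_loader = DataLoader(train_data, batch_size=16, shuffle=True)

batch = next(iter(train_loader))
print(batch.shape, batch.min().item(), batch.max().item())
```

Doing the normalization once in `__init__` is a convenience for a dataset that fits in memory; for larger datasets, a per-sample transform in `__getitem__` is the more common choice.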

## 2. Network: First Steps

Please define your `class Autoencoder` as an `nn.Module` subclass. Start with a simple linear implementation and use it to debug the solver class. Later on, come back to this step and refine the architecture, especially by adding a bottleneck.

In [10]:
############################################################################
# Example: Autoencoder class with a single linear layer and no bottleneck  #
############################################################################
class AutoencoderModel(torch.nn.Module):
    def __init__(self, shape=(28, 28)):
        super(AutoencoderModel, self).__init__()
        shape_prod = shape[0] * shape[1]
        self.linear_layer = torch.nn.Linear(shape_prod, shape_prod)
    
    def forward(self, x):
        # We assume that the input is of shape (batch_size, 28, 28); therefore
        # we flatten each image to shape (batch_size, 28 * 28).
        # Note: use `x` here, since `input` would refer to the Python builtin.
        shape = x.size()
        x = x.view(x.size(0), -1)
        x = self.linear_layer(x)
        # Reshape back to the input size
        x = x.view(*shape)
        return x
                       
model = AutoencoderModel()
############################################################################
#                             END OF YOUR CODE                             #
############################################################################

Steps:
1. Take one sample from the dataloader and feed it through the network. Visualize the input and output. Since the network is untrained, the output will look like random noise. (Optional: You can also check whether you can set the weights so that the linear layer above becomes the identity.)
2. Currently, the network output can take any value. Restrict the output of our network using an appropriate non-linearity so that the values are again between `-1` and `1`, like our data (see the [PyTorch layers for candidates or check out the lecture](https://pytorch.org/docs/stable/nn.html)).
3. Go directly to the solver and code that up first. Afterwards, you can come back to the network architecture.
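
As an illustration of steps 1 and 2, the sketch below adds a `Tanh` output to the single-layer example and then sets the linear layer to the identity. The class name `BoundedLinearModel` is hypothetical, `Tanh` is only one candidate non-linearity, and a random batch stands in for a real sample from the dataloader.

```python
import torch


class BoundedLinearModel(torch.nn.Module):
    """Hypothetical variant of the single-layer example with a tanh output."""

    def __init__(self, shape=(28, 28)):
        super().__init__()
        n = shape[0] * shape[1]
        self.linear_layer = torch.nn.Linear(n, n)
        self.activation = torch.nn.Tanh()  # squashes outputs into (-1, 1)

    def forward(self, x):
        shape = x.size()
        x = x.view(x.size(0), -1)
        x = self.activation(self.linear_layer(x))
        return x.view(*shape)


torch.manual_seed(0)
model = BoundedLinearModel()
x = torch.rand(4, 28, 28) * 2 - 1  # stand-in batch with values in [-1, 1]
y = model(x)                       # untrained output, bounded by tanh

# Optional experiment from step 1: make the linear layer the identity.
with torch.no_grad():
    model.linear_layer.weight.copy_(torch.eye(28 * 28))
    model.linear_layer.bias.zero_()
    pre_activation = model.linear_layer(x.view(4, -1)).view(4, 28, 28)
```

With identity weights the linear layer reproduces its input exactly, but the full model does not, because `tanh` still distorts the values afterwards.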

## 3. Autoencoder-Solver

The solver should implement the training procedure of a reconstruction autoencoder, which is as follows:
1. Take a random batch from the dataset.
2. Feed it into our network.
3. Compare the network output with the original batch.
4. Backpropagate the error.

We also want to evaluate the network performance on new data by computing the validation loss on our validation set.

You will need to check out the PyTorch optimizers and find some way to visualize your training, either with the code we have already written in the previous exercises or online with additional libraries such as [tensorboardX](https://github.com/lanpa/tensorboardX).

### Loss function and accuracy
As we want to compare our reconstructed image with the input, it is ill-advised to use cross-entropy as our loss function. A more appropriate candidate would be an `L1 Loss` or something similar. You can check out the PyTorch losses [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html).

Until now, we have always used validation accuracy as our metric of choice. However, this no longer applies to our current situation: we are comparing images, and accuracy does not describe this task as well as it describes classification. Therefore, we will only compare losses and there is no need to display accuracies.
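
To make the idea of a reconstruction loss concrete, the small sketch below evaluates two candidate losses on hand-picked tensors. The values are made up for illustration; `L1Loss` averages the absolute differences, while `MSELoss` averages the squared differences.

```python
import torch

# Made-up "original" and "reconstructed" values for illustration.
target = torch.tensor([[0.0, 0.5], [-1.0, 1.0]])
reconstruction = torch.tensor([[0.0, 0.0], [-0.5, 1.0]])

l1 = torch.nn.L1Loss()(reconstruction, target)    # mean absolute difference
mse = torch.nn.MSELoss()(reconstruction, target)  # mean squared difference
print(l1.item(), mse.item())
```

Two of the four entries differ by `0.5`, so the mean absolute difference is `0.25` and the mean squared difference is `0.125`.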

### Solver tasks
1. Implements the network training procedure
2. Sets the optimizer and loss function
3. Visualizes the training, either online or offline after training using graphs
4. Saves the trained network after training
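
The tasks above could be organized roughly as follows. This is a sketch under assumptions rather than the intended solution: the class name `AutoencoderSolver`, the choice of Adam, and all defaults are illustrative, and random tensors stand in for the MNIST loaders. The recorded loss lists are meant to be plotted after training, for example with matplotlib.

```python
import torch
from torch.utils.data import DataLoader


class AutoencoderSolver:
    """Hypothetical minimal solver; names and defaults are illustrative."""

    def __init__(self, model, lr=1e-3):
        self.model = model
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        self.loss_fn = torch.nn.L1Loss()
        self.train_losses, self.val_losses = [], []

    def train(self, train_loader, val_loader, num_epochs=1):
        for _ in range(num_epochs):
            self.model.train()
            for batch in train_loader:
                self.optimizer.zero_grad()
                # The input batch is also the reconstruction target.
                loss = self.loss_fn(self.model(batch), batch)
                loss.backward()
                self.optimizer.step()
                self.train_losses.append(loss.item())
            self.val_losses.append(self.validate(val_loader))

    def validate(self, loader):
        self.model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for batch in loader:
                loss = self.loss_fn(self.model(batch), batch)
                total += loss.item() * batch.size(0)
                count += batch.size(0)
        return total / count

    def save(self, path):
        torch.save(self.model.state_dict(), path)


# Stand-in data and model so that the sketch runs on its own (assumptions).
torch.manual_seed(0)
data = torch.rand(64, 28, 28) * 2 - 1
train_loader = DataLoader(data[:48], batch_size=16, shuffle=True)
val_loader = DataLoader(data[48:], batch_size=16)
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 28 * 28),
    torch.nn.Tanh(),
    torch.nn.Unflatten(1, (28, 28)),
)

solver = AutoencoderSolver(model)
solver.train(train_loader, val_loader, num_epochs=2)
```

The sketch keeps everything on the CPU; when using the `device` defined at the top of the notebook, both the model and each batch would need to be moved there with `.to(device)`.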

## 4. Test the pipeline

1. Modify your dataloader so that it only returns a single image.
2. Run a training procedure to overfit the simple model from above on this single image using your new solver. This will make it very easy to debug your training procedure.

Note: if you use the L1 loss, you won't be able to overfit completely. Why?
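
One way to restrict a dataloader to a single image is `torch.utils.data.Subset`, as in the sketch below. Several things here are assumptions made for illustration: a random tensor stands in for the MNIST dataset, a plain training loop replaces your solver, and `MSELoss` with plain SGD is swapped in for the `L1 Loss` suggested earlier so that the loss can be seen shrinking towards zero.

```python
import torch
from torch.utils.data import DataLoader, Subset

torch.manual_seed(0)
dataset = torch.rand(100, 28, 28) * 2 - 1  # stand-in for the MNIST images
# Keep only the sample at index 0, so every batch is the same image.
single_loader = DataLoader(Subset(dataset, [0]), batch_size=1)

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 28 * 28),
    torch.nn.Unflatten(1, (28, 28)),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.MSELoss()  # swapped in for illustration, not the L1 loss

losses = []
for _ in range(100):
    for batch in single_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch), batch)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())

print(losses[0], losses[-1])
```

If the final loss does not end up far below the initial one in a setup like this, the bug is most likely in the training loop rather than in the model.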

## 5. Network: Autoencoder

Now that we have a working solver, we can start with more sophisticated networks.

1. Split up your single network into an encoder and a decoder. The encoder takes the original image as input and returns a smaller representation by decreasing the number of neurons in its last layer. The decoder takes this bottleneck representation as input and returns a tensor with the shape of the original input.
2. Overfit this network on a single test image.
3. Train on the whole dataset.
4. How small can you make the bottleneck while still being able to reconstruct the original image well enough?
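
The split described in step 1 might look like the following sketch. The layer sizes, the `latent_dim=16` bottleneck, and the fully connected design are all assumptions chosen for brevity; convolutional encoders and decoders are an equally valid choice.

```python
import torch


class Encoder(torch.nn.Module):
    """Maps (batch_size, 28, 28) images to (batch_size, latent_dim) vectors."""

    def __init__(self, input_dim=28 * 28, hidden_dim=128, latent_dim=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(input_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))


class Decoder(torch.nn.Module):
    """Maps latent vectors back to images with values in (-1, 1)."""

    def __init__(self, latent_dim=16, hidden_dim=128, output_shape=(28, 28)):
        super().__init__()
        self.output_shape = output_shape
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, output_shape[0] * output_shape[1]),
            torch.nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(z.size(0), *self.output_shape)


class Autoencoder(torch.nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = Encoder(latent_dim=latent_dim)
        self.decoder = Decoder(latent_dim=latent_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
autoencoder = Autoencoder(latent_dim=16)
x = torch.rand(8, 28, 28) * 2 - 1  # stand-in batch
z = autoencoder.encoder(x)         # bottleneck representation
x_hat = autoencoder(x)             # reconstruction
print(z.shape, x_hat.shape)
```

Exposing `latent_dim` as a constructor argument makes the experiment in step 4 a matter of retraining with different values.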

## 6. Test your network

Now that we have a working autoencoder, we can test it by loading the decoder and sampling a random tensor with the shape of our bottleneck. We feed this random tensor into the decoder, and it will present us with a "new" image resembling our dataset.
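
A sketch of this sampling step is shown below. The small untrained `decoder` is only a stand-in so that the snippet runs; in practice you would load your trained decoder weights instead. The bottleneck size of 16, the standard normal sampling distribution, and the file name in the comment are assumptions.

```python
import torch

latent_dim = 16  # assumed bottleneck size

# Stand-in for a trained decoder (untrained here, so outputs are noise).
decoder = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 28 * 28),
    torch.nn.Tanh(),
)
# With a saved model you would restore the weights first, for example:
# decoder.load_state_dict(torch.load("decoder.pth"))  # hypothetical path

torch.manual_seed(0)
decoder.eval()
with torch.no_grad():
    z = torch.randn(5, latent_dim)          # five random bottleneck vectors
    samples = decoder(z).view(-1, 28, 28)   # five generated 28x28 images

print(samples.shape)
```

Whether random vectors produce digit-like images depends on how well the sampling distribution matches the latent codes the encoder actually produces, so inspecting the range of encoded training images first can help.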

## Optional:
1. Do classification on MNIST
2. Improve classification accuracy: Take a smaller subset of MNIST, take your trained encoder, and extend it to work as a classification network. Compare it to an untrained version of itself by training both on the smaller subset. During the autoencoder training, the encoder should have learned various features that allow it to learn from far fewer training images than a newly initialized network.
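
Extending an encoder into a classifier can be as simple as appending a linear layer with ten outputs, as in the sketch below. The `encoder` here is an untrained stand-in with assumed layer sizes; for the comparison described above, you would build two such classifiers, one with the pretrained encoder weights loaded and one freshly initialized. The random images and labels are placeholders.

```python
import torch

latent_dim = 16  # assumed bottleneck size

# Stand-in for the pretrained encoder (load your trained weights in practice).
encoder = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, latent_dim),
)

# Classification head on top of the bottleneck: one logit per digit class.
classifier = torch.nn.Sequential(
    encoder,
    torch.nn.ReLU(),
    torch.nn.Linear(latent_dim, 10),
)

torch.manual_seed(0)
images = torch.rand(32, 28, 28) * 2 - 1   # placeholder images
labels = torch.randint(0, 10, (32,))      # placeholder labels

logits = classifier(images)
loss = torch.nn.CrossEntropyLoss()(logits, labels)
loss.backward()  # gradients flow into both the head and the encoder
print(logits.shape, loss.item())
```

Unlike for reconstruction, cross-entropy and accuracy are the natural loss and metric for this classification task.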

Merry Christmas and a happy new year!