# Chapter 15: Autoencoders

## Chapter Exercises

### 1. What are the main tasks that autoencoders are used for?

Autoencoders are neural networks that are trained to reproduce their inputs at their outputs. Their hidden layers typically have a lower dimensionality than the inputs, so the model needs to learn a compact "coding" of the training set in the hidden layers. This makes autoencoders useful for feature extraction, dimensionality reduction, and unsupervised pretraining. From these codings it should be possible to reproduce instances of the input set, or even to generate entirely new instances that appear to be sampled from the original training set.

### 2. Suppose you want to train a classifier and have plenty of unlabeled training data, but only a few thousand labeled instances. How could autoencoders help? How would you proceed?

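First you would train an autoencoder to reproduce the unlabeled training data (you can include the labeled instances too, ignoring their labels). The lower layers, up to and including the coding layer, can then be reused as the lower layers of a classification neural network, which you train on the labeled instances. The classifier can reach reasonable performance with few labels because the reused layers have already learned useful features during the unsupervised pretraining.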
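
The reuse step can be sketched in plain NumPy. This is only an illustration under stated assumptions: `W1` and `W2` are random stand-ins for encoder weights that would really come from the pretrained autoencoder, the labeled set is synthetic, and all names are hypothetical. The encoder is kept frozen and only a new softmax output layer is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_codings, n_classes = 20, 10, 5, 3

# Hypothetical stand-ins for encoder weights learned on the unlabeled data.
W1 = rng.normal(scale=0.3, size=(n_inputs, n_hidden))
W2 = rng.normal(scale=0.3, size=(n_hidden, n_codings))

def encode(X):
    """Reused lower layers: inputs -> hidden layer -> codings (ReLU units)."""
    hidden = np.maximum(0, X @ W1)
    return np.maximum(0, hidden @ W2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(proba, y):
    return float(-np.mean(np.log(proba[np.arange(len(y)), y])))

# A small synthetic labeled set. The encoder stays frozen while only the new
# softmax output layer is trained by gradient descent.
X_labeled = rng.normal(size=(60, n_inputs))
y_labeled = rng.integers(0, n_classes, size=60)
codings = encode(X_labeled)
Y_onehot = np.eye(n_classes)[y_labeled]

W_out = np.zeros((n_codings, n_classes))
b_out = np.zeros(n_classes)
initial_loss = cross_entropy(softmax(codings @ W_out + b_out), y_labeled)

learning_rate = 0.1
for step in range(200):
    proba = softmax(codings @ W_out + b_out)
    grad_logits = (proba - Y_onehot) / len(y_labeled)
    W_out -= learning_rate * codings.T @ grad_logits
    b_out -= learning_rate * grad_logits.sum(axis=0)

final_loss = cross_entropy(softmax(codings @ W_out + b_out), y_labeled)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In practice you would usually unfreeze the reused layers once the new output layer has settled, and fine-tune the whole network with a small learning rate.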

### 3. If an autoencoder perfectly reconstructs the inputs, is it necessarily a good autoencoder? How can you evaluate the performance of the autoencoder?

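Not necessarily. The model could have simply memorized the training set and be unable to generalize to new data, i.e. it is overfitting the training data. Perfect reconstruction can also mean that the coding layer is large enough for the autoencoder to copy its inputs without learning useful features. One way to check whether an autoencoder generalizes is to measure the value of the cost function (the reconstruction loss) when reproducing new data from the test set.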
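
A minimal sketch of that check, assuming a linear autoencoder as a stand-in for a trained network. A linear undercomplete autoencoder trained with MSE learns the same subspace as PCA, so here it is fitted in closed form with an SVD. The data is synthetic and the function names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, n_codings):
    """Fit a linear autoencoder in closed form via SVD (equivalent to PCA)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:n_codings].T  # encoder weights; the decoder uses W.T
    return mean, W

def reconstruction_mse(X, mean, W):
    """Encode X, decode it again, and return the mean squared error."""
    codings = (X - mean) @ W
    X_rec = codings @ W.T + mean
    return float(np.mean((X - X_rec) ** 2))

rng = np.random.default_rng(42)
# Synthetic data that mostly lives on a 3-dimensional subspace of a 10-D space.
latent = rng.normal(size=(600, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(600, 10))
X_train, X_test = X[:500], X[500:]

mean, W = fit_linear_autoencoder(X_train, n_codings=3)
train_mse = reconstruction_mse(X_train, mean, W)
test_mse = reconstruction_mse(X_test, mean, W)
print(f"train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
```

A test loss close to the training loss suggests the codings capture general structure. A test loss that is much higher suggests the autoencoder is overfitting.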

### 4. What are undercomplete and overcomplete autoencoders? What is the main risk of an excessively undercomplete autoencoder? What about the main risk of an overcomplete autoencoder?

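An undercomplete autoencoder's coding layer has a lower dimensionality than the input features. This forces the model to learn general patterns about the training set, since it must represent the information in a denser representation. The main risk of an excessively undercomplete autoencoder is that the coding layer is too small to retain enough information, so it fails to reconstruct the inputs well.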

An overcomplete autoencoder's coding layer has a higher dimensionality than the input features. Its main risk is that it can trivially copy the input features to the outputs, because it has more degrees of freedom than the input, without learning any useful features. This is why we add a constraint such as a sparsity loss to overcomplete autoencoders.
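
One common form of sparsity loss, shown here only as a sketch, is the KL divergence between a target sparsity and the mean activation of each coding neuron over a batch. The function name, the target value of 0.1, and the synthetic activations are assumptions made for illustration.

```python
import numpy as np

def kl_sparsity_loss(activations, target_sparsity=0.1, eps=1e-7):
    """KL divergence between the target and actual mean activation per neuron."""
    # Mean activation of each coding neuron over the batch, clipped into (0, 1).
    q = np.clip(activations.mean(axis=0), eps, 1 - eps)
    p = target_sparsity
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kl.sum())

rng = np.random.default_rng(0)
sparse_codings = rng.uniform(0.0, 0.2, size=(256, 50))  # mean close to 0.1
dense_codings = rng.uniform(0.4, 1.0, size=(256, 50))   # mean close to 0.7
sparse_penalty = kl_sparsity_loss(sparse_codings)
dense_penalty = kl_sparsity_loss(dense_codings)
print(f"sparse: {sparse_penalty:.4f}, dense: {dense_penalty:.4f}")
```

The penalty would be added to the reconstruction loss, scaled by a weight hyperparameter, so that neurons that are active too often are pushed back toward the target.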

### 5. How would you tie weights in a stacked autoencoder? What is the point of doing so?

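If you have a stacked autoencoder with $N$ layers (not counting the input layer), let $W_L$ be the weights (or kernel) of the $L$<sup>th</sup> layer. You tie the weights of the autoencoder by ensuring that, for each encoder layer $L = 1, 2, \dots, N/2$,

$$ W_{N - L + 1} = W_L^{\;T} $$

Tying the weights halves the number of weight parameters in the model, which speeds up training and reduces the risk of overfitting. You can see an example of an autoencoder that ties weights in `Autoencoders.ipynb`.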
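
As a small NumPy sketch of the tying rule (not the notebook's implementation), consider an autoencoder with $N = 4$ layers. The layer sizes, the sigmoid activations, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_inputs, n_hidden1, n_codings = 8, 5, 3

# Only the encoder kernels are free parameters.
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden1))
W2 = rng.normal(scale=0.1, size=(n_hidden1, n_codings))
# The decoder kernels are the transposes of the encoder kernels (N = 4 layers).
W3 = W2.T  # W_{4 - 2 + 1} = W_2^T
W4 = W1.T  # W_{4 - 1 + 1} = W_1^T

# Biases are not tied: each layer keeps its own bias vector.
b1, b2 = np.zeros(n_hidden1), np.zeros(n_codings)
b3, b4 = np.zeros(n_hidden1), np.zeros(n_inputs)

def forward(X):
    hidden1 = sigmoid(X @ W1 + b1)
    codings = sigmoid(hidden1 @ W2 + b2)
    hidden3 = sigmoid(codings @ W3 + b3)
    return hidden3 @ W4 + b4  # linear output layer

X = rng.normal(size=(4, n_inputs))
outputs = forward(X)
n_tied = W1.size + W2.size
n_untied = W1.size + W2.size + W3.size + W4.size
print(outputs.shape, n_tied, n_untied)
```

Because `W3` and `W4` are transposed views of `W2` and `W1`, any update to an encoder kernel is automatically reflected in the matching decoder layer.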

### 6. What is a common technique to visualize features learned by the lower layer of a stacked autoencoder? What about the higher layers?

One way to visualize the lower layer features is to take the weight vector of each neuron in the first hidden layer, reshape it to the shape of the input features, and plot it. See **Visualizing the Extracted Features** in `Autoencoders.ipynb` for an example.

To visualize the features learned by the higher layers, you can look at which training instances activate each neuron the most.
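
Both techniques can be sketched with NumPy alone. The weights and images below are random stand-ins (assumptions, not trained values), and the 28×28 image shape is chosen only for illustration. Plotting is left out, but each `feature_maps[i]` could be passed to an image plotting function.

```python
import numpy as np

rng = np.random.default_rng(0)
n_instances, height, width, n_hidden = 100, 28, 28, 16

X = rng.random((n_instances, height * width))     # stand-in for flattened images
W1 = rng.normal(size=(height * width, n_hidden))  # stand-in for trained weights

# Lower layer: each column of W1 holds one neuron's input weights, so
# reshaping it to the image shape gives a picture of what the neuron detects.
feature_maps = W1.T.reshape(n_hidden, height, width)

# Higher layers: rank the instances by how strongly they activate each neuron.
# The first layer's ReLU activations are used here for brevity; the same
# ranking applies to the activations of any deeper layer.
activations = np.maximum(0, X @ W1)               # shape (100, 16)
top_k = 5
top_instances = np.argsort(-activations, axis=0)[:top_k].T  # shape (16, 5)
print(feature_maps.shape, top_instances.shape)
```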

### 7. What is a generative model? Can you name a type of generative autoencoder?

A generative model is a probabilistic model whose outputs depend on chance, both during training and when producing (or "generating") new data that looks like it was sampled from the training set. The model also learns the parameters of the distribution of its random variables as part of training.

One type of generative autoencoder is a _variational autoencoder_, see `Autoencoders.ipynb` for an example of a variational autoencoder for the MNIST dataset.
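
To illustrate where the randomness enters, here is a sketch of the sampling step and latent loss commonly used in variational autoencoders, assuming a Gaussian coding distribution with a standard normal prior. The function names are hypothetical, and the means and log-variances are fixed by hand rather than produced by an encoder.

```python
import numpy as np

def sample_codings(mean, log_var, rng):
    """Reparameterization trick: codings = mean + sigma * epsilon."""
    epsilon = rng.normal(size=mean.shape)
    return mean + np.exp(0.5 * log_var) * epsilon

def latent_loss(mean, log_var):
    """KL divergence between N(mean, sigma^2) and N(0, 1), summed over codings."""
    return -0.5 * np.sum(1 + log_var - np.exp(log_var) - mean ** 2, axis=-1)

rng = np.random.default_rng(0)
# Hand-picked parameters: means (1, -2) and standard deviations (0.5, 2).
mean = np.zeros((10000, 2)) + np.array([1.0, -2.0])
log_var = np.zeros((10000, 2)) + np.log([0.25, 4.0])
codings = sample_codings(mean, log_var, rng)
print(codings.mean(axis=0), codings.std(axis=0))
```

After training, new instances are generated by sampling codings from the prior and passing them through the decoder.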

### 8. Let's use a denoising autoencoder to pretrain an image classifier.

#### You can use MNIST (simplest), or another large set of images such as [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) if you want a bigger challenge. If you choose CIFAR10 you need to write code to load batches of images for training. If you want to skip this part, [TensorFlow's model zoo contains tools to do just that](https://github.com/tensorflow/models/blob/master/research/slim/download_and_convert_data.py).

Since the MNIST dataset is used in many places in this repository, I am going to use the CIFAR10 dataset and train a convolutional autoencoder, since the final task is to train an image classification model.

In [10]:
# Downloading the data. The code here is based on the code from
# https://github.com/tensorflow/models/blob/master/research/slim/datasets/download_and_convert_cifar10.py

import os
import sys
import tarfile
import urllib.request

data_dir = 'data/'
os.makedirs(data_dir, exist_ok=True)

data_url = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
filename = data_url.split('/')[-1]
filepath = os.path.join(data_dir, filename)

if not os.path.exists(filepath):
  def progress(count, block_size, total_size):
    # Report download progress as a percentage, capped at 100%.
    percent = min(100.0, float(count * block_size) / total_size * 100)
    sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename, percent))
    sys.stdout.flush()
  filepath, _ = urllib.request.urlretrieve(data_url, filepath, progress)
  # Extract the archive into data_dir, closing the file when done.
  with tarfile.open(filepath, 'r:gz') as tar:
    tar.extractall(data_dir)

>> Downloading cifar-10-python.tar.gz 100.0%

In [19]:
!ls data/cifar-10-batches-py/

batches.meta  data_batch_2  data_batch_4  readme.html
data_batch_1  data_batch_3  data_batch_5  test_batch


In [0]:
# Transforming the data into a usable form.

import pickle
import numpy as np

cifar_dir = 'data/cifar-10-batches-py/'
batches_path = cifar_dir + 'data_batch_'
test_batch_path = cifar_dir + 'test_batch'

def load_batch(path):
  # Each batch file is a pickled dict with byte-string keys: b'data' holds
  # one flattened 3072-value image per row and b'labels' the class indices.
  with open(path, 'rb') as f:
    D = pickle.load(f, encoding='bytes')
  return D[b'data'], D[b'labels']

# The training set is split across five batch files, numbered 1 to 5.
X_train, y_train = [], []
for i in range(1, 6):
  data, labels = load_batch(batches_path + str(i))
  X_train.extend(data)
  y_train.extend(labels)

X_test, y_test = [], []
data, labels = load_batch(test_batch_path)
X_test.extend(data)
y_test.extend(labels)

In [0]:
X_train, y_train = np.array(X_train), np.array(y_train)
X_test, y_test = np.array(X_test), np.array(y_test)

In [44]:
print(X_train.shape)
print(y_train.shape)

(50000, 3072)
(50000,)
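
Each row holds a flattened image of 3072 values: in the Python version of CIFAR-10 the first 1024 values are the red channel, the next 1024 the green channel, and the last 1024 the blue channel, each stored row by row. A convolutional model needs these rows reshaped into images. The helper below is a sketch with a hypothetical name, demonstrated on a synthetic stand-in for `X_train`.

```python
import numpy as np

def rows_to_images(X_flat):
    """Convert (n, 3072) CIFAR-10 rows to (n, 32, 32, 3) images scaled to [0, 1]."""
    # Each row stores the red plane, then green, then blue, each row-major.
    images = X_flat.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images.astype(np.float32) / 255.0

# Synthetic stand-in for two rows of X_train: red=255, green=128, blue=0.
row = np.concatenate([np.full(1024, 255), np.full(1024, 128), np.zeros(1024)])
X_demo = np.stack([row, row]).astype(np.uint8)
images = rows_to_images(X_demo)
print(images.shape, images[0, 0, 0])
```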


In [0]:
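# Defining a function for generating mini-batches during training.

import numpy as np
import numpy.random as rnd

def shuffle_batches(X, y, batch_size):
  # Shuffle the instance indices, then yield roughly equal-sized mini-batches.
  rand_idx = rnd.permutation(len(X))
  # Use at least one batch so that array_split never receives zero sections.
  n_batches = max(1, len(X) // batch_size)
  for batch_idx in np.array_split(rand_idx, n_batches):
    yield X[batch_idx], y[batch_idx]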

#### Split the dataset into a training set and a test set. Train a deep denoising autoencoder on the full training set.
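
What makes an autoencoder "denoising" is that the inputs are corrupted while the targets stay clean. The sketch below shows two common corruptions, Gaussian noise and dropout-style masking, on synthetic data scaled to [0, 1]. The function names, noise level, and dropout rate are assumptions, and the network itself is not shown.

```python
import numpy as np

def add_gaussian_noise(X, stddev, rng):
    """Corrupt inputs in [0, 1] with zero-mean Gaussian noise."""
    noisy = X + rng.normal(scale=stddev, size=X.shape)
    return np.clip(noisy, 0.0, 1.0)

def apply_dropout_noise(X, rate, rng):
    """Alternative corruption: randomly switch off a fraction of the inputs."""
    keep_mask = rng.random(X.shape) >= rate
    return X * keep_mask

rng = np.random.default_rng(0)
X_clean = rng.random((4, 3072))  # stand-in for a mini-batch of scaled images
X_noisy = add_gaussian_noise(X_clean, stddev=0.1, rng=rng)
X_dropped = apply_dropout_noise(X_clean, rate=0.3, rng=rng)
# Training pairs are (X_noisy, X_clean): the autoencoder receives the corrupted
# batch, and the loss compares its reconstructions to the clean batch.
print(X_noisy.shape, float(np.mean(X_dropped == 0)))
```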

#### Check that the images are fairly well reconstructed, and visualize the low level features. Visualize the images that most activate each neuron in the coding layer.

#### Build a classification deep neural network, reusing the lower layers of the autoencoder. Train it using only 10% of the training set. Can you get it to perform as well as the same classifier trained on the full training set?

### 9. [_Semantic hashing_](http://www.cs.toronto.edu/~rsalakhu/papers/sdarticle.pdf) is a technique used for efficient _information retrieval_: a document (e.g. an image) is passed through a system (typically a neural network) that outputs a low-dimensional binary vector (e.g. 30 bits). Indexing each document using its hash, it is possible to retrieve many documents similar to a particular document almost instantly. Let's implement semantic hashing using a slightly tweaked stacked autoencoder.

#### Create a stacked autoencoder containing two hidden layers below the coding layer, and train it on the image dataset you used in the previous exercise. The coding layer should contain 30 neurons and use the logistic activation function to output values between 0 and 1. After training, to produce the hash of an image, you can simply run it through the autoencoder, take the output of the coding layer and round the output of each neuron to 0 or 1.

#### One trick proposed by Salakhutdinov and Hinton is to add Gaussian noise (with zero mean) to the inputs of the coding layer during training. In order to preserve a high signal-to-noise ratio, the autoencoder will learn to feed large values to the coding layer. In turn, this means the logistic activation function of the coding layer will likely saturate at 0 or 1. As a result, rounding the codings to 0 or 1 won't distort the output as much.
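
The noise trick and the rounding step can be sketched as follows. The autoencoder is not trained here: `pre_activations` is a hand-made stand-in for the inputs of a 30-neuron coding layer, and the function names and noise level are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coding_layer(pre_activations, noise_stddev=0.0, rng=None):
    """Logistic coding layer, with optional Gaussian noise added to its inputs."""
    if noise_stddev > 0.0:
        pre_activations = pre_activations + rng.normal(
            scale=noise_stddev, size=pre_activations.shape)
    return sigmoid(pre_activations)

def to_hash(codings):
    """Round each coding to 0 or 1 and pack the bits into one integer per row."""
    bits = (codings >= 0.5).astype(np.int64)
    weights = 1 << np.arange(bits.shape[1], dtype=np.int64)
    return bits @ weights

rng = np.random.default_rng(0)
# Large magnitudes saturate the logistic function, which is what the noise
# trick encourages the network to produce.
pre_activations = rng.choice([-8.0, 8.0], size=(5, 30))
train_codings = coding_layer(pre_activations, noise_stddev=1.0, rng=rng)  # training
test_codings = coding_layer(pre_activations)                              # inference
hashes = to_hash(test_codings)
print(hashes)
```

Packing the 30 bits into a single integer makes the hash usable directly as a dictionary key or database index.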

#### Compute the hash of every image, and see if images with identical hashes look alike. Since the dataset is labeled, one way to measure the performance of the autoencoder is to see if images with the same hash are part of the same class. You can use the Gini purity (introduced in Chapter 6) of the sets of images with identical (or very similar) hashes.
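
A sketch of that measurement, assuming the hashes are already computed as integers. It uses the Gini impurity $1 - \sum_k p_k^2$ of each group of images sharing a hash, so lower values mean purer groups. The function names and the tiny example arrays are hypothetical.

```python
import numpy as np
from collections import defaultdict

def gini_impurity(labels):
    """1 - sum_k p_k^2, where p_k is the fraction of labels in class k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def mean_gini_by_hash(hashes, labels):
    """Size-weighted average Gini impurity over groups sharing the same hash."""
    groups = defaultdict(list)
    for h, label in zip(hashes, labels):
        groups[int(h)].append(label)
    total = len(labels)
    return sum(len(g) / total * gini_impurity(g) for g in groups.values())

hashes = np.array([3, 3, 3, 7, 7, 9])
labels = np.array([0, 0, 1, 2, 2, 5])
score = mean_gini_by_hash(hashes, labels)
print(f"weighted Gini impurity: {score:.4f}")
```

Weighting by group size keeps the many single-image groups, which are trivially pure, from dominating the score.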

#### Try fine tuning the hyperparameters using cross-validation.

#### Note that with a labeled dataset, another approach is to train a convolutional neural network (CNN) for classification, then use the layer below the output layer to produce the hashes. See [Jinma Guo and Jianmin Li's 2015 paper](https://arxiv.org/pdf/1509.01354.pdf). See if it performs better.

### 10. Train a variational autoencoder on the image dataset used in the previous exercises and make it generate images. Alternatively, you can try to find an unlabeled dataset that you are interested in and see if you can generate new samples.

For a convolutional variational autoencoder (CVAE), see this tutorial: https://www.tensorflow.org/beta/tutorials/generative/cvae