# Lab 9: Introduction to TensorFlow and Autoencoders

(125 points + extra 15 points)

This code is adapted from Chapter 15 of 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by A. Géron ([reference](http://proquest.safaribooksonline.com/book/programming/9781491962282/15dot-autoencoders/autoencoders_chapter_html)) and the corresponding GitHub repository.

Please read the referenced chapter for background on autoencoders.

### Installing TensorFlow

To install TensorFlow, you can use Anaconda. Here are the steps to follow:
* Download the Anaconda installer for your platform from [here](https://www.anaconda.com/download/)
* Install TensorFlow through Anaconda, using the relevant commands from [here](https://www.tensorflow.org/install/)

It took me 5 minutes to install TensorFlow using this method on my own Linux machine, and about 30 minutes to do so on the EWS Linux machine.

With that you should be able to run TensorFlow. As a quick check, run 'import tensorflow as tf' in a Jupyter notebook. If the import succeeds, you can proceed. Note that the code in this lab uses the TensorFlow 1.x API (for example tf.placeholder, tf.Session and tf.contrib), so it needs a 1.x release.

In [1]:
import tensorflow as tf

###  What to hand in: 
You will need to pack the following items into a single file. **Please zip all of them into one zip file**, named netID_lab9


   * The completed Notebook file (ipynb) - Remember to answer all the questions in the notebook!
   * All the figures plotted in this lab (The reconstruction of the autoencoders)

# Setup

First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures:

In [2]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os
import sys
import numpy.random as rnd

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

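# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "autoencoders"

def save_fig(fig_id, tight_layout=True):
    # Save under <PROJECT_ROOT_DIR>/images/<CHAPTER_ID>/, creating the folder if needed
    folder = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
    if not os.path.isdir(folder):
        os.makedirs(folder)
    path = os.path.join(folder, fig_id + ".png")
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)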

Let's define a couple of utility functions to plot grayscale 28x28 images:

In [3]:
def plot_image(image, shape=(28, 28)):
    plt.imshow(image.reshape(shape), cmap="Greys", interpolation="nearest")
    plt.axis("off")

In [4]:
def plot_multiple_images(images, n_rows, n_cols, pad=2):
    images = images - images.min()  # make the minimum == 0, so the padding looks white
    h, w = images.shape[1:]  # images has shape (n_images, height, width)
    image = np.zeros(((h+pad)*n_rows+pad, (w+pad)*n_cols+pad))
    for y in range(n_rows):
        for x in range(n_cols):
            image[(y*(h+pad)+pad):(y*(h+pad)+pad+h),(x*(w+pad)+pad):(x*(w+pad)+pad+w)] = images[y*n_cols+x]
    plt.imshow(image, cmap="Greys", interpolation="nearest")
    plt.axis("off")

Since this will be the first time many of you use TensorFlow, this section provides most of the basic code. Your task is to fill in the missing code sections while you get acquainted with how neural networks are expressed in TensorFlow.

For an overview of neural networks, you can visit https://deeplearning4j.org/neuralnet-overview. Many other online resources are also available.

Another resource is the book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by A. Géron, whose ebook you can find in the library resources section. You are strongly encouraged to go through the chapters related to TensorFlow in this book.

#### Step 0: Load TensorFlow and MNIST data

In [5]:
# Import the tensorflow module
import tensorflow as tf

# Load the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


There are many ways to train autoencoders. We will train and visualize the reconstructions of some autoencoders in this lab.

## Exercise 1: Building a simple autoencoder

As the first step, let's build an autoencoder with just one hidden layer. 

In [6]:
# Clear any existing TensorFlow graph. TensorFlow first builds a graph of nodes (operations);
# data flows through the graph as tensors only when the graph is executed, and that is
# when the actual values of the nodes are computed.

reset_graph()

from functools import partial
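
The "build the graph first, run it later" idea described in the comment above can be illustrated with a toy sketch in plain Python. This is not TensorFlow code and not how TensorFlow is implemented; the classes Placeholder, Add and Mul and the function run() are hypothetical names used only for illustration.

```python
# Toy illustration of deferred execution. Every name here is hypothetical.

class Placeholder:
    """A named input whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name

class Add:
    def __init__(self, a, b):
        self.a, self.b = a, b

class Mul:
    def __init__(self, a, b):
        self.a, self.b = a, b

def run(node, feed_dict):
    """Evaluate a node recursively, reading placeholder values from feed_dict."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Add):
        return run(node.a, feed_dict) + run(node.b, feed_dict)
    if isinstance(node, Mul):
        return run(node.a, feed_dict) * run(node.b, feed_dict)
    return node  # plain Python numbers act as constants

# Construction phase: nothing is computed yet.
x = Placeholder("x")
y = Add(Mul(x, 3), 2)  # represents 3*x + 2

# Execution phase: the same graph is evaluated with different inputs.
result_a = run(y, {x: 1})
result_b = run(y, {x: 10})
print(result_a, result_b)
```

The dictionary passed to run() plays the role that feed_dict plays in the TensorFlow code later in this lab.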

Let's create a function that will train one autoencoder and return the transformed training set (i.e., the output of the hidden layer) and the model parameters.

In [7]:
################### FILL IN THE CODE HERE ##################################

def train_autoencoder(X_train, n_neurons, n_epochs, batch_size,
                      learning_rate = 0.01, l2_reg = 0.0005,
                      activation=tf.nn.elu, seed=42):
   
    # TensorFlow works with computation graphs: the graph is built first, and the
    # data and variable values are loaded when it is run inside a tf.Session().
    
    graph = tf.Graph()
    with graph.as_default():
        tf.set_random_seed(seed)
    
        n_inputs = X_train.shape[1]

        # TensorFlow uses placeholders to mark where data will be fed in
        # during the execution phase.
        # To define a placeholder, pass the data type and the shape.
        # Specifying None for a dimension of the shape means "any size";
        # that dimension is then determined by the data that is fed in.
        
        # Create a placeholder of type tf.float32 whose second dimension
        # is always n_inputs, while the first depends on the data.

        # (5 points)
        X = tf.placeholder(tf.float32, shape=(None,n_inputs))
        
        # Code to define a dense layer of connections. 
        my_dense_layer = partial(
            tf.layers.dense,
            activation=activation,
            kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
            kernel_regularizer=tf.contrib.layers.l2_regularizer(l2_reg))

        # Create the hidden layer, which maps X to n_neurons units.
        hidden = my_dense_layer(X, n_neurons, name="hidden")
        
        # Fill the blanks below to build the output layer by mapping hidden (layer) to 
        # n_inputs (since n_outputs = n_inputs).

        # (5 points)
        outputs = my_dense_layer(hidden, n_inputs, activation=None, name="outputs")

        # Fill in the blanks below to compute the reconstruction loss. 
        # Use the square() and reduce_mean() functions of TensorFlow
        # to compute the mean square error between outputs and X. 
        
        # (10 points)
        reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))
        
        # Define regularization loss
        reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        
        # The loss being optimized is a combination of the reconstruction and regularization losses.
        loss = tf.add_n([reconstruction_loss] + reg_losses)
        
        # To learn the parameters of the autoencoder, an Adam optimizer can be used.
        # Use the Adam optimizer with the given learning_rate. For usage,
        # see https://www.tensorflow.org/api_docs/python/tf/train 
        
        # (10 points)
        optimizer = tf.train.AdamOptimizer(learning_rate)
        
        # To perform the training, a training operation is defined. The task of 
        # the operation is to minimize the loss passed to it. 
        training_op = optimizer.minimize(loss)

        # Initialize all variables in one go, before training starts. 
        # To do so, use the global_variables_initializer() function of TensorFlow.  
        # See https://www.tensorflow.org/api_guides/python/state_ops#Variable_helper_functions 
        
        # (10 points)
        init = tf.global_variables_initializer()

    # Now the computational graph is defined. We can now pass data, and perform training.
   
    with tf.Session(graph=graph) as sess:
        
        # Call the initializer defined above. 
        init.run()
        
        # An epoch consists of passing all the samples in the training set
        # to the algorithm once. 
        
        for epoch in range(n_epochs):
            
            # Training is performed in batches of size batch_size. The number of batches can be computed.
            n_batches = len(X_train) // batch_size
            
            # Go through all the batches.
            for iteration in range(n_batches):
                print("\r{}%".format(100 * iteration // n_batches), end="")
                sys.stdout.flush()
                
                # Each batch is an independent random draw of batch_size samples,
                # so within an epoch some samples may repeat and others may be skipped.
                indices = rnd.permutation(len(X_train))[:batch_size]
                X_batch = X_train[indices]
                
                # To run the training once, pass the training operation and the data.
                # In tensorflow, to pass the data to the placeholders, feed_dict is used. 
               
                sess.run(training_op, feed_dict={X: X_batch})
             
            # After going through an entire epoch, we want to know the training loss.
            # To do so, we evaluate the reconstruction loss node in the graph.
            # Note that X_batch is the last mini-batch of the epoch, so the value
            # printed below is a noisy estimate of the loss on the full training set.
            
            loss_train = reconstruction_loss.eval(feed_dict={X: X_batch})
            
            print("\r{}".format(epoch), "Train MSE:", loss_train)
            
        params = dict([(var.name, var.eval()) for var in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)])
        hidden_val = hidden.eval(feed_dict={X: X_train})
        
        # return the needed parameters
        return hidden_val, params["hidden/kernel:0"], params["hidden/bias:0"], \
               params["outputs/kernel:0"], params["outputs/bias:0"]
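
In train_autoencoder(), my_dense_layer is built with functools.partial, and the output layer then overrides the pre-filled activation with activation=None. A minimal sketch of that mechanism follows; dense() is a hypothetical stand-in that only records its arguments, not the real tf.layers.dense.

```python
from functools import partial

def dense(inputs, units, activation="elu", name=None):
    """Hypothetical stand-in for a layer constructor; it only records its arguments."""
    return {"inputs": inputs, "units": units, "activation": activation, "name": name}

# Pre-fill the keyword arguments shared by every layer.
my_layer = partial(dense, activation="elu")

hidden = my_layer("X", 100, name="hidden")
# A keyword passed at call time replaces the pre-filled one.
outputs = my_layer("hidden", 784, activation=None, name="outputs")

print(hidden["activation"], outputs["activation"])
```

This is why the output layer in the lab code ends up linear even though the partial was created with an activation.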

In [39]:
# Now run the training of the autoencoder, using the MNIST training images. 
# Complete the code below to use 100 hidden neurons, and train for 10 epochs
# with a batch size of 150

# (5 points)
simple_output, W1_simple, b1_simple, W2_simple, b2_simple = train_autoencoder(
    mnist.train.images, n_neurons=100, n_epochs=10, batch_size=150)

0 Train MSE: 0.0188739
1 Train MSE: 0.0179402
2 Train MSE: 0.0197976
3 Train MSE: 0.0195595
4 Train MSE: 0.0191635
5 Train MSE: 0.019063
6 Train MSE: 0.0194744
7 Train MSE: 0.0192214
8 Train MSE: 0.0202738
9 Train MSE: 0.0200159
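
The values above do not decrease steadily. A likely reason is that each one is computed on the last mini-batch of its epoch only, and that every mini-batch is drawn independently at random. One common alternative, sketched below with NumPy, is to shuffle the indices once per epoch and walk through them in consecutive slices, so that no sample is repeated within an epoch. The helper epoch_batches() is a hypothetical name and is not part of the lab code.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, rng):
    """Yield index arrays that cover a shuffled dataset once (dropping the remainder)."""
    order = rng.permutation(n_samples)
    n_batches = n_samples // batch_size
    for i in range(n_batches):
        yield order[i * batch_size:(i + 1) * batch_size]

rng = np.random.RandomState(42)
batches = list(epoch_batches(n_samples=1000, batch_size=150, rng=rng))
seen = np.concatenate(batches)

# Number of batches, and number of distinct samples visited in this epoch.
print(len(batches), len(np.unique(seen)))
```

Averaging the loss over all batches of an epoch, rather than reporting the last one, would also give a smoother curve.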


We have the weights W1, b1, W2 and b2. 

Using these, we can run test images through the autoencoder and see the results. We will do that later. Before that, let's build a stacked autoencoder.


## Exercise 2: Building a stacked autoencoder

Now let's train two autoencoders. The first one is trained on the training data, and the second is trained on the first autoencoder's hidden layer output. To do so, we can use the train_autoencoder() function we defined earlier.

Use code similar to the previous cell to train the two autoencoders. The first value returned by train_autoencoder() (the hidden layer output) was not used above.
Here it serves as the intermediate output on which the second autoencoder is trained.

In [24]:
# Train a stacked autoencoder using 300 neurons in the first layer and 150 neurons in the next layer, for 
# 10 epochs each, with a batch size of 150.
# (15 points)

hidden_output, W1_stacked, b1_stacked, W4_stacked, b4_stacked = train_autoencoder(mnist.train.images, 300, 10, 150)
stacked_output, W2_stacked, b2_stacked, W3_stacked, b3_stacked = train_autoencoder(hidden_output, 150, 10, 150)

0 Train MSE: 0.0183982
1 Train MSE: 0.0177385
2 Train MSE: 0.0196133
3 Train MSE: 0.0193563
4 Train MSE: 0.0190451
5 Train MSE: 0.0190034
6 Train MSE: 0.0194233
7 Train MSE: 0.0192226
8 Train MSE: 0.0202485
9 Train MSE: 0.0198858
0 Train MSE: 0.00376496
1 Train MSE: 0.00409651
2 Train MSE: 0.00424412
3 Train MSE: 0.00430497
4 Train MSE: 0.00409228
5 Train MSE: 0.00431693
6 Train MSE: 0.00420121
7 Train MSE: 0.00425522
8 Train MSE: 0.00417095
9 Train MSE: 0.00403692
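
The naming used above (W1 and W4 from the first autoencoder, W2 and W3 from the second) follows the order in which the layers are applied once the two autoencoders are stacked. The sketch below checks that the kernel shapes chain together for the 300/150 configuration; it assumes each kernel has shape (inputs, outputs), and output_width() is a hypothetical helper.

```python
# Layer sizes for the stacked autoencoder of this exercise.
n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 150

# Kernel shapes as (rows, cols), in the order the layers are applied.
kernel_shapes = {
    "W1": (n_inputs, n_hidden1),   # encoder of the first autoencoder
    "W2": (n_hidden1, n_hidden2),  # encoder of the second autoencoder
    "W3": (n_hidden2, n_hidden1),  # decoder of the second autoencoder
    "W4": (n_hidden1, n_inputs),   # decoder of the first autoencoder
}

def output_width(input_width, shapes):
    """Follow a row vector through the chain and check that the shapes line up."""
    width = input_width
    for name in ["W1", "W2", "W3", "W4"]:
        rows, cols = shapes[name]
        assert rows == width, "shape mismatch at " + name
        width = cols
    return width

final_width = output_width(n_inputs, kernel_shapes)
print(final_width)
```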


## Exercise 3: Visualizing the Reconstructions

Let us now visualize the reconstructions on test digits. Let's first define a function to show the reconstructed digits.

In [9]:
def show_reconstructed_digits(X, outputs, filename='', n_test_digits = 10):
    with tf.Session() as sess:
        X_test = mnist.test.images[:n_test_digits]
        outputs_val = outputs.eval(feed_dict={X: X_test})

    fig = plt.figure(figsize=(8, 3 * n_test_digits))
    for digit_index in range(n_test_digits):
        plt.subplot(n_test_digits, 2, digit_index * 2 + 1)
        plot_image(X_test[digit_index])
        plt.subplot(n_test_digits, 2, digit_index * 2 + 2)
        plot_image(outputs_val[digit_index])
    if filename:  # save only when a file name is given
        plt.savefig(filename)
        
    plt.close()

Let's visualize the reconstruction of the **simple** autoencoder first. Fill in the code below.


In [26]:
reset_graph()

n_inputs = 28*28 # size of the MNIST images

# create a placeholder of type float32, with fixed second dimension = n_inputs. 
#### YOUR CODE HERE
# (5 points)
X = tf.placeholder(tf.float32, shape=(None,n_inputs))

# create the hidden layer and output layer. Remember that 
# hidden = activation_function(X*W1 + b1), and
# choose your activation function to be the exponential linear unit, 
# using tf.nn.elu(), with tf.matmul() for the matrix multiplication.
# (5 points)
hidden1 = tf.nn.elu(tf.matmul(X, W1_simple) + b1_simple)

# output = (hidden1*W2 + b2) [NO ACTIVATION HERE]
# (5 points)
outputs = (tf.matmul(hidden1,W2_simple) + b2_simple)

Save the reconstructed digits as an appropriate png file.

In [27]:
show_reconstructed_digits(X, outputs, 'simple.png')
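
Since train_autoencoder() returns its parameters as NumPy arrays, the same forward pass can also be written without TensorFlow, which makes it easy to put a number on the reconstruction error instead of judging it only by eye. The sketch below uses small random arrays as stand-ins for W1_simple, b1_simple, W2_simple, b2_simple and for the test images, so the printed value says nothing about the trained model. elu() and reconstruct() are hypothetical helper names.

```python
import numpy as np

def elu(z):
    """Exponential linear unit with alpha = 1: z if z > 0, else exp(z) - 1."""
    return np.where(z > 0, z, np.exp(np.minimum(z, 0.0)) - 1.0)

def reconstruct(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer autoencoder with a linear output layer."""
    hidden = elu(np.dot(X, W1) + b1)
    return np.dot(hidden, W2) + b2

rng = np.random.RandomState(0)
n_inputs, n_hidden = 784, 100

# Random stand-ins for the trained parameters (assumption, not real weights).
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_inputs))
b2 = np.zeros(n_inputs)

# Stand-in for five flattened 28x28 test images with pixel values in [0, 1).
X_demo = rng.uniform(size=(5, n_inputs))

X_hat = reconstruct(X_demo, W1, b1, W2, b2)
mse = np.mean((X_hat - X_demo) ** 2)
print(X_hat.shape, mse)
```

With the real arrays, passing mnist.test.images[:10] in place of X_demo would give a test MSE to compare across the simple and stacked models.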

Repeat for the **stacked** autoencoder.

In [28]:
reset_graph()

n_inputs = 28*28

# Recall that
# hidden1 = activation_function(X*W1 + b1)
# hidden2 = activation_function(hidden1*W2 + b2)
# hidden3 = activation_function(hidden2*W3 + b3)
# output = hidden3*W4 + b4


# (15 points)
X = tf.placeholder(tf.float32, shape=(None, n_inputs))
hidden1 = tf.nn.elu(tf.matmul(X, W1_stacked) + b1_stacked) 
hidden2 = tf.nn.elu(tf.matmul(hidden1, W2_stacked) + b2_stacked)
hidden3 = tf.nn.elu(tf.matmul(hidden2, W3_stacked) + b3_stacked)
outputs = tf.matmul(hidden3, W4_stacked) + b4_stacked

In [29]:
show_reconstructed_digits(X, outputs, 'stacked_300_150.png')

### Extra credit
(10 points)
* Build a stacked autoencoder with 3 layers, similar to the above.
* Tune the number of hidden neurons to get the best output (visually). 
* Save the corresponding output images with appropriate names. 

(5 points)
* Do you think adding another hidden layer was helpful? Explain.

In [11]:
# YOUR CODE HERE FOR EXTRA CREDIT
hidden_output1, W1_stacked, b1_stacked, W6_stacked, b6_stacked = train_autoencoder(mnist.train.images, 300, 10, 150)
hidden_output2, W2_stacked, b2_stacked, W5_stacked, b5_stacked = train_autoencoder(hidden_output1, 150, 10, 150)
hidden_output3, W3_stacked, b3_stacked, W4_stacked, b4_stacked = train_autoencoder(hidden_output2, 150, 10, 150)

reset_graph()

n_inputs = 28*28

X = tf.placeholder(tf.float32, shape=(None, n_inputs))
hidden1 = tf.nn.elu(tf.matmul(X, W1_stacked) + b1_stacked)
hidden2 = tf.nn.elu(tf.matmul(hidden1, W2_stacked) + b2_stacked)
hidden3 = tf.nn.elu(tf.matmul(hidden2, W3_stacked) + b3_stacked)
hidden4 = tf.nn.elu(tf.matmul(hidden3, W4_stacked) + b4_stacked)
hidden5 = tf.nn.elu(tf.matmul(hidden4, W5_stacked) + b5_stacked)
outputs = tf.matmul(hidden5, W6_stacked) + b6_stacked

show_reconstructed_digits(X, outputs, '3_layer_300_150_150.png')

0 Train MSE: 0.0183982
1 Train MSE: 0.0177385
2 Train MSE: 0.0196133
3 Train MSE: 0.0193563
4 Train MSE: 0.0190451
5 Train MSE: 0.0190034
6 Train MSE: 0.0194233
7 Train MSE: 0.0192226
8 Train MSE: 0.0202485
9 Train MSE: 0.0198858
0 Train MSE: 0.00376496
1 Train MSE: 0.00409651
2 Train MSE: 0.00424412
3 Train MSE: 0.00430497
4 Train MSE: 0.00409228
5 Train MSE: 0.00431693
6 Train MSE: 0.00420121
7 Train MSE: 0.00425522
8 Train MSE: 0.00417095
9 Train MSE: 0.00403692
0 Train MSE: 0.00193519
1 Train MSE: 0.00228511
2 Train MSE: 0.00218026
3 Train MSE: 0.00224638
4 Train MSE: 0.00241536
5 Train MSE: 0.002187
6 Train MSE: 0.00232828
7 Train MSE: 0.00230324
8 Train MSE: 0.00239218
9 Train MSE: 0.00230837


## Questions

* Why is n_outputs = n_inputs in the train_autoencoder() function? (5 points)
* Do autoencoders come under supervised learning algorithms, or unsupervised learning algorithms? Explain. (5 points)
* Vary the number of neurons in the hidden layer of the 2-layer stacked autoencoder.  (15 points)

    Try out the following numbers of hidden neurons, and save the reconstructions for [hidden_1, hidden_2] = 
    * [ 25, 10]
    * [ 50, 25]
    * [100, 35]
    * [300, 50]
* Explain the variation. (5 points) 
* Compare the reconstruction of the simple autoencoder vs the stacked autoencoders (5 points)
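
One way to frame the variation asked about above is the width of the innermost layer relative to the 784 input pixels: a narrower bottleneck has to describe each image with fewer numbers, which usually costs reconstruction detail. The sketch below only does that arithmetic for the four configurations listed; bottleneck_ratio() is a hypothetical helper and the ratio is an illustrative measure, not something the lab prescribes.

```python
n_inputs = 28 * 28  # pixels per MNIST image

configs = [(25, 10), (50, 25), (100, 35), (300, 50)]

def bottleneck_ratio(n_inputs, hidden_2):
    """Number of input values per unit of the innermost hidden layer."""
    return float(n_inputs) / hidden_2

ratios = {cfg: bottleneck_ratio(n_inputs, cfg[1]) for cfg in configs}
for h1, h2 in configs:
    print("[{:3d}, {:3d}] -> {:.2f} inputs per code unit".format(h1, h2, ratios[(h1, h2)]))
```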

## Your answers here

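n_outputs is equal to n_inputs in the train_autoencoder() function because an autoencoder is trained to reconstruct its own input: the output layer needs one unit per input value so that the whole picture can be rebuilt and compared with X.

Autoencoders come under unsupervised learning because there are no predefined labels on the training data; the input itself serves as the reconstruction target.

Some of the neuron pairs produce clearer images than others. I could not really establish a correlation between the layer sizes and the image quality, though.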

The reconstruction of the simple autoencoder looked slightly better than that of the stacked autoencoders in all cases.

In [12]:
def test_neurons(hidden_1, hidden_2, title):
    
    hidden_output, W1_stacked, b1_stacked, W4_stacked, b4_stacked = train_autoencoder(mnist.train.images, hidden_1, 10, 150)
    stacked_output, W2_stacked, b2_stacked, W3_stacked, b3_stacked = train_autoencoder(hidden_output, hidden_2, 10, 150)
    
    reset_graph()

    n_inputs = 28*28

    X = tf.placeholder(tf.float32, shape=(None, n_inputs))
    hidden1 = tf.nn.elu(tf.matmul(X, W1_stacked) + b1_stacked) 
    hidden2 = tf.nn.elu(tf.matmul(hidden1, W2_stacked) + b2_stacked)
    hidden3 = tf.nn.elu(tf.matmul(hidden2, W3_stacked) + b3_stacked)
    outputs = tf.matmul(hidden3, W4_stacked) + b4_stacked
    
    show_reconstructed_digits(X, outputs, title)

In [13]:
test_neurons(25, 10, "25_10_stacked.png")
test_neurons(50,25, "50_25_stacked.png")
test_neurons(100,35, "100_35_stacked.png")
test_neurons(300, 50, "300_50_stacked.png")

0 Train MSE: 0.0277956
1 Train MSE: 0.0223349
2 Train MSE: 0.0239603
3 Train MSE: 0.0235846
4 Train MSE: 0.0228943
5 Train MSE: 0.0231197
6 Train MSE: 0.0233736
7 Train MSE: 0.0226926
8 Train MSE: 0.0239273
9 Train MSE: 0.0234951
0 Train MSE: 0.334457
1 Train MSE: 0.265359
2 Train MSE: 0.29439
3 Train MSE: 0.283808
4 Train MSE: 0.281086
5 Train MSE: 0.278161
6 Train MSE: 0.260879
7 Train MSE: 0.268513
8 Train MSE: 0.272943
9 Train MSE: 0.263226
0 Train MSE: 0.0206827
1 Train MSE: 0.0182166
2 Train MSE: 0.0197283
3 Train MSE: 0.0195614
4 Train MSE: 0.0190983
5 Train MSE: 0.0189817
6 Train MSE: 0.0192776
7 Train MSE: 0.0190889
8 Train MSE: 0.0200199
9 Train MSE: 0.0199175
0 Train MSE: 0.0413285
1 Train MSE: 0.0371483
2 Train MSE: 0.036482
3 Train MSE: 0.0369612
4 Train MSE: 0.0337837
5 Train MSE: 0.0356899
6 Train MSE: 0.0355617
7 Train MSE: 0.039513
8 Train MSE: 0.0380454
9 Train MSE: 0.033611
0 Train MSE: 0.0188739
1 Train MSE: 0.0179402
2 Train MSE: 0.0197976
3 Train MSE: 0.0195595
