In this exercise, we practice building an autoencoder. An autoencoder is a network made of an encoder and a decoder. The encoder maps the input data to a latent feature representation, and the decoder reconstructs the input data from that latent feature.

Suppose x is the input data, Enc is the encoder network and Dec is the decoder network. Training should make the output Dec(Enc(x)) as close as possible to x.
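To make Dec(Enc(x)) ≈ x concrete before moving to convolutions, here is a minimal NumPy sketch of a *linear* autoencoder fitted with PCA (truncated SVD). It is not the convolutional model built in this notebook. The helper `fit_linear_autoencoder` and the random data are illustrative assumptions. When the latent dimension k equals the input dimension, the reconstruction is exact. A smaller k forces compression, and the reconstruction error grows.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a linear encoder/decoder pair with PCA (illustrative helper).

    X: (num_samples, num_features) data matrix; k: latent dimension.
    """
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T  # (num_features, k)

    def enc(x):
        return (x - mu) @ W   # project onto the k principal directions

    def dec(z):
        return z @ W.T + mu   # map latent codes back to input space

    return enc, dec

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
for k in (2, 5, 10):
    enc, dec = fit_linear_autoencoder(X, k)
    err = np.mean((dec(enc(X)) - X) ** 2)
    print(k, enc(X).shape, round(float(err), 6))
```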

In practice, we often use convolutional layers for the encoder and deconvolutional (transposed convolution) layers for the decoder. We will practice building the autoencoder with convolutional and deconvolutional layers.
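A "deconvolutional" layer, as implemented by `tf.nn.conv2d_transpose`, is not the inverse of a convolution. It applies the *transpose* of the convolution's linear map. That transpose sends a small feature map back to a larger one, and its weights are learned. The 1-D NumPy sketch below makes this concrete. It assumes `VALID` padding, and `conv1d_matrix` is a hypothetical helper. The sketch writes a stride-2 convolution as a matrix `C`, then uses `C.T` to upsample.

```python
import numpy as np

def conv1d_matrix(in_len, kernel, stride):
    """Matrix C such that C @ x is a 1-D 'VALID' strided cross-correlation of x."""
    k = len(kernel)
    out_len = (in_len - k) // stride + 1
    C = np.zeros((out_len, in_len))
    for i in range(out_len):
        C[i, i * stride:i * stride + k] = kernel
    return C

kernel = np.array([1.0, 2.0, 1.0])
C = conv1d_matrix(in_len=9, kernel=kernel, stride=2)  # downsamples length 9 -> 4
x = np.arange(9.0)
z = C @ x      # "encoder" direction: strided convolution, length 4
y = C.T @ z    # "decoder" direction: transposed convolution, back to length 9
print(C.shape, z, y.shape)
```

`y` has the original length but is not equal to `x`. Training the decoder's weights is what pushes the upsampled output toward the input.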

In [None]:
!(nvidia-smi | grep -q "has failed") && echo "No GPU found!" || nvidia-smi 

In [None]:
import tensorflow as tf
import random
import numpy as np
import matplotlib.pyplot as plt

In [None]:
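# tensorflow.examples.tutorials is only available in TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")  # download into a local folder

# The training split holds 55,000 flattened 28x28 images with pixel values in [0, 1]
x_train = mnist.train.images[:55000,:]
x_train.shape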


## The encoder

**TASK**

Build the encoder network with 2, 3 or 4 convolutional layers, and compare the results for different numbers of layers. Set `stride=2` for downsampling. You can use the following helper function to keep the code brief.

The number of output channels of each layer is usually twice that of the previous layer, especially when `stride == 2`. For the first layer we usually use 32, 64 or more channels, depending on the size of the input images.
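The spatial size after each stride-2 layer follows from TensorFlow's `SAME` padding rule, output = ceil(input / stride). Here is a small sketch that traces the (height, width, channels) shapes for 2, 3 and 4 layers. It assumes 32 channels in the first layer and doubling after that. The function `encoder_shapes` is a hypothetical helper, not part of the notebook.

```python
import math

def encoder_shapes(height, width, num_layers, first_channels=32, stride=2):
    """Trace (H, W, C) through stride-2 'SAME' convolutions that double the channels."""
    shapes = []
    channels = first_channels
    for _ in range(num_layers):
        # With padding='SAME', TensorFlow's output size is ceil(input / stride).
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        shapes.append((height, width, channels))
        channels *= 2
    return shapes

for layers in (2, 3, 4):
    print(layers, encoder_shapes(28, 28, layers))
```

With three layers the encoder ends at 4x4x128, and 7 becomes 4 (not 3.5) because of the ceiling. The decoder has to undo exactly these sizes.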

In [None]:
def conv2d_withwb(x, k_w=3, k_h=3, input_channel=1, output_channel=8, stride=1, name='conv2d', reuse=False):
    """
    2-D convolution with its own weight and bias variables.

    k_w: kernel width
    k_h: kernel height
    """
    with tf.variable_scope(name, reuse=reuse):
        # tf.nn.conv2d expects filters shaped [height, width, in_channels, out_channels]
        w = tf.get_variable('w', [k_h, k_w, input_channel, output_channel],
                            initializer=tf.truncated_normal_initializer(stddev=0.02))
        b = tf.get_variable('b', [output_channel], initializer=tf.constant_initializer(0))

    return tf.nn.conv2d(x, w, strides=[1,stride,stride,1], padding='SAME') + b

In [None]:
def encoder(x):
    with tf.variable_scope('encoder'):

        # Three stride-2 convolutions with ReLU activations:
        # [batch, 28, 28, 1] -> [batch, 14, 14, 32] -> [batch, 7, 7, 64] -> [batch, 4, 4, 128]
        x = tf.nn.relu(conv2d_withwb(x, input_channel=1,  output_channel=32,  stride=2, name='conv1'))
        x = tf.nn.relu(conv2d_withwb(x, input_channel=32, output_channel=64,  stride=2, name='conv2'))
        x = tf.nn.relu(conv2d_withwb(x, input_channel=64, output_channel=128, stride=2, name='conv3'))

        # Optional: squash the latent feature into (0, 1)
        # x = tf.nn.sigmoid(x)
    return x

## The decoder

**TASK**

Build the decoder network with deconvolutional layers (the `deconv2d` function in the next cell).

Set `stride=2` for upsampling. Try different numbers of layers.

Make sure the number of upsampling deconvolutional layers matches the number of downsampling convolutional layers in the encoder.

Also be careful with the padding method (`SAME` or `VALID`) when the height or width (the second or third dimension) of the input or output tensor is odd. Please check the TensorFlow documentation for [tf.nn.conv2d_transpose](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_transpose).
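`conv2d_transpose` accepts an `output_shape` only if a forward convolution with the same kernel, stride and padding would map that shape back to the input shape. The sketch below enumerates the valid output sizes along one axis. It assumes TensorFlow's size rules: `SAME` gives ceil(n / stride) and `VALID` gives ceil((n - k + 1) / stride). The helper names are hypothetical. Two sizes are often valid for the same input, so the desired one has to be chosen explicitly, for example 7 rather than 8 when going from 4x4 back to the encoder's 7x7.

```python
import math

def forward_size(n, k, stride, padding):
    """Output length of a TensorFlow convolution along one spatial axis (assumed rules)."""
    if padding == 'SAME':
        return math.ceil(n / stride)
    if padding == 'VALID':
        return math.ceil((n - k + 1) / stride)
    raise ValueError(padding)

def transpose_output_sizes(in_size, k, stride, padding, max_size=64):
    """Output lengths whose forward convolution maps back to in_size."""
    return [n for n in range(1, max_size + 1)
            if forward_size(n, k, stride, padding) == in_size]

print(transpose_output_sizes(4, k=4, stride=2, padding='SAME'))   # [7, 8]
print(transpose_output_sizes(7, k=4, stride=2, padding='SAME'))   # [13, 14]
print(transpose_output_sizes(4, k=4, stride=2, padding='VALID'))  # [10, 11]
```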

You can also vary `g_dim` to see whether it makes a difference.
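`g_dim` sets the channel widths of the hidden decoder layers (`g_dim*4` and `g_dim*2`), so it mainly controls model size. Here is a rough parameter count for the three-layer decoder. It assumes the 128-channel encoder output and the 4x4 kernels used in this notebook, and the helper names are hypothetical. The second layer connects two `g_dim`-scaled layers, so its size grows roughly with `g_dim` squared.

```python
def deconv_params(k, in_channels, out_channels):
    """Weights (k x k x out x in) plus one bias per output channel."""
    return k * k * out_channels * in_channels + out_channels

def decoder_params(g_dim, latent_channels=128, k=4):
    """Parameter count of a 128 -> g_dim*4 -> g_dim*2 -> 1 deconvolution stack."""
    channels = [latent_channels, g_dim * 4, g_dim * 2, 1]
    return sum(deconv_params(k, c_in, c_out)
               for c_in, c_out in zip(channels[:-1], channels[1:]))

for g in (16, 32, 64):
    print(g, decoder_params(g))
```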

In [None]:
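def deconv2d(x, k_w=3, k_h=3, output_shape=[1,1,1,1], stride=2, padding='SAME', name='deconv', reuse=False):
    """
    Transposed 2-D convolution ("deconvolution") with its own weight and bias variables.

    output_shape: [batch, height, width, channels] of the result
    """
    with tf.variable_scope(name, reuse=reuse):
        # tf.nn.conv2d_transpose expects filters shaped [height, width, out_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], int(x.get_shape()[-1])],
                            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable('b', [output_shape[-1]], initializer=tf.constant_initializer(0.1))

    return b + tf.nn.conv2d_transpose(x, w, output_shape=output_shape, strides=[1,stride,stride,1], padding=padding)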


In [None]:
def decoder(x):
    # batch_size is the global defined in the "Build the network" cell
    with tf.variable_scope('decoder'):

        g_dim = 64

        # Decoder hidden layer #1 (ReLU): [batch, 4, 4, 128] -> [batch, 7, 7, g_dim*4]
        output1_shape = [batch_size, 7, 7, g_dim*4]
        layer_1 = tf.nn.relu(deconv2d(x,       k_w=4, k_h=4, output_shape=output1_shape, stride=2, padding='SAME', name='deconv1'))

        # Decoder hidden layer #2 (ReLU): -> [batch, 14, 14, g_dim*2]
        output2_shape = [batch_size, 14, 14, g_dim*2]
        layer_2 = tf.nn.relu(deconv2d(layer_1, k_w=4, k_h=4, output_shape=output2_shape, stride=2, padding='SAME', name='deconv2'))

        # Output layer: -> [batch, 28, 28, 1]
        output3_shape = [batch_size, 28, 28, 1]
        layer_3 = deconv2d(layer_2, k_w=4, k_h=4, output_shape=output3_shape, stride=2, padding='SAME', name='deconv3')

        # Optional extra layer; stride 1 keeps the 28x28 size
#         output4_shape = [batch_size, 28 , 28 , 1]
#         layer_4 = deconv2d(layer_3, k_w=4, k_h=4, output_shape=output4_shape, stride=1, padding='SAME', name='deconv4')

    # MNIST pixels from input_data lie in [0, 1], so a sigmoid output matches their range
    # (tanh would also produce negative values)
    return tf.nn.sigmoid(layer_3)

## Build the network

Pass the input placeholder through the encoder, then pass the encoded feature through the decoder.

In [None]:
batch_size = 16

# If this cell raised an error and you run it again, TensorFlow will complain
# that the variables created by the first run already exist in the graph
# (tf.get_variable refuses to re-create them). So we reset the graph first.
tf.reset_default_graph()

x_placeholder = tf.placeholder(tf.float32, shape=[batch_size, 28, 28, 1])

feature = encoder(x_placeholder)
x_estimate = decoder(feature)

## Loss function

**TASK**

Compute the L2 loss (here the mean squared error over all pixels) between the ground truth `x_placeholder` and the autoencoder output `x_estimate`.
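For reference, here is what `tf.reduce_mean(tf.square(x - x_hat))` computes, written in NumPy. The function name is hypothetical. The result averages over every element (batch and pixels) instead of summing, which keeps the loss scale independent of batch size. By contrast, `tf.nn.l2_loss(t)` returns `sum(t ** 2) / 2`, so the two values differ by a constant factor that effectively rescales the learning rate.

```python
import numpy as np

def l2_reconstruction_loss(x, x_hat):
    """Mean squared error, the NumPy analogue of tf.reduce_mean(tf.square(x - x_hat))."""
    return np.mean(np.square(x - x_hat))

x = np.zeros((2, 28, 28, 1))
x_hat = np.full_like(x, 0.5)
print(l2_reconstruction_loss(x, x_hat))     # 0.25
half_sum_sq = np.sum(np.square(x - x_hat)) / 2
print(half_sum_sq)                          # 196.0, what tf.nn.l2_loss(x - x_hat) would give
```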

In [None]:
l2_loss = tf.reduce_mean(tf.square(x_placeholder - x_estimate))

Collect the trainable variables that belong to the encoder and the decoder.

In [None]:
tvars = tf.trainable_variables()
e_vars = [var for var in tvars if 'encoder' in var.name]
d_vars = [var for var in tvars if 'decoder' in var.name]
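The substring test `'encoder' in var.name` works for the variable names in this notebook. However, it would also match any later scope that merely contains the word, such as a hypothetical `pre_encoder/`. Here is a stricter sketch on plain name strings that matches on the scope prefix. The names are illustrative, and `vars_in_scope` is a hypothetical helper. TensorFlow 1.x also offers `tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='encoder')`, which filters names with `re.match`, so it too anchors only at the start of the name.

```python
def vars_in_scope(names, scope):
    """Keep names that live under `scope/` (exact prefix match)."""
    prefix = scope.rstrip('/') + '/'
    return [n for n in names if n.startswith(prefix)]

names = ['encoder/conv1/w:0', 'encoder/conv1/b:0',
         'decoder/deconv1/w:0', 'pre_encoder/dense/w:0']
print([n for n in names if 'encoder' in n])  # also picks up 'pre_encoder/dense/w:0'
print(vars_in_scope(names, 'encoder'))       # ['encoder/conv1/w:0', 'encoder/conv1/b:0']
```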

In [None]:
learning_rate = 0.0002
num_steps = 10000
n = int(np.sqrt(batch_size))  # show each batch as an n x n grid (batch_size should be a perfect square)

display_step = 200

In [None]:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(l2_loss, var_list=e_vars+d_vars)

In [None]:
with tf.Session() as sess:

    # Run the initializer
    sess.run(tf.global_variables_initializer())

    # Training
    for i in range(1, num_steps+1):
        # Prepare Data
        # Get the next batch of MNIST data (only images are needed, not labels)
        batch_x, _ = mnist.train.next_batch(batch_size)  # mnist.test.next_batch(batch_size) for test set
        batch_x = np.reshape(batch_x,[batch_size,28,28,1])
        
        # Run optimization op (backprop) and cost op (to get loss value)
        _, loss_value, reconstructed_img = sess.run([optimizer, l2_loss, x_estimate], 
                                                 feed_dict={x_placeholder: batch_x})

        # Display logs every display_step steps (and at the first step)
        if i % display_step == 0 or i == 1:
            print('Step %i: Minibatch Loss: %f' % (i, loss_value))

            canvas_orig = np.empty((28 * n, 28 * n))
            canvas_recon = np.empty((28 * n, 28 * n))
            # Use row/col so the training step counter i is not overwritten
            for row in range(n):
                for col in range(n):
                    # Draw the original digit
                    canvas_orig[row * 28:(row + 1) * 28, col * 28:(col + 1) * 28] = \
                        batch_x[n*row+col].reshape([28, 28])
                    # Draw the reconstructed digit
                    canvas_recon[row * 28:(row + 1) * 28, col * 28:(col + 1) * 28] = \
                        reconstructed_img[n*row+col].reshape([28, 28])

            fig, axes = plt.subplots(1, 2, figsize=(2*n, n), frameon=False)
            axes[0].imshow(canvas_orig, origin="upper", cmap="gray")
            axes[0].set_title("Original Images")

            axes[1].imshow(canvas_recon, origin="upper", cmap="gray")
            axes[1].set_title("Reconstructed Images")

            axes[0].set_axis_off(); axes[1].set_axis_off()
            plt.show()
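The nested loops that fill `canvas_orig` and `canvas_recon` can also be written as one reshape and transpose in NumPy. The sketch below assumes square grayscale images stored as `[batch, 28, 28, 1]`, like `batch_x` and `reconstructed_img`. `tile_images` is a hypothetical helper. Image `n*row + col` lands in grid position (row, col), the same layout the loops produce.

```python
import numpy as np

def tile_images(images, n, size=28):
    """Arrange the first n*n images (each size x size) into an n-by-n grid."""
    grid = images[:n * n].reshape(n, n, size, size)  # (grid_row, grid_col, y, x)
    # Put each pixel row next to its grid row, then flatten into one 2-D canvas.
    return grid.transpose(0, 2, 1, 3).reshape(n * size, n * size)
```

Inside the training loop this would read `canvas_orig = tile_images(batch_x, n)` and `canvas_recon = tile_images(reconstructed_img, n)`.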