Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

#my imports
import timeit
from functools import reduce

In [2]:
pickle_file = 'notMNIST.pickle'
sanitized_pickle_file = 'notMNIST_sanitized.pickle' # additional work (it includes cleaned test and valid datasets)

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

with open(sanitized_pickle_file, 'rb') as f:
  save = pickle.load(f)
  sanit_valid_dataset = save['valid_dataset']
  sanit_valid_labels = save['valid_labels']
  sanit_test_dataset = save['test_dataset']
  sanit_test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('\nsanit_Validation set', sanit_valid_dataset.shape, sanit_valid_labels.shape)
  print('sanit_Test set', sanit_test_dataset.shape, sanit_test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)

sanit_Validation set (8984, 28, 28) (8984,)
sanit_Test set (8709, 28, 28) (8709,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.
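
As a small illustration of these two transformations, the sketch below applies the same reshape and broadcasting comparison to a made-up batch. The `toy_*` names and sizes are hypothetical and chosen only to keep the output readable.

```python
import numpy as np

# Hypothetical toy sizes, not the notMNIST values.
toy_image_size = 2
toy_num_labels = 3
toy_num_channels = 1

toy_images = np.arange(8, dtype=np.float32).reshape(2, toy_image_size, toy_image_size)
toy_labels = np.array([2, 0])

# Add a trailing channel axis: (N, H, W) -> (N, H, W, C).
toy_cube = toy_images.reshape(
    (-1, toy_image_size, toy_image_size, toy_num_channels))

# Broadcasting compares each label with [0, 1, 2], giving one row per example.
toy_one_hot = (np.arange(toy_num_labels) == toy_labels[:, None]).astype(np.float32)

print(toy_cube.shape)  # (2, 2, 2, 1)
print(toy_one_hot)     # rows [0, 0, 1] and [1, 0, 0]
```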

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

sanit_valid_dataset, sanit_valid_labels = reformat(sanit_valid_dataset, sanit_valid_labels)
sanit_test_dataset, sanit_test_labels = reformat(sanit_test_dataset, sanit_test_labels)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

print('\nsanitValidation set', sanit_valid_dataset.shape, sanit_valid_labels.shape)
print('sanitTest set', sanit_test_dataset.shape, sanit_test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)

sanitValidation set (8984, 28, 28, 1) (8984, 10)
sanitTest set (8709, 28, 28, 1) (8709, 10)


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and the number of fully connected nodes.
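
To see where the size of the fully connected layer comes from, here is a minimal sketch of the output-size rule for `'SAME'` and `'VALID'` padding. The helper `conv_output_size` is a hypothetical name introduced for illustration; it only reproduces the shape arithmetic and does not run any convolution.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling op along one axis."""
    if padding == 'SAME':
        # ceil(input_size / stride), written with integer arithmetic
        return (input_size + stride - 1) // stride
    if padding == 'VALID':
        # ceil((input_size - kernel_size + 1) / stride)
        return (input_size - kernel_size) // stride + 1
    raise ValueError('unknown padding: %r' % padding)

image_size = 28
depth = 16

size = conv_output_size(image_size, 5, 2, 'SAME')  # first convolution: 28 -> 14
size = conv_output_size(size, 5, 2, 'SAME')        # second convolution: 14 -> 7
flattened = size * size * depth

print(size, flattened)  # 7 784
```

With two stride-2 layers the 28x28 image shrinks to 7x7, so the flattened input to the fully connected layer has 7 * 7 * 16 = 784 features.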

In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases

  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [6]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.115222
Minibatch accuracy: 6.2%
Validation accuracy: 10.2%
Minibatch loss at step 50: 1.371427
Minibatch accuracy: 50.0%
Validation accuracy: 54.9%
Minibatch loss at step 100: 1.262059
Minibatch accuracy: 62.5%
Validation accuracy: 68.5%
Minibatch loss at step 150: 0.893900
Minibatch accuracy: 75.0%
Validation accuracy: 74.6%
Minibatch loss at step 200: 0.415209
Minibatch accuracy: 87.5%
Validation accuracy: 78.3%
Minibatch loss at step 250: 0.581476
Minibatch accuracy: 68.8%
Validation accuracy: 77.7%
Minibatch loss at step 300: 0.718953
Minibatch accuracy: 81.2%
Validation accuracy: 79.7%
Minibatch loss at step 350: 0.729384
Minibatch accuracy: 81.2%
Validation accuracy: 80.0%
Minibatch loss at step 400: 0.856863
Minibatch accuracy: 81.2%
Validation accuracy: 76.1%
Minibatch loss at step 450: 1.358471
Minibatch accuracy: 62.5%
Validation accuracy: 80.1%
Minibatch loss at step 500: 0.561696
Minibatch accuracy: 81.2%
Validation accuracy: 80.9%
Mi

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

## Solution 1
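
Before the TensorFlow version, a NumPy sketch of what a 2x2 max pooling with stride 2 computes on a single feature map may help. The function `max_pool_2x2` is a hypothetical helper for illustration; it assumes an even height and width and is not how `tf.nn.max_pool` is implemented.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    # Split into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

toy = np.array([[1, 2, 5, 6],
                [3, 4, 7, 8],
                [9, 1, 0, 2],
                [1, 1, 3, 4]], dtype=np.float32)

pooled = max_pool_2x2(toy)
print(pooled)  # [[4. 8.] [9. 4.]]
```

Each spatial dimension is halved, just as with a stride-2 convolution, but the convolution itself now runs at stride 1 and sees every position.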

In [7]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

g = tf.Graph()

with g.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  tf_sanit_valid_dataset = tf.constant(sanit_valid_dataset)
  tf_sanit_test_dataset = tf.constant(sanit_test_dataset)
  
  # Variables.
  W_1 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  b_1 = tf.Variable(tf.zeros([depth]))
  W_2 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  b_2 = tf.Variable(tf.constant(1.0, shape=[depth]))
  W_3 = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
  b_3 = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  W_4 = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  b_4 = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, W_1, [1, 1, 1, 1], padding='SAME')
    hidden = tf.nn.relu(conv + b_1)
    pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    conv = tf.nn.conv2d(pool, W_2, [1, 1, 1, 1], padding='SAME')
    hidden = tf.nn.relu(conv + b_2)
    pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
    shape = pool.get_shape().as_list()
    reshape = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, W_3) + b_3)
    return tf.matmul(hidden, W_4) + b_4

  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))
  sanit_valid_prediction = tf.nn.softmax(model(tf_sanit_valid_dataset))
  sanit_test_prediction = tf.nn.softmax(model(tf_sanit_test_dataset))

In [8]:
# Optimization

n_epochs = 200 # number of epochs

# early-stopping parameters
patience = 5000 # run at least this many iterations regardless
patience_increase = 2
improvement_threshold = 0.995 # a relative improvement of 
                                  # this much is considered significant
best_valid_loss = np.inf
start_time = timeit.default_timer()

n_train_batches = train_dataset.shape[0] // batch_size

valid_freq = min(n_train_batches, patience // 2)

with tf.Session(graph=g) as sess:
    tf.global_variables_initializer().run()
    print('Initialized')
    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):
            
            batch_data = \
                train_dataset[minibatch_index * batch_size:(minibatch_index + 1) * batch_size, :]
            batch_labels = \
                train_labels[minibatch_index * batch_size:(minibatch_index + 1) * batch_size, :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = sess.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
        
            iter = (epoch - 1) * n_train_batches + minibatch_index # cumulative iteration number
        
            if (iter + 1) % valid_freq == 0:
                this_valid_loss = 1. - (accuracy(valid_prediction.eval(), valid_labels) / 100.)
            
                print("Minibatch loss at epoch %i and iter %i: %f" % 
                      (epoch, iter, l))
                print("Minibatch  train, validation and sanitized validation accuracy: %.1f%%, %.1f%%, %.1f%%" 
                    % (accuracy(predictions, batch_labels), 
                    accuracy(valid_prediction.eval(), valid_labels),
                    accuracy(sanit_valid_prediction.eval(), sanit_valid_labels)))
            
                if this_valid_loss < best_valid_loss:
                    if this_valid_loss < best_valid_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    
                    best_valid_loss = this_valid_loss
                    
                    params = (sess.run(W_1), sess.run(b_1),
                              sess.run(W_2), sess.run(b_2),
                              sess.run(W_3), sess.run(b_3),
                              sess.run(W_4), sess.run(b_4))
                    
                    # save the best model
                    with open('best_model_params.pkl', 'wb') as f:
                            pickle.dump(params, f)
                    print('Model saved')
        
            if patience <= iter:
                    done_looping = True
                    break
                    
    print("Final Test accuracy: %.1f%%" 
                  % accuracy(test_prediction.eval(), test_labels))
    
    with open('best_model_params.pkl', 'rb') as f:
        (W_1_best, b_1_best, W_2_best, b_2_best,
         W_3_best, b_3_best, W_4_best, b_4_best) = pickle.load(f)
    
    W_1_init, b_1_init = tf.assign(W_1, W_1_best), tf.assign(b_1, b_1_best)
    W_2_init, b_2_init = tf.assign(W_2, W_2_best), tf.assign(b_2, b_2_best)
    W_3_init, b_3_init = tf.assign(W_3, W_3_best), tf.assign(b_3, b_3_best)
    W_4_init, b_4_init = tf.assign(W_4, W_4_best), tf.assign(b_4, b_4_best)
    
    sess.run([W_1_init, b_1_init,
              W_2_init, b_2_init,
              W_3_init, b_3_init,
              W_4_init, b_4_init])
    
    print("Test accuracy with the best model: %.1f%%" 
                  % accuracy(test_prediction.eval(), test_labels))
    
    print("Sanitized test accuracy with the best model: %.1f%%" 
                  % accuracy(sanit_test_prediction.eval(), sanit_test_labels))

    end_time = timeit.default_timer()

    print('Total run time %.4f minutes' % ((end_time - start_time) / 60.))

Initialized
Minibatch loss at epoch 1 and iter 2499: 0.377919
Minibatch  train, validation and sanitized validation accuracy: 87.5%, 85.9%, 85.0%
Model saved
Minibatch loss at epoch 1 and iter 4999: 0.218864
Minibatch  train, validation and sanitized validation accuracy: 93.8%, 86.8%, 86.0%
Model saved
Minibatch loss at epoch 1 and iter 7499: 0.391378
Minibatch  train, validation and sanitized validation accuracy: 87.5%, 88.0%, 87.3%
Model saved
Minibatch loss at epoch 1 and iter 9999: 0.366130
Minibatch  train, validation and sanitized validation accuracy: 87.5%, 89.0%, 88.3%
Model saved
Minibatch loss at epoch 1 and iter 12499: 0.656173
Minibatch  train, validation and sanitized validation accuracy: 81.2%, 89.0%, 88.3%
Minibatch loss at epoch 2 and iter 14999: 0.211207
Minibatch  train, validation and sanitized validation accuracy: 93.8%, 89.5%, 88.8%
Model saved
Minibatch loss at epoch 2 and iter 17499: 0.206505
Minibatch  train, validation and sanitized validation accuracy: 93.8%, 

Minibatch loss at epoch 12 and iter 149999: 0.570828
Minibatch  train, validation and sanitized validation accuracy: 75.0%, 90.3%, 89.7%
Minibatch loss at epoch 13 and iter 152499: 0.154342
Minibatch  train, validation and sanitized validation accuracy: 93.8%, 90.7%, 90.1%
Final Test accuracy: 96.0%
Test accuracy with the best model: 96.4%
Sanitized test accuracy with the best model: 95.9%
Total run time 44.7739 minutes
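
The early-stopping rule used in the training loop can be replayed on its own. The sketch below is a simplified stand-in, with a hypothetical function name `run_with_patience` and made-up validation losses; it keeps the same patience update and stopping test but leaves out training and model saving.

```python
def run_with_patience(valid_losses, valid_freq, patience=5000,
                      patience_increase=2, improvement_threshold=0.995):
    """Replay validation losses and return (stop_iter, best_loss).

    valid_losses[k] is the loss at the k-th validation check,
    i.e. at iteration (k + 1) * valid_freq - 1.
    """
    best_loss = float('inf')
    n_iters = len(valid_losses) * valid_freq
    for it in range(n_iters):
        if (it + 1) % valid_freq == 0:
            loss = valid_losses[(it + 1) // valid_freq - 1]
            if loss < best_loss:
                # Only a clear relative improvement extends the budget.
                if loss < best_loss * improvement_threshold:
                    patience = max(patience, it * patience_increase)
                best_loss = loss
        if patience <= it:
            return it, best_loss
    return n_iters - 1, best_loss

# Hypothetical validation losses: two improvements, then a plateau.
stop_iter, best = run_with_patience([0.20, 0.10, 0.10, 0.10, 0.10],
                                    valid_freq=2500)
print(stop_iter, best)  # 9998 0.1
```

The last significant improvement happens at iteration 4999, which raises the patience to 9998; with no further improvement, training stops there. In the notebook the "loss" being tracked is the validation error rate, `1 - accuracy / 100`.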


---
Problem 2
---------

Try to get the best performance you can using a convolutional net. Look, for example, at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, and consider adding dropout and/or learning rate decay.

---

## Solution 2

Let's try a LeNet-5 model with dropout and learning rate decay.
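
The learning rate schedule used in this solution is a staircase exponential decay: the rate is multiplied by a fixed factor once every fixed number of steps. The sketch below reproduces that formula in plain Python with the same constants (0.1, 10000, 0.95); `staircase_decay` is a hypothetical helper name, not a TensorFlow function.

```python
def staircase_decay(initial_rate, global_step, decay_steps, decay_rate):
    """Learning rate after `global_step` updates, dropping every `decay_steps` steps."""
    return initial_rate * decay_rate ** (global_step // decay_steps)

for step in (0, 9999, 10000, 82500):
    print(step, round(staircase_decay(0.1, step, 10000, 0.95), 6))
# 0 0.1
# 9999 0.1
# 10000 0.095
# 82500 0.066342
```

These values agree with the rates printed in the training log further down, where the step counter has already been incremented by the time the rate is read.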

In [9]:
# Graph with GPU support

batch_size = 32
patch_size = 5

g = tf.Graph()

config = tf.ConfigProto(allow_soft_placement = True)

with tf.device('/gpu:0'):
#with g.as_default():
    # Input data.
    tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    tf_sanit_valid_dataset = tf.constant(sanit_valid_dataset)
    tf_sanit_test_dataset = tf.constant(sanit_test_dataset)
    
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.1, global_step, 10000, 0.95, staircase=True)
    
    # Variables.
    W_1 = tf.get_variable("W1", shape=[patch_size, patch_size, num_channels, 6],
                                 initializer=tf.contrib.layers.xavier_initializer())
    b_1 = tf.Variable(tf.zeros([6]))
    
    W_2 = tf.get_variable("W2", shape=[patch_size, patch_size, 6, 16],
                                 initializer=tf.contrib.layers.xavier_initializer())
    b_2 = tf.Variable(tf.zeros([16]))
    
    W_3 = tf.get_variable("W3", shape=[5 * 5 * 16, 120],
                                 initializer=tf.contrib.layers.xavier_initializer())
    b_3 = tf.Variable(tf.zeros([120]))
    
    W_4 = tf.get_variable("W4", shape=[120, 84],
                                 initializer=tf.contrib.layers.xavier_initializer())
    b_4 = tf.Variable(tf.zeros([84]))
    
    W_5 = tf.get_variable("W5", shape=[84, num_labels],
                                 initializer=tf.contrib.layers.xavier_initializer())
    b_5 = tf.Variable(tf.zeros([num_labels]))
    
    # Model.
    def model(data, dropout=False):
        conv = tf.nn.conv2d(data, W_1, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + b_1)
        
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        conv = tf.nn.conv2d(pool, W_2, [1, 1, 1, 1], padding='VALID')
        hidden = tf.nn.relu(conv + b_2)
        
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        shape = pool.get_shape().as_list()
        reshape = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])

        hidden = tf.nn.relu(tf.matmul(reshape, W_3) + b_3)
        if dropout:
            hidden = tf.nn.dropout(hidden, 0.5)

        hidden = tf.nn.relu(tf.matmul(hidden, W_4) + b_4)
        if dropout:
            hidden = tf.nn.dropout(hidden, 0.5)
        
        return tf.matmul(hidden, W_5) + b_5
    
    # Training computation.
    logits = model(tf_train_dataset, dropout=True)
    loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
      
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))
    sanit_valid_prediction = tf.nn.softmax(model(tf_sanit_valid_dataset))
    sanit_test_prediction = tf.nn.softmax(model(tf_sanit_test_dataset))
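
The model above applies dropout with a keep probability of 0.5 to the two fully connected hidden layers, and only when building the training logits. As a rough NumPy sketch of what that operation does (zero a random subset of units and rescale the survivors by `1 / keep_prob`), consider the following. `inverted_dropout` is a hypothetical helper and the fixed seed is only there to make the sketch repeatable.

```python
import numpy as np

def inverted_dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random_sample(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)  # fixed seed, for repeatability only
hidden = np.ones((4, 120), dtype=np.float32)
dropped = inverted_dropout(hidden, 0.5, rng)

# Every entry is either dropped (0) or scaled up to 1 / keep_prob (2).
print(np.unique(dropped))
print(dropped.mean())  # close to 1 on average, which is the point of rescaling
```

Because the surviving activations are rescaled during training, the evaluation graphs can call `model` without dropout and need no extra correction.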

In [10]:
# Optimization

n_epochs = 200 # number of epochs

# early-stopping parameters
patience = 5000 # run at least this many iterations regardless
patience_increase = 2
improvement_threshold = 0.995 # a relative improvement of 
                                  # this much is considered significant
best_valid_loss = np.inf
start_time = timeit.default_timer()
prev_time = start_time

n_train_batches = train_dataset.shape[0] // batch_size

valid_freq = min(n_train_batches, patience // 2)

epoch_freq_ratio = max(1, n_train_batches // (patience // 2))

delta_t = []
with tf.Session(config=config) as sess:
#with tf.Session(graph=g) as sess:
    tf.global_variables_initializer().run()
    print('Initialized')
    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):
            
            batch_data = \
                train_dataset[minibatch_index * batch_size:(minibatch_index + 1) * batch_size, :]
            batch_labels = \
                train_labels[minibatch_index * batch_size:(minibatch_index + 1) * batch_size, :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = sess.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
        
            iter = (epoch - 1) * n_train_batches + minibatch_index # cumulative iteration number
        
            if (iter + 1) % valid_freq == 0:
                this_valid_loss = 1. - (accuracy(valid_prediction.eval(), valid_labels) / 100.)
            
                print("Minibatch loss at epoch %i and iter %i: %f and the learning rate: %f" % 
                      (epoch, iter, l, learning_rate.eval()))
                print("Minibatch  train, validation and sanitized validation accuracy: %.3f%%, %.3f%%, %.3f%%" 
                    % (accuracy(predictions, batch_labels), 
                    accuracy(valid_prediction.eval(), valid_labels),
                    accuracy(sanit_valid_prediction.eval(), sanit_valid_labels)))

                end_time = timeit.default_timer()

                delta_t.append(end_time - prev_time)
                
                print('Time interval: %.4f seconds, estimated run time for %i epochs: %.4f hours' % 
                        ((end_time - prev_time), n_epochs, 
                        ((sum(delta_t) / len(delta_t))
                         * n_epochs * epoch_freq_ratio) / 3600.))
                
                prev_time = end_time
            
                if this_valid_loss < best_valid_loss:
                    if this_valid_loss < best_valid_loss * improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    
                    best_valid_loss = this_valid_loss
                    
                    params = (sess.run(W_1), sess.run(b_1),
                              sess.run(W_2), sess.run(b_2),
                              sess.run(W_3), sess.run(b_3),
                              sess.run(W_4), sess.run(b_4),
                              sess.run(W_5), sess.run(b_5))
                    
                    # save the best model
                    with open('best_model_params.pkl', 'wb') as f:
                            pickle.dump(params, f)
                    print('Model saved')
        
            if patience <= iter:
                    done_looping = True
                    break
                    
    print("Final Test accuracy: %.1f%%" 
                  % accuracy(test_prediction.eval(), test_labels))
    
    with open('best_model_params.pkl', 'rb') as f:
        (W_1_best, b_1_best, 
         W_2_best, b_2_best,
         W_3_best, b_3_best, 
         W_4_best, b_4_best,
         W_5_best, b_5_best) = pickle.load(f)
    
    W_1_init, b_1_init = tf.assign(W_1, W_1_best), tf.assign(b_1, b_1_best)
    W_2_init, b_2_init = tf.assign(W_2, W_2_best), tf.assign(b_2, b_2_best)
    W_3_init, b_3_init = tf.assign(W_3, W_3_best), tf.assign(b_3, b_3_best)
    W_4_init, b_4_init = tf.assign(W_4, W_4_best), tf.assign(b_4, b_4_best)
    W_5_init, b_5_init = tf.assign(W_5, W_5_best), tf.assign(b_5, b_5_best)
    
    sess.run([W_1_init, b_1_init,
              W_2_init, b_2_init,
              W_3_init, b_3_init,
              W_4_init, b_4_init,
              W_5_init, b_5_init])
    
    print("Test accuracy with the best model: %.3f%%" 
                  % accuracy(test_prediction.eval(), test_labels))
    
    print("Sanitized test accuracy with the best model: %.3f%%" 
                  % accuracy(sanit_test_prediction.eval(), sanit_test_labels))

    end_time = timeit.default_timer()

    print('Total run time %.4f minutes' % ((end_time - start_time) / 60.))


Initialized
Minibatch loss at epoch 1 and iter 2499: 0.294367 and the learning rate: 0.100000
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 86.300%, 85.452%
Time interval: 17.2607 seconds, estimated run time for 200 epochs: 1.9179 hours
Model saved
Minibatch loss at epoch 1 and iter 4999: 0.541414 and the learning rate: 0.100000
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 87.480%, 86.587%
Time interval: 14.8643 seconds, estimated run time for 200 epochs: 1.7847 hours
Model saved
Minibatch loss at epoch 2 and iter 7499: 0.475043 and the learning rate: 0.100000
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 88.230%, 87.444%
Time interval: 15.9236 seconds, estimated run time for 200 epochs: 1.7796 hours
Model saved
Minibatch loss at epoch 2 and iter 9999: 0.327063 and the learning rate: 0.095000
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 88.530%, 87.756%
Time interval: 17.0853 s

Minibatch loss at epoch 14 and iter 82499: 0.300528 and the learning rate: 0.066342
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 90.530%, 89.904%
Time interval: 18.4124 seconds, estimated run time for 200 epochs: 2.2770 hours
Minibatch loss at epoch 14 and iter 84999: 0.268215 and the learning rate: 0.066342
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 90.580%, 89.938%
Time interval: 18.1309 seconds, estimated run time for 200 epochs: 2.2693 hours
Minibatch loss at epoch 14 and iter 87499: 0.468058 and the learning rate: 0.066342
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 90.370%, 89.693%
Time interval: 18.5539 seconds, estimated run time for 200 epochs: 2.2633 hours
Minibatch loss at epoch 15 and iter 89999: 0.281861 and the learning rate: 0.063025
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 90.470%, 89.793%
Time interval: 23.6561 seconds, estimated run time for 200 epoch

Minibatch loss at epoch 26 and iter 162499: 0.553767 and the learning rate: 0.044013
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.020%, 90.405%
Time interval: 25.1052 seconds, estimated run time for 200 epochs: 2.4184 hours
Minibatch loss at epoch 27 and iter 164999: 0.191960 and the learning rate: 0.044013
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.030%, 90.461%
Time interval: 18.7801 seconds, estimated run time for 200 epochs: 2.4134 hours
Minibatch loss at epoch 27 and iter 167499: 0.356282 and the learning rate: 0.044013
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 90.960%, 90.327%
Time interval: 20.3694 seconds, estimated run time for 200 epochs: 2.4112 hours
Minibatch loss at epoch 28 and iter 169999: 0.271221 and the learning rate: 0.041812
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.060%, 90.450%
Time interval: 22.2467 seconds, estimated run time for 200 e

Minibatch loss at epoch 39 and iter 242499: 0.507819 and the learning rate: 0.029199
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.050%, 90.405%
Time interval: 23.8940 seconds, estimated run time for 200 epochs: 2.3820 hours
Minibatch loss at epoch 40 and iter 244999: 0.200263 and the learning rate: 0.029199
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.190%, 90.561%
Time interval: 19.1381 seconds, estimated run time for 200 epochs: 2.3794 hours
Minibatch loss at epoch 40 and iter 247499: 0.258719 and the learning rate: 0.029199
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.060%, 90.427%
Time interval: 18.2763 seconds, estimated run time for 200 epochs: 2.3759 hours
Minibatch loss at epoch 40 and iter 249999: 0.449787 and the learning rate: 0.027739
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.060%, 90.427%
Time interval: 20.5043 seconds, estimated run time for 200 e

Minibatch loss at epoch 52 and iter 322499: 0.204918 and the learning rate: 0.019371
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.170%, 90.550%
Time interval: 18.2861 seconds, estimated run time for 200 epochs: 2.3364 hours
Minibatch loss at epoch 52 and iter 324999: 0.421708 and the learning rate: 0.019371
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.150%, 90.516%
Time interval: 19.4557 seconds, estimated run time for 200 epochs: 2.3350 hours
Minibatch loss at epoch 53 and iter 327499: 0.187558 and the learning rate: 0.019371
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.140%, 90.528%
Time interval: 22.0927 seconds, estimated run time for 200 epochs: 2.3359 hours
Minibatch loss at epoch 53 and iter 329999: 0.403599 and the learning rate: 0.018403
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.090%, 90.405%
Time interval: 19.2313 seconds, estimated run time for 200 e

Minibatch  train, validation and sanitized validation accuracy: 96.875%, 91.330%, 90.672%
Time interval: 19.1429 seconds, estimated run time for 200 epochs: 2.3196 hours
Minibatch loss at epoch 65 and iter 404999: 0.403874 and the learning rate: 0.012851
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.190%, 90.550%
Time interval: 19.3573 seconds, estimated run time for 200 epochs: 2.3185 hours
Minibatch loss at epoch 66 and iter 407499: 0.238760 and the learning rate: 0.012851
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.310%, 90.661%
Time interval: 23.0859 seconds, estimated run time for 200 epochs: 2.3200 hours
Minibatch loss at epoch 66 and iter 409999: 0.356229 and the learning rate: 0.012209
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.360%, 90.739%
Time interval: 23.5094 seconds, estimated run time for 200 epochs: 2.3218 hours
Model saved
Minibatch loss at epoch 66 and iter 412499: 0.430617 

Minibatch loss at epoch 78 and iter 484999: 0.276316 and the learning rate: 0.008526
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.300%, 90.650%
Time interval: 18.6155 seconds, estimated run time for 200 epochs: 2.3268 hours
Minibatch loss at epoch 78 and iter 487499: 0.410979 and the learning rate: 0.008526
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.190%, 90.561%
Time interval: 20.2698 seconds, estimated run time for 200 epochs: 2.3265 hours
Minibatch loss at epoch 79 and iter 489999: 0.130712 and the learning rate: 0.008099
Minibatch  train, validation and sanitized validation accuracy: 96.875%, 91.290%, 90.650%
Time interval: 24.0376 seconds, estimated run time for 200 epochs: 2.3282 hours
Minibatch loss at epoch 79 and iter 492499: 0.484792 and the learning rate: 0.008099
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.260%, 90.617%
Time interval: 33.4649 seconds, estimated run time for 200 e

Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.330%, 90.695%
Time interval: 18.8101 seconds, estimated run time for 200 epochs: 2.4192 hours
Minibatch loss at epoch 91 and iter 567499: 0.408354 and the learning rate: 0.005656
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.320%, 90.695%
Time interval: 19.3265 seconds, estimated run time for 200 epochs: 2.4180 hours
Minibatch loss at epoch 92 and iter 569999: 0.219656 and the learning rate: 0.005373
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.330%, 90.695%
Time interval: 33.2969 seconds, estimated run time for 200 epochs: 2.4236 hours
Minibatch loss at epoch 92 and iter 572499: 0.229020 and the learning rate: 0.005373
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.290%, 90.672%
Time interval: 28.6185 seconds, estimated run time for 200 epochs: 2.4269 hours
Minibatch loss at epoch 92 and iter 574999: 0.354931 and the lear

Minibatch loss at epoch 104 and iter 647499: 0.292844 and the learning rate: 0.003752
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.340%, 90.717%
Time interval: 18.8748 seconds, estimated run time for 200 epochs: 2.4649 hours
Minibatch loss at epoch 104 and iter 649999: 0.394225 and the learning rate: 0.003565
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.370%, 90.739%
Time interval: 19.5192 seconds, estimated run time for 200 epochs: 2.4637 hours
Minibatch loss at epoch 105 and iter 652499: 0.091010 and the learning rate: 0.003565
Minibatch  train, validation and sanitized validation accuracy: 96.875%, 91.400%, 90.761%
Time interval: 32.4495 seconds, estimated run time for 200 epochs: 2.4681 hours
Minibatch loss at epoch 105 and iter 654999: 0.379706 and the learning rate: 0.003565
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.420%, 90.784%
Time interval: 23.4179 seconds, estimated run time for 2

Minibatch loss at epoch 117 and iter 727499: 0.113718 and the learning rate: 0.002489
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.370%, 90.717%
Time interval: 35.2164 seconds, estimated run time for 200 epochs: 2.5740 hours
Minibatch loss at epoch 117 and iter 729999: 0.505218 and the learning rate: 0.002365
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.280%, 90.628%
Time interval: 23.3042 seconds, estimated run time for 200 epochs: 2.5740 hours
Minibatch loss at epoch 118 and iter 732499: 0.169693 and the learning rate: 0.002365
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.360%, 90.717%
Time interval: 21.9380 seconds, estimated run time for 200 epochs: 2.5735 hours
Minibatch loss at epoch 118 and iter 734999: 0.158712 and the learning rate: 0.002365
Minibatch  train, validation and sanitized validation accuracy: 96.875%, 91.320%, 90.672%
Time interval: 47.0828 seconds, estimated run time for 2

Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.360%, 90.717%
Time interval: 25.3741 seconds, estimated run time for 200 epochs: 2.6449 hours
Minibatch loss at epoch 130 and iter 809999: 0.253502 and the learning rate: 0.001569
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.370%, 90.728%
Time interval: 31.3865 seconds, estimated run time for 200 epochs: 2.6475 hours
Minibatch loss at epoch 130 and iter 812499: 0.470770 and the learning rate: 0.001569
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.380%, 90.739%
Time interval: 26.4419 seconds, estimated run time for 200 epochs: 2.6484 hours
Minibatch loss at epoch 131 and iter 814999: 0.141940 and the learning rate: 0.001569
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.410%, 90.784%
Time interval: 25.9715 seconds, estimated run time for 200 epochs: 2.6491 hours
Minibatch loss at epoch 131 and iter 817499: 0.309987 and the 

Minibatch loss at epoch 143 and iter 889999: 0.196445 and the learning rate: 0.001041
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.370%, 90.717%
Time interval: 18.5881 seconds, estimated run time for 200 epochs: 2.6664 hours
Minibatch loss at epoch 143 and iter 892499: 0.412905 and the learning rate: 0.001041
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.430%, 90.795%
Time interval: 25.1915 seconds, estimated run time for 200 epochs: 2.6668 hours
Minibatch loss at epoch 144 and iter 894999: 0.182321 and the learning rate: 0.001041
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.370%, 90.717%
Time interval: 20.7469 seconds, estimated run time for 200 epochs: 2.6658 hours
Minibatch loss at epoch 144 and iter 897499: 0.171498 and the learning rate: 0.001041
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.330%, 90.683%
Time interval: 25.8078 seconds, estimated run time for 2

Minibatch loss at epoch 156 and iter 969999: 0.176768 and the learning rate: 0.000691
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.440%, 90.795%
Time interval: 24.8924 seconds, estimated run time for 200 epochs: 2.6609 hours
Minibatch loss at epoch 156 and iter 972499: 0.158898 and the learning rate: 0.000691
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.440%, 90.795%
Time interval: 22.4333 seconds, estimated run time for 200 epochs: 2.6605 hours
Minibatch loss at epoch 156 and iter 974999: 0.453891 and the learning rate: 0.000691
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.410%, 90.761%
Time interval: 21.5811 seconds, estimated run time for 200 epochs: 2.6598 hours
Minibatch loss at epoch 157 and iter 977499: 0.097238 and the learning rate: 0.000691
Minibatch  train, validation and sanitized validation accuracy: 96.875%, 91.430%, 90.784%
Time interval: 22.5749 seconds, estimated run time for 2

Minibatch loss at epoch 168 and iter 1049999: 0.375851 and the learning rate: 0.000458
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.440%, 90.795%
Time interval: 23.1496 seconds, estimated run time for 200 epochs: 2.6590 hours
Minibatch loss at epoch 169 and iter 1052499: 0.137918 and the learning rate: 0.000458
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.420%, 90.772%
Time interval: 20.7008 seconds, estimated run time for 200 epochs: 2.6582 hours
Minibatch loss at epoch 169 and iter 1054999: 0.350623 and the learning rate: 0.000458
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.450%, 90.806%
Time interval: 29.7010 seconds, estimated run time for 200 epochs: 2.6597 hours
Minibatch loss at epoch 170 and iter 1057499: 0.266451 and the learning rate: 0.000458
Minibatch  train, validation and sanitized validation accuracy: 90.625%, 91.460%, 90.817%
Time interval: 24.9532 seconds, estimated run time f

Minibatch loss at epoch 181 and iter 1129999: 0.345548 and the learning rate: 0.000304
Minibatch  train, validation and sanitized validation accuracy: 84.375%, 91.410%, 90.761%
Time interval: 22.2529 seconds, estimated run time for 200 epochs: 2.6631 hours
Minibatch loss at epoch 182 and iter 1132499: 0.149083 and the learning rate: 0.000304
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.410%, 90.761%
Time interval: 20.6360 seconds, estimated run time for 200 epochs: 2.6623 hours
Minibatch loss at epoch 182 and iter 1134999: 0.158705 and the learning rate: 0.000304
Minibatch  train, validation and sanitized validation accuracy: 93.750%, 91.390%, 90.739%
Time interval: 19.0105 seconds, estimated run time for 200 epochs: 2.6611 hours
Minibatch loss at epoch 182 and iter 1137499: 0.444048 and the learning rate: 0.000304
Minibatch  train, validation and sanitized validation accuracy: 87.500%, 91.400%, 90.761%
Time interval: 21.0514 seconds, estimated run time f

Results:

Final test accuracy: 96.4%

Test accuracy with the best model: 96.440%

Sanitized test accuracy with the best model: 96.004%

Total run time: 184.0799 minutes

*I trained this model on my laptop's GPU (NVIDIA GeForce GT 650M).*
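
The learning rates printed in the log above drop by about 5% every 10,000 iterations (for example, 0.008526 at iter 484999 and 0.008099 at iter 489999). That pattern looks like a staircase exponential decay. The sketch below reproduces the logged values under assumptions inferred from the log rather than taken from the training code: an initial rate of 0.1, a decay factor of 0.95, a decay interval of 10,000 steps, and a step counter equal to the logged iteration plus one. The function name `staircase_learning_rate` is hypothetical.

```python
def staircase_learning_rate(global_step, initial_lr=0.1, decay_rate=0.95,
                            decay_steps=10000):
    """Staircase exponential decay.

    The rate is multiplied by `decay_rate` once for every full block of
    `decay_steps` steps. All default values are assumptions inferred from
    the printed log, not read from the training code.
    """
    return initial_lr * decay_rate ** (global_step // decay_steps)


# The log prints a 0-based iteration index. Assumption: the step counter has
# already been incremented when the rate is read, so step = iteration + 1.
for logged_iter in (409999, 484999, 489999, 1129999):
    rate = staircase_learning_rate(logged_iter + 1)
    print('iter %d: learning rate %.6f' % (logged_iter, rate))
```

In TensorFlow 1.x, a schedule of this shape is available as `tf.train.exponential_decay(..., staircase=True)`. Whether this run used it with exactly these parameters is a guess based on the numbers above.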

---