Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
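
To make the broadcasting comparison inside `reformat` concrete, here is a small sketch on three toy labels. The label values are made up for illustration; only `num_labels` matches the notebook.

```python
import numpy as np

num_labels = 10
labels = np.array([0, 2, 9])  # three toy integer class labels

# labels[:, None] has shape (3, 1); np.arange(num_labels) has shape (10,).
# Broadcasting compares every label against every class index, giving a
# (3, 10) boolean matrix with exactly one True per row.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)  # (3, 10)
print(one_hot[1])     # the row for label 2 has its single 1.0 at index 2
```

Taking `np.argmax(one_hot, 1)` recovers the original integer labels, which is what the `accuracy` helper relies on.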


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

---
Problem 1
---------

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the squared norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `tf.nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.

---
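
As a rough illustration of what the penalty does, the sketch below mirrors `tf.nn.l2_loss` in NumPy (half the sum of squared entries) and adds it to a loss value. The weight matrix and the `data_loss` value are made up; `beta` uses the same 5e-4 as `regularization_beta` in this notebook.

```python
import numpy as np

def l2_loss(t):
    """NumPy stand-in for tf.nn.l2_loss: half the sum of squared entries."""
    return 0.5 * np.sum(np.square(t))

weights = np.array([[1.0, -2.0], [0.5, 0.0]])  # toy weight matrix
data_loss = 0.8   # made-up cross-entropy value, for illustration only
beta = 5e-4       # regularization strength

penalty = l2_loss(weights)               # 0.5 * (1 + 4 + 0.25) = 2.625
total_loss = data_loss + beta * penalty

# The gradient of the penalty term with respect to the weights is
# beta * weights, so each gradient step also shrinks every weight
# slightly toward zero.
penalty_grad = beta * weights
```

Larger values of `beta` pull the weights toward zero more strongly, which is why this value is worth tuning against validation accuracy.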

In [56]:
def generate_logistic_graph(batch_size):
    
    #######
    # SETUP
    #######

    flat_image_size = image_size * image_size
    regularization_beta = 5e-4

    graph = tf.Graph()
    with graph.as_default():

        #######
        # Input
        #######

        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, flat_image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        ###########
        # Variables
        ###########

        weights = tf.Variable(tf.truncated_normal([flat_image_size, num_labels]))
        biases = tf.Variable(tf.zeros([num_labels]))

        ##########
        # Training
        ##########

        logits = tf.matmul(tf_train_dataset, weights) + biases
        # Keyword arguments avoid any ambiguity about the order of logits and labels.
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

        # L2 Regularization

        regularization = tf.nn.l2_loss(weights) + tf.nn.l2_loss(biases)
        loss = loss + (regularization_beta * regularization)

        # Optimizer

        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

        ############
        # Prediction
        ############

        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
        test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
        
    return {
        'graph': graph,
        'tf_train_dataset': tf_train_dataset,
        'tf_train_labels': tf_train_labels,
        'tf_valid_dataset': tf_valid_dataset,
        'tf_test_dataset': tf_test_dataset,
        'loss': loss,
        'optimizer': optimizer,
        'train_prediction': train_prediction,
        'valid_prediction': valid_prediction,
        'test_prediction': test_prediction
    }

In [63]:
def generate_nn_graph(batch_size, use_dropout = False):
    
    #######
    # SETUP
    #######

    flat_image_size = image_size * image_size
    num_hidden_nodes = 1024
    regularization_beta = 5e-4

    graph = tf.Graph()
    with graph.as_default():

        #######
        # Input
        #######

        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, flat_image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        ###########
        # Variables
        ###########

        # Hidden Layer (RELU)

        hidden_weights = tf.Variable(tf.truncated_normal([flat_image_size, num_hidden_nodes]))
        hidden_biases = tf.Variable(tf.zeros([num_hidden_nodes]))
        hidden_results = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

        # Output Layer

        output_weights = tf.Variable(tf.truncated_normal([num_hidden_nodes, num_labels]))
        output_biases = tf.Variable(tf.zeros([num_labels]))

        ##########
        # Training
        ##########

        # Apply dropout to the hidden activations only when requested (training path).
        if use_dropout:
            logits = tf.matmul(tf.nn.dropout(hidden_results, 0.5), output_weights) + output_biases
        else:
            logits = tf.matmul(hidden_results, output_weights) + output_biases
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

        # L2 Regularization

        regularization = tf.nn.l2_loss(hidden_weights) + tf.nn.l2_loss(hidden_biases) + tf.nn.l2_loss(output_weights) + tf.nn.l2_loss(output_biases)
        loss = loss + (regularization_beta * regularization)

        # Optimizer

        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

        ############
        # Prediction
        ############

        train_prediction = tf.nn.softmax(logits)

        hidden_valid_prediction = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
        valid_prediction = tf.nn.softmax(tf.matmul(hidden_valid_prediction, output_weights) + output_biases)

        hidden_test_prediction = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
        test_prediction = tf.nn.softmax(tf.matmul(hidden_test_prediction, output_weights) + output_biases)
        
    return {
        'graph': graph,
        'tf_train_dataset': tf_train_dataset,
        'tf_train_labels': tf_train_labels,
        'tf_valid_dataset': tf_valid_dataset,
        'tf_test_dataset': tf_test_dataset,
        'loss': loss,
        'optimizer': optimizer,
        'train_prediction': train_prediction,
        'valid_prediction': valid_prediction,
        'test_prediction': test_prediction
    }

In [58]:
def execute_graph(graph_info, num_steps = 3001):
    
    ###########
    # EXECUTION
    ###########

    graph = graph_info['graph'] 
    tf_train_dataset = graph_info['tf_train_dataset']
    tf_train_labels = graph_info['tf_train_labels']
    tf_valid_dataset = graph_info['tf_valid_dataset']
    tf_test_dataset = graph_info['tf_test_dataset']
    loss = graph_info['loss']
    optimizer = graph_info['optimizer']
    train_prediction = graph_info['train_prediction']
    valid_prediction = graph_info['valid_prediction'] 
    test_prediction = graph_info['test_prediction']

    # Recover the batch size from the placeholder's static shape so that this
    # function does not depend on a global `batch_size`.
    batch_size = tf_train_dataset.get_shape().as_list()[0]

    with tf.Session(graph=graph) as session:

        tf.initialize_all_variables().run()

        for step in range(num_steps):

            # Generate a mini-batch
            offset = (batch_size * step) % (train_labels.shape[0] - batch_size)
            batch_dataset = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]

            # Feed the mini-batch to the session
            feed_dict = {
                tf_train_dataset: batch_dataset,
                tf_train_labels: batch_labels
            }
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

            # Print progress
            if (step % 500 == 0):
                print("Mini-batch loss at step %d: %f" % (step, l))
                print("Mini-batch accuracy: %.1f%%" % (accuracy(predictions, batch_labels)))
                print("Validation accuracy: %.1f%%" % (accuracy(valid_prediction.eval(), valid_labels)))

        print("Test accuracy: %.1f%%" % (accuracy(test_prediction.eval(), test_labels)))
        

In [52]:
####################
# Training Variables
####################

batch_size = 128
num_steps = 1001

In [39]:
#####################
# Logistic Regression
#####################

logistic_graph_info = generate_logistic_graph(batch_size)
execute_graph(logistic_graph_info, num_steps)

Mini-batch loss at step 0: 18.274147
Mini-batch accuracy: 7.8%
Validation accuracy: 8.3%
Mini-batch loss at step 500: 2.390781
Mini-batch accuracy: 75.8%
Validation accuracy: 75.5%
Mini-batch loss at step 1000: 1.673430
Mini-batch accuracy: 78.9%
Validation accuracy: 76.8%
Test accuracy: 84.0%


In [53]:
################
# Neural Network
################

nn_graph_info = generate_nn_graph(batch_size)
execute_graph(nn_graph_info, num_steps)

Mini-batch loss at step 0: 535.249634
Mini-batch accuracy: 7.0%
Validation accuracy: 32.4%
Mini-batch loss at step 500: 135.204346
Mini-batch accuracy: 85.2%
Validation accuracy: 80.5%
Mini-batch loss at step 1000: 99.102890
Mini-batch accuracy: 82.8%
Validation accuracy: 79.7%
Test accuracy: 86.4%


---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---
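
One possible way to restrict training to a few batches is to wrap the mini-batch offset around a small prefix of the training set, whereas the offset formula in `execute_graph` cycles through the whole set. The sketch below is only an illustration; `batch_offset` and `num_batches` are hypothetical names that do not appear in the notebook.

```python
def batch_offset(step, batch_size, num_batches):
    """Start index of the mini-batch for `step`, cycling over only the
    first `num_batches` batches of the training set (hypothetical helper)."""
    return (step % num_batches) * batch_size

batch_size = 128
num_batches = 3  # the model would only ever see 3 * 128 = 384 examples

offsets = [batch_offset(step, batch_size, num_batches) for step in range(7)]
print(offsets)  # [0, 128, 256, 0, 128, 256, 0]
```

With so few distinct examples, a large model can memorize them, so mini-batch accuracy would be expected to climb far above validation accuracy.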

In [67]:
####################
# Training Variables
####################

batch_size = 16
num_steps = 1001

In [68]:
#####################
# Logistic Regression
#####################

logistic_graph_info = generate_logistic_graph(batch_size)
execute_graph(logistic_graph_info, num_steps)

Mini-batch loss at step 0: 17.954659
Mini-batch accuracy: 0.0%
Validation accuracy: 13.6%
Mini-batch loss at step 500: 2.436039
Mini-batch accuracy: 81.2%
Validation accuracy: 74.4%
Mini-batch loss at step 1000: 2.996360
Mini-batch accuracy: 68.8%
Validation accuracy: 69.7%
Test accuracy: 75.1%


In [69]:
################
# Neural Network
################

nn_graph_info = generate_nn_graph(batch_size)
execute_graph(nn_graph_info, num_steps)

Mini-batch loss at step 0: 479.940735
Mini-batch accuracy: 12.5%
Validation accuracy: 17.6%
Mini-batch loss at step 500: 334.586060
Mini-batch accuracy: 68.8%
Validation accuracy: 62.9%
Mini-batch loss at step 1000: 183.660278
Mini-batch accuracy: 56.2%
Validation accuracy: 57.6%
Test accuracy: 62.8%


---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides `tf.nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---
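
The training/evaluation split can be sketched in NumPy as follows. This is a minimal illustration of inverted dropout, the scheme in which kept activations are scaled by `1 / keep_prob` (as `tf.nn.dropout` does); the `dropout` function and its `training` flag are hypothetical and not part of the notebook.

```python
import numpy as np

def dropout(x, keep_prob, rng, training):
    """Inverted dropout: zero units at random and rescale the survivors
    during training; return the input unchanged at evaluation time."""
    if not training:
        return x
    mask = rng.random(x.shape) < keep_prob  # True means the unit is kept
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
hidden = np.ones((4, 1000), dtype=np.float32)  # toy hidden activations

train_out = dropout(hidden, keep_prob=0.5, rng=rng, training=True)
eval_out = dropout(hidden, keep_prob=0.5, rng=rng, training=False)

# Kept units are scaled to 1 / 0.5 = 2.0 and dropped units are 0.0, so the
# mean activation stays close to the original value of 1.0.
print(np.unique(train_out))
print(train_out.mean())
print(np.array_equal(eval_out, hidden))  # evaluation path is deterministic
```

Because of the rescaling, the evaluation graph can reuse the same weights without any correction factor.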

In [70]:
################
# Neural Network
################

nn_graph_info = generate_nn_graph(batch_size, True)
execute_graph(nn_graph_info, num_steps)

Mini-batch loss at step 0: 672.162598
Mini-batch accuracy: 0.0%
Validation accuracy: 24.9%
Mini-batch loss at step 500: 9037046597388599296.000000
Mini-batch accuracy: 31.2%
Validation accuracy: 45.3%
Mini-batch loss at step 1000: 10629556054103407020477756460761088.000000
Mini-batch accuracy: 37.5%
Validation accuracy: 32.1%
Test accuracy: 33.9%


---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
 ---
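
To see how such a schedule behaves, the sketch below reimplements the formula behind `tf.train.exponential_decay` in plain Python: the rate is multiplied by `decay_rate` once every `decay_steps` steps. The values `decay_steps=1000` and `decay_rate=0.5` are illustrative assumptions, not values from the assignment.

```python
def exponential_decay(initial_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Plain-Python version of the schedule
    initial_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        exponent = global_step // decay_steps  # decay in discrete jumps
    else:
        exponent = float(global_step) / decay_steps  # decay smoothly
    return initial_rate * decay_rate ** exponent

# decay_steps and decay_rate below are illustrative values only.
for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.5, step, decay_steps=1000, decay_rate=0.5))
```

With these values the rate starts at 0.5, halves to 0.25 after 1000 steps, and reaches 0.125 after 2000 steps.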


---
References
===

https://github.com/giubil/udacity-deep-learning