Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat the data into a shape that's better suited to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 1 to [0.0, 1.0, 0.0 ...], 2 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
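
To see what the broadcasting comparison inside `reformat` does, here is a small sketch on a made-up label vector. The class count and label values are illustrative only and do not come from notMNIST.

```python
import numpy as np

num_labels = 4                 # hypothetical small class count
labels = np.array([0, 2, 3])   # hypothetical integer labels

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (4,).
# Broadcasting compares every label against every class index, giving a
# (3, 4) boolean matrix with exactly one True per row.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
print(one_hot)
```

Each row contains a single 1.0 at the column given by the original integer label.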


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
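
As a quick sanity check of `accuracy`, the sketch below calls it on a tiny hand-made batch. The prediction and label values are invented for illustration.

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows whose most probable class matches the 1-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Four hypothetical softmax outputs over three classes.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
# Matching 1-hot labels; the last row is deliberately misclassified.
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float32)

result = accuracy(predictions, labels)
print(result)  # 75.0 (three of four rows match)
```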

---
Problem 1
---------

Introduce and tune L2 regularization for both the logistic and neural network models. Remember that L2 regularization amounts to adding a penalty on the squared norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `tf.nn.l2_loss(t)`, which returns `sum(t ** 2) / 2`. The right amount of regularization should improve your validation / test accuracy.

---
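
As a rough illustration of the penalty described above, the sketch below reproduces in NumPy what `tf.nn.l2_loss` computes (half the sum of squared entries) and adds it to a data loss. The helper `l2_loss_np`, the `data_loss` value and the weight matrix are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def l2_loss_np(t):
    """NumPy equivalent of tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(np.square(t)) / 2.0

weights = np.array([[1.0, -2.0],
                    [0.5, 0.0]])   # hypothetical weight matrix
data_loss = 0.8                    # hypothetical mean cross-entropy
beta = 0.001                       # regularization strength to tune

penalty = l2_loss_np(weights)            # (1 + 4 + 0.25 + 0) / 2 = 2.625
total_loss = data_loss + beta * penalty  # 0.8 + 0.002625
print(penalty, total_loss)
```

A larger `beta` pulls the weights harder towards zero, so tuning it usually means trying several orders of magnitude and comparing validation accuracy.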

In [36]:
TRAIN_BATCH_SIZE = 128
NUM_OF_STEPS = 3001
BETA = 0.001

def create_graph(
        tf_train_dataset,
        tf_train_labels,
        tf_valid_dataset,
        tf_test_dataset,
        train_model_func,
        prepare_model_params_func,
    ):
    """
    Create the graph using the given model function (for example logistic regression).
    Return the optimizer op, the loss, and the train, validation and test predictions.
    Must be called inside the target graph's `as_default()` context.
    """
    number_of_labels = tf_train_labels.shape[1].value

    model_params = prepare_model_params_func(
        tf_train_dataset.shape[1].value,
        number_of_labels)
    model = train_model_func(
        tf_train_dataset, *model_params)

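    # Only the first weight matrix is penalized. For the nn model this means
    # the second-layer weights (model_params[2]) are left unregularized.
    weights = model_params[0]

    # Mean over the batch of (cross-entropy + scaled L2 penalty).
    loss = tf.reduce_mean(
       tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=model) +
       BETA * tf.nn.l2_loss(weights))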
  
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
    train_prediction = tf.nn.softmax(model)
    valid_prediction = tf.nn.softmax(
        train_model_func(tf_valid_dataset, *model_params))
    test_prediction = tf.nn.softmax(
        train_model_func(tf_test_dataset, *model_params))
    return (
        optimizer,
        loss,
        train_prediction,
        valid_prediction,
        test_prediction)

def prepare_params_for_logistic_model(
        num_features, num_labels):
    """
    Prepare parameters for logistic model
    """
    weights = tf.Variable(
        tf.truncated_normal([num_features, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))
    return weights, biases
    
def train_logistic_model(dataset, weights, biases):
    """
    Create tf representation for logistic model
    """
    return tf.matmul(dataset, weights) + biases


def prepare_params_for_nn_model(
        num_features, num_labels):
    """
    Prepare parameters for nn model
    """
    weights_before_relu = tf.Variable(
        tf.truncated_normal([num_features, num_labels]))
    biases_before_relu = tf.Variable(tf.zeros([num_labels]))
    
    weights_after_relu = tf.Variable(
        tf.truncated_normal([num_labels, num_labels]))
    biases_after_relu = tf.Variable(tf.zeros([num_labels]))
    return (
        weights_before_relu, biases_before_relu,
        weights_after_relu, biases_after_relu,
    )

def train_nn_model(
        dataset, 
        weights_before_relu, biases_before_relu,
        weights_after_relu, biases_after_relu
    ):
    """
    Create tf representation for nn model
    """
    return tf.matmul(
            tf.nn.relu(
                tf.matmul(dataset, weights_before_relu) + biases_before_relu),
            weights_after_relu
            ) + biases_after_relu


def execute_session(
        session,
    
        num_steps,
        batch_size,
    
        optimizer,
        loss,
        train_prediction,
        valid_prediction,
        test_prediction,
    
        train_labels,
        valid_labels,
        test_labels,
    
        tf_train_dataset,
        tf_train_labels,
    ):
    """
    Execute session and print prediction result
    """
    def _run_step(step):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
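        # Generate a minibatch. Note: train_dataset is read from the notebook's
        # global scope; it is not passed to execute_session as an argument.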
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        feed_dict = {
            tf_train_dataset : batch_data, 
            tf_train_labels : batch_labels
        }
        _, l, predictions = session.run(
           [
               optimizer, 
               loss, 
               train_prediction
           ], 
           feed_dict=feed_dict)
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
        

    for step in range(num_steps):
        _run_step(step)
    print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
        

def run_model(
       train_dataset,
       train_labels,

       valid_dataset,
       valid_labels,
    
       test_dataset,
       test_labels,
    
       train_model_func,
       prepare_model_params_func
    ):
    """
    Define & Run tf model
    """
    num_labels = train_labels.shape[1]

    tf.reset_default_graph()
    graph = tf.Graph()
    with graph.as_default():
        tf_train_dataset = tf.placeholder(
            tf.float32,
            shape=(TRAIN_BATCH_SIZE, train_dataset.shape[1]))
        tf_train_labels = tf.placeholder(tf.float32, shape=(TRAIN_BATCH_SIZE, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset) 
        
        (
            optimizer,
            loss,
            train_prediction,
            valid_prediction,
            test_prediction
        ) = create_graph(
            tf_train_dataset,
            tf_train_labels,
            tf_valid_dataset,
            tf_test_dataset,
            train_model_func,
            prepare_model_params_func
        )
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        execute_session(
            session,
            NUM_OF_STEPS,
            TRAIN_BATCH_SIZE,
    
            optimizer,
            loss,
            train_prediction,
            valid_prediction,
            test_prediction,
    
            train_labels,
            valid_labels,
            test_labels,
    
            tf_train_dataset,
            tf_train_labels)
        
        
# Run logistic regression model
print('\n\nRunning logistic regression')
run_model(
    train_dataset,
    train_labels,

    valid_dataset,
    valid_labels,
    
    test_dataset,
    test_labels,

    train_logistic_model,
    prepare_params_for_logistic_model)  

# Run nn model
print('\n\nRunning nn model')
run_model(
    train_dataset,
    train_labels,

    valid_dataset,
    valid_labels,
    
    test_dataset,
    test_labels,

    train_nn_model,
    prepare_params_for_nn_model)




Running logistic regression
Minibatch loss at step 0: 16.308277
Minibatch accuracy: 12.5%
Validation accuracy: 17.8%
Minibatch loss at step 500: 3.426898
Minibatch accuracy: 69.5%
Validation accuracy: 76.2%
Minibatch loss at step 1000: 1.734720
Minibatch accuracy: 76.6%
Validation accuracy: 78.2%
Minibatch loss at step 1500: 1.298187
Minibatch accuracy: 75.8%
Validation accuracy: 80.0%
Minibatch loss at step 2000: 1.024662
Minibatch accuracy: 84.4%
Validation accuracy: 80.0%
Minibatch loss at step 2500: 1.027295
Minibatch accuracy: 75.8%
Validation accuracy: 81.2%
Minibatch loss at step 3000: 0.748582
Minibatch accuracy: 78.1%
Validation accuracy: 82.0%
Test accuracy: 89.0%


Running nn model
Minibatch loss at step 0: 30.514893
Minibatch accuracy: 7.8%
Validation accuracy: 20.7%
Minibatch loss at step 500: 2.827505
Minibatch accuracy: 68.8%
Validation accuracy: 75.4%
Minibatch loss at step 1000: 1.671303
Minibatch accuracy: 82.0%
Validation accuracy: 78.7%
Minibatch loss at step 1500

---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---
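
One possible way to set this up is sketched below in NumPy, using zero-filled placeholder arrays instead of the real data. The names `num_batches`, `small_train_dataset` and `small_train_labels` are hypothetical, as is the choice of three batches.

```python
import numpy as np

batch_size = 128
num_batches = 3  # hypothetical: keep only three batches of training data

# Small stand-ins with the same column layout as the reformatted data.
train_dataset = np.zeros((2000, 784), dtype=np.float32)
train_labels = np.zeros((2000, 10), dtype=np.float32)

small_train_dataset = train_dataset[:num_batches * batch_size]
small_train_labels = train_labels[:num_batches * batch_size]

# Same offset formula as in the training loop, applied to the small set.
offsets = [(step * batch_size) % (small_train_labels.shape[0] - batch_size)
           for step in range(6)]
print(small_train_dataset.shape, offsets)
```

With three batches kept, the offset formula only ever yields 0 and 128, so the loop keeps revisiting the same 256 examples. One would expect minibatch accuracy to climb towards 100% while validation accuracy lags behind.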

---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be applied during training, not evaluation; otherwise your evaluation results would be stochastic as well. TensorFlow provides `tf.nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---
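
To make the train/evaluation distinction concrete, here is a minimal NumPy sketch of inverted dropout, the same scaling scheme `tf.nn.dropout` uses (kept units are multiplied by `1 / keep_prob`). The helper `dropout_np` and its `training` flag are hypothetical names, not part of TensorFlow.

```python
import numpy as np

def dropout_np(activations, keep_prob, rng, training):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob. Identity when not training."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((4, 5), dtype=np.float32)  # stand-in for hidden activations

train_out = dropout_np(hidden, keep_prob=0.5, rng=rng, training=True)
eval_out = dropout_np(hidden, keep_prob=0.5, rng=rng, training=False)
print(train_out)  # entries are either 0.0 or 2.0
print(eval_out)   # unchanged: all ones
```

In the graph-building code this corresponds to wrapping the hidden layer in dropout only for the training prediction, and reusing the same weights without dropout for the validation and test predictions.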

---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
 ---
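
To get a feel for the schedule, the sketch below evaluates in plain Python the formula that `tf.train.exponential_decay` documents: `learning_rate * decay_rate ** (global_step / decay_steps)`. The `decay_steps` and `decay_rate` values are hypothetical choices for illustration; the assignment leaves them open.

```python
def exponential_decay(initial_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Return initial_rate * decay_rate ** (global_step / decay_steps).

    With staircase=True the exponent is truncated to an integer, so the
    rate drops in discrete steps instead of continuously."""
    exponent = float(global_step) / decay_steps
    if staircase:
        exponent = global_step // decay_steps
    return initial_rate * decay_rate ** exponent

# Hypothetical schedule: halve the learning rate every 1000 steps.
rates = [exponential_decay(0.5, step, decay_steps=1000, decay_rate=0.5)
         for step in (0, 1000, 2000, 3000)]
print(rates)  # [0.5, 0.25, 0.125, 0.0625]
```

A slower decay (larger `decay_steps` or a `decay_rate` closer to 1) keeps the early, large steps for longer; which setting works best is something to check against validation accuracy.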
