Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 1 to [0.0, 1.0, 0.0 ...], 2 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
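The label reformatting in `reformat` relies on NumPy broadcasting. Here is a minimal standalone sketch on three made-up labels (not the real dataset) that shows what the comparison produces:

```python
import numpy as np

num_labels = 10
labels = np.array([0, 3, 9])  # three hypothetical integer class labels

# labels[:, None] has shape (3, 1); comparing it with np.arange(num_labels),
# shape (10,), broadcasts to a (3, 10) boolean matrix that is True exactly
# where the column index equals the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)            # (3, 10)
print(one_hot.argmax(axis=1))   # [0 3 9]
```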


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
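`accuracy` only compares the index of the largest score in each row with the index of the 1 in the one-hot label. The worked example below uses hypothetical predictions and redefines the same function so the snippet runs on its own:

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows whose highest-scoring class matches the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Four hypothetical predictions over 3 classes; only the argmax of each row matters.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
labels = np.eye(3)[[0, 1, 2, 1]]  # one-hot targets for classes 0, 1, 2, 1

print(accuracy(predictions, labels))  # 75.0 (the last row is wrong)
```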

---
Problem 1
---------

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.

---
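For reference, `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2`, which is half the squared norm rather than the norm itself. The NumPy sketch below mirrors that definition. `regularized_loss` is a hypothetical helper that shows how the penalty is added to the data loss, scaled by a regularization strength (`regularization_strength` in the cells that follow):

```python
import numpy as np

def l2_loss(t):
    # NumPy equivalent of tf.nn.l2_loss: half the sum of squared entries
    # (no square root, so it is not the L2 norm itself).
    return np.sum(t ** 2) / 2.0

def regularized_loss(data_loss, weights, regularization_strength):
    # Hypothetical helper: total objective = data term + beta * sum of l2_loss terms.
    return data_loss + regularization_strength * sum(l2_loss(w) for w in weights)

w = np.array([[1.0, -2.0], [3.0, 0.0]])
print(l2_loss(w))                        # (1 + 4 + 9 + 0) / 2 = 7.0
print(regularized_loss(0.5, [w], 0.01))  # 0.5 + 0.01 * 7.0, about 0.57
```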

### Logistic model

In [5]:
batch_size = 128

regularization_strength = 0.01

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = tf.Variable(
        tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Add regularization
    regularization = tf.nn.l2_loss(weights) + tf.nn.l2_loss(biases)
    loss = tf.reduce_mean(loss + regularization_strength * regularization)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
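The gradient of `regularization_strength * l2_loss(W)` with respect to `W` is `regularization_strength * W`. Adding the penalty is therefore equivalent to shrinking every penalized tensor by a factor `1 - learning_rate * regularization_strength` before each ordinary gradient step ("weight decay"). With the learning rate of 0.5 and strength of 0.01 used above, that factor is 0.995. The NumPy sketch below checks the equivalence on random stand-in values; `data_grad` is a placeholder for the cross-entropy gradient, not something computed by the notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
lr, beta = 0.5, 0.01                  # learning rate and regularization_strength from the cell above
w = rng.normal(size=(4, 3))
data_grad = rng.normal(size=(4, 3))   # stand-in for d(cross-entropy)/dW

# Plain SGD on (data loss + beta * l2_loss(w)); d/dw of beta * sum(w**2) / 2 is beta * w.
step_penalty = w - lr * (data_grad + beta * w)

# Equivalent "weight decay" form: shrink first, then take the ordinary step.
shrink = 1.0 - lr * beta              # per-step shrink factor
step_decay = shrink * w - lr * data_grad

print(np.allclose(step_penalty, step_decay))  # True
```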

In [6]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 50.880775
Minibatch accuracy: 3.9%
Validation accuracy: 9.7%
Minibatch loss at step 500: 0.751164
Minibatch accuracy: 84.4%
Validation accuracy: 81.2%
Minibatch loss at step 1000: 0.714058
Minibatch accuracy: 81.2%
Validation accuracy: 80.9%
Minibatch loss at step 1500: 0.792133
Minibatch accuracy: 78.1%
Validation accuracy: 81.3%
Minibatch loss at step 2000: 0.764250
Minibatch accuracy: 82.0%
Validation accuracy: 80.7%
Minibatch loss at step 2500: 0.923356
Minibatch accuracy: 75.8%
Validation accuracy: 81.4%
Minibatch loss at step 3000: 0.779274
Minibatch accuracy: 85.2%
Validation accuracy: 80.6%
Test accuracy: 87.3%
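The training loop walks through the data with a fixed offset, and its own comment notes that better randomization across epochs is possible. One common alternative, sketched here with a hypothetical `iterate_minibatches` generator and NumPy only, reshuffles the example order at the start of every epoch:

```python
import numpy as np

def iterate_minibatches(data, labels, batch_size, rng):
    """Hypothetical helper: yield (batch_data, batch_labels) forever,
    reshuffling the example order at the start of every epoch."""
    n = data.shape[0]
    while True:
        order = rng.permutation(n)
        # Drop the incomplete tail batch so every batch has exactly batch_size
        # rows, matching the fixed-shape placeholders in the graph.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield data[idx], labels[idx]

# Tiny demo: 10 examples whose feature value equals their class index.
rng = np.random.default_rng(42)
data = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.eye(10, dtype=np.float32)
batches = iterate_minibatches(data, labels, batch_size=4, rng=rng)
first_epoch = [next(batches) for _ in range(2)]  # 10 // 4 = 2 full batches per epoch
```

In the session loop, `batch_data, batch_labels = next(batches)` would then replace the offset and slicing lines, with the generator created once before the loop, for example `batches = iterate_minibatches(train_dataset, train_labels, batch_size, np.random.default_rng())`.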


### Neural Network

In [7]:
hidden_layer = 1024
graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = {
        'h': tf.Variable(tf.random_normal([image_size * image_size, hidden_layer])),
        'out': tf.Variable(tf.random_normal([hidden_layer, num_labels]))
    }
    biases = {
        'h': tf.Variable(tf.zeros([hidden_layer])),
        'out': tf.Variable(tf.zeros([num_labels]))
    }

    # Training computation.
    def get_network_out(train_set):
        # Hidden layer is relu
        hidden = tf.matmul(train_set, weights['h']) + biases['h']
        hidden = tf.nn.relu(hidden)

        # Output layer: unnormalized logits for every class (softmax is applied separately)
        return tf.matmul(hidden, weights['out']) + biases['out']

    train_out = get_network_out(tf_train_dataset)

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=train_out))

    # Add regularization
    regularization = tf.nn.l2_loss(weights['h']) + tf.nn.l2_loss(biases['h'])
    regularization += tf.nn.l2_loss(weights['out']) + tf.nn.l2_loss(biases['out'])
    loss = tf.reduce_mean(loss + regularization_strength * regularization)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(train_out)
    valid_prediction = tf.nn.softmax(get_network_out(tf_valid_dataset))
    test_prediction = tf.nn.softmax(get_network_out(tf_test_dataset))
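`get_network_out` returns raw logits; probabilities only appear once `tf.nn.softmax` is applied. The NumPy sketch below reproduces the same one-hidden-layer forward pass with randomly initialized stand-in weights and inputs (not the trained model) so that the shapes and the softmax step are explicit. The max-subtraction inside `softmax` is a standard numerical-stability trick and is an addition of this sketch:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; softmax is unchanged
    # by adding the same constant to every logit in a row.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def get_network_out(x, weights, biases):
    # One ReLU hidden layer followed by a linear output layer -> raw logits.
    hidden = np.maximum(0.0, x @ weights['h'] + biases['h'])
    return hidden @ weights['out'] + biases['out']

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, batch = 784, 1024, 10, 128
weights = {'h': rng.normal(size=(n_in, n_hidden)),      # stddev 1, like tf.random_normal's default
           'out': rng.normal(size=(n_hidden, n_out))}
biases = {'h': np.zeros(n_hidden), 'out': np.zeros(n_out)}

x = rng.random((batch, n_in))          # random stand-in for a minibatch of pixels
logits = get_network_out(x, weights, biases)
probs = softmax(logits)                # rows are probability distributions
```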

In [8]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 4376.293457
Minibatch accuracy: 15.6%
Validation accuracy: 35.3%
Minibatch loss at step 500: 27.065189
Minibatch accuracy: 85.9%
Validation accuracy: 84.6%
Minibatch loss at step 1000: 0.857595
Minibatch accuracy: 86.7%
Validation accuracy: 82.5%
Minibatch loss at step 1500: 0.691564
Minibatch accuracy: 81.2%
Validation accuracy: 83.6%
Minibatch loss at step 2000: 0.690555
Minibatch accuracy: 82.8%
Validation accuracy: 83.0%
Minibatch loss at step 2500: 0.879966
Minibatch accuracy: 82.0%
Validation accuracy: 83.6%
Minibatch loss at step 3000: 0.684967
Minibatch accuracy: 88.3%
Validation accuracy: 83.2%
Test accuracy: 89.7%
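The very large loss at step 0 (4376.29) can be roughly accounted for by the L2 term alone. `tf.random_normal` uses a default standard deviation of 1.0, so each weight contributes about 1 to the sum of squares, while the zero-initialized biases contribute nothing. The sketch below works out that expectation for this network: the penalty comes to about 4065, which would leave roughly 300 for the cross-entropy on the first minibatch. This is a back-of-the-envelope estimate under those assumptions, not a value read from the notebook's session:

```python
import numpy as np

image_size, hidden_layer, num_labels = 28, 1024, 10
regularization_strength = 0.01

# For W ~ N(0, 1), E[sum(W**2)] equals the number of entries, so the expected
# l2_loss (half the sum of squares) is half the weight count.
n_weights = image_size * image_size * hidden_layer + hidden_layer * num_labels
expected_penalty = regularization_strength * n_weights / 2.0
print(n_weights, expected_penalty)   # 813056 weights -> about 4065

# A sampled draw lands very close to that expectation.
rng = np.random.default_rng(0)
w_h = rng.normal(size=(image_size * image_size, hidden_layer))
w_out = rng.normal(size=(hidden_layer, num_labels))
sampled = regularization_strength * (np.sum(w_h ** 2) + np.sum(w_out ** 2)) / 2.0
```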


---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---

In [9]:
num_steps = 3001

train_dataset_small = train_dataset[:200]
train_labels_small = train_labels[:200]

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels_small.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset_small[offset:(offset + batch_size), :]
    batch_labels = train_labels_small[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 4510.875000
Minibatch accuracy: 10.2%
Validation accuracy: 27.1%
Minibatch loss at step 500: 27.207613
Minibatch accuracy: 100.0%
Validation accuracy: 71.0%
Minibatch loss at step 1000: 0.385517
Minibatch accuracy: 100.0%
Validation accuracy: 75.1%
Minibatch loss at step 1500: 0.193194
Minibatch accuracy: 100.0%
Validation accuracy: 75.0%
Minibatch loss at step 2000: 0.187091
Minibatch accuracy: 100.0%
Validation accuracy: 75.0%
Minibatch loss at step 2500: 0.184617
Minibatch accuracy: 100.0%
Validation accuracy: 74.8%
Minibatch loss at step 3000: 0.186826
Minibatch accuracy: 100.0%
Validation accuracy: 74.7%
Test accuracy: 81.2%


On every training minibatch the accuracy reaches 100%, while validation accuracy stalls around 75% and test accuracy ends at 81.2%, compared with 83.2% and 89.7% when training on the full dataset. The network has enough capacity to memorize the small training set perfectly, so it does not generalize well. A larger regularization strength might reduce this gap.
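It is worth checking how many distinct examples the loop actually visits on the 200-row subset. Replaying the same offset formula in plain Python shows that every offset is a multiple of 8 no greater than 64, so only the first 192 rows are ever used, each revisited on the order of 2,000 times over 3001 steps:

```python
batch_size = 128
n_small = 200   # size of train_dataset_small

# Same offset formula as the training loop, applied to the small subset.
offsets = [(step * batch_size) % (n_small - batch_size) for step in range(6)]
print(offsets)       # [0, 56, 40, 24, 8, 64]

# All offsets reached in 3001 steps, and the last row any batch touches.
all_offsets = sorted({(step * batch_size) % (n_small - batch_size) for step in range(3001)})
print(all_offsets)   # [0, 8, 16, 24, 32, 40, 48, 56, 64]
rows_seen = max(all_offsets) + batch_size
print(rows_seen)     # 192
```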

---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides `nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---
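`tf.nn.dropout` keeps each element with probability `keep_prob` and scales the survivors by `1 / keep_prob`, so the expected activation is unchanged and no rescaling is needed at evaluation time. The NumPy sketch below shows that "inverted dropout" scheme together with a hypothetical `training` flag that switches it off for evaluation:

```python
import numpy as np

def dropout(x, keep_prob, rng):
    # Inverted dropout: keep each unit with probability keep_prob and scale
    # the survivors by 1 / keep_prob so the expected activation is unchanged.
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

def hidden_layer(x, w, b, rng, training, keep_prob=0.5):
    h = np.maximum(0.0, x @ w + b)   # ReLU hidden layer
    if training:                     # only perturb activations while training
        h = dropout(h, keep_prob, rng)
    return h                         # evaluation path is deterministic

rng = np.random.default_rng(0)
print(dropout(np.ones((4, 5)), 0.5, rng))  # each entry is either 0.0 or 2.0
```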

In [10]:
hidden_layer = 1024
graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = {
        'h': tf.Variable(tf.random_normal([image_size * image_size, hidden_layer])),
        'out': tf.Variable(tf.random_normal([hidden_layer, num_labels]))
    }
    biases = {
        'h': tf.Variable(tf.zeros([hidden_layer])),
        'out': tf.Variable(tf.zeros([num_labels]))
    }
    
    keep_probability = tf.placeholder(tf.float32)

    # Training computation.
    def get_network_out(train_set, add_dropout=False):
        # Hidden layer is relu
        hidden = tf.matmul(train_set, weights['h']) + biases['h']
        hidden = tf.nn.relu(hidden)
        if add_dropout:
            hidden = tf.nn.dropout(hidden, keep_probability)

        # Output layer: unnormalized logits for every class (softmax is applied separately)
        return tf.matmul(hidden, weights['out']) + biases['out']

    train_out = get_network_out(tf_train_dataset, add_dropout=True)

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=train_out))

    # Add regularization
    regularization = tf.nn.l2_loss(weights['h']) + tf.nn.l2_loss(biases['h'])
    regularization += tf.nn.l2_loss(weights['out']) + tf.nn.l2_loss(biases['out'])
    loss = tf.reduce_mean(loss + regularization_strength * regularization)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(train_out)
    valid_prediction = tf.nn.softmax(get_network_out(tf_valid_dataset))
    test_prediction = tf.nn.softmax(get_network_out(tf_test_dataset))

In [11]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data,
                 tf_train_labels : batch_labels,
                 keep_probability : 0.8}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 4540.448730
Minibatch accuracy: 10.9%
Validation accuracy: 29.8%
Minibatch loss at step 500: 27.128433
Minibatch accuracy: 86.7%
Validation accuracy: 84.7%
Minibatch loss at step 1000: 0.881972
Minibatch accuracy: 85.9%
Validation accuracy: 82.4%
Minibatch loss at step 1500: 0.715899
Minibatch accuracy: 82.8%
Validation accuracy: 83.3%
Minibatch loss at step 2000: 0.716403
Minibatch accuracy: 81.2%
Validation accuracy: 82.8%
Minibatch loss at step 2500: 0.894382
Minibatch accuracy: 82.0%
Validation accuracy: 83.5%
Minibatch loss at step 3000: 0.724915
Minibatch accuracy: 87.5%
Validation accuracy: 83.1%
Test accuracy: 89.7%


In [12]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data,
                 tf_train_labels : batch_labels,
                 keep_probability : 0.4}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 4938.639648
Minibatch accuracy: 5.5%
Validation accuracy: 28.1%
Minibatch loss at step 500: 27.619091
Minibatch accuracy: 84.4%
Validation accuracy: 83.7%
Minibatch loss at step 1000: 0.979211
Minibatch accuracy: 85.2%
Validation accuracy: 81.4%
Minibatch loss at step 1500: 0.787910
Minibatch accuracy: 82.8%
Validation accuracy: 82.8%
Minibatch loss at step 2000: 0.781576
Minibatch accuracy: 81.2%
Validation accuracy: 81.8%
Minibatch loss at step 2500: 1.000817
Minibatch accuracy: 77.3%
Validation accuracy: 82.4%
Minibatch loss at step 3000: 0.795899
Minibatch accuracy: 85.9%
Validation accuracy: 82.0%
Test accuracy: 88.9%


#### On the extreme overfitting case

In [13]:
num_steps = 3001

train_dataset_small = train_dataset[:200]
train_labels_small = train_labels[:200]

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels_small.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset_small[offset:(offset + batch_size), :]
    batch_labels = train_labels_small[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data,
                 tf_train_labels : batch_labels,
                 keep_probability : 0.2}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 4989.529297
Minibatch accuracy: 9.4%
Validation accuracy: 36.1%
Minibatch loss at step 500: 28.025345
Minibatch accuracy: 100.0%
Validation accuracy: 75.0%
Minibatch loss at step 1000: 0.430845
Minibatch accuracy: 99.2%
Validation accuracy: 75.5%
Minibatch loss at step 1500: 0.231648
Minibatch accuracy: 100.0%
Validation accuracy: 75.0%
Minibatch loss at step 2000: 0.215540
Minibatch accuracy: 100.0%
Validation accuracy: 75.3%
Minibatch loss at step 2500: 0.215686
Minibatch accuracy: 100.0%
Validation accuracy: 75.2%
Minibatch loss at step 3000: 0.213877
Minibatch accuracy: 100.0%
Validation accuracy: 75.3%
Test accuracy: 81.6%


---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
 ---
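`tf.train.exponential_decay` is documented to compute `learning_rate * decay_rate ** (global_step / decay_steps)`, with the exponent floored to an integer when `staircase=True`. The plain-Python sketch below reimplements that formula to make the schedule easy to inspect. With `decay_steps=100000`, the value used in the cell below, the floored exponent stays 0 for every one of the 10,001 training steps, so the rate never moves from its initial value:

```python
import math

def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    # Same formula as documented for tf.train.exponential_decay:
    #   learning_rate * decay_rate ** (global_step / decay_steps)
    # with the exponent floored when staircase=True.
    p = global_step / decay_steps
    if staircase:
        p = math.floor(p)
    return learning_rate * decay_rate ** p

print(exponential_decay(0.5, 10000, 100000, 0.96, staircase=True))  # 0.5
# A much smaller decay_steps makes the schedule actually decay within 10,000 steps.
print(exponential_decay(0.5, 10000, 1000, 0.96, staircase=True))
```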


In [14]:
import math

hidden_layer_1 = 4096
hidden_layer_2 = 2048
hidden_layer_3 = 1024
hidden_layer_4 = 512
regularization_strength = 0.001
graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = {
        'h1': tf.Variable(tf.truncated_normal([image_size * image_size, hidden_layer_1],
                          stddev=math.sqrt(2.0/(image_size * image_size)))),
        'h2': tf.Variable(tf.truncated_normal([hidden_layer_1, hidden_layer_2],
                          stddev=math.sqrt(2.0/hidden_layer_1))),
        'h3': tf.Variable(tf.truncated_normal([hidden_layer_2, hidden_layer_3],
                          stddev=math.sqrt(2.0/hidden_layer_2))),
        'h4': tf.Variable(tf.truncated_normal([hidden_layer_3, hidden_layer_4],
                          stddev=math.sqrt(2.0/hidden_layer_3))),
        'out': tf.Variable(tf.truncated_normal([hidden_layer_4, num_labels],
                           stddev=math.sqrt(2.0/hidden_layer_4)))
    }
    biases = {
        'h1': tf.Variable(tf.zeros([hidden_layer_1])),
        'h2': tf.Variable(tf.zeros([hidden_layer_2])),
        'h3': tf.Variable(tf.zeros([hidden_layer_3])),
        'h4': tf.Variable(tf.zeros([hidden_layer_4])),
        'out': tf.Variable(tf.zeros([num_labels]))
    }
    
    keep_probability = tf.placeholder(tf.float32)
    
    def forward(current_in, layer, add_dropout=False):
        # Hidden layer is relu
        next_in = tf.matmul(current_in, weights[layer]) + biases[layer]
        next_in = tf.nn.relu(next_in)
        if add_dropout:
            next_in = tf.nn.dropout(next_in, keep_probability)
        
        return next_in

    # Training computation.
    def get_network_out(train_set, add_dropout=False):
        hidden = forward(train_set, 'h1', add_dropout=add_dropout)
        hidden = forward(hidden, 'h2', add_dropout=add_dropout)
        hidden = forward(hidden, 'h3', add_dropout=add_dropout)
        hidden = forward(hidden, 'h4', add_dropout=add_dropout)

        # Output layer: unnormalized logits for every class (softmax is applied separately)
        return tf.matmul(hidden, weights['out']) + biases['out']

    train_out = get_network_out(tf_train_dataset, add_dropout=True)

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=train_out))

    # Add regularization
    weights_reg = [tf.nn.l2_loss(weight) for weight in weights.values()]
    biases_reg = [tf.nn.l2_loss(bias) for bias in biases.values()]
    regularization = sum(weights_reg) + sum(biases_reg)
    loss = tf.reduce_mean(loss + regularization_strength * regularization)

    # Optimizer.
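    global_step = tf.Variable(0, trainable=False)  # count the number of steps taken.
    # With staircase=True the rate is 0.5 * 0.96 ** floor(global_step / 100000),
    # so it stays at 0.5 for the 10001 steps run below; a smaller decay_steps
    # is needed for the rate to actually decay.
    learning_rate = tf.train.exponential_decay(0.5, global_step, 100000, 0.96, staircase=True)
    # Pass the decayed rate (not a constant) so the schedule is actually used.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)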

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(train_out)
    valid_prediction = tf.nn.softmax(get_network_out(tf_valid_dataset))
    test_prediction = tf.nn.softmax(get_network_out(tf_test_dataset))

In [15]:
num_steps = 10001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data,
                 tf_train_labels : batch_labels,
                 keep_probability : 0.5}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 9.063753
Minibatch accuracy: 10.9%
Validation accuracy: 10.0%
Minibatch loss at step 500: 4.273272
Minibatch accuracy: 87.5%
Validation accuracy: 84.2%
Minibatch loss at step 1000: 2.856750
Minibatch accuracy: 88.3%
Validation accuracy: 85.5%
Minibatch loss at step 1500: 2.017052
Minibatch accuracy: 86.7%
Validation accuracy: 86.0%
Minibatch loss at step 2000: 1.559755
Minibatch accuracy: 82.8%
Validation accuracy: 85.6%
Minibatch loss at step 2500: 1.327234
Minibatch accuracy: 82.8%
Validation accuracy: 86.5%
Minibatch loss at step 3000: 0.935926
Minibatch accuracy: 88.3%
Validation accuracy: 86.6%
Minibatch loss at step 3500: 0.887735
Minibatch accuracy: 85.9%
Validation accuracy: 87.0%
Minibatch loss at step 4000: 0.712365
Minibatch accuracy: 87.5%
Validation accuracy: 86.9%
Minibatch loss at step 4500: 0.759854
Minibatch accuracy: 87.5%
Validation accuracy: 86.8%
Minibatch loss at step 5000: 0.821552
Minibatch accuracy: 82.8%
Validation accurac