# Deep Neural Network - L2 Regularization, Dropout, Learning Rate Decay

---

Created as a part of Google's 'Machine Learning to Deep Learning' course on Udacity.

(Created with Google Colab)

## Step 1 - Imports and Data Setup

In [1]:
## IMPORTS

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range
import numpy as np
import os
import sys
import tarfile
from IPython.display import display, Image
from scipy import ndimage
from sklearn.linear_model import LogisticRegression
from six.moves.urllib.request import urlretrieve

!pip install imageio

import imageio

# Configure the matplotlib backend to plot inline in IPython
%matplotlib inline

## DATA SETUP

# FUSE Drive to access Drive Data
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

# Generate auth tokens for Colab
from google.colab import auth
auth.authenticate_user()

# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

gpg: keybox '/tmp/tmp718zm2mj/pubring.gpg' created
gpg: /tmp/tmp718zm2mj/trustdb.gpg: trustdb created
gpg: key AD5F235DF639B041: public key "Launchpad PPA for Alessandro Strada" imported
gpg: Total number processed: 1
gpg:               imported: 1
··········


In [2]:
# Create a directory and mount Google Drive using that directory.
!mkdir -p drive
!google-drive-ocamlfuse drive

pickle_file = 'drive/Data/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

print('\nTraining set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
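
The one-hot step inside `reformat` relies on NumPy broadcasting. The following is a minimal sketch of that single line on a toy array; the helper name `one_hot` and the toy labels are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_labels):
    # labels[:, None] has shape (n, 1); np.arange(num_labels) has shape (num_labels,).
    # Broadcasting the comparison yields an (n, num_labels) boolean matrix
    # with exactly one True per row, which is then cast to float32.
    return (np.arange(num_labels) == labels[:, None]).astype(np.float32)

toy_labels = np.array([0, 2, 1])  # hypothetical labels, not taken from notMNIST
encoded = one_hot(toy_labels, 3)
print(encoded)
```

Each row contains a single 1.0 in the column given by the integer label, which is the format the softmax cross-entropy loss expects.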


### Accuracy

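Define an accuracy function that returns the percentage of examples whose predicted class (the arg-max of the prediction row) matches the one-hot label.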

In [0]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
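
As a quick sanity check, the same function can be run on a toy batch. The prediction and label values below are made up for illustration.

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows where the predicted class equals the labelled class.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Hypothetical softmax outputs for 4 examples and 3 classes.
toy_predictions = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.3, 0.6],
                            [0.5, 0.4, 0.1],
                            [0.2, 0.7, 0.1]])
# Hypothetical one-hot labels; the third example is misclassified.
toy_labels = np.array([[1, 0, 0],
                       [0, 0, 1],
                       [0, 1, 0],
                       [0, 1, 0]])
toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_accuracy)  # 3 of 4 correct -> 75.0
```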

---

## Step 2 - L2 Regularization

### Problem 1

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor t using nn.l2_loss(t). The right amount of regularization should improve your validation / test accuracy.
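
The penalty itself is simple arithmetic. The sketch below mirrors what `tf.nn.l2_loss` computes (half the sum of squared entries) in plain NumPy; the weight values and the `data_loss` figure are hypothetical.

```python
import numpy as np

def l2_loss(t):
    # Same quantity as tf.nn.l2_loss: sum(t ** 2) / 2.
    return np.sum(np.square(t)) / 2.0

toy_weights = np.array([[1.0, -2.0],
                        [0.5, 0.0]])  # hypothetical weight matrix
data_loss = 0.8                        # hypothetical cross-entropy value
beta = 0.01                            # regularization strength

penalty = l2_loss(toy_weights)         # (1 + 4 + 0.25 + 0) / 2 = 2.625
total_loss = data_loss + beta * penalty
print(penalty, total_loss)
```

Because the gradient of `beta * l2_loss(w)` with respect to `w` is `beta * w`, every gradient step also shrinks the weights slightly toward zero, which is why a larger `beta` yields smaller weights.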

### Multinomial Logistic Regression

In [0]:
# With gradient descent training, even this much data is prohibitive.
# Subset the training data for faster turnaround.
train_subset = 10000
beta = 0.01

graph = tf.Graph()
with graph.as_default():

  # Input data.
  # Load the training, validation and test data into constants that are
  # attached to the graph.
  tf_train_dataset = tf.constant(train_dataset[:train_subset, :])
  tf_train_labels = tf.constant(train_labels[:train_subset])
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  # These are the parameters that we are going to be training. The weight
  # matrix will be initialized using random values following a (truncated)
  # normal distribution. The biases get initialized to zero.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  # We multiply the inputs with the weight matrix, and add biases. We compute
  # the softmax and cross-entropy (it's one operation in TensorFlow, because
  # it's very common, and it can be optimized). We take the average of this
  # cross-entropy across all training examples: that's our loss.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))
  
  ## Loss Function using L2 regularization
  reg = tf.nn.l2_loss(weights)
  loss = tf.reduce_mean(loss + beta * reg)
  
  # Optimizer.
  # We are going to find the minimum of this loss using gradient descent.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  # These are not part of training, but merely here so that we can report
  # accuracy figures as we train.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
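
For readers who want to see what the fused softmax and cross-entropy operation computes, here is a minimal NumPy sketch. It is an illustration under the usual definition of the loss, not TensorFlow's actual implementation, and the toy logits and labels are assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Subtract the row-wise maximum first so that exp() cannot overflow.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    # Log-softmax: shifted logits minus the log of the normalizer.
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # Cross-entropy against one-hot labels, one value per example.
    return -np.sum(labels * log_probs, axis=1)

toy_logits = np.zeros((2, 10))       # hypothetical: all 10 classes equally likely
toy_labels = np.eye(10)[[3, 7]]      # hypothetical one-hot labels
per_example = softmax_cross_entropy(toy_logits, toy_labels)
mean_loss = per_example.mean()       # averaged across examples, like tf.reduce_mean
print(mean_loss)
```

With uniform logits over 10 classes the loss equals ln(10), roughly 2.30, which is a useful reference point for a classifier that has learned nothing yet.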

In [5]:
num_steps = 5001

def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

with tf.Session(graph=graph) as session:
    # This is a one-time operation which ensures the parameters get initialized as
    # we described in the graph: random weights for the matrix, zeros for the
    # biases. 
    tf.initialize_all_variables().run()
    
    print('\nInitialized\n')
    
    for step in range(num_steps):
        # Run the computations. We tell .run() that we want to run the optimizer,
        # and get the loss value and the training predictions returned as numpy
        # arrays.
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 500 == 0):
            print('\nLoss at step {}: {}'.format(step, l))
            print('Training accuracy: {:.1f}'.format(accuracy(predictions, 
                                                         train_labels[:train_subset, :])))
            # Calling .eval() on valid_prediction is basically like calling run(), but
            # just to get that one numpy array. Note that it recomputes all its graph
            # dependencies.
            #
            # The training predictions above need no .eval() because session.run()
            # already returned them as a numpy array.
            print('Validation accuracy: {:.1f}'.format(accuracy(valid_prediction.eval(), 
                                                           valid_labels)))
    print('\nTest accuracy: {:.1f}'.format(accuracy(test_prediction.eval(), test_labels)))

Instructions for updating:
Use `tf.global_variables_initializer` instead.

Initialized


Loss at step 0: 45.35413360595703
Training accuracy: 14.3
Validation accuracy: 18.1

Loss at step 500: 0.8454488515853882
Training accuracy: 84.1
Validation accuracy: 82.1

Loss at step 1000: 0.6894906759262085
Training accuracy: 84.5
Validation accuracy: 82.3

Loss at step 1500: 0.6886019706726074
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 2000: 0.688593327999115
Training accuracy: 84.5
Validation accuracy: 82.3

Loss at step 2500: 0.6885926723480225
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 3000: 0.6885926127433777
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 3500: 0.6885926723480225
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 4000: 0.6885926127433777
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 4500: 0.6885926127433777
Training accuracy: 84.4
Validation accuracy: 82.3

Loss at step 5000: 0.688592

### One Layer Neural Network

In [0]:
num_nodes= 1024
batch_size = 128
beta = 0.001

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights_1 = tf.Variable(tf.truncated_normal([image_size * image_size, num_nodes]))
    biases_1 = tf.Variable(tf.zeros([num_nodes]))
    weights_2 = tf.Variable(tf.truncated_normal([num_nodes, num_labels]))
    biases_2 = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits_1 = tf.matmul(tf_train_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    logits_2 = tf.matmul(relu_layer, weights_2) + biases_2
    
    # Normal loss function
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits_2, labels=tf_train_labels))
    
    # Loss function with L2 regularization (beta = 0.001, as set above)
    regularizers = tf.nn.l2_loss(weights_1) + tf.nn.l2_loss(weights_2)
    loss = tf.reduce_mean(loss + beta * regularizers)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training
    train_prediction = tf.nn.softmax(logits_2)
    
    # Predictions for validation 
    logits_1 = tf.matmul(tf_valid_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    logits_2 = tf.matmul(relu_layer, weights_2) + biases_2
    
    valid_prediction = tf.nn.softmax(logits_2)
    
    # Predictions for test
    logits_1 = tf.matmul(tf_test_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    logits_2 = tf.matmul(relu_layer, weights_2) + biases_2
    
    test_prediction =  tf.nn.softmax(logits_2)

In [7]:
num_steps = 10001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("\nInitialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if (step % 1000 == 0):
            print("\nMinibatch loss at step {}: {}".format(step, l))
            print("Minibatch accuracy: {:.1f}".format(accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}".format(accuracy(valid_prediction.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}".format(accuracy(test_prediction.eval(), test_labels)))


Initialized

Minibatch loss at step 0: 569.9794921875
Minibatch accuracy: 12.5
Validation accuracy: 27.5

Minibatch loss at step 1000: 114.4655990600586
Minibatch accuracy: 83.6
Validation accuracy: 81.1

Minibatch loss at step 2000: 41.4669303894043
Minibatch accuracy: 85.9
Validation accuracy: 85.2

Minibatch loss at step 3000: 15.450621604919434
Minibatch accuracy: 88.3
Validation accuracy: 87.3

Minibatch loss at step 4000: 6.124664306640625
Minibatch accuracy: 85.9
Validation accuracy: 88.4

Minibatch loss at step 5000: 2.5495200157165527
Minibatch accuracy: 86.7
Validation accuracy: 88.3

Minibatch loss at step 6000: 1.300624132156372
Minibatch accuracy: 85.9
Validation accuracy: 88.5

Minibatch loss at step 7000: 0.8731435537338257
Minibatch accuracy: 86.7
Validation accuracy: 88.4

Minibatch loss at step 8000: 0.6539442539215088
Minibatch accuracy: 89.1
Validation accuracy: 88.6

Minibatch loss at step 9000: 0.5593432784080505
Minibatch accuracy: 88.3
Validation accuracy: 89.1

### Problem 2

Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

In [8]:
num_steps = 5001

train_dataset_2 = train_dataset[:500, :]
train_labels_2 = train_labels[:500]

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("\nInitialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels_2.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset_2[offset:(offset + batch_size), :]
        batch_labels = train_labels_2[offset:(offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print("\nMinibatch loss at step {}: {}".format(step, l))
            print("Minibatch accuracy: {:.1f}".format(accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}".format(accuracy(valid_prediction.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}".format(accuracy(test_prediction.eval(), test_labels)))


Initialized

Minibatch loss at step 0: 743.8604736328125
Minibatch accuracy: 9.4
Validation accuracy: 31.9

Minibatch loss at step 500: 190.55657958984375
Minibatch accuracy: 100.0
Validation accuracy: 74.3

Minibatch loss at step 1000: 115.56396484375
Minibatch accuracy: 100.0
Validation accuracy: 74.3

Minibatch loss at step 1500: 70.0843276977539
Minibatch accuracy: 100.0
Validation accuracy: 74.3

Minibatch loss at step 2000: 42.503047943115234
Minibatch accuracy: 100.0
Validation accuracy: 74.4

Minibatch loss at step 2500: 25.776491165161133
Minibatch accuracy: 100.0
Validation accuracy: 74.5

Minibatch loss at step 3000: 15.633467674255371
Minibatch accuracy: 100.0
Validation accuracy: 75.0

Minibatch loss at step 3500: 9.484251022338867
Minibatch accuracy: 100.0
Validation accuracy: 75.7

Minibatch loss at step 4000: 5.758632183074951
Minibatch accuracy: 100.0
Validation accuracy: 76.7

Minibatch loss at step 4500: 3.503783702850342
Minibatch accuracy: 100.0
Validation accurac

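Minibatch accuracy reaches 100% while validation accuracy stays around 74-77%: the network has memorized the 500 training examples instead of learning features that generalize. This is overfitting.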
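
To get a feel for how little data this run touches, the offset arithmetic from the training loop can be replayed on its own. This is a small sketch using the same constants as the cell above (500 examples, batches of 128, 5001 steps); the derived variable names are illustrative.

```python
batch_size = 128
num_examples = 500
num_steps = 5001

# Same offset rule as the training loop.
offsets = [(step * batch_size) % (num_examples - batch_size)
           for step in range(num_steps)]

distinct_offsets = len(set(offsets))          # how many different batches exist
last_row_used = max(offsets) + batch_size     # exclusive upper bound on rows seen
passes = num_steps * batch_size / num_examples

print(distinct_offsets, last_row_used, passes)
```

Under these settings the loop cycles through only 93 distinct batch positions, never reads past row 496, and revisits the subset roughly 1280 times, so the network has ample opportunity to memorize it.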

---

### Problem 3

Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides nn.dropout() for that, but you have to make sure it's only inserted during training.
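
The train-only behaviour can be sketched in NumPy. The function below is an illustrative stand-in for `tf.nn.dropout`, assuming the "inverted dropout" convention in which kept activations are scaled by `1 / keep_prob`; the array sizes and seed are arbitrary.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    # Keep each unit with probability keep_prob and zero the rest.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation is unchanged.
    return np.where(mask, activations / keep_prob, 0.0)

rng = np.random.default_rng(0)
hidden = np.ones((4, 1000))              # hypothetical hidden-layer activations

train_out = dropout(hidden, 0.5, rng)    # training: stochastic
eval_out = hidden                        # evaluation: dropout is skipped entirely

print(np.unique(train_out), train_out.mean())
```

Because of the rescaling, the mean activation during training stays close to its evaluation value, so the same weights can be used at evaluation time without any correction.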


What happens to our extreme overfitting case?

In [0]:
num_nodes= 1024
batch_size = 128
beta = 0.01

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights_1 = tf.Variable(tf.truncated_normal([image_size * image_size, num_nodes]))
    biases_1 = tf.Variable(tf.zeros([num_nodes]))
    weights_2 = tf.Variable(tf.truncated_normal([num_nodes, num_labels]))
    biases_2 = tf.Variable(tf.zeros([num_labels]))
    
    # Training computation.
    logits_1 = tf.matmul(tf_train_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    
    # Dropout on hidden layer: RELU layer
    keep_prob = tf.placeholder("float")
    relu_layer_dropout = tf.nn.dropout(relu_layer, keep_prob)
    
    logits_2 = tf.matmul(relu_layer_dropout, weights_2) + biases_2
    
    # Normal loss function
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits_2, labels=tf_train_labels))
    
    # Loss function with L2 Regularization with beta=0.01
    regularizers = tf.nn.l2_loss(weights_1) + tf.nn.l2_loss(weights_2)
    loss = tf.reduce_mean(loss + beta * regularizers)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training
    train_prediction = tf.nn.softmax(logits_2)
    
    # Predictions for validation 
    logits_1 = tf.matmul(tf_valid_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    logits_2 = tf.matmul(relu_layer, weights_2) + biases_2
    
    valid_prediction = tf.nn.softmax(logits_2)
    
    # Predictions for test
    logits_1 = tf.matmul(tf_test_dataset, weights_1) + biases_1
    relu_layer= tf.nn.relu(logits_1)
    logits_2 = tf.matmul(relu_layer, weights_2) + biases_2
    
    test_prediction =  tf.nn.softmax(logits_2)

In [10]:
num_steps = 5001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print("\nInitialized")
    
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob : 0.5}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if (step % 500 == 0):
            print("\nMinibatch loss at step {}: {}".format(step, l))
            print("Minibatch accuracy: {:.1f}".format(accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}".format(accuracy(valid_prediction.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}".format(accuracy(test_prediction.eval(), test_labels)))


Initialized

Minibatch loss at step 0: 3628.618408203125
Minibatch accuracy: 7.8
Validation accuracy: 30.8

Minibatch loss at step 500: 21.480159759521484
Minibatch accuracy: 81.2
Validation accuracy: 83.9

Minibatch loss at step 1000: 1.063875675201416
Minibatch accuracy: 81.2
Validation accuracy: 84.1

Minibatch loss at step 1500: 0.9375864267349243
Minibatch accuracy: 77.3
Validation accuracy: 83.4

Minibatch loss at step 2000: 0.796303391456604
Minibatch accuracy: 85.2
Validation accuracy: 83.5

Minibatch loss at step 2500: 0.8694121241569519
Minibatch accuracy: 80.5
Validation accuracy: 83.1

Minibatch loss at step 3000: 0.7428845167160034
Minibatch accuracy: 85.9
Validation accuracy: 83.7

Minibatch loss at step 3500: 0.7343540191650391
Minibatch accuracy: 83.6
Validation accuracy: 83.0

Minibatch loss at step 4000: 0.9939125180244446
Minibatch accuracy: 80.5
Validation accuracy: 82.6

Minibatch loss at step 4500: 0.7109951376914978
Minibatch accuracy: 84.4
Validation accuracy: 

#### Extreme Overfitting

In [11]:
num_steps = 5001
beta = 0.01  # no effect here: the graph above already captured beta when it was built

train_dataset_2 = train_dataset[:500, :]
train_labels_2 = train_labels[:500]

with tf.Session(graph=graph) as session:
    
    tf.initialize_all_variables().run()
    print("\nInitialized")
    
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels_2.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset_2[offset:(offset + batch_size), :]
        batch_labels = train_labels_2[offset:(offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,  keep_prob : 0.5}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if (step % 500 == 0):
            print("\nMinibatch loss at step {}: {}".format(step, l))
            print("Minibatch accuracy: {:.1f}".format(accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}".format(accuracy(valid_prediction.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}".format(accuracy(test_prediction.eval(), test_labels)))


Initialized

Minibatch loss at step 0: 3669.89697265625
Minibatch accuracy: 8.6
Validation accuracy: 30.8

Minibatch loss at step 500: 21.113550186157227
Minibatch accuracy: 100.0
Validation accuracy: 78.4

Minibatch loss at step 1000: 0.49936801195144653
Minibatch accuracy: 100.0
Validation accuracy: 78.2

Minibatch loss at step 1500: 0.31613820791244507
Minibatch accuracy: 100.0
Validation accuracy: 78.6

Minibatch loss at step 2000: 0.3108214735984802
Minibatch accuracy: 99.2
Validation accuracy: 78.0

Minibatch loss at step 2500: 0.29215267300605774
Minibatch accuracy: 100.0
Validation accuracy: 78.1

Minibatch loss at step 3000: 0.28601858019828796
Minibatch accuracy: 100.0
Validation accuracy: 78.6

Minibatch loss at step 3500: 0.29813283681869507
Minibatch accuracy: 100.0
Validation accuracy: 78.4

Minibatch loss at step 4000: 0.2962633967399597
Minibatch accuracy: 100.0
Validation accuracy: 78.3

Minibatch loss at step 4500: 0.29470157623291016
Minibatch accuracy: 100.0
Valida

### Problem 4

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is 97.1%.

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

> `global_step = tf.Variable(0)  # count the number of steps taken.`

> `learning_rate = tf.train.exponential_decay(0.5, global_step, ...)`

> `optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)`
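
The schedule behind `tf.train.exponential_decay` is `initial_rate * decay_rate ** (step / decay_steps)`, with integer division when `staircase=True`. A plain-Python sketch of that formula follows; the `decay_steps` value of 1000 is a hypothetical choice for illustration.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    # With staircase=True the exponent only changes every decay_steps steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent

smooth = exponential_decay(0.5, 1000, 1000, 0.96)
stair_before = exponential_decay(0.5, 999, 1000, 0.96, staircase=True)
stair_after = exponential_decay(0.5, 1000, 1000, 0.96, staircase=True)

print(smooth, stair_before, stair_after)
```

Note that `decay_steps` has to be small relative to the total number of training steps for the decay to have any visible effect, especially in staircase mode.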

#### Model


*   3 Hidden Layers (RELU)
*   L2 Regularization
*   Learning Rate Decay (Exponential)
*   Dropout
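
The graph for this model also initializes each weight matrix with `stddev = sqrt(2 / fan_in)` rather than the default of 1.0 used in the earlier graphs. The sketch below gives a rough NumPy illustration of why that matters for a ReLU stack. It assumes standard-normal stand-in inputs and plain (not truncated) normal weights, so the numbers are indicative only.

```python
import math
import numpy as np

layer_sizes = [28 * 28, 1024, 512, 256]  # input width and the three hidden widths

def mean_square_after_relu_stack(x, scale_fn, rng):
    # Push x through the ReLU layers (biases omitted) and report the
    # mean squared activation of the last hidden layer.
    h = x
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.standard_normal((fan_in, fan_out)) * scale_fn(fan_in)
        h = np.maximum(h @ w, 0.0)
    return float(np.mean(h ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((64, layer_sizes[0]))  # hypothetical stand-in inputs

he_scale = mean_square_after_relu_stack(x, lambda n: math.sqrt(2.0 / n), rng)
unit_scale = mean_square_after_relu_stack(x, lambda n: 1.0, rng)

print(he_scale, unit_scale)
```

With the `sqrt(2 / fan_in)` scale the activations stay on the order of 1 from layer to layer, while unit-variance weights make them grow by orders of magnitude. This is consistent with the far smaller step-0 loss this model reports compared with the earlier networks, although other factors may contribute as well.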





In [0]:
import math

In [0]:
batch_size = 128
beta = 0.001

hidden_nodes_1 = 1024
hidden_nodes_2 = int(1024 * 0.5)
hidden_nodes_3 = int(1024 * np.power(0.5, 2))

graph = tf.Graph()
with graph.as_default():
  
  '''INPUT DATA'''
  # For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  '''VARIABLES'''
  
  # Hidden RELU 1
  weights_1 = tf.Variable(tf.truncated_normal([image_size * image_size, hidden_nodes_1], stddev= math.sqrt(2.0/(image_size * image_size))))
  biases_1 = tf.Variable(tf.zeros([hidden_nodes_1]))
  
  # Hidden RELU 2
  weights_2 = tf.Variable(tf.truncated_normal([hidden_nodes_1, hidden_nodes_2], stddev= math.sqrt(2.0/(hidden_nodes_1))))
  biases_2 = tf.Variable(tf.zeros([hidden_nodes_2]))
  
  # Hidden RELU 3
  weights_3 = tf.Variable(tf.truncated_normal([hidden_nodes_2, hidden_nodes_3], stddev= math.sqrt(2.0/(hidden_nodes_2))))
  biases_3 = tf.Variable(tf.zeros([hidden_nodes_3]))
  
  # OUTPUT LAYER
  weights_o = tf.Variable(tf.truncated_normal([hidden_nodes_3, num_labels], stddev=math.sqrt(2.0/(hidden_nodes_3))))
  biases_o = tf.Variable(tf.zeros([num_labels]))
  
  '''TRAINING'''
  
  keep_prob = tf.placeholder("float")
  
  # Hidden RELU 1
  logits_1 = tf.matmul(tf_train_dataset, weights_1) + biases_1
  hidden_1 = tf.nn.relu(logits_1)
  hidden_dropout_1 = tf.nn.dropout(hidden_1, keep_prob)
  
  # Hidden RELU 2
  logits_2 = tf.matmul(hidden_dropout_1, weights_2) + biases_2
  hidden_2 = tf.nn.relu(logits_2)
  hidden_dropout_2 = tf.nn.dropout(hidden_2, keep_prob)
  
  # Hidden RELU 3
  logits_3 = tf.matmul(hidden_dropout_2, weights_3) + biases_3
  hidden_3 = tf.nn.relu(logits_3)
  hidden_dropout_3 = tf.nn.dropout(hidden_3, keep_prob)
  
  # Output Layer
  logits_o = tf.matmul(hidden_dropout_3, weights_o) + biases_o
  
  '''LOSS FUNCTION'''
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits_o, labels=tf_train_labels))
  
  '''L2 REGULARIZATION'''
  reg = tf.nn.l2_loss(weights_1) + tf.nn.l2_loss(weights_2) + tf.nn.l2_loss(weights_3) + tf.nn.l2_loss(weights_o)
  loss = tf.reduce_mean(loss + beta * reg)
  
  '''OPTIMIZER'''
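  # Decaying Learning Rate
  # Note: with decay_steps=100000 and staircase=True, the rate stays at
  # start_alpha until step 100000, so it does not decay within the
  # 30001-step run below.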
  global_step = tf.Variable(0)
  start_alpha = 0.5
  alpha = tf.train.exponential_decay(start_alpha, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(loss, global_step)
  
  # Predictions for the training
  train_prediction = tf.nn.softmax(logits_o)
  
  '''VALIDATION'''
  valid_logits_1 = tf.matmul(tf_valid_dataset, weights_1) + biases_1
  valid_relu_1 = tf.nn.relu(valid_logits_1)

  valid_logits_2 = tf.matmul(valid_relu_1, weights_2) + biases_2
  valid_relu_2 = tf.nn.relu(valid_logits_2)

  valid_logits_3 = tf.matmul(valid_relu_2, weights_3) + biases_3
  valid_relu_3 = tf.nn.relu(valid_logits_3)

  valid_logits_o = tf.matmul(valid_relu_3, weights_o) + biases_o

  valid_prediction = tf.nn.softmax(valid_logits_o)
  
  '''TESTING'''
  test_logits_1 = tf.matmul(tf_test_dataset, weights_1) + biases_1
  test_relu_1 = tf.nn.relu(test_logits_1)

  test_logits_2 = tf.matmul(test_relu_1, weights_2) + biases_2
  test_relu_2 = tf.nn.relu(test_logits_2)

  test_logits_3 = tf.matmul(test_relu_2, weights_3) + biases_3
  test_relu_3 = tf.nn.relu(test_logits_3)

  test_logits_o = tf.matmul(test_relu_3, weights_o) + biases_o

  test_prediction = tf.nn.softmax(test_logits_o)

In [14]:
num_steps = 30001

with tf.Session(graph=graph) as session:
    
    tf.initialize_all_variables().run()
    print("\nInitialized")
    
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob : 0.5}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        
        if (step % 3000 == 0):
            print("\nMinibatch loss at step {}: {}".format(step, l))
            print("Minibatch accuracy: {:.1f}".format(accuracy(predictions, batch_labels)))
            print("Validation accuracy: {:.1f}".format(accuracy(valid_prediction.eval(), valid_labels)))
    print("\nTest accuracy: {:.1f}".format(accuracy(test_prediction.eval(), test_labels)))


Initialized

Minibatch loss at step 0: 4.088729381561279
Minibatch accuracy: 8.6
Validation accuracy: 25.6

Minibatch loss at step 3000: 0.7149103879928589
Minibatch accuracy: 83.6
Validation accuracy: 87.4

Minibatch loss at step 6000: 0.6740361452102661
Minibatch accuracy: 85.9
Validation accuracy: 87.7

Minibatch loss at step 9000: 0.6119821667671204
Minibatch accuracy: 88.3
Validation accuracy: 88.0

Minibatch loss at step 12000: 0.6812489032745361
Minibatch accuracy: 84.4
Validation accuracy: 87.8

Minibatch loss at step 15000: 0.6741139888763428
Minibatch accuracy: 85.2
Validation accuracy: 88.2

Minibatch loss at step 18000: 0.6329600811004639
Minibatch accuracy: 85.9
Validation accuracy: 88.2

Minibatch loss at step 21000: 0.6186057329177856
Minibatch accuracy: 86.7
Validation accuracy: 87.7

Minibatch loss at step 24000: 0.8241613507270813
Minibatch accuracy: 80.5
Validation accuracy: 88.5

Minibatch loss at step 27000: 0.5892763733863831
Minibatch accuracy: 89.8
Validation a