# Fully Connected Naural Networks - Regularization/Dropout - No Convolutions

In [28]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
import pandas as pd
from six.moves import cPickle as pickle

First reload the data we generated in __notMNIST_nonTensorFlow_comparisons.ipynb__.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a shape that's more adapted to the models we're going to train:

* data as a flat matrix,
* labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 1 to [0.0, 1.0, 0.0 ...], 2 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

## Dropout
Let's introduce Dropout on the hidden layers of the neural networks. 
Remember: __Dropout should only be introduced during training, not evaluation__, otherwise your evaluation results would be stochastic as well. 
__TensorFlow provides nn.dropout() for that, but we have to make sure it's only inserted during training__.

### `tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)` {#dropout}

Computes dropout.

With probability `keep_prob`, outputs the input element scaled up by
`1 / keep_prob`, otherwise outputs `0`.  The scaling is so that the expected
sum is unchanged.

By default, each element is kept or dropped independently.  If `noise_shape`
is specified, it must be
[broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]`
will make independent decisions.  For example, if `shape(x) = [k, l, m, n]`
and `noise_shape = [k, 1, 1, n]`, each batch and channel component will be
kept independently and each row and column will be kept or not kept together.

##### Args:


*  <b>`x`</b>: A tensor.
*  <b>`keep_prob`</b>: A scalar `Tensor` with the same type as x. The probability
    that each element is kept.
*  <b>`noise_shape`</b>: A 1-D `Tensor` of type `int32`, representing the
    shape for randomly generated keep/drop flags.
*  <b>`seed`</b>: A Python integer. Used to create random seeds. See
    [`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
    for behavior.
*  <b>`name`</b>: A name for this operation (optional).

##### Returns:

  A Tensor of the same shape of `x`.

##### Raises:


*  <b>`ValueError`</b>: If `keep_prob` is not in `(0, 1]`.
                                                   
### HowTo
We create a placeholder for the probability that a neuron's output is kept during dropout. 
This allows us to turn dropout on during training, and turn it off during testing. 
TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling

Further details: https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/ 

### Neural Networks models: 1 hidden layer

In [16]:
import math 

def create_nn1_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         dropout,
                         num_steps,
                         hidden_size = 1024, 
                         num_labels=10,batch_size = 128):
    
    uniMax = 1/math.sqrt(hidden_size)
    
    with graph.as_default():
      # Input data. For the training data, we use a placeholder that will be fed
      # at run time with a training minibatch.
      tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size * image_size))
      tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        
      tf_valid_dataset = tf.constant(valid_dataset)
      tf_test_dataset = tf.constant(test_dataset)

      # Hidden 1
      weights_1 = tf.Variable(tf.random_uniform([image_size * image_size, hidden_size], minval=-uniMax, maxval=uniMax),
                             name='weights_1')
      biases_1 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_1')
      hidden_1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights_1) + biases_1)
      
      if dropout>0: 
        dropped = tf.nn.dropout(hidden_1, dropout)
      else:
        dropped = hidden_1

      # Softmax 
      weights_2 = tf.Variable(tf.random_uniform([hidden_size, num_labels],minval=-uniMax, maxval=uniMax), name='weights_2')
      biases_2 = tf.Variable(tf.random_uniform([num_labels],minval=-uniMax, maxval=uniMax),name='biases_2')
      logits = tf.matmul(dropped, weights_2) + biases_2

      # 
      loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

      # Optimizer.
      global_step = tf.Variable(0)  # count the number of steps taken.
      learning_rate = tf.train.exponential_decay(0.5, global_step, 100000, 0.96, staircase=True)
      optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
      #optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)


      # Predictions for the training, validation, and test data.
      train_prediction = tf.nn.softmax(logits)
    
      valid_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset, weights_1) + biases_1), weights_2) + biases_2)
      test_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, weights_1) + biases_1), weights_2) + biases_2)

    test_accuracy = 0
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print("Initialized")
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
           
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            
            if (step % 500 == 0):
              print("Minibatch loss at step %d: %f" % (step, l))
              print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
              print("Validation accuracy: %.1f%%" % accuracy(
              valid_prediction.eval(), valid_labels))
              test_accuracy = accuracy(test_prediction.eval(), test_labels)
              print("Test accuracy: %.1f%%" % test_accuracy)
    return test_accuracy

In [17]:
num_steps = 3001

keep_probs = [0, 0.3,0.4, 0.5, 0.6,0.7]
test_accuracy = np.zeros(len(keep_probs))
i = 0
for keep_prob in keep_probs:
  print("\n>>>>>>>>>> keep_prob: %f" % keep_prob)
  graph = tf.Graph()
  test_accuracy[i] = create_nn1_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         keep_prob,
                         num_steps)
   
  i = i +1


>>>>>>>>>> keep_prob: 0.000000
Initialized
Minibatch loss at step 0: 2.315333
Minibatch accuracy: 8.6%
Validation accuracy: 34.9%
Test accuracy: 37.2%
Minibatch loss at step 500: 0.357309
Minibatch accuracy: 90.6%
Validation accuracy: 85.5%
Test accuracy: 91.8%
Minibatch loss at step 1000: 0.540357
Minibatch accuracy: 85.2%
Validation accuracy: 86.3%
Test accuracy: 92.7%
Minibatch loss at step 1500: 0.293043
Minibatch accuracy: 92.2%
Validation accuracy: 87.5%
Test accuracy: 93.9%
Minibatch loss at step 2000: 0.282493
Minibatch accuracy: 93.8%
Validation accuracy: 87.8%
Test accuracy: 94.2%
Minibatch loss at step 2500: 0.339409
Minibatch accuracy: 89.8%
Validation accuracy: 87.9%
Test accuracy: 94.2%
Minibatch loss at step 3000: 0.354558
Minibatch accuracy: 89.8%
Validation accuracy: 88.2%
Test accuracy: 94.3%

>>>>>>>>>> keep_prob: 0.300000
Initialized
Minibatch loss at step 0: 2.343650
Minibatch accuracy: 6.2%
Validation accuracy: 37.4%
Test accuracy: 40.7%
Minibatch loss at step 50

In [19]:
print("*** Best keep_prob:"+str(keep_probs[np.argmax(test_accuracy)])+ " -- accuracy:" + str(test_accuracy[np.argmax(test_accuracy)]))

*** Best keep_prob:0 -- accuracy:94.26


__We did not get an improvement in test accuracy__ by using dropout as the best accuracy occours for 
keep_prob=0 that means no dropout.  

### Neural Networks models: 2 hidden layers

In [66]:
def create_nn2_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         dropout_vect,
                         num_steps,
                         hidden_size = 1024, 
                         num_labels=10,batch_size = 128):
    
    assert dropout_vect.shape == (2,)
    
    uniMax = 1/math.sqrt(hidden_size)
    
    with graph.as_default():
      # Input data. For the training data, we use a placeholder that will be fed
      # at run time with a training minibatch.
      tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size * image_size))
      tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        
      tf_valid_dataset = tf.constant(valid_dataset)
      tf_test_dataset = tf.constant(test_dataset)

      # Hidden 1
      weights_1 = tf.Variable(tf.random_uniform([image_size * image_size, hidden_size], minval=-uniMax, maxval=uniMax),
                             name='weights_1')
      biases_1 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_1')
      hidden_1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights_1) + biases_1)
      
      if dropout_vect[0]>0: 
        dropped_1 = tf.nn.dropout(hidden_1, dropout_vect[0])
      else:
        dropped_1 = hidden_1
    
      # Hidden 2
      weights_2 = tf.Variable(tf.random_uniform([hidden_size, hidden_size], minval=-uniMax, maxval=uniMax),name='weights_2')
      biases_2 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_2')
      hidden_2 = tf.nn.relu(tf.matmul(dropped_1, weights_2) + biases_2)
    
      if dropout_vect[1]>0: 
        dropped_2 = tf.nn.dropout(hidden_2, dropout_vect[1])
      else:
        dropped_2 = hidden_2
        
      # Softmax 
      weights_3 = tf.Variable(tf.random_uniform([hidden_size, num_labels],minval=-uniMax, maxval=uniMax), name='weights_3')
      biases_3 = tf.Variable(tf.random_uniform([num_labels],minval=-uniMax, maxval=uniMax),name='biases_3')
      logits = tf.matmul(dropped_2, weights_3) + biases_3

      # 
      loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

      # Optimizer.
      global_step = tf.Variable(0)  # count the number of steps taken.
      learning_rate = tf.train.exponential_decay(0.5, global_step, 100000, 0.96, staircase=True)
      optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
      #optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)


      # Predictions for the training, validation, and test data.
      train_prediction = tf.nn.softmax(logits)
    
      valid_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset, weights_1) + biases_1), weights_2) + biases_2),
                  weights_3) + biases_3)
      test_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, weights_1) + biases_1), weights_2) + biases_2), 
                   weights_3) + biases_3)

    test_accuracy = 0
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print("Initialized")
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
           
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            
            if (step % 500 == 0):
              print("Minibatch loss at step %d: %f" % (step, l))
              print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
              print("Validation accuracy: %.1f%%" % accuracy(
              valid_prediction.eval(), valid_labels))
              test_accuracy = accuracy(test_prediction.eval(), test_labels)
              print("Test accuracy: %.1f%%" % test_accuracy)
    return test_accuracy

In [67]:
keep_probs = [0, 0.3,0.4, 0.5, 0.6,0.7]
tuneGrid = pd.DataFrame.from_records([(kp1,kp2,0) for kp1 in keep_probs for kp2 in keep_probs],
                          columns=['drop_1','drop_2','test_accuracy'])
#tuneGrid.head()
for i in range(0,tuneGrid.shape[0]):
  drop_1 , drop_2 = tuneGrid.iloc[i,0] , tuneGrid.iloc[i,1]
  print("\n>>>>>>>>>> keep_prob_1: %f ---- keep_prob_2: %f" % (drop_1 , drop_2))
  graph = tf.Graph()
  tuneGrid.iloc[i,2] = create_nn2_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         np.array([drop_1,drop_2]),
                         num_steps)


>>>>>>>>>> keep_prob_1: 0.000000 ---- keep_prob_2: 0.000000
Initialized
Minibatch loss at step 0: 2.312819
Minibatch accuracy: 6.2%
Validation accuracy: 31.7%
Test accuracy: 34.7%
Minibatch loss at step 500: 0.345798
Minibatch accuracy: 89.1%
Validation accuracy: 85.5%
Test accuracy: 92.2%
Minibatch loss at step 1000: 0.479616
Minibatch accuracy: 85.2%
Validation accuracy: 86.7%
Test accuracy: 93.1%
Minibatch loss at step 1500: 0.254761
Minibatch accuracy: 90.6%
Validation accuracy: 88.2%
Test accuracy: 94.0%
Minibatch loss at step 2000: 0.240032
Minibatch accuracy: 93.8%
Validation accuracy: 88.3%
Test accuracy: 94.7%
Minibatch loss at step 2500: 0.304383
Minibatch accuracy: 90.6%
Validation accuracy: 89.0%
Test accuracy: 95.0%
Minibatch loss at step 3000: 0.336706
Minibatch accuracy: 88.3%
Validation accuracy: 89.1%
Test accuracy: 94.9%

>>>>>>>>>> keep_prob_1: 0.000000 ---- keep_prob_2: 0.300000
Initialized
Minibatch loss at step 0: 2.302322
Minibatch accuracy: 6.2%
Validation accu

In [70]:
tuneGrid.sort_values(by=['test_accuracy'],ascending=[False]).head(10)

Unnamed: 0,drop_1,drop_2,test_accuracy
0,0.0,0.0,94.93
3,0.0,0.5,94.87
5,0.0,0.7,94.79
30,0.7,0.0,94.64
4,0.0,0.6,94.58
24,0.6,0.0,94.53
35,0.7,0.7,94.5
1,0.0,0.3,94.47
28,0.6,0.6,94.46
29,0.6,0.7,94.44


__We did not get an improvement in test accuracy by using dropout__ as the best accuracy occours for keep_prob=0 both for hidden layer 1 and hidden layer 2 that means no dropout.

## Neural Networks models: 3 hidden layers

In [75]:
def create_nn3_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         dropout_vect,
                         num_steps,
                         hidden_size = 1024, 
                         num_labels=10,batch_size = 128):
    
    assert dropout_vect.shape == (3,)
    
    uniMax = 1/math.sqrt(hidden_size)
    
    with graph.as_default():
      # Input data. For the training data, we use a placeholder that will be fed
      # at run time with a training minibatch.
      tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size * image_size))
      tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        
      tf_valid_dataset = tf.constant(valid_dataset)
      tf_test_dataset = tf.constant(test_dataset)

      # Hidden 1
      weights_1 = tf.Variable(tf.random_uniform([image_size * image_size, hidden_size], minval=-uniMax, maxval=uniMax),
                             name='weights_1')
      biases_1 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_1')
      hidden_1 = tf.nn.relu(tf.matmul(tf_train_dataset, weights_1) + biases_1)
      
      if dropout_vect[0]>0: 
        dropped_1 = tf.nn.dropout(hidden_1, dropout_vect[0])
      else:
        dropped_1 = hidden_1
    
      # Hidden 2
      weights_2 = tf.Variable(tf.random_uniform([hidden_size, hidden_size], minval=-uniMax, maxval=uniMax),name='weights_2')
      biases_2 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_2')
      hidden_2 = tf.nn.relu(tf.matmul(dropped_1, weights_2) + biases_2)
    
      if dropout_vect[1]>0: 
        dropped_2 = tf.nn.dropout(hidden_2, dropout_vect[1])
      else:
        dropped_2 = hidden_2
    
      # Hidden 3
      weights_3 = tf.Variable(tf.random_uniform([hidden_size, hidden_size], minval=-uniMax, maxval=uniMax),name='weights_3')
      biases_3 = tf.Variable(tf.random_uniform([hidden_size],minval=-uniMax, maxval=uniMax),name='biases_3')
      hidden_3 = tf.nn.relu(tf.matmul(dropped_2, weights_3) + biases_3)
    
      if dropout_vect[2]>0: 
        dropped_3 = tf.nn.dropout(hidden_3, dropout_vect[2])
      else:
        dropped_3 = hidden_3
        
      # Softmax 
      weights_4 = tf.Variable(tf.random_uniform([hidden_size, num_labels],minval=-uniMax, maxval=uniMax), name='weights_4')
      biases_4 = tf.Variable(tf.random_uniform([num_labels],minval=-uniMax, maxval=uniMax),name='biases_4')
      logits = tf.matmul(dropped_3, weights_4) + biases_4

      # 
      loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

      # Optimizer.
      global_step = tf.Variable(0)  # count the number of steps taken.
      learning_rate = tf.train.exponential_decay(0.5, global_step, 100000, 0.96, staircase=True)
      optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
      #optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)


      # Predictions for the training, validation, and test data.
      train_prediction = tf.nn.softmax(logits)
    
      valid_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset, weights_1) + biases_1), weights_2) + biases_2),
                  weights_3) + biases_3), weights_3) + biases_3)
      test_prediction = tf.nn.softmax(
        tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, weights_1) + biases_1), weights_2) + biases_2), 
                   weights_3) + biases_3), weights_3) + biases_3)

    test_accuracy = 0
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print("Initialized")
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
           
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            
            if (step % 500 == 0):
              print("Minibatch loss at step %d: %f" % (step, l))
              print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
              print("Validation accuracy: %.1f%%" % accuracy(
              valid_prediction.eval(), valid_labels))
              test_accuracy = accuracy(test_prediction.eval(), test_labels)
              print("Test accuracy: %.1f%%" % test_accuracy)
    return test_accuracy

In [79]:
keep_probs = [0, 0.5, 0.7]
tuneGrid = pd.DataFrame.from_records([(kp1,kp2,kp3,0) for kp1 in keep_probs for kp2 in keep_probs for kp3 in keep_probs],
                          columns=['drop_1','drop_2','drop_3','test_accuracy'])
#tuneGrid.head()
for i in range(0,tuneGrid.shape[0]):
  drop_1 , drop_2 , drop_3 = tuneGrid.iloc[i,0] , tuneGrid.iloc[i,1] , tuneGrid.iloc[i,2]
  print("\n>>>>>>>>>> keep_prob_1: %f ---- keep_prob_2: %f ---- keep_prob_3: %f" % (drop_1 , drop_2, drop_3))
  graph = tf.Graph()
  tuneGrid.iloc[i,3] = create_nn2_model_dropout_and_run(graph,
                         train_dataset,
                         train_labels,
                         valid_dataset,
                         valid_labels,
                         test_dataset,
                         test_labels,
                         np.array([drop_1,drop_2]),
                         num_steps)


>>>>>>>>>> keep_prob_1: 0.000000 ---- keep_prob_2: 0.000000 ---- keep_prob_3: 0.000000
Initialized
Minibatch loss at step 0: 2.307701
Minibatch accuracy: 8.6%
Validation accuracy: 37.3%
Test accuracy: 40.4%
Minibatch loss at step 500: 0.360584
Minibatch accuracy: 89.8%
Validation accuracy: 85.6%
Test accuracy: 92.3%
Minibatch loss at step 1000: 0.469022
Minibatch accuracy: 84.4%
Validation accuracy: 86.7%
Test accuracy: 92.9%
Minibatch loss at step 1500: 0.253226
Minibatch accuracy: 91.4%
Validation accuracy: 88.2%
Test accuracy: 94.1%
Minibatch loss at step 2000: 0.240038
Minibatch accuracy: 93.8%
Validation accuracy: 88.5%
Test accuracy: 94.6%
Minibatch loss at step 2500: 0.289389
Minibatch accuracy: 90.6%
Validation accuracy: 89.2%
Test accuracy: 95.1%
Minibatch loss at step 3000: 0.322523
Minibatch accuracy: 88.3%
Validation accuracy: 89.2%
Test accuracy: 95.1%

>>>>>>>>>> keep_prob_1: 0.000000 ---- keep_prob_2: 0.000000 ---- keep_prob_3: 0.500000
Initialized
Minibatch loss at ste

In [80]:
tuneGrid.sort_values(by=['test_accuracy'],ascending=[False]).head(10)

Unnamed: 0,drop_1,drop_2,drop_3,test_accuracy
1,0.0,0.0,0.5,95.15
0,0.0,0.0,0.0,95.11
2,0.0,0.0,0.7,95.07
8,0.0,0.7,0.7,95.02
6,0.0,0.7,0.0,94.84
5,0.0,0.5,0.7,94.83
18,0.7,0.0,0.0,94.82
24,0.7,0.7,0.0,94.74
3,0.0,0.5,0.0,94.73
7,0.0,0.7,0.5,94.72


__We did not get an improvement in test accuracy by using dropout__ as although the best accuracy occours for keep_prob = 0 for hidden layers 1 and 2 and keep_prob = 0.5 for layers 3 the difference with the second best combination that is the no dropout option for all hidden layers is very tiny. 

## Conclusions 

* __Neural Networks models: 1 hidden layer__: __dropout not effective__
  * We did not get an improvement in test accuracy by using dropout as the best accuracy occours for keep_prob=0 that means no dropout
* __Neural Networks models: 2 hidden layers__: __dropout not effective__
  * We did not get an improvement in test accuracy by using dropout as the best accuracy occours for keep_prob=0 both for hidden layer 1 and hidden layer 2 that means no dropout
* __Neural Networks models: 3 hidden layers__: __dropout not effective__
  * We did not get an improvement in test accuracy by using dropout as although the best accuracy occours for keep_prob = 0 for hidden layers 1 and 2 and keep_prob = 0.5 for layers 3 the difference with the second best combination that is the no dropout option for all hidden layers is very tiny.

Also, __considering at most 3 hidden layers, the best model comes without L2 regularization and/or dropout with a test accuracy ~95.1%__