<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 2 - 1.5 Hours </h1>
<h1 style="text-align:center"> Convolutional Neural Network (CNN) for Handwritten Digits Recognition</h1>

- Amyn KASSARA
- Anas BOUZAFOUR

The aim of this session is to practice with Convolutional Neural Networks. Each group should fill in and run the appropriate notebook cells. 

Follow instructions step by step until the end and submit your complete notebook as an archive (tar -cf groupXnotebook.tar DL_lab2/).

Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by May 29th 2019 (23:59:59 CET).

# Introduction

In the last lab session, you built a Multilayer Perceptron to recognize handwritten digits from the MNIST dataset. The best accuracy achieved on the test data was about 97%. Can you do better with a deep CNN?
In this lab session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks, **LeNet-5**, to reach more than 99% accuracy. 






# Load MNIST Data in TensorFlow
Run the cell below to load the MNIST data that comes with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [2]:
from time import time
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels
print("Image Shape: {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))

epsilon = 1e-10 # this is a parameter you will use later

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Image Shape: (784,)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : My First Model in TensorFlow

Before starting with CNNs, let's train and test in TensorFlow the example
**y=softmax(Wx+b)** seen in the first lab. 

This model reaches an accuracy of about 92%.
You will also learn how to launch TensorBoard (https://www.tensorflow.org/get_started/summaries_and_tensorboard) to visualize the computation graph, statistics and learning curves. 
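
As a rough illustration of what **y=softmax(Wx+b)** and the clipped cross-entropy compute, here is a minimal NumPy sketch. It is not the lab's TensorFlow code: the random input batch and the helper names `softmax` and `cross_entropy` are assumptions made for the example, while the zero initialisation and the clipping value `1e-10` mirror the cells of this notebook.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(pred, labels, eps=1e-10):
    # Clip the probabilities away from 0 so that log never receives 0.
    clipped = np.clip(pred, eps, 1.0)
    return float(np.mean(-np.sum(labels * np.log(clipped), axis=1)))

rng = np.random.default_rng(0)
x = rng.random((4, 784))            # 4 hypothetical flattened 28x28 images
labels = np.eye(10)[[3, 1, 4, 1]]   # one-hot labels for 4 hypothetical digits
W = np.zeros((784, 10))             # zero initialisation, as in the lab code
b = np.zeros(10)

pred = softmax(x @ W + b)
loss = cross_entropy(pred, labels)
print(pred[0])  # every class has probability 0.1 before any training
print(loss)     # log(10), about 2.3026, for a uniform prediction
```

With zero weights the model predicts a uniform distribution, so the initial loss is log(10); training should bring it down from there.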

<b> Part 1 </b> : Read carefully the code in the cell below. Run it to perform training. 

In [3]:
#STEP 1

# Parameters
learning_rate = 0.01
training_epochs = 40
batch_size = 128
display_step = 1
logs_path = 'log_files/'  # useful for tensorboard

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')

# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid passing values too close to zero to the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

#STEP 2 

# Launch the graph for training
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")
    summary_writer.flush()

    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch:  01   =====> Loss= 1.288795405
Epoch:  02   =====> Loss= 0.732733053
Epoch:  03   =====> Loss= 0.600515388
Epoch:  04   =====> Loss= 0.536665128
Epoch:  05   =====> Loss= 0.497798177
Epoch:  06   =====> Loss= 0.471231114
Epoch:  07   =====> Loss= 0.451124308
Epoch:  08   =====> Loss= 0.435805831
Epoch:  09   =====> Loss= 0.423496278
Epoch:  10   =====> Loss= 0.413127124
Epoch:  11   =====> Loss= 0.404389117
Epoch:  12   =====> Loss= 0.396879473
Epoch:  13   =====> Loss= 0.390215650
Epoch:  14   =====> Loss= 0.384576511
Epoch:  15   =====> Loss= 0.379182230
Epoch:  16   =====> Loss= 0.374522811
Epoch:  17   =====> Loss= 0.370235896
Epoch:  18   =====> Loss= 0.366593191
Epoch:  19   =====> Loss= 0.362945681
Epoch:  20   =====> Loss= 0.359673449
Epoch:  21   =====> Loss= 0.356555592
Epoch:  22   =====> Loss= 0.353806502
Epoch:  23   =====> Loss= 0.351471897
Epoch:  24   =====> Loss= 0.348860044
Epoch:  25   =====> Loss= 0.346522481
Epoch:  26   =====> Loss= 0.344210360
Epoch:  27  

<b> Part 2 </b>: Using TensorBoard, you can now visualize the created graph, which gives you an overview of your architecture and of how its major components are connected. You can also view and analyse the learning curves. 

To launch TensorBoard: 
- Open a terminal and run the command **"tensorboard --logdir=lab_2/log_files/"**
- Click on "Tensorboard web interface" in Zoe  


Enjoy it!


<img src="MNIST_figures/Accuracy1.png" width="300" height="150" align="center">
<center><span>Figure 1: Validation Accuracy Graph</span></center>
<img src="MNIST_figures/Loss1.png" width="300" height="150" align="center">
<center><span>Figure 2: Training Loss Graph</span></center>
<img src="MNIST_figures/graph1.png" width="600" height="300" align="center">
<center><span>Figure 3: Network Graph</span></center>

# Section 2 : The 99% MNIST Challenge !

<b> Part 1 </b> : LeNet5 implementation

You are now familiar with **TensorFlow** and **TensorBoard**. In this section, you will build, train and test the baseline [LeNet-5](http://yann.lecun.com/exdb/lenet/) model for the MNIST digit recognition problem.  

Then, you will make some optimizations to reach more than 99% accuracy.

For more information, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


<img src="lenet.png" width="800" height="600" align="center">





The LeNet architecture takes a 28x28xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.
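
Since the MNIST loader returns flattened vectors of length 784, the images have to be reshaped to 28x28x1 before being fed to the network. A minimal NumPy sketch, where the array `flat_images` is a hypothetical stand-in for the real data:

```python
import numpy as np

# 5 hypothetical flattened images; the real arrays have one row of 784 pixels per image.
flat_images = np.zeros((5, 784), dtype=np.float32)

# -1 lets NumPy infer the batch size; the layout is (batch, height, width, channels).
images = flat_images.reshape(-1, 28, 28, 1)
print(images.shape)  # (5, 28, 28, 1)
```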

--------------------------
**Layer 1 - Convolution (5x5):** The output shape should be 28x28x6. **Activation:** ReLU. **MaxPooling:** The output shape should be 14x14x6.

**Layer 2 - Convolution (5x5):** The output shape should be 10x10x16. **Activation:** ReLU. **MaxPooling:** The output shape should be 5x5x16.

**Flatten:** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use tf.reshape.

**Layer 3 - Fully Connected:** This should have 120 outputs. **Activation:** ReLU.

**Layer 4 - Fully Connected:** This should have 84 outputs. **Activation:** ReLU.

**Layer 5 - Fully Connected:** This should have 10 outputs. **Activation:** softmax.
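
The output shapes listed above can be checked by hand. The sketch below assumes TensorFlow's two padding conventions ("SAME": ceil(input / stride); "VALID": floor((input - kernel) / stride) + 1) and a 2x2 max pooling with stride 2; the helper `out_size` is a hypothetical name introduced for the example.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling along one dimension."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

conv1 = out_size(28, kernel=5, stride=1, padding="SAME")      # 28
pool1 = out_size(conv1, kernel=2, stride=2, padding="VALID")  # 14
conv2 = out_size(pool1, kernel=5, stride=1, padding="VALID")  # 10
pool2 = out_size(conv2, kernel=2, stride=2, padding="VALID")  # 5
flat = pool2 * pool2 * 16                                     # 400 inputs to Layer 3

print(conv1, pool1, conv2, pool2, flat)  # 28 14 10 5 400
```

This is why the first convolution needs "SAME" padding (to keep 28x28) while the second needs "VALID" padding (to go from 14x14 to 10x10).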


<b> Question 2.1.1 </b>  Implement the Neural Network architecture described above.
For that, you will use classes and functions from  https://www.tensorflow.org/api_docs/python/tf/nn. 

We give you some helper functions for weight and bias initialization. You can also refer to Section 1. 


In [4]:
# Functions for weight and bias initialization 
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0., shape=shape)
  return tf.Variable(initial)

In [5]:
def LeNet5_Model(image):    
    
    #First layer 
    #We first convolve the image with filter1, then pass the result (plus bias) through the ReLU activation, 
    #and finally apply max pooling (28x28x1 -> conv_layer_1 -> 28x28x6 -> pool_layer_1 -> 14x14x6)
    filter1= weight_variable([5, 5, 1, 6])
    conv_layer_1 = tf.nn.conv2d(image, filter1, [1,1,1,1], "SAME")
    bias_1 = bias_variable([6])
    hidden_layer_1 = tf.nn.relu(conv_layer_1 + bias_1)
    pool_layer_1 = tf.nn.max_pool(hidden_layer_1, [1,2,2,1], [1,2,2,1], "VALID")
    
    #Second layer
    #Same operations as the first layer, but this time the "VALID" convolution with filter2 shrinks the spatial size of the input
    #(14x14x6 -> conv_layer_2 -> 10x10x16 -> pool_layer_2 -> 5x5x16 -> flat2 -> 400)
    filter2 = weight_variable([5, 5, 6, 16])
    conv_layer_2 = tf.nn.conv2d(pool_layer_1, filter2, [1,1,1,1], "VALID")
    bias_2 = bias_variable([16])
    hidden_layer_2 = tf.nn.relu(conv_layer_2 + bias_2)
    pool_layer_2 = tf.nn.max_pool(hidden_layer_2, [1,2,2,1], [1, 2, 2, 1], "VALID")
    flat2 = tf.contrib.layers.flatten(pool_layer_2)
    
    #Third layer
    #This is a fully connected layer: a classic hidden layer in which every neuron of layer n-1 is connected
    #to every neuron of layer n (400 -> fc_layer_3 -> 120)
    weights3 = weight_variable([400, 120])
    bias_3 = bias_variable([120])
    fc_layer_3 = tf.nn.relu(tf.matmul(flat2, weights3) + bias_3)
    
    #Fourth layer
    #This layer is a fully connected layer (120 -> fc_layer_4-> 84)
    weights4 = weight_variable([120, 84])
    bias_4 = bias_variable([84])
    fc_layer_4 = tf.nn.relu(tf.matmul(fc_layer_3 , weights4) + bias_4)
    
    
    #fifth layer
    #This layer is a fully connected layer (84 -> fc_layer_5-> 10)
    weights5 = weight_variable([84, 10])
    bias_5 = bias_variable([10])
    output = tf.nn.softmax(tf.matmul(fc_layer_4 , weights5) + bias_5)
    
    return output
    

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

In [6]:
# In a CNN, each convolutional or fully connected layer has two kinds of parameters: weights and biases.
# The total number of parameters is the sum of all weights and biases. A max-pooling layer has no
# parameters: its pool size, stride and padding are hyperparameters.
# For a convolutional layer, the number of weights is
# filter_height * filter_width * input_channels * output_channels, plus one bias per output channel.
# For a fully connected layer, the number of weights is the number of inputs (the size of the previous
# layer's output) multiplied by the number of neurons, plus one bias per neuron.
param_layer_1 = 5*5*1*6 + 6      # 156
param_layer_2 = 5*5*6*16 + 16    # 2416: each of the 16 filters spans all 6 input channels
param_layer_3 = 400*120 + 120    # 48120
param_layer_4 = 120*84 + 84      # 10164
param_layer_5 = 84*10 + 10       # 850

count_param = param_layer_1 + param_layer_2 + param_layer_3 + param_layer_4 + param_layer_5
print("Total number of parameters : " ,count_param)

Total number of parameters :  61706


 Your answer goes here in details 
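
As a cross-check, the count can also be derived directly from the weight shapes passed to `weight_variable` in `LeNet5_Model`. This is a small sketch using only the standard library; the helper `count_params` is a hypothetical name, and it assumes one bias per output unit (the last dimension of each weight shape), as in the model above.

```python
from math import prod

# Weight shapes exactly as they appear in LeNet5_Model.
weight_shapes = [
    [5, 5, 1, 6],    # conv 1: height, width, input channels, output channels
    [5, 5, 6, 16],   # conv 2
    [400, 120],      # fully connected 3
    [120, 84],       # fully connected 4
    [84, 10],        # fully connected 5
]

def count_params(shape):
    # All weights of the layer plus one bias per output unit.
    return prod(shape) + shape[-1]

per_layer = [count_params(s) for s in weight_shapes]
total = sum(per_layer)
print(per_layer)  # [156, 2416, 48120, 10164, 850]
print(total)      # 61706
```

Note that the second convolution contributes 5x5x6x16 weights, because each of its 16 filters spans all 6 input channels.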

<b> Question 2.1.3. </b>  Define your model, its accuracy and the loss function according to the following parameters (you can look at Section 1 to see what is expected):

     Learning rate: 0.001
     Loss Function: Cross-entropy
     Optimizer: tf.train.GradientDescentOptimizer
     Number of epochs: 40
     Batch size: 128

In [7]:
tf.reset_default_graph() # reset the default graph before defining a new model
! rm -rf ./log_files # Clear logs

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'


#placeholder:
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')
# Model, loss function and accuracy
#Model:
with tf.name_scope('Model'):
    pred = LeNet5_Model(x)
#Loss function:
with tf.name_scope('Loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
#Optimizer:
with tf.name_scope('SGD'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [8]:
def evaluate(logits, labels):
    # logits are the outputs of the model (here, softmax probabilities); labels are the one-hot true labels
    # both are TensorFlow tensors, so this function returns an accuracy tensor
    # that must be evaluated in a session (e.g. with acc.eval(feed_dict))
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    return tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope('Accuracy'):
    acc = evaluate(pred, y)
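
For comparison, the same accuracy computation can be written for plain NumPy arrays, for example to check predictions that have already been fetched from a session. This is only a sketch: the function name `evaluate_np` and the toy arrays are assumptions made for the example.

```python
import numpy as np

def evaluate_np(outputs, labels):
    # outputs: (n_samples, n_classes) scores or probabilities
    # labels:  (n_samples, n_classes) one-hot true labels
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(labels, axis=1)
    return float(np.mean(predicted == actual))

outputs = np.array([[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.2, 0.5, 0.3]])
labels = np.eye(3)[[1, 0, 1, 1]]     # true classes: 1, 0, 1, 1
print(evaluate_np(outputs, labels))  # 0.75 (the third sample is misclassified)
```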

<b> Question 2.1.5. </b>  Implement training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini batch and the training/validation accuracy per epoch. (Display results every 100 epochs)
- Save the model after training
- Print after training the final testing accuracy 
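
The "shuffle, then iterate over mini-batches" step listed above can be sketched independently of TensorFlow as follows. The generator `iterate_minibatches` and the toy arrays are hypothetical; unlike a loop over `int(num_examples/batch_size)` batches, this version also yields the last, smaller batch.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # Draw a new random order each time the generator is created, i.e. once per epoch.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]   # the same indices keep images and labels aligned

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)   # 10 toy "images", one feature each
y = np.arange(10)                  # toy labels, equal to the feature value

batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))
sizes = [len(bx) for bx, _ in batches]
print(sizes)  # [4, 4, 2]
```

Calling the generator once per epoch gives a fresh permutation each time, which matches the "before each epoch, shuffle the training set" requirement.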



In [9]:
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_SGD", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_SGD", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

#reshaping X_train:
x_train = X_train.reshape(-1, 28, 28, 1)
#reshaping X_val:
x_val = X_validation.reshape(-1, 28, 28, 1)
#reshaping X_test
x_test = X_test.reshape(-1, 28, 28, 1)


def train(init, sess, logs_path, n_epochs, batch_size, optimizer, cost, merged_summary_op):
    startTime = time()
    # Use the session passed by the caller instead of opening a second one
    with sess.as_default():
        sess.run(init)
        # op to write logs to Tensorboard
        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        # optimizer and cost are the same kinds of objects as in Section 1
        # Train your model
        # Training cycle
        for epoch in range(n_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
                batch_xs = batch_xs.reshape(batch_size,28,28,1)
                # Run optimization op (backprop), cost op (to get loss value)
                # and summary nodes
                _, c, summary = sess.run([optimizer, cost, merged_summary_op],feed_dict={x: batch_xs, y: batch_ys})
                # Write logs at every iteration
                summary_writer.add_summary(summary, epoch * total_batch + i)
                # Compute average loss
                avg_cost += c / total_batch
            if (epoch+1) % 10 == 0:
                # Display loss per 10 epochs
                print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))
                # Display accuracy per 10 epochs
                acc_train= acc.eval({x: x_train, y: y_train})
                acc_val = acc.eval({x: x_val, y: y_validation})
                print("Train accuracy =", "{:.2f}%\n".format(acc_train*100)," | ","Validation accuracy =", "{:.2f}%\n".format(acc_val*100))
        
                
            
        endTime = time()
        print("Optimization Finished!")
        print("Training time =", "{:.3f} seconds".format(endTime-startTime))
        summary_writer.flush()
        # Print the accuracy on testing data
        print("Accuracy:", acc.eval({x: x_test, y: mnist.test.labels}))

with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

Epoch:  10   =====> Loss= 0.461601113
Train accuracy = 86.96%
  |  Validation accuracy = 87.92%

Epoch:  20   =====> Loss= 0.267179752
Train accuracy = 92.12%
  |  Validation accuracy = 93.02%

Epoch:  30   =====> Loss= 0.203872303
Train accuracy = 94.05%
  |  Validation accuracy = 94.76%

Epoch:  40   =====> Loss= 0.165260443
Train accuracy = 95.13%
  |  Validation accuracy = 95.84%

Optimization Finished!
Training time = 698.624 seconds
Accuracy: 0.9547


<b> Question 2.1.6 </b> : Use TensorBoard to visualise and save loss and accuracy curves. 
You will save figures in the folder **"lab_2/MNIST_figures"** and display them in your notebook.

Please put your loss and accuracy curves here.

<img src="MNIST_figures/Accuracy_LeNet5.PNG" width="300" height="150" align="center">
<center><span>Figure 1: Validation Accuracy Graph</span></center>
<img src="MNIST_figures/Loss_LeNet5.PNG" width="300" height="150" align="center">
<center><span>Figure 2: Training Loss Graph</span></center>

<b> Part 2 </b> : LeNet-5 Optimization


<b> Question 2.2.1 </b>

- Retrain your network with AdamOptimizer and then fill in the table below:


| Optimizer        | Gradient Descent | AdamOptimizer |
|------------------|------------------|---------------|
| Testing Accuracy | 0.95             | 0.99          |
| Training Time    | 698s             | 705s          |

- Which optimizer gives the best accuracy on test data?

**Your answer:** ...


In [10]:
tf.reset_default_graph() # your implementation goes here
! rm -rf ./log_files # Clear logs
#placeholder:
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')
# Model, loss function and accuracy
#Model:
with tf.name_scope('Model'):
    pred = LeNet5_Model(x)
#Loss function:
with tf.name_scope('Loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
#Optimizer:
with tf.name_scope('AdamOptimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
#Accuracy
with tf.name_scope('Accuracy'):
    acc = evaluate(pred, y)
    
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_Adam", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_Adam", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()


# LAUNCH SESSION
with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)



Epoch:  10   =====> Loss= 0.021137630
Train accuracy = 99.08%
  |  Validation accuracy = 98.60%

Epoch:  20   =====> Loss= 0.009455812
Train accuracy = 99.71%
  |  Validation accuracy = 98.92%

Epoch:  30   =====> Loss= 0.003682482
Train accuracy = 99.75%
  |  Validation accuracy = 98.94%

Epoch:  40   =====> Loss= 0.003524325
Train accuracy = 99.91%
  |  Validation accuracy = 99.20%

Optimization Finished!
Training time = 705.503 seconds
Accuracy: 0.9908


The Adam optimizer (99% accuracy) gives a better test accuracy than SGD (95% accuracy), and training takes only about 7 s longer (705 s versus 698 s). 
The main reason is that Adam computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients, whereas plain stochastic gradient descent applies the same constant learning rate to every parameter at every iteration. However, the accuracy obtained with SGD could probably be improved by tuning the learning rate empirically.
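
To make the difference concrete, here is a minimal NumPy sketch of one SGD step and one Adam step on the same toy gradient. It follows the standard formulation of Adam with the commonly used defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), which is an assumption about the setup rather than a copy of TensorFlow's internal implementation; the function names and the toy values are hypothetical.

```python
import numpy as np

def sgd_step(param, grad, lr):
    # Same learning rate for every parameter.
    return param - lr * grad

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # running estimate of the first moment
    v = beta2 * v + (1 - beta2) * grad ** 2    # running estimate of the second moment
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, 1.0])
grad = np.array([10.0, 0.01])   # one very large and one very small gradient

sgd_new = sgd_step(param, grad, lr=0.001)
adam_new, m, v = adam_step(param, grad, m=np.zeros(2), v=np.zeros(2), t=1)

print(param - sgd_new)   # steps proportional to the gradient: about 0.01 and 0.00001
print(param - adam_new)  # steps of about 0.001 for both parameters
```

With SGD the parameter with the small gradient barely moves, whereas Adam rescales each step by the gradient's own magnitude, so both parameters advance at a comparable pace.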

<b> Question 2.2.2</b> Try to add dropout (keep_prob = 0.75) before the first fully connected layer. You will use tf.nn.dropout for that purpose. What accuracy do you achieve on testing data?

**Accuracy achieved on testing data:** ...

In [18]:
keep_prob = 0.75
def LeNet5_Model_Dropout(image):    
    #First layer 
    #We first convolve the image with filter1, then pass the result (plus bias) through the ReLU activation, 
    #and finally apply max pooling (28x28x1 -> conv_layer_1 -> 28x28x6 -> pool_layer_1 -> 14x14x6)
    filter1= weight_variable([5, 5, 1, 6])
    conv_layer_1 = tf.nn.conv2d(image, filter1, [1,1,1,1], "SAME")
    bias_1 = bias_variable([6])
    hidden_layer_1 = tf.nn.relu(conv_layer_1 + bias_1)
    pool_layer_1 = tf.nn.max_pool(hidden_layer_1, [1,2,2,1], [1,2,2,1], "VALID")
    
    #Second layer
    #Same operations as the first layer, but this time the "VALID" convolution with filter2 shrinks the spatial size of the input
    #(14x14x6 -> conv_layer_2 -> 10x10x16 -> pool_layer_2 -> 5x5x16 -> flat2 -> 400)
    filter2 = weight_variable([5, 5, 6, 16])
    conv_layer_2 = tf.nn.conv2d(pool_layer_1, filter2, [1,1,1,1], "VALID")
    bias_2 = bias_variable([16])
    hidden_layer_2 = tf.nn.relu(conv_layer_2 + bias_2)
    pool_layer_2 = tf.nn.max_pool(hidden_layer_2, [1,2,2,1], [1, 2, 2, 1], "VALID")
    flat2 = tf.contrib.layers.flatten(pool_layer_2)
    
    # --- dropout layer --- #
    drop_layer = tf.nn.dropout(flat2, keep_prob)
    
    #Third layer
    #This is a fully connected layer: a classic hidden layer in which every neuron of layer n-1 is connected
    #to every neuron of layer n (400 -> fc_layer_3 -> 120)
    weights3 = weight_variable([400, 120])
    bias_3 = bias_variable([120])
    fc_layer_3 = tf.nn.relu(tf.matmul(drop_layer, weights3) + bias_3)
    
    #Fourth layer
    #This layer is a fully connected layer (120 -> fc_layer_4-> 84)
    weights4 = weight_variable([120, 84])
    bias_4 = bias_variable([84])
    fc_layer_4 = tf.nn.relu(tf.matmul(fc_layer_3 , weights4) + bias_4)
    
    
    #fifth layer
    #This layer is a fully connected layer (84 -> fc_layer_5-> 10)
    weights5 = weight_variable([84, 10])
    bias_5 = bias_variable([10])
    output = tf.nn.softmax(tf.matmul(fc_layer_4 , weights5) + bias_5)
    
    return output

In [19]:
tf.reset_default_graph() # your implementation goes here
! rm -rf ./log_files # Clear logs
#placeholder:
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')
# Model, loss function and accuracy
#Model:
with tf.name_scope('Model'):
    pred = LeNet5_Model_Dropout(x)
#Loss function:
with tf.name_scope('Loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
#Optimizer:
with tf.name_scope('AdamOptimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
#Accuracy
with tf.name_scope('Accuracy'):
    acc = evaluate(pred, y)
    
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_Adam", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_Adam", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()


# LAUNCH SESSION
with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)



Epoch:  10   =====> Loss= 0.035672985
Train accuracy = 99.07%
  |  Validation accuracy = 98.52%

Epoch:  20   =====> Loss= 0.019310901
Train accuracy = 99.45%
  |  Validation accuracy = 98.88%

Epoch:  30   =====> Loss= 0.013152931
Train accuracy = 99.53%
  |  Validation accuracy = 98.80%

Epoch:  40   =====> Loss= 0.010225293
Train accuracy = 99.61%
  |  Validation accuracy = 98.96%

Optimization Finished!
Training time = 703.228 seconds
Accuracy: 0.9874


The accuracy on test data is slightly lower when we use dropout (98.74% versus 99.08%). Dropout is a regularization technique that is most effective at preventing overfitting, so we might have expected better results. However, our network is relatively small and showed little overfitting without dropout: the accuracy on test data (99.08%) is only slightly lower than the accuracy on training data (99.91%). Note also that `keep_prob` is fixed at 0.75 in the graph, so dropout remains active when the accuracy is evaluated, which may also lower the measured accuracy.
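
To illustrate how dropout behaves during training and evaluation, here is a minimal NumPy sketch of inverted dropout, the scheme used by `tf.nn.dropout` (kept units are scaled by `1/keep_prob`). The function `dropout`, its `training` flag and the toy input are assumptions made for the example; they are not part of the lab code, where `keep_prob` is a fixed Python constant.

```python
import numpy as np

def dropout(x, keep_prob, rng, training=True):
    if not training:
        return x   # at evaluation time, every unit is kept unchanged
    # Keep each unit with probability keep_prob and rescale the survivors
    # so that the expected value of the output matches the input.
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 400))   # toy activations, shaped like 1000 flattened 5x5x16 feature maps

train_out = dropout(x, 0.75, rng, training=True)
eval_out = dropout(x, 0.75, rng, training=False)

print((train_out == 0).mean())  # close to 0.25: about a quarter of the units are dropped
print(train_out.mean())         # close to 1.0 thanks to the 1/keep_prob rescaling
print(eval_out.mean())          # exactly 1.0: no unit is dropped
```

In a TensorFlow 1.x graph, the usual way to get this behaviour is to make `keep_prob` a placeholder and feed 0.75 during training and 1.0 during evaluation.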