<h1 style="text-align:center">Deep Learning</h1>
<h1 style="text-align:center">Convolutional Neural Network (CNN) for Handwritten Digit Recognition</h1>

The aim of this session is to practice with Convolutional Neural Networks. Each group should fill in and run the appropriate notebook cells.


Generate your final report (export as HTML) and upload it on the submission website http://bigfoot-m1.eurecom.fr/teachingsub/login (using your deeplearnXX/password). Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed and submitted by May 30th 2018 (23:59:59 CET).

# Introduction

In the previous Lab Session, you built a Multilayer Perceptron for recognizing handwritten digits from the MNIST dataset. The best accuracy achieved on the test data was about 97%. Can you do better using a deep CNN?
In this Lab Session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks, **LeNet-5**, to reach more than 99% accuracy.








# Load MNIST Data in TensorFlow
Run the cell below to load the MNIST data that comes with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [4]:
# SUPPRESS WARNINGS
import warnings
warnings.filterwarnings("ignore")

# IMPORT LIBRARIES
import tensorflow as tf
import numpy as np
from time import time
from tensorflow.examples.tutorials.mnist import input_data


# LOAD DATA (RESHAPING THE IMAGES)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train = mnist.train.images.reshape((-1,28,28,1)), mnist.train.labels
X_val, y_val     = mnist.validation.images.reshape((-1,28,28,1)), mnist.validation.labels
X_test, y_test   = mnist.test.images.reshape((-1,28,28,1)), mnist.test.labels


# PRINT SHAPES
print("Images Shape:   {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_val)))
print("Test Set:       {} samples".format(len(X_test)))


# SMALL CONSTANT USED TO CLIP PROBABILITIES (AVOIDS log(0) IN THE LOSS)
epsilon = 1e-10

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Images Shape:   (28, 28, 1)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : Neural Network in TensorFlow

Before starting with CNNs, let's train and test in TensorFlow the softmax regression model
**y = softmax(Wx + b)** seen in the first lab.
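As a point of reference, the model and its clipped cross-entropy loss can be sketched in plain NumPy. This is only an illustration under stated assumptions: the helper names `softmax` and `cross_entropy` and the toy arrays are hypothetical and are not part of the lab code.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot, eps=1e-10):
    # Batch mean of -sum_k y_k * log(p_k); probabilities are clipped to avoid log(0).
    clipped = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(onehot * np.log(clipped), axis=1)))

# Toy batch (assumption): 4 flattened 784-pixel "images", zero-initialised parameters.
rng = np.random.default_rng(0)
x_toy = rng.random((4, 784))
W_toy = np.zeros((784, 10))
b_toy = np.zeros(10)
y_toy = np.eye(10)[[3, 1, 4, 1]]  # one-hot labels

probs = softmax(x_toy @ W_toy + b_toy)
loss = cross_entropy(probs, y_toy)
print(probs.shape, loss)
```

With all-zero weights every class gets probability 0.1, so the initial loss equals ln(10), roughly 2.30. That is a useful sanity check for the first iterations of training.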

In [6]:
# PARAMETERS
learning_rate = 0.01
training_epochs = 40
batch_size = 128
display_step = 1
logs_path = 'log_files/softmax'  # useful for tensorboard


# PLACEHOLDERS
# Images with 784 pixels
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition, 10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')


# WEIGHTS AND BIAS
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')


# MODEL ARCHITECTURE
# Model (softmax activation)
with tf.name_scope('Model'):
    pred = tf.nn.softmax(tf.matmul(x, W) + b)

# Loss function (cross-entropy)
with tf.name_scope('Loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
    
# Optimizer (SGD)
with tf.name_scope('SGD'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    
# Accuracy
with tf.name_scope('Accuracy'):
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))


# INIT SUMMARY
init = tf.global_variables_initializer()
tf.summary.scalar("Loss", cost)
tf.summary.scalar("Accuracy", acc)
merged_summary_op = tf.summary.merge_all()


# LAUNCH THE SESSION
with tf.Session() as sess:
    
    sess.run(init)
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    
    for epoch in range(training_epochs):
        
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        
        for i in range(total_batch):
            
            # shuffle=True lets the DataSet reshuffle the training set at every epoch boundary
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=True)

            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            
            summary_writer.add_summary(summary, epoch * total_batch + i)
            
            avg_cost += c / total_batch
        
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%02d' % (epoch+1), " =====>  Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")
    summary_writer.flush()

    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch: 01  =====>  Loss= 1.288826695
Epoch: 02  =====>  Loss= 0.732409473
Epoch: 03  =====>  Loss= 0.600389052
Epoch: 04  =====>  Loss= 0.536720089
Epoch: 05  =====>  Loss= 0.497859464
Epoch: 06  =====>  Loss= 0.471154518
Epoch: 07  =====>  Loss= 0.451148307
Epoch: 08  =====>  Loss= 0.436046376
Epoch: 09  =====>  Loss= 0.423529351
Epoch: 10  =====>  Loss= 0.413163693
Epoch: 11  =====>  Loss= 0.404344185
Epoch: 12  =====>  Loss= 0.396952076
Epoch: 13  =====>  Loss= 0.390443056
Epoch: 14  =====>  Loss= 0.384207741
Epoch: 15  =====>  Loss= 0.379257517
Epoch: 16  =====>  Loss= 0.374483205
Epoch: 17  =====>  Loss= 0.370433719
Epoch: 18  =====>  Loss= 0.366613644
Epoch: 19  =====>  Loss= 0.362859343
Epoch: 20  =====>  Loss= 0.359751099
Epoch: 21  =====>  Loss= 0.356533978
Epoch: 22  =====>  Loss= 0.353698811
Epoch: 23  =====>  Loss= 0.351298389
Epoch: 24  =====>  Loss= 0.348955568
Epoch: 25  =====>  Loss= 0.346460257
Epoch: 26  =====>  Loss= 0.344384526
Epoch: 27  =====>  Loss= 0.342060804
E

<img src="images/softmax_board.png" width="600" height="400" align="center">
<center><span>Figure 1: accuracy and loss</span></center>

# Section 2 : The 99% MNIST Challenge!

<b> Part 1 </b> : LeNet5 implementation

The LeNet architecture takes a 28x28xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.

--------------------------
**Layer 1 - Convolution (5x5):** The output shape should be 28x28x6. **Activation:** ReLU. **MaxPooling:** The output shape should be 14x14x6.

**Layer 2 - Convolution (5x5):** The output shape should be 10x10x16. **Activation:** ReLU. **MaxPooling:** The output shape should be 5x5x16.

**Flatten:** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use tf.reshape.

**Layer 3 - Fully Connected:** This should have 120 outputs. **Activation:** ReLU.

**Layer 4 - Fully Connected:** This should have 84 outputs. **Activation:** ReLU.

**Layer 5 - Fully Connected:** This should have 10 outputs. **Activation:** softmax.
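The requested shapes can be checked before writing any TensorFlow code. The sketch below traces the spatial size through the two convolution/pooling stages with the usual output-size formula. It assumes a padding of 2 on the first convolution (so that 28x28 is preserved) and no padding elsewhere; `conv_out` and the `layers` table are hypothetical names used only for this illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Spatial output size of a convolution or pooling window: (I - F + 2P) / S + 1
    return (size - kernel + 2 * pad) // stride + 1

# (name, kind, kernel, stride, pad, output channels), following the specification above.
layers = [
    ("conv1", "conv", 5, 1, 2, 6),
    ("pool1", "pool", 2, 2, 0, None),
    ("conv2", "conv", 5, 1, 0, 16),
    ("pool2", "pool", 2, 2, 0, None),
]

size, channels = 28, 1
shapes = {}
for name, kind, kernel, stride, pad, out_channels in layers:
    size = conv_out(size, kernel, stride, pad)
    if kind == "conv":
        channels = out_channels  # pooling keeps the channel count
    shapes[name] = (size, size, channels)

flat_size = size * size * channels  # input width of the first fully connected layer
for name, shape in shapes.items():
    print(name, shape)
print("flatten", flat_size)
```

The flattened size is what fixes the input dimension of Layer 3.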


In [22]:
def Conv_layer(inputs, size_in, size_out, padding1, padding2):
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="b")
    conv = tf.nn.conv2d(inputs, w, strides=[1, 1, 1, 1], padding= padding1)
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding= padding2)

def FC_layer(inputs, size_in, size_out, act = 'relu'):
    w = tf.Variable(tf.truncated_normal([size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="b")
    if act == 'softmax':
        act = tf.nn.softmax(tf.matmul(inputs, w) + b)
    else:
        act = tf.nn.relu(tf.matmul(inputs, w) + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return act

In [31]:
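def LeNet5_Model(images, dropout_rate = 1):
    # NOTE: despite its name, dropout_rate is passed to tf.nn.dropout as keep_prob
    # (the probability of keeping a unit); 1 disables dropout.

    # --- INTERNAL MODEL DEFINITION --- #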
    with tf.name_scope("LeNet5"):
        # CONVOLUTIONAL 1 LAYER
        with tf.name_scope("Conv1"):
            # Convolution (28x28x1 -> CONV1 -> 28x28x6 -> MAXPOOL1 -> 14x14x6)
            pool1 = Conv_layer(inputs=images, size_in = 1, size_out = 6, padding1 = 'SAME', padding2 = 'VALID')

        # CONVOLUTIONAL 2 LAYER
        with tf.name_scope('Conv2'):
            # Convolution (14x14x6 -> CONV2 -> 10x10x16 -> MAXPOOL2 -> 5x5x16)
            pool2 = Conv_layer(inputs=pool1, size_in = 6, size_out = 16, padding1 = 'VALID', padding2 = 'VALID')
        
        # FLATTEN LAYER
        with tf.name_scope('Flatten'):
            # Flatten (5x5x16 -> 400)
            flat = tf.contrib.layers.flatten(pool2)
            if dropout_rate < 1:
                flat = tf.nn.dropout(x=flat,keep_prob = dropout_rate)
        
        # FULLY CONNECTED LAYER 3
        with tf.name_scope('FC1'):
            # Activation (400 -> FC3 -> 120)
            hidden3 = FC_layer(flat, 400, 120, act = 'relu')
            if dropout_rate < 1:
                hidden3 = tf.nn.dropout(x=hidden3,keep_prob = dropout_rate)            
            
        # FULLY CONNECTED LAYER 4
        with tf.name_scope('FC2'):
            # Activation (120 -> FC4 -> 84)
            hidden4 = FC_layer(hidden3, 120, 84, act = 'relu')
            if dropout_rate < 1:
                hidden4 = tf.nn.dropout(x=hidden4,keep_prob = dropout_rate)
                
        # FULLY CONNECTED LAYER 5
        with tf.name_scope('Output'):
            # Activation (84 -> FC5 -> 10)
            predictions = FC_layer(hidden4, 84, 10, act = 'softmax')
        
        return predictions

<div class='alert alert-success'>
The LeNet architecture has now been declared. The filter sizes, strides and padding were chosen to produce the requested sizes between the layers, using the formula:
$$ I_{i+1} = \frac{I_i - F + 2P}{S} + 1 $$
where $I_i$ is the input size, $F$ the filter size, $P$ the padding and $S$ the stride.

<ul>
    <li><b>Convolutional Layer 1</b>: 28x28x1 ----- <i>[5x5]</i> -----> 28x28x6</li>
    We chose $S = 1$. The padding is then $P = 2$ (applied automatically by the TensorFlow option <tt>'SAME'</tt>), since $\frac{28 - 5 + 2 \cdot 2}{1} + 1 = 28$.<br><br>
    <li><b>MaxPooling Layer 1</b>: 28x28x6 -----> 14x14x6</li>
    To avoid overlap between pooling windows, the stride is set equal to the kernel size $K$, so
    $ 14 = \frac{28 - K}{K} + 1 $, which gives $ K = 2 $.<br><br>
    <li><b>Convolutional Layer 2</b>: 14x14x6 ----- <i>[5x5]</i> -----> 10x10x16</li>
    With no padding (<tt>'VALID'</tt>), $ I_{i+1} = \frac{I_i - F}{S} + 1 = \frac{14 - 5}{S} + 1 = 10 $, which gives $ S = 1 $.<br><br>
    <li><b>MaxPooling Layer 2</b>: 10x10x16 -----> 5x5x16</li>
    Again, the stride is set equal to the kernel size to avoid overlapping windows, so
    $ 5 = \frac{10 - K}{K} + 1 $, which gives $ K = 2 $.
</ul>
</div>
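The `'SAME'` and `'VALID'` options used in `Conv_layer` can be related to explicit padding values with a few lines of Python. The sketch below follows the output-size and padding rules given in the TensorFlow documentation for these two modes; the function names `tf_output_size` and `same_padding` are hypothetical, and the code works on one spatial dimension only.

```python
import math

def tf_output_size(size, kernel, stride, padding):
    # Output length along one dimension for the two TensorFlow padding modes.
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def same_padding(size, kernel, stride):
    # Total zero padding added by 'SAME', returned as (before, after).
    if size % stride == 0:
        total = max(kernel - stride, 0)
    else:
        total = max(kernel - size % stride, 0)
    before = total // 2
    return before, total - before

print(tf_output_size(28, 5, 1, "SAME"), same_padding(28, 5, 1))  # first convolution
print(tf_output_size(28, 2, 2, "VALID"))                         # first max pooling
print(tf_output_size(14, 5, 1, "VALID"))                         # second convolution
print(tf_output_size(10, 2, 2, "VALID"))                         # second max pooling
```

For the first convolution this gives two pixels of padding on each side, which matches $P = 2$ in the formula.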

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

In [28]:
# COMPUTE THE NUMBER OF PARAMETERS
count = np.sum([np.prod(v.shape.as_list()) for v in tf.trainable_variables()])

# PRINT THE RESULT
print("Total number of parameters:", count)

Total number of parameters: 61706


<div class='alert alert-success'>
The number of trainable parameters is <b>61706</b>. It was computed with the <tt>tf.trainable_variables()</tt> function available in TensorFlow:<br>
<br>
- For each variable, the shape is retrieved.<br>
- The components of the shape are multiplied together, which gives the number of parameters in that variable.<br>
- The per-variable counts are then summed.<br>
<br>
The total can also be computed by hand (weights plus biases for each layer):
<code>
5 x 5 x 1 x 6    +
6                +
5 x 5 x 6 x 16   +
16               +
400 x 120        +
120              +
120 x 84         +
84               +
84 x 10          +
10               =
<b>61706</b>
</code>


<br>
&#42;<i>note: although this cell appears before the following one, it was executed <b>after</b> it, so that it counts the variables of the right model (the CNN) and not those of the softmax network from Section 1.</i>
</div>
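The hand computation can also be reproduced without TensorFlow. The sketch below lists the weight shapes used by `Conv_layer` and `FC_layer` in this notebook and adds one bias per output unit; the dictionary name `weight_shapes` is a hypothetical choice for this illustration.

```python
from math import prod

# Weight tensor shapes, one entry per layer (the last dimension is the number of outputs).
weight_shapes = {
    "conv1": (5, 5, 1, 6),
    "conv2": (5, 5, 6, 16),
    "fc1": (400, 120),
    "fc2": (120, 84),
    "out": (84, 10),
}

# Parameters per layer = number of weights + one bias per output unit.
per_layer = {name: prod(shape) + shape[-1] for name, shape in weight_shapes.items()}
total = sum(per_layer.values())

for name, count in per_layer.items():
    print(name, count)
print("total", total)
```

Almost all parameters sit in the first fully connected layer, while the two convolutional layers together account for fewer than 3000.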

<b> Question 2.1.3. </b>  Define your model, its accuracy and the loss function according to the following parameters (you can look at Section 1 to see what is expected):

     Learning rate: 0.001
     Loss Function: Cross-entropy
     Optimizer: tf.train.GradientDescentOptimizer
     Number of epochs: 40
     Batch size: 128

In [26]:
# RESET THE GRAPH AND CLEAR LOGS
! rm -rf ./log_files
tf.reset_default_graph()


# PARAMETERS
learning_rate = 0.001
training_epochs = 40
batch_size = 128
loss_display_step = 1
acc_display_step = 10
logs_path = 'log_files/CNN'


# PLACEHOLDERS
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')


# MODEL DEFINITION
# Model structure
with tf.name_scope('model'):
    pred = LeNet5_Model(images=x)
    
# Loss
with tf.name_scope('loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))

# Optimizer
with tf.name_scope('optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [27]:
# EVALUATION FUNCTION
def evaluate(logits, labels):
    
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    return tf.reduce_mean(tf.cast(correct, tf.float32))

# Accuracy
with tf.name_scope('accuracy'):
    acc = evaluate(pred, y)
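Feeding a whole data split in a single call can be memory hungry. One possible alternative, sketched here in NumPy only, is to count correct predictions batch by batch. The names `batched_accuracy`, `predict_fn` and `fake_predict` are hypothetical; in this notebook `predict_fn` could be something like `lambda b: sess.run(pred, feed_dict={x: b})`.

```python
import numpy as np

def batched_accuracy(predict_fn, images, onehot_labels, batch_size=1000):
    # Accumulate the number of correct predictions over consecutive batches.
    correct = 0
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        predicted = np.argmax(predict_fn(batch), axis=1)
        truth = np.argmax(onehot_labels[start:start + batch_size], axis=1)
        correct += int(np.sum(predicted == truth))
    return correct / len(images)

# Stand-in "model" (assumption): predicts class (value mod 3) as a one-hot vector.
def fake_predict(batch):
    return np.eye(3)[batch[:, 0] % 3]

toy_images = np.array([0, 1, 2, 0, 1, 2, 0]).reshape(-1, 1)
toy_labels = np.eye(3)[[0, 1, 2, 0, 1, 1, 0]]  # the sixth label disagrees with the model

toy_acc = batched_accuracy(fake_predict, toy_images, toy_labels, batch_size=3)
print(toy_acc)
```

Counting correct predictions, rather than averaging per-batch accuracies, keeps the result exact when the last batch is smaller than the others.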

<b> Question 2.1.5. </b>  Implement training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini batch and the training/validation accuracy per epoch. (Display results every 100 epochs)
- Save the model after training
- Print after training the final testing accuracy 
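The first requirement, shuffling before each epoch, can be sketched independently of TensorFlow. The generator below is a minimal illustration, assuming the data fits in memory as NumPy arrays; `iterate_minibatches` and the toy arrays are hypothetical names. Like `int(num_examples/batch_size)` in the training loop, it drops the last incomplete batch.

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size, rng):
    # Draw a fresh permutation at every call, i.e. once per epoch.
    order = rng.permutation(len(images))
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
toy_x = np.arange(10).reshape(10, 1)
toy_y = np.arange(10)

epochs_seen = []
for epoch in range(2):
    seen = []
    for batch_x, batch_y in iterate_minibatches(toy_x, toy_y, batch_size=4, rng=rng):
        seen.extend(batch_y.tolist())
    epochs_seen.append(seen)
    print("epoch", epoch + 1, seen)
```

Images and labels are indexed with the same permutation, so each sample stays paired with its label.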

In [29]:
# TRAIN FUNCTION
def train(init, sess, logs_path, n_epochs, batch_size, optimizer, cost, merged_summary_op):
    
    # Start
    print("[OPTIMIZATION STARTED]\n")
    startTime = time()
    sess.run(init)
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    
    # Training
    for epoch in range(n_epochs):
        
        # Init
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        
        # Loop over all batches
        for i in range(total_batch):
            
            # Next batch (shuffle=True lets the DataSet reshuffle at every epoch boundary)
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=True)
            batch_xs = batch_xs.reshape(-1, 28, 28, 1)
            
            # Run optimization
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            
            # Write logs
            summary_writer.add_summary(summary, epoch * total_batch + i)
            
            # Compute average loss
            avg_cost += c / total_batch
        
        # Display logs
        if (epoch+1) % loss_display_step == 0:
            print("Epoch:", '%02d' % (epoch+1), " =====>  Loss =", "{:.4f}".format(avg_cost))
        
        if (epoch+1) % acc_display_step == 0:
            acc_train = acc.eval({x: X_train, y: y_train})
            acc_val = acc.eval({x: X_val, y: y_val})
            
            print("Train accuracy =", "{:.2f}%".format(acc_train*100),
                  " | ",
                  "Validation accuracy =", "{:.2f}%\n".format(acc_val*100))
            
    
    endTime = time()
    print("[OPTIMIZATION FINISHED]")
    print("Training time =", "{:.3f} seconds".format(endTime-startTime))
    summary_writer.flush()
    
    
    # --- ACCURACY ------------------------------------------------------ #
    acc_test = acc.eval({x: X_test, y: y_test})
    print("Test accuracy =", "{:.2f}%".format(acc_test*100))
    
    return endTime-startTime

<div class='label label-success'>LeNet architecture (SGD optimizer)</div>

In [30]:
# MAKE SUMMARIES
init = tf.global_variables_initializer()
tf.summary.scalar("loss-LeNet", cost)
tf.summary.scalar("accuracy-LeNet", acc)
merged_summary_op = tf.summary.merge_all()


# LAUNCH SESSION
with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

[OPTIMIZATION STARTED]

Epoch: 01  =====>  Loss = 2.2634
Epoch: 02  =====>  Loss = 2.1361
Epoch: 03  =====>  Loss = 1.8475
Epoch: 04  =====>  Loss = 1.2495
Epoch: 05  =====>  Loss = 0.7923
Epoch: 06  =====>  Loss = 0.6004
Epoch: 07  =====>  Loss = 0.5038
Epoch: 08  =====>  Loss = 0.4459
Epoch: 09  =====>  Loss = 0.4069
Epoch: 10  =====>  Loss = 0.3778
Train accuracy = 89.34%  |  Validation accuracy = 89.98%

Epoch: 11  =====>  Loss = 0.3554
Epoch: 12  =====>  Loss = 0.3368
Epoch: 13  =====>  Loss = 0.3211
Epoch: 14  =====>  Loss = 0.3067
Epoch: 15  =====>  Loss = 0.2949
Epoch: 16  =====>  Loss = 0.2832
Epoch: 17  =====>  Loss = 0.2737
Epoch: 18  =====>  Loss = 0.2649
Epoch: 19  =====>  Loss = 0.2561
Epoch: 20  =====>  Loss = 0.2486
Train accuracy = 92.66%  |  Validation accuracy = 93.32%

Epoch: 21  =====>  Loss = 0.2412
Epoch: 22  =====>  Loss = 0.2341
Epoch: 23  =====>  Loss = 0.2275
Epoch: 24  =====>  Loss = 0.2219
Epoch: 25  =====>  Loss = 0.2158
Epoch: 26  =====>  Loss = 0.2105
Ep

<b> Question 2.1.6 </b> : Use TensorBoard to visualise and save loss and accuracy curves. 

<div class='alert alert-success'>
This is the graph showing the general architecture of the network:<br>
<img src="images/LeNet.png" width="600px" align="center"><br>
<br>
Learning progress:<br>
<img src="images/cnn_board.png" width="600px" align="center"><br>
<br>

</div>

<b> Part 2 </b> : LeNet-5 Optimization

<div class='label label-success'>LeNet architecture (Adam optimizer)</div>

In [12]:
# RESET THE GRAPH AND CLEAR LOGS
! rm -rf ./log_files
tf.reset_default_graph()


# PARAMETERS
learning_rate = 0.001
training_epochs = 40
batch_size = 128
loss_display_step = 1
acc_display_step = 10
logs_path = 'log_files/'


# PLACEHOLDERS
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')


# MODEL DEFINITION
# Model structure
with tf.name_scope('model'):
    pred = LeNet5_Model(images=x)
    
# Loss
with tf.name_scope('loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))

# Optimizer
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

# Accuracy
with tf.name_scope('accuracy'):
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))
    

# MAKE SUMMARIES
init = tf.global_variables_initializer()
tf.summary.scalar("loss-LeNet", cost)
tf.summary.scalar("accuracy-LeNet", acc)
merged_summary_op = tf.summary.merge_all()


# LAUNCH SESSION
with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

[OPTIMIZATION STARTED]

Epoch: 01  =====>  Loss = 0.3601
Epoch: 02  =====>  Loss = 0.0956
Epoch: 03  =====>  Loss = 0.0645
Epoch: 04  =====>  Loss = 0.0490
Epoch: 05  =====>  Loss = 0.0401
Epoch: 06  =====>  Loss = 0.0331
Epoch: 07  =====>  Loss = 0.0277
Epoch: 08  =====>  Loss = 0.0233
Epoch: 09  =====>  Loss = 0.0204
Epoch: 10  =====>  Loss = 0.0194
Train accuracy = 99.44%  |  Validation accuracy = 98.78%

Epoch: 11  =====>  Loss = 0.0165
Epoch: 12  =====>  Loss = 0.0133
Epoch: 13  =====>  Loss = 0.0131
Epoch: 14  =====>  Loss = 0.0120
Epoch: 15  =====>  Loss = 0.0096
Epoch: 16  =====>  Loss = 0.0091
Epoch: 17  =====>  Loss = 0.0095
Epoch: 18  =====>  Loss = 0.0076
Epoch: 19  =====>  Loss = 0.0074
Epoch: 20  =====>  Loss = 0.0068
Train accuracy = 99.87%  |  Validation accuracy = 99.04%

Epoch: 21  =====>  Loss = 0.0055
Epoch: 22  =====>  Loss = 0.0092
Epoch: 23  =====>  Loss = 0.0038
Epoch: 24  =====>  Loss = 0.0063
Epoch: 25  =====>  Loss = 0.0069
Epoch: 26  =====>  Loss = 0.0044
Ep

<b> Question 2.2.1 </b>

- Retrain your network with AdamOptimizer and then fill in the table below:


| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |       95.51%       |       98.98%        |       
| Training Time        |      909 sec       |      877 sec        |  

- Which optimizer gives the best accuracy on test data?

<div class='alert alert-success'>
The Adam optimizer gives a <b>better</b> test accuracy (98.98% vs. 95.51%). In our runs its training time was comparable to that of standard SGD (877 s vs. 909 s), and it classifies the <tt>MNIST</tt> dataset noticeably better.<br>
<img src="https://github.com/claudioscalzo/deep-learning/raw/f429d807dbd2b5171f8e9e0acc62fe0b8bdc1fbb/cnn-on-mnist/MNIST_figures/adam.png" width=300px>
<br>
The <b>Adam</b> optimizer combines improvements brought by other optimizers: it adds momentum to the RMSProp optimizer. Like RMSProp, it adapts the learning rate of each parameter using a running average of the squared gradients (the second moment, an uncentered variance). In addition, it keeps a running average of the gradients themselves (the first moment, the mean), which plays the role of momentum.
</div>
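To make the two running averages concrete, here is a minimal NumPy sketch of a single Adam update, applied to a toy one-dimensional problem. It is an illustration, not the code of `tf.train.AdamOptimizer`: the function name `adam_step` and the toy objective are assumptions, and the default values of `beta1`, `beta2` and `eps` are the commonly used ones.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running average of the gradients (first moment, acts as momentum).
    m = beta1 * m + (1 - beta1) * grad
    # Running average of the squared gradients (second moment, as in RMSProp).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: both averages start at zero and are too small early on.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: momentum direction scaled by the gradient magnitude.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (assumption): f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 3001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)
```

On the very first step the update has magnitude close to `lr` whatever the scale of the gradient, which is one reason Adam is less sensitive to the learning rate than plain gradient descent.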

<b> Question 2.2.2</b> Try to add dropout (keep_prob = 0.75) before the first fully connected layer. You will use tf.nn.dropout for that purpose. What accuracy do you achieve on testing data?

**Accuracy achieved on testing data:** <b><u>98.99%</u></b> (with 0.75-dropout)

<div class='alert alert-success'>
The accuracy obtained with dropout is almost the same as that of the LeNet CNN without it, just slightly better (98.98% without dropout vs. 98.99% with dropout). In general, dropout helps the network generalize and avoid overfitting: performance on the training data is usually a bit worse, but the model overfits less, so performance on validation and test data can improve. Here, as expected, the training accuracy of the model with dropout is a bit lower (99.05% vs. 99.87% after 20 epochs).<br>
The computational time is slightly smaller. We did not investigate the cause; it may simply be run-to-run variation, since <tt>tf.nn.dropout</tt> masks activations and does not skip the computation of dropped units.<br>
</div>
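The behaviour of dropout can be sketched in NumPy. As documented for `tf.nn.dropout`, kept units are scaled by `1 / keep_prob` so that the expected activation is unchanged. The `training` flag below is an assumption added for illustration (the function name `dropout` is hypothetical too): as far as we can tell from the code, the keep probability in `LeNet5_Model` is a Python constant, so dropout stays active when accuracy is evaluated. A common TensorFlow 1.x pattern is to feed `keep_prob` through a placeholder and set it to 1 at evaluation time.

```python
import numpy as np

def dropout(activations, keep_prob, training, rng):
    # At evaluation time (or with keep_prob = 1) the layer is the identity.
    if not training or keep_prob >= 1.0:
        return activations
    # Keep each unit with probability keep_prob and rescale the survivors.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 400))  # toy activations with the size of the flatten layer

train_out = dropout(acts, keep_prob=0.75, training=True, rng=rng)
eval_out = dropout(acts, keep_prob=0.75, training=False, rng=rng)

print("kept fraction:", np.mean(train_out > 0))
print("mean activation (train):", train_out.mean())
print("mean activation (eval):", eval_out.mean())
```

The mean activation stays close to 1 in both modes, which is the point of the rescaling.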

<div class='label label-success'>LeNet architecture with Dropout (Adam optimizer)</div>

In [32]:
# RESET THE GRAPH AND CLEAR LOGS
! rm -rf ./log_files
tf.reset_default_graph()


# PARAMETERS
learning_rate = 0.001
training_epochs = 40
batch_size = 128
loss_display_step = 1
acc_display_step = 10
logs_path = 'log_files/'


# PLACEHOLDERS
x = tf.placeholder(tf.float32, [None,28,28,1], name='input')
y = tf.placeholder(tf.float32, [None,10], name='output')


# MODEL DEFINITION
# Model structure (with dropout)
with tf.name_scope('model'):
    pred = LeNet5_Model(images=x, dropout_rate = 0.75)
    
# Loss
with tf.name_scope('loss'):
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))

# Optimizer
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

# Accuracy
with tf.name_scope('accuracy'):
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))
    

# MAKE SUMMARIES
init = tf.global_variables_initializer()
tf.summary.scalar("loss-LeNet", cost)
tf.summary.scalar("accuracy-LeNet", acc)
merged_summary_op = tf.summary.merge_all()


# LAUNCH SESSION
with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

[OPTIMIZATION STARTED]

Epoch: 01  =====>  Loss = 0.5821
Epoch: 02  =====>  Loss = 0.1632
Epoch: 03  =====>  Loss = 0.1159
Epoch: 04  =====>  Loss = 0.0986
Epoch: 05  =====>  Loss = 0.0811
Epoch: 06  =====>  Loss = 0.0733
Epoch: 07  =====>  Loss = 0.0660
Epoch: 08  =====>  Loss = 0.0626
Epoch: 09  =====>  Loss = 0.0579
Epoch: 10  =====>  Loss = 0.0514
Train accuracy = 98.62%  |  Validation accuracy = 98.44%

Epoch: 11  =====>  Loss = 0.0490
Epoch: 12  =====>  Loss = 0.0476
Epoch: 13  =====>  Loss = 0.0460
Epoch: 14  =====>  Loss = 0.0416
Epoch: 15  =====>  Loss = 0.0394
Epoch: 16  =====>  Loss = 0.0392
Epoch: 17  =====>  Loss = 0.0379
Epoch: 18  =====>  Loss = 0.0353
Epoch: 19  =====>  Loss = 0.0338
Epoch: 20  =====>  Loss = 0.0327
Train accuracy = 99.05%  |  Validation accuracy = 98.44%

Epoch: 21  =====>  Loss = 0.0319
Epoch: 22  =====>  Loss = 0.0300
Epoch: 23  =====>  Loss = 0.0286
Epoch: 24  =====>  Loss = 0.0272
Epoch: 25  =====>  Loss = 0.0298
Epoch: 26  =====>  Loss = 0.0287
Ep