<h1 style="text-align:center">Deep Learning</h1>
<h1 style="text-align:center">Lab Session 2 - 1.5 Hours</h1>
<h1 style="text-align:center">Convolutional Neural Network (CNN) for Handwritten Digit Recognition</h1>

<b> Student 1:</b> Benigmim Mohammed Yasser
<b> Student 2:</b> Lopez Colombe

The aim of this session is to practice with Convolutional Neural Networks. Each group should fill in and run the appropriate notebook cells. 

Follow instructions step by step until the end and submit your complete notebook as an archive (tar -cf groupXnotebook.tar DL_lab2/).

Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by May 29th 2019 (23:59:59 CET).

# Introduction

In the last lab session, you built a Multilayer Perceptron for recognizing handwritten digits from the MNIST dataset. The best accuracy achieved on the test data was about 97%. Can you do better using a deep CNN?
In this lab session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks, **LeNet-5**, to reach more than 99% accuracy. 






# Load MNIST Data in TensorFlow
Run the cell below to load the MNIST data that comes with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [2]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels
print("Image Shape: {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))

epsilon = 1e-10 # this is a parameter you will use later

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Image Shape: (784,)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : My First Model in TensorFlow

Before starting with CNNs, let's train and test in TensorFlow the example
**y=softmax(Wx+b)** seen in the first lab. 

This model reaches an accuracy of about 92%.
You will also learn how to launch TensorBoard (https://www.tensorflow.org/get_started/summaries_and_tensorboard) to visualize the computation graph, statistics and learning curves. 
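
As a rough illustration of what **y=softmax(Wx+b)** and the clipped cross-entropy loss compute, here is a minimal sketch in plain Python. It does not use TensorFlow. The 3-class scores are made-up values, and `EPSILON` is assumed to play the same role as the `epsilon` defined in the loading cell.

```python
import math

EPSILON = 1e-10  # assumed to mirror `epsilon` from the loading cell

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs):
    """Cross-entropy of one sample, with probabilities clipped to [EPSILON, 1]."""
    clipped = [min(max(p, EPSILON), 1.0) for p in probs]
    return -sum(t * math.log(p) for t, p in zip(one_hot, clipped))

# Toy example with 3 classes instead of 10 (hypothetical scores).
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1, 0, 0], probs)
print("probabilities:", probs)
print("loss:", loss)
```

Because of the clipping, the loss of a single sample cannot exceed -log(1e-10), which is about 23.03.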

<b> Part 1 </b> : Read carefully the code in the cell below. Run it to perform training. 

In [3]:
#STEP 1

# Parameters
learning_rate = 0.01
training_epochs = 40
batch_size = 128
display_step = 1
logs_path = 'log_files/'  # useful for tensorboard

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')

# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

#STEP 2 

# Launch the graph for training
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")
    summary_writer.flush()

    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch:  01   =====> Loss= 1.288720207
Epoch:  02   =====> Loss= 0.732706248
Epoch:  03   =====> Loss= 0.600367205
Epoch:  04   =====> Loss= 0.536793501
Epoch:  05   =====> Loss= 0.497855682
Epoch:  06   =====> Loss= 0.471093028
Epoch:  07   =====> Loss= 0.450922387
Epoch:  08   =====> Loss= 0.435924568
Epoch:  09   =====> Loss= 0.423351202
Epoch:  10   =====> Loss= 0.413254377
Epoch:  11   =====> Loss= 0.404583712
Epoch:  12   =====> Loss= 0.396899246
Epoch:  13   =====> Loss= 0.390303331
Epoch:  14   =====> Loss= 0.384377501
Epoch:  15   =====> Loss= 0.379212090
Epoch:  16   =====> Loss= 0.374694992
Epoch:  17   =====> Loss= 0.370271073
Epoch:  18   =====> Loss= 0.366335135
Epoch:  19   =====> Loss= 0.362733373
Epoch:  20   =====> Loss= 0.359501061
Epoch:  21   =====> Loss= 0.356150374
Epoch:  22   =====> Loss= 0.354020533
Epoch:  23   =====> Loss= 0.351282963
Epoch:  24   =====> Loss= 0.348624653
Epoch:  25   =====> Loss= 0.346463552
Epoch:  26   =====> Loss= 0.344494880
Epoch:  27  

<b> Part 2  </b>: Using TensorBoard, we can now visualize the created graph, which gives you an overview of your architecture and of how its major components are connected. You can also view and analyse the learning curves. 

To launch TensorBoard: 
- Open a Terminal and run the command line **"tensorboard --logdir=lab_2/log_files/"**
- Click on "Tensorboard web interface" in Zoe  


Enjoy It !! 


# Section 2 : The 99% MNIST Challenge !

<b> Part 1 </b> : LeNet5 implementation

You are now familiar with **TensorFlow** and **TensorBoard**. In this section, you will build, train and test the baseline [LeNet-5](http://yann.lecun.com/exdb/lenet/) model for the MNIST digit recognition problem.

Then, you will make some optimizations to get more than 99% accuracy.

For more information, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


<img src="lenet.png" width="800" height="600" align="center">





The LeNet architecture takes a 28x28xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.

--------------------------
**Layer 1 - Convolution (5x5):** The output shape should be 28x28x6. **Activation:** ReLU. **MaxPooling:** The output shape should be 14x14x6.

**Layer 2 - Convolution (5x5):** The output shape should be 10x10x16. **Activation:** ReLU. **MaxPooling:** The output shape should be 5x5x16.

**Flatten:** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use tf.reshape.

**Layer 3 - Fully Connected:** This should have 120 outputs. **Activation:** ReLU.

**Layer 4 - Fully Connected:** This should have 84 outputs. **Activation:** ReLU.

**Layer 5 - Fully Connected:** This should have 10 outputs. **Activation:** softmax.
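
The output shapes listed above can be checked with the usual size rules for "SAME" and "VALID" padding. The sketch below is plain Python; `output_size` is a hypothetical helper written for this check, not a TensorFlow function.

```python
import math

def output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling window."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

c1 = output_size(28, kernel=5, stride=1, padding="SAME")   # Layer 1 convolution
p1 = output_size(c1, kernel=2, stride=2, padding="VALID")  # Layer 1 max pooling
c2 = output_size(p1, kernel=5, stride=1, padding="VALID")  # Layer 2 convolution
p2 = output_size(c2, kernel=2, stride=2, padding="VALID")  # Layer 2 max pooling
flat = p2 * p2 * 16                                        # flattened feature vector

print(c1, p1, c2, p2, flat)
```

The first convolution needs "SAME" padding to keep the 28x28 size, while the second one needs "VALID" padding to go from 14x14 to 10x10.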


<b> Question 2.1.1 </b>  Implement the Neural Network architecture described above.
For that, you will use classes and functions from  https://www.tensorflow.org/api_docs/python/tf/nn. 

We give you some helper functions for weight and bias initialization. You can also refer to Section 1. 


In [4]:
# Functions for weight and bias initialization
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0., shape=shape)
  return tf.Variable(initial)

In [5]:
from tensorflow.contrib.layers import flatten
def create_conv(prev, filter_size, nb, padding):  # builds a convolution + ReLU + max-pooling block
    # Set model weights
    conv_W = weight_variable((filter_size, filter_size, int(prev.get_shape()[-1]), nb))
    conv_b = bias_variable([nb])
    # Convolution layer
    conv   = tf.nn.conv2d(prev, conv_W, strides=[1, 1, 1, 1], padding=padding) + conv_b
    # Activation: ReLU
    conv = tf.nn.relu(conv)
    # Pooling: the padding mode of the convolution is reused here. "VALID" is the more usual choice,
    # but both modes give the same output size for a 2x2 window with stride 2 on even-sized inputs (28 and 10)
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding=padding)
    return conv


def LeNet5_Model(image):    
    # your implementation goes here
    conv = create_conv(image,5,6,"SAME") #zero padding
    conv = create_conv(conv,5,16,"VALID") #without padding
    flat=flatten(conv)

    # First fully connected layer
    fc1_W = tf.Variable(tf.truncated_normal(shape=(int(flat.get_shape()[1]), 120)))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1   = tf.matmul(flat, fc1_W) + fc1_b
    fc1 = tf.nn.relu(fc1)

    #Second fully connected layer
    fc2_W = tf.Variable(tf.truncated_normal(shape=(int(fc1.get_shape()[1]), 84)))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2   = tf.matmul(fc1, fc2_W) + fc2_b
    fc2 = tf.nn.relu(fc2)
    
    #Third fully connected layer
    fc3_W = tf.Variable(tf.truncated_normal(shape=(int(fc2.get_shape()[1]), 10)))
    fc3_b = tf.Variable(tf.zeros(10))
    fc3   = tf.matmul(fc2, fc3_W) + fc3_b
    logits = tf.nn.softmax(fc3)  # note: despite the name, these are softmax probabilities, not raw logits
    return logits

In [6]:
# tf Graph Input: MNIST images reshaped to 28x28x1
x = tf.placeholder(tf.float32, [None, 28,28,1], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

In [7]:
LeNet5_Model(x)

<tf.Tensor 'Softmax:0' shape=(?, 10) dtype=float32>

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

In [8]:
for v in tf.trainable_variables():
    print(v)

<tf.Variable 'Weights:0' shape=(784, 10) dtype=float32_ref>
<tf.Variable 'Bias:0' shape=(10,) dtype=float32_ref>
<tf.Variable 'Variable:0' shape=(5, 5, 1, 6) dtype=float32_ref>
<tf.Variable 'Variable_1:0' shape=(6,) dtype=float32_ref>
<tf.Variable 'Variable_2:0' shape=(5, 5, 6, 16) dtype=float32_ref>
<tf.Variable 'Variable_3:0' shape=(16,) dtype=float32_ref>
<tf.Variable 'Variable_4:0' shape=(400, 120) dtype=float32_ref>
<tf.Variable 'Variable_5:0' shape=(120,) dtype=float32_ref>
<tf.Variable 'Variable_6:0' shape=(120, 84) dtype=float32_ref>
<tf.Variable 'Variable_7:0' shape=(84,) dtype=float32_ref>
<tf.Variable 'Variable_8:0' shape=(84, 10) dtype=float32_ref>
<tf.Variable 'Variable_9:0' shape=(10,) dtype=float32_ref>


In [9]:
for v in tf.trainable_variables():
    print(np.prod(v.get_shape().as_list()))

7840
10
150
6
2400
16
48000
120
10080
84
840
10


In [10]:
np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])

69556

<h3>Answer</h3>
<h4>Convolutional layers</h4>
<li>The first convolution layer has 6 filters of size 5x5 (the depth of the output equals the number of filters). Its input has depth 1, since the grayscale image has a single channel. Thus,<br>
    -the number of weights is 5x5x1x6=150 <br>
    -the number of biases is 6<br>
    =>Therefore, this layer has 150+6 = <b>156</b> parameters.</li>
The pooling layer has no parameters.
<li>The second convolution layer has 16 filters of size 5x5. The depth of its input is the depth of the previous layer's output, which is 6. Thus,<br>
    -the number of weights is 5x5x6x16=2400 <br>
    -the number of biases is 16.<br>
    Therefore, this layer has 2400+16 = <b>2416</b> parameters.</li>
As before, the pooling layer has no parameters by definition.
<h4>Fully connected layers</h4>
The number of parameters of a fully connected layer is (number_of_inputs+1)xnumber_of_outputs, where the +1 accounts for the bias.
<li>After the flatten step, the first fully connected layer receives 5x5x16 = 400 inputs and has 120 outputs. Thus, this layer has (5x5x16+1)x120 = <b>48120</b> parameters.</li>
<li>The second fully connected layer has 120 inputs and 84 outputs. Thus, this layer has (120+1)x84 = <b>10164</b> parameters.</li>
<li>The last fully connected layer has 84 inputs and 10 outputs. Thus, it has (84+1)x10 = <b>850</b> parameters.</li><BR>
<li><b>Finally, summing the parameters of all the layers gives a total of 156 + 2416 + 48120 + 10164 + 850 = 61 706 parameters for this model.</b></li>

In [11]:
a = 5*5*6+6
b =  5*5*16*6+16
c = (16 *5*5+1) *120
d = (120+1)*84
e = (84+1)*10
total = a +b +c +d +e
print(total)

61706


<h4>NB</h4>
The first two variables, "Weights:0" and "Bias:0", come from the model implemented in Section 1 (28x28x10=7840 weights and 10 biases).<br>
Thus, the number of parameters of our model is 69556-7840-10=61 706.

<b> Question 2.1.3. </b>  Define your model, its accuracy and the loss function according to the following parameters (you can look at Section 1 to see what is expected):

     Learning rate: 0.001
     Loss Function: Cross-entropy
     Optimizer: tf.train.GradientDescentOptimizer
     Number of epochs: 40
     Batch size: 128

In [12]:
tf.reset_default_graph() # reset the default graph before defining a new model

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'

# tf Graph Input: MNIST images reshaped to 28x28x1
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name="y")

# Model, loss function and accuracy
# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = LeNet5_Model(x) # The LeNet architecture we implemented previously
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [13]:
def evaluate(logits, labels):
    # logits: outputs of the model; labels: one-hot vectors of the actual classes
    # In this implementation both are TensorFlow tensors rather than NumPy arrays
    # Returns an op that computes the accuracy of the model when evaluated in a session
    predicted_number = tf.argmax(logits, axis=1)
    correct_prediction = tf.equal(predicted_number, tf.argmax(labels, axis=1))    
    accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return accuracy_operation

with tf.name_scope('Accuracy'):
    acc=evaluate(pred,y)
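
For intuition, the same accuracy computation can be sketched on plain Python lists. `argmax` and `accuracy` are illustrative helpers that are not part of the notebook, and the predictions are made-up values.

```python
def argmax(values):
    """Index of the largest entry of a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, one_hot_labels):
    """Fraction of samples whose predicted class matches the labelled class."""
    correct = sum(argmax(p) == argmax(t)
                  for p, t in zip(predictions, one_hot_labels))
    return correct / len(predictions)

# Three hypothetical samples with two classes each.
predictions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [[0, 1], [1, 0], [1, 0]]
print(accuracy(predictions, labels))  # two of the three predictions are correct
```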

<b> Question 2.1.5. </b>  Implement the training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini-batch and the training/validation accuracy per epoch. (Display results every 100 mini-batches)
- Save the model after training
- After training, print the final test accuracy 
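
The "shuffle before each epoch, then loop over mini-batches" pattern from the list above can be sketched as follows. This is a generic illustration in plain Python, not the behaviour of `mnist.train.next_batch`; `iterate_minibatches` is a hypothetical helper and the data are stand-ins.

```python
import random

def iterate_minibatches(samples, labels, batch_size, rng):
    """Yield (samples, labels) mini-batches in a freshly shuffled order."""
    indices = list(range(len(samples)))
    rng.shuffle(indices)  # new random order on every call, i.e. every epoch
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [samples[i] for i in chosen], [labels[i] for i in chosen]

rng = random.Random(0)              # fixed seed so the sketch is reproducible
samples = list(range(10))           # stand-ins for 10 images
labels = [s % 2 for s in samples]   # stand-ins for their labels

for epoch in range(2):
    batches = list(iterate_minibatches(samples, labels, batch_size=4, rng=rng))
    print("epoch", epoch + 1, [xs for xs, _ in batches])
```

Shuffling indices rather than the data keeps each sample paired with its label.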



In [14]:
import time 

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_SGD", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_SGD", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

display_step = 100 # display the loss every 100 mini-batches

#reshape the inputs
x_train=X_train.reshape(-1, 28, 28, 1)
x_validation=X_validation.reshape(-1, 28, 28, 1)

def train(init, sess, logs_path, n_epochs, batch_size, optimizer, cost, merged_summary_op):
    # optimizer and cost are the same kinds of objects as in Section 1
    # Train your model
    # Launch the graph for training
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    saver=tf.train.Saver()
    # Training cycle
    start = time.time()
    for epoch in range(n_epochs):  # use the n_epochs argument instead of the global training_epochs
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            # Ask next_batch to shuffle on the first mini-batch of each epoch
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            batch_xs=batch_xs.reshape(-1,28,28,1)
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            if i % display_step == 0:
                print("Epoch: ", '%02d' % (epoch+1), "mini batch: ", '%02d' %i, "  =====> Loss=", "{:.9f}".format(c))
            
            # Accumulate the average loss over the epoch
            avg_cost += c / total_batch
        # Display logs per epoch step
        print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost),
              ",   Train accuracy=", acc.eval({x: batch_xs, y:batch_ys}),
              ",   Validation accuracy=", acc.eval({x: x_validation, y: y_validation}))

    end = time.time()
    duration = end - start 
        
    print("Optimization Finished!")
    summary_writer.flush()
    # Save the model after training
    saver.save(sess, 'MNIST_figures/model'+str(optimizer)[7:10])
    
    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: X_test.reshape(-1,28,28,1), y: y_test}))
    #display time of the training phase
    print("Time:", duration, "s")

with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

Epoch:  01 mini batch:  00   =====> Loss= 21.635761261
Epoch:  01 mini batch:  100   =====> Loss= 2.294750929
Epoch:  01 mini batch:  200   =====> Loss= 2.208283424
Epoch:  01 mini batch:  300   =====> Loss= 2.072838306
Epoch:  01 mini batch:  400   =====> Loss= 2.091166496
Epoch:  01   =====> Loss= 4.913833133 ,   Train accuracy= 0.2578125 ,   Validation accuracy= 0.222
Epoch:  02 mini batch:  00   =====> Loss= 2.053969383
Epoch:  02 mini batch:  100   =====> Loss= 1.904825211
Epoch:  02 mini batch:  200   =====> Loss= 1.896517754
Epoch:  02 mini batch:  300   =====> Loss= 2.076156139
Epoch:  02 mini batch:  400   =====> Loss= 2.258590460
Epoch:  02   =====> Loss= 1.902920057 ,   Train accuracy= 0.4609375 ,   Validation accuracy= 0.455
Epoch:  03 mini batch:  00   =====> Loss= 1.689344168
Epoch:  03 mini batch:  100   =====> Loss= 1.500219822
Epoch:  03 mini batch:  200   =====> Loss= 1.910304308
Epoch:  03 mini batch:  300   =====> Loss= 1.576107502
Epoch:  03 mini batch:  400   ====

<b> Question 2.1.6 </b> : Use TensorBoard to visualise and save loss and accuracy curves. 
You will save figures in the folder **"lab_2/MNIST_figures"** and display them in your notebook.

Please put your loss and accuracy curves here.

![title](MNIST_figures/accSGD1.png)
![title](MNIST_figures/lossSGD1.png)

<b> Part 2 </b> : LeNet-5 Optimization


<b> Question 2.2.1 </b>

- Retrain your network with AdamOptimizer and then fill in the table below:


| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |         ...        |        ...          |       
| Training Time        |         ...        |        ...          |  

- Which optimizer gives the best accuracy on test data?

**Your answer:** ...


In [15]:
tf.reset_default_graph()
# your implementation goes here

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'

# tf Graph Input: MNIST images reshaped to 28x28x1
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name="y")

# Model, loss function and accuracy
# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = LeNet5_Model(x) # The LeNet architecture we implemented previously
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('Adam'):
    # Adam
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc=evaluate(pred,y)
    
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNet-5_Adam", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNet-5_Adam", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
###

display_step = 100

x_train=X_train.reshape(-1, 28, 28, 1)
x_validation=X_validation.reshape(-1, 28, 28, 1)

with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

Epoch:  01 mini batch:  00   =====> Loss= 19.967731476
Epoch:  01 mini batch:  100   =====> Loss= 18.776060104
Epoch:  01 mini batch:  200   =====> Loss= 18.318424225
Epoch:  01 mini batch:  300   =====> Loss= 17.988945007
Epoch:  01 mini batch:  400   =====> Loss= 17.629165649
Epoch:  01   =====> Loss= 19.039522204 ,   Train accuracy= 0.1640625 ,   Validation accuracy= 0.1906
Epoch:  02 mini batch:  00   =====> Loss= 19.248172760
Epoch:  02 mini batch:  100   =====> Loss= 18.168834686
Epoch:  02 mini batch:  200   =====> Loss= 16.551845551
Epoch:  02 mini batch:  300   =====> Loss= 16.190050125
Epoch:  02 mini batch:  400   =====> Loss= 17.629165649
Epoch:  02   =====> Loss= 18.498232764 ,   Train accuracy= 0.203125 ,   Validation accuracy= 0.19
Epoch:  03 mini batch:  00   =====> Loss= 19.453281403
Epoch:  03 mini batch:  100   =====> Loss= 18.168834686
Epoch:  03 mini batch:  200   =====> Loss= 16.010162354
Epoch:  03 mini batch:  300   =====> Loss= 17.449275970
Epoch:  03 mini batc

![title](MNIST_figures/accAdam1.png)
![title](MNIST_figures/lossAdam1.png)

We get:

| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |         91.07%     |        97.81%       |       
| Training Time        |     13 min 29 s    |     14 min 38 s     |  

- Which optimizer gives the best accuracy on test data?

**Your answer:** The Adam optimizer gives the best accuracy on the test data (97.81% versus 91.07% for gradient descent). On the curves, it exceeds 90% accuracy after about 7 minutes of training.
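
One plausible reason for the gap is that, at the same learning rate of 0.001, Adam rescales each step by a running estimate of the gradient magnitude, whereas plain gradient descent takes steps proportional to the gradient itself. The sketch below applies the standard Adam update rule to a one-dimensional toy loss. It is not TensorFlow's implementation; the toy loss and the helper names are assumptions made for illustration.

```python
import math

def sgd_step(w, grad, lr):
    """Plain gradient descent: the step is proportional to the gradient."""
    return w - lr * grad

def adam_step(w, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds m, v and t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # running mean of gradients
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["t"])               # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def gradient(w):
    """Gradient of the toy loss f(w) = 0.001 * w**2, a deliberately flat function."""
    return 0.002 * w

lr = 0.001  # same learning rate as in the notebook
w_sgd, w_adam = 1.0, 1.0
state = {"m": 0.0, "v": 0.0, "t": 0}
for _ in range(100):
    w_sgd = sgd_step(w_sgd, gradient(w_sgd), lr)
    w_adam = adam_step(w_adam, gradient(w_adam), state, lr)

print("SGD: ", w_sgd)   # barely moves away from 1.0
print("Adam:", w_adam)  # moves by roughly lr at each step
```

On this flat loss the gradient descent steps are tiny, while Adam advances by about one learning rate per step regardless of the gradient scale.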

<b> Question 2.2.2</b> Try to add dropout (keep_prob = 0.75) before the first fully connected layer. You will use tf.nn.dropout for that purpose. What accuracy do you achieve on testing data?

**Accuracy achieved on testing data:** ...
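
As a reminder of what dropout does, here is a minimal sketch of "inverted" dropout in plain Python: each value is kept with probability `keep_prob` and scaled by `1/keep_prob`, which is also how `tf.nn.dropout` rescales the kept values. The `training` flag is an assumption of this sketch. In `LeNet5_Model_Dropout` below, `keep_prob` is a fixed constant, so dropout stays active when the validation and test accuracies are evaluated; a switch like this one is a common way to avoid that.

```python
import random

def dropout(values, keep_prob, rng, training=True):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(values)  # at evaluation time, activations pass through unchanged
    # Keep each value with probability keep_prob and rescale it so that
    # the expected value of the output matches the input.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)       # fixed seed so the sketch is reproducible
activations = [1.0] * 10000  # stand-in for a flattened feature vector

dropped = dropout(activations, keep_prob=0.75, rng=rng)
print("fraction kept:", sum(1 for v in dropped if v != 0.0) / len(dropped))
print("mean after dropout:", sum(dropped) / len(dropped))
print("unchanged at test time:", dropout(activations[:3], 0.75, rng, training=False))
```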

In [16]:
def LeNet5_Model_Dropout(image):    
    # your implementation goes here
    #2 convolution layers
    conv = create_conv(image,5,6,"SAME")
    conv = create_conv(conv,5,16,"VALID")
    flat=flatten(conv)
    
    #add the dropout layer before the first fully connected layer
    flat = tf.nn.dropout(flat, keep_prob = 0.75)

    # First fully connected layer
    fc1_W = tf.Variable(tf.truncated_normal(shape=(int(flat.get_shape()[1]), 120)))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1   = tf.matmul(flat, fc1_W) + fc1_b
    fc1 = tf.nn.relu(fc1)

    # Second fully connected layer
    fc2_W = tf.Variable(tf.truncated_normal(shape=(int(fc1.get_shape()[1]), 84)))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2   = tf.matmul(fc1, fc2_W) + fc2_b
    fc2 = tf.nn.relu(fc2)
    
    # Third fully connected layer
    fc3_W = tf.Variable(tf.truncated_normal(shape=(int(fc2.get_shape()[1]), 10)))
    fc3_b = tf.Variable(tf.zeros(10))
    fc3   = tf.matmul(fc2, fc3_W) + fc3_b
    logits = tf.nn.softmax(fc3)
    return logits

In [17]:
tf.reset_default_graph()
# your implementation goes here

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'

# tf Graph Input: MNIST images reshaped to 28x28x1
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name="y")

# Model, loss function and accuracy
# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred =  LeNet5_Model_Dropout(x) # The LeNet architecture with dropout
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    
with tf.name_scope('Accuracy'):
    # Accuracy
    acc=evaluate(pred,y)
    
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNetDropout-5_SGD", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNetDropout-5_SGD", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
###

display_step = 100

x_train=X_train.reshape(-1, 28, 28, 1)
x_validation=X_validation.reshape(-1, 28, 28, 1)

with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

Epoch:  01 mini batch:  00   =====> Loss= 21.297061920
Epoch:  01 mini batch:  100   =====> Loss= 15.928880692
Epoch:  01 mini batch:  200   =====> Loss= 2.222682953
Epoch:  01 mini batch:  300   =====> Loss= 2.317413807
Epoch:  01 mini batch:  400   =====> Loss= 2.156584740
Epoch:  01   =====> Loss= 6.254903066 ,   Train accuracy= 0.109375 ,   Validation accuracy= 0.1256
Epoch:  02 mini batch:  00   =====> Loss= 2.376873255
Epoch:  02 mini batch:  100   =====> Loss= 2.108033657
Epoch:  02 mini batch:  200   =====> Loss= 2.124039650
Epoch:  02 mini batch:  300   =====> Loss= 1.997534513
Epoch:  02 mini batch:  400   =====> Loss= 1.894331932
Epoch:  02   =====> Loss= 2.008781370 ,   Train accuracy= 0.46875 ,   Validation accuracy= 0.4266
Epoch:  03 mini batch:  00   =====> Loss= 1.790112376
Epoch:  03 mini batch:  100   =====> Loss= 1.747947454
Epoch:  03 mini batch:  200   =====> Loss= 2.079729319
Epoch:  03 mini batch:  300   =====> Loss= 1.622712731
Epoch:  03 mini batch:  400   ====

![title](MNIST_figures/accSGD2.png)
![title](MNIST_figures/lossSGD2.png)

In [18]:
tf.reset_default_graph()
# your implementation goes here
# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'

# tf Graph Input: MNIST images reshaped to 28x28x1
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name="y")

# Model, loss function and accuracy
# Construct the model, encapsulating all ops into scopes to make TensorBoard's graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred =  LeNet5_Model_Dropout(x) # The LeNet architecture with dropout
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('Adam'):
    # Adam
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc=evaluate(pred,y)
    
# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss_LeNetDropout-5_Adam", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy_LeNetDropout-5_Adam", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
###

display_step = 100

x_train=X_train.reshape(-1, 28, 28, 1)
x_validation=X_validation.reshape(-1, 28, 28, 1)

with tf.Session() as sess:
    train(init, sess, logs_path, training_epochs, batch_size, optimizer, cost, merged_summary_op)

Epoch:  01 mini batch:  00   =====> Loss= 21.814531326
Epoch:  01 mini batch:  100   =====> Loss= 15.481681824
Epoch:  01 mini batch:  200   =====> Loss= 8.949636459
Epoch:  01 mini batch:  300   =====> Loss= 2.127032757
Epoch:  01 mini batch:  400   =====> Loss= 1.757178783
Epoch:  01   =====> Loss= 8.220158888 ,   Train accuracy= 0.4921875 ,   Validation accuracy= 0.5426
Epoch:  02 mini batch:  00   =====> Loss= 1.193530560
Epoch:  02 mini batch:  100   =====> Loss= 1.106262684
Epoch:  02 mini batch:  200   =====> Loss= 1.277941942
Epoch:  02 mini batch:  300   =====> Loss= 0.913125813
Epoch:  02 mini batch:  400   =====> Loss= 0.625115931
Epoch:  02   =====> Loss= 0.982291756 ,   Train accuracy= 0.71875 ,   Validation accuracy= 0.7886
Epoch:  03 mini batch:  00   =====> Loss= 0.596485198
Epoch:  03 mini batch:  100   =====> Loss= 0.488438070
Epoch:  03 mini batch:  200   =====> Loss= 0.784871817
Epoch:  03 mini batch:  300   =====> Loss= 0.344962239
Epoch:  03 mini batch:  400   ===

![title](MNIST_figures/accAdam2.png)
![title](MNIST_figures/lossAdam2.png)

**Accuracy achieved on testing data:** ...


| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |         72.28%     |        97.95%       |       
| Training Time        |     13 min 47 s    |     14 min 06 s     |  


Usually, dropout is used to avoid overfitting. Here it does not seem necessary, since our results are no better with it than without it.<br>
With dropout, the result for gradient descent is worse than without (72.28% versus 91.07%).<br>
The Adam optimizer still provides the better results, and they differ very little with and without the dropout layer (97.95% versus 97.81%).