<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 2 - 1.5 Hours </h1>
<h1 style="text-align:center"> Convolutional Neural Network (CNN) for Handwritten Digits Recognition</h1>

The aim of this session is to practice with Convolutional Neural Networks. Each group should fill and run appropriate notebook cells. 

Follow instructions step by step until the end and submit your complete notebook as an archive (tar -cf groupXnotebook.tar DL_lab2/).

Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by May 29th 2019 (23:59:59 CET).

# Introduction

In the last Lab Session, you built a Multilayer Perceptron for recognizing hand-written digits from the MNIST data-set. The best achieved accuracy on testing data was about 97%.  Can  you do better than these results using a deep CNN ?
In this Lab Session, you will build, train and optimize in TensorFlow one of the early Convolutional Neural Networks,  **LeNet-5**, to go to  more than 99% of accuracy. 






# Load MNIST Data in TensorFlow
Run the cell below to load the MNIST data that comes with TensorFlow. You will use this data in **Section 1** and **Section 2**.

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers import flatten
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
X_train, y_train           = mnist.train.images, mnist.train.labels
X_validation, y_validation = mnist.validation.images, mnist.validation.labels
X_test, y_test             = mnist.test.images, mnist.test.labels
print("Image Shape: {}".format(X_train[0].shape))
print("Training Set:   {} samples".format(len(X_train)))
print("Validation Set: {} samples".format(len(X_validation)))
print("Test Set:       {} samples".format(len(X_test)))

epsilon = 1e-10 # this is a parameter you will use later

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Image Shape: (784,)
Training Set:   55000 samples
Validation Set: 5000 samples
Test Set:       10000 samples


# Section 1 : My First Model in TensorFlow

Before starting with CNN, let's train and test in TensorFlow the example
**y=softmax(Wx+b)** seen in the first lab. 

This model reaches an accuracy of about 92 %.
You will also learn how to launch the TensorBoard https://www.tensorflow.org/get_started/summaries_and_tensorboard to visualize the computation graph, statistics and learning curves. 

<b> Part 1 </b> : Read carefully the code in the cell below. Run it to perform training. 

In [2]:
#STEP 1

# Parameters
learning_rate = 0.01
training_epochs = 40
batch_size = 128
display_step = 1
logs_path = 'log_files/'  # useful for tensorboard

# tf Graph Input:  mnist data image of shape 28*28=784
x = tf.placeholder(tf.float32, [None, 784], name='InputData')
# 0-9 digits recognition,  10 classes
y = tf.placeholder(tf.float32, [None, 10], name='LabelData')

# Set model weights
W = tf.Variable(tf.zeros([784, 10]), name='Weights')
b = tf.Variable(tf.zeros([10]), name='Bias')

# Construct model and encapsulating all ops into scopes, making Tensorboard's Graph visualization more convenient
with tf.name_scope('Model'):
    # Model
    pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax
with tf.name_scope('Loss'):
    # Minimize error using cross entropy
    # We use tf.clip_by_value to avoid having too low numbers in the log function
    cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred, epsilon, 1.0)), reduction_indices=1))
with tf.name_scope('SGD'):
    # Gradient Descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.name_scope('Accuracy'):
    # Accuracy
    acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(acc, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", acc)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()

#STEP 2 

# Launch the graph for training
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=(i==0))
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                     feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")
    summary_writer.flush()

    # Test model
    # Calculate accuracy
    print("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}))

Epoch:  01   =====> Loss= 1.289075496
Epoch:  02   =====> Loss= 0.732772976
Epoch:  03   =====> Loss= 0.600493635
Epoch:  04   =====> Loss= 0.536569137
Epoch:  05   =====> Loss= 0.497915017
Epoch:  06   =====> Loss= 0.471137127
Epoch:  07   =====> Loss= 0.451497282
Epoch:  08   =====> Loss= 0.435659541
Epoch:  09   =====> Loss= 0.423468368
Epoch:  10   =====> Loss= 0.413297223
Epoch:  11   =====> Loss= 0.404200772
Epoch:  12   =====> Loss= 0.396597400
Epoch:  13   =====> Loss= 0.390212909
Epoch:  14   =====> Loss= 0.384446583
Epoch:  15   =====> Loss= 0.379382839
Epoch:  16   =====> Loss= 0.374561733
Epoch:  17   =====> Loss= 0.370195697
Epoch:  18   =====> Loss= 0.366345226
Epoch:  19   =====> Loss= 0.362872024
Epoch:  20   =====> Loss= 0.359495350
Epoch:  21   =====> Loss= 0.356628603
Epoch:  22   =====> Loss= 0.353785908
Epoch:  23   =====> Loss= 0.351137313
Epoch:  24   =====> Loss= 0.348836233
Epoch:  25   =====> Loss= 0.346419177
Epoch:  26   =====> Loss= 0.344511320
Epoch:  27  

<b> Part 2  </b>: Using Tensorboard, we can  now visualize the created graph, giving you an overview of your architecture and how all of the major components  are connected. You can also see and analyse the learning curves. 

To launch tensorBoard: 
- Open a Terminal and run the command line **"tensorboard --logdir=lab_2/log_files/"**
- Click on "Tensorboard web interface" in Zoe  


Enjoy It !! 


# Section 2 : The 99% MNIST Challenge !

<b> Part 1 </b> : LeNet5 implementation

You are now familar with **TensorFlow** and **TensorBoard**. In this section, you are to build, train and test the baseline [LeNet-5](http://yann.lecun.com/exdb/lenet/)  model for the MNIST digits recognition problem.  

Then, you will make some optimizations to get more than 99% of accuracy.

For more informations, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html


<img src="lenet.png" width="800" height="600" align="center">





The LeNet architecture takes a 28x28xC image as input, where C is the number of color channels. Since MNIST images are grayscale, C is 1 in this case.

--------------------------
**Layer 1 - Convolution (5x5):** The output shape should be 28x28x6. **Activation:** ReLU. **MaxPooling:** The output shape should be 14x14x6.

**Layer 2 - Convolution (5x5):** The output shape should be 10x10x16. **Activation:** ReLU. **MaxPooling:** The output shape should be 5x5x16.

**Flatten:** Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.  You may need to use tf.reshape.

**Layer 3 - Fully Connected:** This should have 120 outputs. **Activation:** ReLU.

**Layer 4 - Fully Connected:** This should have 84 outputs. **Activation:** ReLU.

**Layer 5 - Fully Connected:** This should have 10 outputs. **Activation:** softmax.


<b> Question 2.1.1 </b>  Implement the Neural Network architecture described above.
For that, your will use classes and functions from  https://www.tensorflow.org/api_docs/python/tf/nn. 

We give you some helper functions for weigths and bias initilization. Also you can refer to section 1. 


In [3]:
# Functions for weigths and bias initilization 
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0., shape=shape)
  return tf.Variable(initial)


In [4]:
def LeNet5_Model(image, keep_probability=1., dense_keep_probability=1., batch_size=-1):    
    # Input Layer
    input_layer = tf.reshape(image, [batch_size, 28, 28, 1])

    # Convolutional layer #1
    conv1 = tf.nn.conv2d(
        input=input_layer,
        filter=weight_variable((5, 5, 1, 6)), # height, width, in_channels, out_channels
        strides=[1, 1, 1, 1],
        padding='SAME',
        name='conv1'
    )
    conv1 = tf.nn.relu(conv1 + bias_variable([1, 28, 28, 6]))
    mpool1 = tf.nn.max_pool(
        value=conv1, 
        ksize=[1, 2, 2, 1], 
        strides=[1, 2, 2, 1],
        padding='VALID', # No padding
        name='mpool1'
    )
    
    # Convolutional layer #2
    conv2 = tf.nn.conv2d(
        input=mpool1,
        filter=weight_variable((5, 5, 6, 16)), # height, width, in_channels, out_channels
        strides=[1, 1, 1, 1],
        padding='VALID', # No padding
        name='conv2'
    )
    conv2 = tf.nn.relu(conv2 + bias_variable([1, 10, 10, 16]))
    mpool2 = tf.nn.max_pool(
        value=conv2, 
        ksize=[1, 2, 2, 1], 
        strides=[1, 2, 2, 1],
        padding='VALID', # No padding
        name='mpool2'
    )
    
    # Flatten layer
    flat = flatten(mpool2)
    flat = tf.nn.dropout(
        x=flat,
        keep_prob=keep_probability
    )
    
    # Fully Connected with 120 neurons
    dense1 = tf.layers.dense(
        inputs=flat,
        units=120,
        activation=tf.nn.relu,
        use_bias=True,
        bias_initializer=tf.constant_initializer(value=1.0),
        name='dense1'
    )
    dense1 = tf.nn.dropout(
        x=dense1,
        keep_prob=dense_keep_probability,
        name='dense_dropout1'
    )
    
    # Fully Connected with 84 neurons
    dense2 = tf.layers.dense(
        inputs=dense1,
        units=84,
        activation=tf.nn.relu,
        use_bias=True,
        bias_initializer=tf.constant_initializer(value=1.0),
        name='dense2'
    )
    dense2 = tf.nn.dropout(
        x=dense2,
        keep_prob=dense_keep_probability,
        name='dense_dropout2'
    )
    
    # Fully Connected with 10 neurons
    dense3 = tf.layers.dense(
        inputs=dense2,
        units=10,
        activation=tf.nn.softmax,
        use_bias=True,
        bias_initializer=tf.constant_initializer(value=1.0),
        name='dense3'
    )
    
    
    return dense3

<b> Question 2.1.2. </b>  Calculate the number of parameters of this model 

In [5]:
# Parameters for the first convolutional layer
conv1 = 5*5*1 
# Parameters for the second convolutional layer
conv2 = 5*5*6  
# Parameters for the first dense layer
dense1 = 5*5*16*120 
# Parameters for the second dense layer
dense2 = 120*84  
# Parameters for the first output layer
dense3 = 84*10 
# All the biases
bias = 6 + 16 + 120 + 84 + 10 

total = bias + dense1 + dense2 + dense3 + conv2 + conv1
print('Number of parameters: ', total)

Number of parameters:  59331


<b> Question 2.1.3. </b>  Define your model, its accuracy and the loss function according to the following parameters (you can look at Section 1 to see what is expected):

     Learning rate: 0.001
     Loss Fucntion: Cross-entropy
     Optimizer: tf.train.GradientDescentOptimizer
     Number of epochs: 40
     Batch size: 128

In [6]:
tf.reset_default_graph() # reset the default graph before defining a new model

# Parameters
learning_rate = 0.001
training_epochs = 40
batch_size = 128
logs_path = 'log_files/'

<b> Question 2.1.4. </b>  Implement the evaluation function for accuracy computation 

In [7]:
def evaluate(logits, labels):
    # logits will be the outputs of your model, labels will be one-hot vectors corresponding to the actual labels
    # logits and labels are numpy arrays
    # this function should return the accuracy of your model
    equals = tf.equal(tf.argmax(logits, axis=1), tf.argmax(labels, axis=1))
    return tf.reduce_mean(tf.cast(equals, tf.float32))

<b> Question 2.1.5. </b>  Implement training pipeline and run the training data through it to train the model.

- Before each epoch, shuffle the training set. 
- Print the loss per mini batch and the training/validation accuracy per epoch. (Display results every 100 epochs)
- Save the model after training
- Print after training the final testing accuracy 



In [8]:
import time 

def train(learning_rate, training_epochs, batch_size, display_step = 1, \
          logs_path=logs_path, optFunction="SGD", keep_probability=1.0, dense_keep_probability=1.0, verbose=True, ):
    
    optFunctions = {"SGD":tf.train.GradientDescentOptimizer, "Adam":tf.train.AdamOptimizer}
    
    # Erase previous graph
    tf.reset_default_graph()

    x = tf.placeholder(tf.float32, [None, 28, 28, 1], name='InputData')
    y = tf.placeholder(tf.float32, [None, 10], name='LabelData')
    keep_prob = tf.placeholder(tf.float32)

    # Construct model
    with tf.name_scope('Model'):
        pred = LeNet5_Model(x, 
                    keep_probability=keep_probability, 
                    dense_keep_probability=dense_keep_probability)

    with tf.name_scope('Loss'):
        cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

    with tf.name_scope(optFunction):
        opt = optFunctions[optFunction](learning_rate)
        gvs = opt.compute_gradients(cost)
        capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
        optimizer = opt.apply_gradients(capped_gvs)

    # Evaluate model
    with tf.name_scope('Accuracy'):
        accuracy = evaluate(pred, y)

    # Initializing the variables
    init = tf.global_variables_initializer()

    # Create a summary to monitor cost tensor
    tf.summary.scalar("Loss", cost)
    # Create a summary to monitor accuracy tensor
    tf.summary.scalar("Accuracy", accuracy)
    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()

    x_val, y_val = mnist.validation.images.reshape(-1, 28, 28, 1), mnist.validation.labels
    x_test, y_test = mnist.test.images.reshape(-1, 28, 28, 1), mnist.test.labels

    with tf.Session() as sess:
        sess.run(init)
        # op to write logs to Tensorboard
        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        saver = tf.train.Saver()
        # Training cycle
        start_time = time.time()
        for epoch in range(training_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                # Shuffle input only in the first iteration of the batch
                batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle =(i == 0))
                batch_xs = batch_xs.reshape(-1, 28, 28, 1)
                # Run optimization op (backprop), cost op (to get loss value)
                # and summary nodes
                _, c, summary = sess.run([optimizer, cost, merged_summary_op],
                                         feed_dict={x: batch_xs, y: batch_ys, keep_prob: keep_probability})
                # Write logs at every iteration
                summary_writer.add_summary(summary, epoch * total_batch + i)
                # Compute average loss
                avg_cost += c / total_batch
            # Display logs per epoch step
            #train_acc = accuracy.eval({x: batch_xs, y:batch_ys})
            val_acc = accuracy.eval({x: x_val, y:y_val, keep_prob:keep_probability})
            test_acc = accuracy.eval({x: x_test, y:y_test, keep_prob:keep_probability})

            if verbose is True and (epoch+1) % display_step == 0:
                print("Epoch: ", '%02d' % (epoch+1), \
                      "  =====> Loss=", "{:.9f}".format(avg_cost), \
                      " Validation accuracy=", val_acc, " Test accuracy=", test_acc)
            if test_acc >= 0.99:
                if verbose is True:
                    print("Test Accuracy over 99%% reached after %d epochs" %(epoch+1))
                break
        end_time = time.time()
        elapsed_time = end_time - start_time 
        saver.save(sess, 'Models/model_' + str(learning_rate) + '_' + str(batch_size) + '_' + optFunction)
        if verbose is True:
            print("Training Finished!")
            # Test model and calculate accuracy
            print("Test accuracy:", accuracy.eval({x: x_test, y:y_test, keep_prob: keep_probability}))
            print("Elapsed time: ", elapsed_time, "sec")


In [9]:
train(learning_rate, training_epochs, batch_size, display_step=5)

Epoch:  05   =====> Loss= 0.604889131  Validation accuracy= 0.8694  Test accuracy= 0.8705
Epoch:  10   =====> Loss= 0.330030155  Validation accuracy= 0.918  Test accuracy= 0.9161
Epoch:  15   =====> Loss= 0.255773544  Validation accuracy= 0.935  Test accuracy= 0.9341
Epoch:  20   =====> Loss= 0.213853497  Validation accuracy= 0.9428  Test accuracy= 0.9433
Epoch:  25   =====> Loss= 0.184961832  Validation accuracy= 0.949  Test accuracy= 0.9502
Epoch:  30   =====> Loss= 0.163537280  Validation accuracy= 0.957  Test accuracy= 0.9562
Epoch:  35   =====> Loss= 0.146653618  Validation accuracy= 0.9604  Test accuracy= 0.9596
Epoch:  40   =====> Loss= 0.133313967  Validation accuracy= 0.966  Test accuracy= 0.9632
Training Finished!
Test accuracy: 0.9632
Elapsed time:  672.5316023826599 sec


<b> Question 2.1.6 </b> : Use TensorBoard to visualise and save loss and accuracy curves. 
You will save figures in the folder **"lab_2/MNIST_figures"** and display them in your notebook.

<img src="MNIST_figures/accuracy.png" width="800" height="600" align="center">

<img src="MNIST_figures/loss.png" width="800" height="600" align="center">

<b> Part 2 </b> : LeNET 5 Optimization


<b> Question 2.2.1 </b>

- Retrain your network with AdamOptimizer and then fill the table above:


| Optimizer            |  Gradient Descent  |    AdamOptimizer    |
|----------------------|--------------------|---------------------|
| Testing Accuracy     |        96,32%      |        99,09%       |       
| Training Time        |       11,20 mins   |       5,46 mins     |  

- Which optimizer gives the best accuracy on test data?

**Your answer:** The table above shows that AdamOptimizer gives the best result. It reaches 99% accuracy in 5,46 mins while using Gradient Descent we reached 96,32% in 11,20 mins.


In [10]:
tf.reset_default_graph()
# your implementation goes here
train(learning_rate, training_epochs, batch_size, optFunction="Adam")

Epoch:  01   =====> Loss= 0.310434482  Validation accuracy= 0.9774  Test accuracy= 0.9733
Epoch:  02   =====> Loss= 0.071436997  Validation accuracy= 0.9806  Test accuracy= 0.9776
Epoch:  03   =====> Loss= 0.049268089  Validation accuracy= 0.9854  Test accuracy= 0.983
Epoch:  04   =====> Loss= 0.038085153  Validation accuracy= 0.985  Test accuracy= 0.9848
Epoch:  05   =====> Loss= 0.030567726  Validation accuracy= 0.9838  Test accuracy= 0.985
Epoch:  06   =====> Loss= 0.024459270  Validation accuracy= 0.9864  Test accuracy= 0.9869
Epoch:  07   =====> Loss= 0.018488267  Validation accuracy= 0.985  Test accuracy= 0.9843
Epoch:  08   =====> Loss= 0.015350795  Validation accuracy= 0.9888  Test accuracy= 0.988
Epoch:  09   =====> Loss= 0.013342386  Validation accuracy= 0.988  Test accuracy= 0.9867
Epoch:  10   =====> Loss= 0.013449900  Validation accuracy= 0.9866  Test accuracy= 0.9872
Epoch:  11   =====> Loss= 0.011783514  Validation accuracy= 0.9848  Test accuracy= 0.983
Epoch:  12   ====

<b> Question 2.2.2</b> Try to add dropout (keep_prob = 0.75) before the first fully connected layer. You will use tf.nn.dropout for that purpose. What accuracy do you achieve on testing data?

**Accuracy achieved on testing data:** ...

In [11]:
tf.reset_default_graph()
# your implementation goes here
train(learning_rate, training_epochs, batch_size, optFunction="Adam", display_step=5, keep_probability= 0.75 )

Epoch:  05   =====> Loss= 0.053087217  Validation accuracy= 0.981  Test accuracy= 0.9795
Epoch:  10   =====> Loss= 0.032998348  Validation accuracy= 0.987  Test accuracy= 0.9856
Epoch:  15   =====> Loss= 0.021739657  Validation accuracy= 0.9864  Test accuracy= 0.9851
Epoch:  20   =====> Loss= 0.018219566  Validation accuracy= 0.9844  Test accuracy= 0.9851
Epoch:  25   =====> Loss= 0.014574020  Validation accuracy= 0.9878  Test accuracy= 0.9881
Epoch:  30   =====> Loss= 0.011885594  Validation accuracy= 0.9888  Test accuracy= 0.9892
Epoch:  35   =====> Loss= 0.009867116  Validation accuracy= 0.9864  Test accuracy= 0.988
Test Accuracy over 99% reached after 36 epochs
Training Finished!
Test accuracy: 0.9891
Elapsed time:  620.7855067253113 sec
