<a href="https://colab.research.google.com/github/claudiaqw/deep-learning/blob/main/deeplearning_mnist_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab assignment: classifying digits with Convolutional Networks

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/lenet.png" style="width:900px;">

In this assignment we come back to the problem of recognizing handwritten digits, this time using Convolutional Neural Networks. We will see how this architecture allows us to attain higher accuracy rates.

## Guidelines

Throughout this notebook you will find empty cells that you will need to fill with your own code. Follow the instructions in the notebook and pay special attention to the following symbols.

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
You will need to solve a question by writing your own code or answer in the cell immediately below or in a different file, as instructed.</font>

***

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/exclamation.png" height="80" width="80" style="float: right;"/>

***
<font color=#2655ad>
This is a hint or useful observation that can help you solve this assignment. You should pay attention to these hints to better understand the assignment.
</font>

***

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/pro.png" height="80" width="80" style="float: right;"/>

***
<font color=#259b4c>
This is an advanced exercise that can help you gain deeper knowledge of the topic. Good luck!</font>

***

To avoid missing packages and compatibility issues you should run this notebook under one of the [recommended Deep Learning environment files](https://github.com/albarji/teaching-environments-deeplearning), or make use of [Google Colaboratory](https://colab.research.google.com/). If you use Colaboratory make sure to [activate GPU support](https://colab.research.google.com/notebooks/gpu.ipynb).

Lastly, if you need any help on the usage of a Python function you can place the writing cursor over its name and press Shift+Tab to produce a pop-out with related documentation. This will only work inside code cells. 

Let's go!

## Data loading

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
Load and prepare the data as you did in the previous notebook. Make sure to normalize the pixel values, and encode the outputs as one-hot vectors. You <b>don't need to reshape the data</b> to 1-dimensional vectors; the Convolutional Network will take care of that.
</font>

***

In [None]:
####### INSERT YOUR CODE HERE
from keras.datasets import mnist
from keras.utils import to_categorical

# Load the MNIST digits: 60000 training and 10000 test images of 28 x 28 pixels
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255

# One-hot encode the labels: we have 10 classes, one per digit
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

In what follows this notebook assumes you have loaded your training images as **X_train_norm**, training labels as **Y_train**, test images as **X_test_norm** and test labels as **Y_test**.
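
As a reminder of what one-hot encoding does to the labels, here is a minimal NumPy sketch. The helper `one_hot` and the toy labels are made up for illustration; they only mimic the kind of output the Keras encoding step is expected to produce and are not part of the assignment.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical toy labels, standing in for a few entries of y_train
toy_labels = [5, 0, 4]
toy_encoded = one_hot(toy_labels, 10)

print(toy_encoded.shape)  # (3, 10)
print(toy_encoded[0])     # 1 at position 5, zeros elsewhere
```

Taking `argmax` along each row recovers the original digit, which is also how class predictions are read off the network outputs.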

## Keras imports

We will need the following Keras classes, which you already used in the previous notebook:

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

## Convolutional Neural Networks

To further improve on this image recognition problem we need network layers that treat the data as images and take the closeness of pixels into account when making decisions, instead of just throwing all pixel data into a fully connected network and expecting intelligence to emerge from chaos. **Convolutional** and **Pooling** layers are designed to do exactly that.

### Formatting the data as tensors

While for the perceptrons in the previous notebook we vectorized the data to fit into the perceptron framework, for convolutional networks we will need to shape the data in the form of a **4-dimensional tensor**. The dimensions of such a tensor represent the following:
* Image index (e.g. 3rd image in the dataset)
* Row index
* Column index
* Channel index (e.g. colour channel in colour images)

Our data currently has the following shape:

In [None]:
X_train_norm.shape

(60000, 28, 28)

So, once again we will need to make use of the reshape function to transform the data into the appropriate shape. We have 60000 images in our training set, and those images have 28 rows x 28 columns. Since these images are grayscale, the channel dimension only contains one channel:

In [None]:
traintensor = X_train_norm.reshape(60000, 28, 28, 1)
traintensor.shape

(60000, 28, 28, 1)

Now the data is correctly shaped.
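
If you prefer not to hard-code the number of images, NumPy offers a couple of equivalent ways to add the channel axis. The sketch below uses a small made-up array (`toy_images`) in place of the real data, so it can be run on its own.

```python
import numpy as np

# Hypothetical stand-in for X_train_norm: 5 blank grayscale images of 28 x 28 pixels
toy_images = np.zeros((5, 28, 28), dtype="float32")

# -1 asks NumPy to infer the number of images from the array size
toy_tensor = toy_images.reshape(-1, 28, 28, 1)

# Equivalent alternative: append a trailing channel axis
toy_tensor_alt = np.expand_dims(toy_images, axis=-1)

print(toy_tensor.shape)      # (5, 28, 28, 1)
print(toy_tensor_alt.shape)  # (5, 28, 28, 1)
```

Either form works unchanged for arrays with a different number of images.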

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
Repeat the transformation for the test data. Save the resulting tensor in a variable named <b>testtensor</b>.
</font>

***

In [None]:
####### INSERT YOUR CODE HERE
testtensor = X_test_norm.reshape(10000, 28, 28, 1)

### Convolution and pooling layers

When defining a convolutional network, Convolution and Pooling layers work together. The most popular way of using these layers is in the following pattern:
* A Convolution layer with rectified linear activations
* A Pooling layer

We can thus define a minimal convolutional network as

In [None]:
from keras.layers import Convolution2D, MaxPooling2D

img_rows = 28
img_cols = 28
kernel_size = 3 # Size of the kernel for the convolution layers
pool_size = 2 # Size of the pooling region for the pooling layers

convnet = Sequential()

convnet.add(Convolution2D(
    32, # Number of convolution channels to generate
    (kernel_size, kernel_size), # Size of convolution kernels
    padding='valid', # Strategy to deal with borders
    input_shape=(img_rows, img_cols, 1), # Size = image rows x image columns x channels
    activation="relu"  # Activation function after the convolution
)) 
convnet.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
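
To build some intuition for what these two layers compute, the following is a plain NumPy sketch of a single unpadded convolution channel followed by a ReLU and a 2 x 2 max pooling. It is a simplification under stated assumptions: one input channel, one kernel, no bias term, and random values for both the toy image and the kernel. All function and variable names are made up for illustration and are not how Keras implements these layers internally.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (what padding='valid' means)."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    out = np.zeros((out_rows, out_cols), dtype=image.dtype)
    for r in range(out_rows):
        for c in range(out_cols):
            # Weighted sum of the patch of pixels under the kernel
            out[r, c] = np.sum(image[r:r + k_rows, c:c + k_cols] * kernel)
    return out

def relu(x):
    """Rectified linear activation: negative values become 0."""
    return np.maximum(x, 0)

def max_pool(feature_map, size):
    """Keep the maximum of each non-overlapping size x size block."""
    rows = feature_map.shape[0] // size
    cols = feature_map.shape[1] // size
    trimmed = feature_map[:rows * size, :cols * size]
    return trimmed.reshape(rows, size, cols, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
toy_image = rng.random((28, 28)).astype("float32")           # made-up image
toy_kernel = rng.standard_normal((3, 3)).astype("float32")   # made-up kernel

feature_map = relu(conv2d_valid(toy_image, toy_kernel))
pooled = max_pool(feature_map, 2)

print(feature_map.shape)  # (26, 26)
print(pooled.shape)       # (13, 13)
```

The network defined above does the same thing with 32 learned kernels at once, which is why its outputs carry 32 channels.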

There is an issue, though: at some point we need to transform the tensor data into a vector, as the output of the network should be a vector of 10 values, representing class probabilities. We can do this by using a **Flatten** layer. Then we can add a standard Dense layer to produce the outputs:

In [None]:
from keras.layers import Flatten
convnet.add(Flatten())
convnet.add(Dense(10, activation="softmax"))
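
The two steps just added can be mimicked in NumPy to see what they do to a single image. This is only an illustrative sketch: `toy_features` and `toy_scores` are made-up values, and the `softmax` helper is written by hand here rather than taken from Keras.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical pooled output for one image: 13 x 13 positions, 32 channels
toy_features = np.ones((13, 13, 32), dtype="float32")

# Flatten: unroll the tensor into a single vector
flat = toy_features.reshape(-1)
print(flat.shape)  # (5408,)

# Softmax: made-up scores for 3 classes, converted to probabilities
toy_scores = np.array([2.0, 1.0, 0.1])
probs = softmax(toy_scores)
print(probs)
```

The largest score always receives the largest probability, so the predicted class is the position of the maximum.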

Let's take a look at the network we just defined:

In [None]:
convnet.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 5408)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                54090     
Total params: 54,410
Trainable params: 54,410
Non-trainable params: 0
_________________________________________________________________
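
The shapes and parameter counts in this summary can be checked by hand. The short sketch below redoes the arithmetic in plain Python; the helper names `conv_params` and `dense_params` are made up for this check and are not Keras functions.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of all kernels plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

conv_out = 28 - 3 + 1                # unpadded 3 x 3 convolution: 26
pool_out = conv_out // 2             # 2 x 2 max pooling: 13
flat_size = pool_out * pool_out * 32 # flattened vector length: 5408

n_conv = conv_params(3, 1, 32)       # 320
n_dense = dense_params(flat_size, 10)  # 54090
total = n_conv + n_dense

print(conv_out, pool_out, flat_size)  # 26 13 5408
print(n_conv, n_dense, total)         # 320 54090 54410
```

Note that almost all parameters sit in the final Dense layer, while the convolution itself is very cheap in terms of weights.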


<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
Compile the defined network, choosing "adam" as the optimization algorithm, and train it with the data. Use the reshaped tensor data you prepared above, not the original data. Also, use a batch size of 128 and 20 training epochs. Then measure the accuracy over the test data. Have the Convolution and MaxPooling helped?
</font>

***

In [None]:
####### INSERT YOUR CODE HERE
convnet.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
convnet.fit(
    traintensor, # Training data
    Y_train, # Labels of training data
    batch_size=128, # Batch size for the optimizer algorithm
    epochs=20, # Number of epochs to run the optimizer algorithm
    verbose=2 # Level of verbosity of the log messages
)
score = convnet.evaluate(testtensor, Y_test)
print("Test loss", score[0])
print("Test accuracy", score[1])

Epoch 1/20
469/469 - 2s - loss: 0.3522 - accuracy: 0.9041
Epoch 2/20
469/469 - 2s - loss: 0.1252 - accuracy: 0.9653
Epoch 3/20
469/469 - 2s - loss: 0.0854 - accuracy: 0.9763
Epoch 4/20
469/469 - 2s - loss: 0.0698 - accuracy: 0.9798
Epoch 5/20
469/469 - 2s - loss: 0.0599 - accuracy: 0.9833
Epoch 6/20
469/469 - 2s - loss: 0.0522 - accuracy: 0.9854
Epoch 7/20
469/469 - 2s - loss: 0.0475 - accuracy: 0.9867
Epoch 8/20
469/469 - 2s - loss: 0.0439 - accuracy: 0.9869
Epoch 9/20
469/469 - 3s - loss: 0.0398 - accuracy: 0.9883
Epoch 10/20
469/469 - 3s - loss: 0.0367 - accuracy: 0.9892
Epoch 11/20
469/469 - 3s - loss: 0.0332 - accuracy: 0.9902
Epoch 12/20
469/469 - 2s - loss: 0.0307 - accuracy: 0.9911
Epoch 13/20
469/469 - 2s - loss: 0.0288 - accuracy: 0.9916
Epoch 14/20
469/469 - 2s - loss: 0.0262 - accuracy: 0.9925
Epoch 15/20
469/469 - 2s - loss: 0.0236 - accuracy: 0.9932
Epoch 16/20
469/469 - 2s - loss: 0.0224 - accuracy: 0.9935
Epoch 17/20
469/469 - 2s - loss: 0.0203 - accuracy: 0.9944
Epoch 

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
Build and train a convolutional network with the following layers:
<ul>
     <li>A Convolution layer of 32 channels, kernel size 3 and rectified linear activation</li>
     <li>Another Convolution layer of 32 channels, kernel size 3 and rectified linear activation</li>
     <li>A MaxPooling layer of size 2</li>
     <li>A 25% Dropout</li>
     <li>A Flatten layer</li>
     <li>A Dense layer with 128 units and rectified linear activation</li>
     <li>A 50% Dropout</li>
     <li>An output Dense layer with softmax activation</li>
</ul>
Has the added complexity improved the accuracy results?    
</font>

***

In [None]:
####### INSERT YOUR CODE HERE
img_rows = 28
img_cols = 28
kernel_size = 3 # Size of the kernel for the convolution layers
pool_size = 2 # Size of the pooling region for the pooling layers

large_convnet = Sequential()

large_convnet.add(Convolution2D(32, # Number of convolution channels to generate
                        (kernel_size, kernel_size),
                        padding='valid',
                        input_shape=(img_rows, img_cols, 1),
                        activation="relu"))
large_convnet.add(Convolution2D(32, (kernel_size, kernel_size), activation="relu"))
large_convnet.add(MaxPooling2D(pool_size=(pool_size, pool_size)))
large_convnet.add(Dropout(0.25)) # 25% Dropout, as listed in the exercise
large_convnet.add(Flatten())
large_convnet.add(Dense(128, activation="relu"))
large_convnet.add(Dropout(0.5))
large_convnet.add(Dense(10, activation="softmax"))

large_convnet.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
large_convnet.fit(
    traintensor, # Training data
    Y_train, # Labels of training data
    batch_size=128, # Batch size for the optimizer algorithm
    epochs=20, # Number of epochs to run the optimizer algorithm
    verbose=1 # Level of verbosity of the log messages
)
score = large_convnet.evaluate(testtensor, Y_test)
print("Test loss", score[0])
print("Test accuracy", score[1])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Test loss 0.038704849779605865
Test accuracy 0.9912999868392944


## LeNet

<a href=http://yann.lecun.com/exdb/lenet/>LeNet</a> is a particular convolutional neural network definition that has proven to be quite effective for this problem. As a final exercise we will build a network similar to LeNet and try it on our digits problem.

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/question.png" height="80" width="80" style="float: right;"/>

***

<font color=#ad3e26>
Build and train the following network:
<ul>
     <li>A Convolution layer of 32 channels, kernel size 5 and rectified linear activation</li>
     <li>A MaxPooling layer of size 2</li>
     <li>A Convolution layer of 50 channels, kernel size 5 and rectified linear activation</li>
     <li>A MaxPooling layer of size 2</li>
     <li>A Flatten layer</li>
     <li>A Dense layer with 256 units and rectified linear activation</li>
     <li>A 50% Dropout</li>
     <li>An output Dense layer with softmax activation</li>
</ul>
Is this the best network so far for the problem?   
</font>

***
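
Before building this network it can help to trace how the image size shrinks through the layers. The sketch below assumes unpadded ("valid") convolutions and non-overlapping pooling, which the layer list does not state explicitly; the helper `trace_spatial_size` is made up for illustration.

```python
def trace_spatial_size(size, layers):
    """Follow the side length of a square feature map through conv/pool layers."""
    sizes = [size]
    for kind, k in layers:
        if kind == "conv":
            size = size - k + 1  # unpadded convolution with a k x k kernel
        elif kind == "pool":
            size = size // k     # non-overlapping k x k pooling
        else:
            raise ValueError(f"Unknown layer kind: {kind}")
        sizes.append(size)
    return sizes

# Layer list from the exercise: (kind, kernel or pool size)
lenet_like = [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)]

sizes = trace_spatial_size(28, lenet_like)
print(sizes)       # [28, 24, 12, 8, 4]

# The last convolution has 50 channels, so Flatten yields this many values
flat_units = sizes[-1] ** 2 * 50
print(flat_units)  # 800
```

Under these assumptions, the Dense layer with 256 units receives a vector of 800 values per image.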

In [None]:
####### INSERT YOUR CODE HERE
img_rows = 28
img_cols = 28

lenet = Sequential()

lenet.add(Convolution2D(
    32,
    (5, 5),
    padding='valid',
    input_shape=(img_rows, img_cols, 1),
    activation="relu"
))
lenet.add(MaxPooling2D(pool_size=2, strides=2))
lenet.add(Convolution2D(50, (5, 5), activation="relu"))
lenet.add(MaxPooling2D(pool_size=2, strides=2))
lenet.add(Flatten())
lenet.add(Dense(256, activation="relu"))
lenet.add(Dropout(0.5))
lenet.add(Dense(10, activation="softmax"))

lenet.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
lenet.fit(
    traintensor, # Training data
    Y_train, # Labels of training data
    batch_size=128, # Batch size for the optimizer algorithm
    epochs=20, # Number of epochs to run the optimizer algorithm
    verbose=1 # Level of verbosity of the log messages
)
score = lenet.evaluate(testtensor, Y_test)
print("Test loss", score[0])
print("Test accuracy", score[1])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Test loss 0.02574528194963932
Test accuracy 0.9932000041007996
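
Accuracies this close to 1 are easier to compare as error rates. The following sketch converts the two test accuracies reported in this notebook into test errors and approximate counts of misclassified digits, assuming the 10000 test images used above. The dictionary keys simply reuse the model variable names.

```python
test_images = 10000  # number of images in the test tensor

# Test accuracies copied from the outputs printed in this notebook
accuracies = {
    "large_convnet": 0.9912999868392944,
    "lenet": 0.9932000041007996,
}

errors = {}
for name, acc in accuracies.items():
    error_rate = 1.0 - acc
    errors[name] = round(error_rate * test_images)
    print(f"{name}: {error_rate:.2%} test error, about {errors[name]} misclassified digits")
```

Seen this way, the gap between the two networks amounts to roughly twenty test digits.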


## Bonus rounds

<img src="https://albarji-labs-materials.s3-eu-west-1.amazonaws.com/pro.png" height="80" width="80" style="float: right;"/>

***

<font color=#259b4c>
Rebuild the network above with a larger number of training epochs. What is the best test error you can achieve? 
</font>

***