# TP Programming with Keras - MNIST problem

In this session, we build a neural network to solve the MNIST classification problem, using the TensorFlow/Keras libraries. MNIST is a database of images of handwritten digits.

We proceed in three steps. First, we develop a simple fully-connected neural network, without any specific features. Then, we apply regularization procedures and use a validation set to improve generalization performance. Finally, we implement a Convolutional Neural Network.

In this practice session, some cells must be filled in according to the instructions. They are identified by the word **Exercise**. In most cases, you will perform the **Verifications** yourselves, by checking that the algorithm works correctly and converges.

Below we import the required libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras as keras

## Data definition

The following cell loads the MNIST data.

In [None]:
#DO NOT CHANGE

(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()

**Exercise**: Visualize some of the X_train images using plt.imshow.

In [None]:
#TO DO

**Verification**: Digits images should appear.

**Exercise**: Print the dimensions of the input and output data. Also print the first input example as a table, and the first five outputs.

In [None]:
#TO DO

### Formatting the input data

**Exercise**: The input data consist of arrays of integers between 0 and 255. Such relatively large values are hard for a neural network to handle: they can cause numerical instabilities when computing the gradients. To avoid these instabilities, we should use data whose order of magnitude is close to 1.
To do so, we simply normalize the data by dividing by 255 (the maximal pixel intensity). Complete the following cell to perform this normalization on X_train and X_test.
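A minimal sketch of this normalization, run on a synthetic uint8 array (the name X_demo is hypothetical and only stands in for X_train or X_test). Casting to float32 first is a choice made here to keep memory use lower than the float64 result NumPy would otherwise produce.

```python
import numpy as np

# Synthetic stand-in for MNIST images: uint8 values in [0, 255], shape (N, 28, 28)
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Divide by the maximal pixel intensity; float32 keeps the result smaller than float64
X_demo_norm = X_demo.astype("float32") / 255.0

print(X_demo_norm.dtype, X_demo_norm.min(), X_demo_norm.max())
```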

In [None]:
X_train = #TO DO

X_test = #TO DO

### Formatting the output data

**Exercise**: The output data are the digits shown on the images (integers between 0 and 9). To set up a classification problem, we encode each label as a one-hot vector: one binary entry (0 or 1) per class, equal to 1 only for the true class. For instance, the digit 5 is encoded by the vector [0,0,0,0,0,1,0,0,0,0].

To do so, you can use the Keras function "keras.utils.to_categorical", giving the vector to process and the number of classes (keyword "num_classes"). Apply this function to Y_train and Y_test.
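To see what this encoding computes, here is a NumPy sketch of one-hot encoding. The helper one_hot is hypothetical and is not the Keras function; for integer labels 0 to 9 it should produce the same 0/1 matrix that the exercise asks you to build with keras.utils.to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """NumPy one-hot encoding: row i has a 1 at column labels[i], 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    # Indexing the identity matrix with the labels selects the matching one-hot rows
    return np.eye(num_classes, dtype="float32")[labels]

y_demo = np.array([5, 0, 4, 1, 9])
Y_demo_cat = one_hot(y_demo, num_classes=10)
print(Y_demo_cat[0])  # encoding of the digit 5
```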

In [None]:
Y_train_cat = #TO DO

Y_test_cat = #TO DO

## Simple Keras model

### Model creation

**Exercise**: Create a Keras model with name "my_model".

**Specific instructions**: 
- The first layer must be a Flatten layer, with input_shape corresponding to the dimension of the input data. This layer flattens each example into a 1D vector, since a Multilayer Perceptron classically takes vectors as input. No argument other than input_shape is needed.
- We perform an exclusive classification: each image contains exactly one digit. The last layer must use a softmax activation function and the right number of neurons.
- Use 2 or 3 hidden layers of about a hundred neurons each, with ReLU activation functions. The rest of the architecture is up to you!
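The softmax output layer turns the raw scores of the last Dense layer into probabilities that are positive and sum to 1, which is why it suits exclusive classification. Below is a NumPy sketch of the function itself (not Keras's implementation); the scores are made up for a single example with 10 classes.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis: exponentiate, then normalize so each row sums to 1."""
    z = np.asarray(z, dtype=float)
    # Subtracting the row maximum does not change the result but avoids overflow in exp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical scores for one example; the largest score is for class 5
scores = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 3.0, 0.5, 0.2, -0.5, 1.5]])
p = softmax(scores)
print(p.round(3), p.sum())
```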

In [None]:
#TO DO

**Exercise**: Display your architecture by calling my_model.summary()
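The summary lists each layer with its output shape and its number of trainable parameters. As a sanity check, the count for a stack of Dense layers can be computed by hand: each layer has n_in × n_out weights plus n_out biases, and Flatten has none. The sketch below uses a hypothetical 784-100-100-10 architecture; your own layer sizes may differ.

```python
def dense_param_count(layer_sizes):
    """Parameters of a stack of Dense layers: weights (n_in * n_out) plus biases (n_out)."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hypothetical architecture: Flatten (28*28 = 784 inputs) -> 100 -> 100 -> 10 (softmax)
print(dense_param_count([28 * 28, 100, 100, 10]))  # → 89610
```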

In [None]:
#TO DO

### Model compilation

**Exercise**: Compile your model; the choice of optimizer is up to you.

**Specific instructions**:
- For the loss function, "categorical_crossentropy" is suited to exclusive classification problems.
- The relevant metric is "categorical_accuracy".
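What these two quantities measure can be sketched in NumPy on made-up values: the cross-entropy is minus the log of the probability given to the true class, averaged over examples, and the categorical accuracy is the fraction of examples whose most probable class is the true one. Keras computes them batch-wise and may differ in implementation details such as the clipping constant assumed here.

```python
import numpy as np

# One-hot targets and predicted probabilities for 3 hypothetical examples, 3 classes
y_true = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]])

# Categorical cross-entropy: -sum_k y_k * log(p_k), averaged over examples
eps = 1e-7  # assumed clipping constant, to avoid log(0)
cce = -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1))

# Categorical accuracy: fraction of examples whose argmax matches the true class
cat_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(cce, cat_acc)
```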

In [None]:
#TO DO

### Training

**Exercise**: Run the training. Make sure to use Y_train_cat as output data. Store the training history so that we can plot the evolution of the loss and the accuracy once training is complete.

In [None]:
#TO DO

**Verification**: The loss function should decrease and the accuracy should increase.

**Exercise**: Plot the evolution of the loss function, and the evolution of the accuracy.

In [None]:
#TO DO

### Predicting with your model

**Exercise**: Compute the network predictions on the test set.

In [None]:
#TO DO

**Exercise**: Extract the predicted labels, which correspond to the classes with the highest predicted probabilities. The function np.argmax (with the right "axis") will be useful.
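A small illustration of the axis choice, on hypothetical network outputs with one row per example and one column per class: argmax must be taken across the columns.

```python
import numpy as np

# Hypothetical network outputs: one row per example, one column per class (10 digits)
Y_pred_demo = np.array([
    [0.05, 0.05, 0.70, 0.02, 0.02, 0.02, 0.04, 0.04, 0.03, 0.03],
    [0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
])

# axis=1 takes the argmax across classes (columns) for each example (row);
# axis=0 would instead pick, for each class, the example with the highest score
labels_demo = np.argmax(Y_pred_demo, axis=1)
print(labels_demo)  # → [2 1]
```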

In [None]:
#TO DO

**Exercise**: Compute the accuracy on the test set.
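The accuracy is the fraction of examples whose predicted label equals the true label; the predicted labels should be compared with the integer labels Y_test, not with the one-hot Y_test_cat. A sketch on hypothetical labels:

```python
import numpy as np

# Hypothetical predicted and true labels for 5 test examples
pred_demo = np.array([7, 2, 1, 0, 4])
true_demo = np.array([7, 2, 1, 0, 9])

# Element-wise comparison gives booleans; their mean is the fraction of correct predictions
accuracy = np.mean(pred_demo == true_demo)
print(accuracy)  # → 0.8
```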

In [None]:
#TO DO

The following cell displays a randomly chosen test example, with the network prediction and the network output for each class.

In [None]:
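# Assumes Y_pred_test (predicted probabilities) and Y_test_pred_lab (predicted labels)
# were computed in the cells above
r = np.random.randint(X_test.shape[0])  # random test index

figure = plt.figure(figsize = (16,9))

# Left: the input image, with predicted and true labels in the title
ax1 = plt.subplot(121)
ax1.imshow(X_test[r,:,:],cmap = "hot")
plt.title("Network prediction: " + str(Y_test_pred_lab[r]) + "\nTrue value: " + str(Y_test[r]))

# Right: the network output (softmax probabilities) for each class
ax2 = plt.subplot(122)
ax2.bar(np.arange(10),height = Y_pred_test[r],tick_label = np.arange(10))
plt.xlabel("Class")
plt.ylabel("Network output")
plt.show()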


**Exercise**: Adapt the previous code to display random wrong predictions of the network. The function np.where should be useful.
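One way to use np.where here, sketched on hypothetical labels: collect the indices of misclassified examples, then draw the random index r from them instead of from the whole test set.

```python
import numpy as np

# Hypothetical predicted and true labels
pred_demo = np.array([7, 2, 1, 0, 4, 1])
true_demo = np.array([7, 2, 8, 0, 9, 1])

# np.where on a boolean condition returns a tuple of index arrays; [0] takes the 1D indices
wrong_idx = np.where(pred_demo != true_demo)[0]
print(wrong_idx)  # → [2 4]

# Pick one misclassified example at random, to use as r in the display cell
r = np.random.choice(wrong_idx)
```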

In [None]:
#TO DO

## Keras model with regularization

We now add more advanced features: regularization, batch normalization, a validation set and early stopping.

### Model creation with regularizers

**Exercise**: Create a Keras model with name "my_model".

**Specific instructions**: 
- The first layer must be a Flatten layer, with the right input_shape.
- Pay attention to the last layer (number of neurons, activation function).
- Use a batch normalization layer between consecutive Dense layers: keras.layers.BatchNormalization. No argument is needed. Do not use it as the output layer.
- Use a Dropout layer after each batch normalization layer: keras.layers.Dropout. You must give the dropout rate (between 0 and 1); a rate of about 0.1 is a good start. Do not use this type of layer as the last layer.
- Add a regularization to each Dense layer: use the keyword "kernel_regularizer" with the value keras.regularizers.l2(1e-3), where 1e-3 is the regularization parameter.
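The L2 kernel regularizer adds to the training loss the regularization parameter times the sum of the squared kernel weights of the layer (biases are untouched when only kernel_regularizer is set), which pushes the weights toward small values. A NumPy sketch of that extra term for one hypothetical weight matrix:

```python
import numpy as np

def l2_penalty(weights, lam=1e-3):
    """Term added to the loss by an L2 kernel regularizer: lam * sum of squared weights."""
    return lam * np.sum(np.square(weights))

W = np.array([[1.0, -2.0], [0.5, 0.0]])  # hypothetical kernel of a Dense layer
print(l2_penalty(W))  # 1e-3 * (1 + 4 + 0.25 + 0) ≈ 0.00525
```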

In [None]:
#TO DO

**Exercise**: Display your architecture by calling my_model.summary()

In [None]:
#TO DO

### Model compilation

**Exercise**: Compile your model and choose an optimizer. Use a suitable loss function and metric.

In [None]:
#TO DO

### Early stopping

**Exercise**: We use a validation set during training (you will define it later in "my_model.fit"), together with an early stopping procedure based on this validation set: training stops when the validation loss has not decreased for a predefined number of consecutive epochs.

To do so:
- Define a variable "early_stopping" by using keras.callbacks.EarlyStopping(...)
- You must indicate which quantity is monitored for this early stopping. We use the validation loss: indicate the keyword "monitor" and use the string "val_loss".
- You must indicate how many consecutive epochs without improvement of the validation loss are tolerated before training stops. Use the keyword "patience" and give this number of epochs. In our case, about a dozen epochs should be enough.
- Finally, you must indicate that the model should restore the weights that gave the best validation loss: use the keyword "restore_best_weights" set to True.
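The patience mechanism can be simulated on a list of per-epoch validation losses. This is a simplified sketch, not the Keras source: it ignores EarlyStopping options such as min_delta and baseline, and the function name early_stopping_epoch is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Simulate patience-based early stopping on per-epoch validation losses.

    Returns (stop_epoch, best_epoch): the 0-based epoch after which training stops,
    and the epoch whose weights restore_best_weights=True would bring back.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping

# Hypothetical validation losses: best at epoch 3, then the model starts overfitting
losses = [0.50, 0.40, 0.35, 0.33, 0.34, 0.36, 0.37, 0.39]
print(early_stopping_epoch(losses, patience=3))  # → (6, 3)
```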

In [None]:
early_stopping = #TO DO

### Training

**Exercise**: Run the training. Store the history of your training so that we can plot the evolution of the loss and the accuracy at the end of the training procedure.

**Specific instructions**:
- You must provide a validation set. If you already have a dedicated validation set (X_val, Y_val), pass it with the keyword "validation_data". We could, for instance, use X_test and Y_test_cat, but here we keep the test set for the final evaluation. The other option is to split X_train and Y_train_cat into two parts, one for training and one for validation, with the keyword "validation_split" followed by the fraction of the data to hold out. For instance, with 0.1, the last 10 % of the samples (taken before any shuffling) are used as the validation set.
- You must also pass the early stopping procedure: use the keyword "callbacks" with a list containing only the variable "early_stopping" defined previously. A list is required because several callbacks can be used at once.
- Use mini-batches: use the keyword "batch_size" and define a batch size (128 should be good).
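A sketch of the split that validation_split performs, assuming the behavior documented for Keras's Model.fit (the validation data are the last samples, selected before shuffling). The helper split_last_fraction is hypothetical; if your data were sorted, for example by class, you would shuffle them before relying on such a split.

```python
import numpy as np

def split_last_fraction(X, Y, validation_split):
    """Hold out the LAST fraction of samples as validation data (no shuffling)."""
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], Y[:split_at]), (X[split_at:], Y[split_at:])

# Hypothetical data: 100 samples with labels 0..99, to see which ones are held out
X_demo = np.arange(100).reshape(100, 1)
Y_demo = np.arange(100)
(X_tr, Y_tr), (X_val, Y_val) = split_last_fraction(X_demo, Y_demo, 0.1)
print(len(X_tr), len(X_val), Y_val[:3])
```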

In [None]:
#TO DO

**Verification**: The training loss should decrease and the accuracy should increase. The validation loss should decrease as well, at least until early stopping triggers.

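**Exercise**: Plot the evolution of the loss function and of the accuracy, for both the training set and the validation set. The validation quantities are stored in the history under "val_loss" and "val_" followed by the metric name, e.g. "val_categorical_accuracy" if you compiled with the metric "categorical_accuracy" (or "val_accuracy" with the metric "accuracy").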

In [None]:
loss_evolution = #TO DO
acc_evolution = #TO DO
val_loss_evolution = #TO DO
val_acc_evolution = #TO DO

#PLOT THE EVOLUTION

### Predicting with your model

**Exercise**: Compute the network predictions on the test set.

In [None]:
#TO DO

**Exercise**: Extract the predicted labels, which correspond to the classes with the highest predicted probabilities. The function np.argmax (with the right "axis") will be useful.

In [None]:
#TO DO

**Exercise**: Compute the accuracy on the test set.

In [None]:
#TO DO

The following cell displays a randomly chosen test example, with the network prediction and the network output for each class.

In [None]:
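# Assumes Y_pred_test (predicted probabilities) and Y_test_pred_lab (predicted labels)
# were computed in the cells above
r = np.random.randint(X_test.shape[0])  # random test index

figure = plt.figure(figsize = (16,9))

# Left: the input image, with predicted and true labels in the title
ax1 = plt.subplot(121)
ax1.imshow(X_test[r,:,:],cmap = "hot")
plt.title("Network prediction: " + str(Y_test_pred_lab[r]) + "\nTrue value: " + str(Y_test[r]))

# Right: the network output (softmax probabilities) for each class
ax2 = plt.subplot(122)
ax2.bar(np.arange(10),height = Y_pred_test[r],tick_label = np.arange(10))
plt.xlabel("Class")
plt.ylabel("Network output")
plt.show()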


**Exercise**: Adapt the previous code to display random wrong predictions of the network. The function np.where should be useful.

In [None]:
#TO DO

## Keras Convolutional model

We now use a Convolutional Neural Network.

### New data format

**Exercise**: We will use 2D convolutional layers. This type of layer expects inputs of shape $n\times m \times c$, where $n$ and $m$ are the height and width of the image and $c$ is its number of channels: three dimensions in total. For instance, an RGB image has 3 channels ($c = 3$). MNIST images are grey-scale, so they have only one channel. However, an example in X_train has shape (28,28), i.e. only 2 dimensions, so we must add a dimension to obtain (28,28,1).

In the following cell, apply this transformation with the function np.expand_dims, adding a new last axis (keyword axis = 3, since X_train and X_test have shape (N,28,28); axis = -1 is equivalent).
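A sketch of the shape change on a hypothetical stand-in array of 4 grey-scale images: the whole dataset has axes 0, 1 and 2, so the channel axis is inserted at position 3.

```python
import numpy as np

X_demo = np.zeros((4, 28, 28), dtype="float32")  # hypothetical stand-in: 4 grey-scale images

# Shape (N, 28, 28) has axes 0, 1, 2, so the new channel axis is axis 3 (equivalently axis=-1)
X_demo_c = np.expand_dims(X_demo, axis=3)
print(X_demo_c.shape)  # → (4, 28, 28, 1)
```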

In [None]:
X_train = #TO DO

X_test = #TO DO

### Model creation with convolutional layers

**Exercise**: Create a Keras model with name "my_model".

**Specific instructions**:
- The first layers must be 2D convolutional layers: keras.layers.Conv2D. As arguments, give the number of filters (the convolutional equivalent of neurons): a few filters are enough, there is no need for dozens. Then give the filter size: a few pixels are enough. You can write the size as (n,m) for rectangular filters, or just n for square filters. Finally, you can add an activation function ("relu" for instance).
- The first convolutional layer must specify the input_shape. Remember that we added a channel dimension.
- After each convolutional layer, add a MaxPooling2D layer (keras.layers.MaxPooling2D) to reduce the size of the images. Give the pooling size (2 is the usual choice).
- Use only a few convolutional layers (2 or 3 should be enough).
- After the convolutional part, flatten the data into a vector with a Flatten layer (no argument).
- Finally, you can use Dense layers to complete the network; the last layer must have the right number of neurons and the right activation function.
- You can also add BatchNormalization or Dropout layers, or regularization, if you want.
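To check the output shapes reported by my_model.summary(), the spatial size after each convolution and pooling block can be computed by hand. This sketch assumes the Keras defaults of padding="valid" and stride 1 for Conv2D, and a pooling stride equal to the pool size for MaxPooling2D; the function name and the 3x3 kernels are hypothetical.

```python
def conv_pool_shapes(size, kernel_sizes, pool=2):
    """Spatial size after each Conv2D (padding="valid", stride 1) + MaxPooling2D(pool) block."""
    sizes = []
    for k in kernel_sizes:
        size = size - k + 1  # a valid convolution shrinks the image by k - 1 pixels
        size = size // pool  # pooling divides it by the pool size (remainder dropped)
        sizes.append(size)
    return sizes

# Hypothetical network: two blocks of 3x3 convolution + 2x2 max pooling on 28x28 inputs
print(conv_pool_shapes(28, [3, 3]))  # → [13, 5]
```

With F filters in the last convolutional layer of this example, the Flatten layer would then output 5 × 5 × F values.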

In [None]:
#TO DO

**Exercise**: Display your architecture by calling my_model.summary()

In [None]:
#TO DO

### Model compilation

**Exercise**: Compile your model and choose an optimizer. Use a suitable loss function and metric.

In [None]:
#TO DO

### Early stopping

**Exercise**: Define an early stopping procedure.

In [None]:
#TO DO

### Training

**Exercise**: Run the training using a validation set, mini-batches and early stopping, and store the training history in a variable.

In [None]:
#TO DO

**Verification**: The training loss should decrease and the accuracy should increase. The validation loss should decrease as well, at least until early stopping triggers.

**Exercise**: Plot the evolution of the loss function, and the evolution of the accuracy, for the training set and the validation set. 

In [None]:
#TO DO

### Predicting with your model

**Exercise**: Compute the network predictions on the test set.

In [None]:
#TO DO

**Exercise**: Extract the predicted labels, which correspond to the classes with the highest predicted probabilities. The function np.argmax (with the right "axis") will be useful.

In [None]:
#TO DO

**Exercise**: Compute the accuracy on the test set.

In [None]:
#TO DO

The following cell displays a randomly chosen test example, with the network prediction and the network output for each class.

In [None]:
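# Assumes Y_pred_test (predicted probabilities) and Y_test_pred_lab (predicted labels)
# were computed in the cells above
r = np.random.randint(X_test.shape[0])  # random test index

figure = plt.figure(figsize = (16,9))

# Left: the input image (channel 0 of the (28,28,1) array), with predicted and true labels
ax1 = plt.subplot(121)
ax1.imshow(X_test[r,:,:,0],cmap = "hot")
plt.title("Network prediction: " + str(Y_test_pred_lab[r]) + "\nTrue value: " + str(Y_test[r]))

# Right: the network output (softmax probabilities) for each class
ax2 = plt.subplot(122)
ax2.bar(np.arange(10),height = Y_pred_test[r],tick_label = np.arange(10))
plt.xlabel("Class")
plt.ylabel("Network output")
plt.show()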


**Exercise**: Adapt the previous code to display random wrong predictions of the network. The function np.where should be useful.

In [None]:
#TO DO