# 1. Implementing LeNet-5 - Reading the paper

We're going to implement one of the first successful Convolutional Neural Network (CNN) architectures, applied to handwritten digit recognition.

This model is called LeNet-5 and was introduced in the 1998 paper ["Gradient-Based Learning Applied To Document Recognition"](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf) by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner.

Try to read and understand the cited paper. You don't need to read it all (it is ~46 pages long) but at least try reading up to page 7, where the CNN architecture is described.

You should be able to build the model with the Keras library by defining all the layers that compose it. Note that the "Subsampling" layers are approximately equivalent to MaxPooling layers in this context. Also, the "Gaussian" or "RBF" final layer can be replaced by a Softmax layer in your implementation.
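To see why the substitution is only approximate: in the paper, a subsampling unit adds up its four inputs and then applies a trainable coefficient, a trainable bias, and a squashing function, which is closer to average pooling than to max pooling. The NumPy sketch below compares plain 2x2 average pooling and 2x2 max pooling on a small made-up feature map. It leaves out the trainable parts, and the `feature_map` values are purely illustrative.

```python
import numpy as np

# Made-up 4x4 feature map, used only for illustration.
feature_map = np.array([[1, 2, 0, 1],
                        [3, 4, 1, 0],
                        [0, 0, 5, 6],
                        [0, 8, 7, 2]], dtype=float)

# View the map as non-overlapping 2x2 blocks.
# Axes: (block_row, row_in_block, block_col, col_in_block).
blocks = feature_map.reshape(2, 2, 2, 2)

avg_pooled = blocks.mean(axis=(1, 3))  # average of each 2x2 block
max_pooled = blocks.max(axis=(1, 3))   # maximum of each 2x2 block

print(avg_pooled)  # [[2.5 0.5] [2.  5. ]]
print(max_pooled)  # [[4. 1.] [8. 7.]]
```

Both operations halve the height and width, so the layer shapes in the rest of the network are the same whichever one you pick.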

Use the [MNIST dataset](https://keras.io/api/datasets/mnist/), which is also provided by Keras, to train and validate your model.

Try to answer:
1. How many convolutional layers does the model have?
2. Which kind of data pre-processing does the paper use?
3. Which activation functions are used?
4. What is the reported model accuracy in the paper?


# 2. LeNet-5 implementation

To help you with the implementation, you can refer to the diagram of the model below. Note that the input images in the MNIST dataset are of size `28x28` instead of the `32x32` indicated in the original paper.

You can start by defining a [Sequential model](https://keras.io/guides/sequential_model/) in terms of [Keras layers](https://keras.io/api/layers/). Note that you'll mostly use [Convolutional](https://keras.io/api/layers/convolution_layers/convolution2d/), [MaxPooling](https://keras.io/api/layers/pooling_layers/max_pooling2d/) and [Dense](https://keras.io/api/layers/core_layers/dense/) layers. We also recommend using an [InputLayer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputLayer) to specify the size of the input images.



![lenet.png](https://raw.githubusercontent.com/MostafaGazar/mobile-ml/master/files/lenet.png)
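The difference between `28x28` and `32x32` inputs matters for the layer sizes. The following pure-Python sketch traces the feature-map size through the paper's sequence of 5x5 convolutions (C1, C3, C5) and 2x2 subsampling steps (S2, S4). The helper names `conv_out` and `lenet_sizes` are made up for this illustration and are not part of Keras.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size along one dimension of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_sizes(input_size, first_padding=0):
    """Sizes of the input and of the maps after C1, S2, C3, S4 and C5."""
    sizes = [input_size]
    size = conv_out(input_size, kernel=5, padding=first_padding)  # C1: 5x5 conv
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # S2: 2x2 subsampling
    sizes.append(size)
    size = conv_out(size, kernel=5)  # C3: 5x5 conv
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # S4: 2x2 subsampling
    sizes.append(size)
    size = conv_out(size, kernel=5)  # C5: 5x5 conv
    sizes.append(size)
    return sizes


print(lenet_sizes(32))                    # [32, 28, 14, 10, 5, 1]
print(lenet_sizes(28))                    # [28, 24, 12, 8, 4, 0]
print(lenet_sizes(28, first_padding=2))   # [28, 28, 14, 10, 5, 1]
```

A final size of 0 means the last 5x5 kernel no longer fits, so an unpadded `28x28` input cannot follow the paper's layer sizes exactly. Two possible ways around this are to pad the first convolution by 2 pixels on each side (what `padding='same'` does for a 5x5 kernel with stride 1) or to zero-pad the images to `32x32`.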


In [None]:
from tensorflow import keras



model = keras.Sequential([
# YOUR IMPLEMENTATION HERE (START)

# YOUR IMPLEMENTATION HERE (END)
])


# summary() prints the table itself and returns None, so no print() is needed
model.summary()

# 3. Data loading

You can load the data for the MNIST dataset with a single Keras function: `keras.datasets.mnist.load_data`. Note that Keras Conv2D layers expect inputs with shape `NxHxWxC`, where `N` is the number of samples, `HxW` is the size of the images, and `C` is the number of channels. However, by default the MNIST dataset is loaded as an `NxHxW` array: `60000x28x28` for the training images and `10000x28x28` for the test images. Since the images are grayscale, there is only one channel, and therefore the corresponding singleton dimension was omitted. You will have to [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) the arrays so that they are of size `Nx28x28x1` to match the shape expected by Keras, or you'll get an error when trying to train the model.
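As a minimal sketch of the reshape step, the snippet below adds the missing channel dimension to a small random array that stands in for the MNIST images. The name `fake_images` and its contents are assumptions made for this example, so it runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for x_train: 6 random grayscale images of 28x28 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)

# Option 1: reshape, letting NumPy infer the number of samples with -1.
with_channel = fake_images.reshape(-1, 28, 28, 1)

# Option 2: append a singleton axis at the end.
also_with_channel = np.expand_dims(fake_images, axis=-1)

print(fake_images.shape)   # (6, 28, 28)
print(with_channel.shape)  # (6, 28, 28, 1)
```

Both options leave the pixel values untouched and only change how the array is indexed.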

You should also normalize the pixel values so that, across all pixels of all training images, the mean is 0 and the standard deviation is 1. Since there is only a single channel here, one mean and one standard deviation are enough; for multichannel input images you would have to normalize each channel separately.
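One possible way to do this normalization is sketched below on random stand-in arrays (`fake_train`, `fake_test` and `fake_rgb` are made up for this example). It assumes the common practice of computing the statistics on the training set only and reusing them for the test set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the reshaped training and test images.
fake_train = rng.integers(0, 256, size=(8, 28, 28, 1)).astype("float32")
fake_test = rng.integers(0, 256, size=(4, 28, 28, 1)).astype("float32")

# Single channel: one mean and one standard deviation over all training pixels.
mean = fake_train.mean()
std = fake_train.std()

train_norm = (fake_train - mean) / std
test_norm = (fake_test - mean) / std  # reuse the training statistics

# Multichannel case: one statistic per channel, computed over samples, rows and columns.
fake_rgb = rng.integers(0, 256, size=(8, 28, 28, 3)).astype("float32")
channel_mean = fake_rgb.mean(axis=(0, 1, 2), keepdims=True)
channel_std = fake_rgb.std(axis=(0, 1, 2), keepdims=True)
rgb_norm = (fake_rgb - channel_mean) / channel_std

print(float(train_norm.mean()), float(train_norm.std()))
```

The test set will generally not have a mean of exactly 0 after this step, which is expected because its statistics differ slightly from those of the training set.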

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# YOUR IMPLEMENTATION HERE (START)


# YOUR IMPLEMENTATION HERE (END)

# 4. Training the model

CNNs are trained in the same way as any other neural network model: you'll need to `compile` the model with an optimizer, a loss function, and some metrics to monitor the performance of the model.

Since this is a classification problem, you'll use a cross-entropy loss, but check the encoding of the outputs to decide which variant you need: `sparse_categorical_crossentropy` if the targets are integer class labels, or `categorical_crossentropy` if they are one-hot encoded.
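The check can be sketched with NumPy as follows. The helper `pick_loss` is hypothetical and written only for this example, and the short `labels` array stands in for `y_train`, which `load_data` returns as a one-dimensional array of integer digits.

```python
import numpy as np


def pick_loss(targets):
    """Return the Keras loss name matching the target encoding (hypothetical helper)."""
    if targets.ndim == 1:
        # One integer class index per sample.
        return "sparse_categorical_crossentropy"
    # One row per sample with a 1 in the column of the true class.
    return "categorical_crossentropy"


# Made-up labels in the same format as y_train.
labels = np.array([5, 0, 4, 1], dtype=np.uint8)

# The same labels, one-hot encoded over the 10 digit classes.
one_hot = np.eye(10, dtype="float32")[labels]

print(pick_loss(labels))   # sparse_categorical_crossentropy
print(pick_loss(one_hot))  # categorical_crossentropy
```

Either encoding works with a 10-unit Softmax output as long as the loss name matches it.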


Afterwards, you can just call the `fit` function as usual. Training for 30 epochs with a batch size of 128 should be enough to reach a reasonably good accuracy on the test set (~97%). Make sure you pass the test data as `validation_data` to `fit` so that you can compare the training and test accuracies.
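To get a feeling for what these settings mean in terms of work, the short calculation below counts the weight updates performed during training. It assumes the full 60000-image training set is used and that the last, smaller batch of each epoch is kept.

```python
import math

train_samples = 60000  # size of the MNIST training set
batch_size = 128
epochs = 30

steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs
last_batch = train_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 469
print(total_updates)    # 14070
print(last_batch)       # 96
```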

In [None]:

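# (1) Compile and (2) train (fit) the model with SGD, using cross entropy.
# Include 'accuracy' in the metrics and store the value returned by `fit` in a
# variable named `history`, since the plotting code below relies on both.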
# YOUR IMPLEMENTATION HERE (START)


# YOUR IMPLEMENTATION HERE (END)

# Use the `history` object returned by `fit` to plot the evolution of the
# accuracy and loss during training.
import matplotlib.pyplot as plt


def plot_history(history):
    # Assumes the model was compiled with metrics=['accuracy'] and that
    # validation_data was passed to fit, so that these keys exist.
    plt.figure()
    plt.plot(history.history['accuracy'], label='Train accuracy')
    plt.plot(history.history['val_accuracy'], label='Test accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.ylim([0, 1])
    plt.legend(loc='lower right')

    plt.figure()
    plt.plot(history.history['loss'], label='Train loss')
    plt.plot(history.history['val_loss'], label='Test loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    # The loss decreases over time, so the upper right corner is usually free.
    plt.legend(loc='upper right')
    plt.show()


plot_history(history)