# Multi-Layer Perceptron (MLP) with scikit-learn

## Libraries

from sklearn.metrics import classification_report
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt

## MLP

<div class="alert alert-block alert-info">
    
The **multi-layer perceptron (MLP)** is a feedforward neural network composed of successive layers (cf. figure below).

<img src="files/figures/MLP.jpg" width="600px"/>
 
The dynamics of an MLP are given by the following equations (single-sample and batch versions):

$$
\begin{array}{ll}
\textbf{sample $\boldsymbol{x}$} & \textbf{batch $\boldsymbol{X_i}$} \\
\begin{cases}
\boldsymbol{a^{[0]}} ~=~ \boldsymbol{x} & \\
\boldsymbol{z^{[l]}} ~=~ \boldsymbol{W^{[l]}} \boldsymbol{a^{[l-1]}} + \boldsymbol{b^{[l]}}, & l = 1, \dots, L \\
\boldsymbol{a^{[l]}} ~=~ \boldsymbol{\sigma} \left( \boldsymbol{z^{[l]}} \right), & l = 1, \dots, L
\end{cases}
~&~
\begin{cases}
\boldsymbol{A^{[0]}} ~=~ \boldsymbol{X_i}	\\
\boldsymbol{Z^{[l]}} ~=~ \boldsymbol{W^{[l]}} \boldsymbol{A^{[l-1]}} \oplus \boldsymbol{b^{[l]}}, & l = 1, \dots, L \\
\boldsymbol{A^{[l]}} ~=~ \boldsymbol{\sigma} \big( \boldsymbol{Z^{[l]}} \big), & l = 1, \dots, L
\end{cases}
\end{array}
$$

</div>
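
The two columns of equations above can be checked numerically. The NumPy sketch below is one possible reading of them, not a reference implementation: it assumes that $\boldsymbol{\sigma}$ is the logistic sigmoid (the text does not fix an activation), that the batch $\boldsymbol{X_i}$ stores one sample per column, and that $\oplus$ means adding the bias vector to every column. The names `forward_sample` and `forward_batch` are illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_sample(x, weights, biases):
    """Forward pass for one sample x of shape (n_0,)."""
    a = x  # a^[0] = x
    for W, b in zip(weights, biases):
        z = W @ a + b  # z^[l] = W^[l] a^[l-1] + b^[l]
        a = sigmoid(z)  # a^[l] = sigma(z^[l])
    return a


def forward_batch(X, weights, biases):
    """Forward pass for a batch X of shape (n_0, m), one sample per column."""
    A = X  # A^[0] = X_i
    for W, b in zip(weights, biases):
        # The bias is reshaped to a column so that broadcasting adds it
        # to every column of W @ A (the "oplus" in the batch equations).
        Z = W @ A + b[:, np.newaxis]
        A = sigmoid(Z)
    return A


# Toy network with layer sizes 4 -> 5 -> 3 and random parameters.
rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3]
weights = [
    rng.normal(size=(n_out, n_in))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

X_batch = rng.normal(size=(4, 6))  # 6 samples stored as columns
A_out = forward_batch(X_batch, weights, biases)
a_out = forward_sample(X_batch[:, 0], weights, biases)
```

Each column of `A_out` should match what `forward_sample` returns for the corresponding column of `X_batch`, which is a quick way to confirm that the two formulations agree.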

At this point, we have not yet seen how to train a neural network properly.<br>
For now, we will rely on the `sklearn` library, which handles the training for us.

The **MNIST dataset** consists of images of handwritten digits. The MNIST classification problem consists of predicting the digit shown in each image.

<img src="files/figures/mnist.png" width="600px"/>

- Load the MNIST dataset from https://www.openml.org/d/554 using the following commands:

```
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # rescale pixel values {0,...,255} -> [0, 1]
```

- Split the data into train and test sets (80% train / 20% test).

- Define a **multi-layer perceptron (MLP)** with the following parameters:
    - 2 hidden layers of 128 neurons each<br>
    `hidden_layer_sizes = (128, 128)`
    - 40 epochs<br>
    `max_iter = 40`
    - as the solver, use a stochastic gradient descent (SGD)<br>
    `solver = "sgd"`
- Train your model on the train set.
- Get the scores of your model on the train and test sets.
- Get the test predictions and labels and compute the classification report.

Check the documentation:<br>
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
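
The steps listed above can be sketched as follows. This is a minimal outline rather than the expected solution. The helper name `train_and_evaluate` and the fixed `random_state` are assumptions added for reproducibility, and the demo call uses scikit-learn's small bundled `load_digits` dataset (8x8 images, pixel values 0 to 16) as a fast offline stand-in for MNIST. For the exercise itself, pass the `X` and `y` loaded with `fetch_openml` instead.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_and_evaluate(X, y, random_state=0):
    """Split the data, fit the MLP, and return the model, scores and report."""
    # 80% train / 20% test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # 2 hidden layers of 128 neurons, SGD solver, at most 40 epochs
    model = MLPClassifier(
        hidden_layer_sizes=(128, 128),
        max_iter=40,
        solver="sgd",
        random_state=random_state,
    )
    model.fit(X_train, y_train)

    # score() returns the mean accuracy on the given data
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)

    # Per-class precision, recall and F1 on the test set
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred)
    return model, train_score, test_score, report


# Stand-in data (assumption): the bundled digits dataset, rescaled to [0, 1].
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0

model, train_score, test_score, report = train_and_evaluate(X_demo, y_demo)
print(f"train accuracy: {train_score:.3f}")
print(f"test accuracy:  {test_score:.3f}")
print(report)
```

With only 40 epochs, scikit-learn may emit a `ConvergenceWarning`. That warning only signals that the optimizer stopped at `max_iter` before meeting its tolerance criterion.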

**I'm very surprised by the training speed... Results are excellent!**

- Using `plt.matshow()`, visualize the weights of each layer of your model.
- **Note:** the weights of layer `i` are stored as a NumPy array given by `your_model.coefs_[i]`.
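
One way to approach the visualization is sketched below. The helper name `plot_layer_weights` is hypothetical, and the small model trained on the bundled `load_digits` data is only a stand-in so that the snippet runs quickly on its own. In the exercise, pass your fitted MNIST model instead. Note that scikit-learn stores `coefs_[i]` with shape `(n_inputs, n_outputs)`, i.e. transposed relative to $\boldsymbol{W^{[l]}}$ in the equations above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier


def plot_layer_weights(model):
    """Draw one matshow figure per weight matrix in model.coefs_."""
    figures = []
    for i, layer_weights in enumerate(model.coefs_):
        # plt.matshow opens a new figure for each matrix
        image = plt.matshow(layer_weights)
        plt.colorbar(image)
        plt.title(f"Layer {i}: weights of shape {layer_weights.shape}")
        figures.append(plt.gcf())
    return figures


# Small stand-in model (assumption) so the example is self-contained.
X_small, y_small = load_digits(return_X_y=True)
small_model = MLPClassifier(
    hidden_layer_sizes=(16, 16), solver="sgd", max_iter=5, random_state=0
)
small_model.fit(X_small / 16.0, y_small)

figures = plot_layer_weights(small_model)
```

In a notebook the figures are displayed automatically. In a plain script, call `plt.show()` afterwards.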