This notebook is inspired by Sebastian Raschka's book "Python Machine Learning".

## Multi Layer Perceptron

In this notebook we code a multi layer perceptron from scratch. A multi layer perceptron is a sequence of matrix computations, each followed by a non-linear function. We use the sigmoid as the non-linear function, even though ReLU is more widely used today. The goal is to show how to code a simple neural network.



### Exercise 1 - Define a function that computes the sigmoid and plot it

The sigmoid is defined as:

$$
\phi(z) = \frac{1}{1+e^{-z}}
$$

In [None]:
import numpy as np

def sigmoid(z):
    ## compute the sigmoid
    pass
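
As a point of comparison for the exercise, here is a minimal sketch of one possible implementation. The name `sigmoid_sketch` and the clipping bounds are assumptions made for this illustration; the clipping is only an optional safeguard against overflow in `np.exp`.

```python
import numpy as np


def sigmoid_sketch(z):
    """Element-wise logistic sigmoid 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    # Assumption: clip extreme inputs so that np.exp does not overflow.
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))


print(sigmoid_sketch(0.0))  # 0.5
print(sigmoid_sketch([-2.0, 2.0]))
```

To plot it, one option is to evaluate the function on `np.linspace(-7, 7, 200)` and pass both arrays to `plt.plot`.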

### Exercise 2 - one hot function

Write a function that takes a list of values as input and returns their one-hot encoding.

In [None]:
def one_hot(y):
    pass

In [None]:
assert (one_hot([1,2]) == np.array([[1,0], [0., 1]]) ).all()
assert (one_hot([2,1]) == np.array([[0,1], [1., 0]]) ).all()
assert (one_hot([2, 1,2,3,1, 4]) == np.array([[0.,1.,0.,0.], [1,0.,0.,0.], [0.,1.,0., 0], [0, 0, 1, 0], [1.,0.,0.,0.], [0., 0., 0., 1.]])).all()
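
The assertions above suggest that each distinct value gets one column, ordered by increasing value. A minimal sketch under that assumption follows; `one_hot_sketch` is a hypothetical name, and the approach via `np.unique` is only one of several possibilities.

```python
import numpy as np


def one_hot_sketch(y):
    """Return one row per value, with a single 1 in the column of its class."""
    y = np.asarray(y).ravel()
    # classes: sorted distinct values; inverse: column index of every entry of y
    classes, inverse = np.unique(y, return_inverse=True)
    return np.eye(len(classes))[inverse]


print(one_hot_sketch([2, 1, 2, 3, 1, 4]))
```

Note that with this approach the number of columns depends on the values present in the input, so it should be applied to the full label vector before any split.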

### Exercise 3 - Cost function - logistic cost function

We use the logistic cost function (with no regularization). We also normalize it by the number of data points so that it can be compared across datasets of different lengths.

$$
C = -\frac{1}{n} \sum_{i=1}^{n}{\left[ y^{i}\log(f(x^{i})) + (1-y^{i})\log(1-f(x^{i})) \right]}
$$

where

- n is the number of datapoints
- $y^{i}$ is the label for the $i^{th}$ element
- f is the prediction function
- $x^{i}$ is the $i^{th}$ element of the dataset

When the labels are one-hot vectors, the terms are also summed over the output classes, which is what the assertions below expect.


In [None]:
def cost_function(y_label, prediction):
    pass



In [None]:
assert np.abs(cost_function(np.array([[1,0]]), np.array([[0.9,0.1]])) - 0.21072103131565256) < 0.000001
assert np.abs(cost_function(np.array([[1,0,], [0, 1]]), np.array([[0.999,0.1], [0.5, 0.5]])) - 0.7463276885556502) < 0.000001
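
A minimal sketch of a cost function that satisfies the two assertions above: it sums the logistic loss over all entries and divides by the number of rows. The name `cost_function_sketch` and the `eps` clipping, which avoids `log(0)`, are assumptions of this illustration.

```python
import numpy as np


def cost_function_sketch(y_label, prediction, eps=1e-12):
    """Logistic cost summed over classes and averaged over samples (rows)."""
    y = np.asarray(y_label, dtype=float)
    # Assumption: clip predictions away from 0 and 1 to keep the logs finite.
    p = np.clip(np.asarray(prediction, dtype=float), eps, 1.0 - eps)
    per_element = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return per_element.sum() / y.shape[0]


print(cost_function_sketch(np.array([[1, 0]]), np.array([[0.9, 0.1]])))
```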

### Exercise 4 - Forward propagation

In the following we consider a multi-layer perceptron with a single hidden layer of a given size. The non-linear function is the sigmoid.

Let X be the input data. The forward pass is computed as follows:


$
z_{h} = X*w_{h} + b_{h} \\
a_{h} = \phi(z_{h}) \\
z_{out} = a_{h}*w_{out} + b_{out} \\
a_{out} = \phi(z_{out})
$

The function should return the values of $z_{h}, a_{h}, z_{out}, a_{out}$.

In [None]:
def forward(X, w_h, b_h, w_out, b_out):
    pass

In [None]:
forward_X_test = [[1,0,1]]
forward_wh_test = [[0,1], [1,0], [1,1]]
forward_bh_test = [1,1]
forward_wout_test = [[1,4,5], [2,-1, -1]]
forward_bout_test = [1,2,-1]

t1, t2, t3, t4 = forward(forward_X_test, forward_wh_test,forward_bh_test, forward_wout_test, forward_bout_test)

assert (t1 == np.array([[2,3]])).all()
assert (np.abs(t2 - np.array([[0.88079708, 0.95257413]])) < 0.00001).all()
assert (np.abs(t3 - np.array([[3.78594533 ,4.57061419, 2.45141126]])) < 0.000001).all()
assert (np.abs(t4 - np.array([[0.97781589, 0.98975446, 0.92066459]])) < 0.00001).all()
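
A minimal sketch of the forward pass for one hidden layer. The output layer here takes the hidden activation $a_{h}$ as input, which is what reproduces the $z_{out}$ value in the test cell. The names `sigmoid` and `forward_sketch` are local to this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward_sketch(X, w_h, b_h, w_out, b_out):
    """Forward pass: input -> hidden layer -> output layer, sigmoid after each."""
    X = np.asarray(X, dtype=float)
    z_h = X @ np.asarray(w_h) + np.asarray(b_h)            # (n_samples, n_hidden)
    a_h = sigmoid(z_h)
    z_out = a_h @ np.asarray(w_out) + np.asarray(b_out)    # (n_samples, n_output)
    a_out = sigmoid(z_out)
    return z_h, a_h, z_out, a_out


z_h, a_h, z_out, a_out = forward_sketch(
    [[1, 0, 1]], [[0, 1], [1, 0], [1, 1]], [1, 1], [[1, 4, 5], [2, -1, -1]], [1, 2, -1]
)
print(z_h)  # [[2. 3.]]
```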

### Exercise 5 - Back propagation

Before we can train a neural network, we need the back propagation step. In this step we compute the error of the model with respect to the true labels and propagate it backwards through every layer. The resulting gradients are then used to update the weights of each layer.

We are going to compute:

$
\delta_{out} = a_{out} - y \\
\frac{\partial \phi(z_{h})}{\partial z_{h}} = a_{h}*(1-a_{h}) \\
\delta_{h} = (\delta_{out}*w_{out}^{T})*(a_{h}*(1-a_{h})) \\
\nabla(w_{h}) = X^{T}*\delta_{h} \\
\nabla(b_{h}) = \sum{\delta_{h}} \\
\nabla(w_{out}) = a_{h}^{T}*\delta_{out} \\
\nabla(b_{out}) = \sum{\delta_{out}}
$

The function should return the values of

$
\nabla(w_{h}) \\
\nabla(b_{h}) \\
\nabla(w_{out}) \\
\nabla(b_{out})
$

In [None]:
def backpropagation(X, y, z_h, a_h, z_out, a_out, w_out):
    pass
    
    

In [None]:
g1, g2, g3, g4 = backpropagation(np.array(forward_X_test), [0,1,0], t1, t2, t3, t4, np.array(forward_wout_test))


assert np.abs((g1 - np.array([[0.58168091, 0.04721922],
 [0.  ,       0.        ],
 [0.58168091, 0.04721922]]))).sum() < 0.0001
assert np.abs((g2 - np.array([0.58168091, 0.04721922]))).sum() < 0.0001
assert np.abs((g3 - np.array([[ 0.86125738, -0.00902424,  0.81091868],
 [ 0.93144212, -0.00975964,  0.87700127]]))).sum() < 0.0001
assert np.abs((g4 - np.array([ 0.97781589, -0.01024554,  0.92066459]))).sum() < 0.0001
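
A minimal sketch of the gradient computation, following the equations of this exercise. It assumes that the matrix products are `@`, that the product with $a_{h}(1-a_{h})$ is element-wise, and that the bias gradients sum over the samples (axis 0). `backpropagation_sketch` is a hypothetical name.

```python
import numpy as np


def backpropagation_sketch(X, y, z_h, a_h, z_out, a_out, w_out):
    """Gradients of the cost with respect to w_h, b_h, w_out and b_out."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    w_out = np.asarray(w_out, dtype=float)

    delta_out = a_out - y                                   # (n_samples, n_output)
    sigmoid_derivative_h = a_h * (1.0 - a_h)                # (n_samples, n_hidden)
    delta_h = (delta_out @ w_out.T) * sigmoid_derivative_h  # element-wise product

    grad_w_h = X.T @ delta_h
    grad_b_h = delta_h.sum(axis=0)
    grad_w_out = a_h.T @ delta_out
    grad_b_out = delta_out.sum(axis=0)
    return grad_w_h, grad_b_h, grad_w_out, grad_b_out
```

The arguments `z_h` and `z_out` are kept only to match the signature used in the exercise; this sketch does not need them.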


### Exercise 6 - Putting it all together

We are going to create a class called MultiLayerPerceptron whose `__init__` takes as input:

- the number of features of the dataset X $n_{features}$
- the number of hidden neurons $n_{hidden}$
- the number of output neurons $n_{output}$ (number of unique labels in target)

We are going to initialize as instance attributes:
- bias vector $b_{h}$ as a set of zeros of length $n_{hidden}$
- matrix $w_{h}$ as a random normal matrix with size $(n_{features}, n_{hidden})$
- bias vector $b_{out}$ as a set of zeros of length $n_{output}$
- matrix $w_{out}$ as a random normal matrix with size $(n_{hidden}, n_{output})$

Then we will integrate the two functions forward and backprop as methods that use these attributes instead of taking the weights as arguments. Both are used in the train method.
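
The initialization described above could be sketched as follows. The helper name `init_parameters_sketch`, the seed and the standard deviation of 0.1 are assumptions of this illustration; the text only asks for zero biases and random normal weight matrices.

```python
import numpy as np


def init_parameters_sketch(n_features, n_hidden, n_output, seed=0):
    """Return zero biases and random normal weights with the shapes listed above."""
    rng = np.random.default_rng(seed)
    return {
        "w_h": rng.normal(loc=0.0, scale=0.1, size=(n_features, n_hidden)),
        "b_h": np.zeros(n_hidden),
        "w_out": rng.normal(loc=0.0, scale=0.1, size=(n_hidden, n_output)),
        "b_out": np.zeros(n_output),
    }


params = init_parameters_sketch(n_features=64, n_hidden=30, n_output=10)
print({name: value.shape for name, value in params.items()})
```

In the class, the same arrays would be stored on `self` inside `__init__` rather than returned in a dictionary.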

In [None]:
import random


class MultiLayerPerceptron:
    def __init__(self, n_features, n_hidden, n_output):
        pass
        
    def forward(self, X):
        pass
   
    
    
    def backprop(self, X_batch, y_batch, z_h, a_h, z_out, a_out):
        pass

    def train(self, X_train, y_train, X_val, y_val, epochs, batch_size, learning_rate):
        epoch_loss_train = []
        epoch_loss_val = []
        batch_loss = []
        
        for idx_epoch in range(epochs):
            # iterate over minibatches
            indices = np.arange(X_train.shape[0])
            batch_num = 1
            for start_idx in range(0, indices.shape[0] - batch_size + 1, batch_size):
                batch_idx = None  # TODO: select the indices of the current minibatch

                X_batch = X_train[batch_idx]
                y_batch = y_train[batch_idx]
                
                # forward propagation
                z_h, a_h, z_out, a_out = None
                
                batch_cost = cost_function(y_batch, a_out)
                batch_num += 1
                
                batch_loss.append(batch_cost)
                grad_w_h, grad_b_h, grad_w_out, grad_b_out = None 
                
                # Weight updates - do not forget the learning rate!
                self.w_h
                self.b_h  
                self.w_out 
                self.b_out 
            
            _, _, _, a_out = self.forward(X_train)
            
            cost = cost_function(y_train, a_out)
            self.predict(X_val)
            
            _, _, _, a_out_val = self.forward(X_val)
            cost_val = cost_function(y_val, a_out_val)
            epoch_loss_train.append(cost)
            epoch_loss_val.append(cost_val)
            print(f'Epoch {idx_epoch}: train {cost} - eval {cost_val}')
        return epoch_loss_train, epoch_loss_val, batch_loss
    
    def predict(self, X):
        _, _, _, prediction = self.forward(X)
        return np.argmax(prediction, axis=1)
    
    def predict_proba(self, X):
        _, _, _, prediction = self.forward(X)
        return prediction
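
The two blanks in the train method, selecting the indices of a minibatch and updating a weight, can be illustrated in isolation. This is a sketch with hypothetical helper names; the loop bounds mirror the `range(...)` used in train, which drops the last incomplete batch.

```python
import numpy as np


def iterate_minibatches_sketch(n_samples, batch_size):
    """Yield consecutive index arrays of length batch_size (remainder dropped)."""
    indices = np.arange(n_samples)
    for start_idx in range(0, n_samples - batch_size + 1, batch_size):
        yield indices[start_idx:start_idx + batch_size]


def sgd_update_sketch(parameter, gradient, learning_rate):
    """One gradient descent step: move against the gradient."""
    return parameter - learning_rate * gradient


batches = [batch.tolist() for batch in iterate_minibatches_sketch(10, 4)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

w = np.array([1.0, -2.0])
print(sgd_update_sketch(w, np.array([0.5, 0.5]), learning_rate=0.1))
```

Shuffling `indices` at the start of every epoch is a common variation, although the train method above iterates in a fixed order.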

### Import the datasets

We are going to use the digits dataset from sklearn. We will build a train, validation and test set.

In [None]:
from sklearn.datasets import load_digits

In [None]:
digits = load_digits()

In [None]:
X = digits.data
y = one_hot(digits.target)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.15)

### Exercise 7 - training a classifier

Create a new object `nn_code` with the correct dimensions for $n_{features}$ and $n_{output}$. You can choose the number of hidden neurons you want to use.

Call the *train* method with different epochs, batch_size and learning_rate.

In [None]:
nn_code = None  # TODO: instantiate the class

In [None]:
epoch_value, epoch_loss_val, batch_value = None  # TODO: train the model with the chosen values

### Exercise 8 - compute accuracy metrics for the train, validation and test sets
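
A minimal sketch of an accuracy computation, assuming the labels are one-hot encoded and the predictions are class indices such as those returned by `predict`. `accuracy_sketch` is a hypothetical name.

```python
import numpy as np


def accuracy_sketch(y_one_hot, predicted_labels):
    """Fraction of samples whose predicted class matches the one-hot label."""
    true_labels = np.argmax(y_one_hot, axis=1)
    return float(np.mean(true_labels == np.asarray(predicted_labels)))


y_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
pred_demo = np.array([0, 1, 1, 1])
print(accuracy_sketch(y_demo, pred_demo))  # 0.75
```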

### Exercise 9 - show examples of mislabelled images for the training, validation and test sets

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
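
One way to locate the mislabelled samples is to compare the predicted classes with the true ones and keep the indices where they differ. The sketch below assumes one-hot labels and integer predictions; `misclassified_indices_sketch` is a hypothetical name.

```python
import numpy as np


def misclassified_indices_sketch(y_one_hot, predicted_labels):
    """Indices of the samples whose predicted class differs from the label."""
    true_labels = np.argmax(y_one_hot, axis=1)
    return np.flatnonzero(true_labels != np.asarray(predicted_labels))


y_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
pred_demo = np.array([0, 1, 1, 1])
print(misclassified_indices_sketch(y_demo, pred_demo))  # [2]
```

Each row of the digits data is a flattened 8x8 image, so a sample `X[i]` can be displayed with `plt.imshow(X[i].reshape(8, 8), cmap="gray")`.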



### Exercise 9 - plot training graphs

Plot the epoch loss for the train and validation sets on one graph.

Add a second graph showing the batch loss.

# Using PyTorch

Using the PyTorch documentation, replicate what has been done from scratch.

Change the definition of the network to see what works better.

In [None]:
import torch
from torch import nn

In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(64, 40),
            nn.ReLU(),
            nn.Linear(40,10),
            nn.ReLU(),
            nn.Linear(10,10)
            )
        
    def forward(self, x):
        logits = self.model(x)
        return logits
    
       
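    # Note: this overrides nn.Module.train(mode), which nn.Module.eval() calls
    # internally, so .eval() will fail on this class; renaming it avoids the clash.
    def train(self, X_train, y_train, X_val, y_val, epochs, batch_size, learning_rate):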
        epoch_loss_train = []
        epoch_loss_val = []
        batch_loss = []
        criterion = None  # TODO: use CrossEntropyLoss
        optimizer = None  # TODO: choose an optimizer
        for idx_epoch in range(epochs):
            # iterate over minibatches
            indices = np.arange(X_train.shape[0])
            batch_num = 1
            for start_idx in range(0, indices.shape[0] - batch_size + 1, batch_size):
                batch_idx = indices[start_idx:start_idx + batch_size]
                
                optimizer.zero_grad()

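                # nn.Linear uses float32 weights, so cast the float64 NumPy data
                X_batch = torch.from_numpy(X_train[batch_idx]).float()
                # CrossEntropyLoss accepts int64 class indices, or float class
                # probabilities (e.g. one-hot rows cast with .float())
                y_batch = torch.from_numpy(y_train[batch_idx])

                outputs = None  # TODO: compute the output
                loss = None  # TODO: compute the loss
                
                # TODO: backpropagate the loss and update the weights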
                
                batch_loss.append(loss.item())
            
            # compute the loss for train and val
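            pred_train = None  # TODO: predictions on the training set
            loss_train = None  # TODO: loss on the training set
            
            pred_val = None  # TODO: predictions on the validation set
            loss_val = None  # TODO: loss on the validation set
            print(f'Loss: train {loss_train.item()} - eval {loss_val.item()}')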
            epoch_loss_train.append(loss_train.item() )
            epoch_loss_val.append(loss_val.item() )

        return epoch_loss_train, epoch_loss_val, batch_loss
    
    def compute_accuracy(self, X_values, y_values):
        pass

In [None]:
torch_nn = NeuralNetwork()

In [None]:
loss_train, loss_val, batch_loss = None  # TODO: train the model
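
To illustrate the pieces that the train method has to assemble, here is a minimal sketch of a few optimization steps on hypothetical random data. The choice of SGD, the learning rate, the small `nn.Sequential` model and the use of integer class labels (rather than one-hot rows) are all assumptions of this example, not the expected solution.

```python
import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Hypothetical stand-in data: 32 samples, 64 features, 10 classes.
X_np = rng.normal(size=(32, 64))
y_np = rng.integers(0, 10, size=32)

model = nn.Sequential(nn.Linear(64, 40), nn.ReLU(), nn.Linear(40, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

X_batch = torch.from_numpy(X_np).float()  # float32 to match the layer weights
y_batch = torch.from_numpy(y_np).long()   # class indices as int64

losses = []
for _ in range(20):
    optimizer.zero_grad()              # reset gradients from the previous step
    logits = model(X_batch)            # forward pass, raw scores (no softmax)
    loss = criterion(logits, y_batch)  # CrossEntropyLoss applies log-softmax itself
    loss.backward()                    # backpropagation
    optimizer.step()                   # weight update
    losses.append(loss.item())

with torch.no_grad():
    predicted = model(X_batch).argmax(dim=1)
    accuracy = (predicted == y_batch).float().mean().item()

print(losses[0], losses[-1], accuracy)
```

If the labels are one-hot encoded, as in the NumPy part of this notebook, they can be turned into class indices with `argmax(axis=1)` before being converted to a tensor.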

### Exercise 10

Evaluate the accuracy for the train, validation and test sets. Add a method compute_accuracy to the class NeuralNetwork.

### Exercise 11

Plot the training curves for both the training and validation sets.

### Exercise 12

Find the set of images that are wrongly classified by both models.
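
Assuming both models have been evaluated on the same samples, the images that both misclassify are the intersection of the two sets of wrong indices. A minimal sketch with a hypothetical helper name and toy labels:

```python
import numpy as np


def wrong_for_both_sketch(true_labels, predictions_a, predictions_b):
    """Indices misclassified by model A and also by model B."""
    true_labels = np.asarray(true_labels)
    wrong_a = np.flatnonzero(np.asarray(predictions_a) != true_labels)
    wrong_b = np.flatnonzero(np.asarray(predictions_b) != true_labels)
    return np.intersect1d(wrong_a, wrong_b)


true_demo = [0, 1, 2, 3, 4]
pred_numpy_demo = [0, 1, 1, 3, 0]  # wrong at indices 2 and 4
pred_torch_demo = [0, 2, 1, 3, 4]  # wrong at indices 1 and 2
print(wrong_for_both_sketch(true_demo, pred_numpy_demo, pred_torch_demo))  # [2]
```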

### Exercise 13 - Use a convolutional neural network instead of a linear network (optional)