# Neural Networks - A Practical Introduction
by _Minho Menezes_  

---

## Neural Networks - Learning

In this second notebook, we build the algorithms that learn the optimal set of weights for the Neural Network task. This is the Supervised Learning approach, and it is fundamental in Machine Learning.

* [1. Evaluating Performance](#1.-Evaluating-Performance)  
* [2. Backpropagation](#2.-Backpropagation)  
* [3. Gradient Descent Training](#3.-Gradient-Descent-Training)  
* [4. Training a MLP for Binary Classification](#4.-Training-a-MLP-for-Binary-Classification)  
* [5. Training a MLP for Multiclass Classification](#5.-Training-a-MLP-for-Multiclass-Classification)  

---

### Libraries

In [None]:
## LIBRARIES ##
import numpy as np                         # Library for Numerical and Matrix Operations
import matplotlib.pyplot as plt            # Library for Generating Visualizations
import pandas as pd                        # Library for Handling Datasets
from tools.tools import Tools as tl        # Library with some Utility Tools

### Neural Network Class

In [None]:
## CLASS: Multilayer Perceptron ##
class MultilayerPerceptron:
    
    # CLASS CONSTRUCTOR
    def __init__(self, n_neurons=[2, 5, 1]):
        if(len(n_neurons) < 2):
            raise ValueError("The network must have at least two layers! (The input and the output layers)")
        
        # Network Architecture
        self.hidden_layers = len(n_neurons)-2
        self.n_neurons = n_neurons
        self.W = []
        
        # Adjusting the Network architecture
        for i in range(1, len(n_neurons)):
            self.W.append( np.random.randn(self.n_neurons[i-1]+1 , self.n_neurons[i]) )
        
    # ACTIVATION FUNCTION
    def activate(self,Z):
        return 1 / (1 + np.exp(-Z))
    
    # FORWARD PROPAGATION
    def forward(self, X):
        # Activation List
        A = []
        
        # Input Layer Activation
        A.append( np.vstack([np.ones([1, X.shape[1]]), X]) )
        
        # Hidden Layer Activation
        for i in range(0, self.hidden_layers):
            Z = np.matmul(self.W[i].T, A[-1])
            Z = self.activate(Z)
            
            A.append( np.vstack([np.ones([1, Z.shape[1]]), Z]) )
        
        # Output Layer Activation
        Z = np.matmul(self.W[-1].T, A[-1])
        Z = self.activate(Z)

        A.append(Z)
        
        return A
    
    # CLASSIFICATION PREDICTION
    def predict(self, X):
        A = self.forward(X)
        
        if(self.n_neurons[-1] > 1):
            return A[-1].argmax(axis=0)
        else:
            return (A[-1] > 0.5).astype(int)
    
    # LOSS FUNCTION
    def loss(self, y, y_hat):
        pass
    
    # ACCURACY FUNCTION
    def accuracy(self, y, y_hat):
        pass
    
    # BACKPROPAGATION
    def backpropagate(self, A, y):
        pass
    
    # GRADIENT DESCENT TRAINING
    def train(self, X_train, y_train, alpha=1e-3, maxIt=50000, tol=1e-5, verbose=False):
        pass
        
## ---------------------------- ##

---
### 1. Evaluating Performance

One of the first things to define in any Supervised Learning approach is the set of evaluation and error metrics that tell the model how well it performed and allow it to correct its parameters.

An important metric is the **Loss Function**, which is used directly in the training. In our case, we will use the function known as the _Cross-Entropy Loss_:

$$
    \mathcal{L}(W) = -\frac{1}{m} \sum \left[ y \log( \hat{y} ) + (1-y) \log(1 - \hat{y} ) \right]
$$

While it is mathematically useful for the training, its values are hard to interpret. A more human-readable performance metric is the **Accuracy Function**, the percentage of samples whose predicted class matches the true class, which can be calculated as:

$$
    \text{Acc}(W) = \frac{100}{m} \sum ( y = \hat{y} )
$$
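As a quick numerical illustration of the two formulas above, here is a minimal sketch using standalone helper functions. The names `cross_entropy` and `accuracy_percent`, the clipping constant `eps`, and the example probabilities are assumptions made for this illustration; they are not the class methods requested in the exercise.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # Clip the probabilities so that log(0) never occurs (eps is an assumed safeguard)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    m = y.shape[1]                                  # number of samples (one per column)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

def accuracy_percent(y, y_pred):
    # Percentage of samples whose predicted class equals the true class
    return 100.0 * np.mean(y == y_pred)

y      = np.array([[1, 1, 0, 0]])                   # true classes
y_hat  = np.array([[0.9, 0.4, 0.2, 0.6]])           # hypothetical network outputs
y_pred = (y_hat > 0.5).astype(int)                  # same 0.5 threshold used by predict()

print(cross_entropy(y, y_hat))                      # approximately 0.5403
print(accuracy_percent(y, y_pred))                  # 50.0
```

Note that the accuracy compares the thresholded class predictions with the labels, while the loss works on the raw probabilities.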

Implement the two evaluation metrics below:

In [None]:
def loss(self, y, y_hat):
    # YOUR CODE HERE
    pass

def accuracy(self, y, y_hat):
    # YOUR CODE HERE
    pass

MultilayerPerceptron.loss = loss
MultilayerPerceptron.accuracy = accuracy

Now, use these metrics to evaluate the classification performance for the examples in the matrix $X$:

In [None]:
X = np.array([[ 5,  1, -2, -1],
              [ 4,  2,  0,  4],
              [ 3,  3,  1,  4],
              [ 2,  4, -1, -3]])    

y = np.array([[1, 1, 0, 0]])

# YOUR CODE HERE

### 2. Backpropagation

The **Backpropagation** algorithm is one of the most popular and powerful techniques for training multilayer models.

This algorithm distributes the misclassification error across the entire network, holding each neuron individually responsible for its contribution to the error. The algorithm works as follows.

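The error in the output layer is simply the difference between the probabilities calculated by the network and the true class of each sample, $\hat{y} - y$. With this sign, the error is the gradient of the Cross-Entropy Loss with respect to the net input of the output neurons (up to the factor $1/m$). For every preceding layer, moving backward from the output, the error in each neuron is equal to: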

$$
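    e_i^{(l)} = \left( W_{i1}^{(l)} e_1^{(l+1)} + W_{i2}^{(l)} e_2^{(l+1)} + \cdots + W_{in}^{(l)} e_n^{(l+1)} \right) \cfrac{d \varphi(S_{i_\text{net}}^{(l)})}{d S_{i_\text{net}}^{(l)}}
$$

Where $W_{ij}^{(l)}$ is the weight connecting neuron $i$ of layer $l$ to neuron $j$ of layer $l+1$, and the derivative of the sigmoid activation is equal to:

$$
    \cfrac{d \varphi(S_{i_\text{net}}^{(l)})}{d S_{i_\text{net}}^{(l)}} = A_i^{(l)}(1-A_i^{(l)})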
$$
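The derivative identity above can be checked numerically. The following is a minimal sketch, assuming a standalone `sigmoid` helper (equivalent to the `activate` method) and a central finite difference with a hypothetical step size `h`:

```python
import numpy as np

def sigmoid(z):
    # Same logistic function used by the activate() method
    return 1 / (1 + np.exp(-z))

z = np.linspace(-4, 4, 9)              # a few net-input values, including z = 0
A = sigmoid(z)

analytic = A * (1 - A)                 # derivative written in terms of the activation
h = 1e-5                               # assumed finite-difference step
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)                         # tiny value: the two derivatives agree
```

Writing the derivative in terms of $A$ is convenient because the activations are already stored during forward propagation, so no extra exponentials are needed in the backward pass.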

Using matrix multiplication, and writing $\odot$ for the element-wise product, the backpropagation of the error between two layers is:

$$
    \mathbf{E}^{(l)} = \left( \mathbf{W}^{(l)} \mathbf{E}^{(l+1)} \right) \odot \mathbf{A}^{(l)} \odot (1 - \mathbf{A}^{(l)})
$$
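To make the shapes in the matrix form concrete, here is a minimal sketch of one backward step between two layers. The layer sizes, the random values, and the variable names are hypothetical; the layout (one sample per column, bias unit in the first row) follows the `forward` method of the class above.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                   # number of samples
n_l, n_next = 3, 2                      # neurons in layer l and l+1 (assumed sizes)

# Weights from layer l (plus its bias row) to layer l+1, as created in __init__
W = rng.standard_normal((n_l + 1, n_next))

# Activations of layer l with the bias row of ones stacked on top
Z = rng.standard_normal((n_l, m))
A = np.vstack([np.ones((1, m)), 1 / (1 + np.exp(-Z))])

# Error of layer l+1, assumed to be already computed
E_next = rng.standard_normal((n_next, m))

# Matrix product followed by element-wise products with A and (1 - A)
E_full = (W @ E_next) * A * (1 - A)     # shape (n_l + 1, m)

# The bias row has A = 1, so its error is exactly zero; drop it before
# propagating further back, since the bias unit has no incoming weights
E = E_full[1:, :]
print(E.shape)                          # (3, 4)
```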

Implement the backpropagation method in the cell below:

In [None]:
def backpropagate(self, A, y):
    # YOUR CODE HERE
    pass

MultilayerPerceptron.backpropagate = backpropagate

Now, try out the _backpropagate()_ method in the example below:

In [None]:
X = np.array([[ 5,  1, -2, -1],
              [ 4,  2,  0,  4],
              [ 3,  3,  1,  4],
              [ 2,  4, -1, -3]])    

y = np.array([[1, 1, 0, 0]])

# YOUR CODE HERE

### 3. Gradient Descent Training

**Gradient Descent** is one of the most popular iterative algorithms for minimizing loss functions.

The algorithm works as follows:

Given a Multilayer Neural Network with $L$ layers, where $A^{(l)}$ is the activation matrix in each layer $l$, and $E^{(l)}$ is the error matrix for each layer $l$:

1. Calculate the activations $\mathbf{A}^{(l)}$ (forward propagation) and the errors $\mathbf{E}^{(l)}$ (backpropagation) using the weights of the current iteration, $\mathbf{W}^{(i)}$;  

2. Evaluate the current performance using the _Loss Function_ and the _Accuracy Metric_;
   
3. Update the network weights:

    $$ 
        \mathbf{W}^{(i+1)} = \mathbf{W}^{(i)} - \alpha \nabla \mathcal{L}(\mathbf{W}^{(i)})
    $$
    
    where $\alpha$ is a scaling factor (usually between 0 and 1) called the _Learning Rate_, and $\nabla \mathcal{L}(\mathbf{W}^{(i)})$ represents the gradient of the _Loss Function_, which for the weights $\mathbf{W}^{(l)}$ leaving layer $l$ can be calculated as:
    
    $$ 
        \nabla_{\mathbf{W}^{(l)}} \mathcal{L} = \frac{1}{m} \mathbf{A}^{(l)} \left( \mathbf{E}^{(l+1)} \right)^T
    $$  <br>
    
4. Print the training results every 50 epochs;  <br>

5. Check for convergence by comparing the decrease in the _Loss Function_ against a tolerance; <br>

6. If the training has not converged, go back to Step 1. <br>
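The steps above can be sketched on the simplest possible case: a network with no hidden layer (a single sigmoid output neuron), so that backpropagation reduces to the output error. This is only an illustration under stated assumptions, not the `train` method requested below; the toy data, the hyperparameter values, and the helper names `sigmoid` and `loss_fn` are hypothetical.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def loss_fn(y, y_hat, eps=1e-12):
    # Cross-entropy averaged over the samples; eps avoids log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical toy data: one feature, one sample per column
X = np.array([[0.0, 1.0, 2.0, 3.0]])
y = np.array([[0, 0, 1, 1]])
m = X.shape[1]
A0 = np.vstack([np.ones((1, m)), X])          # input activations with bias row

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 1))               # weights: (inputs + bias) x outputs
alpha, max_it, tol = 0.5, 5000, 1e-6          # assumed hyperparameters
prev_loss = np.inf
history = []

for it in range(max_it):
    # Step 1: activations and output error
    y_hat = sigmoid(W.T @ A0)
    E = y_hat - y

    # Step 2: evaluate the current performance
    loss = loss_fn(y, y_hat)
    acc = 100.0 * np.mean((y_hat > 0.5).astype(int) == y)
    history.append(loss)

    # Step 3: move the weights against the gradient
    W = W - alpha * (A0 @ E.T) / m

    # Step 4: report progress every 50 epochs
    if it % 50 == 0:
        print(f"epoch {it:4d}  loss {loss:.4f}  acc {acc:.1f}%")

    # Steps 5 and 6: stop once the loss no longer changes by more than tol
    if abs(prev_loss - loss) < tol:
        break
    prev_loss = loss
```

For a network with hidden layers, Step 1 would use the full forward and backward passes, and Step 3 would be repeated once per weight matrix.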

In [None]:
def train(self, X_train, y_train, alpha=1e-3, maxIt=50000, tol=1e-5, verbose=False):
    # YOUR CODE HERE
    pass

MultilayerPerceptron.train = train

### 4. Training a MLP for Binary Classification

In [None]:
## LOADING AND VISUALIZING THE DATA ##
X_train, X_test, y_train, y_test = tl.loadData("data/toy_data_01.csv")
tl.plotData(X_train, y_train)

In [None]:
# YOUR CODE HERE

In [None]:
# YOUR CODE HERE

### 5. Training a MLP for Multiclass Classification

In [None]:
## LOADING AND VISUALIZING THE DATA ##
X_train, X_test, y_train, y_test = tl.loadData("data/toy_data_02.csv")
tl.plotData(X_train, y_train)

In [None]:
# YOUR CODE HERE

In [None]:
# YOUR CODE HERE