# Multi-Layer Perceptron from Scratch

Multi-layer neural networks have proven to be a powerful tool for many data science problems. Although existing packages such as scikit-learn already provide ready-made interfaces, it is worth writing a toy model of your own. The exercise gives you some software engineering experience. More importantly, it helps you understand the underlying mathematics and diagnose problems when code built on existing software misbehaves. In this tutorial, we will continue to use the wine data and work out how to write our own MLP classifier.

Let us start with the example from the previous lecture:
```
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(4, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)
```
<img src="img/MLP.jpeg" style="width: 800px;"/>
<center> Figure 1, the MLP model used in this lecture</center>

You should be able to understand most of these parameters by now. To build a minimal version of the MLP, we will implement only the following parameters in our model:
- hidden_layer_sizes: to make life easier, we only consider models with 2 hidden layers
- max_iter: the maximum number of iterations
- learning_rate_init: the initial learning rate

Note that we will completely ignore the terms related to regularization.
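
To make the role of hidden_layer_sizes concrete, the short sketch below lists the weight-matrix shapes it implies. It assumes a single output unit and one extra bias column per layer, which mirrors the layout used by the class later in this tutorial; the helper name `layer_shapes` is hypothetical.

```python
def layer_shapes(n_features, hidden_layer_sizes, n_outputs=1):
    """Return the (rows, cols) shape of each weight matrix, bias column included."""
    sizes = [n_features] + list(hidden_layer_sizes) + [n_outputs]
    # Layer l maps sizes[l] inputs (+1 for the bias) to sizes[l+1] units.
    return [(sizes[l + 1], sizes[l] + 1) for l in range(len(sizes) - 1)]

shapes = layer_shapes(13, (4, 2))          # the wine data has 13 features
print(shapes)                              # [(4, 14), (2, 5), (1, 3)]
n_params = sum(rows * cols for rows, cols in shapes)
print(n_params)                            # 69
```

Even this small network already has 69 trainable numbers, all of which the backpropagation step has to update.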

## Backpropagation
$$\frac{\partial L}{\partial y} = y-Y$$
$$\frac{\partial y}{\partial f_3} = $$
$$\frac{\partial f_3}{\partial h_2} = $$
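
The last two right-hand sides depend on how the output layer is defined. As one possible completion, assume the loss $L = \frac{1}{2}(y-Y)^2$, a sigmoid output $y = \sigma(f_3)$, and a linear pre-activation $f_3 = W_3 h_2 + b_3$. The symbols $W_3$ and $b_3$ are assumptions that mirror the `w3` and `b3` names in the code below. Under these assumptions the chain rule gives:

$$\frac{\partial y}{\partial f_3} = \sigma(f_3)\left(1-\sigma(f_3)\right) = y(1-y)$$
$$\frac{\partial f_3}{\partial h_2} = W_3$$
$$\frac{\partial L}{\partial h_2} = W_3^{T}\left[(y-Y)\,y(1-y)\right], \qquad \frac{\partial L}{\partial W_3} = \left[(y-Y)\,y(1-y)\right] h_2^{T}$$

With a linear output layer instead, the factor $y(1-y)$ would simply be replaced by 1.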

In [1]:
import numpy as np

class my_MLPClassifier(object):
    """
    Basic MultiLayer Perceptron (MLP) neural network with two hidden layers.

    Args:
        hiddenlayers: number of neurons in each of the two hidden layers
        activation: name of the activation function ('linear' or 'sigmoid')
        max_iterations: maximum number of training iterations (epochs)
        learning_rate: initial learning rate
        decay_rate: multiplicative decay applied to the learning rate per iteration
        loss_method: name of the loss function ('mse')
    """
    def __init__(self, hiddenlayers=(4, 4), activation='sigmoid', max_iterations=50,
                 learning_rate=0.01, decay_rate=0.99, loss_method='mse'):
        # Copy into a list so that a mutable default is never shared between instances
        self.hiddenlayers = list(hiddenlayers)

        # Store the hyperparameters
        self.activation_method = activation
        self.iterations = max_iterations
        self.learning_rate = learning_rate
        self.decay_rate = decay_rate
        self.n_hid1, self.n_hid2 = self.hiddenlayers[0], self.hiddenlayers[1]
        self.loss_method = loss_method
        
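    def fit(self, X, y):
        """Initialize the weights and run a forward pass over every sample.

        Training (backpropagation) is not implemented yet; see the outline below.
        """
        self.X = np.asarray(X)
        self.y = np.asarray(y)  # kept for the training loop; not used yet
        self.n_features = self.X.shape[1]

        # Initializing random weights
        self.weights = self.init_random_weights(self.hiddenlayers, self.n_features)

        # Forward pass, one sample at a time. self.output is overwritten on each
        # iteration, so it ends up holding the layer outputs of the last sample.
        for i in range(len(self.X)):
            x_i = np.hstack(([1.], self.X[i]))  # prepend 1 for the bias term
            self.output = self.forward(x_i, self.weights, self.hiddenlayers)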
        
        
        # TODO: training loop (not implemented yet). Planned outline:
        #for i in range(self.iterations):
        #    # forward function
        #    (h1, h2, y) = self.forward(input)
        #
        #    # evaluate and print the loss function
        #    loss = self.loss(y, target)
        #
        #    # backpropagation
        #    g_loss = self.grad_loss(y, target)
        #    g_y
        #    gw3
        #    gb3
        #    gh2
        #    gw2
        #    gb2
        #    gh1
        #    gw1
        #    gb1
        #
        #    # updating the weight
        #    learning_rate = self.learning_rate * (self.decay_rate**i)
        #    w3 -= learning_rate*gw3
        #    b3 -= learning_rate*gb3
        #    w2 -= learning_rate*gw2
        #    b2 -= learning_rate*gb2
        #    w1 -= learning_rate*gw1
        #    b1 -= learning_rate*gb1
        #self.weights = (w1, w2, w3)
        #self.bias = (b1, b2, b3)
        #loss = self.loss(y, target)
        #print('The final loss is {:12.4f} after {:4d} iterations'.format(loss, self.iterations))
    
            
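    def forward(self, X, weights, hiddenlayers):
        """Perform the forward pass for one sample X (bias entry already prepended).

        Returns a dict mapping layer index to layer output. No activation function
        is applied yet, so every layer is a purely linear map at this stage.
        """
        output = {}
        n_features = len(X)

        nn_structure = [n_features] + list(hiddenlayers) + [1]
        nn_structure_len = len(nn_structure)

        output[0] = X

        for i in range(nn_structure_len-1):
            output[i+1] = np.dot(weights[i], output[i])

            # Prepend the bias entry for every layer except the output layer
            if i < nn_structure_len-2:
                output[i+1] = np.hstack(([1.],output[i+1]))

        return output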

    def activation(self, x, method='sigmoid'):
        """Apply the activation function named by `method` to x."""

        allmethod = ('linear', 'sigmoid')

        if method == 'linear':
            activation = self.identity(x)
        elif method == 'sigmoid':
            activation = self.sigmoid(x)
        else:
            raise NotImplementedError(f"The {method} is not implemented. Try from {allmethod}")

        return activation
    
        
    def init_random_weights(self, hiddenlayers, feature_length, seed=None):
        """Initializing random weights"""

        rs = np.random.RandomState(seed=seed)

        # Initialized weights and bias
        weights = {}
        nn_structure = [feature_length] + hiddenlayers + [1] #[input, hiddenlayers, output]
        nn_structure_len = len(nn_structure)

        for l in range(nn_structure_len-1):
            # Glorot (Xavier) uniform bound: sqrt(6 / (fan_in + fan_out))
            epsilon = np.sqrt(6. / (nn_structure[l] + nn_structure[l+1]))
            norm_epsilon = 2. * epsilon

            # Weight matrix of shape (nn_structure[l+1], nn_structure[l] + 1); the extra
            # column holds the bias. Values are drawn uniformly from [-epsilon, epsilon).
            weights[l] = rs.random_sample((nn_structure[l+1], nn_structure[l]+1)) * \
                         norm_epsilon - norm_epsilon / 2.

        return weights
    
    
    def sigmoid(self, x, derivative=False):
        """Compute the logistic sigmoid function."""
        
        sigmoid = 1 / (1 + np.exp(-x))
        
        if derivative:
            return sigmoid * (1. - sigmoid)
        else:
            return sigmoid
        
        
    def identity(self, x, derivative=False):
        """Compute the identity function, or its derivative (all ones)."""

        if derivative:
            return np.ones_like(x)
        else:
            return x
        
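    #################### Work in progress ##################
    def predict(self, X):
        """
        Return the raw output-layer value for each sample in X.
        Requires fit() to have been called so that self.weights exists.
        """
        X = np.asarray(X)
        out_layer = len(self.hiddenlayers) + 1  # key of the output layer in forward()'s dict
        y = [self.forward(np.hstack(([1.], x)), self.weights, self.hiddenlayers)[out_layer]
             for x in X]
        return np.array(y).ravel()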
    
    def grad_activation(self, x, method='sigmoid'):
        """Derivative of the activation, where x is the activation's *output*."""
        if method=='sigmoid':
            return x*(1-x)
        elif method=='linear':
            return np.ones_like(x)
        else:
            raise NotImplementedError
            
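    def loss(self, y, target):
        """Loss between the network output y and the target."""
        if self.loss_method=='mse':
            return 0.5*np.sum(np.power(y-target,2))
        elif self.loss_method=='log_loss':
            raise NotImplementedError("log_loss is not implemented yet")
        else:
            raise NotImplementedError

    def grad_loss(self, y, target):
        """Gradient of the loss with respect to the network output y."""
        if self.loss_method=='mse':
            return y-target
        elif self.loss_method=='log_loss':
            raise NotImplementedError("log_loss is not implemented yet")
        else:
            raise NotImplementedError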
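
The class above leaves the training loop as a commented outline. As a minimal standalone sketch of what the missing gradients could look like (not the final implementation of this class), the code below assumes a sigmoid on every layer including the output, the loss L = 0.5 * sum((y - Y)^2), and the same bias-column weight layout. The function names `forward` and `backward`, the uniform(-0.5, 0.5) initialization, and the random test input are illustrative choices. A finite-difference check on one weight is included because it is a cheap way to catch sign and index mistakes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Forward pass for one sample; returns the list of layer activations."""
    acts = [x]
    for W in weights:
        a_in = np.hstack(([1.0], acts[-1]))   # prepend 1 for the bias column
        acts.append(sigmoid(W @ a_in))
    return acts

def backward(acts, weights, target):
    """Gradients of L = 0.5 * sum((y - target)**2) w.r.t. each weight matrix."""
    grads = [None] * len(weights)
    y = acts[-1]
    # dL/df at the output layer: (y - Y) * y * (1 - y)
    delta = (y - target) * y * (1.0 - y)
    for l in reversed(range(len(weights))):
        a_in = np.hstack(([1.0], acts[l]))
        grads[l] = np.outer(delta, a_in)
        if l > 0:
            # Go back through the weights (bias column excluded), then through
            # the sigmoid of the previous layer.
            h = acts[l]
            delta = (weights[l][:, 1:].T @ delta) * h * (1.0 - h)
    return grads

rng = np.random.default_rng(0)
sizes = [13, 4, 4, 1]
weights = [rng.uniform(-0.5, 0.5, (sizes[l + 1], sizes[l] + 1)) for l in range(3)]
x, target = rng.normal(size=13), np.array([1.0])

grads = backward(forward(x, weights), weights, target)

def loss(ws):
    return 0.5 * np.sum((forward(x, ws)[-1] - target) ** 2)

# Central finite difference on a single entry of the first weight matrix
eps = 1e-6
w_plus = [W.copy() for W in weights]
w_minus = [W.copy() for W in weights]
w_plus[0][2, 5] += eps
w_minus[0][2, 5] -= eps
numeric = (loss(w_plus) - loss(w_minus)) / (2 * eps)
print(numeric, grads[0][2, 5])   # the two values should agree closely
```

One gradient-descent step would then be `weights[l] -= learning_rate * grads[l]` for each layer, with the learning rate decayed per iteration as in the outline inside `fit`.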

In [2]:
# Obtain and preprocess the data
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

data = load_wine()
x, Y = data.data, data.target

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(x)
x0 = scaler.transform(x)
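
For reference, the scaling step amounts to subtracting each column's mean and dividing by its standard deviation. The NumPy-only sketch below illustrates this on a made-up toy array; the helper name `standardize` is hypothetical, and it assumes no column is constant (a zero standard deviation would cause a division by zero here).

```python
import numpy as np

def standardize(x):
    """Column-wise (x - mean) / std, using the population std (ddof=0)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)   # ddof=0, the same convention StandardScaler uses
    return (x - mean) / std

toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
z = standardize(toy)
print(z.mean(axis=0))   # close to [0, 0]
print(z.std(axis=0))    # close to [1, 1]
```

Standardizing matters for this model because features on very different scales would otherwise dominate the weighted sums in the first layer.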

In [3]:
mlp = my_MLPClassifier(hiddenlayers=[4,4])
mlp.fit(x0, Y)

In [4]:
mlp.output[3]

array([-0.46649148])
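
The value above is the output-layer result for the last sample only, because `fit` overwrites `self.output` for each sample. It comes from random, untrained weights with no seed set, so it will differ from run to run. As a sketch of how to obtain the outputs for all samples at once, the code below assumes the same bias-column weight layout and the same purely linear forward pass as the class above; the helper name `forward_batch` and the random stand-in data are hypothetical.

```python
import numpy as np

def forward_batch(X, weights):
    """Linear forward pass for all rows of X at once.

    weights[l] has shape (n_out, n_in + 1); column 0 multiplies the bias input 1.
    """
    out = np.asarray(X, dtype=float)
    for l in range(len(weights)):
        ones = np.ones((out.shape[0], 1))
        out = np.hstack((ones, out)) @ weights[l].T
    return out

rs = np.random.RandomState(0)
sizes = [13, 4, 4, 1]
weights = {l: rs.random_sample((sizes[l + 1], sizes[l] + 1)) - 0.5 for l in range(3)}
X = rs.normal(size=(5, 13))            # stand-in for 5 standardized samples
print(forward_batch(X, weights).shape)  # (5, 1)
```

With a fitted model, the same call would read `forward_batch(x0, mlp.weights)`, giving one output per wine sample.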