# Multilayer Perceptron

This is an implementation of a multilayer perceptron. Back-propagation is used to fit the model parameters.

I build a model to address the famous Kaggle Titanic problem, a binary classification problem: each passenger must be predicted as having survived or not survived the Titanic disaster.

### import modules

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from sklearn.metrics import f1_score, accuracy_score
from numpy.random import randint

%matplotlib inline
pd.set_option('display.max_rows', None)

### read data

In [2]:
data = pd.read_csv('train.csv')

In [3]:
data.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


In [4]:
# take an explicit copy so that later assignments modify feats, not a view of data
feats = data[['Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].copy()

### impute Age NA values with mean

In [5]:
feats['Age'] = feats['Age'].fillna(data['Age'].mean())


In [6]:
feats.head()

| | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|
| 0 | 3 | 22.0 | 1 | 0 | 7.25 |
| 1 | 1 | 38.0 | 1 | 0 | 71.2833 |
| 2 | 3 | 26.0 | 0 | 0 | 7.925 |
| 3 | 1 | 35.0 | 1 | 0 | 53.1 |
| 4 | 3 | 35.0 | 0 | 0 | 8.05 |


### defining variables

I will implement the following artificial neural network (ANN) architecture.

The ANN will:

* contain 1 hidden layer
* contain 6 units in the hidden layer, including the bias unit
* produce its output from a single node
* use the sigmoid (logistic) activation function in all units



<img src="img/mlp.png" style="height:300px">

credit: https://scikit-learn.org/stable/_images/multilayerperceptron_network.png
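
The architecture listed above fixes the shapes of the two weight matrices. The following is a minimal sketch, assuming the five features selected earlier (Pclass, Age, SibSp, Parch, Fare) plus a bias input, and small random initial weights. The names `n_features`, `n_hidden`, `W1`, `W2` and the seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

n_features = 5   # Pclass, Age, SibSp, Parch, Fare
n_hidden = 6     # hidden units, including the bias unit

# Hidden layer: each of the 5 non-bias hidden units sees the 5 features plus a bias input.
W1 = rng.normal(scale=0.1, size=(n_hidden - 1, n_features + 1))

# Output layer: a single node that sees all 6 hidden units (bias included).
W2 = rng.normal(scale=0.1, size=(1, n_hidden))

print(W1.shape, W2.shape)  # (5, 6) (1, 6)
```

The bias unit of the hidden layer has no incoming weights, which is why `W1` has one row fewer than the number of hidden units.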

**definitions**
<br>
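the activations (outputs) of the $i$th layer form a vector, which is the input to layer $i+1$
$$a^i \in \mathbb{R}^{n}$$
<br>
<br>
the weights for the $i$th layer are a matrix, where $n$ is the length of $a^{i-1}$ and $m$ is the number of units the weights feed into
$$w^i \in \mathbb{R}^{m \times n}$$
<br>
<br>
The linear combination of variables for the $i$th layer is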
$$z^i = w^{i}a^{i-1}$$
<br>
<br>
$\sigma$ is the logistic function:
$$y = \sigma(z)$$
where
$$\sigma(z) = \frac{1}{1+e^{-z}}$$
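
As a quick illustration, the logistic function can be written in NumPy as below. This is a sketch rather than the notebook's own code; the helper names `sigmoid` and `sigmoid_derivative` are assumptions. The derivative uses the standard identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$, which is what back-propagation relies on.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the logistic function: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```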
<br>
<br>
***
**1st layer**
<br>
the features are an n dimensional vector:
$$a^0 \in \mathbb{R}^{n}$$
<br>

***
**2nd layer**

<br>
the hidden layer is an $m$-dimensional vector:
$$a^1 \in \mathbb{R}^{m}$$

where $m$ is equal to the number of units including a bias unit (the bias unit is fixed at 1 rather than computed from $z^1$)
    
<br>
z is a linear combination of the weights and feature values such that:


$$z^1 = w^{1}a^0$$

$$a^1 = \sigma(z^1)$$

***
**output layer**

$$z^2 = w^{2}a^1$$

$$a^2 = \sigma(z^2)$$
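
Following the general definition $z^i = w^{i}a^{i-1}$, a forward pass through both layers for a single observation might look like the sketch below. The function name `forward`, the zero-valued weights, and the choice to prepend the bias entry are assumptions made for illustration. The input uses the feature values of the first row shown by `feats.head()`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a0, W1, W2):
    """Forward pass for one observation.

    a0 : (n+1, 1) feature column vector with a leading bias entry of 1
    W1 : (m-1, n+1) hidden-layer weights
    W2 : (1, m) output-layer weights
    """
    z1 = W1 @ a0                            # z^1 = w^1 a^0
    a1 = np.vstack([[1.0], sigmoid(z1)])    # a^1 = sigma(z^1), bias unit prepended
    z2 = W2 @ a1                            # z^2 = w^2 a^1
    a2 = sigmoid(z2)                        # a^2 = sigma(z^2)
    return a1, a2

# Bias entry followed by Pclass, Age, SibSp, Parch, Fare of the first passenger.
a0 = np.array([[1.0], [3.0], [22.0], [1.0], [0.0], [7.25]])
W1 = np.zeros((5, 6))   # placeholder weights, all zero
W2 = np.zeros((1, 6))
a1, a2 = forward(a0, W1, W2)
print(a1.shape, a2)     # (6, 1) [[0.5]]
```

With all weights at zero every linear combination is 0, so every sigmoid unit outputs 0.5, which makes the shapes easy to check by hand.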



    

    


### functions that need to be made

**general**
* can be classes
* need to use the above notation with correct superscript
<br>

**functions**
* function for making linear combinations (z) that takes in 1 matrix and 1 vector
* function for calculating logistic function output that takes in a vector



### function for making linear combinations from 2 matrices

In [59]:
multi = multiLayerPerceptron(alpha=5, totalIterations=5)

In [60]:
multi.linearCombination(a=a_1, w=w_1)

In [61]:
multi.sigmoid_activation()

In [62]:
multi.linear_combination_output

array([[0.214],
       [0.234],
       [0.276]])

In [68]:
1/(1+np.exp(-0.276))

0.568565299077705

In [63]:
multi.sigmoid_activation_output

array([[0.55329676],
       [0.55823452],
       [0.5685653 ]])

In [56]:
a_1 = np.array([[0.2,0.22]])
a_1


array([[0.2 , 0.22]])

In [57]:
w_1 = np.array([[0.3,0.4,0.5],[0.7,0.7,0.8]])
w_1

array([[0.3, 0.4, 0.5],
       [0.7, 0.7, 0.8]])

In [58]:
class multiLayerPerceptron:
    def __init__(self, alpha, totalIterations):
        self.alpha = alpha
        self.totalIterations = totalIterations
        
    def linearCombination(self, a, w):
        '''
        takes in values of units from previous layer (a) and weights (w)
        a must have dim = (1, m) and w must have dim = (m, l)
        calculates linear combination of variables
        stores a column vector of dim = (l, 1) in self.linear_combination_output
        '''
        self.a = a
        self.w = w

        # (1, m) . (m, l) gives (1, l); the transpose turns it into a column vector (l, 1)
        linear_combination = np.dot(a, w).T
        self.linear_combination_output = linear_combination
        
    def sigmoid_activation(self):
        sigmoid_activation_output = 1 / (1+np.exp(-self.linear_combination_output))
        self.sigmoid_activation_output = sigmoid_activation_output
                                         
        
        
        
    def sigmoid(self, feature_vector, weights_sig):
        '''
        takes in vector of feature values and vector of weights and computes output from logistic function
        '''
        self.feature_vector = feature_vector
        self.weights_sig = weights_sig

        # calculate 'z' (uses weights_sig; self.weights is only set by predict)
        linear_combination = np.dot(self.feature_vector, self.weights_sig)

        # input 'z' into logistic function
        function_output = 1 / (1+np.exp(-linear_combination))

        return function_output
    
        
    def fit(self, features, y):
        '''
        fits logistic regression model to data with batch gradient descent

        features: pandas dataframe containing features
        y: pandas series containing labels

        the learning rate (self.alpha) and the number of iterations
        (self.totalIterations) are set in the constructor
        '''
        self.features = features
        self.y = y

        X = np.array(self.features.T)
        X = np.insert(arr = X, values = np.ones(X.shape[1]), obj = 0, axis = 0)
        Y = np.array([self.y]).T
    ################################################################## initialise lists to store loss and cost function values   
        loss_function_values = []
        cost_function_values =[]
    ################################################################## initialise dictionaries 


    ################################################################## set up arrays
        row_number = self.features.shape[1]+1
        old_params = np.zeros((row_number, 1))
        new_params = np.zeros((row_number, 1))
        dw = np.zeros((row_number, 1))


    ################################################################## set up arrays


    ################################################################## loop through data 
        for counter in tqdm(range(self.totalIterations)):

            #update old parameters with new parameters defined from previous iteration
            #(dw and new_params are fully recomputed below, so no reset to zeros is needed)
            old_params = new_params.copy()
            #create vector Z which holds linear combinations of features for all observations
            Z = np.dot(old_params.T, X)
            #print(f'dimensions of w is {old_params.shape}')
            #print(f'dimensions of X is {X.shape}')
            #print(f'dimensions of Z is {Z.shape}')
            #create vector A which holds outputs from logistic function for all linear combinations of features in Z
            A = 1 / (1+np.exp(-Z.T))
            #print(f'dimensions of A is {A.shape}')
            #create vector E containing all errors
            #print(f'dimensions of Y is {Y.shape}')
            E = A - Y

    ################################################################## update dw
            #record all average dw values for all features
            #print(f'dimensions of E is {E.shape}')
            ### got to here
            dw = np.dot(E.T,X.T).T
            #print(f'dimensions of dw is {dw.shape}')
            average_dw = dw/X.shape[1]    
            #print(f'average_dw is {average_dw}')
    ################################################################## update dw


    ################################################################## record loss function

            loss_function_outputs = -(Y*(np.log(A))+((1-Y)*np.log(1-A)))
            #print(f'dimension of loss_function_outputs is{loss_function_outputs.shape}')
            cost_function_output = sum(loss_function_outputs)/X.shape[1]
            #print(f'cost_function_output value is {cost_function_output}')
            cost_function_values.append(cost_function_output[0])
    ################################################################## record loss function  


    ################################################################## update feature weights
            #print(f'dimensions of average_dw is {average_dw.shape}')
            #print(f'dimensions of old_params is {old_params.shape} before')
            new_params = old_params-self.alpha*(average_dw)
            #print(f'dimensions of new_params is {new_params.shape} after')
    ################################################################## update feature weights
        self.cost_function_values = cost_function_values
        self.fitted_weights = new_params        
    
    def predict(self, data, weights, x_cols, sensitivity = 0.5):
    
        '''
        predicts survival using fitted logistic regression model
        '''
        self.data = data
        self.weights = weights
        self.x_cols = x_cols
        self.sensitivity = sensitivity

        results = []
        X = np.array(self.data[self.x_cols].T)
        X = np.insert(arr = X, values = np.ones(X.shape[1]), obj = 0, axis = 0)
        
        #print(f'dimensions of self.weights.T are {self.weights.T.shape}')
        #print(f'dimensions of X are {X.shape}')
        linear_combination = np.dot(self.weights.T, X)
        
        #print(f'the dimensions of linear-combinations.T are {linear_combination.T.shape}')

        function_output = 1 / (1+np.exp(-linear_combination))
        function_output_list = list(function_output[0])
        for i in function_output_list:
            if i >= self.sensitivity:
                results.append(1)
            else:
                results.append(0)
        

        self.results = results
        
    def plot_cost_function(self):
        
        '''
        plots cost function values as function of batch gradient descent iterations 
        '''
        
        values_for_plotting = pd.Series(self.cost_function_values) 
        values_for_plotting.plot()
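
The `fit` method above trains a single logistic unit. As a rough sketch of how back-propagation could extend it to the one-hidden-layer architecture described earlier, the standalone function below keeps the same column-wise layout (one observation per column) and the same cross-entropy cost. It is not the notebook's implementation: the name `train_mlp`, its default arguments, the random initialisation, and the synthetic toy data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=5, alpha=0.5, n_iter=500, seed=0):
    """Fit a one-hidden-layer network with batch gradient descent.

    X: (N, n) array of features, y: (N,) array of 0/1 labels.
    n_hidden counts the non-bias hidden units.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    A0 = np.vstack([np.ones((1, N)), X.T])       # (n+1, N), bias row first
    Y = y.reshape(1, N).astype(float)            # (1, N)
    W1 = rng.normal(scale=0.5, size=(n_hidden, n + 1))
    W2 = rng.normal(scale=0.5, size=(1, n_hidden + 1))
    costs = []
    eps = 1e-12                                  # guards against log(0)
    for _ in range(n_iter):
        # forward pass
        H = sigmoid(W1 @ A0)                     # (n_hidden, N) hidden activations
        A1 = np.vstack([np.ones((1, N)), H])     # (n_hidden+1, N) with bias unit
        A2 = sigmoid(W2 @ A1)                    # (1, N) predicted probabilities
        cost = -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))
        costs.append(float(cost))
        # backward pass
        D2 = A2 - Y                              # error at the output node
        dW2 = D2 @ A1.T / N
        # propagate the error back, skipping the weight attached to the bias unit
        D1 = (W2[:, 1:].T @ D2) * H * (1 - H)
        dW1 = D1 @ A0.T / N
        # gradient descent update
        W1 = W1 - alpha * dW1
        W2 = W2 - alpha * dW2
    return W1, W2, costs

# Synthetic, linearly separable toy data (not the Titanic data).
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)
W1, W2, costs = train_mlp(X_toy, y_toy)
print(costs[0], costs[-1])
```

The returned `costs` list plays the same role as `cost_function_values` in the class, so it can be plotted in the same way to check that the cost falls over the iterations.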
        


