# Building A Multilayer Perceptron (from Scratch)
## Load in necessary packages
These are the packages that I will be using within this Neural Network model.

In [6]:
import numpy as np
import pandas as pd
import requests
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import seaborn as sns
%matplotlib inline



## Establish and modify input parameters
Modify the parameters to correlate with the input dataset.

In [7]:
# load the dataset
iris=pd.read_csv('IRIS.csv')
iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [37]:
# Split Dataset up into feature and label vectors
X = iris.iloc[:,0:3]
y = iris.iloc[:,4]

# learning rate and initial lambda initialization
lr = .001
init_lam = .01

# modify the epochNum, or number of iterations that our network runs
epochNum = 100000

## Pre-Processing
Pre-process the data so that it can smoothly run through the Multilayer Perceptron.
Process includes:
1. Establishing numerical values to the label vector (y) using Sklearn's One Hot Encoder. The One Hot encoding provides a binarization of data so that the neural network does not assume a heirarchy between the classes.

2. Split the data into training and testing sets for both the feature vector and label vector.

3. Create total number variables for feature and label vectors in order to calculate random initial weights and biases.

4. Create initial weights and biases as a starting point for the neural network. We'll need a starting point for the weights and biases so that the forward propogation process can begin with these initial values. The values will later be optimized through the backpropogation process (that is where the learning takes place).

In [38]:
# Use Sci-Kit's one-hot encoder to transform the classes into numerical values
# to feed into our multilayer perceptron
numVal = array(y)

# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(numVal)

# encode from integer to binary
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)

# splitting up training and testing data into appropriate vectors for initial 
# group
X_train, X_test, y_train, y_test = train_test_split(X, onehot_encoded)

# establish the number of labels and features
num_labels = y_train.shape[1]
num_features = X_train.shape[1]

# set size of hidden layer
hidden_nodes = 3 * len(X_train)

# establish the arrays of weights and biases
w1 = np.random.normal(0, 1, [num_features, hidden_nodes]) 
w2 = np.random.normal(0, 1, [hidden_nodes, num_labels]) 

b1 = np.zeros((1, hidden_nodes))
b2 = np.zeros((1, num_labels))

## Functions for the neural network
These functions include:

### * relu_activation -->**
I have chosen the relu activation function as it has a simple implementation and is often very accurate as it solves the vanishing gradient issue with other activation functions. It simply takes the max of 0 and the input vector x. (i.e. max(x, 0))

### * softmax -->
The relu activation function only works when used as an activation function on the hidden layers of a neural network, therefore I have chosen to use the softmax activation function for the output layer. The softmax function conveniently places the output values within the range [0, 1]. Since all the probabilities that will go towards the output layer will equal 1, the softmax activation function will choose the classification with the highest probability. Easy as that!

### * forward -->
The neural network will feed forward the input data. The data will be computed at each layer using f(x) = wx + b function as w = weight values, x is input values, and b is the bias of the layer. Then after that value is calculated at each layer, it will be put through the layer's associated activation function to keep moving forward through the neural network.

### * backprop -->
This is the step of the process where the neural network learns, as it identifies the error of the output nodes. This establishes a loss value and a gradient or slope for that value at each node. When the data is fed back through the network, each value goes through gradient descent where the local minimum of its specific gradient is calculated so that it can adjust new values to the parameters of the neural networks to minimize loss the next time data is fed through the network.

In [39]:
# effective activation function of my choosing
def relu_activation(vec):
    return np.maximum(vec, 0)


# returns a vector of output probabilities
def softmax(vec):
    # for softmax we compute input over number of choices
    input = np.exp(vec)
    # output is sum of all of those choices, K
    output = np.sum(input, axis = 1, keepdims = True)
    return input / output


def forward(softmax_vec, onehot_labels, lam, w1, w2):
  
    # first we calculate softmax cross-entropy loss (refer to formula)
    i = np.argmax(onehot_labels, axis = 1).astype(int)
  
    # since softmax output will be probability values (non-integer) we use function
    # arange() 
    predicted = softmax_vec[np.arange(len(softmax_vec)), i]
    logs = np.log(predicted)
    loss = -np.sum(logs) / len(logs)
    
    # second we add regularization to the loss in order to avoid overfitting
    w1_loss = 0.5 * lam * np.sum(w1 * w1)
    w2_loss = 0.5 * lam * np.sum(w2 * w2)
    return (loss + (w1_loss + w2_loss))
  


In [40]:
def backprop(w1, b1, w2, b2, lam, lr, output_vec, hidden_vec):
    output_error = (output_vec - y_train) / output_vec.shape[0]

    hidden_error = np.dot(output_error, w2.T) 
    hidden_error[hidden_vec <= 0] = 0

    gw2 = np.dot(hidden_vec.T, output_error)
    gb2 = np.sum(output_error, axis = 0, keepdims = True)

    gw1 = np.dot(X_train.T, hidden_error)
    gb1 = np.sum(hidden_error, axis = 0, keepdims = True)

    gw2 += lam * w2
    gw1 += lam * w1

    w1 -= lr * gw1
    b1 -= lr * gb1
    w2 -= lr * gw2
    b2 -= lr * gb2

## Running the network
Here, we establish the amount of iterations that we will feed the data through the network in order to train it and prepare it to make predictions on new data. epochNum in this case is the amount of iterations that we will initialize.

In each iteration, there are a few things that are happening:

1. The values at each layer are being established with respect to that specific layer's activation function.

2. The softmax activation function (in this case) will make a decision on the classificiation it believes is correct from choosing the classification with the highest probability (since the softmax outputs values between [0, 1].)

3. With these established vectors at each layer, we will first feed the data forward through forward propogation to get our output values

4. Then we will adjust our network's parameters in the process of backpropogation so that the network can classify more accurately in the future.

In [41]:
# since we need to return the object 'epoch' in this case we will use xrange()
# rather than range() function in python

for epoch in range(1,epochNum):
    # wx + b
    input = np.dot(X_train, w1) + b1
    hidden = relu_activation(input)
    output = np.dot(hidden, w2) + b2
    soft_output = softmax(output)

    forward(soft_output, y_train, init_lam, w1, w2)
    backprop(w1, b1, w2, b2, init_lam, lr, output, hidden)

  if sys.path[0] == '':


## It is time to test the network!

I have defined an eval() function that will put the network's prediction vector up against the actual classes of the data that it was fed. This function pretty much just calculates the correctness by dividing the correct predictions against the total number of rows of data the network evaluated, or total number of predictions it had.

In [36]:
# test

def eval(preds, y):
    ifcorrect =  np.argmax(preds, 1) == np.argmax(y, 1)
    correct_predictions = np.sum(ifcorrect)
    return correct_predictions * 100 / preds.shape[0]
  

input = np.dot(X_test, w1)
hidden = relu_activation(input + b1)
scores = np.dot(hidden, w2) + b2
probs = softmax(scores)
print('Accuracy of Multilayer Perceptron: {0}%'.format(eval(probs, y_test)))

Accuracy of Multilayer Perceptron: 94.73684210526316%
