# Neural Network With Back Propagation

# Content 
1. Introduction
2. Error back-propagation algorithm theory
3. Database description
4. Back-propagation Neural Network implementation
5. Conclusion
***
## 1. Introduction

The back-propagation neural network is a feed-forward network with a fairly simple architecture: an input layer, one or more hidden layers, and an output layer. This type of network can classify data that is not linearly separable. We use the [error back-propagation](https://en.wikipedia.org/wiki/Backpropagation) algorithm to tune the network iteratively.
***
## 2. Error back-propagation algorithm theory
 
The error back-propagation algorithm consists of two main steps:
1. Feeding the input from the database forward: first to the input layer, then through the hidden layers, and finally to the output layer.
2. Calculating the output error and feeding it backwards to tune the network's weights.
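
The two steps can be sketched on a toy network. This is only a minimal illustration, not the implementation used later in this notebook: the layer sizes, the plain `tanh` activation, the squared-error measure, the learning rate and the names `forward`, `w_hidden` and `w_output` are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 3 inputs, 4 hidden units, 2 outputs.
w_hidden = rng.normal(scale=0.5, size=(3, 4))
w_output = rng.normal(scale=0.5, size=(4, 2))

def forward(x, w_hidden, w_output):
    """Step 1: feed the input forward through the hidden and output layers."""
    hidden = np.tanh(x @ w_hidden)
    output = np.tanh(hidden @ w_output)
    return hidden, output

x = np.array([0.5, -1.0, 2.0])
target = np.array([1.0, 0.0])
learning_rate = 0.05

hidden, output = forward(x, w_hidden, w_output)
error_before = 0.5 * np.sum((target - output) ** 2)

# Step 2: compute the output error and feed it backwards.
# For tanh, the derivative can be written from the output: 1 - tanh(s)^2.
output_delta = (target - output) * (1.0 - output ** 2)
hidden_delta = (w_output @ output_delta) * (1.0 - hidden ** 2)
w_output += learning_rate * np.outer(hidden, output_delta)
w_hidden += learning_rate * np.outer(x, hidden_delta)

_, output_after = forward(x, w_hidden, w_output)
error_after = 0.5 * np.sum((target - output_after) ** 2)
print(error_before, error_after)
```

After one small update the error on this single example should be lower than before, which is the whole idea of the backward step.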
***
## 3. Database description

In this example we use the Duke Breast Cancer database, which consists of 86 entries with 7129 attributes each, plus the class attribute located in the first column.
The data is numerical and has no missing values.
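
These properties are easy to check once the file is loaded. The helper below is a small sketch: `summarize_database` is a hypothetical name, and the array it is run on is a stand-in with made-up values, not the real data.

```python
import numpy as np

def summarize_database(db):
    """Return (entries, attributes, has_missing) for a 2-D array whose
    first column holds the class label, as described above."""
    entries, columns = db.shape
    attributes = columns - 1            # first column is the class attribute
    has_missing = bool(np.isnan(db).any())
    return entries, attributes, has_missing

# Stand-in array with made-up values (NOT the real data): 4 entries, 3 attributes.
example = np.array([
    [0.0, 1.5, -2.0, 0.3],
    [1.0, 0.2, 4.1, -1.0],
    [1.0, -0.7, 0.0, 2.2],
    [0.0, 3.3, 1.1, 0.9],
])
print(summarize_database(example))
```

Calling the same function on the loaded `db` array would report the number of entries and attributes described above.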
***
## 4. Back-propagation Neural Network implementation

First of all, we need to load the database.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

db = np.loadtxt("duke-breast-cancer.txt")
print("Database raw shape (%s,%s)" % np.shape(db))

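Next we shuffle the database and split it into a training set (90%) and a testing set (10%). Shuffling keeps the order of the entries in the file from influencing training, and the held-out 10% lets us evaluate the network on entries it has not seen. If needed, you can also normalize the database.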

In [None]:
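# Shuffle the rows in place (train_test_split also shuffles by default).
np.random.shuffle(db)
# The first column holds the class label; the remaining columns are the attributes.
y = db[:, 0]
x = db[:, 1:]
# Hold out 10% of the entries for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)
print(np.shape(x_train), np.shape(x_test))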
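
The optional normalization mentioned above could look like the sketch below. It standardizes each attribute to zero mean and unit variance, which is one common choice and an assumption here, since the notebook does not say which normalization is meant. The name `standardize` and the toy numbers are hypothetical.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Scale each attribute to zero mean and unit variance, using statistics
    from the training rows only so the test rows do not influence the scaling."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std < eps, 1.0, std)   # leave constant attributes unscaled
    return (train - mean) / std, (test - mean) / std

# Made-up numbers for illustration only.
toy_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
toy_test = np.array([[3.0, 12.0]])
toy_train_n, toy_test_n = standardize(toy_train, toy_test)
print(toy_train_n.mean(axis=0), toy_test_n)
```

With the variables of this notebook the call would be `x_train, x_test = standardize(x_train, x_test)`.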

Now we create the hidden layer vector, the weight matrix, the output layer vector and the hidden weight matrix. We choose a hidden layer of 72 perceptrons. The output layer needs one perceptron per class, so here it has 2. The weight matrix has one row per database attribute and one column per hidden perceptron; the hidden weight matrix has one row per hidden perceptron and one column per output perceptron.

In [None]:
hidden_layer = np.zeros(72)
weights = np.random.random((len(x[0]), 72))
output_layer = np.zeros(2)
hidden_weights = np.random.random((72, 2))


To continue we need to implement: 
1. Sum function
2. Activation function
3. SoftMax function
4. Recalculate Weights function
5. Back-propagation function

## Sum function
s_i is the weighted sum of the inputs for the i-th perceptron in the layer.
![sum.png](https://image.ibb.co/i3EM27/sum.png)

In [None]:
def sum_function(weights, index_locked_col, x):
    result = 0
    for i in range(0, len(x)):
        result += x[i] * weights[i][index_locked_col]
    return result
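
With 7129 attributes the Python loop above is slow. As a sketch of an alternative, the same sums for a whole layer can be obtained with one matrix product. The name `layer_sums` is hypothetical; the loop version is repeated here only so that the block runs on its own.

```python
import numpy as np

def sum_function_loop(weights, index_locked_col, x):
    # Same loop as in the cell above, repeated so this block is self-contained.
    result = 0
    for i in range(len(x)):
        result += x[i] * weights[i][index_locked_col]
    return result

def layer_sums(weights, x):
    """All s_i of one layer at once: entry i is the dot product of x with column i."""
    return x @ weights

rng = np.random.default_rng(0)
toy_x = rng.random(5)
toy_weights = rng.random((5, 3))
sums = layer_sums(toy_weights, toy_x)
loop_sums = [sum_function_loop(toy_weights, i, toy_x) for i in range(3)]
print(np.allclose(sums, loop_sums))
```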

## Activation function
g(s_i) is the activation of the i-th perceptron in the layer.
![image.png](https://image.ibb.co/eTfYFS/g.png)

In [None]:
def activate_layer(layer, weights, x):
    for i in range(0, len(layer)):
        layer[i] = 1.7159 * np.tanh(2.0 * sum_function(weights, i, x) / 3.0)
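
Back-propagation needs the derivative of the activation. As a side check that is not part of the original notebook, the sketch below writes the derivative of g(s) = 1.7159 * tanh(2s/3) in terms of the output value and compares it with a finite-difference estimate. The back-propagation cell later in this notebook uses the simpler form `(1 - h) * (1 + h)`, which is the derivative of an unscaled `tanh`. The names `g` and `g_prime_from_output` are hypothetical.

```python
import numpy as np

A, B = 1.7159, 2.0 / 3.0   # constants from the activation above

def g(s):
    return A * np.tanh(B * s)

def g_prime_from_output(out):
    """Derivative of g written in terms of its own output value:
    g'(s) = A * B * (1 - tanh(B s)^2) = (B / A) * (A - out) * (A + out)."""
    return (B / A) * (A - out) * (A + out)

s = np.linspace(-3.0, 3.0, 13)
h = 1e-6
numeric = (g(s + h) - g(s - h)) / (2 * h)   # central finite difference
analytic = g_prime_from_output(g(s))
print(np.max(np.abs(numeric - analytic)))
```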

## SoftMax function
[The softmax function, or normalized exponential function, is a generalization of the logistic function that "squashes" a K-dimensional vector z of arbitrary real values to a K-dimensional vector σ ( z ) of real values in the range (0, 1) that add up to 1.](https://en.wikipedia.org/wiki/Softmax_function)
![image.png](https://image.ibb.co/dKsr27/sm.png)

In [None]:
def soft_max(layer):
    soft_max_output_layer = np.zeros(len(layer))
    for i in range(0, len(layer)):
        denominator = 0
        for j in range(0, len(layer)):
            denominator += np.exp(layer[j] - np.max(layer))
        soft_max_output_layer[i] = np.exp(layer[i] - np.max(layer)) / denominator
    return soft_max_output_layer
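
The function above recomputes the maximum and the denominator for every element. A vectorized sketch of the same formula, with the same max-subtraction for numerical stability, is shown below; `soft_max_vectorized` is a hypothetical name.

```python
import numpy as np

def soft_max_vectorized(layer):
    """Softmax with the max-subtraction trick, computed once for the whole vector."""
    shifted = np.exp(np.asarray(layer, dtype=float) - np.max(layer))
    return shifted / shifted.sum()

probs = soft_max_vectorized([1.0, 2.0, 3.0])
print(probs, probs.sum())

# Large inputs do not overflow because the maximum is subtracted first.
big = soft_max_vectorized([1000.0, 1001.0])
print(big)
```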

## Recalculate weights function
Here we tune the two weight matrices of the network: the weights and the hidden weights. We are going to use this function inside the back-propagation function.
![image.png](https://image.ibb.co/moBepn/w.png)

In [None]:
def recalculate_weights(learning_rate, weights, gradient, activation):
    for i in range(0, len(weights)):
        for j in range(0, len(weights[i])):
            weights[i][j] = (learning_rate * gradient[j] * activation[i]) + weights[i][j]
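
The double loop adds `learning_rate * gradient[j] * activation[i]` to every entry, which is an outer product. A possible vectorized form, under the hypothetical name `recalculate_weights_vectorized`, is sketched below with made-up values.

```python
import numpy as np

def recalculate_weights_vectorized(learning_rate, weights, gradient, activation):
    """In-place update: weights[i][j] += learning_rate * gradient[j] * activation[i]."""
    weights += learning_rate * np.outer(activation, gradient)

toy_weights = np.zeros((3, 2))
toy_activation = np.array([1.0, 2.0, 3.0])
toy_gradient = np.array([0.5, -1.0])
recalculate_weights_vectorized(0.1, toy_weights, toy_gradient, toy_activation)
print(toy_weights)
```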

## Back-propagation function
In this function we compute the output layer gradient and the hidden layer gradient, and use them to recalculate the network weights.

Output gradient formula:
![image.png](https://image.ibb.co/eJ9qUn/go.png)

Hidden gradient formula:
![image.png](https://image.ibb.co/mYQ3h7/gh.png)

In [None]:
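def back_propagation(hidden_layer, output_layer, one_hot_encoding, learning_rate, x):
    # Note: this function reads and updates the module-level `weights` and `hidden_weights`.
    output_derivative = np.zeros(len(output_layer))
    output_gradient = np.zeros(len(output_layer))
    # Output layer: logistic-style derivative o * (1 - o) times the error (target - output).
    for i in range(0, len(output_layer)):
        output_derivative[i] = (1.0 - output_layer[i]) * output_layer[i]
    for i in range(0, len(output_layer)):
        output_gradient[i] = output_derivative[i] * (one_hot_encoding[i] - output_layer[i])
    hidden_derivative = np.zeros(len(hidden_layer))
    hidden_gradient = np.zeros(len(hidden_layer))
    # Hidden layer: derivative term (1 - h) * (1 + h) = 1 - h^2.
    for i in range(0, len(hidden_layer)):
        hidden_derivative[i] = (1.0 - hidden_layer[i]) * (1.0 + hidden_layer[i])
    # Propagate the output gradients back through the hidden weights.
    for i in range(0, len(hidden_layer)):
        sum_ = 0
        for j in range(0, len(output_gradient)):
            sum_ += output_gradient[j] * hidden_weights[i][j]
        hidden_gradient[i] = sum_ * hidden_derivative[i]
    # Update the hidden-to-output weights first, then the input-to-hidden weights.
    recalculate_weights(learning_rate, hidden_weights, output_gradient, hidden_layer)
    recalculate_weights(learning_rate, weights, hidden_gradient, x)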
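
The two gradients can also be written with array operations. The sketch below reproduces the arithmetic of the loops above on made-up values that are small enough to verify by hand. It only computes the gradients and does not update any weights; `gradients_vectorized` is a hypothetical name.

```python
import numpy as np

def gradients_vectorized(hidden_layer, output_layer, one_hot_encoding, hidden_weights):
    """Same arithmetic as the loops above, written with array operations."""
    output_gradient = output_layer * (1.0 - output_layer) * (one_hot_encoding - output_layer)
    hidden_derivative = (1.0 - hidden_layer) * (1.0 + hidden_layer)
    hidden_gradient = (hidden_weights @ output_gradient) * hidden_derivative
    return output_gradient, hidden_gradient

# Made-up values for illustration only.
toy_hidden = np.array([0.5, -0.5, 0.0])
toy_output = np.array([0.8, 0.2])
toy_target = np.array([1.0, 0.0])
toy_hidden_weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out_grad, hid_grad = gradients_vectorized(toy_hidden, toy_output, toy_target, toy_hidden_weights)
print(out_grad, hid_grad)
```

For these values the output gradient is 0.8 * 0.2 * 0.2 = 0.032 for the first class and -0.032 for the second, and the third hidden gradient is zero because the two contributions cancel.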

Next we can [one-hot encode](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science) our output and start training our network iteratively.

In [None]:
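# One-hot vectors for the two classes: row k is the target for class k.
one_hot_encoding = np.zeros((2,2))
for i in range(0, len(one_hot_encoding)):
    one_hot_encoding[i][i] = 1
training_correct_answers = 0
for i in range(0, len(x_train)):
    # Forward pass: hidden layer, then output layer, then softmax.
    activate_layer(hidden_layer, weights, x_train[i])
    activate_layer(output_layer, hidden_weights, hidden_layer)
    output_layer = soft_max(output_layer)
    # Count the prediction before the weights are updated for this entry.
    training_correct_answers += 1 if y_train[i] == np.argmax(output_layer) else 0
    # Learning rate of -1, as in this example. Note that recalculate_weights adds
    # learning_rate * gradient * activation and the gradient is built from
    # (target - output), so under the usual delta rule a positive rate reduces the error.
    back_propagation(hidden_layer, output_layer, one_hot_encoding[int(y_train[i])], -1, x_train[i])
print("MLP Correct answers while learning: %s / %s (Accuracy = %s) on %s database." % (
    training_correct_answers, len(x_train), training_correct_answers / len(x_train), "Duke breast cancer"))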
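
As a side note, the one-hot targets can also be built for all labels at once by indexing an identity matrix. This is a sketch that assumes the class labels are 0 and 1, as the `int(y_train[i])` indexing above implies; `one_hot` is a hypothetical helper name.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Row k of the result is the one-hot vector for labels[k]."""
    return np.eye(n_classes)[np.asarray(labels, dtype=int)]

encoded = one_hot([0.0, 1.0, 1.0], 2)
print(encoded)
```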

The test accuracy depends on the randomly generated weight matrices and on the learning rate. Different learning rates and initial weights will produce different accuracies.

In [None]:
testing_correct_answers = 0
for i in range(0, len(x_test)):
    activate_layer(hidden_layer, weights, x_test[i])
    activate_layer(output_layer, hidden_weights, hidden_layer)
    output_layer = soft_max(output_layer)
    testing_correct_answers += 1 if y_test[i] == np.argmax(output_layer) else 0
print("MLP Correct answers while testing: %s / %s (Accuracy = %s) on %s database" % (testing_correct_answers, len(x_test),
                                                                                     testing_correct_answers/len(x_test), "Duke breast cancer"))
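
Because the result depends on the random initial weights, it can help to make a run repeatable when comparing settings. The sketch below draws both weight matrices from a seeded NumPy generator. The function name `init_weights`, the seeds and the small sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def init_weights(n_inputs, n_hidden, n_outputs, seed):
    """Draw both weight matrices from a seeded generator so a run can be repeated."""
    rng = np.random.default_rng(seed)
    weights = rng.random((n_inputs, n_hidden))
    hidden_weights = rng.random((n_hidden, n_outputs))
    return weights, hidden_weights

w_a, hw_a = init_weights(10, 4, 2, seed=42)
w_b, hw_b = init_weights(10, 4, 2, seed=42)
w_c, _ = init_weights(10, 4, 2, seed=7)
print(np.array_equal(w_a, w_b), np.array_equal(w_a, w_c))
```

The split can be fixed in the same spirit by passing `random_state` to `train_test_split`.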

On this testing set the accuracy can reach 100% with a suitable number of perceptrons in the hidden layer. Keep in mind that 10% of 86 entries is only about 9 test entries, so each test entry accounts for roughly 11 percentage points of accuracy. In this example we used a learning rate of -1 and 72 perceptrons in the hidden layer.

***
## 5. Conclusion
In this test we have shown that the back-propagation neural network can perform well on a dataset with a large number of attributes. The performance can be improved by changing the number of hidden neurons and the learning rate.
Because training is iterative and gradient-based, it is slow, and training on a very large dataset takes a long time. There is no single network that is perfect for every kind of database, so keep testing your data on multiple neural networks and see which fits best.

I hope this notebook helped you begin your journey into the world of machine learning and big data.