In this Jupyter notebook I follow the tutorial from this <a href="https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6">**Towards Data Science article**</a> by James Loy.

# Creating a simple two-layer Neural Network from scratch

In [1]:
import random
import numpy as np

The network has an **input layer x**, <br> a certain number of **hidden layers**,<br> and an **output layer y**.<br> Between each pair of layers there is a set of **weights W** and **biases b**.<br> In this example we use the **Sigmoid function** as the **activation function**.

<img src="images/2_layer_model.png" style="width: 50%;">

In [2]:
# Activation function
def sigmoid(t):
    return (1 / (1 + np.exp(-t)))

In [3]:
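# Derivative of the sigmoid, expressed in terms of the sigmoid's output:
# x must already be sigmoid(t), not the raw pre-activation t
def sigmoid_derivative(x):
    return (x * (1.0 - x))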
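Because `sigmoid_derivative` takes the sigmoid's output rather than its input, it is easy to call incorrectly. The following is a minimal sketch that compares it against a central finite difference. The test points `t`, the step `eps`, and the name `max_abs_diff` are assumptions made for this check and are not part of the original tutorial.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_derivative(s):
    # s is assumed to already be a sigmoid output, s = sigmoid(t)
    return s * (1.0 - s)

t = np.linspace(-3.0, 3.0, 7)   # arbitrary test points (assumption)
eps = 1e-6                      # finite-difference step (assumption)

# Central finite-difference approximation of d sigmoid(t) / dt
numeric = (sigmoid(t + eps) - sigmoid(t - eps)) / (2 * eps)

# Analytic derivative: note that we pass sigmoid(t), not t
analytic = sigmoid_derivative(sigmoid(t))

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)  # should be very close to zero
```

If the two disagree noticeably, the most likely cause is passing the pre-activation `t` instead of `sigmoid(t)`.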

In [4]:
class NeuralNetwork():
    def __init__(self, x, y):
        self.input      = x
        self.weights1   = np.random.rand(self.input.shape[1],4) 
        self.weights2   = np.random.rand(4,1)                 
        self.y          = y
        self.output     = np.zeros(self.y.shape)     

# Training the Neural Network

The output y of the 2-layer Neural Network will look like this:

<img src="images/2_layer_NN.png">

The right values for the **weights** and **biases** determine the strength of the predictions. The process of
fine-tuning the weights and biases from the input data is known as **training the Neural Network**. <br><br>
Each iteration of the process consists of the following steps: 
<ul>
    <li>Calculating the predicted output y, known as <b>feedforward</b></li>
    <li>Updating the weights and biases, known as <b>backpropagation</b></li>
</ul>
The sequential graph below illustrates the process

<img src="images/2_layer_seq.png">

# Feedforward

As we've seen in the sequential graph above, feedforward is just a straightforward calculation, and for a basic 2-layer neural network the output of the network is:

<img src="images/2_layer_NN.png">

Let's add feedforward to the Python code. Note that for simplicity the **biases** are assumed to be 0.

In [5]:
def feedforward(self):
    self.layer1 = sigmoid(np.dot(self.input, self.weights1))
    self.output = sigmoid(np.dot(self.layer1, self.weights2))
NeuralNetwork.feedforward = feedforward
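To make the matrix shapes in `feedforward` concrete, here is a small standalone sketch. It assumes a batch of 4 samples with 3 features each and 4 hidden units, matching the `(n_features, 4)` and `(4, 1)` weight shapes in `__init__`. The seeded generator `rng` is an assumption added only for reproducibility.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)

# 4 samples, 3 features each
x = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

weights1 = rng.random((3, 4))   # input -> hidden, same shape as in __init__
weights2 = rng.random((4, 1))   # hidden -> output

layer1 = sigmoid(x @ weights1)       # (4, 3) @ (3, 4) -> (4, 4)
output = sigmoid(layer1 @ weights2)  # (4, 4) @ (4, 1) -> (4, 1)

print(layer1.shape, output.shape)
```

Each row of `output` is the prediction for one sample, and every value lies strictly between 0 and 1 because of the sigmoid.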

However, we still need a way to evaluate the "goodness" of our predictions (i.e. how far off are our predictions?). The **Loss Function** allows us to do exactly that.

# Loss Function

There are many available loss functions, and the nature of our problem should dictate our choice of loss function. In this tutorial, we'll use a simple **sum-of-squares error** as our loss function:

<img src="images/2_layer_loss.png">

That is, the sum-of-squares error is simply the sum of the squared differences between each predicted value and the actual value. Squaring makes every term non-negative, so positive and negative errors cannot cancel each other out, and it penalizes large errors more heavily than small ones.
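As a small illustration, the sum-of-squares error can be computed in one line with NumPy. The helper name `sum_of_squares_error` and the sample predictions below are made up for this sketch and do not come from the tutorial.

```python
import numpy as np

def sum_of_squares_error(y_true, y_pred):
    # Sum over all samples of the squared difference
    return float(np.sum((y_true - y_pred) ** 2))

y_true = np.array([[0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.1], [0.8], [0.9], [0.3]])  # hypothetical predictions

loss = sum_of_squares_error(y_true, y_pred)
print(loss)  # 0.01 + 0.04 + 0.01 + 0.09, i.e. approximately 0.15
```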

**Our goal in training is to find the best set of weights and biases that minimizes the loss function.**

# Backpropagation

Now that we’ve measured the error of our prediction (loss), we need to find a way to **propagate** the error back, and to update our weights and biases.

In order to know the appropriate amount to adjust the weights and biases by, we need to know the **derivative of the loss function with respect to the weights and biases**.

Recall from calculus that the derivative of a function is simply the slope of the function.

<img src="images/2_layer_grad.png">

If we have the derivative, we can simply update the weights and biases by increasing or reducing them according to its sign and size (refer to the diagram above): stepping against the slope reduces the loss. This is known as **gradient descent**.
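Gradient descent is easiest to see on a function of a single variable. The sketch below minimizes a hypothetical function `f(w) = (w - 3)^2`, which is not part of the tutorial. The `learning_rate` of 0.1 is also an assumption; the notebook's own update uses an implicit step size of 1.

```python
def f(w):
    # Hypothetical loss with its minimum at w = 3
    return (w - 3.0) ** 2

def df(w):
    # Derivative (slope) of f at w
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size (assumption)

for step in range(100):
    # Move against the slope: a positive slope decreases w, a negative slope increases it
    w -= learning_rate * df(w)

print(w, f(w))  # w ends up very close to 3, f(w) very close to 0
```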

However, we can't directly calculate the derivative of the loss function with respect to the weights and biases, because the equation of the loss function does not contain the weights and biases explicitly. Therefore, we need the **chain rule** to help us calculate it.

<img src="images/2_layer_chain.png">

Phew! That was ugly but it allows us to get what we needed — the derivative (slope) of the loss function with respect to the weights, so that we can adjust the weights accordingly.
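For reference, the chain rule can be written out for a single sample and a single sigmoid layer. This is a reconstruction that assumes $\hat{y} = \sigma(z)$ and $z = Wx + b$, and its notation may differ from the figure above.

```latex
\begin{aligned}
Loss(y, \hat{y}) &= (y - \hat{y})^2, \qquad \hat{y} = \sigma(z), \qquad z = Wx + b \\[4pt]
\frac{\partial Loss}{\partial W}
  &= \frac{\partial Loss}{\partial \hat{y}} \cdot
     \frac{\partial \hat{y}}{\partial z} \cdot
     \frac{\partial z}{\partial W} \\[4pt]
  &= -2\,(y - \hat{y}) \cdot \sigma(z)\bigl(1 - \sigma(z)\bigr) \cdot x
\end{aligned}
```

Summing this expression over all samples gives the derivative of the full sum-of-squares loss. Note the leading minus sign: an update that adds $2\,(y - \hat{y})\,\sigma(z)(1 - \sigma(z))\,x$ to the weights is therefore a step against the gradient.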

Now that we have that, let's add the backpropagation function to our Python code:

In [6]:
def backprop(self):
    # application of the chain rule with respect to weights2 and weights1; because the error term is
    # (self.y - self.output), these are the NEGATIVE derivatives of the loss function
    d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
    d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

    # adding the negative derivative moves the weights downhill on the loss
    # (gradient descent with an implicit step size of 1)
    self.weights1 += d_weights1
    self.weights2 += d_weights2
NeuralNetwork.backprop = backprop
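A common way to gain confidence in hand-derived backpropagation is a numerical gradient check. The sketch below reuses the same expressions as `backprop` outside the class and compares them with central finite differences of the sum-of-squares loss. The helper names `loss` and `numeric_grad`, the seed, and `eps` are assumptions introduced for this check.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_derivative(s):
    return s * (1.0 - s)

def loss(x, y, w1, w2):
    # Sum-of-squares error of the 2-layer network without biases
    layer1 = sigmoid(x @ w1)
    output = sigmoid(layer1 @ w2)
    return np.sum((y - output) ** 2)

rng = np.random.default_rng(0)  # fixed seed (assumption)
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w1 = rng.random((3, 4))
w2 = rng.random((4, 1))

# Same expressions as in backprop
layer1 = sigmoid(x @ w1)
output = sigmoid(layer1 @ w2)
delta_out = 2 * (y - output) * sigmoid_derivative(output)
d_weights2 = layer1.T @ delta_out
d_weights1 = x.T @ ((delta_out @ w2.T) * sigmoid_derivative(layer1))

def numeric_grad(w, loss_of_w, eps=1e-6):
    # Central finite differences, perturbing one entry of w at a time (in place)
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = loss_of_w()
        w[idx] = original - eps
        minus = loss_of_w()
        w[idx] = original  # restore the entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grad_w1 = numeric_grad(w1, lambda: loss(x, y, w1, w2))
grad_w2 = numeric_grad(w2, lambda: loss(x, y, w1, w2))

# d_weights1 and d_weights2 are the NEGATIVE gradients, so the sums should vanish
err1 = np.max(np.abs(grad_w1 + d_weights1))
err2 = np.max(np.abs(grad_w2 + d_weights2))
print(err1, err2)  # both should be very close to zero
```

The check also makes the sign convention visible: the quantities that `backprop` adds to the weights are the negatives of the true gradients.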

# Putting it all together

Now that we have our complete Python code for doing feedforward and backpropagation, let’s apply our Neural Network to an example and see how well it does.

<img src="images/2_layer_data.png"/>

Our Neural Network should learn the ideal set of weights to represent this function. Note that it isn’t exactly trivial for us to work out the weights just by inspection alone.

Let's train the network for 1500 iterations and see what happens.

In [72]:
if __name__ == "__main__":
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    y = np.array([[0],[1],[1],[0]])
    nn = NeuralNetwork(X,y)

    for i in range(1500):
        nn.feedforward()
        nn.backprop()

    print(nn.output)

[[0.01552409]
 [0.97284378]
 [0.96953441]
 [0.03489688]]


We did it! Our feedforward and backpropagation algorithm trained the Neural Network successfully and the predictions converged on the true values.
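One way to watch the convergence rather than only inspect the final output is to record the loss at every iteration. The sketch below repeats the same feedforward and backpropagation steps outside the class. The `losses` list and the seeded generator are additions for illustration, so the exact numbers will differ from the run above.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_derivative(s):
    return s * (1.0 - s)

rng = np.random.default_rng(0)  # fixed seed (assumption)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
weights1 = rng.random((3, 4))
weights2 = rng.random((4, 1))

losses = []
for i in range(1500):
    # feedforward
    layer1 = sigmoid(X @ weights1)
    output = sigmoid(layer1 @ weights2)
    losses.append(float(np.sum((y - output) ** 2)))

    # backpropagation, same update rule as in backprop
    delta_out = 2 * (y - output) * sigmoid_derivative(output)
    d_weights2 = layer1.T @ delta_out
    d_weights1 = X.T @ ((delta_out @ weights2.T) * sigmoid_derivative(layer1))
    weights1 += d_weights1
    weights2 += d_weights2

# Print the loss at a few checkpoints
for i in (0, 500, 1000, 1499):
    print(i, losses[i])
```

Plotting `losses` against the iteration number gives a quick visual check that training is making progress.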

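Note that there’s a slight difference between the predictions and the actual values. This is expected: the sigmoid output only approaches 0 and 1 asymptotically, and we stopped after 1500 iterations. In general, not fitting the training data perfectly can help prevent **overfitting** and allow a Neural Network to **generalize** better to unseen data, although this example trains on all four possible input combinations and therefore does not test generalization.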