# Build a neural network using Math concepts

Neural Networks consist of the following components:

- An input layer, x
- An arbitrary number of hidden layers
- An output layer, ŷ
- A set of weights and biases between each layer, W and b
- A choice of activation function for each hidden layer, σ. In this tutorial, we’ll use a Sigmoid activation function

In [1]:
# Support both Python 2 and Python 3 with minimal overhead.
from __future__ import absolute_import, division, print_function

In [2]:
# to avoid warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
import numpy as np

In [4]:
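class NeuralNetwork:
    def __init__(self, x, y):
        self.input      = x
        # Weights from the input layer to a hidden layer of 4 units, drawn uniformly from [0, 1)
        self.weights1   = np.random.rand(self.input.shape[1], 4)
        # Weights from the hidden layer to the single output unit
        self.weights2   = np.random.rand(4, 1)
        self.y          = y
        self.output     = np.zeros(y.shape)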

The right values for the weights and biases determine the strength of the predictions. The process of fine-tuning the weights and biases from the input data is known as *training the Neural Network*.

For a basic 2-layer neural network, the output of the Neural Network is:

- **ŷ = σ(W₂ σ(W₁x + b₁) + b₂)**
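The formula above can be sketched directly in NumPy. This is a minimal illustration, not the tutorial's own implementation: the function name `two_layer_output`, the layer sizes (3 inputs, 4 hidden units, 1 output), and the random parameter values are all assumptions chosen for the example. It uses the column-vector convention of the formula (`W @ x`), whereas the notebook code multiplies in the other order (`x @ W`) because each row of its input is one sample.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_output(x, W1, b1, W2, b2):
    """Compute y_hat = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

# Hypothetical sizes: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 1.0])
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

y_hat = two_layer_output(x, W1, b1, W2, b2)
print(y_hat.shape, y_hat)
```

Because the last step is a sigmoid, the output always lies strictly between 0 and 1, whatever the weights are.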

# Feedforward and Backpropagation

Each iteration of the training process consists of the following steps:

- Calculating the predicted output ŷ, known as feedforward
- Updating the weights and biases, known as backpropagation

In [5]:
    def feedforward(self):
        # Biases are omitted here, i.e. treated as 0
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))
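The `feedforward` method calls a `sigmoid` helper, and the `backprop` method later calls `sigmoid_derivative`, but neither is defined in the class cells. A plausible sketch of both is shown here, assuming (as the calls in `backprop` suggest) that `sigmoid_derivative` receives an already-activated value rather than the raw pre-activation. The finite-difference comparison at the end is only a sanity check added for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    """Derivative of the sigmoid, written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

# Sanity check: compare against a central finite difference of sigmoid itself.
z = np.linspace(-3.0, 3.0, 7)
a = sigmoid(z)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_gap = np.max(np.abs(sigmoid_derivative(a) - numeric))
print(max_gap)
```

Writing the derivative in terms of the output is convenient because the forward pass has already stored the activations, so no extra exponentials are needed during backpropagation.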

We still need a way to evaluate the “goodness” of our predictions (i.e. how far off are our predictions?). The loss function allows us to do exactly that.

### Loss Function

We use the sum-of-squares error in this case: the sum of the squared differences between each predicted value and the actual value.

The goal in training is to find the set of weights and biases that minimizes the loss function.
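As a small worked example of the sum-of-squares error, the sketch below computes it for four hypothetical predictions. The function name `sum_of_squares_error` and the prediction values are made up for illustration.

```python
import numpy as np

def sum_of_squares_error(y_true, y_pred):
    """Sum over all samples of the squared difference (y - y_hat)^2."""
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([[0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.1], [0.9], [0.8], [0.3]])  # hypothetical predictions

# Squared differences: 0.01 + 0.01 + 0.04 + 0.09
loss = sum_of_squares_error(y_true, y_pred)
print(loss)
```

Squaring keeps every term non-negative, so errors in opposite directions cannot cancel, and larger misses are penalized more heavily than small ones.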

# Backpropagation

Now that we’ve measured the error of our prediction (the loss), we need a way to propagate that error back and update our weights and biases.

### Gradient Descent

If we have the **derivative** of the loss function with respect to the weights and biases, we can update them by increasing or decreasing each one in the direction that lowers the loss (refer to the diagram above). This is known as gradient descent.
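To make the update rule concrete, here is a minimal sketch of gradient descent on a one-parameter toy loss, `(w - 3)^2`, whose minimum is at `w = 3`. The toy loss, the starting point, and the `learning_rate` of 0.1 are assumptions for this example; the notebook's own training code applies the update with an implicit step size of 1.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_derivative(w):
    """Slope of the toy loss at w."""
    return 2.0 * (w - 3.0)

w = 0.0              # hypothetical starting weight
learning_rate = 0.1  # hypothetical step size

for _ in range(100):
    # Step against the slope: a positive slope lowers w, a negative slope raises it.
    w -= learning_rate * loss_derivative(w)

print(w, loss(w))
```

Each step shrinks the distance to the minimum by a constant factor here, so `w` approaches 3 without ever needing to know where the minimum is in advance.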

However, we can’t directly calculate the derivative of the loss function with respect to the weights and biases because the equation of the loss function does not contain the weights and biases. Therefore, we need the chain rule to help us calculate it.
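The chain-rule step can be written out for a single sigmoid layer. This is a sketch under simplifying assumptions: one layer with pre-activation z = Wx + b, and the symbol z is introduced here for illustration.

```latex
\begin{aligned}
\text{Loss}(y, \hat{y}) &= \sum_{i} (y_i - \hat{y}_i)^2,
  \qquad \hat{y} = \sigma(z), \qquad z = Wx + b \\
\frac{\partial\, \text{Loss}}{\partial W}
  &= \frac{\partial\, \text{Loss}}{\partial \hat{y}}
     \cdot \frac{\partial \hat{y}}{\partial z}
     \cdot \frac{\partial z}{\partial W} \\
  &= -2\,(y - \hat{y}) \cdot \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \cdot x
\end{aligned}
```

Note the minus sign: the gradient points uphill, so descending the loss means moving the weights by the opposite quantity, 2(y − ŷ) · σ(z)(1 − σ(z)) · x.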

In [6]:
    def backprop(self):
        # Apply the chain rule to get the slope of the loss with respect to weights2 and weights1.
        # The factor 2*(y - output) is the negative of d(loss)/d(output), so these are descent directions.
        d_output = 2*(self.y - self.output) * sigmoid_derivative(self.output)
        d_weights2 = np.dot(self.layer1.T, d_output)
        d_weights1 = np.dot(self.input.T, np.dot(d_output, self.weights2.T) * sigmoid_derivative(self.layer1))

        # Adding the negative gradients moves the weights downhill on the loss (implicit step size of 1)
        self.weights1 += d_weights1
        self.weights2 += d_weights2
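One way to gain confidence in chain-rule gradients like these is to compare them with finite differences of the loss. The sketch below does that for a bias-free 3-4-1 network of the same shape. The function names, the random weights, and the `eps` value are assumptions for this example, and `analytic_gradients` returns the true gradients, which are the negatives of the `d_weights` quantities that `backprop` adds to the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, x, y):
    """Sum-of-squares error of the two-layer network (biases treated as 0)."""
    output = sigmoid(sigmoid(x @ w1) @ w2)
    return np.sum((y - output) ** 2)

def analytic_gradients(w1, w2, x, y):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    layer1 = sigmoid(x @ w1)
    output = sigmoid(layer1 @ w2)
    delta2 = -2 * (y - output) * output * (1 - output)  # dLoss/d(output pre-activation)
    delta1 = (delta2 @ w2.T) * layer1 * (1 - layer1)    # dLoss/d(hidden pre-activation)
    return x.T @ delta1, layer1.T @ delta2

def numeric_gradient(f, w, eps=1e-6):
    """Central finite differences, perturbing one weight at a time in place."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        plus = f()
        w[idx] = original - eps
        minus = f()
        w[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w1, w2 = rng.random((3, 4)), rng.random((4, 1))

g1, g2 = analytic_gradients(w1, w2, x, y)
n1 = numeric_gradient(lambda: loss(w1, w2, x, y), w1)
n2 = numeric_gradient(lambda: loss(w1, w2, x, y), w2)
max_diff = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print(max_diff)
```

If the two disagree by more than a tiny amount, the usual culprits are a missing transpose or a sign error in the chain-rule product.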

# Application of Math and Theory

In [7]:
x = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

y = np.array([[0],
              [1],
              [1],
              [0]])

### Build neural network model

In [8]:
num_epochs = 60000

# Initialize weights uniformly in [-1, 1)
syn0 = 2*np.random.random((3,4)) - 1  # 3x4 matrix: maps the 3 input features in x to 4 hidden units
syn1 = 2*np.random.random((4,1)) - 1  # 4x1 matrix: maps the 4 hidden units to the single output in y

In [9]:
syn0

array([[-0.10262622, -0.83194771, -0.07990365,  0.69883122],
       [ 0.5636327 , -0.78283847,  0.74266398, -0.30325949],
       [-0.23039543, -0.37640469,  0.46069511, -0.21332079]])

In [10]:
syn1

array([[-0.94999879],
       [-0.41754329],
       [ 0.16706992],
       [-0.38379534]])

### Declaring sigmoid as activation function

In [11]:
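def nonlin(x, deriv=False):
    # With deriv=True, x must already be a sigmoid output, since sigmoid'(z) = s*(1 - s) where s = sigmoid(z)
    if deriv:
        return x*(1-x)

    return 1/(1+np.exp(-x))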

### Train Model

In [12]:
for j in range(num_epochs):
    # Feed forward through layers 0, 1, and 2
    k0 = x
    k1 = nonlin(np.dot(k0, syn0))
    k2 = nonlin(np.dot(k1, syn1))

    # How far is the prediction from the target value?
    k2_error = y - k2
    if j % 10000 == 0:
        print("Error:" + str(np.mean(np.abs(k2_error))))

    # Error-weighted derivative: scale the error by the sigmoid slope at k2 (the gradient descent direction)
    k2_delta = k2_error * nonlin(k2, deriv=True)

    # Backpropagation: how much did each k1 value contribute to the k2 error?
    k1_error = k2_delta.dot(syn1.T)
    k1_delta = k1_error * nonlin(k1, deriv=True)

    # Update the weights
    syn1 += k1.T.dot(k2_delta)
    syn0 += k0.T.dot(k1_delta)

Error:0.4986761259416044
Error:0.010140148913514226
Error:0.006895863224123339
Error:0.0055328582224920845
Error:0.004740819954242397
Error:0.0042089283566478595
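Once the error has dropped this low, the trained weights can be used to make predictions by running the forward pass alone. The sketch below repeats the training loop in compact form and then thresholds the outputs at 0.5. The `predict` helper, the 0.5 threshold, and the fixed seed are assumptions added for this example. The notebook above does not set a seed, so its weights and printed errors will differ from run to run.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)  # x is expected to be a sigmoid output here
    return 1 / (1 + np.exp(-x))

x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)  # assumption: fixed seed so reruns start from the same weights
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

for _ in range(60000):
    k1 = nonlin(np.dot(x, syn0))
    k2 = nonlin(np.dot(k1, syn1))
    k2_delta = (y - k2) * nonlin(k2, deriv=True)
    k1_delta = k2_delta.dot(syn1.T) * nonlin(k1, deriv=True)
    syn1 += k1.T.dot(k2_delta)
    syn0 += x.T.dot(k1_delta)

def predict(inputs):
    """Forward pass only, using the trained weights (hypothetical helper)."""
    hidden = nonlin(np.dot(inputs, syn0))
    return nonlin(np.dot(hidden, syn1))

predictions = predict(x)
labels = (predictions > 0.5).astype(int)  # threshold the sigmoid outputs
print(labels.ravel())
```

This only checks the network against the four rows it was trained on, so it shows that the weights fit the training data rather than how well the network generalizes.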
