**Backpropagation Algorithm**

This notebook illustrates the backpropagation algorithm on a simple dataset of scalar inputs `x` and targets `y`. The network has a single tanh hidden unit (with no bias) followed by a single tanh output unit:

1. $z_1 = w_1 x$
2. $h_1 = \tanh(z_1)$
3. $z_2 = w_2 h_1 + b$
4. $h_2 = \tanh(z_2)$
5. $C = \frac{1}{2}(y - h_2)^2$
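
To make the five steps concrete, here is a minimal sketch that evaluates them for one scalar input. The parameter values and the input/target pair are hypothetical, chosen only for illustration, and the helper names `forward` and `cost` are illustrative rather than part of the notebook.

```python
import numpy as np

def forward(x, w1, w2, b):
    """Evaluate steps 1-4 of the network for a single scalar input x."""
    z1 = w1 * x          # step 1: hidden pre-activation (no hidden bias)
    h1 = np.tanh(z1)     # step 2: hidden activation
    z2 = w2 * h1 + b     # step 3: output pre-activation
    h2 = np.tanh(z2)     # step 4: network output, always in (-1, 1)
    return z1, h1, z2, h2

def cost(y, h2):
    """Step 5: squared-error cost for one data point."""
    return 0.5 * (y - h2) ** 2

# Hypothetical parameters and data point, for illustration only
w1, w2, b = 0.5, -1.2, 0.3
x, y = 2.0, 0.25

z1, h1, z2, h2 = forward(x, w1, w2, b)
C = cost(y, h2)
print(f"z1={z1:.4f} h1={h1:.4f} z2={z2:.4f} h2={h2:.4f} C={C:.4f}")
```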

The trainable parameters, whose gradients are needed for gradient descent, are:

1. $w_1$
2. $w_2$
3. $b$

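Using $\frac{d}{dz}\tanh(z) = 1 - \tanh^2(z)$ and $\frac{\partial C}{\partial h_2} = -(y - h_2)$, the chain rule gives the gradient of the cost with respect to each parameter:

$\frac{\partial C}{\partial w_1} = -(1-\tanh^2(z_1))\,x\,w_2\,(1-\tanh^2(z_2))\,(y-h_2)$

$\frac{\partial C}{\partial w_2} = -(1-\tanh^2(z_2))\,(y-h_2)\,h_1$

$\frac{\partial C}{\partial b} = -(1-\tanh^2(z_2))\,(y-h_2)$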
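
A common way to gain confidence in hand-derived gradients like these is a finite-difference check: nudge each parameter by a small $\epsilon$ and compare the resulting change in $C$ with the analytic value. A minimal sketch, with hypothetical parameter values and data point; the helper names are illustrative and not part of the notebook.

```python
import numpy as np

def cost(params, x, y):
    """Cost C for one data point, given params = (w1, w2, b)."""
    w1, w2, b = params
    h1 = np.tanh(w1 * x)
    h2 = np.tanh(w2 * h1 + b)
    return 0.5 * (y - h2) ** 2

def analytic_grads(params, x, y):
    """The three gradient expressions above, evaluated at params."""
    w1, w2, b = params
    z1 = w1 * x
    h1 = np.tanh(z1)
    z2 = w2 * h1 + b
    h2 = np.tanh(z2)
    d2 = -(y - h2) * (1 - np.tanh(z2) ** 2)  # dC/dz2, shared by all three
    grad_w1 = d2 * w2 * (1 - np.tanh(z1) ** 2) * x
    grad_w2 = d2 * h1
    grad_b = d2
    return np.array([grad_w1, grad_w2, grad_b])

def numeric_grads(params, x, y, eps=1e-6):
    """Central differences: (C(p + eps*e_i) - C(p - eps*e_i)) / (2*eps)."""
    params = np.asarray(params, dtype=float)
    grads = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grads[i] = (cost(params + step, x, y) - cost(params - step, x, y)) / (2 * eps)
    return grads

# Hypothetical parameters and data point
params = (0.5, -1.2, 0.3)
x, y = 2.0, 0.25
print("analytic:", analytic_grads(params, x, y))
print("numeric: ", numeric_grads(params, x, y))
```

If the two printed vectors disagree beyond roughly $\epsilon^2$, one of the derived expressions (or its implementation) is wrong.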

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv("../nn_data.csv")

In [3]:
X = data['x'].values
y = data['y'].values
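
Because the output unit is a tanh, every prediction $h_2$ lies strictly between -1 and 1. For a target with $|y| > 1$ the per-sample cost therefore cannot drop below $\frac{1}{2}(|y|-1)^2$, whatever the weights. A minimal sketch of that lower bound, which can be applied to the `y` array loaded above; the example array and the helper name `cost_floor` are hypothetical.

```python
import numpy as np

def cost_floor(y):
    """Lower bound on 0.5 * (y - h2)**2 when h2 is a tanh output in (-1, 1).

    Targets in [-1, 1] give a bound of 0; targets outside that range sit
    at least |y| - 1 away from the nearest reachable output.
    """
    y = np.asarray(y, dtype=float)
    gap = np.maximum(np.abs(y) - 1.0, 0.0)
    return 0.5 * gap ** 2

# Hypothetical targets, for illustration only
targets = np.array([0.5, 3.0, -5.0])
print(cost_floor(targets))
```

Evaluating `cost_floor(y)` on the real targets indicates how much of the training loss is unavoidable with a tanh output; when it is large, rescaling `y` into (-1, 1) or using a linear output unit are common remedies.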

In [4]:
import random

In [17]:
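class NN:
    """Scalar network: one tanh hidden unit followed by one tanh output unit."""

    def __init__(self):
        # Parameters start at random values drawn uniformly from [0, 1)
        self.w1 = random.random()
        self.b = random.random()
        self.w2 = random.random()
        self.lr = 0.01  # learning rate for gradient descent

    def tanh(self, z):
        return np.tanh(z)

    def forward(self, x):  # x -> scalar, one data point
        # Cache intermediate values; backward() reuses them
        self.z1 = self.w1 * x
        self.h1 = self.tanh(self.z1)
        self.z2 = (self.w2 * self.h1) + self.b
        self.h2 = self.tanh(self.z2)
        return self.h2

    def backward(self, x, y):
        # Must be called right after forward(x), so the cached values belong to this x
        error = y - self.h2
        grad_act_l2 = 1 - self.tanh(self.z2) ** 2  # tanh'(z2)
        grad_act_l1 = 1 - self.tanh(self.z1) ** 2  # tanh'(z1)

        # Gradients of C = 0.5 * (y - h2)**2, matching the expressions above
        grad_w1 = -1 * grad_act_l1 * x * self.w2 * grad_act_l2 * error
        grad_w2 = -1 * grad_act_l2 * self.h1 * error
        grad_b = -1 * grad_act_l2 * error

        ## Apply gradient descent (all gradients computed before any update)
        self.w1 = self.w1 - self.lr * grad_w1
        self.w2 = self.w2 - self.lr * grad_w2
        self.b = self.b - self.lr * grad_b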
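
The `backward` method updates the parameters after every single data point, i.e. stochastic gradient descent with a batch size of one. As a hypothetical alternative (not what this notebook does), the same three gradient formulas can be evaluated for all samples at once with NumPy and averaged, giving full-batch gradient descent. The function names and demo data below are illustrative assumptions.

```python
import numpy as np

def batch_grads(w1, w2, b, X, y):
    """Mean gradients of C over all samples, using the per-sample formulas."""
    z1 = w1 * X
    h1 = np.tanh(z1)
    z2 = w2 * h1 + b
    h2 = np.tanh(z2)
    d2 = -(y - h2) * (1 - h2 ** 2)  # dC/dz2 for every sample
    grad_w1 = np.mean(d2 * w2 * (1 - h1 ** 2) * X)
    grad_w2 = np.mean(d2 * h1)
    grad_b = np.mean(d2)
    return grad_w1, grad_w2, grad_b

def batch_step(w1, w2, b, X, y, lr=0.01):
    """One full-batch gradient-descent update of all three parameters."""
    g1, g2, gb = batch_grads(w1, w2, b, X, y)
    return w1 - lr * g1, w2 - lr * g2, b - lr * gb

# Hypothetical data, for illustration only
X_demo = np.linspace(-2.0, 2.0, 5)
y_demo = np.tanh(0.8 * np.tanh(1.5 * X_demo) + 0.1)
w1, w2, b = 0.5, 0.5, 0.0
w1, w2, b = batch_step(w1, w2, b, X_demo, y_demo)
```

Full-batch updates give a smoother descent direction per step, while the per-sample updates in `NN` make many more (noisier) steps per pass over the data.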


In [18]:
network = NN()

In [19]:
print(f"The w1 is {network.w1}, w2 is {network.w2} and b is {network.b}, before the training")

The w1 is 0.7779787982788374, w2 is 0.18373682364514954 and b is 0.5715411087422217, before the training
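
Since `NN.__init__` draws its weights from `random.random()` without a seed, the initial (and therefore final) values printed here will differ from run to run. A minimal sketch of making the draws reproducible by seeding Python's `random` module first; the helper `init_params` and the seed value 42 are arbitrary assumptions.

```python
import random

def init_params(seed):
    """Draw w1, b, w2 in the same order as NN.__init__, after seeding."""
    random.seed(seed)
    return random.random(), random.random(), random.random()

first = init_params(42)
second = init_params(42)
print(first == second)  # prints True: same seed, same starting weights
```

In the notebook itself this amounts to calling `random.seed(42)` just before `network = NN()`.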


In [20]:
epochs = 10
for epoch in range(epochs):
    for (x, actual) in zip(X, y):
        pred = network.forward(x)
        loss = 0.5 * (actual - pred) ** 2
        network.backward(x, actual)
    # Note: `loss` holds the cost of the last sample in the epoch, not an epoch average
    print(f"Loss after epoch: {epoch+1} is: {loss}")
print(f"The w1 is {network.w1}, w2 is {network.w2} and b is {network.b}, after the training")

Loss after epoch: 1 is: 3886.34591195102
Loss after epoch: 2 is: 3885.433514366245
Loss after epoch: 3 is: 3885.184652229801
Loss after epoch: 4 is: 3885.069240852518
Loss after epoch: 5 is: 3885.0028096916367
Loss after epoch: 6 is: 3884.9596847555867
Loss after epoch: 7 is: 3884.929441890661
Loss after epoch: 8 is: 3884.907056443429
Loss after epoch: 9 is: 3884.889811112882
Loss after epoch: 10 is: 3884.876109246789
The w1 is 2.2497160597691, w2 is 3.45268228836755 and b is -0.11987448809306131, after the training
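
The loss printed in the training loop is the cost of the last data point in each epoch only, so it is a noisy indicator of progress. A minimal sketch of tracking the mean cost over the whole epoch instead. It re-implements the same per-sample forward and backward updates as plain functions so that it runs on its own; the function names, learning rate, epoch count and the dataset (targets generated by the same architecture with made-up weights, so they lie inside tanh's range) are all hypothetical.

```python
import random
import numpy as np

def forward(p, x):
    """Steps 1-4 for one scalar input; returns (h1, h2)."""
    h1 = np.tanh(p["w1"] * x)
    h2 = np.tanh(p["w2"] * h1 + p["b"])
    return h1, h2

def sgd_update(p, x, y, lr):
    """Per-sample update using the gradient expressions derived earlier."""
    h1, h2 = forward(p, x)
    d2 = -(y - h2) * (1 - h2 ** 2)  # dC/dz2
    grad_w1 = d2 * p["w2"] * (1 - h1 ** 2) * x
    grad_w2 = d2 * h1
    grad_b = d2
    p["w1"] -= lr * grad_w1
    p["w2"] -= lr * grad_w2
    p["b"] -= lr * grad_b

def train(X, y, epochs=50, lr=0.05, seed=0):
    random.seed(seed)
    p = {"w1": random.random(), "b": random.random(), "w2": random.random()}
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for x_i, y_i in zip(X, y):
            _, pred = forward(p, x_i)
            total += 0.5 * (y_i - pred) ** 2  # cost before this sample's update
            sgd_update(p, x_i, y_i, lr)
        epoch_losses.append(total / len(X))  # mean cost over the epoch
    return p, epoch_losses

# Hypothetical data generated by the same architecture (w1=1.5, w2=0.8, b=0.1)
X_demo = np.linspace(-2.0, 2.0, 40)
y_demo = np.tanh(0.8 * np.tanh(1.5 * X_demo) + 0.1)
params, losses = train(X_demo, y_demo)
for epoch in (0, len(losses) - 1):
    print(f"Mean loss after epoch {epoch + 1}: {losses[epoch]:.6f}")
```

Averaging over the epoch smooths out the sample-to-sample variation, so a flat curve more reliably signals that training has stalled.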
