## Backpropagation

**Backpropagation Algorithm**

This notebook illustrates the backprop algorithm for a relatively simple dataset. The network architecture is as follows:

1. $z_1 = w_1x$
2. $a_1 = tanh(z_1)$
3. $z_2 = w_2a_1+b$
4. $y =tanh(z_2)$
5. $C = \frac{1}{2}(y-t)^2$

Now the gradients needed would be for the following parameters:

1. $w_1$
2. $w_2$
3. $b$


The following gradient results will be needed:
- $$\frac{\partial(C)}{\partial(w_1)}=\frac{\partial z_1}{\partial w_1}*\frac{\partial f_1}{\partial z_1}*\frac{\partial z_2}{\partial f_1}*\frac{\partial C}{\partial z_2}= x*(1-tanh^2(z_1))*w_2*(tanh(z_2)-t)(1-tanh^2(z_2))$$

- $$\frac{\partial C}{\partial w_2} = \frac{\partial z_2}{\partial w_2}*\frac{\partial C}{\partial z_2} = tanh(z_1)*(tanh(z_2)-t)(1-tanh^2(z_2)$$

- $$\frac{\partial C}{\partial b} = \frac{\partial z2}{\partial b}*\frac{\partial C}{\partial z_2} = 1*(tanh(z_2)-t)(1-tanh^2(z_2)$$

In [1]:
import pandas as pd
data = pd.read_csv("./data/nn_data.csv")

In [2]:
x = data['x'].values
y = data['y'].values

In [3]:
import math
class NN():
    def __init__(self):
        self.w1 = 1
        self.w2 = 1
        self.b = 1
        self.eta = 0.01
    def forward(self,x):
        self.z1 = self.w1*x
        self.a1 = math.tanh(self.z1)
        self.z2 = self.w1*self.a1+self.b
        y = math.tanh(self.z2)
        return y 
    def backward(self,x,y):
        grad_w1 = x*(1-math.tanh(self.z1)**2)*self.w2*(math.tanh(self.z2)-y)*(1-math.tanh(self.z2)**2)
        grad_w2 = math.tanh(self.z1)*(math.tanh(self.z2)-y)*(1-math.tanh(self.z2)**2)
        grad_b = (math.tanh(self.z2)-y)*(1-math.tanh(self.z2)**2)
        self.w1 = self.w1-self.eta*grad_w1
        self.w2 = self.w2-self.eta*grad_w2
        self.b = self.b -self.eta*grad_b     

In [4]:
model = NN()
model.w1

1

In [5]:
epochs = 10
for epoch in range(epochs):
    for X,Y in zip(x,y):
        pred = model.forward(X)
        print(f"Prediction is {round(pred,2)}, actual value is {Y.round(2)}")
        model.backward(X,Y)

Prediction is 0.78, actual value is 3.34
Prediction is 0.95, actual value is 78.46
Prediction is 0.15, actual value is -98.71
Prediction is -0.73, actual value is -57.87
Prediction is -0.8, actual value is -36.3
Prediction is -0.95, actual value is -124.09
Prediction is -0.95, actual value is -73.2
Prediction is -0.91, actual value is -34.58
Prediction is 0.5, actual value is 34.09
Prediction is -0.97, actual value is -101.13
Prediction is -0.97, actual value is -93.24
Prediction is -0.88, actual value is -23.1
Prediction is 0.75, actual value is 38.04
Prediction is 0.86, actual value is 39.76
Prediction is -0.97, actual value is -89.14
Prediction is -0.02, actual value is 3.34
Prediction is 0.94, actual value is 78.46
Prediction is -0.97, actual value is -98.71
Prediction is -0.97, actual value is -57.87
Prediction is -0.95, actual value is -36.3
Prediction is -0.98, actual value is -124.09
Prediction is -0.98, actual value is -73.2
Prediction is -0.96, actual value is -34.58
Predicti

### Using autograd libraries to compute gradients

- Implement Gradient Descent using auto-grad
- Estimate Linear and Logistic Regression using auto-grad

In [6]:
import torch

Minmize $X^2+4X$

In [7]:
x=torch.tensor(0.0,requires_grad=True)

In [8]:
z=x*x+4*x ### forward pass

In [9]:
z.backward()

In [10]:
x.grad ### Grad of x at 0 wrt z

tensor(4.)

In [11]:
z=x*x+4*x

In [12]:
z.backward()

In [13]:
x.grad ### 

tensor(8.)

In [14]:
x.grad.zero_()

tensor(0.)

In [15]:
x.grad

tensor(0.)

In [16]:
x=torch.tensor(0.0,requires_grad=True)
lr=0.01
for i in range(10):
    z=x*x+4*x
    z.backward() ### dz/dx
    with torch.no_grad(): ##Disables any gradient computation
        x-=lr*x.grad
        x.grad.zero_()
    print(f"Z {z}, x: {x}")

Z 0.0, x: -0.03999999910593033
Z -0.15839999914169312, x: -0.07919999957084656
Z -0.31052735447883606, x: -0.11761599779129028
Z -0.4566304683685303, x: -0.15526367723941803
Z -0.5969479084014893, x: -0.19215840101242065
Z -0.7317087650299072, x: -0.22831523418426514
Z -0.8611330986022949, x: -0.2637489140033722
Z -0.9854321479797363, x: -0.29847392439842224
Z -1.104809045791626, x: -0.3325044512748718
Z -1.2194585800170898, x: -0.3658543527126312


In [17]:
### Autodiff
### xy=750=>x=750/y
## x+10y Minimize this
### 750/y+y*10

In [18]:
x=torch.tensor(1.0,requires_grad=True)
lr=0.01
for i in range(100):
    z=(750/x)+x*10
    z.backward()
    with torch.no_grad():
        x-=lr*x.grad
        x.grad.zero_()
    print(f"Z: {z}, x: {x}")

Z: 760.0, x: 8.399999618530273
Z: 173.2857208251953, x: 8.406291961669922
Z: 173.28179931640625, x: 8.41242504119873
Z: 173.27809143066406, x: 8.418403625488281
Z: 173.27456665039062, x: 8.42423152923584
Z: 173.27120971679688, x: 8.429913520812988
Z: 173.2680206298828, x: 8.435453414916992
Z: 173.26498413085938, x: 8.4408540725708
Z: 173.26211547851562, x: 8.446120262145996
Z: 173.25936889648438, x: 8.451254844665527
Z: 173.25677490234375, x: 8.45626163482666
Z: 173.25428771972656, x: 8.46114444732666
Z: 173.25193786621094, x: 8.465906143188477
Z: 173.24969482421875, x: 8.470550537109375
Z: 173.24755859375, x: 8.475079536437988
Z: 173.24554443359375, x: 8.479496955871582
Z: 173.2436065673828, x: 8.483805656433105
Z: 173.2417755126953, x: 8.488008499145508
Z: 173.24002075195312, x: 8.492108345031738
Z: 173.23837280273438, x: 8.496108055114746
Z: 173.23678588867188, x: 8.500009536743164
Z: 173.23529052734375, x: 8.503815650939941
Z: 173.23385620117188, x: 8.507528305053711
Z: 173.2324981

In [19]:
import pandas as pd
reg=pd.read_csv("./data/regression.csv").dropna()

In [20]:
reg.head(2)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin
0,18.0,8.0,307.0,130.0,3504.0,12.0,70.0,1.0
1,15.0,8.0,350.0,165.0,3693.0,11.5,70.0,1.0


In [21]:
### mpg=b0+b1*cyl ## as a matrix product?
### dloss/db0, dloss/db1
#### loss=f(b0,b1)
##loss=eq
##loss.backward()
##b0.grad
##b1.grad
X=reg[['cylinders']].values
y=reg[['mpg']].values

In [22]:
## W dim?=> dloss/dW,dloss/db
W=torch.randn(1,1,requires_grad=True)
b=torch.randn(1,requires_grad=True)

In [23]:
X=torch.tensor(X)
y=torch.tensor(y)

In [24]:
lr=0.01
for i in range(100):
    diff=y-torch.matmul(X.float(),W)+b
    loss=sum(diff*diff)/y.shape[0]
    loss.backward()
    with torch.no_grad():
        W-=lr*W.grad
        b-=lr*b.grad
        W.grad.zero_()
        b.grad.zero_()
    print(f"Loss: {loss.item()}, W: {W.detach().numpy()}, b: {b.detach().numpy()}")

Loss: 1232.7423724703717, W: [[1.897617]], b: [0.70098984]
Loss: 299.83267440683056, W: [[3.0871732]], b: [0.42572457]
Loss: 200.88668794171477, W: [[3.4652]], b: [0.28614816]
Loss: 189.93863059455396, W: [[3.5796304]], b: [0.19073406]
Loss: 188.2774799440126, W: [[3.608451]], b: [0.10975139]
Loss: 187.59797955950813, W: [[3.609477]], b: [0.03354244]
Loss: 187.02379210418584, W: [[3.6014888]], b: [-0.04103003]
Loss: 186.46244312247816, W: [[3.5905867]], b: [-0.11498526]
Loss: 185.9041695255456, W: [[3.5787525]], b: [-0.18865451]
Loss: 185.34793517195828, W: [[3.56663]], b: [-0.2621455]
Loss: 184.79362537959506, W: [[3.5544276]], b: [-0.33549333]
Loss: 184.24121948872394, W: [[3.542214]], b: [-0.40870962]
Loss: 183.6907144867814, W: [[3.5300107]], b: [-0.48179823]
Loss: 183.14209987618685, W: [[3.5178246]], b: [-0.5547606]
Loss: 182.59536999680304, W: [[3.5056586]], b: [-0.62759733]
Loss: 182.05051716281997, W: [[3.493513]], b: [-0.7003088]
Loss: 181.50753823176538, W: [[3.4813886]], b:

In [25]:
### linear classifier.
### Can you estimate a linear classifier using autodiff
### Loss for linear classifier?
### log loss as a function of W and b, p=f(X,W,b)

In [26]:
cls=pd.read_csv("./data/classification.csv")
cls.head()

Unnamed: 0,No_pregnant,Plasma_glucose,Blood_pres,Skin_thick,Serum_insu,BMI,Diabetes_func,Age,Class
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [27]:
X=cls[['No_pregnant']].values
y=cls[['Class']].values

In [28]:
### loss=−[𝑦𝑙𝑜𝑔(𝑝+tol)+(1−𝑦)𝑙𝑜𝑔(1−𝑝+tol)]
### p=1/(1+e^-z)
### z= XW+b

In [29]:
X=torch.tensor(X)
y=torch.tensor(y)
W=torch.randn(1,1,requires_grad=True)
b=torch.randn(1,requires_grad=True)
tol=0.0000000001
lr=0.01
for i in range(100):
    z=torch.matmul(X.float(),W)+b
    p=1.0/(1+torch.exp(-z))
    loss=-(y*torch.log(p+tol)+(1-y)*torch.log(1-p+tol)).mean()
    loss.backward()
    with torch.no_grad():
        W-=lr*W.grad
        b-=lr*b.grad
        W.grad.zero_()
        b.grad.zero_()
    print(f"Loss: {loss.item()}, W: {W}, b: {b}")

Loss: 2.6268749237060547, W: tensor([[-1.6294]], requires_grad=True), b: tensor([1.6334], requires_grad=True)
Loss: 2.603846788406372, W: tensor([[-1.6143]], requires_grad=True), b: tensor([1.6345], requires_grad=True)
Loss: 2.5808122158050537, W: tensor([[-1.5991]], requires_grad=True), b: tensor([1.6356], requires_grad=True)
Loss: 2.5577785968780518, W: tensor([[-1.5840]], requires_grad=True), b: tensor([1.6367], requires_grad=True)
Loss: 2.5347530841827393, W: tensor([[-1.5689]], requires_grad=True), b: tensor([1.6377], requires_grad=True)
Loss: 2.5117435455322266, W: tensor([[-1.5537]], requires_grad=True), b: tensor([1.6388], requires_grad=True)
Loss: 2.4887564182281494, W: tensor([[-1.5386]], requires_grad=True), b: tensor([1.6398], requires_grad=True)
Loss: 2.4657962322235107, W: tensor([[-1.5235]], requires_grad=True), b: tensor([1.6409], requires_grad=True)
Loss: 2.4428679943084717, W: tensor([[-1.5084]], requires_grad=True), b: tensor([1.6419], requires_grad=True)
Loss: 2.419

## Using tensorflow to compute gradients 

$y = x^2 +4x$

$\frac{dy}{dx} = 2x+4$

In [30]:
import tensorflow as tf
x = tf.Variable(3.0)

Metal device set to: Apple M1


2021-12-23 16:50:52.588684: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-12-23 16:50:52.588799: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [31]:
with tf.GradientTape() as tape:
    y = x**2+4*x

In [32]:
dy_dx = tape.gradient(y, x) ## compute dy/dx

In [33]:
dy_dx.numpy()

10.0

In [34]:
dy_dx

<tf.Tensor: shape=(), dtype=float32, numpy=10.0>

Write the gradient descent using ```GradientTape()```

In [35]:
x = tf.Variable(0.0)
lr = 0.1
for i in range(10):
    with tf.GradientTape() as tape:
        y = x**2+4*x
    grad = tape.gradient(y,x)
    x.assign_sub(lr*grad)
    print(x.numpy())

-0.4
-0.72
-0.9760001
-1.1808001
-1.34464
-1.4757121
-1.5805696
-1.6644558
-1.7315646
-1.7852517


### Linear Regression with basic tf

In [36]:
reg.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin
0,18.0,8.0,307.0,130.0,3504.0,12.0,70.0,1.0
1,15.0,8.0,350.0,165.0,3693.0,11.5,70.0,1.0
2,18.0,8.0,318.0,150.0,3436.0,11.0,70.0,1.0
3,16.0,8.0,304.0,150.0,3433.0,12.0,70.0,1.0
4,17.0,8.0,302.0,140.0,3449.0,10.5,70.0,1.0


In [37]:
X=reg[['cylinders']].values
y=reg[['mpg']].values

In [38]:
X = tf.constant(X,dtype='float32')
y = tf.constant(y,dtype='float32')

In [39]:
def loss(y_pred,y):
    return tf.reduce_mean(tf.square(y-y_pred))

In [40]:
W = tf.Variable(tf.random.normal(shape=(1,1),dtype='float32'))
B = tf.Variable(tf.random.normal(shape=(1,),dtype='float32'))
lr = 0.01
for i in range(10):
    with tf.GradientTape() as tape:
        y_pred = tf.reshape(X@W+B,shape=X.shape[0])
        error = loss(y_pred,y)
    dw,db = tape.gradient(error,[W,B])
    W.assign_sub(lr*dw)
    B.assign_sub(lr*db)
    print(error)

tf.Tensor(830.1917, shape=(), dtype=float32)
tf.Tensor(184.82008, shape=(), dtype=float32)
tf.Tensor(116.57598, shape=(), dtype=float32)
tf.Tensor(109.228386, shape=(), dtype=float32)
tf.Tensor(108.3068, shape=(), dtype=float32)
tf.Tensor(108.06379, shape=(), dtype=float32)
tf.Tensor(107.892845, shape=(), dtype=float32)
tf.Tensor(107.730034, shape=(), dtype=float32)
tf.Tensor(107.56855, shape=(), dtype=float32)
tf.Tensor(107.40773, shape=(), dtype=float32)


In [41]:
## Class Excercise: Use tensorflow api to write the gradients for logistic regresssion