## pytorch basics:

- How to compute gradients (derivatives) of simple functions
- How to do gradient descent using pytorch

## tensorflow basics:

- How to compute gradients (derivatives) of simple functions
- How to do gradient descent using tf


Minimize $X^2+X$

In [1]:
import torch

In [2]:
x = torch.tensor(0.0,requires_grad=True)

x = x -lr*grad (Disable any gradient computations)

In [3]:
x

tensor(0., requires_grad=True)

In [4]:
z = x**2+x ### forward pass

In [5]:
z.backward()

In [6]:
x.grad

tensor(1.)

$\frac{dz}{dx} = 2x+1$

In [7]:
z = x**2+x ## forward pass

In [8]:
z.backward()

In [9]:
x.grad

tensor(2.)

$\frac{dz}{dx}_{iter1}+\frac{dz}{dx}_{iter2}$

In [10]:
### Rewrite again
x = torch.tensor(0.0,requires_grad=True)
z = x**2+x ### forward pass
z.backward()
print(x.grad)
x.grad.zero_() ### cleared the gradients from the previous forward pass
z = x**2+x ### forward pass
z.backward()
print(x.grad)

tensor(1.)
tensor(1.)


$z = X^2+4X-8$

$y = x-lr*x.grad$

dont want $\frac{dy}{dx}$ to be computed

Use pytorch and find the $\frac{dz}{dx}_{x=4}$

Once you have computed the gradient, try to do gradient descent and minimize z

Update equation for gradient descent:

$x := x - \eta*grad$

$f(x_1,x_2,x_3)$
$f.backward()$
$\frac{\partial(f)}{x_1}$
$\frac{\partial(f)}{x_2}$
$\frac{\partial(f)}{x_3}$

In [11]:
x = torch.tensor(4.0,requires_grad=True)
eta = 0.01
for i in range(10):
    z = x**2+4*x-8
    z.backward()
    with torch.no_grad():## stop any kind of gradient tracking
        x -= eta*x.grad ## can't write x=x-lr*grad
        x.grad.zero_() ## clear the gradients
    print(f"z :{z} x: {x}")

z :24.0 x: 3.880000114440918
z :22.57440185546875 x: 3.7624001502990723
z :21.20525550842285 x: 3.6471521854400635
z :19.89032745361328 x: 3.5342092514038086
z :18.627471923828125 x: 3.423525094985962
z :17.414623260498047 x: 3.3150546550750732
z :16.249805450439453 x: 3.2087535858154297
z :15.131114959716797 x: 3.1045784950256348
z :14.056720733642578 x: 3.0024869441986084
z :13.02487564086914 x: 2.902437210083008


Convert the for loop given above into a while loop and terminate that loop based on some thershold value

In [12]:
x = torch.tensor(4.0,requires_grad=True) 
eta = 0.01
converged = False
tol = 0.0001
while not converged:
    z = x**2+4*x-8 # z=f(x)
    y = z**3
    z.backward() ## dz/dx
    with torch.no_grad():## stop any kind of gradient tracking
        x -= eta*x.grad ## can't write x=x-lr*grad ## y=x-eta*grad, i am not tracking the gradients for any calcluation done in this block, !dy/dx
        x.grad.zero_()
        z_new = x**2+4*x-8
    if abs(z_new-z)<=tol:
        converged = True
print(z,x)  

tensor(-11.9975, grad_fn=<SubBackward0>) tensor(-1.9510, requires_grad=True)


### Linear Regression using pytorch
$y = w_0 + w_1x$

- Define a tensor for $w_0$ and $w_1$
- You want to calculate $\frac{dLoss}{dw_0}$ and $\frac{dLoss}{dw_1}$
- Define an expression for loss, so that can call loss.backward(), and then find out $w_0.grad$ and $w_1.grad$


In [13]:
import pandas as pd

In [14]:
data = pd.read_csv("../Codes/data/regression.csv")

In [15]:
data.isnull().sum()

mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
year            0
origin          0
dtype: int64

In [16]:
data.head(2)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin
0,18.0,8.0,307.0,130.0,3504.0,12.0,70.0,1.0
1,15.0,8.0,350.0,165.0,3693.0,11.5,70.0,1.0


In [20]:
X,y = data['cylinders'].values, data['mpg'].values

In [21]:
X = torch.tensor(X)
y = torch.tensor(y)

In [27]:
w0 = torch.tensor(1.0,requires_grad=True)
w1 = torch.tensor(1.0,requires_grad=True)
def predict(w0,w1,x):
    return w0+w1*x 
def loss_fn(actual,preds):
    error = (actual-preds)
    error = error**2
    return error.mean()
converged = False
tol = 0.01
lr = 0.01
while not converged:
    preds = predict(w0,w1,X)
    loss = loss_fn(y,preds)
    loss.backward()
    with torch.no_grad():
        w0-=lr*w0.grad
        w1-=lr*w1.grad
        w0.grad.zero_()
        w1.grad.zero_()
        new_loss = loss_fn(y,predict(w0,w1,X))
    if abs(new_loss-loss)<=tol:
        converged = True
print(loss,w0,w1)

tensor(27.1987, dtype=torch.float64, grad_fn=<MeanBackward0>) tensor(37.2327, requires_grad=True) tensor(-2.6050, requires_grad=True)


W = [w0,
     w1,
     w2,
     w3]
X = [[1],cylinders,displacement,hp]
preds = XW
loss.backward()
W.grad = []

In [28]:
X = data[['cylinders','displacement']]
y = data['mpg'].values

In [30]:
X['ones'] = 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X['ones'] = 1


In [33]:
X = X[['ones','cylinders','displacement']].values

In [35]:
W = torch.ones(3,1,requires_grad=True)

In [42]:
W

torch.float32

In [44]:
X = torch.tensor(X,dtype=torch.float)

  X = torch.tensor(X,dtype=torch.float)


In [46]:
preds = X@W

In [47]:
y = torch.tensor(y,dtype=torch.float)

In [49]:
l = loss_fn(y,preds)

In [50]:
l.backward()

In [51]:
W.grad

tensor([[  352.7322],
        [ 2266.2466],
        [90253.7266]])

### tensorflow

In [52]:
import tensorflow as tf

In [57]:
tf.__version__

'2.7.0'

Minimize $X^2+X$

In [53]:
x = tf.Variable(0.0)

Metal device set to: Apple M1


2022-01-07 14:50:37.680593: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-07 14:50:37.681536: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


In [54]:
with tf.GradientTape() as tape:
    y = x**2+x

In [55]:
dy_dx = tape.gradient(y,x) ## dy/dx

In [56]:
dy_dx.numpy()

1.0

x = x -lr*grad

In [None]:
lr = 0.01
x.assign_sub(lr*dy_dx) # 