# Autograd

Autograd is a core component of PyTorch that provides automatic differentiation for tensor operations. It computes gradients automatically, which is essential for training machine learning models with optimization algorithms like gradient descent.

Let's say we want to calculate the derivative of $y = x^2$ with respect to $x$. Since $dy/dx = 2x$, we can write a function that takes a value of x and returns the derivative at that point.

In [10]:
def dy_dx(x):
    return 2 * x

In [11]:
dy_dx(3)

6
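
A hand-derived derivative like this can be sanity-checked numerically. The following is a minimal sketch using a central finite difference; the helper name `numerical_derivative` and the step size `h` are illustrative assumptions, not part of the original notebook.

```python
def dy_dx(x):
    """Analytic derivative of y = x**2."""
    return 2 * x


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


analytic = dy_dx(3)
approx = numerical_derivative(lambda v: v ** 2, 3.0)

# The two values should agree up to floating-point error.
print(analytic, approx)
```

The two results should match closely, which suggests the hand-written formula is right. Autograd removes the need to derive the formula by hand at all.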

## Using Autograd

In [3]:
import torch

In [4]:
x = torch.tensor(3.0, requires_grad=True)

In [5]:
y = x**2

In [6]:
x

tensor(3., requires_grad=True)

In [7]:
y

tensor(9., grad_fn=<PowBackward0>)

In [8]:
y.backward()

In [9]:
x.grad

tensor(6.)
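
The `grad_fn=<PowBackward0>` shown in the output of y is how autograd remembers which operation produced a tensor. As a small sketch of what can be inspected (variable names follow the cells above), the snippet below compares the leaf tensor x with the computed tensor y.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

# x was created directly by the user: it is a leaf and has no grad_fn.
print(x.is_leaf, x.grad_fn)

# y is the result of an operation: it is not a leaf and records its grad_fn.
print(y.is_leaf, y.grad_fn)

# backward() walks the recorded graph from y to x and fills in x.grad.
y.backward()
print(x.grad)
```

Gradients are stored on leaf tensors that have requires_grad=True, which is why we read x.grad rather than y.grad.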

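## Chain Rule

Now let $y = x^2$ and $z = \sin(y)$. By the chain rule, $dz/dx = (dz/dy)(dy/dx) = \cos(x^2) \cdot 2x$.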

In [12]:
import math

def dz_dx(x):
    return 2 * x * math.cos(x**2)

In [13]:
dz_dx(3)

-5.466781571308061

## Using Autograd

In [33]:
x = torch.tensor(3.0, requires_grad=True)

In [34]:
y = x**2

In [35]:
z = torch.sin(y)

In [36]:
x

tensor(3., requires_grad=True)

In [37]:
y

tensor(9., grad_fn=<PowBackward0>)

In [38]:
z

tensor(0.4121, grad_fn=<SinBackward0>)

In [39]:
z.backward()

In [41]:
x.grad

tensor(-5.4668)

In [42]:
dz_dx(4)

-7.661275842587077

In [43]:
x = torch.tensor(4.0, requires_grad=True)

In [45]:
y = x**2

In [46]:
z = torch.sin(y)

In [47]:
z.backward()

In [48]:
x.grad

tensor(-7.6613)
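
The comparison above can be wrapped in a small function so that several inputs are checked at once. This is a sketch only; `autograd_dz_dx` is a hypothetical helper name introduced here for illustration.

```python
import math

import torch


def dz_dx(x):
    """Manual chain-rule derivative of z = sin(x**2)."""
    return 2 * x * math.cos(x ** 2)


def autograd_dz_dx(value):
    """Same derivative, computed by autograd (hypothetical helper)."""
    x = torch.tensor(value, requires_grad=True)
    z = torch.sin(x ** 2)
    z.backward()
    return x.grad.item()


for value in (3.0, 4.0):
    # The manual and autograd results should agree up to float32 precision.
    print(value, dz_dx(value), autograd_dz_dx(value))
```

Because a fresh tensor is created on every call, no gradient carries over from one input to the next.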

In [50]:
## Inputs
x = torch.tensor(6.7) # input feature
y = torch.tensor(0.0) # target variable (binary)

w = torch.tensor(1.0) # weight
b = torch.tensor(0.0) # bias

In [51]:
# Binary cross-entropy loss for a scalar prediction
def binary_cross_entropy_loss(prediction, target):
    epsilon = 1e-8 # to prevent log(0)
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return - (target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))

In [52]:
# Forward pass
z = w * x + b # weighted sum (linear part)
y_pred = torch.sigmoid(z)

# compute binary cross entropy loss
loss = binary_cross_entropy_loss(y_pred, y)

In [53]:
loss

tensor(6.7012)

In [54]:
# Derivatives

## loss with respect to the prediction (y_pred)
dloss_dy_pred = (y_pred - y) / (y_pred * (1 - y_pred))

## prediction (y_pred) with respect to z (sigmoid derivatives)
dy_pred_dz = y_pred * (1 - y_pred)

## z with respect to w and b
dz_dw = x 
dz_db = 1


dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw 
dL_db = dloss_dy_pred * dy_pred_dz * dz_db

In [55]:
print(f"Manual Gradient of loss w.r.t weight (dw): {dL_dw}")
print(f"Manual Gradient of loss w.r.t bias (db): {dL_db}")

Manual Gradient of loss w.r.t weight (dw): 6.691762447357178
Manual Gradient of loss w.r.t bias (db): 0.998770534992218


## Using Autograd

In [56]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)

In [57]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [58]:
w

tensor(1., requires_grad=True)

In [59]:
b

tensor(0., requires_grad=True)

In [60]:
z = w * x + b
z

tensor(6.7000, grad_fn=<AddBackward0>)

In [61]:
y_pred = torch.sigmoid(z)
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [62]:
loss = binary_cross_entropy_loss(y_pred, y)
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [63]:
loss.backward()

In [64]:
w.grad

tensor(6.6918)

In [65]:
b.grad

tensor(0.9988)
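
The manual derivation can be simplified: the factor $y_{pred}(1 - y_{pred})$ cancels, leaving $dL/dz = y_{pred} - y$, so $dL/dw = (y_{pred} - y)\,x$ and $dL/db = y_{pred} - y$. The sketch below checks this shortcut against autograd. It redefines the loss function from above so the block is self-contained; the names `manual_dw` and `manual_db` are illustrative.

```python
import torch


def binary_cross_entropy_loss(prediction, target):
    epsilon = 1e-8  # to prevent log(0)
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction)
             + (1 - target) * torch.log(1 - prediction))


x = torch.tensor(6.7)
y = torch.tensor(0.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Forward pass and autograd backward pass
y_pred = torch.sigmoid(w * x + b)
loss = binary_cross_entropy_loss(y_pred, y)
loss.backward()

# Simplified manual gradients, computed without tracking
with torch.no_grad():
    manual_dw = (y_pred - y) * x
    manual_db = y_pred - y

print(torch.allclose(w.grad, manual_dw, rtol=1e-3))
print(torch.allclose(b.grad, manual_db, rtol=1e-3))
```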

The same operations work on vectors as well as scalars.

In [67]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x

tensor([1., 2., 3.], requires_grad=True)

In [69]:
y = (x**2).mean()
y

tensor(4.6667, grad_fn=<MeanBackward0>)

In [70]:
y.backward()

In [71]:
x.grad

tensor([0.6667, 1.3333, 2.0000])
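
For $y = \frac{1}{n}\sum_i x_i^2$, each partial derivative is $\partial y / \partial x_i = 2x_i / n$, which explains the values above with $n = 3$. A minimal sketch that checks this formula against autograd (the name `expected` is illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).mean()
y.backward()

# Analytic gradient of the mean of squares: 2 * x_i / n
expected = 2 * x.detach() / x.numel()

print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))
```

Note that backward() is called on a scalar here. The mean reduces the vector to a single value, so no extra gradient argument is needed.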

## Clearing Grad

In [84]:
x = torch.tensor(2.0, requires_grad=True)
x

tensor(2., requires_grad=True)

In [103]:
y = x ** 2

In [104]:
y.backward()

In [105]:
x.grad

tensor(8.)

After a single forward and backward pass, the gradient is 4. If we run the forward and backward passes again, the gradient becomes 8 (as in the output above), because autograd accumulates gradients into x.grad instead of overwriting them.

So we need to clear the gradient before the next backward pass.

In [99]:
x.grad.zero_()

tensor(0.)
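
The accumulation behaviour is easier to see in a loop. The sketch below runs the same forward and backward pass three times, first without resetting and then with a reset after each pass; the list names are illustrative.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Without clearing: each backward pass adds 4 to the stored gradient.
grads_without_reset = []
for _ in range(3):
    y = x ** 2
    y.backward()
    grads_without_reset.append(x.grad.item())
print(grads_without_reset)  # [4.0, 8.0, 12.0]

x.grad.zero_()

# With clearing: the gradient is reset in place after every pass.
grads_with_reset = []
for _ in range(3):
    y = x ** 2
    y.backward()
    grads_with_reset.append(x.grad.item())
    x.grad.zero_()
print(grads_with_reset)  # [4.0, 4.0, 4.0]
```

In a training loop, the same reset is usually done once per iteration, before or after the backward pass.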

## Disabling Gradient Tracking

- Option 1: requires_grad_(False)
- Option 2: detach()
- Option 3: torch.no_grad()

In [107]:
x.requires_grad_(False)

tensor(2.)

In [108]:
x

tensor(2.)

In [109]:
y = x ** 2
y

tensor(4.)

In [112]:
y.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

In [110]:
x = torch.tensor(3.0, requires_grad=True)
x

tensor(3., requires_grad=True)

In [111]:
z = x.detach()
z

tensor(3.)

In [113]:
y = x ** 2
y

tensor(9., grad_fn=<PowBackward0>)

In [114]:
with torch.no_grad():
    y = x ** 2

In [115]:
y

tensor(9.)
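
The three options differ in scope. As a side-by-side sketch (variable names such as `y_detached` and `y_no_grad` are illustrative), detach() returns a new untracked tensor, torch.no_grad() disables tracking only inside the block, and requires_grad_(False) switches tracking off on the tensor itself.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Option 2: detach() gives a new tensor that is cut off from the graph.
z = x.detach()
y_detached = z ** 2
print(z.requires_grad, y_detached.requires_grad)  # False False

# Option 3: no_grad() disables tracking only inside the with-block.
with torch.no_grad():
    y_no_grad = x ** 2
print(y_no_grad.requires_grad)  # False
print(x.requires_grad)          # True, x itself is unchanged

# Option 1: requires_grad_(False) modifies x in place.
x.requires_grad_(False)
print(x.requires_grad)          # False
```

In all three cases the resulting tensor has no grad_fn, so calling backward() on it raises the RuntimeError shown earlier.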