### Why we need Autograd

As nested functions get more complex, computing their derivatives by hand becomes harder.

**Training process of a neural network**
1. Forward pass -> Compute the output of the network given an input
2. Calculate loss -> Calculate the loss function to quantify the error
3. Backward pass -> Compute gradients of the loss with respect to the parameters
4. Update parameters -> Adjust the parameters using an optimization algorithm (e.g. gradient descent)
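The four steps above map onto a few lines of PyTorch. The sketch below is a minimal illustration, not a definitive implementation: the toy data (targets following y = 2x), the single weight `w`, the learning rate `lr = 0.01`, the mean squared error loss and the 200 steps are all hypothetical choices made for this example.

```python
import torch

# Hypothetical toy data: the targets follow y = 2x exactly
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2 * x

w = torch.tensor(0.0, requires_grad=True)  # the only trainable parameter
lr = 0.01                                  # learning rate (an arbitrary choice)

for step in range(200):
    y_pred = w * x                         # 1. forward pass
    loss = ((y_pred - y) ** 2).mean()      # 2. loss: mean squared error
    loss.backward()                        # 3. backward pass: fills w.grad
    with torch.no_grad():
        w -= lr * w.grad                   # 4. gradient descent update
    w.grad.zero_()                         # clear the gradient for the next step

print(w.item())  # close to 2.0
```

The update happens inside `torch.no_grad()` so that the parameter change itself is not recorded in the computation graph, and `w.grad.zero_()` stops gradients from one step leaking into the next.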

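A neural network acts as a nested function: each layer is applied to the output of the previous layer, and the loss is applied to the final output.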

Autograd -> Autograd is a core component of PyTorch that provides automatic differentiation for tensor operations. It enables gradient computation, which is essential for training machine learning models using optimization algorithms like gradient descent.
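Eg. 3 below is a small instance of such a nested function: a single neuron $z = wx + b$, then the sigmoid $\hat{y} = \sigma(z)$, then the binary cross-entropy loss $L(\hat{y}, y)$. The chain rule that autograd applies for us multiplies the local derivatives along the path from the loss back to each parameter:

$$
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w},
\qquad
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial b},
\qquad
\frac{\partial z}{\partial w} = x, \quad \frac{\partial z}{\partial b} = 1
$$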

In [1]:
import torch

#### Eg. 1

In [2]:
x = torch.tensor(5.0, requires_grad=True)
# requires_grad defaults to False. With requires_grad=True, PyTorch records every operation on x,
# so that calling .backward() later computes gradients with respect to x automatically

In [3]:
y = x**2

In [4]:
x

tensor(5., requires_grad=True)

In [5]:
y # Because x has requires_grad=True, PyTorch builds a computation graph in the background: x -> square -> y
# grad_fn=<PowBackward0> is the node that differentiates the square operation during the backward pass

tensor(25., grad_fn=<PowBackward0>)
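The graph described above can be inspected directly. This is a small sketch using the tensor attributes `is_leaf`, `grad_fn` and `grad_fn.next_functions`: `x` is a leaf (created by the user), `y` is an intermediate result, and the node behind `y` points back to an `AccumulateGrad` node that writes into `x.grad`.

```python
import torch

x = torch.tensor(5.0, requires_grad=True)
y = x ** 2

print(x.is_leaf, y.is_leaf)        # True False
print(type(y.grad_fn).__name__)    # PowBackward0
# Each entry is (node, input_index); for the leaf x the node accumulates into x.grad
print(y.grad_fn.next_functions)
```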

In [6]:
y.backward() # Backward pass: computes dy/dx and stores it in x.grad

In [7]:
x.grad # dy/dx = 2x = 10 at x = 5

tensor(10.)
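A quick way to sanity-check an autograd result is a numerical derivative. The plain-Python sketch below approximates the derivative of x**2 at x = 5 with a central difference; the helper name `central_difference` and the step size `h = 1e-5` are illustrative choices, not part of PyTorch.

```python
def f(x):
    return x ** 2

def central_difference(func, x, h=1e-5):
    # Approximates df/dx as (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

print(central_difference(f, 5.0))  # close to 10.0, matching x.grad
```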

#### Eg. 2

In [8]:
x = torch.tensor(5.0, requires_grad=True)
y = x**2
z = torch.sin(y)

In [9]:
x

tensor(5., requires_grad=True)

In [10]:
y

tensor(25., grad_fn=<PowBackward0>)

In [11]:
z

tensor(-0.1324, grad_fn=<SinBackward0>)

In [12]:
z.backward()

In [13]:
# dz/dx = cos(x**2) * 2x
# y.grad is not available: by default PyTorch keeps .grad only for leaf tensors like x,
# not for intermediate results like y (call y.retain_grad() before backward() to keep it)
x.grad

tensor(9.9120)
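The value 9.9120 is the chain rule at work: dz/dx = dz/dy * dy/dx = cos(x²) * 2x. The sketch below checks this by hand and also calls `y.retain_grad()` before the backward pass, which asks PyTorch to keep the gradient of the intermediate tensor `y` (dz/dy = cos(25)) instead of discarding it.

```python
import math
import torch

x = torch.tensor(5.0, requires_grad=True)
y = x ** 2
y.retain_grad()            # keep .grad for this non-leaf tensor
z = torch.sin(y)
z.backward()

# Chain rule by hand: dz/dx = cos(x**2) * 2x
manual = math.cos(5.0 ** 2) * 2 * 5.0

print(y.grad)              # dz/dy = cos(25)
print(x.grad, manual)      # both about 9.912
```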

#### Eg. 3

##### Doing Manually

In [14]:
x = torch.tensor(6.7) # Input Feature
y = torch.tensor(0.0) # True label (binary)

w = torch.tensor(1.0) # Weight
b = torch.tensor(0.0) # Bias

In [16]:
# Binary Cross Entropy Loss for scalar
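def binary_cross_entropy_loss(prediction, target):
  # Clamping keeps log() away from 0. epsilon must survive float32 rounding next to 1.0:
  # 1 - 1e-8 rounds to exactly 1.0 in float32, so 1e-7 is used instead
  epsilon = 1e-7
  prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
  return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))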
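One caveat about the epsilon in the clamp: PyTorch tensors default to float32, whose spacing just below 1.0 is about 6e-8. With epsilon = 1e-8, the upper bound 1 - 1e-8 rounds to exactly 1.0, so a saturated prediction of 1.0 is not clamped and log(1 - prediction) becomes log(0). The sketch below compares 1e-8 and 1e-7; `bce` is a hypothetical helper that mirrors `binary_cross_entropy_loss` but takes epsilon as a parameter.

```python
import torch

def bce(prediction, target, epsilon):
    # Hypothetical variant of binary_cross_entropy_loss with epsilon as a parameter
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))

# In float32, 1 - 1e-8 rounds to exactly 1.0, while 1 - 1e-7 stays below 1.0
print(torch.tensor(1 - 1e-8).item() == 1.0)   # True
print(torch.tensor(1 - 1e-7).item() < 1.0)    # True

p = torch.tensor(1.0)       # a saturated prediction of exactly 1.0
target = torch.tensor(0.0)
print(bce(p, target, 1e-8))  # tensor(inf)
print(bce(p, target, 1e-7))  # finite, about 15.94
```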

In [17]:
# Forward Pass
z = w * x + b
y_pred = torch.sigmoid(z)

loss = binary_cross_entropy_loss(y_pred, y)

In [18]:
loss

tensor(6.7012)

In [21]:
# Derivatives:
# 1. dL/d(y_pred) : Loss with respect to the prediction (y_pred)
dloss_dy_pred = (y_pred - y) / (y_pred * (1 - y_pred))

# 2. dy_pred/dz : Prediction (y_pred) with respect to z (sigmoid derivative)
dy_pred_dz = y_pred * (1 - y_pred)

# 3. dz/dw : z with respect to w
dz_dw = x

# 4. dz/db : z with respect to b
dz_db = 1

dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw
dL_db = dloss_dy_pred * dy_pred_dz * dz_db

In [22]:
print(f"Manual Gradient of loss w.r.t weight (dw) : {dL_dw}")
print(f"Manual Gradient of loss w.r.t bias (db) : {dL_db}")

Manual Gradient of loss w.r.t weight (dw) : 6.691762447357178
Manual Gradient of loss w.r.t bias (db) : 0.998770534992218
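The first two factors simplify: (ŷ - y) / (ŷ(1 - ŷ)) · ŷ(1 - ŷ) = ŷ - y, so for a sigmoid followed by binary cross-entropy, dL/dz is just the prediction error. The plain-Python sketch below uses this shortcut with the same x, y, w and b and reproduces the manual gradients above (up to floating-point precision).

```python
import math

x, y = 6.7, 0.0   # input feature and true label
w, b = 1.0, 0.0   # weight and bias

z = w * x + b
y_pred = 1 / (1 + math.exp(-z))   # sigmoid

# dL/dy_pred * dy_pred/dz simplifies to (y_pred - y)
dL_dz = y_pred - y
dL_dw = dL_dz * x   # dz/dw = x
dL_db = dL_dz       # dz/db = 1
print(dL_dw, dL_db)  # about 6.6918 and 0.9988
```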


##### Using Autograd

In [23]:
# Using Autograd

x = torch.tensor(6.7) # Input Feature
y = torch.tensor(0.0) # True label (binary)

w = torch.tensor(1.0, requires_grad=True) # Weight
b = torch.tensor(0.0, requires_grad=True) # Bias

In [24]:
w

tensor(1., requires_grad=True)

In [25]:
b

tensor(0., requires_grad=True)

In [27]:
z = w * x + b
z

tensor(6.7000, grad_fn=<AddBackward0>)

In [28]:
y_pred = torch.sigmoid(z)
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [29]:
loss = binary_cross_entropy_loss(y_pred, y)
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [30]:
loss.backward()

In [31]:
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
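PyTorch also ships a built-in loss that fuses the sigmoid and the binary cross-entropy, `torch.nn.functional.binary_cross_entropy_with_logits`, which takes the raw logit z and is numerically more stable than applying `torch.sigmoid` and `torch.log` separately. As a sketch with the same inputs, it yields the same loss and gradients as the hand-written version (up to float32 rounding):

```python
import torch
import torch.nn.functional as F

x = torch.tensor(6.7)  # input feature
y = torch.tensor(0.0)  # true label
w = torch.tensor(1.0, requires_grad=True)  # weight
b = torch.tensor(0.0, requires_grad=True)  # bias

z = w * x + b                                    # logit, no sigmoid applied here
loss = F.binary_cross_entropy_with_logits(z, y)  # sigmoid + BCE in one op
loss.backward()

print(loss)            # about 6.7012
print(w.grad, b.grad)  # about 6.6918 and 0.9988
```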


#### Vector input tensor

In [32]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

In [33]:
x

tensor([1., 2., 3.], requires_grad=True)

In [36]:
y = (x ** 2).mean()

In [37]:
y

tensor(4.6667, grad_fn=<MeanBackward0>)

In [38]:
y.backward()

In [39]:
x.grad

tensor([0.6667, 1.3333, 2.0000])
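Each entry is 2·xᵢ / 3, the derivative of the mean of squares. `backward()` with no argument only works on a scalar output; on a non-scalar tensor it raises a RuntimeError, and you must pass a gradient tensor of the same shape. The sketch below passes `torch.ones_like(v)`, which gives the same result as calling `v.sum().backward()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: d(mean(x**2))/dx_i = 2 * x_i / n with n = 3
y = (x ** 2).mean()
y.backward()
grad_mean = x.grad.clone()
print(grad_mean)                    # tensor([0.6667, 1.3333, 2.0000])

# Non-scalar output: backward() needs an explicit gradient argument
x.grad = None                       # discard the previous gradient
v = x ** 2
v.backward(torch.ones_like(v))      # same result as v.sum().backward()
print(x.grad)                       # tensor([2., 4., 6.])
```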

#### Clearing Grad

In [40]:
# Gradients accumulate: every backward() call adds its result to the existing .grad instead of replacing it.
# Reset the gradient before the next backward pass so the previous pass's gradient is not added to the current one
x.grad.zero_()

tensor([0., 0., 0.])
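The accumulation is easy to see with a scalar. In the sketch below, three backward passes on y = x² at x = 2 each add dy/dx = 4, so `x.grad` ends up at 12; after `zero_()` a single pass gives 4 again.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Three backward passes without clearing: each adds dy/dx = 2x = 4 to x.grad
for _ in range(3):
    y = x ** 2
    y.backward()
accumulated = x.grad.item()
print(accumulated)    # 12.0

# Clear, then run one more backward pass
x.grad.zero_()
y = x ** 2
y.backward()
print(x.grad.item())  # 4.0
```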

#### Disable Gradient Tracking

In [41]:
x = torch.tensor(2.0, requires_grad=True)
x

tensor(2., requires_grad=True)

In [42]:
y = x ** 2
y

tensor(4., grad_fn=<PowBackward0>)

In [43]:
y.backward()

In [44]:
x.grad

tensor(4.)

In [45]:
# Option 1 : x.requires_grad_(False)
# Option 2 : x.detach()
# Option 3 : torch.no_grad()

In [46]:
x.requires_grad_(False)

tensor(2.)

In [47]:
x

tensor(2.)

In [48]:
y = x ** 2

In [49]:
y # y has no grad_fn, so calling y.backward() would raise a RuntimeError

tensor(4.)
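The failure can be checked explicitly. In this sketch, after `x.requires_grad_(False)` the result `y` has `requires_grad=False` and no `grad_fn`, and `y.backward()` raises a RuntimeError that is caught and printed.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
x.requires_grad_(False)              # stop recording operations on x
y = x ** 2

print(y.requires_grad, y.grad_fn)    # False None

try:
    y.backward()
    backward_failed = False
except RuntimeError as err:
    backward_failed = True
    print("backward() raised:", err)
```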

In [58]:
x = torch.tensor(2.0, requires_grad=True)
x

tensor(2., requires_grad=True)

In [59]:
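y = x ** 2
print(y)
z = x.detach()  # z holds the same value as x but is not tracked by autograd
print(z)
y1 = z ** 2     # computed from the detached tensor, so it has no grad_fn
print(y1)

tensor(4., grad_fn=<PowBackward0>)
tensor(2.)
tensor(4.)


In [60]:
y.backward() # Possible: y depends on x, which requires grad
# y1.backward() # Not possible: y1 does not require grad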

In [61]:
x.grad

tensor(4.)
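`detach()` is also useful inside a single expression: the detached factor is treated as a constant during the backward pass. In the sketch below, d(x·x)/dx = 2x = 4, but in x · x.detach() the detached copy acts as the constant 2, so the gradient is only 2.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Gradient flows through both factors: d(x * x)/dx = 2x = 4
(x * x).backward()
full = x.grad.item()

x.grad = None
# The detached factor is treated as a constant c = 2: d(x * c)/dx = c = 2
(x * x.detach()).backward()
partial = x.grad.item()

print(full, partial)   # 4.0 2.0
```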

In [62]:
x = torch.tensor(2.0, requires_grad=True)
x

tensor(2., requires_grad=True)

In [63]:
with torch.no_grad():
  y = x ** 2
  print(y)

tensor(4.)


In [None]:
# y.backward() # Would raise a RuntimeError: y was computed inside torch.no_grad(), so it has no grad_fn
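A common place for `torch.no_grad()` is the parameter update step. PyTorch refuses an in-place operation on a leaf tensor that requires grad while gradients are being tracked; wrapping the update in `torch.no_grad()` allows it and keeps the update out of the graph. The values in this sketch (loss 9w², learning rate 0.1) are arbitrary.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
loss = (3 * w) ** 2        # loss = 9w^2
loss.backward()            # dloss/dw = 18w = 18 at w = 1

try:
    w -= 0.1 * w.grad      # in-place update of a leaf that requires grad
    update_failed = False
except RuntimeError as err:
    update_failed = True
    print("outside no_grad:", err)

with torch.no_grad():
    w -= 0.1 * w.grad      # allowed: this update is not recorded in the graph

print(w)                   # about -0.8, and w still has requires_grad=True
```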