**Autograd in PyTorch**
Autograd (short for automatic differentiation) is PyTorch's differentiation engine that automatically computes gradients for tensors in a computational graph. It is crucial for training neural networks since it enables backpropagation.


How Autograd Works
1. Tracks Operations: When you perform operations on tensors with requires_grad=True, PyTorch builds a computation graph dynamically.
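2. Computes Gradients: Calling .backward() on a tensor (typically a scalar loss) computes its gradient with respect to every leaf tensor that has requires_grad=True and stores the result in that tensor's .grad attribute.
3. Optimizes Memory Usage: PyTorch saves only the intermediate values needed for the backward pass and, by default, frees them once .backward() has run.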
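
The three steps above can be seen on a tiny graph. This is a minimal sketch; the tensor names and values are chosen only for illustration and do not appear elsewhere in this notebook.

```python
import torch

# A leaf tensor that autograd should track
x = torch.tensor(2.0, requires_grad=True)

# Each operation adds a node to the graph and records a grad_fn
y = x * 3     # y = 3x
z = y ** 2    # z = (3x)^2 = 9x^2

print(x.is_leaf, x.grad_fn)  # leaf tensors have no grad_fn
print(z.grad_fn)             # the node that produced z

# Backpropagate from z: dz/dx = 18x, which is 36 at x = 2
z.backward()
print(x.grad)

# By default the saved intermediate values are freed after backward().
# Calling z.backward() again would need retain_graph=True on the first call.
```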

In [0]:
!pip install torch 
import torch 


In [0]:
x = torch.tensor(3.0, requires_grad=True)  # leaf tensor tracked by autograd
y = x ** 2                                 # y = x^2, so dy/dx = 2x

In [0]:
y.backward()

In [0]:
x.grad

tensor(6.)

In [0]:
a = torch.tensor(4.0, requires_grad=True)
b = a ** 2         # b = a^2
c = torch.sin(b)   # c = sin(a^2), so dc/da = cos(a^2) * 2a


In [0]:
c.backward()

In [0]:
a.grad

tensor(-7.6613)

In [0]:
a

tensor(4., requires_grad=True)

In [0]:
b

tensor(16., grad_fn=<PowBackward0>)

In [0]:
c

tensor(-0.2879, grad_fn=<SinBackward0>)
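
The value of a.grad above follows from the chain rule: dc/da = cos(a^2) * 2a. As a quick check, the sketch below recomputes that expression by hand and compares it with what autograd returns. The variable name `manual` is introduced here only for the comparison.

```python
import torch

a = torch.tensor(4.0, requires_grad=True)
c = torch.sin(a ** 2)
c.backward()

# Chain rule by hand: dc/da = cos(a^2) * 2a
with torch.no_grad():
    manual = torch.cos(a ** 2) * 2 * a

print(a.grad, manual)
print(torch.allclose(a.grad, manual))
```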

Manual Derivative of a Simple Neural Network

In [0]:
import torch

# Inputs
x = torch.tensor(6.7)  # Input feature
y = torch.tensor(0.0)  # True label (binary)

w = torch.tensor(1.0)  # Weight
b = torch.tensor(0.0)  # Bias

In [0]:
# Binary Cross-Entropy Loss for scalar
def binary_cross_entropy_loss(prediction, target):
    epsilon = 1e-8  # To prevent log(0)
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))

In [0]:
# Forward pass
z = w * x + b  # Weighted sum (linear part)
y_pred = torch.sigmoid(z)  # Predicted probability

# Compute binary cross-entropy loss
loss = binary_cross_entropy_loss(y_pred, y)

In [0]:
loss

tensor(6.7012)

In [0]:
# Derivatives:
# 1. dL/d(y_pred): Loss with respect to the prediction (y_pred)
dloss_dy_pred = (y_pred - y)/(y_pred*(1-y_pred))

# 2. dy_pred/dz: Prediction (y_pred) with respect to z (sigmoid derivative)
dy_pred_dz = y_pred * (1 - y_pred)

# 3. dz/dw and dz/db: z with respect to w and b
dz_dw = x  # dz/dw = x
dz_db = 1  # dz/db = 1 (bias contributes directly to z)

dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw
dL_db = dloss_dy_pred * dy_pred_dz * dz_db

In [0]:
print(f"Manual Gradient of loss w.r.t weight (dw): {dL_dw}")
print(f"Manual Gradient of loss w.r.t bias (db): {dL_db}")

Manual Gradient of loss w.r.t weight (dw): 6.691762447357178
Manual Gradient of loss w.r.t bias (db): 0.998770534992218
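
In the chain above, the factor y_pred * (1 - y_pred) appears in the denominator of dL/d(y_pred) and in the sigmoid derivative, so the two cancel and dL/dz reduces to y_pred - y. The sketch below uses that shortcut with the same inputs. It is an illustration of the algebra, not a replacement for the step-by-step version.

```python
import torch

x = torch.tensor(6.7)  # input feature
y = torch.tensor(0.0)  # true label
w = torch.tensor(1.0)  # weight
b = torch.tensor(0.0)  # bias

y_pred = torch.sigmoid(w * x + b)

# dL/dz = dL/dy_pred * dy_pred/dz = y_pred - y
dL_dz = y_pred - y
dL_dw = dL_dz * x  # dz/dw = x
dL_db = dL_dz      # dz/db = 1

print(dL_dw, dL_db)
```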


Using Autograd

In [0]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)

In [0]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [0]:
z = w*x + b
z


tensor(6.7000, grad_fn=<AddBackward0>)

In [0]:
y_pred=torch.sigmoid(z)
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [0]:
loss = binary_cross_entropy_loss(y_pred, y)
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [0]:
loss.backward()

In [0]:
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
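
The autograd results match the manual gradients. As a further cross-check, the sketch below swaps the hand-written loss for PyTorch's built-in `torch.nn.functional.binary_cross_entropy_with_logits`, which is not used in the original notebook. Note the assumption that it takes the raw logit z rather than the sigmoid output.

```python
import torch
import torch.nn.functional as F

x = torch.tensor(6.7)
y = torch.tensor(0.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

z = w * x + b  # raw logit, no sigmoid applied here

# The built-in loss applies the sigmoid internally
loss = F.binary_cross_entropy_with_logits(z, y)
loss.backward()

print(loss, w.grad, b.grad)
```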


Clearing Grad 

If you run multiple forward and backward passes, gradients accumulate: each call to .backward() adds the new gradient to whatever is already stored in .grad. To avoid this, clear the gradients before the next backward pass.
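
To see the accumulation concretely, here is a minimal sketch that runs two backward passes on the same tensor without clearing in between. The names `first` and `second` are introduced only to capture the intermediate values.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First forward/backward pass: dy/dx = 2x = 4
y = x ** 2
y.backward()
first = x.grad.clone()

# Second pass without clearing: the new gradient is added to the old one
y = x ** 2
y.backward()
second = x.grad.clone()

print(first, second)

# Reset the stored gradient in place before the next pass
x.grad.zero_()
print(x.grad)
```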

In [0]:
!pip install torch


In [0]:
import torch 

In [0]:
x = torch.tensor(6.7, requires_grad=True)
y = x ** 2

y.backward()
x.grad  # dy/dx = 2x

tensor(13.4000)

In [0]:
x.grad.zero_()

tensor(0.)

In [0]:
x.grad

tensor(0.)
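
In practice, the reset usually sits inside a loop. The sketch below runs a few steps of plain gradient descent on y = x^2 and zeroes the gradient after each update. The learning rate `lr` and the step count are hypothetical values picked for illustration.

```python
import torch

x = torch.tensor(6.7, requires_grad=True)
lr = 0.1  # hypothetical learning rate

for step in range(3):
    y = x ** 2
    y.backward()  # x.grad now holds 2x

    # Update x without recording the update in the graph
    with torch.no_grad():
        x -= lr * x.grad

    # Clear the gradient so the next backward() starts from zero
    x.grad.zero_()

print(x)
```

Each step multiplies x by 0.8 here, so skipping the `zero_()` call would make later updates larger than intended.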

What if we do not need backward tracking and only need the forward pass?

There are three options:
1. requires_grad_(False)
2. detach()
3. torch.no_grad()
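
The three options can be compared side by side. This is a minimal sketch that gives each option its own fresh tensor so that one does not affect the others. The names x1, x2, x3 and so on are introduced here only for illustration.

```python
import torch

# Option 1: switch tracking off on the tensor itself (in place)
x1 = torch.tensor(6.7, requires_grad=True)
x1.requires_grad_(False)
y1 = x1 ** 2
print(y1.requires_grad)

# Option 2: detach() returns a new tensor that shares data with x2
# but is cut off from the graph; x2 itself is still tracked
x2 = torch.tensor(6.7, requires_grad=True)
z2 = x2.detach()
y2 = z2 ** 2
print(x2.requires_grad, y2.requires_grad)

# Option 3: no_grad() disables tracking only inside the block;
# x3 is still tracked outside it
x3 = torch.tensor(6.7, requires_grad=True)
with torch.no_grad():
    y3 = x3 ** 2
print(x3.requires_grad, y3.requires_grad)
```

In all three cases the result has no grad_fn, so calling .backward() on it raises a RuntimeError.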

In [0]:
x.requires_grad_(False) # gradient tracking on x has been stopped

tensor(6.7000)

In [0]:
x.grad

tensor(0.)

In [0]:
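z = x.detach()  # new tensor sharing x's data, detached from the graph

In [0]:
y = x ** 2  # x no longer requires grad, so y has no grad_fn

In [0]:
y.backward  # no parentheses: this only displays the bound method, nothing is computed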

<bound method Tensor.backward of tensor(44.8900)>

In [0]:
x.grad

tensor(0.)

In [0]:
y1=z**2 
y1.backward()  # raises RuntimeError: y1 comes from a detached tensor and has no grad_fn

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <command-8917511663787183>, line 2
      1 y1=z**2 
----> 2 y1.backward()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1c01d255-4465-43cc-984c-540edee6b0ef/lib/python3.10/site-packages/torch/_tensor.py:626, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    616 if has_torch_function_unary(self):
    617     return handle_torch_function(
    618         Tensor.backward,
    619         (self,),
   (...)
    624         inputs=inputs,
    625     )
--> 626 

In [0]:
with torch.no_grad(): 
    y=x**2 
    

In [0]:
y.backward()  # raises RuntimeError: y was computed under no_grad(), so it has no grad_fn

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <command-8917511663787185>, line 1
----> 1 y.backward()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1c01d255-4465-43cc-984c-540edee6b0ef/lib/python3.10/site-packages/torch/_tensor.py:626, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    616 if has_torch_function_unary(self):
    617     return handle_torch_function(
    618         Tensor.backward,
    619         (self,),
   (...)
    624         inputs=inputs,
    625     )
--> 626 torch.auto