# Autograd
- Autograd is a core component of PyTorch that provides automatic differentiation for tensor operations. It enables gradient computation, which is essential for training machine learning models with optimization algorithms such as gradient descent.
- Gradient: the vector of partial derivatives of a function. It points in the direction of steepest increase of the function, and its magnitude is the rate of that increase.
- Let's take a simple function:
`f(x, y) = x²y + y³`



In [1]:
import torch

In [2]:
x = torch.tensor(2.0, requires_grad=True)
# requires_grad=True tells PyTorch:
# "Track all operations on this tensor for gradient calculation!"
y = torch.tensor(3.0, requires_grad=True)

In [3]:
z = x**2 * y + y**3
# z = (2)² × 3 + (3)³
#   = 4 × 3 + 27
#   = 12 + 27 = 39

In [4]:
z.backward()  # Computes ∂z/∂x and ∂z/∂y

Partial Derivatives:

- `∂z/∂x` = 2xy = 2×2×3 = 12
- `∂z/∂y` = x² + 3y² = 4 + 27 = 31

In [5]:
print(x.grad)
print(y.grad)

tensor(12.)
tensor(31.)


So autograd computes the derivatives for us, and the results match the hand calculation above.
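
As a sanity check, the autograd result can be compared against a numerical estimate. The sketch below is an illustration rather than part of the original notebook: the helper `f`, the step size `h`, and the use of `float64` are assumptions chosen for this example. It approximates each partial derivative with central finite differences.

```python
import torch

def f(x, y):
    # Same function as above: f(x, y) = x²y + y³
    return x**2 * y + y**3

# float64 keeps the finite-difference estimate accurate (assumption for this sketch)
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
y = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)

# Autograd gradients
f(x, y).backward()

# Central finite differences: (f(x + h) - f(x - h)) / (2h)
h = 1e-6  # hypothetical step size
with torch.no_grad():
    num_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    num_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)

print("autograd ∂z/∂x:", x.grad.item(), " numerical:", num_dx.item())
print("autograd ∂z/∂y:", y.grad.item(), " numerical:", num_dy.item())
```

The two columns should agree to several decimal places; autograd gives the exact values 12 and 31, while the numerical estimate carries a small approximation error.
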
# Why this matters in ML:

In neural networks, we have:
`Loss = f(weights, biases)`

Autograd automatically computes:

`∂Loss/∂weights`

`∂Loss/∂biases`

So we can update: `weights = weights - η × ∂Loss/∂weights`, where `η` is the learning rate.

No manual calculus needed! PyTorch tracks all operations and applies the chain rule automatically.

In [6]:
# Example 2: Multiple operations chain
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)

In [7]:
# Forward pass: y = w*x + b (with x=3)
x = 3.0
y_pred = w * x + b        # y_pred = 1*3 + 2 = 5
y_true = 10.0

In [8]:
# Loss = (y_pred - y_true)²
loss = (y_pred - y_true) ** 2  # (5-10)² = 25

In [9]:
# Autograd magic!
loss.backward()

In [10]:
print("∂loss/∂w:", w.grad)  # 2*(y_pred-y_true)*x = 2*(-5)*3 = -30
print("∂loss/∂b:", b.grad)  # 2*(y_pred-y_true)*1 = 2*(-5) = -10

∂loss/∂w: tensor(-30.)
∂loss/∂b: tensor(-10.)
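
The update rule `weights = weights - η × ∂Loss/∂weights` can be applied to this example in a loop. What follows is a minimal sketch, not the notebook's own code: the learning rate `lr = 0.01`, the 50 steps, and the `losses` list are assumptions picked for illustration. The updates run inside `torch.no_grad()` so that they are not recorded as operations, and the gradients are cleared after every step.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
x, y_true = 3.0, 10.0

lr = 0.01      # hypothetical learning rate η
losses = []    # loss value recorded at each step

for step in range(50):
    # Forward pass and squared-error loss
    loss = (w * x + b - y_true) ** 2
    losses.append(loss.item())

    # Backward pass fills w.grad and b.grad
    loss.backward()

    # Gradient descent update, excluded from gradient tracking
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad

    # Clear gradients so the next backward() starts from zero
    w.grad.zero_()
    b.grad.zero_()

print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

The first recorded loss is 25, the same value as above, and the loss shrinks toward zero as `w` and `b` are adjusted.
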


`.backward()`:
- Computes derivatives using the chain rule
- Starts from the loss and works backward through all operations
- Calculates `∂loss/∂x` for every leaf tensor `x` created with `requires_grad=True`

`.grad`:
- Stores the derivative computed by `.backward()`
- Contains `∂loss/∂x` (how much the loss changes when `x` changes)

In [11]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2                    # y = x²
y.backward()                # ← COMPUTES derivative
print(x.grad)               # ← STORES derivative result

tensor(6.)
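
The chain rule becomes visible once an intermediate result sits between the input and the output. The sketch below is illustrative only, and the function `z = 2x² + 1` is an assumption for the example. It also shows that, by default, `.grad` is filled in for leaf tensors (those created directly by the user), while intermediate tensors carry a `grad_fn` that records how they were produced.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # leaf tensor, created by the user
y = x ** 2                                  # intermediate result: y = x²
z = 2 * y + 1                               # output: z = 2x² + 1

print("x is leaf:", x.is_leaf)              # created directly, so it is a leaf
print("y is leaf:", y.is_leaf)              # produced by an operation, so it is not
print("y has grad_fn:", y.grad_fn is not None)

# Chain rule: dz/dx = dz/dy * dy/dx = 2 * 2x = 4x
z.backward()
print("dz/dx:", x.grad)                     # 4 * 3 = 12
```
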


# Autograd with Vector Inputs


In [12]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) # Vector input instead of scalar

In [15]:
y = (x ** 2).mean()

In [16]:
y.backward()

In [17]:
x.grad

tensor([0.6667, 1.3333, 2.0000])

`y = mean(x²) = (x₁² + x₂² + x₃²)/3`

Partial derivatives:

`∂y/∂x₁ = 2x₁/3 = 2×1/3 = 0.6667`

`∂y/∂x₂ = 2x₂/3 = 2×2/3 = 1.3333`

`∂y/∂x₃ = 2x₃/3 = 2×3/3 = 2.0000`
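
The formula `∂y/∂xᵢ = 2xᵢ/3` can be checked against autograd directly. The sketch below is an illustration with assumed variable names (`expected`, `x2`, `y2`). Its second half covers a related case: when the output is itself a vector rather than a scalar, `.backward()` needs an explicit `gradient` argument of the same shape.

```python
import torch

# Scalar output: compare autograd with the hand-derived formula 2x / n
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).mean()
y.backward()

expected = 2 * x.detach() / x.numel()
print(torch.allclose(x.grad, expected))  # True

# Vector output: pass a `gradient` tensor with the same shape as the output
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 ** 2
y2.backward(gradient=torch.ones_like(y2))  # same result as y2.sum().backward()
print(x2.grad)  # tensor([2., 4., 6.])
```
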

Gradients accumulate by default in PyTorch: each `.backward()` call adds its result to whatever is already stored in `.grad`. Clear the gradients between passes whenever you want a fresh value.

In [18]:
x = torch.tensor(2.0, requires_grad=True)

In [19]:
# First backward pass
y = x ** 2
y.backward()
print("Grad after 1st:", x.grad)

Grad after 1st: tensor(4.)


In [20]:
# Clear gradients
x.grad.zero_()

tensor(0.)

In [22]:
# Second backward pass
y = x ** 3
y.backward()
print("Grad after 2nd:", x.grad) # 3x² = 12, not 4 + 12 = 16, because the gradient was cleared

Grad after 2nd: tensor(12.)
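
For contrast, here is the same pair of backward passes without the call to `zero_()`. This is a small sketch with assumed variable names (`first`, `second`), meant only to show the accumulation described above.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(x²)/dx = 2x = 4
(x ** 2).backward()
first = x.grad.clone()

# Second backward pass WITHOUT clearing: d(x³)/dx = 3x² = 12 is added to the 4
(x ** 3).backward()
second = x.grad.clone()

print("Grad after 1st:", first)    # tensor(4.)
print("Grad after 2nd:", second)   # tensor(16.)
```
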


# Disabling Gradient Tracking

In [23]:
# Gradient tracking ON (must be requested explicitly)
x = torch.tensor(2.0, requires_grad=True)

# Gradient tracking OFF
x = torch.tensor(2.0, requires_grad=False)  # Default behavior
x = torch.tensor(2.0)  # Same as above

In [25]:
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2  # y has gradient tracking

x_detached = x.detach()  # New tensor with the same data, cut off from the graph
z = x_detached ** 3     # z has NO gradient tracking

In [26]:
x = torch.tensor(2.0, requires_grad=True)

with torch.no_grad():  # All operations inside have NO gradient tracking
    y = x ** 2
    z = y + 1
# y and z don't require grad

# Outside: gradient tracking resumes
w = x ** 3  # w requires grad
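
The effect of each approach can be confirmed by inspecting the `requires_grad` attribute of the results. The sketch below simply repeats the operations above and prints the flags. The final check, that a detached tensor shares memory with the original, is an extra detail added here for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# detach(): same data, no connection to the computation graph
x_detached = x.detach()

# no_grad(): operations inside the block are not tracked
with torch.no_grad():
    y = x ** 2

# Outside the block, tracking is active again
w = x ** 3

print("x_detached.requires_grad:", x_detached.requires_grad)  # False
print("y.requires_grad:", y.requires_grad)                    # False
print("w.requires_grad:", w.requires_grad)                    # True

# The detached tensor shares storage with x rather than copying it
print("shares memory:", x_detached.data_ptr() == x.data_ptr())  # True
```
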