# ⚡ PyTorch Autograd: Automatic Differentiation

Autograd is PyTorch's automatic differentiation engine that powers neural network training. It computes gradients automatically, eliminating manual calculus.

**Topics:** Gradient tracking, backward pass, chain rule, torch.no_grad()

---
## Setup

In [None]:
import torch

---
## 1. Gradient Basics

To track gradients, set `requires_grad=True` when creating tensors. PyTorch builds a computation graph as you perform operations.

**Business Example:** Model a cost function where:
- Total Cost = 3 × (material_cost)² + 5 × (labor_cost) + 100

We want to know: *How does total cost change if we adjust material or labor costs?*

In [None]:
# Create tensors with gradient tracking enabled
material_cost = torch.tensor(10.0, requires_grad=True)
labor_cost = torch.tensor(5.0, requires_grad=True)

# Define the cost function
total_cost = 3 * material_cost**2 + 5 * labor_cost + 100
total_cost
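
The computation graph mentioned above can be inspected directly. The following is a minimal sketch that rebuilds the same cost function and looks at two standard tensor attributes, `is_leaf` and `grad_fn`; it is only meant to make the graph visible, not to show anything about how autograd is implemented internally.

```python
import torch

material_cost = torch.tensor(10.0, requires_grad=True)
labor_cost = torch.tensor(5.0, requires_grad=True)
total_cost = 3 * material_cost**2 + 5 * labor_cost + 100

# Tensors created directly by the user are the leaves of the graph
print(f"material_cost.is_leaf: {material_cost.is_leaf}")   # True

# The result of tracked operations is not a leaf and carries a grad_fn
# that links it back to the operations that produced it
print(f"total_cost.is_leaf: {total_cost.is_leaf}")         # False
print(f"total_cost has grad_fn: {total_cost.grad_fn is not None}")

# 3 * 10**2 + 5 * 5 + 100 = 425
print(f"total_cost value: {total_cost.item()}")            # 425.0
```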

### Computing Gradients with .backward()

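Calling `.backward()` on a scalar computes its gradient with respect to every leaf tensor that has `requires_grad=True` and contributed to the result. Each gradient is stored in that tensor's `.grad` attribute.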

**Mathematical derivation:**
- ∂(total_cost)/∂(material_cost) = 6 × material_cost = 6 × 10 = **60**
- ∂(total_cost)/∂(labor_cost) = **5** (the constant coefficient of the linear term)

In [None]:
# Compute gradients via backpropagation
total_cost.backward()

# Access gradients via .grad attribute
print(f"∂cost/∂material = {material_cost.grad}")  # Should be 60
print(f"∂cost/∂labor = {labor_cost.grad}")        # Should be 5

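**Interpretation:**
- Material cost gradient (60): At a material cost of $10, total cost rises at a rate of about $60 per $1 of additional material cost. Because the material term is quadratic, this rate is local and changes as material cost changes.
- Labor cost gradient (5): A $1 increase in labor cost raises total cost by exactly $5, since the labor term is linear.

At these values, total cost is 12x more sensitive to material cost than to labor cost.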
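
One way to sanity-check these numbers is a finite-difference approximation that does not use autograd at all. The sketch below assumes a helper named `cost` (a hypothetical name, not part of the original notebook) that re-states the same cost function, and it uses `float64` to keep rounding error small.

```python
import torch

def cost(material, labor):
    # Same cost function as above, written as a plain helper (hypothetical name)
    return 3 * material**2 + 5 * labor + 100

material = torch.tensor(10.0, dtype=torch.float64)
labor = torch.tensor(5.0, dtype=torch.float64)
eps = 1e-4

# Central differences approximate the partial derivatives
fd_material = (cost(material + eps, labor) - cost(material - eps, labor)) / (2 * eps)
fd_labor = (cost(material, labor + eps) - cost(material, labor - eps)) / (2 * eps)
print(f"finite-difference ∂cost/∂material ≈ {fd_material.item():.4f}")  # close to 60
print(f"finite-difference ∂cost/∂labor ≈ {fd_labor.item():.4f}")        # close to 5

# A full $1 step in material cost is not infinitesimal: 3 * 11**2 - 3 * 10**2 = 63
full_step = cost(material + 1.0, labor) - cost(material, labor)
print(f"cost change for a full $1 material increase: {full_step.item()}")
```

The last line shows why the gradient is best read as a local rate: the quadratic term makes a full $1 step cost slightly more than the gradient alone suggests.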

---
## 2. Disabling Gradient Tracking

Use `torch.no_grad()` when you don't need gradients (inference, evaluation). This:
- Saves memory (no computation graph stored)
- Speeds up computation
- Is standard practice during model evaluation, where no backward pass is run

In [None]:
x = torch.tensor(4.0, requires_grad=True)

# Inside no_grad context, operations don't track gradients
with torch.no_grad():
    y = x**2 + 5
    print(f"Inside no_grad - requires_grad: {y.requires_grad}")

# Outside no_grad, gradients are tracked again
z = x**2 + 5
print(f"Outside no_grad - requires_grad: {z.requires_grad}")
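
To see what "no tracking" means in practice, the sketch below checks that a tensor produced under `torch.no_grad()` has no `grad_fn` and cannot be backpropagated through. The small `torch.nn.Linear` model and the random `features` batch are hypothetical stand-ins for a real model and dataset.

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
with torch.no_grad():
    y = x**2 + 5

# No graph was recorded, so there is nothing to backpropagate through
print(f"y.grad_fn: {y.grad_fn}")  # None
try:
    y.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True
print(f"backward() on y raised RuntimeError: {backward_failed}")

# Typical evaluation pattern with a hypothetical tiny model
torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
features = torch.randn(4, 3)

with torch.no_grad():
    preds = model(features)       # parameters require grad, but output is untracked
tracked = model(features)         # same call outside the context is tracked

print(f"preds.requires_grad: {preds.requires_grad}")      # False
print(f"tracked.requires_grad: {tracked.requires_grad}")  # True
```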

---
## 3. Gradients with Matrices

Autograd works seamlessly with multi-dimensional tensors. Calling `.backward()` without arguments only works on a scalar, so a non-scalar output must either be reduced to a scalar first or be given an explicit `gradient` argument.

**Function:** y = 2x³ + 7

**Derivative:** dy/dx = 6x²

In [None]:
x = torch.tensor([[1.0, 2.0], 
                  [3.0, 4.0]], requires_grad=True)

y = 2 * x**3 + 7
print("y =")
print(y)

### Reducing to Scalar

Since `y` is a matrix, we sum it to get a scalar before backpropagation. Each element of `y` depends only on the matching element of `x`, so the gradient of the sum with respect to `x[i, j]` is simply dy/dx evaluated at `x[i, j]`.

In [None]:
result = y.sum()
result.backward()
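
Summing is not the only option. As a sketch of the alternative, `.backward()` also accepts a `gradient` argument with the same shape as the output; passing a tensor of ones gives the same result as summing first. The names `x_sum` and `x_ones` are introduced here only to keep the two runs separate.

```python
import torch

# Route 1: reduce to a scalar with sum(), then backpropagate
x_sum = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
(2 * x_sum**3 + 7).sum().backward()

# Route 2: keep the matrix output and pass a gradient of ones
x_ones = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y_ones = 2 * x_ones**3 + 7
y_ones.backward(gradient=torch.ones_like(y_ones))

print(x_sum.grad)
print(x_ones.grad)
print(f"Same gradients: {torch.allclose(x_sum.grad, x_ones.grad)}")  # True
```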

### Verify the Gradients

For y = 2x³ + 7, the derivative is dy/dx = 6x²

| x | Expected gradient (6x²) |
|---|------------------------|
| 1 | 6 × 1² = 6 |
| 2 | 6 × 2² = 24 |
| 3 | 6 × 3² = 54 |
| 4 | 6 × 4² = 96 |

In [None]:
print("Computed gradients:")
print(x.grad)
print("\nExpected: [[6, 24], [54, 96]]")
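
The comparison can also be automated instead of read off by eye. The sketch below uses `torch.autograd.gradcheck`, which compares autograd's result against a numerical estimate; it expects `float64` inputs, so the tensor is recreated here as `x64`, and `f` is a hypothetical helper wrapping the same function.

```python
import torch
from torch.autograd import gradcheck

def f(t):
    # Same function as above: y = 2x³ + 7
    return 2 * t**3 + 7

# gradcheck is designed for double-precision inputs
x64 = torch.tensor([[1.0, 2.0], [3.0, 4.0]],
                   dtype=torch.float64, requires_grad=True)

# Numerical vs. autograd gradients
passed = gradcheck(f, (x64,))
print(f"gradcheck passed: {passed}")

# Direct comparison against the hand-derived formula 6x²
f(x64).sum().backward()
analytic = 6 * x64.detach() ** 2
print(f"Matches 6x²: {torch.allclose(x64.grad, analytic)}")
```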

---
## 4. Chain Rule in Action

Autograd automatically applies the chain rule for composite functions. This is the foundation of backpropagation in neural networks.

**Example:** z = (x + y)² where x=2, y=3
- Let u = x + y = 5
- z = u² = 25
- ∂z/∂x = ∂z/∂u × ∂u/∂x = 2u × 1 = 2(5) = 10
- ∂z/∂y = ∂z/∂u × ∂u/∂y = 2u × 1 = 2(5) = 10

In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

u = x + y
z = u ** 2

z.backward()

print(f"z = {z.item()}")
print(f"∂z/∂x = {x.grad.item()}")  # Should be 10
print(f"∂z/∂y = {y.grad.item()}")  # Should be 10
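
The intermediate factor ∂z/∂u from the derivation can be observed as well. By default autograd only keeps `.grad` for leaf tensors, so this sketch calls `retain_grad()` on the intermediate tensor `u` before the backward pass.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

u = x + y
u.retain_grad()   # ask autograd to keep the gradient of this non-leaf tensor
z = u ** 2

z.backward()

print(f"∂z/∂u = {u.grad.item()}")  # 2u = 10.0
print(f"∂z/∂x = {x.grad.item()}")  # ∂z/∂u × 1 = 10.0
print(f"∂z/∂y = {y.grad.item()}")  # ∂z/∂u × 1 = 10.0
```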

---
## 5. Gradient Accumulation

**Important:** Gradients accumulate by default! Each call to `.backward()` adds to the existing `.grad` instead of overwriting it, so you must zero gradients before each backward pass in training loops.

In [None]:
w = torch.tensor(1.0, requires_grad=True)

# First backward
loss1 = w * 2
loss1.backward()
print(f"After 1st backward: grad = {w.grad}")

# Second backward - gradients ACCUMULATE!
loss2 = w * 3
loss2.backward()
print(f"After 2nd backward: grad = {w.grad} (accumulated: 2 + 3 = 5)")

### Zeroing Gradients

In training loops, always zero gradients before each iteration:

In [None]:
w = torch.tensor(1.0, requires_grad=True)

for i in range(3):
    # Zero gradients first!
    if w.grad is not None:
        w.grad.zero_()
    
    loss = w * (i + 1)
    loss.backward()
    print(f"Iteration {i}: loss = {loss.item()}, grad = {w.grad.item()}")
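
In a real training loop the zeroing is usually delegated to the optimizer. The sketch below assumes a made-up loss, `(w - 3)²`, and a learning rate of 0.1, chosen only so the numbers are easy to follow; `torch.optim.SGD` and `optimizer.zero_grad()` are the standard PyTorch calls.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(3):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = (w - 3.0) ** 2        # hypothetical loss with its minimum at w = 3
    loss.backward()              # gradient is 2 * (w - 3)
    optimizer.step()             # w <- w - lr * grad
    print(f"Step {step}: loss = {loss.item():.4f}, w = {w.item():.4f}")
```

With these assumed values, `w` moves from 1.0 to 1.4, then 1.72, then 1.976, approaching the minimum at 3. Without `zero_grad()`, each step would use the sum of all previous gradients and overshoot.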

---
## 6. Detaching Tensors

Use `.detach()` to create a tensor that shares data but doesn't track gradients. Useful for:
- Freezing parts of a model
- Using tensor values without affecting the computation graph

In [None]:
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

# Detach y from the graph
y_detached = y.detach()

print(f"y requires_grad: {y.requires_grad}")
print(f"y_detached requires_grad: {y_detached.requires_grad}")
print(f"Same values: {y.item()} == {y_detached.item()}")
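
The effect of detaching becomes clearer when the detached tensor is used in a later computation. In this sketch the same expression, x² × x, is differentiated twice: once fully tracked, and once with the x² factor detached so that autograd treats it as a constant.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Fully tracked: x² × x = x³, so the gradient is 3x² = 27
full = (x ** 2) * x
full.backward()
grad_full = x.grad.item()

x.grad.zero_()  # reset before the second backward pass

# With x² detached, it acts as the constant 9, so the gradient is 9
partial = (x ** 2).detach() * x
partial.backward()
grad_partial = x.grad.item()

print(f"Gradient with full graph: {grad_full}")       # 27.0
print(f"Gradient with detached x²: {grad_partial}")   # 9.0
```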

---
## Summary

| Concept | Code | Purpose |
|---------|------|--------|
| Enable gradients | `requires_grad=True` | Track operations for backprop |
| Compute gradients | `.backward()` | Run backpropagation |
| Access gradients | `.grad` | Get computed derivatives |
| Disable tracking | `torch.no_grad()` | Skip graph building during inference/evaluation |
| Zero gradients | `.grad.zero_()` | Reset before each training step |
| Detach tensor | `.detach()` | Get a tensor with the same data, outside the computation graph |

**Key insight:** Autograd transforms the tedious manual calculus of backpropagation into a single `.backward()` call!