# ***Automatic Differentiation (Autograd)***


- In deep learning, the core idea is to optimize a loss function by adjusting weights in a neural network. This is done through gradient descent, which relies on gradients—partial derivatives of the loss function with respect to model parameters.

Traditionally, gradients were computed in one of three ways:
- Manual differentiation: error-prone and tedious
- Symbolic differentiation (as in SymPy): expressions can grow very large, which makes it inefficient for large networks
- Numerical differentiation (finite differences): only approximate, and slow because it needs extra function evaluations for every parameter

This led to Automatic Differentiation (AD), a technique that computes exact gradients (up to floating-point precision) efficiently and automatically.
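The numerical-differentiation drawback can be made concrete. The pure-Python sketch below uses no PyTorch. It approximates the derivative of an illustrative function f(x) = x³ at x = 2 with a forward difference. The function and the step sizes `h` are arbitrary choices for demonstration.

```python
def f(x):
    return x ** 3  # illustrative function

def f_prime(x):
    return 3 * x ** 2  # exact derivative, for comparison

def forward_difference(func, x, h):
    # Needs one extra evaluation of func per input variable, and is only an approximation
    return (func(x + h) - func(x)) / h

x0 = 2.0
for h in (1e-1, 1e-8, 1e-14):
    approx = forward_difference(f, x0, h)
    error = abs(approx - f_prime(x0))
    print(f"h={h:g}  approx={approx:.10f}  error={error:.2e}")
```

A large `h` gives truncation error, and a very small `h` gives round-off error. AD avoids choosing a step size at all.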

## ⚙️ What is Automatic Differentiation?

Automatic Differentiation (AD) breaks down computations into a graph of operations and then systematically applies the chain rule to compute gradients.

- PyTorch uses a dynamic computation graph, built on the fly during the forward pass.
- This makes it flexible and intuitive, especially for models with varying control flow (if/else, loops, etc.).
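Here is a minimal sketch of a function whose loop length and branch both depend on runtime values. The function `f` and its inputs are hypothetical. Autograd records only the operations that actually ran, so each call differentiates the path that was taken.

```python
import torch

def f(x, n_steps):
    y = x
    for _ in range(n_steps):  # loop length chosen at run time
        y = y * x
    if y > 0:                 # branch chosen from the tensor's current value
        return y
    return -y

for value, n in [(2.0, 2), (-1.0, 2)]:
    x = torch.tensor(value, requires_grad=True)
    out = f(x, n)
    out.backward()
    # x=2, n=2 takes the "y > 0" branch: f = x³, df/dx = 3x² = 12
    # x=-1, n=2 takes the other branch:   f = -x³, df/dx = -3x² = -3
    print(f"x={value}, n={n}: f={out.item()}, df/dx={x.grad.item()}")
```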

### 🚀 Enter Autograd: PyTorch’s Engine for AD

`torch.autograd` is PyTorch's automatic differentiation engine.

Key ideas:
- Every tensor in PyTorch has a ***`requires_grad` flag***.
- When `requires_grad=True`, PyTorch records every operation on that tensor to build the computation graph.
- Calling `.backward()` on a scalar loss computes the gradient of the loss with respect to every leaf tensor that has `requires_grad=True`, and stores it in that tensor's `.grad`.
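The sketch below shows the flag in action. The tensor names and values are illustrative. An untracked tensor gets no gradient, while anything computed from a tracked tensor is tracked too.

```python
import torch

a = torch.tensor(2.0)                      # requires_grad defaults to False: not tracked
b = torch.tensor(3.0, requires_grad=True)  # tracked leaf tensor

c = a * b               # depends on a tracked tensor, so c is tracked too
print(c.requires_grad)  # True

c.backward()            # c is a scalar, so no extra argument is needed
print(b.grad)           # dc/db = a -> tensor(2.)
print(a.grad)           # None: a was never tracked
```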

## 🔄 The Computation Graph
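- Nodes: tensors
- Edges: functions (operations) that produce the next tensors
- Leaf nodes: tensors created directly by the user (inputs and model parameters) rather than by an operation
- The graph is dynamic: it is rebuilt on every forward pass and, by default, freed after `.backward()`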
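This minimal sketch inspects the graph through the `is_leaf` and `grad_fn` attributes that every PyTorch tensor has. The tensors and values are illustrative. It also shows that a second `.backward()` through the same (already freed) graph fails, so a new forward pass is needed.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf: created directly by the user
w = torch.tensor(3.0, requires_grad=True)  # leaf
y = x * w + 1                              # non-leaf: produced by operations

print(x.is_leaf, x.grad_fn)  # True None
print(y.is_leaf, y.grad_fn)  # False, followed by the backward node that produced y

y.backward()                 # x.grad = w = 3; the graph's saved tensors are then freed
try:
    y.backward()             # reusing the freed graph fails
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True
print(second_backward_failed)  # True

y = x * w + 1                # a new forward pass builds a new graph
y.backward()
print(x.grad)                # tensor(6.): 3 + 3, because gradients accumulate
```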

> ***This design is why PyTorch is called "define-by-run" (as opposed to "define-and-run", like TensorFlow 1.x).***

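### 🧠 Behind `.backward()`
- PyTorch uses reverse-mode automatic differentiation, which is efficient when you have:
  - Many inputs (model parameters)
  - A single scalar output (the loss)
- One backward pass yields the gradient of the loss with respect to every input. Its cost is roughly a small constant multiple of one forward pass, regardless of how many parameters there are.

`.backward()` starts at the scalar loss and moves backward through the computation graph. It applies the chain rule at each operation and accumulates the results into the `.grad` of each leaf tensor with `requires_grad=True`.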
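The sketch below is a toy, pure-Python reverse-mode AD for scalars. It illustrates the idea and is not PyTorch's actual implementation. The `Value` class is hypothetical. Each node stores its parents and its local derivatives. `backward()` then visits nodes in reverse topological order, multiplying local derivatives along each path and summing over paths (the chain rule).

```python
class Value:
    """Hypothetical toy scalar node for reverse-mode AD (illustrative only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __pow__(self, n):
        # n is a plain Python number: d(x**n)/dx = n * x**(n - 1)
        return Value(self.data ** n, (self,), (n * self.data ** (n - 1),))

    def backward(self):
        # Depth-first post-order puts every node after the nodes it depends on
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):  # from the output back toward the leaves
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule, summed over paths


# Usage with L = (x*w + b)²
x, w, b = Value(2.0), Value(3.0), Value(4.0)
loss = (x * w + b) ** 2
loss.backward()
print(x.grad, w.grad, b.grad)  # 60.0 40.0 20.0
```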



| API                     | Description                                                                                      |
| ----------------------- | ------------------------------------------------------------------------------------------------ |
| `.backward()`           | Computes gradients of a scalar tensor with respect to the graph's leaves and accumulates them into `.grad` |
| `.grad`                 | Stores the accumulated gradient of a leaf tensor                                                 |
| `optimizer.zero_grad()` | Clears old gradients before the next `.backward()`                                               |
| `with torch.no_grad()`  | Disables gradient tracking, e.g. for inference or evaluation                                     |
| `.detach()`             | Returns a new tensor that shares the same data but is detached from the computation graph        |
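The sketch below tours the APIs in the table on a made-up one-parameter loss, `loss = (2w)²`, chosen only for illustration. There is no optimizer here, so the gradient is reset in place with `.grad.zero_()`. (Since PyTorch 2.0, `optimizer.zero_grad()` sets gradients to `None` by default instead.)

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

loss = (w * 2.0) ** 2         # forward pass: loss = 4w²
loss.backward()               # d(loss)/dw = 8w is written to w.grad
first_grad = w.grad.clone()
print(first_grad)             # tensor(24.)

w.grad.zero_()                # reset in place before another backward pass

with torch.no_grad():         # nothing inside this block is recorded in a graph
    preview = w * 2.0
print(preview.requires_grad)  # False

frozen = (w * 2.0).detach()   # same value, but cut off from the graph
print(frozen.requires_grad)   # False
```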


- Suppose you compute the loss L = (x * w + b)^2, where x, w, and b are tensors and w and b have `requires_grad=True`.

1. During the forward pass, PyTorch builds the graph: it multiplies x and w, adds b, and squares the result.
2. You call `loss.backward()`.
3. PyTorch traces the graph backward and computes the gradients ∂L/∂w and ∂L/∂b.
4. These are stored in `w.grad` and `b.grad`.
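Working the chain rule by hand for the same loss shows what `.backward()` computes. Introduce the intermediate value u = xw + b:

$$
u = xw + b, \qquad L = u^2, \qquad \frac{\partial L}{\partial u} = 2u
$$

$$
\frac{\partial L}{\partial w} = 2u \cdot x, \qquad
\frac{\partial L}{\partial b} = 2u \cdot 1, \qquad
\frac{\partial L}{\partial x} = 2u \cdot w
$$

For example, take the illustrative values x = 2, w = 3, b = 4. Then u = 10, which gives ∂L/∂w = 40, ∂L/∂b = 20, and ∂L/∂x = 60.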


### ⚠️ Things to Remember
- Zero out gradients between optimization steps (e.g. with `optimizer.zero_grad()`). Otherwise each `.backward()` call adds to the existing `.grad` values.
- Only floating-point (or complex) tensors can have `requires_grad=True`. Requesting it on an integer tensor raises an error.
- Operations that involve no tensor with `requires_grad=True` are not tracked.
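This minimal sketch demonstrates the first two points. The loss `5w` is an arbitrary illustration. Gradients pile up across `.backward()` calls until they are cleared, and an integer tensor cannot require gradients.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

grads = []
for step in range(3):
    loss = 5.0 * w            # d(loss)/dw = 5 on every pass
    loss.backward()           # adds 5 to whatever is already in w.grad
    grads.append(w.grad.item())
print(grads)                  # [5.0, 10.0, 15.0]: gradients accumulate

w.grad = None                 # clear before the next pass
(5.0 * w).backward()
print(w.grad)                 # tensor(5.)

# Integer tensors cannot require gradients
try:
    torch.tensor(1, requires_grad=True)
    int_error_raised = False
except RuntimeError:
    int_error_raised = True
print(int_error_raised)       # True
```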


```python
import torch

# Create a scalar tensor that requires gradients
x = torch.tensor(2.0, requires_grad=True)
print(f"x = {x}")
```

```
x = 2.0
```


```python
# Define a function of x
y = x**2 + 3*x + 1
print(f"y = x² + 3x + 1 = {y}")
```

```
y = x² + 3x + 1 = 11.0
```


```python
# Calculate the gradient (derivative)
y.backward()  # computes dy/dx and stores it in x.grad
```

```python
print(f"Gradient dy/dx = {x.grad}")  # dy/dx = 2x + 3 = 7 at x = 2
```

```
Gradient dy/dx = 7.0
```
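As a sanity check, the sketch below re-runs the same function at a few arbitrary points and compares autograd with the hand-derived derivative 2x + 3. A new leaf tensor is created on each iteration so that gradients from earlier iterations do not accumulate.

```python
import torch

checks = []
for value in [-2.0, 0.0, 2.0, 5.0]:
    x = torch.tensor(value, requires_grad=True)  # fresh leaf, so .grad starts as None
    y = x**2 + 3*x + 1
    y.backward()
    analytic = 2 * value + 3                     # hand-derived dy/dx = 2x + 3
    checks.append((x.grad.item(), analytic))
    print(f"x = {value}: autograd {x.grad.item()}, analytic {analytic}")
```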


## ***More complex example***

```python
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

z = x**2 + y**3 + x*y

print(f"z = x² + y³ + xy = {z}")

z.backward()

print(f"∂z/∂x = {x.grad}")  # 2x + y = 4
print(f"∂z/∂y = {y.grad}")  # 3y² + x = 13
```

```
z = x² + y³ + xy = 11.0
∂z/∂x = 4.0
∂z/∂y = 13.0
```
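Autograd's result can also be checked against central finite differences. This sketch uses float64 to keep round-off small. The step `h = 1e-6` and the helper `z_fn` are illustrative choices.

```python
import torch

def z_fn(x, y):
    return x**2 + y**3 + x*y

x = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
y = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
z_fn(x, y).backward()

h = 1e-6
with torch.no_grad():  # the numerical check itself does not need a graph
    dz_dx_num = (z_fn(x + h, y) - z_fn(x - h, y)) / (2 * h)
    dz_dy_num = (z_fn(x, y + h) - z_fn(x, y - h)) / (2 * h)

print(x.grad.item(), dz_dx_num.item())  # both close to 4
print(y.grad.item(), dz_dy_num.item())  # both close to 13
```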


---

```python
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

y = w * x + b     # y = 3*2 + 4 = 10
loss = y ** 2     # loss = 10^2 = 100

loss.backward()

print("dloss/dx:", x.grad)  # 2 * y * w = 2*10*3 = 60
print("dloss/dw:", w.grad)  # 2 * y * x = 2*10*2 = 40
print("dloss/db:", b.grad)  # 2 * y * 1 = 2*10 = 20
```

```
dloss/dx: tensor(60.)
dloss/dw: tensor(40.)
dloss/db: tensor(20.)
```
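Gradients matter because they drive gradient descent. The minimal sketch below takes one plain gradient-descent step on `w` and `b` for the same loss. It assumes a hypothetical learning rate `lr = 0.01` and treats `x` as a fixed input. The update runs under `torch.no_grad()` so that it is not itself recorded in the graph.

```python
import torch

x = torch.tensor(2.0)                       # fixed input: no gradient needed
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)
lr = 0.01                                   # hypothetical learning rate

loss = (w * x + b) ** 2
loss.backward()                             # w.grad = 40, b.grad = 20

with torch.no_grad():                       # the update itself must not be tracked
    w -= lr * w.grad                        # 3 - 0.01 * 40 = 2.6
    b -= lr * b.grad                        # 4 - 0.01 * 20 = 3.8
w.grad = None                               # clear before the next iteration
b.grad = None

new_loss = (w * x + b) ** 2                 # (2.6 * 2 + 3.8)^2 = 81
print(loss.item(), new_loss.item())         # the loss decreases after one step
```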


```python
x = torch.tensor(3.0, requires_grad=True)
y2 = x**3 - 2*x**2 + x - 5

y2.backward()

print(x.grad)  # dy2/dx = 3x² - 4x + 1 = 16 at x = 3
```

```
tensor(16.)
```
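Going one step further, autograd can also differentiate a derivative. The sketch below uses `torch.autograd.grad`, which returns gradients directly instead of writing them to `.grad`. Passing `create_graph=True` records the backward pass itself, so the result can be differentiated again to get d²y/dx² = 6x - 4.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y2 = x**3 - 2*x**2 + x - 5

# create_graph=True keeps the backward computation differentiable
(dy_dx,) = torch.autograd.grad(y2, x, create_graph=True)
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item())    # 3x² - 4x + 1 = 16 at x = 3
print(d2y_dx2.item())  # 6x - 4 = 14 at x = 3
```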
