# 🧠 Step 1E: Computation Graph & Autograd (The Heart of Learning)

You already know how to manipulate tensors, but tensors alone can't "learn."
What makes learning possible in PyTorch is autograd and the computation graph it works on.

## 🧩 1️⃣ What Is a Computation Graph?

Whenever you perform mathematical operations on tensors that require gradients,
PyTorch records those operations in a graph behind the scenes.

* The nodes of this graph are tensors (values).

* The edges are operations (like addition, multiplication, etc.).

So if you do:

x = torch.tensor(2.0, requires_grad=True)
y = x * 3
z = y + 5


PyTorch internally builds:

x ──(×3)──► y ──(+5)──► z
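
The graph sketched above can be inspected directly. The following is a minimal sketch, assuming a recent PyTorch release: it follows `grad_fn` and its `next_functions` attribute from `z` back toward `x`. The node class names it prints (such as `AddBackward0`) are implementation details and may differ between versions.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * 3
z = y + 5

# Each non-leaf tensor remembers the operation that produced it.
z_node = z.grad_fn                       # node for the "+ 5" step
y_node = z_node.next_functions[0][0]     # node for the "* 3" step

print(type(z_node).__name__)
print(type(y_node).__name__)
print(x.is_leaf, x.grad_fn)              # x was created by the user, so it has no grad_fn
```

Walking `next_functions` like this is only for exploration; normal training code never needs to touch it.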

## 🧮 2️⃣ What Happens During .backward()

When you call z.backward(), PyTorch walks backward through that graph,
applying the chain rule of calculus to compute the derivative
of z with respect to every tensor that had requires_grad=True.

So here:

* dz/dx = 3

* That value gets stored in x.grad.
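
To make the chain rule concrete, here is a small sketch that compares the value autograd stores in `x.grad` with the same derivative worked out by hand. The names `dz_dy`, `dy_dx`, and `manual` are illustrative only.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * 3
z = y + 5

z.backward()                 # fills x.grad with dz/dx

# Chain rule by hand: dz/dx = dz/dy * dy/dx
dz_dy = 1.0                  # because z = y + 5
dy_dx = 3.0                  # because y = x * 3
manual = dz_dy * dy_dx

print(x.grad.item(), manual)
```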

## ⚙️ 3️⃣ The "requires_grad" Flag

This flag tells PyTorch:

"Track everything that happens to this tensor so you can compute gradients later."

If it's False (the default), PyTorch ignores that tensor during .backward().

In [1]:
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x * 3
z = y + 5

In [2]:
a = torch.tensor(5.0, requires_grad=True)  # tracked
b = torch.tensor(4.0)                      # not tracked (requires_grad defaults to False)

If you mix them, autograd still works, but only variables with requires_grad=True
will receive .grad values.
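
A quick way to see this is to combine the two tensors and look at both `.grad` attributes afterward. This is a minimal sketch; the product `c` is just an illustrative choice of operation.

```python
import torch

a = torch.tensor(5.0, requires_grad=True)  # tracked
b = torch.tensor(4.0)                      # not tracked

c = a * b
c.backward()

print(a.grad)   # dc/da = b
print(b.grad)   # None: b never asked for a gradient
```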

## 🚦 4️⃣ Why We Need Gradients

Gradients tell us the direction and magnitude by which to adjust parameters
to minimize a loss (error).

Example intuition:

If the loss decreases when weight w decreases, the gradient for w is positive,
so the optimizer subtracts a small step from w.
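
The intuition above can be sketched as a single hand-written gradient descent step. Everything here is hypothetical and chosen for illustration: the weight `w`, the learning rate `lr`, and the toy loss `(w - 2)**2`. A real optimizer such as `torch.optim.SGD` performs this kind of update for you.

```python
import torch

w = torch.tensor(5.0, requires_grad=True)   # hypothetical single weight
lr = 0.1                                     # hypothetical learning rate

loss_before = (w - 2.0) ** 2                 # toy loss, smallest at w = 2
loss_before.backward()                       # d(loss)/dw = 2 * (w - 2)

with torch.no_grad():                        # the update itself should not be tracked
    w -= lr * w.grad                         # positive gradient, so w moves down

loss_after = (w - 2.0) ** 2
print(w.item(), loss_before.item(), loss_after.item())
```

The gradient is positive at `w = 5`, so the step lowers `w` and the loss shrinks.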


## 🧠 5️⃣ Example: One-variable computation

In [26]:
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x + 1
y.backward()          # compute dy/dx
print(x.grad)         # dy/dx = 3x² + 2 = 3*4 + 2 = 14

tensor(14.)


In [27]:
y = x ** 3 + 2 * x + 1
y.backward()          # compute dy/dx
print(x.grad)         # 14 + 14 = 28: the new gradient is added to the old one

tensor(28.)


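## 🔄 6️⃣ What if you call backward multiple times?

Gradients accumulate by default; they don't reset automatically.
Each .backward() call adds the new gradient to whatever is already stored in .grad.

So zero the gradients before every backward pass. Otherwise, you'll see .grad keep growing across iterations.

In [28]:
x.grad.zero_()   # reset the accumulated gradient in place before the next backward()

# In a training loop, optimizer.zero_grad() does this for all parameters.

# Note: x.detach() is not an alternative here. It returns a new tensor that is
# cut off from the graph; it does not clear x.grad.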
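
The difference between accumulating and resetting is easy to see side by side. In this sketch the helper `grads_over_three_passes` is a hypothetical name introduced only for the comparison; it reuses the function `y = x**3 + 2*x + 1` from the example above.

```python
import torch

def grads_over_three_passes(zero_between: bool) -> list:
    """Run backward three times and record x.grad after each pass."""
    x = torch.tensor(2.0, requires_grad=True)
    seen = []
    for _ in range(3):
        if zero_between and x.grad is not None:
            x.grad.zero_()                  # clear the previous gradient
        y = x ** 3 + 2 * x + 1              # rebuild the graph each pass
        y.backward()
        seen.append(x.grad.item())
    return seen

accumulated = grads_over_three_passes(zero_between=False)
reset_each_time = grads_over_three_passes(zero_between=True)
print(accumulated)
print(reset_each_time)
```

Without zeroing, the recorded values climb by 14 each pass; with zeroing, every pass reports the true derivative.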


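## 🧩 7️⃣ Disabling Autograd (for evaluation/inference)

When you're not training (for example, testing a model),
you don't need gradient tracking. Turning it off saves memory and computation.

In [29]:
import torch

model = torch.nn.Linear(1, 1)      # stand-in model, just for illustration
x = torch.tensor([2.0])

# Option 1: turn off graph recording for everything inside the block
with torch.no_grad():
    prediction = model(x)
print(prediction.requires_grad)

# Option 2: get a copy of one tensor that is cut off from the graph
x_detached = x.detach()

# no_grad() is temporary (it ends with the block); detach() returns a new,
# permanently untracked tensor that shares the same data.

False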

## 🔍 8️⃣ Behind the Scenes (a tiny mental model)

Each tensor that participates in autograd has:

* tensor.data → its raw value

* tensor.grad → its computed gradient (filled in after backward, for leaf tensors with requires_grad=True)

* tensor.grad_fn → a reference to the operation that created it (None for tensors you create directly)

Check it out:

In [31]:
x = torch.tensor(3.0, requires_grad=True)
y = x * 2
print(y.grad_fn)   # prints something like <MulBackward0 object>

# That's PyTorch keeping a "recipe" to reverse the computation.

<MulBackward0 object at 0x7eb7b1c21000>
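
The other attributes can be checked the same way. This is a small sketch that prints `grad` and `grad_fn` before and after `backward()` for the same `x` and `y`.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * 2

print(x.grad)                 # None: backward() has not run yet
print(x.grad_fn)              # None: x is a leaf, nothing created it
print(y.grad_fn is not None)  # True: y remembers the multiplication

y.backward()
print(x.grad)                 # dy/dx = 2
```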


## 🧩 9️⃣ Mini Exercises (try these slowly)

### 1️⃣ Create x = torch.tensor(2.0, requires_grad=True)
Compute y = 3*x**2 + 4*x + 5
Find x.grad after .backward() and verify manually.

### 2️⃣ Repeat but call .backward() twice without zeroing grad.
What happens to x.grad?

In [70]:
x = torch.tensor(2.0, requires_grad=True)

In [71]:
y = 3*x**2 + 4*x + 5

In [72]:
y.backward()

In [73]:
x.grad

tensor(16.)
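
For exercise 2, one way to try it is sketched below. Calling `.backward()` a second time on the same `y` normally fails, because PyTorch frees the graph's saved values after the first pass. This sketch assumes you want to reuse the same graph, so it passes `retain_graph=True` on the first call; recomputing `y` before each call would work as well.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 3 * x**2 + 4 * x + 5

y.backward(retain_graph=True)   # keep the graph so it can be walked again
first = x.grad.item()

y.backward()                    # second pass adds to the existing x.grad
second = x.grad.item()

print(first, second)
```

The second value is double the first because the gradient was accumulated, not replaced.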

### 3️⃣ Wrap your computation in with torch.no_grad(): and check:
does x.grad update anymore?

#### 🧩 3️⃣ Mental model 💭

You can think of autograd as a notebook keeping track of every math step you do.
When you call backward(), PyTorch flips through the notebook in reverse, applying the chain rule.

But when you use torch.no_grad(), you're saying:

"Don't write anything in the notebook for a while."

So afterward, when you ask it to differentiate, there's no record of the steps to go back through.

#### ⚙️ 4️⃣ Why we still use torch.no_grad()

Even though it blocks gradient tracking, it's very useful when you're:

* evaluating / testing models

* generating outputs (no training)

* saving memory (since graph tracking uses extra memory)

* preventing accidental gradient updates

In [76]:
before = x.grad.clone()
with torch.no_grad():
    y = 3*x**2 + 4*x + 5
    y.backward()    # raises RuntimeError: y was computed without a graph
print(torch.equal(x.grad, before))  # never reached because of the error above

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

#### Why this causes the error

with torch.no_grad(): tells PyTorch:

"Do not build the computation graph for any operation inside this block."

So when you do

y = 3*x**2 + 4*x + 5,

that y does not know it came from x.

It's just a plain tensor: its requires_grad is False, its grad_fn is None, and nothing links it back to x.

Hence, when you call .backward(), PyTorch says:

"Sorry, this tensor isn't part of a computation graph, so there's nothing to differentiate!"

That's what

"does not require grad and does not have a grad_fn"
means.

#### So inside torch.no_grad() no graph of operations is stored, and backward() has nothing to walk through?

Exactly. You are telling PyTorch:

"Hey, for the operations inside this block, don't bother recording how the result was computed.
Just give me the output values, I don't need gradients later."

So PyTorch skips building the computation graph:
no grad_fn objects are attached, no connections are made, nothing to trace later.
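
The missing graph can be checked directly. This is a minimal sketch that starts from a fresh `x` (so no earlier gradient is stored) and catches the error instead of letting it stop the cell; the flag `failed` is just an illustrative name.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

with torch.no_grad():
    y = 3 * x**2 + 4 * x + 5     # computed, but not recorded

print(y.requires_grad)           # False
print(y.grad_fn)                 # None

try:
    y.backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("backward() refused:", err)

print(x.grad)                    # None: nothing was ever accumulated
```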

In [56]:
x.grad

tensor(48.)

### 4️⃣ Create a = torch.tensor([2.0,3.0], requires_grad=True)
Compute b = (a**2).sum()

Find a.grad. (Hint: gradient will be [2a₁, 2a₂].)

In [77]:
a = torch.tensor([2.0, 3.0], requires_grad=True)

In [78]:
b = (a**2).sum()

In [79]:
b.backward()

In [80]:
a.grad

tensor([4., 6.])
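
The `.sum()` in this exercise matters: `.backward()` with no arguments only works on a scalar output. As a hedged sketch of the alternative, a vector output can be differentiated by passing a `gradient` tensor of the same shape; using all ones weights every element equally and gives the same result as summing first. The names `from_sum`, `squares`, and `from_vector` are illustrative.

```python
import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)

b = (a ** 2).sum()              # scalar output
b.backward()
from_sum = a.grad.clone()

a.grad.zero_()                  # clear before the second experiment

squares = a ** 2                # vector output, no .sum()
squares.backward(torch.ones_like(squares))   # weight each element by 1
from_vector = a.grad.clone()

print(from_sum, from_vector, 2 * a.detach())
```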

# 🧠 `.backward()` and Autograd: How They Relate

`autograd` is the **system**, and `.backward()` is the **action** that triggers it.

Think of it like this 👇

| Component          | Role                                                                                                                    |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------- |
| 🧩 **Autograd**    | The **engine** that records operations and knows how to compute gradients (it builds and stores the computation graph). |
| ⚙️ **.backward()** | The **command** that tells the autograd engine: "OK, now walk this graph in reverse and compute the derivatives."       |

So:

```python
import torch

# Autograd system is *watching* everything
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 1

# This line tells the autograd engine to start differentiation
y.backward()
```

### Step by step:

1️⃣ `requires_grad=True`
→ Autograd starts **recording** all ops done on `x`.

2️⃣ When you compute `y = ...`,
→ Autograd builds a **computation graph** connecting `x → y`.

3️⃣ When you call `y.backward()`,
→ Autograd **traverses the graph backward**, applying the **chain rule** to compute `dy/dx`.

4️⃣ The result is stored in

```python
x.grad
```

✅ So `.backward()` **uses autograd** internally.
Autograd ≈ the engine; `.backward()` ≈ pressing the "differentiate now" button.
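
One way to see that the engine is separate from the button is to call it through a different entry point. This sketch uses `torch.autograd.grad`, which returns the gradient as a value instead of storing it in `.grad`; it is shown here only as an illustration, and the name `dy_dx` is an assumption of this example.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 1

# Ask the engine for dy/dx directly; the result is returned, not stored.
(dy_dx,) = torch.autograd.grad(y, x)

print(dy_dx)     # 2*x + 3 evaluated at x = 2
print(x.grad)    # None: .grad is only filled by .backward()
```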

---

# 🔍 Example showing autograd & backward in action

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = 2*x**3 + 4*x + 1     # autograd tracks this computation
print(y.grad_fn)         # shows grad function, means it's tracked

y.backward()             # backward() calls autograd internally
print(x.grad)            # gradient: dy/dx = 6x^2 + 4 = 6*9 + 4 = 58
```

Here:

* `grad_fn` shows that `y` has a recorded operation (proof autograd is tracking).
* `y.backward()` triggers autograd's engine to compute the gradient.
* `.grad` holds the computed result.
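
If you want an independent check on the value 58, a finite-difference estimate is a simple option. This is a rough sketch, not part of autograd itself: the helper `f`, the step size `h`, and the use of `float64` for extra precision are all choices made for this illustration.

```python
import torch

def f(x):
    return 2 * x**3 + 4 * x + 1

x = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_value = x.grad.item()

# Central difference: (f(x + h) - f(x - h)) / (2h), computed with plain floats
h = 1e-4
x0 = 3.0
numeric_value = (f(x0 + h) - f(x0 - h)) / (2 * h)

print(autograd_value, numeric_value)
```

The two numbers should agree closely; the small leftover difference comes from the finite step size, not from autograd.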

---

# 💡 Summary (one-line mental map)

> **Autograd = system that builds the graph.**
> **.backward() = trigger that makes autograd compute all gradients.**

---

You've now fully connected **tensors → computation graph → autograd → backward() → gradients** ✅
That's the complete conceptual chain of "how neural networks actually learn."

---