In [1]:
import torch

# 1) Calculating gradient using autograd

In [2]:
# Creating a tensor with gradient tracking
x = torch.tensor(3.0, requires_grad = True)

# Defining a function
y = x**2
z = torch.sin(y)

# Computing the gradient of the root node (z) w.r.t. the leaf node (x)
z.backward()

# The derivative of z w.r.t. x is stored in x.grad
print(x.grad)
print(y.grad)   # y is a non-leaf tensor, so its .grad is not retained (None, with a UserWarning)

tensor(-5.4668)
None



# 2) Calculating gradients for intermediate nodes (like y in the example above)


In [3]:
# Defining the function again
y = x**2
y.retain_grad()
z = torch.sin(y)

z.backward(retain_graph = True)   # "retain_graph = True" keeps the graph alive so .backward() can be called on it again; y and z were rebuilt above, so a plain z.backward() would also work here

print(x.grad)
print(y.grad)

tensor(-10.9336)
tensor(-0.9111)
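
As an alternative to `retain_grad()`, `torch.autograd.grad` returns the requested gradients directly instead of writing them into `.grad`, so nothing accumulates. The following is a minimal sketch on the same function; the names `x0`, `y0`, and `z0` are illustrative and not part of the notebook above.

```python
import torch

x0 = torch.tensor(3.0, requires_grad=True)
y0 = x0**2
z0 = torch.sin(y0)

# Ask for dz/dx and dz/dy in one call; the results come back as a tuple
dz_dx, dz_dy = torch.autograd.grad(z0, [x0, y0])

print(dz_dx)     # 2x * cos(x^2)
print(dz_dy)     # cos(y)
print(x0.grad)   # None: autograd.grad does not populate .grad
```

This is handy for one-off inspection of an intermediate gradient, since there is nothing to clear afterwards.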


## 2.1) Gradients Accumulate by Default in PyTorch

In PyTorch, the `.grad` attribute does **not** reset automatically after each backward pass. Instead, gradients are **accumulated** (added up) by default.  

This behavior is useful during training (e.g., summing gradients over multiple mini-batches), but it produces misleading values in experiments if you forget to clear them.  
👉 This is why `x.grad` changed from `-5.4668` to `-10.9336` above: the second backward pass was **added** to the first.  
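
To illustrate the mini-batch use case, here is a small sketch (the parameter `w`, the toy `data`, and the squared-sum loss are assumptions made up for this example). It shows that accumulating gradients over two chunks gives the same result as one pass over the full batch.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
data = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Full-batch gradient of the summed squared values
loss_full = ((w * data) ** 2).sum()
loss_full.backward()
full_grad = w.grad.clone()

# Same gradient, accumulated over two mini-batches of size 2
w.grad = None
for chunk in data.split(2):
    loss = ((w * chunk) ** 2).sum()
    loss.backward()          # each call adds into w.grad

print(full_grad, w.grad)
print(torch.allclose(full_grad, w.grad))
```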

---

### 🔹 Example Walkthrough  

**First backward pass:**  

$$
\frac{dz}{dx} = \cos(x^2) \cdot 2x \approx -5.4668
$$  

**Second backward pass (same function again):**  

PyTorch adds the new gradient to the old one:  

$$
-5.4668 + (-5.4668) = -10.9336
$$  
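
The two numbers above can be checked without PyTorch. This is a quick sketch using only the standard library: it compares the chain-rule formula against a central finite difference (the step size `h` is an arbitrary choice).

```python
import math

def f(x):
    return math.sin(x**2)

x0, h = 3.0, 1e-6
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central difference
analytic = math.cos(x0**2) * 2 * x0           # chain rule: cos(x^2) * 2x

print(round(numeric, 4), round(analytic, 4))  # both approximately -5.4668
print(round(2 * analytic, 4))                 # value after two accumulated passes
```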

---

### 🔹 How to Fix This  
Always clear gradients **before** calling `.backward()` again if you want fresh values:  

```
x.grad = None      # or x.grad.zero_() if x.grad already holds a tensor

y.grad = None

z.backward()
```

In [4]:
# Defining the function again
y = x**2
y.retain_grad()
z = torch.sin(y)

# Clearing the gradients
x.grad = None
y.grad = None

z.backward(retain_graph = True)   # "retain_graph = True" keeps the graph alive so .backward() can be called on it again; y and z were rebuilt above, so a plain z.backward() would also work here

print(x.grad)
print(y.grad)

tensor(-5.4668)
tensor(-0.9111)


## 2.2) Calculating intermediate gradients on a completely new tensor


In [5]:
# Creating the same tensor under a different name
a = torch.tensor(3.0, requires_grad = True)                         # a --> x

# Define function
b = a**2                       # b --> y
b.retain_grad()
c = torch.sin(b)               # c --> z

# Call the backward pass; it calculates and stores the gradients
c.backward()                   # fresh graph, single backward pass: "retain_graph = True" is not needed

# Print the calculated derivatives/gradients
print(a.grad)                                     # a.grad --> x.grad
print(b.grad)                                     # b.grad --> y.grad

tensor(-5.4668)
tensor(-0.9111)
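
To see when `retain_graph = True` actually matters, the sketch below calls `.backward()` twice on the *same* graph. The names `t` and `out` are illustrative. Without `retain_graph`, the first pass frees the tensors saved for the backward computation, so the second pass raises a `RuntimeError`.

```python
import torch

t = torch.tensor(3.0, requires_grad=True)
out = torch.sin(t**2)

out.backward()                 # first pass works, then frees the saved tensors

try:
    out.backward()             # second pass on the same graph
    second_pass_failed = False
except RuntimeError as err:
    second_pass_failed = True
    print("RuntimeError:", str(err)[:60], "...")

# Retaining the graph makes a repeated call legal
t.grad = None
out = torch.sin(t**2)
out.backward(retain_graph=True)
out.backward()                 # allowed because the previous call kept the graph
print(t.grad)                  # two passes accumulated into t.grad
```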


# 3) Disabling Gradient Calculation in PyTorch

By default, PyTorch tracks operations on tensors with `requires_grad=True` to build a computation graph for backpropagation.  
Sometimes, you don’t want gradients (e.g., during inference, or when freezing parts of a model).  

There are **three main ways** to disable gradient calculation:



## 3.1) Option 1: `requires_grad_(False)`
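- Sets the tensor's `requires_grad` flag to `False` **in place**, so later operations on it are not tracked (call `requires_grad_(True)` to turn tracking back on).  
- Only works on **leaf** tensors; calling it on a non-leaf tensor (the result of a tracked operation) raises a `RuntimeError`.  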

In [6]:
a = torch.tensor(3.0, requires_grad = True)

print(a)

tensor(3., requires_grad=True)


In [7]:
# Disabling gradients

a.requires_grad_(False)                # now gradients won't be tracked

print(a)

tensor(3.)
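
A common use of `requires_grad_(False)` is freezing part of a model. The sketch below assumes a small, made-up `nn.Sequential` network (layer sizes and the summed output used as a loss are arbitrary). It freezes the first layer and checks which parameters receive gradients.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

# Freeze the first layer: its weight and bias stop receiving gradients
for param in model[0].parameters():
    param.requires_grad_(False)

loss = model(torch.randn(2, 4)).sum()
loss.backward()

print(model[0].weight.grad)               # None: frozen
print(model[2].weight.grad is not None)   # True: still trainable
```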


## 3.2) Option 2: `.detach()`
- Returns a **new tensor** that shares the same data as the original but is **detached from the computation graph** (`requires_grad=False`, no `grad_fn`).  
- The original tensor is unaffected.  
- Useful for intermediate computations where you want to "cut off" the gradient flow.


In [8]:
a = torch.tensor(3.0, requires_grad = True)

b = a.detach()                        # b is a tensor without gradient tracking

print(a)
print(b)

tensor(3., requires_grad=True)
tensor(3.)
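
The sketch below illustrates both points about `.detach()`: it cuts the gradient flow, and the detached tensor shares memory with its source. The names `x1`, `y1`, `z1`, `src`, and `shared` are illustrative.

```python
import torch

x1 = torch.tensor(3.0, requires_grad=True)

y1 = x1**2
z1 = y1.detach() * x1      # y1 is treated as the constant 9 here
z1.backward()
print(x1.grad)             # 9, not 3 * x1**2 = 27

# The detached tensor shares storage with its source
src = torch.tensor([1.0, 2.0], requires_grad=True)
shared = src.detach()
shared[0] = 10.0           # the in-place edit is visible through src as well
print(src)
```

Because of the shared storage, use `.detach().clone()` when an independent copy is needed.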


## 3.3) Option 3: `torch.no_grad()`

- A **context manager** that disables gradient tracking for everything inside its block.  
- Commonly used during **inference** or **evaluation** (to save memory and computation).  

In [9]:
a = torch.tensor(3.0, requires_grad = True)

with torch.no_grad():                      # No gradient tracking under this block
    b = a**2

c = torch.sin(a)

print(a)
print(b)
print(c)

tensor(3., requires_grad=True)
tensor(9.)
tensor(0.1411, grad_fn=<SinBackward0>)
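
The effect of the block can be checked through `requires_grad` and `grad_fn`. `torch.no_grad()` can also be applied as a decorator. A short sketch; `w2` and the function `predict` are hypothetical names for illustration.

```python
import torch

w2 = torch.tensor(3.0, requires_grad=True)

with torch.no_grad():
    inside = w2**2           # computed without building a graph

outside = w2**2              # tracked as usual

print(inside.requires_grad, inside.grad_fn)   # False None
print(outside.requires_grad)                  # True

# torch.no_grad() also works as a decorator
@torch.no_grad()
def predict(v):
    return torch.sin(v)

print(predict(w2).requires_grad)              # False
```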
