# **AutoGrad**

**Autograd** (automatic differentiation) is PyTorch’s engine for computing gradients (partial derivatives) of tensor computations. It records the operations performed on tensors in a dynamic computation graph and runs backpropagation through that graph when you call .backward().

**Why autograd is required:**
Training neural networks uses gradient-based optimization (e.g., gradient descent), which needs gradients such as ∂loss/∂θ for every parameter, often millions of them. Autograd removes the need to derive and code these derivatives by hand, which becomes impractical for deep or complex networks.

**Key points to remember**


*   Only tensors with requires_grad=True are tracked for gradients, and during .backward() gradients are accumulated into .grad only for leaf tensors with requires_grad=True.
*   Gradients accumulate by default across multiple .backward() calls, so you usually clear them each iteration (optimizer.zero_grad() or param.grad = None).
*   Use torch.no_grad() during inference to skip graph tracking and save memory and compute; use detach() to cut a single tensor out of the graph.
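
The first point can be illustrated with a minimal sketch. The names `w`, `h`, and `out` are illustrative only and do not appear in the examples below; the idea is that `w` is a leaf tensor created directly by the user, while `h` is the result of an operation and is therefore not a leaf.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor: created directly by the user
h = w * 3                                  # non-leaf tensor: produced by an operation
out = h ** 2                               # out = (3w)^2 = 9w^2

out.backward()

print(w.is_leaf, h.is_leaf)  # True False
print(w.grad)                # d(out)/dw = 18 * w = 36
```

Only `w.grad` is populated here; intermediate results such as `h` do not keep a gradient by default.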

**Example 1:** y = x^2

In [2]:
import torch

In [3]:
x = torch.tensor(3.0, requires_grad=True) # requires_grad is False by default; setting it to True tells autograd to track x so that its gradient can be computed

In [4]:
print(x)

tensor(3., requires_grad=True)


In [5]:
y = x ** 2

In [6]:
print(y)

tensor(9., grad_fn=<PowBackward0>)


In [7]:
y.backward() # runs backpropagation from y and computes dy/dx (x is tracked because requires_grad=True)

In [8]:
x.grad # the gradient dy/dx stored on x after backpropagation: 2 * x = 6

tensor(6.)

**Example 2:** z = sin(x^2)

In [9]:
x = torch.tensor(4.0, requires_grad=True)
print(x)

tensor(4., requires_grad=True)


In [10]:
y = x **2
print(y)

tensor(16., grad_fn=<PowBackward0>)


In [11]:
z = torch.sin(y)
print(z)

tensor(-0.2879, grad_fn=<SinBackward0>)


In [12]:
z.backward()

In [13]:
x.grad

tensor(-7.6613)
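
As a sanity check on the value above, the chain rule gives dz/dx = cos(x^2) * 2x. The sketch below evaluates that expression by hand with the standard `math` module; the names `x_val` and `manual_grad` are illustrative and not part of the notebook.

```python
import math

x_val = 4.0

# Chain rule: z = sin(y), y = x^2  =>  dz/dx = cos(x^2) * 2x
manual_grad = math.cos(x_val ** 2) * 2 * x_val

print(round(manual_grad, 4))  # -7.6613
```

The hand-computed value agrees with what autograd stored in x.grad.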

**Example 3:** calculating the gradients of w and b for a simple neural network (a single neuron with a sigmoid output)

In [14]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)

In [16]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [17]:
print(w)
print(b)

tensor(1., requires_grad=True)
tensor(0., requires_grad=True)


In [18]:
z = w * x + b
print(z)

tensor(6.7000, grad_fn=<AddBackward0>)


In [19]:
y_pred = torch.sigmoid(z)
print(y_pred)

tensor(0.9988, grad_fn=<SigmoidBackward0>)


In [20]:
def binary_cross_entropy_loss(predicted, target):
  epsilon = 1e-7 # keeps predictions away from 0 and 1 to prevent log(0); with 1e-8, 1-epsilon rounds to 1.0 in float32
  predicted = torch.clamp(predicted, epsilon, 1-epsilon)

  return -(target * torch.log(predicted) + (1-target) * torch.log(1-predicted))

In [21]:
loss = binary_cross_entropy_loss(y_pred, y)
print(loss)

tensor(6.7012, grad_fn=<NegBackward0>)


In [22]:
loss.backward()

In [23]:
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
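
These two gradients can be reproduced by hand. For a sigmoid output combined with binary cross-entropy, the derivative of the loss with respect to z simplifies to (y_pred - y), so dL/dw = (y_pred - y) * x and dL/db = (y_pred - y). The sketch below checks this with plain Python; the `_val` names and `dL_*` variables are illustrative, and it assumes the clamp in the loss is inactive, which holds for these inputs.

```python
import math

x_val, y_val = 6.7, 0.0   # input and target from the example
w_val, b_val = 1.0, 0.0   # initial weight and bias

z_val = w_val * x_val + b_val
y_hat = 1 / (1 + math.exp(-z_val))  # sigmoid(z)

dL_dz = y_hat - y_val     # sigmoid + BCE derivative simplifies to this
dL_dw = dL_dz * x_val     # dz/dw = x
dL_db = dL_dz             # dz/db = 1

print(round(dL_dw, 4))  # 6.6918
print(round(dL_db, 4))  # 0.9988
```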


**Example 4:** y = mean(x^2), where x is a vector

In [24]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x)

tensor([1., 2., 3.], requires_grad=True)


In [27]:
y = (x ** 2).mean()
print(y)

tensor(4.6667, grad_fn=<MeanBackward0>)


In [28]:
y.backward()

In [29]:
print(x.grad)

tensor([0.6667, 1.3333, 2.0000])
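
The result follows from the definition of the mean: with n elements, y = (1/n) * sum(x_i^2), so each component of the gradient is 2 * x_i / n. A minimal sketch that compares autograd's result with this formula (the name `expected` is illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).mean()
y.backward()

# Analytical gradient of the mean of squares: 2 * x_i / n
expected = 2 * x.detach() / x.numel()

print(torch.allclose(x.grad, expected))  # True
```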


# **Clearing Gradients**
We need to clear the gradient after each backward pass; otherwise the new gradient is added on top of the previous value.
For example, suppose I got
 **x.grad = 4** after the backward pass for y = x^2 (with x = 2).
If I don't clear the gradient and run the forward and backward pass again,
**x.grad** becomes **8**, because the new gradient is added to the previous one **(4 + 4 = 8)**. Each further iteration adds another 4 to the stored value. That is why we need to clear the gradient before every new iteration.

In [30]:
# clearing grad
x = torch.tensor(4.0, requires_grad=True)

In [31]:
y = x ** 2

In [32]:
y.backward()

In [33]:
print(x.grad)

tensor(8.)


In [34]:
print(x.grad.zero_()) # zero_() resets the gradient to 0 in place; do this before each new iteration

tensor(0.)
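
The accumulation itself can be observed by calling backward twice without clearing in between. This is a minimal sketch; the names `first`, `accumulated`, and `after_reset` are illustrative.

```python
import torch

x = torch.tensor(4.0, requires_grad=True)

(x ** 2).backward()
first = x.grad.clone()         # tensor(8.)

(x ** 2).backward()            # second backward pass without clearing
accumulated = x.grad.clone()   # tensor(16.): 8 + 8

x.grad = None                  # clear the gradient (x.grad.zero_() also works)
(x ** 2).backward()
after_reset = x.grad.clone()   # tensor(8.) again

print(first, accumulated, after_reset)
```

In a training loop, the same reset is usually done for all parameters at once with optimizer.zero_grad().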


# **Disable Gradient Tracking**

In [36]:
x = torch.tensor(2.0, requires_grad=True)
print(x)

tensor(2., requires_grad=True)


In [37]:
y = x ** 2
print(y)

tensor(4., grad_fn=<PowBackward0>)


In [38]:
y.backward()

In [40]:
x.grad

tensor(4.)

In [41]:
# option 1: requires_grad_(False)

x.requires_grad_(False)
print(x) # requires_grad is now False, so autograd no longer tracks operations on x

tensor(2.)


In [42]:
y = x ** 2
print(y) # y has no grad_fn either, so calling y.backward() now raises a RuntimeError

tensor(4.)


In [43]:
# option 2: detach()

x = torch.tensor(2.0, requires_grad=True)
print(x)

z = x.detach() # returns a new tensor z that shares x's data (not a copy) but is excluded from gradient tracking
print(z)

tensor(2., requires_grad=True)
tensor(2.)


In [44]:
y = z ** 2
print(y) # same y value as before, but no graph is recorded for backprop

tensor(4.)
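
One detail worth knowing about detach(): the returned tensor shares memory with the original rather than copying it. The sketch below checks this by comparing data pointers; the name `shares_memory` is illustrative. If an independent copy is needed, x.detach().clone() is a common pattern.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
z = x.detach()

# Both tensors point at the same underlying storage
shares_memory = z.data_ptr() == x.data_ptr()

print(z.requires_grad)  # False
print(shares_memory)    # True
```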


In [45]:
# option 3: torch.no_grad()

x = torch.tensor(2.0, requires_grad=True)
print(x)

tensor(2., requires_grad=True)


In [47]:
with torch.no_grad():
  y = x ** 2 # operations inside this block are not tracked, so y has no grad_fn and y.backward() will not work; x itself keeps requires_grad=True

print(y)

tensor(4.)
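
To confirm what torch.no_grad() does and does not change, the following sketch inspects the flags and shows the error raised when backward is attempted on the untracked result. The variable `backward_failed` is illustrative.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

with torch.no_grad():
    y = x ** 2  # computed without recording a graph

print(x.requires_grad)  # True: x itself is unchanged
print(y.requires_grad)  # False: y is not connected to any graph

backward_failed = False
try:
    y.backward()
except RuntimeError:
    # y has no grad_fn, so there is nothing to backpropagate through
    backward_failed = True

print(backward_failed)  # True
```

Unlike requires_grad_(False), the context manager only affects operations executed inside the block, so x can still be used for training afterwards.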
