## Autograd
 
Autograd performs automatic differentiation, which is what lets us train a model with an optimization algorithm like gradient descent (GD).
To enable automatic differentiation for a tensor, set the attribute requires_grad=True when defining it.

In [37]:
import torch


In [38]:
a = torch.tensor(3.0, requires_grad= True) # leaf tensor

In [39]:
b = a**2

In [40]:
a 

tensor(3., requires_grad=True)

In [41]:
b

tensor(9., grad_fn=<PowBackward0>)

In [42]:
b.backward()   # backward() computes gradients using backpropagation

In [43]:
a.grad   # .grad stores the gradient of the output with respect to a leaf tensor; here d(a**2)/da = 2*a = 6

tensor(6.)

In [44]:
### Example Number 2 

x = torch.randn((4,4), requires_grad = True)



In [45]:
y = x+ 32

In [None]:
z = (y**2 / 3)
# z = (y**2 / 3).mean() would give a scalar output instead. Reducing to a scalar (usually a mean loss)
# is the common case when training neural networks, and averaging keeps the gradient values small.

In [None]:
z.backward(torch.ones_like(z))
# Use z.backward() with no argument when z is a scalar, e.g. when using the mean() version above.
# When the output is not a scalar, we need to provide the gradient argument.

In [48]:
x.grad

tensor([[20.1285, 21.1137, 21.0638, 20.4828],
        [21.0922, 22.0769, 21.3843, 20.4889],
        [21.5277, 20.7600, 21.1099, 21.3451],
        [20.8075, 21.0698, 20.7387, 22.5073]])
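
The values above can be checked against the analytic derivative, d/dx [(x + 32)^2 / 3] = 2 * (x + 32) / 3. The cell below is a minimal sketch of that check, and it also compares the three ways of calling backward() mentioned in the comments. The seed and the names grad_from_ones, grad_from_sum, grad_from_mean and expected are assumptions made for illustration; they are not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable
x = torch.randn((4, 4), requires_grad=True)

# Non-scalar output: pass a tensor of ones as the gradient argument.
z = (x + 32) ** 2 / 3
z.backward(torch.ones_like(z))
grad_from_ones = x.grad.clone()

# Scalar output via sum(): no gradient argument is needed.
x.grad.zero_()
((x + 32) ** 2 / 3).sum().backward()
grad_from_sum = x.grad.clone()

# Scalar output via mean(): each gradient is divided by the number of elements.
x.grad.zero_()
((x + 32) ** 2 / 3).mean().backward()
grad_from_mean = x.grad.clone()

# Analytic derivative for comparison.
expected = (2 * (x + 32) / 3).detach()

print(torch.allclose(grad_from_ones, expected))
print(torch.allclose(grad_from_sum, expected))
print(torch.allclose(grad_from_mean, expected / x.numel()))
```

Passing torch.ones_like(z) is equivalent to differentiating z.sum(), which is why the first two gradients agree. With mean() the gradients are smaller by a factor of 16 here, since the tensor has 16 elements.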

### Clearing Gradients 
This is an important concept: clearing the gradients resets them to zero.

Why do we do this?

PyTorch accumulates gradients. Every backward pass adds the newly computed gradient to whatever is already stored in .grad. If the stored gradient is not cleared, the next forward and backward pass adds to it, and the inflated gradient affects the model's updates.

For example:
1. After the first forward and backward pass, suppose the gradient is 4.
2. After running the forward and backward pass again, the stored gradient is 8.

Why is that?
The gradient from the second backward pass, which is again 4, is added to the 4 already stored from the first pass, giving 8.

Left unchecked, this results in ever larger gradients.

In [None]:
# To avoid this, we can do the following.

x1 = torch.tensor(5.0, requires_grad=True)
y1 = x1**3

# The line above is the forward pass.

In [None]:
y1.backward()  # this is the backward pass

In [None]:
x1.grad  # To see the accumulation effect, re-run the two cells above and then this one again
# before resetting the gradient to zero.

tensor(75.)

In [55]:
x1.grad.zero_()  # resets the gradient to zero in place, which solves the accumulation problem

tensor(0.)
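
The accumulation described above can also be seen in a single cell. This is a minimal sketch using the same function, the cube of a value of 5.0, so each backward pass contributes 3 * 5**2 = 75. The names w, out and grads_seen are hypothetical and chosen only for this illustration.

```python
import torch

w = torch.tensor(5.0, requires_grad=True)  # hypothetical name, same value as x1

grads_seen = []
for _ in range(2):
    out = w ** 3       # forward pass
    out.backward()     # backward pass adds 3 * w**2 = 75 to w.grad
    grads_seen.append(w.grad.item())

print(grads_seen)      # [75.0, 150.0] -> the second pass added to the first

# Reset, then run one more pass: the gradient is back to a single contribution.
w.grad.zero_()
out = w ** 3
out.backward()
print(w.grad)          # tensor(75.)
```

In a training loop the reset would normally happen once per iteration, before the backward pass.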

### Removing the gradient tracking 

This is done when we no longer need the gradients, i.e. when training has completed and we now want to make predictions or evaluate the model.

We can use any of the following to achieve this:

1. requires_grad_(False)
2. detach()
3. with torch.no_grad()

In [57]:
x2 = torch.tensor(5.0, requires_grad=True)
x2

tensor(5., requires_grad=True)

In [58]:
x2.requires_grad_(False)
x2

tensor(5.)

In [59]:
# Method 2 

x2 = x.detach()
x2

tensor([[-1.8073, -0.3294, -0.4043, -1.2758],
        [-0.3617,  1.1154,  0.0764, -1.2666],
        [ 0.2915, -0.8600, -0.3351,  0.0176],
        [-0.7887, -0.3953, -0.8919,  1.7610]])
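
The third option from the list, with torch.no_grad(), is not shown in the cells above. A minimal sketch of it could look like the following. The names x3, y3 and y4 are hypothetical and are used only to avoid clashing with the earlier variables.

```python
import torch

x3 = torch.tensor(5.0, requires_grad=True)  # hypothetical name

# Method 3: operations inside the block are not recorded for backpropagation.
with torch.no_grad():
    y3 = x3 ** 2

print(y3.requires_grad)   # False -> no graph was built for y3
print(x3.requires_grad)   # True  -> x3 itself is unchanged

# Outside the block, tracking resumes as usual.
y4 = x3 ** 2
print(y4.requires_grad)   # True
```

Unlike requires_grad_(False), which changes the tensor itself, the context manager only disables tracking for the code inside the block. That makes it a convenient fit for an evaluation or prediction step.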