
# Autograd (automatic gradient calculation in PyTorch)

In [None]:
import torch

In [None]:
x=torch.randn(3, requires_grad=True)   # related: torch.rand_like()
print(x,x.dtype)

tensor([ 0.1929, -0.2860,  2.6335], requires_grad=True) torch.float32


- `torch.rand()` is for Uniform distribution (in the half-open interval [0.0, 1.0))
- `torch.randn()` is for Standard Normal (aka. Gaussian) distribution (mean 0 and variance 1)
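
A quick empirical check of the two distributions described above, as a rough sketch. The sample size `n` and the seed are arbitrary choices made for illustration; they are not part of the original notebook.

```python
import torch

torch.manual_seed(0)   # arbitrary seed so the run is repeatable

n = 100_000            # arbitrary sample size
u = torch.rand(n)      # uniform samples on [0, 1)
g = torch.randn(n)     # standard normal samples

print(u.min().item(), u.max().item())    # both inside [0, 1)
print(u.mean().item())                   # close to 0.5
print(g.mean().item(), g.std().item())   # close to 0 and 1
```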

If we want to calculate the gradient of a function with respect to a tensor x, we must create x with `requires_grad=True`.
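
A minimal sketch of what happens with and without the flag. The tensors `a` and `b` are hypothetical names used only for this illustration.

```python
import torch

a = torch.ones(3)                       # requires_grad defaults to False
b = torch.ones(3, requires_grad=True)

out_a = (a * 2).sum()
out_b = (b * 2).sum()

print(out_a.grad_fn)   # None: no history was recorded for a
print(out_b.grad_fn)   # a backward node, because b is tracked

try:
    out_a.backward()   # nothing in this graph requires a gradient
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

out_b.backward()
print(b.grad)          # tensor([2., 2., 2.])
```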

In [None]:
y=x+2                   
print(y)
z=y*y*2
print(z)
z=z.mean()
print(z)

# the forward pass records every operation (see grad_fn in the output) in a computational graph;
# backward() later walks this graph to compute the gradient of z wrt x (z depends on x through y)

tensor([2.1929, 1.7140, 4.6335], grad_fn=<AddBackward0>)
tensor([ 9.6172,  5.8754, 42.9378], grad_fn=<MulBackward0>)
tensor(19.4768, grad_fn=<MeanBackward0>)


In [None]:
z.backward()         # calculates gradient wrt x - dz/dx
print(x.grad)        # prints gradients of this tensor

# a bare z.backward() works only because z is a scalar; for a non-scalar z we must pass a vector of the same shape to z.backward()


tensor([2.9238, 2.2853, 6.1779])
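
The result can be checked by hand: since z = mean(2·(x+2)²) over n elements, dz/dx_i = 4·(x_i+2)/n. The sketch below compares that formula with autograd. The x values are copied from the rounded output printed above, so the last digit may differ slightly from the original run.

```python
import torch

# x values copied (rounded) from the printed output above
x = torch.tensor([0.1929, -0.2860, 2.6335], requires_grad=True)

z = ((x + 2) ** 2 * 2).mean()
z.backward()

# analytic gradient: dz/dx_i = 4 * (x_i + 2) / n
analytic = 4 * (x.detach() + 2) / x.numel()

print(x.grad)
print(analytic)
print(torch.allclose(x.grad, analytic))
```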


- `backward()` raises an error if no tensor in the graph was created with `requires_grad=True`.
- To get the gradients, autograd computes a vector-Jacobian product `v^T · J` (J = Jacobian matrix, v = a vector of the same shape as the output). For a scalar output, v is simply 1 and the result is dz/dx.
- The Jacobian matrix holds all first-order partial derivatives of a vector-valued function: entry (i, j) is ∂z_i/∂x_j. The word "Jacobian" is also used for the determinant of this matrix, which appears in coordinate transformations; autograd only needs the matrix itself.
- Note: gradients tell us in which direction the function decreases, which is what optimisation uses to find a minimum.
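
To make the vector-Jacobian product concrete, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and compares `v @ J` with what `backward(v)` stores in `x.grad`. The helper `f` and the input values are illustrative assumptions; in practice autograd never forms J explicitly.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # same elementwise function as z = y*y*2 with y = x+2
    return (x + 2) ** 2 * 2

x = torch.tensor([0.1929, -0.2860, 2.6335], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.001])

J = jacobian(f, x.detach())   # 3x3 matrix, J[i, j] = dz_i/dx_j
manual = v @ J                # vector-Jacobian product v^T J

f(x).backward(v)              # autograd computes the same product directly

print(J)                      # diagonal, because f acts elementwise
print(manual)
print(x.grad)
```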

In [None]:
y=x+2                   
print(y)
z=y*y*2
print(z)
v=torch.tensor([0.1, 1.0,0.001], dtype=torch.float32)
z.backward(v)     # z is not a scalar, so pass a vector v of the same shape as z; autograd returns the vector-Jacobian product v^T J
print(x.grad)     # added on top of the gradient stored by the earlier z.backward() call


# most of the time the output is a scalar (e.g. a loss), so z.backward() works without arguments; otherwise pass a vector

tensor([2.1929, 1.7140, 4.6335], grad_fn=<AddBackward0>)
tensor([ 9.6172,  5.8754, 42.9378], grad_fn=<MulBackward0>)
tensor([3.8010, 9.1412, 6.1965])


 - Whenever we call `backward()`, the gradients are accumulated in the `.grad` attribute (the new values are added to the ones already stored).
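
A small sketch of the accumulation behaviour. The tensor `w` and the snapshots `first` and `second` are hypothetical names for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()     # tensor([3., 3., 3.])

(w * 3).sum().backward()   # the second call adds to the stored gradient
second = w.grad.clone()    # tensor([6., 6., 6.])

w.grad.zero_()             # reset the stored gradient in place
print(first, second, w.grad)
```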

# How to prevent PyTorch from tracking history (the `grad_fn=<AddBackward0>`, `<MulBackward0>`, etc. attributes) when it is not required

(e.g. when we update the weights inside the training loop, the update operation itself (+, -, * and so on) must not become part of the gradient computation)

In [None]:
# There are 3 methods

# x.requires_grad_(False)
# x.detach()
# with torch.no_grad():

In [None]:
x.requires_grad_(False)   # note: requires_grad=True no longer appears in the output

tensor([ 0.1929, -0.2860,  2.6335])

In [None]:
print(y)
y=x.detach()
print(y)

tensor([2.1929, 1.7140, 4.6335], grad_fn=<AddBackward0>)
tensor([ 0.1929, -0.2860,  2.6335])


In [None]:
with torch.no_grad():
  z=x+2
  print(z)

tensor([2.1929, 1.7140, 4.6335])


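- `Tensor.detach()` separates a tensor from the computational graph: it takes no arguments and returns a new tensor that shares the same data but does not require gradients. It does not move data between devices; moving a tensor from the GPU to the CPU is done with `.cpu()`, and `detach()` is typically chained before it (`x.detach().cpu()`) when the result has to leave the graph, for example before converting to NumPy.
- The other two approaches have the same effect: `x.requires_grad_(False)` switches tracking off in place, and `with torch.no_grad():` disables tracking for every operation inside the block.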
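
The three approaches side by side, together with the weight-update use case mentioned earlier, as a minimal sketch. The names `w`, `d`, `t` and the learning rate `0.1` are arbitrary choices for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

# 1) detach(): new tensor, same storage, no history
d = w.detach()
print(d.requires_grad)                # False
print(d.data_ptr() == w.data_ptr())   # True: memory is shared with w

# 2) torch.no_grad(): operations inside the block are not recorded
with torch.no_grad():
    t = w * 2
print(t.requires_grad)                # False

# typical use: a manual gradient-descent step kept out of the graph
loss = (w * 3).sum()
loss.backward()
with torch.no_grad():
    w -= 0.1 * w.grad                 # in-place update, lr = 0.1 is arbitrary
w.grad.zero_()
print(w)                              # each entry is 1 - 0.1 * 3 = 0.7

# 3) requires_grad_(False): switches tracking off in place
w.requires_grad_(False)
print(w.requires_grad)                # False
```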

NOTE: whenever we call `backward()`, the gradients are accumulated in the `.grad` attribute (the values are summed up), e.g.:

In [None]:
weights = torch.ones(4, requires_grad=True)

for epoch in range(3):                   # a dummy training loop

  model_output= (weights*3).sum()        # just a dummy operation to simulate some model output 

  model_output.backward()

  print(weights.grad)                # without the reset below, the values would accumulate and give wrong results

  weights.grad.zero_()              # set the gradients to zero before the next iteration to prevent accumulation
  


tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])


- We will look at optimizers properly later.

In [None]:
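# Trying the built-in optimizers
weights = torch.ones(4, requires_grad=True)
weights2 = torch.ones(5, requires_grad=True)

# SGD expects an iterable of tensors (or parameter groups);
# passing a bare tensor such as `weights` raises a TypeError
optimizer = torch.optim.SGD([weights, weights2], lr=0.01)
optimizer.step()         # no gradients exist yet, so nothing is updated here
optimizer.zero_grad()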
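
As a hedged sketch of how the optimizer fits into the dummy training loop from earlier: `torch.optim.SGD` expects an iterable of parameters, so the tensor is wrapped in a list. The loop length, learning rate and dummy loss are illustrative assumptions.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)   # parameters passed as a list

for epoch in range(3):
    loss = (weights * 3).sum()   # dummy "model output", gradient is 3 everywhere
    loss.backward()
    optimizer.step()             # weights <- weights - lr * grad
    optimizer.zero_grad()        # clear the gradients before the next iteration

print(weights)                   # each entry is 1 - 3 * 0.01 * 3 = 0.91
```

Here `optimizer.zero_grad()` plays the same role as `weights.grad.zero_()` in the manual loop.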

# BACKPROPAGATION