Gradient calculation with Autograd

In [1]:
import torch

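2.1 requires_grad and grad_fn
Here
- requires_grad=True: marks the tensor as one to optimize (tells PyTorch to track operations on it so that the gradient with respect to this tensor can be calculated later, in the optimization steps)
And
- grad_fn=: records the operation that created a tensor, so PyTorch knows which backward function to use when it backpropagates later:
    - AddBackward0 for the add function (y = x + 3), used for dy/dx
    - MulBackward0 for the multiplication function (z = x * 3), used for dz/dx
    - MeanBackward0 for the mean function (w = z.mean()), used for dw/dz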

In [2]:
x= torch.randn(3, requires_grad=True)
print(x)
y= x+3
print(y)
z= x*3
print(z)
w = z.mean()
print(w)

tensor([-0.9075,  1.5435, -0.3865], requires_grad=True)
tensor([2.0925, 4.5435, 2.6135], grad_fn=<AddBackward0>)
tensor([-2.7224,  4.6304, -1.1594], grad_fn=<MulBackward0>)
tensor(0.2495, grad_fn=<MeanBackward0>)
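The grad_fn shown in the printed tensors can also be inspected directly. The following is a minimal sketch, assuming a fresh random x (so the values differ from the output above); it only reads attributes that autograd attaches to tensors.

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor created by the user
z = x * 3
w = z.mean()

# A leaf tensor was not produced by an operation, so it has no grad_fn
print(x.is_leaf, x.grad_fn)

# Each result remembers the backward function of the operation that made it
print(type(z.grad_fn).__name__)
print(type(w.grad_fn).__name__)

# next_functions links w's backward node to the node of its input z
parent = w.grad_fn.next_functions[0][0]
print(type(parent).__name__)
```

Following next_functions from w back to x is how .backward() walks the graph in reverse.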


2.2 Calculating gradients with respect to the input
- input: x
- outputs: y, z, w
- a gradient (dy/dx | dz/dx | dw/dx) can only be calculated if the output depends, directly or indirectly, on an input created with requires_grad=True
- .backward(): computes a vector-Jacobian product (chain rule) to get the gradients (i.e. w.backward() calculates dw/dx, and x then stores this gradient in x.grad)
- vector-Jacobian product = JM^T * GV
    - JM (Jacobian matrix): element (i, j) is the partial derivative of output i with respect to input j
    - GV (gradient vector): one weight per output element, with the same shape as the output
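The product above can be checked numerically. This is a sketch under stated assumptions: the function f, the input values and the vector v are made up for illustration, and the full Jacobian is built with torch.autograd.functional.jacobian only to compare against what .backward(v) stores in x.grad.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    # same operation as z = x * 3 in the notebook
    return t * 3

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.2, 1.0, 0.5])  # illustrative gradient vector (GV)

z = f(x)
z.backward(v)                      # autograd computes JM^T * GV directly

J = jacobian(f, x.detach())        # explicit 3x3 Jacobian matrix (JM)
manual = J.T @ v                   # the same product, written out by hand

print(x.grad)
print(manual)
print(torch.allclose(x.grad, manual))
```

Autograd never materializes JM in the backward pass; building it here is only for the comparison.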

In [3]:
#2.2.1 grad for scalar outputs(w [since only one element])
w.backward() 
print(x.grad)



tensor([1., 1., 1.])


In [4]:
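#2.2.2 grad for vector outputs (like y and z)
v = torch.tensor([0.2, 1.0023, 0.00124], dtype=torch.float32) # gradient vector with the same shape as y (needed because y is not a scalar)
y.backward(v) # a non-scalar output needs the gradient vector as argument
print(x.grad) # 1 + v: the result is added to the gradient stored by w.backward() in the previous cell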

tensor([1.2000, 2.0023, 1.0012])
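To see why the gradient vector is required, the sketch below calls .backward() on a non-scalar output without an argument, which raises a RuntimeError, and then repeats the call with a vector of ones. The fresh tensor x is an assumption made so that the example does not depend on earlier cells.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 3  # non-scalar output

try:
    y.backward()  # no gradient vector given
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a vector of ones weights every output element equally
y.backward(torch.ones_like(y))
print(x.grad)
```

Since dy/dx is 1 for every element, x.grad ends up equal to the vector that was passed in.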


2.3 Preventing creation of the gradient function (grad_fn) and stopping history tracking in the computational graph
- Three ways:
    - x.requires_grad_(False): changes x in place
    - x.detach(): returns a new tensor that shares its data with x but is cut off from the graph
    - with torch.no_grad():
        #operations
- [x = tensor requiring gradient]

In [5]:
#2.3.1 .requires_grad_(False)
q= torch.randn(3, requires_grad= True) #tensor requiring gradient
print(q)
q.requires_grad_(False)
print(q)


tensor([ 0.9887,  0.5679, -0.5729], requires_grad=True)
tensor([ 0.9887,  0.5679, -0.5729])


In [6]:
#2.3.2 .detach()
q= torch.randn(3, requires_grad= True) #tensor requiring gradient
print(q)
r= q.detach()
print(r)

tensor([-1.3815,  0.1716,  1.0518], requires_grad=True)
tensor([-1.3815,  0.1716,  1.0518])


In [7]:
#2.3.3 with torch.no_grad():
        #operations
q= torch.randn(3, requires_grad= True) #tensor requiring gradient
print(q)
with torch.no_grad():
    r= q+2
    print(r)

tensor([ 0.0115, -0.6556, -0.7330], requires_grad=True)
tensor([2.0115, 1.3444, 1.2670])


The result r is printed without a grad_fn, so no history was tracked for it.
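The differences between these approaches can be made visible by checking the flags on the tensors. A minimal sketch, assuming a tensor of ones instead of random values so the result is reproducible:

```python
import torch

q = torch.ones(3, requires_grad=True)

# detach() returns a new tensor; q itself keeps requiring gradients
r = q.detach()
print(q.requires_grad, r.requires_grad)
print(r.data_ptr() == q.data_ptr())  # both tensors share the same memory

# Inside no_grad, operations are not recorded in the graph
with torch.no_grad():
    s = q + 2
print(s.requires_grad, s.grad_fn)

# Outside the block, tracking is active again
t = q + 2
print(t.requires_grad, type(t.grad_fn).__name__)

# requires_grad_(False) switches tracking off on q itself
q.requires_grad_(False)
print(q.requires_grad)
```

Because r shares memory with q, an in-place change to r would also change q.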

2.4 About .backward():
- it accumulates gradients: every call adds the newly calculated gradient to the value already stored in .grad of the input tensor (x.grad in the cells above, whether the call was w.backward() or y.backward(v))

In [8]:
weights = torch.ones(4, requires_grad=True)
for epoch in range(2):
    model_output= (weights*3).sum()
    model_output.backward()
    print(weights.grad)

tensor([3., 3., 3., 3.])
tensor([6., 6., 6., 6.])


- Here the gradient is incorrect in every step except the first one, because each call adds another [3., 3., 3., 3.] to weights.grad. Therefore we need to reset the gradient after each optimization step.

After resetting the gradient, it is correct in every step (very important)

In [9]:
weights = torch.ones(4, requires_grad=True)
for epoch in range(2):
    model_output= (weights*3).sum()
    model_output.backward()
    print(weights.grad)
    weights.grad.zero_() #empties gradient/ sets gradient value to zero

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
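Putting the reset together with an actual weight update gives a small manual training loop. This is a sketch, not the notebook's own code: the learning rate lr and the three steps are assumed values, and the update is wrapped in torch.no_grad() so that it is not recorded in the graph.

```python
import torch

weights = torch.ones(4, requires_grad=True)
lr = 0.1  # assumed learning rate, for illustration only

for step in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()              # weights.grad is [3., 3., 3., 3.]

    with torch.no_grad():                # the update must not be tracked
        weights -= lr * weights.grad     # plain gradient descent step

    weights.grad.zero_()                 # reset before the next backward pass

print(weights)
```

Each step subtracts 0.1 * 3 from every weight, so after three steps the weights are close to 0.1.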


#Extra
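- using an optimizer provided by PyTorch

In [10]:
weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01) # Stochastic Gradient Descent; the parameters are passed as a list of tensors
model_output = (weights*3).sum()
model_output.backward() # fills weights.grad
optimizer.step() # updates weights using weights.grad
optimizer.zero_grad() # clears the gradient before the next iteration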

#Conclusion:
To calculate a gradient:
- set requires_grad=True on the tensor (say x) with respect to which the gradient is calculated
- call y.backward() to calculate the gradient (dy/dx) and store it in x.grad (pass a gradient vector if y is not a scalar)
- reset x.grad with x.grad.zero_() before the next backward pass