# Autograd 
### <span style='color:yellow'>PyTorch is equipped with the Autograd package for automatic gradient computation.</span>

### <span style='color:yellow'>This tutorial covers the basics of gradient calculation using the Autograd package.</span>

In [10]:
# Gradient calculation
import torch
x=torch.randn(3)
print(x)

# To compute the gradient of a function with respect to x later, create x with the argument requires_grad=True.

x=torch.randn(3,requires_grad=True)
print(x)

tensor([-1.3506, -0.4263,  0.7109])
tensor([ 1.1131, -0.1259,  1.2902], requires_grad=True)


### <span style='color:yellow'>Whenever we perform operations with tensor x, PyTorch will create a computational graph for the gradient computation of that tensor.</span>
$$
\Large{y=x+2}
$$
### <span style='color:yellow'>We used the tensor that requires a gradient, thus a computational graph is created as shown in the following figure.</span>
<img src='gradient_calc.png' width='400'>

### <span style='color:yellow'>The above figure summarizes the backpropagation procedure for gradient calculation.</span>

### <span style='color:yellow'> The first step is a forward pass that computes y. Because we set requires_grad=True, PyTorch records the computational graph needed to compute the gradient. PyTorch also gives y an attribute called grad_fn, which points to the function used in the backward pass.</span>

### <span style='color:yellow'> The second step entails a backward pass in which the gradient can be calculated using the partial derivative.</span>

In [11]:
#The grad_fn attribute is created for gradient calculation
y=x+2
print(y)

tensor([3.1131, 1.8741, 3.2902], grad_fn=<AddBackward0>)


In [12]:
# The type of the grad_fn depends on the operation itself, e.g., addition, multiplication, etc.
z=y*y*2
print(z)

tensor([19.3824,  7.0244, 21.6509], grad_fn=<MulBackward0>)


In [13]:
# The mean grad_fn
z=z.mean()
print(z)

tensor(16.0192, grad_fn=<MeanBackward0>)
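The graph built by the cells above can also be inspected directly. The following is a minimal sketch, assuming a fresh random tensor `x`; it relies on the `is_leaf` attribute and on `grad_fn.next_functions`, neither of which is used elsewhere in this tutorial.

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor created by the user
y = x + 2
z = (y * y * 2).mean()

# A leaf tensor has no grad_fn; results of tracked operations do.
print(x.is_leaf, x.grad_fn)
print(y.is_leaf, type(y.grad_fn).__name__)

# Walk one step back from z: each entry of next_functions is (node, input index).
parents = [type(fn).__name__ for fn, _ in z.grad_fn.next_functions if fn is not None]
print(type(z.grad_fn).__name__, "<-", parents)
```

Each `grad_fn` node links back to the nodes that produced its inputs, which is how the backward pass finds its way from `z` to `x`.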


In [14]:
# Suppose that z is the desired output. To calculate the gradient of z with respect to x, call backward() on z.

z.backward()  # dz/dx

# The data x has an attribute to store the gradient termed as 'grad'
print(x.grad)

# Remember to specify requires_grad=True when creating x; otherwise z has no grad_fn and backward() raises a RuntimeError.

tensor([4.1508, 2.4988, 4.3869])
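The printed gradient can be checked by hand. Since z is the mean of 2(x+2)² over three elements, each component is dz/dx_i = 4(x_i+2)/3; for the first element above, 4 × 3.1131 / 3 ≈ 4.1508. The sketch below repeats the check on a fresh random tensor; the name `manual` is introduced here for illustration only.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x + 2
z = (y * y * 2).mean()   # z = (1/3) * sum(2 * (x_i + 2)^2)
z.backward()             # fills x.grad with dz/dx

# Hand-derived gradient: dz/dx_i = 4 * (x_i + 2) / 3
manual = 4 * (x.detach() + 2) / x.numel()

print(x.grad)
print(manual)
print(torch.allclose(x.grad, manual))
```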


# Vector-Jacobian product
### <span style='color:yellow'>Under the hood, the backward method computes a vector-Jacobian product</span>

### <span style='color:yellow'>The Jacobian matrix J holds all the partial derivatives of the outputs with respect to the inputs, and it is multiplied by a gradient vector v of the same size as the output</span>

### <span style='color:yellow'>The product of v with J gives the final gradient with respect to the inputs. This is the chain rule applied to vector-valued functions</span>

### <span style='color:yellow'>Autograd computes these vector-Jacobian products as shown in the following figure</span>

<img src='jaccob.png' width='800'>

### <span style='color:yellow'>The above z is a scalar value, thus there is no need to pass an extra argument to z.backward() to compute the gradient. However, if the result is not a scalar, then we have to pass a gradient argument, a vector with the same shape as z, which is used in the vector-Jacobian product</span>




In [15]:
# Let us remove the mean operation from the above z, and check if we can compute the gradient
x=torch.randn(3,requires_grad=True)
y=x+2  
z=y*y*2
#z.backward() 
# The above z.backward() raises RuntimeError: grad can be implicitly created only for scalar outputs

In [16]:
# To fix the above issue, we have to pass the gradient argument v, which must have the same shape as z
v=torch.tensor([0.1,1.0,0.001],dtype=torch.float32)
z.backward(v)  # v is the gradient argument
print(x.grad)

# In many cases the last operation is a sum (or a mean), which reduces the tensor to a scalar, so the gradient argument v is not needed



tensor([3.9242e-01, 1.0720e+01, 9.0267e-03])
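To make the vector-Jacobian product concrete, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and multiplies it by `v` by hand. The helper `f` and the names `J`, `vjp` and `x2` are introduced here for illustration; the explicit Jacobian is only practical for small inputs like this one.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    y = x + 2
    return y * y * 2           # non-scalar output, same expression as z above

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float32)

z = f(x)
z.backward(v)                  # autograd computes the product of v with J directly

J = jacobian(f, x.detach())    # explicit 3x3 Jacobian, J[i, j] = dz_i/dx_j
vjp = v @ J                    # the same product computed by hand

print(x.grad)
print(vjp)

# Summing first gives a scalar, which is equivalent to passing a vector of ones.
x2 = x.detach().clone().requires_grad_(True)
f(x2).sum().backward()
print(torch.allclose(x2.grad, torch.ones(3) @ J))
```

Because each output element here depends only on the matching input element, `J` is diagonal and the result reduces to `v * 4 * (x + 2)`.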


# Gradient tracking prevention



In [18]:
# In some cases, there is no need to calculate the gradient, e.g., while updating the weights or during evaluation
x=torch.randn(3,requires_grad=True)
print(x)

# To do so, there are three approaches:

# 1- x.requires_grad_(False)

x.requires_grad_(False)
print(x) # This will no longer show the requires_grad attribute

tensor([1.5001, 0.9761, 0.8060], requires_grad=True)
tensor([1.5001, 0.9761, 0.8060])


In [19]:
# 2- x.detach(), this will create a new tensor that does not require the gradient
y=x.detach()
print(y)

tensor([1.5001, 0.9761, 0.8060])


In [21]:
# 3- Wrapping the code in the torch.no_grad() context manager
with torch.no_grad():
    y=x+2
    print(y)

tensor([3.5001, 2.9761, 2.8060])
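The three approaches differ in what they affect, which the printed tensors alone do not show. The following sketch checks the `requires_grad` flag in each case; the variable names `d`, `n` and `t` are illustrative, and the `data_ptr()` comparison is used here only to show that `detach()` shares memory with the original tensor.

```python
import torch

x = torch.randn(3, requires_grad=True)

# 2- detach(): new tensor, no gradient tracking, same underlying memory
d = x.detach()
print(d.requires_grad, d.data_ptr() == x.data_ptr())

# 3- no_grad(): results computed inside the block are not tracked
with torch.no_grad():
    n = x + 2
print(n.requires_grad, n.grad_fn)

# Outside the block, tracking applies again
t = x + 2
print(t.requires_grad)

# 1- requires_grad_(False): in-place switch on x itself
x.requires_grad_(False)
print(x.requires_grad)
```

Note that `requires_grad_(False)` modifies `x` itself, whereas `detach()` and `torch.no_grad()` leave `x` untouched.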


In [42]:
#  When using backward(), the gradient of the tensor x is accumulated into the `.grad` attribute. In other words, the values are summed up.

weights=torch.ones(4,requires_grad=True)

for epoch in range(10):
    model_output=(weights*10).sum()
    model_output.backward()
    print(weights.grad)
    # The gradients are summed and accumulated in the `.grad` attribute,
    # so they must be reset before the next iteration.


tensor([10., 10., 10., 10.])
tensor([20., 20., 20., 20.])
tensor([30., 30., 30., 30.])
tensor([40., 40., 40., 40.])
tensor([50., 50., 50., 50.])
tensor([60., 60., 60., 60.])
tensor([70., 70., 70., 70.])
tensor([80., 80., 80., 80.])
tensor([90., 90., 90., 90.])
tensor([100., 100., 100., 100.])


In [22]:
# We noticed that the gradients are summed and accumulated in the `.grad` attribute.
# Therefore, we must reset the gradient as follows:

weights = torch.ones(4, requires_grad=True)
for epoch in range(10):
    model_output=(weights*10).sum()
    model_output.backward()
    print(weights.grad)
    weights.grad.zero_()

tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])
tensor([10., 10., 10., 10.])


In [23]:

# In the next tutorial, we will integrate Autograd with the PyTorch optimizer.
# The optimizer is a built-in PyTorch module that uses the gradients to update the parameters.
#optimizer = torch.optim.SGD([weights], lr=0.01)
#optimizer.step()
#optimizer.zero_grad()
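As a preview, the commented-out lines can be turned into a small runnable sketch. It assumes the same toy `weights` and `model_output` as the accumulation example, and passes the parameters to `torch.optim.SGD` as a list, since the optimizer expects an iterable of tensors rather than a single tensor.

```python
import torch

weights = torch.ones(4, requires_grad=True)
# SGD expects an iterable of parameters, hence the list
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    model_output = (weights * 10).sum()
    model_output.backward()      # gradient is 10 for every weight
    optimizer.step()             # weights <- weights - lr * grad
    optimizer.zero_grad()        # clear the gradient before the next pass
    print(weights)
```

Here `optimizer.zero_grad()` plays the role of `weights.grad.zero_()` from the previous cell; depending on the PyTorch version it either zeroes the gradient or resets it to `None`.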
