## Autograd

In [2]:
import torch
from torch.autograd import Variable

## Automatic differentiation of simple cases
Below we show some simple cases of automatic differentiation. "Simple" here means that the result of the computation is a scalar, that is, a single number, and we differentiate this scalar automatically.

In [4]:
x = Variable(torch.Tensor([2]), requires_grad=True)
y = x + 2
z = y ** 2 + 3
print(z)

tensor([19.], grad_fn=<AddBackward>)


Through the series of operations above, we obtain the final result z from x. We can express it as the mathematical formula

$$
z = (x + 2)^2 + 3
$$
 
Then the derivative of z with respect to x is
$$
\frac{\partial z}{\partial x} = 2(x + 2) = 2(2 + 2) = 8
$$
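
The intermediate step can be spelled out with the chain rule, using the intermediate variable y = x + 2 from the code above. This is only a worked restatement of the same derivative:

$$
\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x} = 2y \cdot 1 = 2(x + 2)
$$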


In [5]:
# use automatic differentiation
z.backward()
print(x.grad)

tensor([8.])
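
As a rough cross-check, the gradient can also be estimated numerically. The sketch below assumes PyTorch 0.4 or later, where `Variable` has been merged into `Tensor` and a plain tensor can be created with `requires_grad=True`. The helper `f`, the step size `eps`, and the name `numeric_grad` are introduced here for illustration only and are not part of the original notebook.

```python
import torch

# Assumption: PyTorch 0.4 or later, where a plain tensor can track gradients.
x = torch.tensor([2.0], requires_grad=True)
z = (x + 2) ** 2 + 3
z.backward()  # z holds a single element, so backward() needs no arguments


def f(v):
    # The same function written for plain Python floats (illustrative helper)
    return (v + 2) ** 2 + 3


# Central finite difference as an independent numerical estimate of dz/dx at x = 2
eps = 1e-3
numeric_grad = (f(2.0 + eps) - f(2.0 - eps)) / (2 * eps)

print(x.grad)        # gradient computed by autograd
print(numeric_grad)  # numerical estimate, expected to be close to 8
```

The two values should agree up to floating-point error, which matches the hand-derived result of 8.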


This simple example confirms the result of automatic differentiation and shows how convenient it is to use. For more complicated functions, deriving the gradient by hand can be very tedious, and automatic differentiation spares us that work. Let's look at a more complicated example.

In [6]:
x = Variable(torch.randn(10, 20), requires_grad=True)
y = Variable(torch.randn(10, 5), requires_grad=True)
w = Variable(torch.randn(20, 5), requires_grad=True)

out = torch.mean(y - torch.matmul(x, w))  # torch.matmul performs matrix multiplication
out.backward()

In [7]:
# get the gradient of x
print(x.grad)

tensor([[-0.0423,  0.0249,  0.0351, -0.0131,  0.0017,  0.0439, -0.0653,  0.0279,
          0.0104,  0.0057,  0.0304,  0.0337, -0.0452, -0.0746, -0.0282, -0.0258,
         -0.0400,  0.0103,  0.0762,  0.1168],
        [-0.0423,  0.0249,  0.0351, -0.0131,  0.0017,  0.0439, -0.0653,  0.0279,
          0.0104,  0.0057,  0.0304,  0.0337, -0.0452, -0.0746, -0.0282, -0.0258,
         -0.0400,  0.0103,  0.0762,  0.1168],
        [-0.0423,  0.0249,  0.0351, -0.0131,  0.0017,  0.0439, -0.0653,  0.0279,
          0.0104,  0.0057,  0.0304,  0.0337, -0.0452, -0.0746, -0.0282, -0.0258,
         -0.0400,  0.0103,  0.0762,  0.1168],
        [-0.0423,  0.0249,  0.0351, -0.0131,  0.0017,  0.0439, -0.0653,  0.0279,
          0.0104,  0.0057,  0.0304,  0.0337, -0.0452, -0.0746, -0.0282, -0.0258,
         -0.0400,  0.0103,  0.0762,  0.1168],
        [-0.0423,  0.0249,  0.0351, -0.0131,  0.0017,  0.0439, -0.0653,  0.0279,
          0.0104,  0.0057,  0.0304,  0.0337, -0.0452, -0.0746, -0.0282, -0.0258,
      

In [8]:
# get the gradient of y
print(y.grad)

tensor([[0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200],
        [0.0200, 0.0200, 0.0200, 0.0200, 0.0200]])


In [9]:
# get the gradient of w
print(w.grad)

tensor([[-0.0205, -0.0205, -0.0205, -0.0205, -0.0205],
        [-0.0415, -0.0415, -0.0415, -0.0415, -0.0415],
        [ 0.0406,  0.0406,  0.0406,  0.0406,  0.0406],
        [ 0.0505,  0.0505,  0.0505,  0.0505,  0.0505],
        [-0.0064, -0.0064, -0.0064, -0.0064, -0.0064],
        [ 0.0154,  0.0154,  0.0154,  0.0154,  0.0154],
        [ 0.1067,  0.1067,  0.1067,  0.1067,  0.1067],
        [-0.0853, -0.0853, -0.0853, -0.0853, -0.0853],
        [-0.1139, -0.1139, -0.1139, -0.1139, -0.1139],
        [ 0.0826,  0.0826,  0.0826,  0.0826,  0.0826],
        [-0.0833, -0.0833, -0.0833, -0.0833, -0.0833],
        [ 0.0150,  0.0150,  0.0150,  0.0150,  0.0150],
        [-0.0652, -0.0652, -0.0652, -0.0652, -0.0652],
        [ 0.0135,  0.0135,  0.0135,  0.0135,  0.0135],
        [-0.0435, -0.0435, -0.0435, -0.0435, -0.0435],
        [ 0.1030,  0.1030,  0.1030,  0.1030,  0.1030],
        [ 0.0537,  0.0537,  0.0537,  0.0537,  0.0537],
        [-0.0519, -0.0519, -0.0519, -0.0519, -0.0519],
        [-

The formula above is more complicated. After the matrix multiplication of x and w, the product is subtracted element-wise from y, and all elements of the result are then averaged. Interested students can work out the gradients by hand. With PyTorch's automatic differentiation, we easily obtain the derivatives with respect to x, y, and w. Deep learning involves a large number of matrix operations, so computing these derivatives by hand is impractical; automatic differentiation makes it easy to obtain the gradients needed to update a network.
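
For readers who want to try the manual calculation, here is a minimal sketch that compares hand-derived gradients with the ones autograd produces. Since `out` is the mean over 10 × 5 = 50 elements of `y - x @ w`, each entry of the gradient with respect to `y` is 1/50 = 0.02, the gradient with respect to `x` is the negated row sums of `w` divided by 50, and the gradient with respect to `w` is the negated column sums of `x` divided by 50. The seed and the `expected_*` names are assumptions added for illustration, and the code assumes PyTorch 0.4 or later (no `Variable` wrapper).

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so that repeated runs agree
x = torch.randn(10, 20, requires_grad=True)
y = torch.randn(10, 5, requires_grad=True)
w = torch.randn(20, 5, requires_grad=True)

out = torch.mean(y - torch.matmul(x, w))
out.backward()

n = y.numel()  # 10 * 5 = 50 elements are averaged

# Hand-derived gradients of out = mean(y - x @ w)
expected_y = torch.full_like(y, 1.0 / n)  # every entry is 1/50
expected_x = (-w.detach().sum(dim=1) / n).expand(10, 20)  # same row repeated 10 times
expected_w = (-x.detach().sum(dim=0) / n).unsqueeze(1).expand(20, 5)  # same column repeated 5 times

# Compare with the gradients computed by autograd
print(torch.allclose(y.grad, expected_y))
print(torch.allclose(x.grad, expected_x, atol=1e-6))
print(torch.allclose(w.grad, expected_w, atol=1e-6))
```

This also explains the pattern in the printed gradients: every row of `x.grad` is identical, every entry of `y.grad` is 0.02, and every column of `w.grad` is identical.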