# Automatic Differentiation
In this lesson we look at PyTorch's automatic differentiation (autograd) mechanism. Autograd is one of PyTorch's most important features: it spares us from working out complicated derivatives by hand, which greatly shortens the time needed to build a model. It is also a feature that PyTorch's predecessor, Torch, does not have. Let's see what autograd can do and explore some of its further uses.


In [1]:
import torch
from torch.autograd import Variable

## Automatic differentiation in simple cases
Below are some simple cases of automatic differentiation. "Simple" here means that the result of the computation is a scalar, that is, a single number, and we differentiate that scalar.


In [2]:
x = Variable(torch.Tensor([2]), requires_grad=True)
y = x + 2
z = y ** 2 + 3
print(z)

Variable containing:
 19
[torch.FloatTensor of size 1]



Through the series of operations above, we obtained the final result z from x. Written as a formula:

$$
z = (x + 2)^2 + 3
$$

The derivative of z with respect to x is then

$$
\frac{\partial z}{\partial x} = 2 (x + 2) = 2 (2 + 2) = 8
$$
If you are unfamiliar with derivatives, you can review them at the following [URL for review] (https:


In [3]:
# Use automatic differentiation
z.backward()
print(x.grad)

Variable containing:
 8
[torch.FloatTensor of size 1]
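
The value 8 can also be cross-checked numerically without PyTorch. The following is a minimal sketch using a central finite difference; the helper name `central_difference` and the step size `h` are illustrative choices, not part of the original notebook.

```python
def f(x):
    # Same function as in the cell above: z = (x + 2)^2 + 3
    return (x + 2) ** 2 + 3


def central_difference(func, x, h=1e-5):
    # Approximate d(func)/dx at x with a symmetric difference quotient
    return (func(x + h) - func(x - h)) / (2 * h)


approx = central_difference(f, 2.0)
print(approx)  # should be very close to the analytic value 8
```

A numerical check like this is only approximate, but it is a handy sanity test when an analytic derivative is hard to write down.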



In a simple example like this we could verify the result by hand, and automatic differentiation proved very convenient. For more complicated expressions, differentiating by hand becomes tedious, and automatic differentiation saves us that work. Let's look at a more complicated example.


In [4]:
x = Variable(torch.randn(10, 20), requires_grad=True)
y = Variable(torch.randn(10, 5), requires_grad=True)
w = Variable(torch.randn(20, 5), requires_grad=True)

out = torch.mean(y - torch.matmul(x, w))  # torch.matmul performs matrix multiplication
out.backward()

If you are unfamiliar with matrix multiplication, you can check out the [URL for review] below (https:


In [5]:
# Get the gradient of x
print(x.grad)

Variable containing:

Columns 0 to 9 
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172
-0.0600 -0.0242 -0.0514  0.0882  0.0056 -0.0400 -0.0300 -0.0052 -0.0289 -0.0172

Columns 10 to 19 
-0.0372  0.0144 -0.1074 -0.0363 -0.0189  0.0209  0.0618  0.0435 -0.0591  0.0103
-0.0372  0.0144 -0.1074 -0.0363 -0.0189  0.0209  0.0618  0.0435

In [6]:
# Get the gradient of y
print(y.grad)

Variable containing:
1.00000e-02 *
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
  2.0000  2.0000  2.0000  2.0000  2.0000
[torch.FloatTensor of size 10x5]



In [7]:
# Get the gradient of w
print(w.grad)

Variable containing:
 0.1342  0.1342  0.1342  0.1342  0.1342
 0.0507  0.0507  0.0507  0.0507  0.0507
 0.0328  0.0328  0.0328  0.0328  0.0328
-0.0086 -0.0086 -0.0086 -0.0086 -0.0086
 0.0734  0.0734  0.0734  0.0734  0.0734
-0.0042 -0.0042 -0.0042 -0.0042 -0.0042
 0.0078  0.0078  0.0078  0.0078  0.0078
-0.0769 -0.0769 -0.0769 -0.0769 -0.0769
 0.0672  0.0672  0.0672  0.0672  0.0672
 0.1614  0.1614  0.1614  0.1614  0.1614
-0.0042 -0.0042 -0.0042 -0.0042 -0.0042
-0.0970 -0.0970 -0.0970 -0.0970 -0.0970
-0.0364 -0.0364 -0.0364 -0.0364 -0.0364
-0.0419 -0.0419 -0.0419 -0.0419 -0.0419
 0.0134  0.0134  0.0134  0.0134  0.0134
-0.0251 -0.0251 -0.0251 -0.0251 -0.0251
 0.0586  0.0586  0.0586  0.0586  0.0586
-0.0050 -0.0050 -0.0050 -0.0050 -0.0050
 0.1125  0.1125  0.1125  0.1125  0.1125
-0.0096 -0.0096 -0.0096 -0.0096 -0.0096
[torch.FloatTensor of size 20x5]



The formula above is more involved: x and w are multiplied as matrices, the product is subtracted elementwise from y, and all elements of the result are averaged. Interested readers can work out the gradients by hand. With PyTorch's automatic differentiation we easily obtain the derivatives with respect to x, y and w. Deep learning is full of matrix operations like this, and deriving their gradients manually is impractical, so automatic differentiation is what makes updating a network's parameters straightforward.
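
For readers who want to check the printed gradients by hand, one way to write out the derivation is sketched below. The index names $i$, $j$, $k$ are notation introduced here for illustration; the shapes (x is 10 x 20, w is 20 x 5, y is 10 x 5) come from the cell above.

$$
\text{out} = \frac{1}{50} \sum_{i=1}^{10} \sum_{j=1}^{5} \left( y_{ij} - \sum_{k=1}^{20} x_{ik} w_{kj} \right)
$$

$$
\frac{\partial\, \text{out}}{\partial y_{ij}} = \frac{1}{50} = 0.02, \qquad
\frac{\partial\, \text{out}}{\partial x_{ik}} = -\frac{1}{50} \sum_{j=1}^{5} w_{kj}, \qquad
\frac{\partial\, \text{out}}{\partial w_{kj}} = -\frac{1}{50} \sum_{i=1}^{10} x_{ik}
$$

The gradient with respect to x does not depend on $i$, which is why every row of `x.grad` is identical. The gradient with respect to w does not depend on $j$, which is why all five entries in each row of `w.grad` are equal.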


## Automatic differentiation in more complex cases
So far every example differentiated a scalar. A natural question is how to differentiate a vector or a matrix. Interested readers can try it themselves first. Below we introduce how automatic differentiation works for multidimensional arrays.


In [8]:
m = Variable(torch.FloatTensor([[2, 3]]), requires_grad=True)  # build a 1 x 2 matrix
n = Variable(torch.zeros(1, 2))  # build a zero matrix of the same size
print(m)
print(n)

Variable containing:
 2  3
[torch.FloatTensor of size 1x2]

Variable containing:
 0  0
[torch.FloatTensor of size 1x2]



In [9]:
# Calculate the value of the new n by the value in m
n[0, 0] = m[0, 0] ** 2
n[0, 1] = m[0, 1] ** 3
print(n)

Variable containing:
  4  27
[torch.FloatTensor of size 1x2]



Written as a formula, the computation above is
$$
n = (n_0,\ n_1) = (m_0^2,\ m_1^3) = (2^2,\ 3^3) 
$$

Next we backpropagate directly from n, that is, we compute the derivative of n with respect to m.

First we need to settle what this derivative means, that is, how to define

$$
\frac{\partial n}{\partial m} = \frac{\partial (n_0,\ n_1)}{\partial (m_0,\ m_1)}
$$


In PyTorch, to run automatic differentiation on a non-scalar result, you pass `backward()` an argument with the same shape as n, for example $(w_0,\ w_1)$. The result of automatic differentiation is then:
$$
\frac{\partial n}{\partial m_0} = w_0 \frac{\partial n_0}{\partial m_0} + w_1 \frac{\partial n_1}{\partial m_0}
$$
$$
\frac{\partial n}{\partial m_1} = w_0 \frac{\partial n_0}{\partial m_1} + w_1 \frac{\partial n_1}{\partial m_1}
$$

In [10]:
n.backward(torch.ones_like(n))  # take (w0, w1) to be (1, 1)


In [11]:
print(m.grad)

Variable containing:
  4  27
[torch.FloatTensor of size 1x2]



Automatic differentiation gives the gradients 4 and 27. We can check them:
$$
\frac{\partial n}{\partial m_0} = w_0 \frac{\partial n_0}{\partial m_0} + w_1 \frac{\partial n_1}{\partial m_0} = 2 m_0 + 0 = 2 \times 2 = 4
$$
$$
\frac{\partial n}{\partial m_1} = w_0 \frac{\partial n_0}{\partial m_1} + w_1 \frac{\partial n_1}{\partial m_1} = 0 + 3 m_1^2 = 3 \times 3^2 = 27
$$
The manual check gives the same result.
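
To make the role of the weights $(w_0,\ w_1)$ concrete, here is a small pure-Python sketch that evaluates the weighted sums above by hand. The function name `weighted_gradient` is hypothetical and only mirrors the formulas; it is not how PyTorch computes the result internally.

```python
def weighted_gradient(m, w):
    # Jacobian of n = (m0^2, m1^3) with respect to m = (m0, m1),
    # written out analytically; row i holds the derivatives of n_i.
    m0, m1 = m
    jacobian = [
        [2 * m0, 0.0],       # dn0/dm0, dn0/dm1
        [0.0, 3 * m1 ** 2],  # dn1/dm0, dn1/dm1
    ]
    # Component j of the result is w0 * dn0/dmj + w1 * dn1/dmj
    return [sum(w[i] * jacobian[i][j] for i in range(2)) for j in range(2)]


grad_ones = weighted_gradient([2.0, 3.0], [1.0, 1.0])    # weights used in the notebook
grad_scaled = weighted_gradient([2.0, 3.0], [1.0, 0.1])  # second output weighted by 0.1
print(grad_ones)
print(grad_scaled)
```

With weights (1, 1) this reproduces the gradients 4 and 27. Changing a weight rescales the contribution of the corresponding component of n.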


## Differentiating multiple times
Calling `backward` performs one pass of automatic differentiation. If we call `backward` a second time, the program raises an error. This is because, by default, PyTorch discards the computation graph after one backward pass, so to differentiate a second time we have to ask it explicitly to keep the graph. The small example below illustrates this.


In [12]:
x = Variable(torch.FloatTensor([3]), requires_grad=True)
y = x * 2 + x ** 2 + 3
print(y)

Variable containing:
 18
[torch.FloatTensor of size 1]



In [13]:
y.backward(retain_graph=True)  # set retain_graph=True to keep the computation graph


In [14]:
print(x.grad)

Variable containing:
 8
[torch.FloatTensor of size 1]



In [15]:
y.backward()  # differentiate again; this time the computation graph is not retained


In [16]:
print(x.grad)

Variable containing:
 16
[torch.FloatTensor of size 1]



The gradient of x is now 16. Gradients accumulate across backward passes, so the 8 from the first pass and the 8 from the second add up to 16.
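
If accumulation is not wanted, the stored gradient can be reset between passes. The sketch below assumes PyTorch 0.4 or later, where a tensor created with `requires_grad=True` tracks gradients directly and `Variable` is no longer needed; the variable names `first` and `second` are illustrative.

```python
import torch

# Same function as above, written with the newer tensor API (assumption: PyTorch >= 0.4)
x = torch.tensor([3.0], requires_grad=True)
y = x * 2 + x ** 2 + 3

y.backward(retain_graph=True)  # first pass, keep the graph for reuse
first = x.grad.clone()         # copy, because x.grad is modified in place below

x.grad.zero_()                 # reset the accumulated gradient
y.backward()                   # second pass, graph is freed afterwards
second = x.grad.clone()

print(first, second)
```

Because the gradient is zeroed in between, both passes report the same value instead of the second one showing the running sum.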


**Exercise**

Define

$$
x = 
\left[
\begin{matrix}
x_0 \\
x_1
\end{matrix}
\right] = 
\left[
\begin{matrix}
2 \\
3
\end{matrix}
\right]
$$

$$
k = (k_0,\ k_1) = (x_0^2 + 3 x_1,\ 2 x_0 + x_1^2)
$$

We want to compute the Jacobian matrix

$$
j = \left[
\begin{matrix}
\frac{\partial k_0}{\partial x_0} & \frac{\partial k_0}{\partial x_1} \\
\frac{\partial k_1}{\partial x_0} & \frac{\partial k_1}{\partial x_1}
\end{matrix}
\right]
$$

Reference answer:

$$
\left[
\begin{matrix}
4 & 3 \\
2 & 6 \\
\end{matrix}
\right]
$$

In [6]:
x = Variable(torch.FloatTensor([2, 3]), requires_grad=True)
k = Variable(torch.zeros(2))

k[0] = x[0] ** 2 + 3 * x[1]
k[1] = x[1] ** 2 + 2 * x[0]

In [7]:
print(k)

Variable containing:
 13
 13
[torch.FloatTensor of size 2]



In [8]:
j = torch.zeros(2, 2)

k.backward(torch.FloatTensor([1, 0]), retain_graph=True)
j[0] = x.grad.data

x.grad.data.zero_()  # zero out the previously computed gradient

k.backward(torch.FloatTensor([0, 1]))
j[1] = x.grad.data

In [9]:
print(j)


 4  3
 2  6
[torch.FloatTensor of size 2x2]
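
As an alternative to the two manual backward passes, newer PyTorch releases provide `torch.autograd.functional.jacobian`. The sketch below assumes a version that ships this function (it is not available in the PyTorch version used in this notebook); the function name `k_func` is a hypothetical helper introduced here.

```python
import torch
from torch.autograd.functional import jacobian


def k_func(x):
    # k = (x0^2 + 3*x1, 2*x0 + x1^2), the same mapping as in the exercise
    return torch.stack([x[0] ** 2 + 3 * x[1], 2 * x[0] + x[1] ** 2])


x = torch.tensor([2.0, 3.0])
j = jacobian(k_func, x)  # shape (2, 2): row i holds the derivatives of k_i
print(j)
```

Row i of the result corresponds to one `backward` call with a one-hot weight vector, which is exactly what the manual solution above does.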



In the next lesson we introduce two styles of neural network programming: dynamic graph programming and static graph programming.
