In [16]:
import torch
x = torch.ones(2,2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward1>)


Because `out` contains a single scalar, `out.backward()` is equivalent to `out.backward(torch.tensor(1.))`.
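As a quick sanity check, the sketch below rebuilds the graph above twice and calls `backward` once with the implicit gradient and once with an explicit `torch.tensor(1.)`. The helper `make_out` is hypothetical and exists only so that each call runs on a fresh graph. The two calls should leave identical gradients in `x.grad`:

```python
import torch

def make_out():
    # Hypothetical helper: rebuilds the same graph as in the cell above.
    x = torch.ones(2, 2, requires_grad=True)
    out = (3 * (x + 2) ** 2).mean()
    return x, out

x1, out1 = make_out()
out1.backward()                      # implicit gradient for a scalar output

x2, out2 = make_out()
out2.backward(torch.tensor(1.))      # explicit gradient d(out)/d(out) = 1

print(torch.equal(x1.grad, x2.grad))  # True
```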

In [17]:
out.backward()

Print the gradients d(out)/dx:

In [19]:
# out = mean(3 * (x + 2)^2)  ->  d(out)/dx_i = 2 * 3 * (1 + 2) / 4 = 4.5 at x_i = 1
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix filled with ``4.5``. Let's call the ``out``
*Tensor* "$o$".
We have $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
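The closed-form derivative can also be checked against autograd at inputs other than $x_i = 1$. The sketch below uses arbitrarily chosen input values (an assumption for illustration only) and compares `x.grad` with $\frac{3}{2}(x_i+2)$:

```python
import torch

# Arbitrary, non-constant inputs (illustrative values only).
x = torch.tensor([[0., 1.], [2., -3.]], requires_grad=True)
z = 3 * (x + 2) ** 2
o = z.mean()                        # o = (1/4) * sum_i z_i
o.backward()

analytic = 1.5 * (x.detach() + 2)   # d o / d x_i = (3/2)(x_i + 2)
print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```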

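Mathematically, if you have a vector-valued function $\vec{y}=f(\vec{x})$, then the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix. It is written here with one row per input $x_j$ and one column per output $y_i$, i.e. the transpose of the common convention $J_{ij} = \partial y_i/\partial x_j$: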

\begin{align}J=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\end{align}
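To see this matrix concretely, the sketch below assumes a small hypothetical function $f:\mathbb{R}^2\to\mathbb{R}^3$ and uses `torch.autograd.functional.jacobian` (available in recent PyTorch releases, roughly 1.5 and later). That function returns the matrix with one row per output $y_i$, so the result is transposed to match the row-per-$x_j$ layout written above:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical f: R^2 -> R^3 (n = 2 inputs, m = 3 outputs).
    return torch.stack([x[0] * x[1], x[0] ** 2, 3 * x[1]])

x = torch.tensor([1., 2.])
J_std = jacobian(f, x)   # shape (m, n): J_std[i, j] = d y_i / d x_j
J_doc = J_std.T          # shape (n, m): the row-per-x_j layout used above

print(J_std.shape, J_doc.shape)  # torch.Size([3, 2]) torch.Size([2, 3])
print(J_doc)
```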

In general, ``torch.autograd`` is an engine for computing Jacobian-vector products. That is, given any vector $v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}$, it computes the product $J\cdot v$.
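A minimal sketch of this idea, assuming a hypothetical $f:\mathbb{R}^2\to\mathbb{R}^3$ and an arbitrary vector $v$: the product $J\cdot v$ is formed once explicitly from the full Jacobian (via `torch.autograd.functional.jacobian`, transposed to the row-per-$x_j$ layout) and once via `y.backward(v)`, which never materializes $J$. Both should agree:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Hypothetical f: R^2 -> R^3, used only for illustration.
    return torch.stack([x[0] * x[1], x[0] ** 2, 3 * x[1]])

x = torch.tensor([1., 2.], requires_grad=True)
v = torch.tensor([0.5, -1.0, 2.0])        # one entry per output y_i (m = 3)

# Explicit product with the row-per-x_j matrix J from the text.
J = jacobian(f, x).T                      # shape (n, m) = (2, 3)
explicit = J @ v

# Same product computed by autograd without forming J.
y = f(x)
y.backward(v)
print(explicit, x.grad)
print(torch.allclose(explicit, x.grad))   # True
```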

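If $v$ happens to be the gradient of a scalar function $l=g\left(\vec{y}\right)$, that is, $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$, then by the chain rule the Jacobian-vector product is the gradient of $l$ with respect to $\vec{x}$: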

\begin{align}J\cdot v=\left(\begin{array}{ccc}
   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
   \vdots & \ddots & \vdots\\
   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
   \end{array}\right)\left(\begin{array}{c}
   \frac{\partial l}{\partial y_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial y_{m}}
   \end{array}\right)=\left(\begin{array}{c}
   \frac{\partial l}{\partial x_{1}}\\
   \vdots\\
   \frac{\partial l}{\partial x_{n}}
   \end{array}\right)\end{align}
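The chain-rule identity can be checked numerically. In this sketch the map $f$ and the scalar function $g(\vec{y}) = \sum_i y_i^2$ are hypothetical choices. `torch.autograd.grad` first computes $v = \partial l/\partial \vec{y}$; then $\partial l/\partial \vec{x}$ is obtained both by backpropagating $l$ directly and by feeding $v$ into `y` as `grad_outputs`:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = torch.stack([x[0] * x[1], x[0] ** 2, 3 * x[1]])   # hypothetical f(x)

# Hypothetical scalar function l = g(y) = sum_i y_i^2.
l = (y ** 2).sum()

# v = dl/dy (for this g it equals 2 * y); keep the graph for later calls.
v, = torch.autograd.grad(l, y, retain_graph=True)

# Path 1: backpropagate l all the way to x.
dl_dx, = torch.autograd.grad(l, x, retain_graph=True)

# Path 2: feed v into y as an external gradient (the J . v product).
Jv, = torch.autograd.grad(y, x, grad_outputs=v)

print(dl_dx)
print(torch.allclose(dl_dx, Jv))  # True
```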

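This property of the Jacobian-vector product makes it very convenient to feed external gradients into a model that has a non-scalar output.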
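For example, when some upstream code (hypothetical here, simulated with random numbers) has already produced $\partial\,\text{loss}/\partial\,\text{out}$ for a model with a non-scalar output, that tensor can be passed straight to `out.backward(...)`. The sketch below assumes a small `torch.nn.Linear` model and checks that this gives the same parameter gradients as backpropagating the scalar surrogate `(out * grad_out).sum()`:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)    # non-scalar output: one 3-vector per sample
inp = torch.randn(5, 4)
out = model(inp)                 # shape (5, 3)

# Hypothetical upstream gradient d(loss)/d(out), e.g. produced elsewhere.
grad_out = torch.randn(5, 3)
out.backward(grad_out)
w_grad_external = model.weight.grad.clone()

# Reference: same gradient from an explicit scalar surrogate.
model.zero_grad()
(model(inp) * grad_out).sum().backward()
print(torch.allclose(w_grad_external, model.weight.grad))  # True
```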

Now let's look at an example of a Jacobian-vector product:

In [21]:
x = torch.randn(3, requires_grad=True)
print(x)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ 2.2441,  1.7583, -1.2006], requires_grad=True)
tensor([1148.9552,  900.2434, -614.6918], grad_fn=<MulBackward0>)


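In this case ``y`` is no longer a scalar. ``torch.autograd`` cannot compute the full Jacobian matrix directly, but if we only want the Jacobian-vector product, we simply pass the vector as an argument to ``backward``: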

In [22]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
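These numbers can be explained directly: the loop only ever multiplies by 2, so $\vec{y} = 2^k\vec{x}$ for some $k$, the Jacobian is $2^k I$, and the result is $2^k v$ (in the run above, $1148.9552 / 2.2441 \approx 512 = 2^9$, hence $0.1 \cdot 512 = 51.2$). The sketch below re-runs the loop with a fixed seed (an assumption, so its $k$ may differ from 9), counts the doublings, and checks this:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, requires_grad=True)

# Re-run the doubling loop, this time tracking k in y = 2**k * x.
y = x * 2
k = 1
while y.detach().norm() < 1000:
    y = y * 2
    k += 1

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# J is the diagonal matrix 2**k * I, so J . v = 2**k * v.
print(k, torch.allclose(x.grad, 2 ** k * v))
```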


You can also stop autograd from tracking history on Tensors with ``.requires_grad=True`` by wrapping the code block in ``with torch.no_grad():``

In [23]:
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False
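A common use of `torch.no_grad()` is updating parameters in place after `backward()`, for example in a hand-written gradient-descent step. The sketch below assumes a toy loss and a hypothetical learning rate `lr`; the in-place update on a leaf that requires grad would raise an error outside the block:

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

lr = 0.1                           # hypothetical learning rate
with torch.no_grad():
    # In-place update of a leaf that requires grad: allowed only here.
    w -= lr * w.grad
    print((w * 2).grad_fn)         # None: nothing is recorded in this block

print(w.requires_grad)             # True: the flag itself is unchanged
print((w * 2).grad_fn is not None) # True: tracking resumes after the block
```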


**Read Later:**

Documentation of ``autograd`` and ``Function`` is at
https://pytorch.org/docs/autograd