In [1]:
import torch
import torch.autograd as autograd

# torch.autograd

**Description**

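`torch.autograd` provides classes and functions that implement automatic differentiation of arbitrary scalar-valued functions. Autograd is currently supported for floating-point `Tensor` types (half, float, double, bfloat16) and complex `Tensor` types (cfloat, cdouble).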

**PACKAGE CONTENTS**

- \_functions (package)
- anomaly_mode
- function
- functional
- grad_mode
- gradcheck
- profiler
- variable

**CLASSES**
- autograd.function.\_ContextMethodMixin(builtins.object)
- autograd.function.\_HookMixin(builtins.object)
- torch.\_C.\_FunctionBase(builtins.object)
    - autograd.function.Function(<br>
        torch.\_C.\_FunctionBase,<br>
        autograd.function.\_ContextMethodMixin,<br>
        autograd.function.\_HookMixin)
- torch.\_C.\_LegacyVariableBase(builtins.object)
    - autograd.variable.Variable


**FUNCTIONS**

- backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)

**DATA**
- \_\_all__ = ['Variable', 'Function', 'backward', 'grad_mode']

**FILE**: \torch\autograd\\\_\_init__.py

# The autograd package
This is a define-by-run framework: backpropagation is defined by how your code is run, so it can differ from one iteration to the next.
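
As a minimal sketch of what define-by-run means in practice, the hypothetical `forward` helper below (not part of PyTorch) runs a Python loop whose length is only known at run time. Each call therefore records a different graph, and `torch.autograd.grad` differentiates whichever graph was actually built.

```python
import torch

def forward(x, n_steps):
    # Hypothetical helper: the number of multiplications depends on a
    # runtime value, so the recorded graph differs from call to call.
    y = x
    for _ in range(n_steps):
        y = y * x
    return y

x = torch.tensor(2.0, requires_grad=True)

grads = []
for n_steps in (1, 2):
    y = forward(x, n_steps)            # y = x ** (n_steps + 1)
    (g,) = torch.autograd.grad(y, x)   # differentiate the graph just built
    grads.append(g.item())

print(grads)  # d(x^2)/dx = 4.0 and d(x^3)/dx = 12.0 at x = 2
```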
#### Tensor
`torch.Tensor` is the central class of this package. Once a tensor's `requires_grad` attribute is set to `True`, PyTorch tracks the operations performed on it.

In [3]:
x = torch.tensor([[1., 2.], [3., 4.]])
x.requires_grad_(True)
out = torch.mean(x * x)
print(out)
print(out.grad_fn)

tensor(7.5000, grad_fn=<MeanBackward0>)
<class 'MeanBackward0'>


Call `<output tensor>.backward()` to run backpropagation automatically.

In [6]:
out.backward()

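For a leaf tensor created with `requires_grad=True`, the gradient computed by `backward()` is available through its `.grad` attribute. Intermediate (non-leaf) tensors do not keep `.grad` by default; call `retain_grad()` on them before `backward()` if that gradient is needed.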

In [None]:
print(x.grad)
print(x.data)
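
The difference between leaf and intermediate tensors can be sketched as follows. The example rebuilds the same `x` as above, names the intermediate product `z` (a name introduced here for illustration), and uses `retain_grad()` so that `z.grad` is also populated.

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)  # leaf tensor
z = x * x            # intermediate (non-leaf) tensor
z.retain_grad()      # ask autograd to keep z.grad after backward()
out = z.mean()
out.backward()

print(x.grad)  # d(out)/dx = x / 2
print(z.grad)  # d(out)/dz = 1/4 for every element
```

Without the `retain_grad()` call, only `x.grad` would be filled in.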

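When the output is not a scalar but a tensor with more than one element, e.g. $y=(y_1,y_2,\cdots,y_m)^T$, calling `y.backward(v)` computes the vector-Jacobian product
$$v^{T}\cdot J$$
where $v$ is a vector chosen by the caller, with the same shape as $y$, and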
$$J=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)$$
Its meaning is as follows: if $v$ happens to be the gradient of some scalar function $l=g(\vec{y})$, that is, $v=\left(\frac{\partial l}{\partial y_{1}} \cdots \frac{\partial l}{\partial y_{m}}\right)^{T}$, then $$J^{T} \cdot v=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\left(\begin{array}{c}
\frac{\partial l}{\partial y_{1}} \\
\vdots \\
\frac{\partial l}{\partial y_{m}}
\end{array}\right)=\left(\begin{array}{c}
\frac{\partial l}{\partial x_{1}} \\
\vdots \\
\frac{\partial l}{\partial x_{n}}
\end{array}\right)$$
which is exactly the gradient of $l$ with respect to $\vec{x}$.

In [None]:
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x*x
v = torch.tensor([1., 1/2., 1/3.])
y.backward(v)
print(x.grad)
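
To check the vector-Jacobian interpretation numerically, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and multiplies it by `v` by hand. The function name `f` is introduced here for illustration. Forming the explicit Jacobian is only practical for small inputs; `backward(v)` never materializes it.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x * x

x = torch.tensor([1., 2., 3.], requires_grad=True)
v = torch.tensor([1., 1 / 2., 1 / 3.])

y = f(x)
y.backward(v)                    # accumulates v^T J into x.grad

J = jacobian(f, x.detach())      # explicit 3x3 Jacobian, here diag(2x)
vjp = v @ J                      # the same product computed by hand

print(x.grad)
print(vjp)
```

Both printed tensors should agree up to floating-point rounding.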

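If a tensor's history does not need to be tracked, call `<tensor>.detach()` to detach it from the computation history, or wrap the code in `with torch.no_grad():`. This is useful when evaluating a model, where no gradients are needed.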

In [None]:
y = x.detach()
print(y.requires_grad)
print(x.requires_grad)
print(x.eq(y).all())
with torch.no_grad():
    print((x**2).requires_grad)
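
A small additional sketch, with illustrative variable names `w`, `w_detached` and `score`, shows two details worth knowing: `detach()` returns a tensor that shares storage with the original rather than a copy, and results computed inside `torch.no_grad()` carry no `grad_fn`.

```python
import torch

w = torch.tensor([1., 2., 3.], requires_grad=True)

# detach() returns a tensor that shares storage with w but has no history.
w_detached = w.detach()
same_storage = w_detached.data_ptr() == w.data_ptr()

# Inside no_grad() no graph is built, even though w requires gradients.
with torch.no_grad():
    score = (w ** 2).sum()

print(same_storage)          # True
print(score.requires_grad)   # False
print(score.grad_fn)         # None
```

Because the storage is shared, in-place changes to the detached tensor are visible through the original as well.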

### Automatic differentiation
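In PyTorch, automatic differentiation is provided mainly by the `torch.autograd` API. During the forward pass, autograd records the operations applied to tracked tensors, together with the values needed to differentiate them, in a computation graph. During the backward pass, it traverses this graph and combines the local derivatives according to the chain rule, yielding the gradient at each target parameter, which is then used for the parameter update.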
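
The forward, backward and update cycle can be sketched with a hand-written gradient descent loop. The toy data, the learning rate `lr` and the single parameter `w` are assumptions made for this illustration; real training code would normally use `torch.optim` instead of updating parameters by hand.

```python
import torch

# Hypothetical toy problem: fit w so that w * x matches y (the true w is 3).
x = torch.tensor([1., 2., 3., 4.])
y = 3.0 * x
w = torch.tensor(0.0, requires_grad=True)
lr = 0.05

for _ in range(100):
    loss = ((w * x - y) ** 2).mean()   # forward pass: autograd records the graph
    loss.backward()                    # backward pass: chain rule fills w.grad
    with torch.no_grad():
        w -= lr * w.grad               # parameter update, not tracked by autograd
    w.grad.zero_()                     # gradients accumulate, so reset them

print(w.item())  # close to 3.0
```

The update is wrapped in `torch.no_grad()` so that it is not recorded as part of the graph, and `w.grad` is zeroed because `backward()` adds to existing gradients rather than overwriting them.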