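# Matrix Calculus

### Scalar derivatives

- The derivative of a scalar function is the slope of its tangent line.

### Subderivatives

- A subderivative extends the derivative to points where the function is not differentiable.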

$$
\frac{\partial|x|}{\partial x}= \begin{cases}1 & \text { if } x>0 \\ -1 & \text { if } x<0 \\ a & \text { if } x=0, \quad a \in[-1,1]\end{cases}
$$

$$
\frac{\partial}{\partial x} \max (x, 0)= \begin{cases}1 & \text { if } x>0 \\ 0 & \text { if } x<0 \\ a & \text { if } x=0, \quad a \in[0,1]\end{cases}
$$
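The two piecewise definitions above can be sketched directly in code. This is only an illustration: the function names and the parameter `a` (the value picked at the kink) are assumptions introduced here, and any `a` in the stated interval would be an equally valid choice.

```python
def subderivative_abs(x, a=0.0):
    """Return one valid subderivative of |x| at x.

    `a` is the value chosen at the kink x == 0 and must lie in [-1, 1].
    """
    if not -1.0 <= a <= 1.0:
        raise ValueError("a must lie in [-1, 1]")
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return a


def subderivative_relu(x, a=0.0):
    """Return one valid subderivative of max(x, 0) at x; `a` must lie in [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return a


print(subderivative_abs(-2.0), subderivative_abs(0.0, a=0.5), subderivative_relu(3.0))
```

Away from zero the result is the ordinary derivative; only at `x == 0` does the caller's choice of `a` matter.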

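### Gradients

- The gradient points in the direction of steepest slope, that is, the direction in which the value increases fastest.
- Derivative of a scalar with respect to a column vector (the result is a row vector)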

$$
\mathbf{x}=\left[\begin{array}{c}
x_{1} \\
x_{2} \\
\vdots \\
x_{n}
\end{array}\right] \quad \frac{\partial y}{\partial \mathbf{x}}=\left[\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \ldots, \frac{\partial y}{\partial x_{n}}\right]
$$
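One way to make the definition concrete is to approximate each partial derivative numerically. The sketch below assumes a hypothetical helper `numerical_gradient` and a test function `squared_norm` (neither is part of the notes) and uses central differences; it illustrates the definition and is not how autograd computes gradients.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate [dy/dx_1, ..., dy/dx_n] with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def squared_norm(x):
    # y = x_1^2 + ... + x_n^2, whose exact gradient is 2x
    return sum(v * v for v in x)


x = [0.0, 1.0, 2.0, 3.0]
grad = numerical_gradient(squared_norm, x)
print([round(g, 4) for g in grad])
```

Each entry of `grad` estimates one component of the row vector in the formula above, so the result should be close to `2 * x` component by component.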

- Derivative of a column vector with respect to a scalar (the result is a column vector)

$$
\mathbf{y}=\left[\begin{array}{c}
y_{1} \\
y_{2} \\
\vdots \\
y_{m}
\end{array}\right] \quad \frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{c}
\frac{\partial y_{1}}{\partial x} \\
\frac{\partial y_{2}}{\partial x} \\
\vdots \\
\frac{\partial y_{m}}{\partial x}
\end{array}\right]
$$

- Derivative of a column vector with respect to a column vector (the result is an $m \times n$ matrix)

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{c}
\frac{\partial y_{1}}{\partial \mathbf{x}} \\
\frac{\partial y_{2}}{\partial \mathbf{x}} \\
\vdots \\
\frac{\partial y_{m}}{\partial \mathbf{x}}
\end{array}\right]=\left[\begin{array}{cccc}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \ldots & \frac{\partial y_{1}}{\partial x_{n}} \\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \ldots & \frac{\partial y_{2}}{\partial x_{n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \ldots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right]
$$

$$
\begin{array}{l|llll}
\mathbf{y} & \mathbf{a} & \mathbf{x} & \mathbf{A x} & \mathbf{x}^{T} \mathbf{A} \\
\hline \frac{\partial \mathbf{y}}{\partial \mathbf{x}} & \mathbf{0} & \mathbf{I} & \mathbf{A} & \mathbf{A}^{T}
\end{array}
$$
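The last two columns of the table can be checked numerically. The following is a rough sketch under stated assumptions: `numerical_jacobian`, the matrix `A`, and the point `x0` are made up for illustration, `A` is taken to be constant with respect to $\mathbf{x}$, and the entries of the row vector $\mathbf{x}^{T}\mathbf{A}$ are treated as the components of $\mathbf{y}$.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Return the m x n matrix J with J[i][j] ~= d f(x)[i] / d x[j]."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


A = [[1.0, 2.0], [3.0, 4.0]]  # square so that both Ax and x^T A are defined


def A_times_x(x):
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]


def xT_times_A(x):
    # entries of the row vector x^T A, treated as the components of y
    return [sum(x[k] * A[k][j] for k in range(2)) for j in range(2)]


x0 = [0.5, -1.5]
J_Ax = numerical_jacobian(A_times_x, x0)    # expected to approximate A
J_xTA = numerical_jacobian(xT_times_A, x0)  # expected to approximate A^T
```

Because both functions are linear in `x`, the Jacobians do not depend on the choice of `x0`.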

$$
\begin{array}{l|ccc}
\mathbf{y} & a \mathbf{u} & \mathbf{A u} & \mathbf{u}+\mathbf{v} \\
\hline \frac{\partial \mathbf{y}}{\partial \mathbf{x}} & a \frac{\partial \mathbf{u}}{\partial \mathbf{x}} & \mathbf{A} \frac{\partial \mathbf{u}}{\partial \mathbf{x}} & \frac{\partial \mathbf{u}}{\partial \mathbf{x}}+\frac{\partial \mathbf{v}}{\partial \mathbf{x}}
\end{array}
$$


# Automatic Differentiation

### Chain rule for vectors

$$
y=f(u), u=g(x) \quad \frac{\partial y}{\partial x}=\frac{\partial y}{\partial u} \frac{\partial u}{\partial x}
$$

$$
\frac{\partial y}{\partial \mathbf{x}}=\frac{\partial y}{\partial u} \frac{\partial u}{\partial \mathbf{x}} \quad \frac{\partial y}{\partial \mathbf{x}}=\frac{\partial y}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \quad \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\frac{\partial \mathbf{y}}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}
$$
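As a worked instance of the rule above, consider the squared error of a linear model. The setup is an assumption chosen for illustration: $\mathbf{x}, \mathbf{w} \in \mathbb{R}^{n}$, $y \in \mathbb{R}$, and the intermediate names $a$, $b$, $z$ are introduced here and do not appear elsewhere in these notes.

$$
z=(\langle\mathbf{x}, \mathbf{w}\rangle-y)^{2}, \quad a=\langle\mathbf{x}, \mathbf{w}\rangle, \quad b=a-y, \quad z=b^{2}
$$

$$
\frac{\partial z}{\partial \mathbf{w}}=\frac{\partial z}{\partial b} \frac{\partial b}{\partial a} \frac{\partial a}{\partial \mathbf{w}}=2 b \cdot 1 \cdot \mathbf{x}^{T}=2(\langle\mathbf{x}, \mathbf{w}\rangle-y) \mathbf{x}^{T}
$$

The result is a row vector, in line with the convention used for the derivative of a scalar with respect to a column vector.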

### Computational graphs

- Decompose the code into primitive operators
- Represent the computation as a directed acyclic graph

![img_1](./images/calculation_graph.png)

- Explicit construction

```Python
from mxnet import sym

# Symbolic variables need a name; the graph is defined before any data is supplied
a = sym.var('a')
b = sym.var('b')
c = 2 * a + b
```
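To show what explicit construction means without relying on any framework, here is a toy sketch in plain Python. The classes `Var`, `Const`, `Op` and the helper `wrap` are hypothetical names invented for this example, not part of MXNet; the point is only that the expression is recorded as a graph first and computed later, once values are supplied.

```python
class Var:
    """A named placeholder; it holds no value until the graph is evaluated."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]

    # Operator overloads build new graph nodes instead of computing numbers.
    def __add__(self, other):
        return Op("add", self, wrap(other))

    def __rmul__(self, other):
        return Op("mul", wrap(other), self)


class Const(Var):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value


class Op(Var):
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

    def evaluate(self, env):
        l, r = self.left.evaluate(env), self.right.evaluate(env)
        return l + r if self.kind == "add" else l * r


def wrap(value):
    return value if isinstance(value, Var) else Const(value)


a = Var("a")
b = Var("b")
c = 2 * a + b                 # builds the graph only; nothing is computed yet
result = c.evaluate({"a": 1.0, "b": 1.0})
```

Here `c` is an `Op` node whose children are the `mul` node for `2 * a` and the variable `b`; numbers enter only in the final `evaluate` call.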

- Implicit construction

```Python
from mxnet import autograd, nd

with autograd.record():
    a = nd.ones((2, 1))
    b = nd.ones((2, 1))
    c = 2 * a + b
```

- Chain rule

$$
\frac{\partial y}{\partial x}=\frac{\partial y}{\partial u_{n}} \frac{\partial u_{n}}{\partial u_{n-1}} \ldots \frac{\partial u_{2}}{\partial u_{1}} \frac{\partial u_{1}}{\partial x}
$$

- Forward accumulation

$$
\frac{\partial y}{\partial x}=\frac{\partial y}{\partial u_{n}}\left(\frac{\partial u_{n}}{\partial u_{n-1}}\left(\cdots\left(\frac{\partial u_{2}}{\partial u_{1}} \frac{\partial u_{1}}{\partial x}\right)\right)\right)
$$
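Forward accumulation can be sketched with dual numbers, where every value carries its derivative with respect to the input and the two are updated together, innermost factor first. The class `Dual` and the helper `dual_sin` are assumptions made for this illustration, and the function $y = 3\sin(x^{2}) + x$ is an arbitrary example.

```python
import math


class Dual:
    """A value paired with its derivative with respect to the input x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_sin(u):
    # chain rule for one elementary step: d sin(u)/dx = cos(u) * du/dx
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)


x = Dual(2.0, 1.0)        # seed: dx/dx = 1
u1 = x * x                # u1 = x^2
u2 = dual_sin(u1)         # u2 = sin(u1)
y = 3 * u2 + x            # y = 3 sin(x^2) + x
```

After the single forward sweep, `y.deriv` holds $\partial y / \partial x$; no earlier intermediate needs to be kept, but a function of several inputs would need one sweep per input.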


- Reverse accumulation, also known as backpropagation

$$
\frac{\partial y}{\partial x}=\left(\left(\left(\frac{\partial y}{\partial u_{n}} \frac{\partial u_{n}}{\partial u_{n-1}}\right) \ldots\right) \frac{\partial u_{2}}{\partial u_{1}}\right) \frac{\partial u_{1}}{\partial x}
$$
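Reverse accumulation can be sketched as a small scalar autograd: the forward pass records, for every result, its parents and the local derivatives, and a backward sweep then multiplies them starting from the output. The class `Node` below is a toy written for this illustration and is not PyTorch's implementation.

```python
class Node:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents    # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        # Order the graph so that each node comes after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                       # seed: dy/dy = 1
        for node in reversed(order):          # walk from the output to the inputs
            for parent, local in node.parents:
                parent.grad += node.grad * local


x = Node(3.0)
y = 2 * x * x + x     # y = 2x^2 + x
y.backward()
```

After `y.backward()`, `x.grad` holds $\partial y / \partial x = 4x + 1$ evaluated at $x = 3$. Note that the recorded local derivatives are exactly the stored intermediate results that the backward sweep depends on.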

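- Forward accumulation
    - Does not need to store intermediate results
    - Computational complexity is $O(n)$ for the gradient with respect to one input variable, where $n$ is the number of operators
    - Memory complexity is $O(1)$
- Reverse accumulation
    - Needs to store all intermediate results from the forward pass
    - Runs in the opposite direction, from the output back to the inputs
    - Can prune branches that are not needed
    - Computational complexity is $O(n)$
    - Memory complexity is $O(n)$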

![img_2](./images/upward_downward.png)


```Python
import torch

x = torch.arange(4.0)
print(x)
```

```
tensor([0., 1., 2., 3.])
```


```Python
# Equivalent to: x = torch.arange(4.0, requires_grad=True)
x.requires_grad_(True)
print(x.grad)  # None until backward() has been called
```

```
None
```


```Python
y = 2 * torch.dot(x, x)
print(y)
```

```
tensor(28., grad_fn=<MulBackward0>)
```


```Python
# y = 2 * <x, x>, so the gradient with respect to x is 4x
y.backward()
print(x.grad)
print(4 * x == x.grad)
```

```
tensor([ 0.,  4.,  8., 12.])
tensor([True, True, True, True])
```


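```Python
# By default PyTorch accumulates gradients, so clear the previous values first
# A trailing underscore marks an in-place operation
x.grad.zero_()
y = x.sum()
y.backward()
print(x.grad)
```

```
tensor([1., 1., 1., 1.])
```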


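```Python
# Calling `backward()` on a non-scalar requires a `gradient` argument,
# which supplies the vector that multiplies the Jacobian of y
x.grad.zero_()
y = x * x
# Equivalent to: y.backward(torch.ones(len(x)))
y.sum().backward()
print(x.grad)
```

```
tensor([0., 2., 4., 6.])
```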
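Why summing first and passing a vector of ones give the same result can be written out as a short derivation. Here $\mathbf{v}$ denotes the `gradient` argument and $\odot$ the elementwise product; both symbols are introduced for this sketch.

$$
\mathbf{y}=\mathbf{x} \odot \mathbf{x}, \quad \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\operatorname{diag}(2 \mathbf{x}), \quad \frac{\partial}{\partial \mathbf{x}} \sum_{i} y_{i}=\mathbf{1}^{T} \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=2 \mathbf{x}^{T}
$$

$$
\mathbf{v}=\mathbf{1} \quad \Rightarrow \quad \mathbf{v}^{T} \frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{1}^{T} \operatorname{diag}(2 \mathbf{x})=2 \mathbf{x}^{T}
$$

With $\mathbf{x}=[0,1,2,3]^{T}$ both expressions give $[0,2,4,6]$, which matches the printed gradient.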


```Python
x.grad.zero_()
y = x * x
# Treat u as a constant rather than a function of x
u = y.detach()
z = u * x

z.sum().backward()
print(x.grad)
print(x.grad == u)
```

```
tensor([0., 1., 4., 9.])
tensor([True, True, True, True])
```


```Python
# y itself is still connected to x, so its gradient is computed as usual
x.grad.zero_()
y.sum().backward()
print(x.grad)
print(x.grad == 2 * x)
```

```
tensor([0., 2., 4., 6.])
tensor([True, True, True, True])
```


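```Python
# Gradients also flow through Python control flow (loops and branches)
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()

# For a given input, f only scales a by a constant, so the gradient equals d / a
print(a.grad)
print(a.grad == d / a)
```

```
tensor(1024.)
tensor(True)
```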
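The relation between the gradient and `d / a` can be reproduced without autograd. The sketch below is a plain-float rewrite of the function (named `f_float` here to mark it as a stand-in), evaluated at a fixed input `a_fixed = 1.5` chosen only for reproducibility, since the original draws `a` at random. It compares a central-difference slope with the ratio `f_float(a_fixed) / a_fixed`.

```python
def f_float(a):
    # plain-float version of the control-flow function
    b = a * 2
    while abs(b) < 1000:
        b = b * 2
    if b > 0:
        c = b
    else:
        c = 100 * b
    return c


a_fixed = 1.5                 # fixed input chosen for reproducibility
h = 1e-6
# central-difference estimate of the derivative at a_fixed
slope = (f_float(a_fixed + h) - f_float(a_fixed - h)) / (2 * h)
# the scaling factor k in f(a) = k * a for inputs near a_fixed
ratio = f_float(a_fixed) / a_fixed
```

Near a given input the loop runs a fixed number of times and the same branch is taken, so the function is locally a constant multiple of its input and `slope` agrees with `ratio` up to rounding. The check would break only for inputs that sit exactly where the loop count or the branch changes.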
