Matrix Calculus
=====

#### 1. Matrix calculus here covers four cases
#### $y = f(x)$, $y = f(\mathbf{x})$, $\mathbf{y} = f(x)$, $\mathbf{y}=f(\mathbf{x})$
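#### These are, respectively: scalar with respect to scalar, scalar with respect to vector, vector with respect to scalar, and vector with respect to vector
#### Their derivatives are written as: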
#### $\frac{\partial{y}}{\partial{x}}$, $\frac{\partial{y}}{\partial{\mathbf{x}}}$, $\frac{\partial{\mathbf{y}}}{\partial{x}}$, $\frac{\partial{\mathbf{y}}}{\partial{\mathbf{x}}}$

### $\partial{y}/\partial{\mathbf{x}}$

#### $\mathbf{x} = \begin{bmatrix}x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, $
\frac{\partial{y}}{\partial{\mathbf x}}
=
\begin{bmatrix}
\frac{\partial{y}}{\partial{x_1}} &
\frac{\partial{y}}{\partial{x_2}} &
\cdots &
\frac{\partial{y}}{\partial{x_n}}
\end{bmatrix}$
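
#### As a worked case, take the illustrative function $y = x_1^2 + 2x_2^2$ with $n = 2$ (chosen here for the example, not part of the original notes). Differentiating component by component gives a row vector:
#### $\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 & 4x_2 \end{bmatrix}$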

### $\partial{\mathbf{y}}/\partial{x}$

#### $\mathbf{y} = \begin{bmatrix}y_1\\ y_2 \\ \vdots \\ y_n \end{bmatrix}$, $
\frac{\partial{\mathbf y}}{\partial{x}}
=
\begin{bmatrix}
\frac{\partial{y_1}}{\partial{x}}\\
\frac{\partial{y_2}}{\partial{x}}\\
\vdots\\
\frac{\partial{y_n}}{\partial{x}}
\end{bmatrix}$

### $\partial{\mathbf{y}}/\partial{\mathbf{x}}$

#### $\mathbf{x} = \begin{bmatrix}x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix}y_1\\ y_2 \\ \vdots \\ y_m \end{bmatrix}$
#### $
\frac{\partial\mathbf{y}}{\partial{\mathbf x}}
=
\begin{bmatrix}
\frac{\partial{y_1}}{\partial{\mathbf x}}\\
\frac{\partial{y_2}}{\partial{\mathbf x}}\\
\vdots\\
\frac{\partial{y_m}}{\partial{\mathbf x}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial{y_1}}{\partial{x_1}} &
\frac{\partial{y_1}}{\partial{x_2}} &
\cdots &
\frac{\partial{y_1}}{\partial{x_n}}\\
\frac{\partial{y_2}}{\partial{x_1}} &
\frac{\partial{y_2}}{\partial{x_2}} &
\cdots &
\frac{\partial{y_2}}{\partial{x_n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial{y_m}}{\partial{x_1}} &
\frac{\partial{y_m}}{\partial{x_2}} &
\cdots &
\frac{\partial{y_m}}{\partial{x_n}}
\end{bmatrix}$
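
#### As a worked case, assume the linear map $\mathbf{y} = \mathbf{A}\mathbf{x}$ with $\mathbf{A} \in \mathbb{R}^{m \times n}$ (an illustrative choice, not part of the original notes). Since $y_i = \sum_{j=1}^{n} a_{ij} x_j$, each entry is $\frac{\partial y_i}{\partial x_j} = a_{ij}$, so in the layout above:
#### $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \mathbf{A}$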

#### 2. The chain rule

#### Matrix derivative example 1

#### Assume $\mathbf{x}, \mathbf{w} \in \mathbb R^n$, $y \in \mathbb R$, and $z = (\langle \mathbf{x}, \mathbf{w} \rangle - y)^2$

#### Compute: $\frac{\partial z}{\partial{\mathbf {w}}}$

#### Let $a = \langle \mathbf{x}, \mathbf{w} \rangle, \space b = a - y, \space z = b^2$

#### Differentiate:
##### $\begin{aligned}
            \frac{\partial z}{\partial{\mathbf w}}
             &= \frac{\partial z}{\partial b}
             \frac{\partial b}{\partial a} \frac{\partial a}{\partial \mathbf w}\\
             &= \frac{\partial b^2}{\partial b}
             \frac{\partial {(a-y)}}{\partial a} \frac{\partial \langle \mathbf{x}, \mathbf{w} \rangle}{\partial \mathbf w}\\
             &= 2b \cdot 1 \cdot \mathbf x^T\\
             &= 2(\langle \mathbf{x}, \mathbf{w} \rangle - y)\,\mathbf x^T
             \end{aligned}$
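
#### The closed form $2(\langle \mathbf{x}, \mathbf{w} \rangle - y)\,\mathbf x^T$ can be sanity-checked numerically. The sketch below compares it with a central finite difference in plain Python; the helper names and the sample values are assumptions made for this illustration only.

```python
def inner(u, v):
    """Inner product <u, v> of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def z_of(w, x, y):
    """z = (<x, w> - y)^2, the function from the example above."""
    return (inner(x, w) - y) ** 2


def analytic_grad(w, x, y):
    """Closed form from the chain rule: 2 * (<x, w> - y) * x."""
    b = inner(x, w) - y
    return [2.0 * b * xi for xi in x]


def numeric_grad(w, x, y, eps=1e-6):
    """Central finite difference of z with respect to each w_i."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((z_of(w_plus, x, y) - z_of(w_minus, x, y)) / (2 * eps))
    return grad


# Hypothetical sample values, chosen only for the check.
x_vals = [1.0, 2.0, 3.0]
w_vals = [0.5, -1.0, 2.0]
y_val = 1.5

grad_exact = analytic_grad(w_vals, x_vals, y_val)
grad_approx = numeric_grad(w_vals, x_vals, y_val)
print(grad_exact)
print(grad_approx)
```

#### The two printed lists should agree up to floating-point rounding, which is the behaviour expected if the derivation is right.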

#### Automatic differentiation with PyTorch

In [1]:
import torch

#### First create a vector $\mathbf x$; the goal is then the derivative of $y = 2\mathbf{x}^T\mathbf{x}$ with respect to $\mathbf{x}$

In [2]:
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

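#### Before computing the derivative, we need somewhere to store the gradient
#### Here we call x.requires_grad_(True); the gradient is then stored in x.grad, which stays None until backward() has been called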

In [3]:
x.requires_grad_(True)
x.grad

In [11]:
y = 2 * torch.dot(x, x)
y


tensor(28., grad_fn=<MulBackward0>)

#### Call backward() to automatically compute the gradient of y with respect to each component of $\mathbf {x}$

In [12]:
y.backward()
x.grad

tensor([ 0., 12., 24., 36.])

In [6]:
x.grad == 4 * x

tensor([True, True, True, True])

#### Note that PyTorch accumulates gradients, so the old gradient must be cleared before computing a new one

In [13]:
x.grad.zero_()
y = x.sum()  # equivalent to x_1 + x_2 + x_3 + x_4
y.backward()
x.grad

tensor([1., 1., 1., 1.])
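
#### To make the accumulation visible, the sketch below runs the same backward pass twice without clearing x.grad, then once more after zero_(). It is a standalone illustration (the variable names first, accumulated and after_reset are introduced here, not taken from the notebook).

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First backward pass: the gradient of sum(x) is a vector of ones.
x.sum().backward()
first = x.grad.clone()

# Second pass without zeroing: the new gradient is added to the stored one.
x.sum().backward()
accumulated = x.grad.clone()

# Zero the stored gradient in place, then run again: only the fresh gradient remains.
x.grad.zero_()
x.sum().backward()
after_reset = x.grad.clone()

print(first)
print(accumulated)
print(after_reset)
```

#### The second result is twice the first, which is why each cell here starts with x.grad.zero_().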

In [15]:
x.grad.zero_()
y = x * x
y.sum().backward()  # y.sum() is equivalent to x_1^2 + x_2^2 + x_3^2 + x_4^2
x.grad

tensor([0., 2., 4., 6.])
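
#### Here y is a vector, and backward() with no arguments only works on a scalar output, which is why the cell above sums y first. A minimal sketch of an equivalent call, assuming we pass a vector of ones through the gradient argument of backward():

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# Route 1: reduce y to a scalar, then call backward().
y = x * x
y.sum().backward()
grad_via_sum = x.grad.clone()

# Route 2: call backward() on the vector y directly, weighting each y_i by 1.
x.grad.zero_()
y = x * x  # rebuild the graph; the earlier one was freed by backward()
y.backward(gradient=torch.ones_like(y))
grad_via_ones = x.grad.clone()

print(grad_via_sum)
print(grad_via_ones)
```

#### Both routes should give 2 * x for this y.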

In [16]:
x.grad.zero_()
y = x * x
u = y.detach()  # detach y from the graph so u is treated as a constant that does not depend on x
z = u * x

z.sum().backward()
x.grad == u

tensor([True, True, True, True])
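
#### For contrast, the sketch below computes the same product with and without detach(). It is a standalone illustration; grad_detached and grad_full are names introduced here.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# With detach: u holds the values of x * x but is a constant for autograd,
# so d(u * x)/dx = u.
u = (x * x).detach()
(u * x).sum().backward()
grad_detached = x.grad.clone()

# Without detach: y * x is x^3 as a function of x, so the gradient is 3 * x^2.
x.grad.zero_()
y = x * x
(y * x).sum().backward()
grad_full = x.grad.clone()

print(grad_detached)
print(grad_full)
```

#### The first gradient equals x^2 and the second equals 3 * x^2, so detach() changes what is differentiated while leaving the forward values unchanged.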

In [17]:
x.grad.zero_()
y.sum().backward()
x.grad == 2 * x

tensor([True, True, True, True])
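
#### The full matrix $\partial{\mathbf{y}}/\partial{\mathbf{x}}$ from part 1 can also be obtained directly. A minimal sketch, assuming torch.autograd.functional.jacobian is available in the installed PyTorch version; the helper square is a name introduced for this illustration.

```python
import torch
from torch.autograd.functional import jacobian


def square(v):
    """Element-wise square, the same y = x * x used in the cells above."""
    return v * v


x = torch.arange(4.0)

# J[i, j] = d y_i / d x_j; for an element-wise square this is diag(2 * x).
J = jacobian(square, x)
print(J)

# Summing over the rows (over i) gives the gradient of y.sum().
print(J.sum(dim=0))
```

#### Row i of J is $\partial y_i / \partial \mathbf{x}$, matching the layout in part 1, and the column sums reproduce the gradient returned by y.sum().backward().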