# Matrix Calculus
> **Refs**
>
> [Linear Algebra: Matrix Calculus](https://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.AppC.d/IFEM.AppC.pdf)
>
> [The Matrix Calculus You Need For Deep Learning](http://explained.ai/matrix-calculus/index.html)


## Review of Scalar Derivative Rules

Rule | Function | Derivative with respect to $x$ | Example
--- | --- | --- | ---
Constant | $c$ | $0$ | $\frac{d}{d x} 99 = 0$
Multiplication by constant | $cf$ | $c \frac{df}{dx}$ | $\frac{d}{dx} 9x = 9$
Power Rule | $x^n$ | $n x^{n-1}$ | $\frac{d}{dx} x^3 = 3x^2$
Sum Rule | $f + g$ | $\frac{df}{dx} + \frac{dg}{dx}$ | $(x^2 + 3x)^\prime = 2x + 3$
Product Rule | $fg$ | $f^\prime g + f g^\prime$ | $(x \sin x)^\prime = \sin x + x \cos x$
Chain Rule | $f(g(x))$ | $\frac{df(g(x))}{dg(x)} \frac{d g(x)}{dx}$ | $(\ln x^2)^\prime = \frac{1}{x^2} 2x = \frac{2}{x}$
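
As a sanity check, the examples in the table can be compared against a central finite difference. This is a minimal sketch: the helper `central_diff`, the step size `h`, and the test point `x0` are illustrative choices, not part of the references.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.3  # arbitrary test point (assumption for illustration)

# (rule name, function, derivative predicted by the rule)
checks = [
    ("constant", lambda x: 99.0, lambda x: 0.0),
    ("multiplication by constant", lambda x: 9 * x, lambda x: 9.0),
    ("power", lambda x: x ** 3, lambda x: 3 * x ** 2),
    ("sum", lambda x: x ** 2 + 3 * x, lambda x: 2 * x + 3),
    ("product", lambda x: x * math.sin(x),
     lambda x: math.sin(x) + x * math.cos(x)),
    ("chain", lambda x: math.log(x ** 2), lambda x: 2 / x),
]

# Absolute gap between the numerical estimate and the rule's prediction
errors = {name: abs(central_diff(f, x0) - df(x0)) for name, f, df in checks}
for name, err in errors.items():
    print(f"{name}: abs error = {err:.2e}")
```

Each reported error should be tiny compared with the derivative values themselves, which is what agreement with the rules looks like numerically.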

## Derivative of a Vector Function

$$\boldsymbol{x} = \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right], \boldsymbol{y} = \left[ \begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix} \right]$$

**Defs**

Derivative of a vector with respect to a vector:

$$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} \stackrel{\mathsf{def}}{=} \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots &\frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots &\frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots &\frac{\partial y_m}{\partial x_n}\end{matrix} \right]$$

> This matrix of partial derivatives is the Jacobian matrix, although some textbooks define the derivative as $\boldsymbol{J}^\top$ instead.

**Jacobian Matrix**

For $\boldsymbol{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m$

$$\boldsymbol{J}_{i,j} = \frac{\partial f_i}{\partial x_j}$$

$$\boldsymbol{J} = \left[ \begin{matrix} \frac{\partial \boldsymbol{f}}{\partial x_1} & \cdots & \frac{\partial \boldsymbol{f}}{\partial x_n} \end{matrix} \right] = \left[ \begin{matrix} \nabla f_1(\boldsymbol{x}) \\ \vdots \\ \nabla f_m(\boldsymbol{x}) \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{matrix} \right]$$

The Jacobian determinant is the determinant of the Jacobian matrix (**defined only when $m = n$**, so that the matrix is square).

> The Jacobian determinant is used in changes of variables (e.g., from spherical to Cartesian coordinates).
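
To make the change-of-variables remark concrete, here is a small sketch that uses 2D polar coordinates as a simpler stand-in for the spherical case. The Jacobian of $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$ has determinant $r$. The helper `jacobian` and the sample point are assumptions made for illustration; the Jacobian is approximated with central differences.

```python
import math

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian J[i][j] = d f_i / d x_j via central differences."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def polar_to_cartesian(p):
    """Map (r, theta) to (x, y)."""
    r, theta = p
    return [r * math.cos(theta), r * math.sin(theta)]

r, theta = 2.0, 0.7  # arbitrary sample point
J = jacobian(polar_to_cartesian, [r, theta])

# 2 x 2 determinant; analytically cos^2 * r + sin^2 * r = r
det_J = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(det_J)
```

The printed value should be close to `r`, which is the factor that appears in $dx\,dy = r\,dr\,d\theta$.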


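Similarly, the derivative of a scalar with respect to a vector and the derivative of a vector with respect to a scalar are defined as follows: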


$$\frac{\partial y}{\partial \boldsymbol{x}} \stackrel{\mathsf{def}}{=} \left[ \begin{matrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n}\end{matrix} \right] $$



$$\frac{\partial \boldsymbol{y}}{\partial x} \stackrel{\mathsf{def}}{=} \left[ \begin{matrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x}\end{matrix} \right]$$

### Useful Vector Derivative Formulas

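$\boldsymbol{y}$ | $\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$
--- | ---
$\boldsymbol{Ax}$ | $\boldsymbol{A}$
$\boldsymbol{x}^\top \boldsymbol{A}$ | $\boldsymbol{A}^\top$
$\boldsymbol{x}^\top \boldsymbol{x}$ | $2 \boldsymbol{x}^\top$
$\boldsymbol{x}^\top \boldsymbol{Ax}$ | $\boldsymbol{x}^\top (\boldsymbol{A} + \boldsymbol{A}^\top)$

> These entries follow the Jacobian convention defined above. In the transposed convention ($\boldsymbol{J}^\top$) used by some textbooks, every entry is transposed: $\boldsymbol{A}^\top$, $\boldsymbol{A}$, $2\boldsymbol{x}$, and $\boldsymbol{Ax} + \boldsymbol{A}^\top \boldsymbol{x}$, respectively.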
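
The quadratic-form entry can be checked numerically. Whichever layout convention is used, the $k$-th partial derivative of $\boldsymbol{x}^\top \boldsymbol{Ax}$ is the $k$-th component of $\boldsymbol{Ax} + \boldsymbol{A}^\top \boldsymbol{x}$; the convention only decides whether these components are arranged as a row or a column. The sketch below assumes a small non-symmetric example matrix and hypothetical helper names (`quad_form`, `numeric_gradient`).

```python
def quad_form(A, x):
    """Compute the scalar x^T A x for a square matrix A (list of rows)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_gradient(f, x, h=1e-6):
    """Central-difference estimate of each partial derivative df/dx_k."""
    grad = []
    for k in range(len(x)):
        xp = list(x); xp[k] += h
        xm = list(x); xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Deliberately non-symmetric, so A x and A^T x differ
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]
x = [0.3, -1.2, 2.0]
n = len(x)

# k-th component of A x + A^T x
analytic = [
    sum(A[k][j] * x[j] for j in range(n)) + sum(A[i][k] * x[i] for i in range(n))
    for k in range(n)
]
numeric = numeric_gradient(lambda v: quad_form(A, v), x)

for a, b in zip(analytic, numeric):
    print(f"analytic = {a:.6f}, numeric = {b:.6f}")
```

When $\boldsymbol{A}$ is symmetric the two terms coincide and the result reduces to $2\boldsymbol{Ax}$, which generalizes the $\boldsymbol{x}^\top \boldsymbol{x}$ row of the table.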

### Chain Rule for Vector Function

$$\boldsymbol{x} = \left[ x_1 \cdots x_n \right]^\top \\
\boldsymbol{y} = \left[y_1 \cdots y_r \right]^\top \\
\boldsymbol{z} = \left[ z_1 \cdots z_m \right]^\top \\
\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} =  \left[ \begin{matrix} \frac{\partial z_1}{\partial x_1} & \frac{\partial z_1}{\partial x_2} & \cdots &\frac{\partial z_1}{\partial x_n} \\ \frac{\partial z_2}{\partial x_1} & \frac{\partial z_2}{\partial x_2} & \cdots &\frac{\partial z_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial z_m}{\partial x_1} & \frac{\partial z_m}{\partial x_2} & \cdots &\frac{\partial z_m}{\partial x_n}\end{matrix} \right]
$$

By the scalar chain rule, each entry is:

$$\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^r \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j}$$

Therefore, with every sum taken over $q = 1, \dots, r$:

$$\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} = \left[\begin{matrix} \sum \frac{\partial z_1}{\partial y_q} \frac{\partial y_q}{\partial x_1} & \cdots & \sum \frac{\partial z_1}{\partial y_q} \frac{\partial y_q}{\partial x_n} \\  \vdots & \ddots & \vdots \\ \sum \frac{\partial z_m}{\partial y_q} \frac{\partial y_q}{\partial x_1} & \cdots & \sum \frac{\partial z_m}{\partial y_q} \frac{\partial y_q}{\partial x_n} \end{matrix}\right]\\
= \left[\begin{matrix}\frac{\partial z_1}{\partial y_1} & \cdots & \frac{\partial z_1}{\partial y_r} \\ \vdots & \ddots & \vdots \\ \frac{\partial z_m}{\partial y_1} & \cdots & \frac{\partial z_m}{\partial y_r}\end{matrix}\right] \left[\begin{matrix}\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_r}{\partial x_1} & \cdots & \frac{\partial y_r}{\partial x_n} \end{matrix} \right] \\
= \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$$
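
The identity $\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}$ can be checked numerically by comparing the product of two Jacobians with the Jacobian of the composition. The functions `y_of_x` and `z_of_y` below are made-up examples with $n = 2$, $r = 3$, $m = 2$, and `jacobian` is a hypothetical finite-difference helper.

```python
import math

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian J[i][j] = d f_i / d x_j via central differences."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def y_of_x(x):  # R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def z_of_y(y):  # R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) * y3]

x0 = [0.5, -0.8]
J_zy = jacobian(z_of_y, y_of_x(x0))  # shape m x r = 2 x 3
J_yx = jacobian(y_of_x, x0)          # shape r x n = 3 x 2
J_chain = matmul(J_zy, J_yx)         # shape m x n = 2 x 2
J_direct = jacobian(lambda x: z_of_y(y_of_x(x)), x0)

max_diff = max(abs(J_chain[i][j] - J_direct[i][j])
               for i in range(2) for j in range(2))
print(max_diff)
```

Note that the shapes force the order of the product: an $(m \times r)$ matrix times an $(r \times n)$ matrix gives the $(m \times n)$ Jacobian.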

Forward differentiation | Backward differentiation
--- | ---
$\frac{d y}{d x} = \frac{d u}{d x} \frac{d y}{d u}$ | $\frac{d y}{d x} = \frac{d y}{d u} \frac{d u}{d x}$

## The Derivative of a Scalar Function of a Matrix

Let $\boldsymbol{X}$ be a matrix of order $(m \times n)$.

$$y = f(\boldsymbol{X})$$

Gradient matrix

$$\boldsymbol{G} = \frac{\partial y}{\partial \boldsymbol{X}} = \left[\begin{matrix} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\
    \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{m1}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{matrix} \right] = \sum_{i,j} \boldsymbol{E}_{i,j} \frac{\partial y}{\partial x_{i,j}}$$
    
> The matrix $\boldsymbol{E}_{i,j}$ denotes the **elementary matrix** of order $(m \times n)$: all of its entries are zero except the $(i, j)$ entry, which is $1$.
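
As an illustration, take $y = \sum_{i,j} x_{ij}^2$ (the squared Frobenius norm, chosen here as an example), whose gradient matrix is $\boldsymbol{G} = 2\boldsymbol{X}$. The sketch below estimates each entry of $\boldsymbol{G}$ by perturbing $\boldsymbol{X}$ along $\boldsymbol{E}_{i,j}$; the helper names are hypothetical.

```python
def frob_sq(X):
    """Scalar function of a matrix: sum of squared entries."""
    return sum(v * v for row in X for v in row)

def gradient_matrix(f, X, h=1e-6):
    """Estimate G[i][j] = dy/dx_ij by perturbing X along E_ij."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]; Xp[i][j] += h  # X + h * E_ij
            Xm = [row[:] for row in X]; Xm[i][j] -= h  # X - h * E_ij
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]  # order 2 x 3
G = gradient_matrix(frob_sq, X)

for row in G:
    print([round(v, 6) for v in row])
```

The gradient matrix has the same $(m \times n)$ shape as $\boldsymbol{X}$, and each entry should be close to twice the corresponding entry of $\boldsymbol{X}$.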

## Matrix Differential

For a scalar function $f$, the differential with respect to an $n$-vector $\boldsymbol{x}$ is defined as:

$$df = \sum_{i=1}^n \frac{\partial f}{\partial x_i} d x_i$$

The differential of an $m \times n$ matrix $\boldsymbol{X} = [x_{ij}]$ is defined entrywise:

$$d \boldsymbol{X} = \left[\begin{matrix}d x_{11} & d x_{12} & \cdots & d x_{1n} \\
d x_{21} & d x_{22} & \cdots & d x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d x_{m1} & d x_{m2} & \cdots & d x_{mn}
\end{matrix}\right]$$

Multiplicative and additive rules (linearity), for a constant scalar $\alpha$:

$$d (\alpha \boldsymbol{X}) = \alpha d \boldsymbol{X}, d(\boldsymbol{X} + \boldsymbol{Y}) = d(\boldsymbol{X}) + d(\boldsymbol{Y})$$

If $\boldsymbol{X}$ and $\boldsymbol{Y}$ are **product-conforming matrices**, then

$$d(\boldsymbol{XY}) = (d \boldsymbol{X}) \boldsymbol{Y} + \boldsymbol{X} d(\boldsymbol{Y})$$
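
The product rule can be seen by expanding $(\boldsymbol{X} + d\boldsymbol{X})(\boldsymbol{Y} + d\boldsymbol{Y}) - \boldsymbol{XY} = (d\boldsymbol{X})\boldsymbol{Y} + \boldsymbol{X}\,d\boldsymbol{Y} + (d\boldsymbol{X})(d\boldsymbol{Y})$: the last term is second order and is dropped in the differential. The sketch below checks this with small finite perturbations; the matrices, the scale `eps`, and the helper functions are illustrative assumptions.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

eps = 1e-4  # size of the perturbation
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2
Y = [[0.5, -1.0], [2.0, 1.5]]             # 2 x 2, so XY is defined
dX = scale(eps, [[1.0, -2.0], [0.0, 3.0], [2.0, 1.0]])
dY = scale(eps, [[-1.0, 2.0], [3.0, 0.5]])

# Exact change in the product versus the first-order prediction
exact_change = sub(matmul(add(X, dX), add(Y, dY)), matmul(X, Y))
first_order = add(matmul(dX, Y), matmul(X, dY))

# What is left over should be the second-order term dX dY
residual = sub(exact_change, first_order)
second_order = matmul(dX, dY)
max_residual = max(abs(v) for row in residual for v in row)
print(max_residual)
```

The residual scales with `eps ** 2`, while the first-order term scales with `eps`, which is why the differential keeps only $(d\boldsymbol{X})\boldsymbol{Y} + \boldsymbol{X}\,d\boldsymbol{Y}$. The order of the factors matters because matrix multiplication does not commute.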