In [1]:
import torch

### Quick Review


Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$. The derivative of $f(\boldsymbol{x})$ with respect to the vector $\boldsymbol{x}$ is described by its partial derivatives with respect to each scalar element $x_i$ of $\boldsymbol{x}$, namely $\frac{\partial f(\boldsymbol{x})}{\partial x_i}$ for $i=1,\ldots,n$.

Stacking these $n$ partial derivatives into a single (column) vector gives the $\textbf{gradient}$ of $f$ with respect to $\boldsymbol{x}$:

$$
\nabla_{\boldsymbol{x}}f(\boldsymbol{x}) = \Big[ \frac{\partial f(\boldsymbol{x})}{\partial x_1}, \frac{\partial f(\boldsymbol{x})}{\partial x_2}, \ldots, \frac{\partial f(\boldsymbol{x})}{\partial x_n} \Big]^{'}
$$
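
To make the definition concrete, the sketch below builds a gradient one partial derivative at a time with central finite differences and compares it with PyTorch's autograd. The function `cube_sum` (i.e. $f(\boldsymbol{x}) = \sum_i x_i^3$, with exact gradient $3x_i^2$ in each component), the point `v`, and the step size `h` are illustrative assumptions, not part of the original notebook; `v` is used instead of `x` so the notebook's `x` is not overwritten.

```python
import torch

# Illustrative scalar function (our own choice): f(x) = sum_i x_i**3,
# whose exact gradient is 3 * x**2.
def cube_sum(x):
    return (x ** 3).sum()

v = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64)

# Build the gradient one partial derivative at a time with central differences:
# df/dx_i ~ (f(x + h e_i) - f(x - h e_i)) / (2h), where e_i is the i-th unit vector.
h = 1e-6
partials = []
for i in range(v.numel()):
    e_i = torch.zeros_like(v)
    e_i[i] = 1.0
    partials.append((cube_sum(v + h * e_i) - cube_sum(v - h * e_i)) / (2 * h))
grad_fd = torch.stack(partials)  # stack the n partials into one vector

# Reference gradient computed by autograd.
v_ag = v.clone().requires_grad_(True)
cube_sum(v_ag).backward()
grad_ag = v_ag.grad

print(grad_fd)
print(grad_ag)  # mathematically 3 * v**2 = [0.75, 3.0, 12.0]
print(torch.allclose(grad_fd, grad_ag, atol=1e-6))
```

Finite differences are only approximate (here the error is tiny because $f$ is a low-degree polynomial and float64 is used), which is why the comparison uses `torch.allclose` rather than exact equality.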

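Some useful rules are listed below, with $\boldsymbol{x} \in \mathbb{R}^n$. In rules 1 and 2, $f$ is vector-valued ($f: \mathbb{R}^n \rightarrow \mathbb{R}^m$), and $\nabla_{\boldsymbol{x}}f(\boldsymbol{x})$ denotes the $n \times m$ matrix whose $j$-th column is the gradient of the component $f_j$, i.e. the transpose of the Jacobian defined below: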

> 1) For all $A \in \mathbb{R}^{m \times n}$: if $f(\boldsymbol{x})=A\boldsymbol{x}$, then $\nabla_{\boldsymbol{x}}f(\boldsymbol{x})=A^{'}$.
> 2) For all $A \in \mathbb{R}^{n \times m}$: if $f(\boldsymbol{x})=\boldsymbol{x}^{'}A$, then $\nabla_{\boldsymbol{x}}f(\boldsymbol{x})=A$.
> 3) For all $A \in \mathbb{R}^{n \times n}$: if $f(\boldsymbol{x})=\boldsymbol{x}^{'}A\boldsymbol{x}$, then $\nabla_{\boldsymbol{x}}f(\boldsymbol{x})=(A + A^{'})\boldsymbol{x}$.
> 4) If $f(\boldsymbol{x})=\boldsymbol{x}^{'}\boldsymbol{x}$, then $\nabla_{\boldsymbol{x}}f(\boldsymbol{x})=2\boldsymbol{x}$.
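
The four rules can be spot-checked numerically with PyTorch. This is a minimal sketch, assuming the convention that the gradient of a vector-valued $f$ is the transpose of its Jacobian; `torch.autograd.functional.jacobian` returns the untransposed $m \times n$ Jacobian, so the code transposes it. The sizes `n`, `m`, the seed, and the random matrices `A1`, `A2`, `A3` are arbitrary test choices, not values from the text.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)  # arbitrary seed so the random test matrices are reproducible
n, m = 4, 3
v = torch.randn(n, dtype=torch.float64)  # the point x at which the rules are checked

# Rule 1: f(x) = A x, A in R^{m x n}. jacobian() returns the m x n Jacobian J;
# the gradient in the convention of the rules is J', which should equal A'.
A1 = torch.randn(m, n, dtype=torch.float64)
J1 = jacobian(lambda x: A1 @ x, v)
rule1 = torch.allclose(J1.T, A1.T)

# Rule 2: f(x) = x' A, A in R^{n x m} (for a 1-D tensor, x @ A computes x' A).
A2 = torch.randn(n, m, dtype=torch.float64)
J2 = jacobian(lambda x: x @ A2, v)
rule2 = torch.allclose(J2.T, A2)

# Rule 3: f(x) = x' A x with a generally non-symmetric A in R^{n x n}.
A3 = torch.randn(n, n, dtype=torch.float64)
x3 = v.clone().requires_grad_(True)
(grad3,) = torch.autograd.grad(x3 @ A3 @ x3, x3)
rule3 = torch.allclose(grad3, (A3 + A3.T) @ v)

# Rule 4: f(x) = x' x, the special case A = I of rule 3.
x4 = v.clone().requires_grad_(True)
(grad4,) = torch.autograd.grad(x4 @ x4, x4)
rule4 = torch.allclose(grad4, 2 * v)

print(rule1, rule2, rule3, rule4)  # expected: True True True True
```

Using a non-symmetric `A3` matters for rule 3: with a symmetric matrix, $(A + A^{'})\boldsymbol{x}$ and the common mistake $2A\boldsymbol{x}$ would coincide and the check could not tell them apart.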

We can generalize this notion to vector-valued functions. Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ with components $f_1, \ldots, f_m$. Then we define the $\textbf{Jacobian}$ of $f(\boldsymbol{x})$ with respect to $\boldsymbol{x}$, denoted $\boldsymbol{J} \in \mathbb{R}^{m \times n}$, as:

$$
\boldsymbol{J} = \Big[ \frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial x_1}, \frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial x_2}, \ldots, \frac{\partial \boldsymbol{f}(\boldsymbol{x})}{\partial x_n} \Big] = \begin{bmatrix}
    (\nabla_{\boldsymbol{x}} f_1)^{'} \\
    (\nabla_{\boldsymbol{x}} f_2)^{'} \\
    \vdots \\
    (\nabla_{\boldsymbol{x}} f_m)^{'}
\end{bmatrix} = \begin{bmatrix}
    \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
$$
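
As a concrete check of this definition, `torch.autograd.functional.jacobian` returns the $m \times n$ matrix above for a function of a single tensor. The function `f_vec` below is an arbitrary illustrative choice with $n = 3$ and $m = 2$; the sketch compares autograd's result with a hand-derived Jacobian and also rebuilds $\boldsymbol{J}$ row by row from the gradients of each component, mirroring the middle expression.

```python
import torch
from torch.autograd.functional import jacobian

# Illustrative f: R^3 -> R^2 (arbitrary choice for this sketch).
# Python indices 0, 1, 2 correspond to x_1, x_2, x_3 in the text:
# f(x) = [x_1 * x_2, x_1 + sin(x_3)]
def f_vec(x):
    return torch.stack([x[0] * x[1], x[0] + torch.sin(x[2])])

v = torch.tensor([1.0, 2.0, 0.5], dtype=torch.float64)

# Full Jacobian from autograd: shape (m, n) = (2, 3).
J = jacobian(f_vec, v)

# Jacobian derived by hand: rows [x_2, x_1, 0] and [1, 0, cos(x_3)].
J_manual = torch.tensor([
    [v[1].item(), v[0].item(), 0.0],
    [1.0, 0.0, torch.cos(v[2]).item()],
], dtype=torch.float64)

# Row i of J is the (transposed) gradient of the scalar component f_i.
rows = []
for i in range(2):
    vi = v.clone().requires_grad_(True)
    (grad_i,) = torch.autograd.grad(f_vec(vi)[i], vi)
    rows.append(grad_i)
J_rows = torch.stack(rows)

print(J.shape)                      # torch.Size([2, 3])
print(torch.allclose(J, J_manual))  # True
print(torch.allclose(J, J_rows))    # True
```

Note the shape convention: one row per output component and one column per input variable, matching the rightmost matrix in the definition.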

In [2]:
x = torch.arange(4.0, requires_grad=True)  # record operations on x so gradients can be computed
x

tensor([0., 1., 2., 3.], requires_grad=True)

### 1)

Let $\boldsymbol{x} = [0, 1, 2, 3]$ and $f(\boldsymbol{x}) = 2 \times \boldsymbol{x}^{'}\boldsymbol{x}$. Then

$$
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} = 4 \times \boldsymbol{x}
$$
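
Spelled out step by step, the result follows from rule 4 and the constant factor 2, and evaluating at $\boldsymbol{x} = [0, 1, 2, 3]$ gives the values that PyTorch reports in the cells that follow:

$$
\nabla_{\boldsymbol{x}} f(\boldsymbol{x}) = 2 \times \nabla_{\boldsymbol{x}}\big(\boldsymbol{x}^{'}\boldsymbol{x}\big) = 2 \times 2\boldsymbol{x} = 4\boldsymbol{x}, \qquad f(\boldsymbol{x}) = 2\,(0^2 + 1^2 + 2^2 + 3^2) = 28, \qquad 4\boldsymbol{x} = [0, 4, 8, 12]
$$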

In [3]:
y = 2 * torch.dot(x, x)  # f(x) = 2 x'x
y

tensor(28., grad_fn=<MulBackward0>)

In [4]:
y.backward()  # backpropagate; stores df/dx in x.grad
x.grad

tensor([ 0.,  4.,  8., 12.])

In [5]:
4 * x == x.grad

tensor([True, True, True, True])

### 2)

Again, let $\boldsymbol{x} = [0, 1, 2, 3]$ and $f(\boldsymbol{x}) = \boldsymbol{1}^{'}\boldsymbol{x}$, that is, $f(\boldsymbol{x})$ is equal to the sum of the elements of $\boldsymbol{x}$. Then

$$
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} = \boldsymbol{1}
$$

where $\boldsymbol{1} = [1, 1, 1, 1]$.
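
Component by component, each partial derivative of the sum is 1, and the function value at this $\boldsymbol{x}$ is 6, which matches the cells that follow:

$$
\frac{\partial f(\boldsymbol{x})}{\partial x_i} = \frac{\partial}{\partial x_i} \sum_{j=1}^{4} x_j = 1 \quad (i = 1, \ldots, 4), \qquad f(\boldsymbol{x}) = 0 + 1 + 2 + 3 = 6
$$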

In [6]:
x.grad.zero_()  # clear the gradient from example 1; backward() accumulates into x.grad
y = x.sum()  # f(x) = 1'x
y

tensor(6., grad_fn=<SumBackward0>)

In [7]:
y.backward()
x.grad

tensor([1., 1., 1., 1.])

In [8]:
torch.ones(x.shape[0]) == x.grad

tensor([True, True, True, True])
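
The `x.grad.zero_()` call at the start of example 2 is needed because `backward()` adds the new gradient into `.grad` instead of overwriting it; without it, `x.grad` would hold $[0, 4, 8, 12] + [1, 1, 1, 1] = [1, 5, 9, 13]$. The minimal sketch below shows this accumulation on a fresh tensor `v` (a name chosen here so that the notebook's `x` is left untouched).

```python
import torch

v = torch.arange(4.0, requires_grad=True)

# First backward pass: d(sum v)/dv = [1, 1, 1, 1].
v.sum().backward()
first = v.grad.clone()

# Second backward pass WITHOUT zeroing: PyTorch adds the new gradient
# to the one already stored in v.grad, giving [2, 2, 2, 2].
v.sum().backward()
accumulated = v.grad.clone()

# Zero the stored gradient in place, then backpropagate again: back to [1, 1, 1, 1].
v.grad.zero_()
v.sum().backward()
fresh = v.grad.clone()

print(first, accumulated, fresh)
```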