# Differential Calculus

## Derivative

For scalar or vector-valued functions of a single variable:

$$
y = f(x), \; f: x \in \mathbb{R} \to y \in \mathbb{R}^p
$$

$$
f'(x) := \lim_{\Delta x \to 0} \frac{f(x+\Delta x)-f(x)}{\Delta x} \in \mathbb{R}^p
$$

$$
\varepsilon(x, \Delta x) := \left(\frac{f(x+\Delta x)-f(x)}{\Delta x} - f'(x)\right) \frac{\Delta x}{|\Delta x|} \to 0 \; \mbox{ when } \; \Delta x \to 0.
$$

$$
f(x + \Delta x) = f(x) + f'(x) \Delta x + \varepsilon(x, \Delta x) |\Delta x|
$$

$$
f(x + \Delta x) = f(x) + f'(x) \Delta x + o(\Delta x)
$$
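
The expansion above can be observed numerically. The following is a minimal sketch, assuming a hypothetical example curve $f(x) = (\cos x, \sin x)$ (not taken from the text), for which the exact derivative is known. It evaluates the difference quotient and the norm of $\varepsilon(x, \Delta x)$ for decreasing steps.

```python
import math

def f(x):
    # Hypothetical example curve f: R -> R^2, f(x) = (cos x, sin x)
    return (math.cos(x), math.sin(x))

def f_prime(x):
    # Exact derivative, computed component by component
    return (-math.sin(x), math.cos(x))

def difference_quotient(f, x, dx):
    # (f(x + dx) - f(x)) / dx, component by component
    fx, fx_dx = f(x), f(x + dx)
    return tuple((b - a) / dx for a, b in zip(fx, fx_dx))

def epsilon_norm(x, dx):
    # Euclidean norm of epsilon(x, dx) = (f(x + dx) - f(x) - f'(x) dx) / |dx|
    fx, fx_dx, d = f(x), f(x + dx), f_prime(x)
    return math.sqrt(
        sum(((b - a - di * dx) / abs(dx)) ** 2 for a, b, di in zip(fx, fx_dx, d))
    )

x = 1.0
for dx in (1e-1, 1e-2, 1e-3):
    # The quotient approaches f'(x) and the norm of epsilon shrinks with dx
    print(dx, difference_quotient(f, x, dx), epsilon_norm(x, dx))
```

For this particular curve the norm of $\varepsilon$ decreases roughly in proportion to $|\Delta x|$, which is consistent with a remainder that is $o(\Delta x)$.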

## Partial Derivative

For functions of several variables:

$$
y = f(x), \; f: x \in \mathbb{R}^m \to y \in \mathbb{R}^p
$$

$$
y = (f_1(x), \dots, f_i(x), \dots, f_p(x))
$$


$$
\partial_i f(x) = \frac{\partial f}{\partial x_i}(x) 
:= 
\lim_{\Delta x_i \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \dots)-f(x)}{\Delta x_i} \in \mathbb{R}^p
$$
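
As an illustration, a partial derivative can be approximated by perturbing a single coordinate and leaving the others unchanged. This is only a sketch: the helper `partial`, the step `h` and the example function are assumptions made for the illustration, and the index `i` is 0-based in the code while the text counts from 1.

```python
def partial(f, x, i, h=1e-6):
    # Difference quotient in the i-th coordinate only (0-based index i)
    x_shift = list(x)
    x_shift[i] += h
    fx, fx_shift = f(x), f(x_shift)
    return [(b - a) / h for a, b in zip(fx, fx_shift)]

def f(x):
    # Hypothetical example f: R^2 -> R^2, f(x1, x2) = (x1 * x2, x1 + x2^2)
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

x = [2.0, 3.0]
print(partial(f, x, 0))  # approximates (x2, 1) = (3, 1)
print(partial(f, x, 1))  # approximates (x1, 2 * x2) = (2, 6)
```

Each call returns a vector with $p$ components (here $p = 2$), matching $\partial_i f(x) \in \mathbb{R}^p$.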

## Jacobian Matrix

$$
f: x \in \mathbb{R}^m \to y \in \mathbb{R}^p
$$


$$
J_f(x) :=
\left[
\begin{array}{cccc}
\vert & \vert & \cdots & \vert \\
\partial_1 f (x) & \partial_2 f (x) & \cdots & \partial_m f (x) \\
\vert & \vert & \cdots & \vert \\
\end{array}
\right] \in \mathbb{R}^{p\times m}.
$$

Equivalently, entry by entry:

$$
J_f(x) =
\left[
\begin{array}{cccc}
\partial_1 f_1 (x) & \partial_2 f_1 (x) & \cdots & \partial_m f_1 (x) \\
\partial_1 f_2 (x) & \partial_2 f_2 (x) & \cdots & \partial_m f_2 (x) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_p (x) & \partial_2 f_p (x) & \cdots & \partial_m f_p (x) \\
\end{array}
\right].
$$




$$
[J_f(x)]_{ij} = \partial_{j} f_i(x), \quad 1 \leq i \leq p, \; 1 \leq j \leq m.
$$
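
The column-by-column structure of the Jacobian matrix can be sketched in code: each perturbed coordinate $x_j$ fills one column. The function `jacobian`, the step `h` and the example $f: \mathbb{R}^2 \to \mathbb{R}^3$ below are assumptions chosen for the illustration, and the result is a finite-difference approximation rather than an exact Jacobian.

```python
def jacobian(f, x, h=1e-6):
    # Finite-difference approximation of J_f(x), stored as p rows of m entries.
    # Entry [i][j] approximates the partial derivative of f_i with respect to x_j
    # (indices are 0-based here, 1-based in the text).
    fx = f(x)
    p, m = len(fx), len(x)
    J = [[0.0] * m for _ in range(p)]
    for j in range(m):
        x_shift = list(x)
        x_shift[j] += h
        f_shift = f(x_shift)
        for i in range(p):
            J[i][j] = (f_shift[i] - fx[i]) / h
    return J

def f(x):
    # Hypothetical example f: R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, x1 ** 2]

J = jacobian(f, [2.0, 3.0])
print(len(J), len(J[0]))  # p = 3 rows, m = 2 columns
for row in J:
    print(row)
```

At $x = (2, 3)$ the exact matrix for this example is $[[3, 2], [1, 6], [4, 0]]$; the printed rows should agree with it up to an error of the order of `h`.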

```{note} Exercise
Show that for every $x=(x_1, x_2)$ in $\R^2$, the Jacobian matrix of the function
$$
f:(x_1, x_2) \in \R^2 \mapsto (-2(x_2^2 - x_1) + 2 (x_1 - 1), 4 (x_2^2 - x_1)x_2) \in \R^2
$$
is defined and satisfies
$$
J_f(x_1, x_2) = 
\left[ 
  \begin{array}{cc}
  4 & -4x_2 \\
  -4x_2 & 12 x_2^2 - 4 x_1
  \end{array}
  \right]\in \R^{2 \times 2}.
$$
```

## Gradient

For scalar-valued functions of several variables:

$$
y = f(x), \; f: x \in \mathbb{R}^m \to y \in \mathbb{R}
$$

The gradient of $f$ at $x$ is defined as:
$$
\nabla f(x) := (\partial_1 f(x), \partial_2 f(x), \dots, \partial_m f(x)) \in \R^m.
$$
If we identify this vector with a column vector, the gradient of $f$ at $x$ is
the transpose of the Jacobian matrix of $f$ at $x$:
$$
\nabla f(x) = J_f(x)^{\top} = 
\left[ 
\begin{array}{c}
\partial_1 f(x) \\
\partial_2 f(x) \\
\vdots \\
\partial_m f(x)
\end{array}
\right] \in \R^{m\times 1}.
$$

```{note} Worked example

Both partial functions of

$$
f:(x_1,x_2) \in \R^2 \mapsto (x_2^2 - x_1)^2 + (x_1 - 1)^2 \in \R
$$ 

have a derivative;
they satisfy $\partial_1 f(x_1, x_2) = -2(x_2^2 - x_1) + 2 (x_1 - 1)$ and 
$\partial_2 f(x_1, x_2) = 4 (x_2^2 - x_1)x_2$. 
Therefore

$$
J_f(x_1, x_2) = 
\left[ 
  \begin{array}{cc}
  -2(x_2^2 - x_1) + 2 (x_1 - 1) &
  4 (x_2^2 - x_1)x_2
  \end{array}
  \right] \in \R^{1 \times 2}.
$$

Its gradient is given by

$$
\nabla f(x_1, x_2)
=
(-2(x_2^2 - x_1) + 2 (x_1 - 1), 4 (x_2^2 - x_1)x_2) \in \R^2
$$

or, represented as a column vector:

$$
\nabla f(x_1, x_2) = J_f(x_1, x_2)^{\top} =
\left[ 
  \begin{array}{c}
  -2(x_2^2 - x_1) + 2 (x_1 - 1) \\
  4 (x_2^2 - x_1)x_2
  \end{array}
  \right] \in \R^{2\times 1}.
$$
```
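
The gradient formulas of the worked example can be sanity-checked against central differences. This is a sketch: the helper `grad_fd`, the step `h` and the evaluation point are arbitrary choices made for the check.

```python
def f(x1, x2):
    # Function from the worked example
    return (x2 ** 2 - x1) ** 2 + (x1 - 1) ** 2

def grad_f(x1, x2):
    # Gradient formulas from the worked example
    return (-2 * (x2 ** 2 - x1) + 2 * (x1 - 1), 4 * (x2 ** 2 - x1) * x2)

def grad_fd(x1, x2, h=1e-6):
    # Central differences, one coordinate at a time
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return (d1, d2)

# The two results should agree up to a small numerical error
print(grad_f(0.5, -1.5))
print(grad_fd(0.5, -1.5))
```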

## Differential

$$
y = f(x), \; f: x \in \mathbb{R}^m \to y \in \mathbb{R}^p
$$


The function $f$ is differentiable at $x$ if the Jacobian matrix is defined at $x$ and

$$
f(x + \Delta x) = f(x) + J_f(x) \Delta x + o(\Delta x).
$$

In other words, if
$$
\frac{f(x + \Delta x) - f(x) - J_f(x) \Delta x}{\|\Delta x\|} \to 0 \; 
\mbox{ when } \; \Delta x  \to 0.
$$

In this case, $f(x) + J_f(x) \Delta x$ is the first-order expansion of $f(x + \Delta x)$ at $x$, and its linear part $J_f(x) \Delta x$ is called a Jacobian-vector product (JVP). The differential of $f$ at $x$, denoted $df(x)$, is the function that maps a vector $\Delta x$ to the corresponding Jacobian-vector product:
$$
df(x)(\Delta x) := J_f(x) \Delta x
$$
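
To make the definition concrete, the sketch below evaluates the ratio $\|f(x + \Delta x) - f(x) - J_f(x) \Delta x\| / \|\Delta x\|$ for shrinking $\Delta x$. The example function, its hand-computed Jacobian `jac_f` and the direction of $\Delta x$ are assumptions chosen for the illustration.

```python
import math

def f(x):
    # Hypothetical example f: R^2 -> R^2, f(x1, x2) = (x1 * x2, x1 + x2^2)
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def jac_f(x):
    # Jacobian matrix of the example, computed by hand
    x1, x2 = x
    return [[x2, x1], [1.0, 2 * x2]]

def differential(x, dx):
    # df(x)(dx) = J_f(x) dx, a matrix-vector product
    return [sum(Jij * dxj for Jij, dxj in zip(row, dx)) for row in jac_f(x)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def remainder_ratio(x, dx):
    # ||f(x + dx) - f(x) - J_f(x) dx|| / ||dx||
    x_new = [xi + dxi for xi, dxi in zip(x, dx)]
    lin = differential(x, dx)
    r = [a - b - c for a, b, c in zip(f(x_new), f(x), lin)]
    return norm(r) / norm(dx)

x = [2.0, 3.0]
for t in (1e-1, 1e-2, 1e-3):
    dx = [t, -2 * t]
    print(t, remainder_ratio(x, dx))  # the ratio shrinks as dx goes to 0
```

For this example and this direction the ratio equals $2t$ up to rounding, so it tends to $0$ as required by the definition.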

## Chain Rule

```{important} Chain Rule
If 
$f: \mathbb{R}^p \to \mathbb{R}^{m}$ and
$g: \mathbb{R}^m \to \mathbb{R}^{r}$ 
are both differentiable, the composite function $g \circ f$ is differentiable and
$$
d(g \circ f)(x) = dg(f(x)) \circ df(x)
$$

The Jacobian matrix of $g \circ f$ at $x$ satisfies:
$$
J_{g \circ f}(x) = J_{g}(f(x)) J_f(x)
$$
```

```{hint} Implementation
Concretely, if there are three variables $x$, $y$ and $z$, such that
$$
y = f(x) \; \mbox{ and } \; z= g(y),
$$
to compute the first-order variation of $z$ with respect to $x$, one can compute
$$
\Delta y = J_f(x) \Delta x
\;
\mbox{ and then }
\;
\Delta z  = J_g(y) \Delta y. 
$$
```
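
The two-step computation can be sketched as follows. The functions `f` and `g`, their hand-computed Jacobians and the helper `matvec` are hypothetical choices made for the illustration. The chained Jacobian-vector products are compared with the actual change of $z$ for a small $\Delta x$.

```python
import math

def f(x):
    # Hypothetical f: R^2 -> R^2
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def jac_f(x):
    x1, x2 = x
    return [[x2, x1], [1.0, 2 * x2]]

def g(y):
    # Hypothetical g: R^2 -> R
    y1, y2 = y
    return [math.sin(y1) + y2 ** 2]

def jac_g(y):
    y1, y2 = y
    return [[math.cos(y1), 2 * y2]]

def matvec(J, v):
    # Matrix-vector product with J stored as a list of rows
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def jvp_composite(x, dx):
    y = f(x)
    dy = matvec(jac_f(x), dx)  # Delta y = J_f(x) Delta x
    dz = matvec(jac_g(y), dy)  # Delta z = J_g(y) Delta y
    return dz

x, dx = [0.5, -1.0], [1e-4, 2e-4]
dz = jvp_composite(x, dx)
z0 = g(f(x))
z1 = g(f([xi + dxi for xi, dxi in zip(x, dx)]))
# First-order variation versus actual variation of z
print(dz[0], z1[0] - z0[0])
```

Note that the product $J_g(f(x)) J_f(x)$ is never formed explicitly here: only matrix-vector products are needed, which is the point of propagating $\Delta x$ step by step.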
