# Differentiation
**Author:** Alejandro Sánchez Yalí

In this chapter, we review key differentiation concepts. In particular, we emphasize the fundamental role played by
linear approximations in the context of numerical differentiation. We also discuss the concept of automatic
differentiation, which is a powerful tool for computing derivatives of functions implemented in computer programs.

## Univariate differentiation
### Derivatives
Before studying derivatives, we recall the definition of function continuity.

<div class="definition"><p>Definition 1.1. Continuous function</p> 
  <P>A function $f: \mathbb{R} \rightarrow \mathbb{R}$ is continuous at a point $x_0$ if $$\lim_{x \to x_0} f(x) = f(x_0).$$
  A function $f$ is continuous if it is continuous at every point in its domain.</p>
</div>

In the following, we use Landau's notation to describe the behavior of functions near a point. We write $$f(x) = o\big(g(x)\big) \quad \text{as} \quad x \to x_0$$
if $$\lim_{x \to x_0} \frac{|f(x)|}{|g(x)|} = 0.$$

That is, $f(x)$ is much smaller than $g(x)$ as $x$ approaches $x_0$. For example, $f$ is continuous at $x_0$ 
if $$f(x_0 + \delta) = f(x_0) + o(1) \quad \text{as} \quad \delta \to 0.$$

We now introduce the concept of derivative. Consider a function $f: \mathbb{R} \rightarrow \mathbb{R}$ and a point
$x_0$ in its domain. Its value on an interval $[x_0, x_0 + h]$ can be approximated by the secant between $\big(x_0, f(x_0)\big)$
and $\big(x_0 + h, f(x_0 + h)\big)$. The slope of this **secant** is given by the difference quotient $$\frac{f(x_0 + h) -
f(x_0)}{h}.$$ In the limit of an infinitesimal $h$, the secant converges to the **tangent** at $\big(x_0, f(x_0)\big)$. The slope
of this tangent is the derivative of $f$ at $x_0$, denoted by $f'(x_0)$. The definition below
formalizes this intuition.

<div class="definition"><p>Definition 1.2. Derivative</p>
  <p>The derivative of a function $f: \mathbb{R} \rightarrow \mathbb{R}$ at a point $x_0$ is defined as 
  $$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}.$$ If $f'(x_0)$ is well-defined at a particular $x_0$, 
  we say that the function $f$ is differentiable at $x_0$.</p>
</div>
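
As a numerical illustration of Definition 1.2, the following Python sketch evaluates the difference quotient for a shrinking sequence of values of $h$. The helper name `difference_quotient` and the test function $f(x) = x^3$ are assumptions chosen for illustration only; they do not come from the text above.

```python
def difference_quotient(f, x0, h):
    """Slope of the secant through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h


def cube(x):
    # Illustrative test function (an assumption); its exact derivative is 3 * x**2.
    return x ** 3


if __name__ == "__main__":
    x0 = 2.0
    exact = 3 * x0 ** 2
    # The error should shrink roughly in proportion to h.
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        approx = difference_quotient(cube, x0, h)
        print(f"h = {h:.0e}  quotient = {approx:.6f}  error = {abs(approx - exact):.2e}")
```

For this particular $f$, the quotient equals $3x_0^2 + 3x_0 h + h^2$, so the error decreases linearly in $h$. In floating-point arithmetic, very small values of $h$ eventually make rounding error dominate, so $h$ cannot be shrunk indefinitely in practice.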

Here, and in the following definitions, if $f$ is differentiable at every $x$, we say that it is **differentiable
everywhere** or simply **differentiable**. If $f$ is differentiable at a given $x$, then it is necessarily continuous at
$x$.

<div class="theorem"><p>Theorem 1.1. Differentiability implies continuity</p> 
  <P>If a function $f: \mathbb{R} \rightarrow \mathbb{R}$ is differentiable at a point $x_0$, then it is continuous at $x_0$.
  </p>
</div>

*Proof.* By the definition of the derivative, the difference quotient minus $f'(x_0)$ tends to $0$ as $h \to 0$, which
is equivalent to $$f(x_0 + h) = f(x_0) + f'(x_0)h + o(h) \quad
\text{as} \quad h \to 0.$$ Since $f'(x_0)h + o(h) = o(1)$ as $h \to 0$, we have that  $$\lim_{h \to 0} |f(x_0 + h) -
f(x_0)| = 0.$$ Therefore, $f$ is continuous at $x_0$.

In addition to giving the slope of a function at a point, the derivative provides information about
the **monotonicity** of $f$ near that point. For example, if $f'(x_0) > 0$, then $f$ is increasing near $x_0$. If $f'(x_0) < 0$,
then $f$ is decreasing near $x_0$. If $f'(x_0) = 0$, then $x_0$ is a stationary point of $f$: it may be a local minimum,
a local maximum, or neither (as for $f(x) = x^3$ at $x_0 = 0$). Such information can be
used to develop iterative algorithms to minimize or maximize $f$ by computing iterates of the form $$x_{n+1} = x_n - \alpha f'(x_n),$$
where $\alpha$ is a step size. If $\alpha > 0$ is sufficiently small, each step moves in a direction in which $f$ decreases and,
under suitable assumptions on $f$, the iterates converge to a local minimum. If $\alpha < 0$, the iterates instead move in
a direction in which $f$ increases, toward a local maximum. If $f$ is convex and the iterates converge, their limit is a global minimum.
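
The iteration above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `gradient_descent`, the convex test function $f(x) = (x - 3)^2$, the step size, and the number of steps are all hypothetical choices, not part of the original text.

```python
def gradient_descent(df, x_init, alpha, num_steps):
    """Apply x_{n+1} = x_n - alpha * f'(x_n) for a fixed number of steps."""
    x = x_init
    for _ in range(num_steps):
        x = x - alpha * df(x)
    return x


def df(x):
    # Derivative of the illustrative convex function f(x) = (x - 3)**2,
    # whose unique minimizer is x = 3 (an assumption made for this sketch).
    return 2.0 * (x - 3.0)


if __name__ == "__main__":
    # With alpha = 0.1 the distance to the minimizer shrinks by a factor 0.8 per step.
    print(gradient_descent(df, x_init=0.0, alpha=0.1, num_steps=100))
    # With alpha = 1.5 the distance doubles at each step and the iterates diverge.
    print(gradient_descent(df, x_init=0.0, alpha=1.5, num_steps=20))
```

For this quadratic, each step multiplies the distance to the minimizer by $|1 - 2\alpha|$, which shows why the step size matters: the iterates approach $x = 3$ only when $0 < \alpha < 1$.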

For several elementary functions, the derivative can be computed analytically. 



<div class="example"><p>Example 1.1. Derivative of power function</p> 
  <P>The derivative of $f(x) = x^n$ for $x \in \mathbb{R}$ and $n \in \mathbb{N}\setminus \{0\}$ is given by $f'(x) = nx^{n-1}$. Indeed, by the binomial theorem, $$f(x + h) = (x + h)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} h^k.$$ The $k = 0$ term equals $x^n$, so $$f(x + h) - f(x) = \sum_{k=0}^n \binom{n}{k} x^{n-k} h^k - x^n = \sum_{k=1}^n \binom{n}{k} x^{n-k} h^k.$$ Dividing by $h$ and taking the limit as $h \to 0$, every term with $k \geq 2$ vanishes and only the $k = 1$ term remains: $$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \sum_{k=1}^n \binom{n}{k} x^{n-k} h^{k-1} = \binom{n}{1} x^{n-1} = nx^{n-1}.$$
  </p>
</div>
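
As a concrete instance of Example 1.1, take $n = 3$ (a value chosen here purely for illustration). The binomial expansion gives $$(x + h)^3 - x^3 = 3x^2 h + 3x h^2 + h^3,$$ so that $$\frac{(x + h)^3 - x^3}{h} = 3x^2 + 3xh + h^2 \longrightarrow 3x^2 \quad \text{as} \quad h \to 0,$$ in agreement with $f'(x) = nx^{n-1}$.
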
<div class="remark"><p>Remark 1.1. Functions on a subset $U$ of $\mathbb{R}$</p> 
  <P>For simplicity, we consider functions $f: \mathbb{R} \rightarrow \mathbb{R}$. However, the concept of derivative can be extended to functions defined on a subset $U$ of $\mathbb{R}$. If a function $f: U \rightarrow \mathbb{R}$ is defined on a subset $U$ of $\mathbb{R}$, as is the case for $f(x) = \sqrt{x}$, defined on $U=\mathbb{R}^+$, ...
  </p>
</div>
