## Calculus Review
#### Differentiation
The key result of calculus, the fundamental theorem of calculus, states:

$$
\frac{d}{dx}\int_{a}^{x} f(t)dt = f(x)
$$

or, if you prefer, with F being any antiderivative of f:

$$
\int_{a}^{b} f(x)dx = F(b) - F(a)
$$

This is important because it links differential and integral calculus: differentiation and integration are inverse operations.
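
The second form can be checked numerically. The following is a minimal sketch, assuming f(x) = cos(x) (so F(x) = sin(x)) and a simple trapezoid rule; the helper name `trapezoid` and the interval are illustrative choices, not part of the original notes.

```python
import math

def trapezoid(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Assumed example: f(x) = cos(x) has the antiderivative F(x) = sin(x).
a, b = 0.0, math.pi / 2
numeric = trapezoid(math.cos, a, b)   # left-hand side: the integral
exact = math.sin(b) - math.sin(a)     # right-hand side: F(b) - F(a)
print(numeric, exact)
```

The two printed values should agree to several decimal places, with the gap shrinking as `n` grows.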

For y = f(x), the derivative is defined as:

$$
\frac{dy}{dx} = \lim_{h\to 0}\frac{f(x+h) - f(x)}{h}
$$

Understanding the definition of a derivative is important because computers are by nature discrete machines and cannot take this limit exactly. They instead approximate derivatives (and, similarly, integrals) numerically, using a small but finite h.
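
As a rough illustration of that approximation, the sketch below plugs a small h directly into the limit definition (a forward difference) and compares it with the symmetric central difference. The function names and the step size `h=1e-6` are assumptions chosen for the example.

```python
import math

def forward_difference(f, x, h=1e-6):
    """The limit definition with a small, finite h."""
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h=1e-6):
    """Symmetric variant; usually more accurate for the same h."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)
approx_forward = forward_difference(math.sin, x)
approx_central = central_difference(math.sin, x)
print(exact, approx_forward, approx_central)
```

Making h arbitrarily small does not keep improving the result: below a certain point, floating-point round-off in `f(x + h) - f(x)` starts to dominate the error.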

#### Chain Rule
As functions get more complicated, we need a way to differentiate functions that are built from other functions (compositions). This is where the chain rule comes into play. It is used extensively in machine learning (backpropagation is one example). For y = f(u) with u = g(x), it can be stated as:

$$
\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}
$$

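To put this into perspective, if a mathematical model has multiple internal parameters that need to be updated (e.g. different weights), then we need the derivative of the error with respect to each individual weight. Each weight influences the error through its own path of intermediate quantities, and the chain rule multiplies the derivatives along that path. For example, if the error E depends on a prediction ŷ, and ŷ in turn depends on a weight w, then:

$$
\frac{dE}{dw} = \frac{dE}{d\hat{y}} \frac{d\hat{y}}{dw}
$$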
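
The chain rule for a single weight can be sketched in Python as follows. This is a toy setup, not a real training loop: the model `y_hat = w * x`, the squared error, and all names and values are assumptions made for illustration. The analytic result is compared against a finite-difference estimate as a sanity check.

```python
def error(w, x, target):
    """Squared error of a one-weight linear model y_hat = w * x."""
    y_hat = w * x
    return (y_hat - target) ** 2

def error_gradient(w, x, target):
    """dE/dw via the chain rule: dE/dy_hat * dy_hat/dw."""
    y_hat = w * x
    dE_dy_hat = 2 * (y_hat - target)  # derivative of (y_hat - target)^2
    dy_hat_dw = x                     # derivative of w * x with respect to w
    return dE_dy_hat * dy_hat_dw

# Hypothetical values chosen only for the example.
w, x, target = 0.5, 3.0, 2.0
analytic = error_gradient(w, x, target)

# Central finite difference as an independent check.
h = 1e-6
numeric = (error(w + h, x, target) - error(w - h, x, target)) / (2 * h)
print(analytic, numeric)
```

With more layers or more weights the same pattern repeats: one factor per step along the path from the weight to the error.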

#### Gradients
Gradients are used throughout system identification. The easiest way to think about the gradient of a multivariable function is as the vector of its partial derivatives, one for each variable, evaluated at a point p:

$$
\nabla f(p) = \begin{bmatrix}
               \frac{\partial f}{\partial x_1} (p) \\
               \vdots \\
               \frac{\partial f}{\partial x_n} (p)
              \end{bmatrix}
$$

In higher-dimensional spaces, the gradient points in the direction of fastest increase of the function, and its magnitude gives the rate of that increase (negating the vector gives the direction of fastest decrease). This is the idea behind gradient descent, hill climbing, and related methods.
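
A minimal sketch of gradient descent, assuming the two-variable function f(x, y) = (x - 1)^2 + 2(y + 2)^2 with its minimum at (1, -2). The function, the starting point, the learning rate, and the step count are all illustrative assumptions.

```python
def f(p):
    """Assumed example function with its minimum at (1, -2)."""
    x, y = p
    return (x - 1) ** 2 + 2 * (y + 2) ** 2

def gradient(p):
    """Vector of partial derivatives [df/dx, df/dy] at point p."""
    x, y = p
    return [2 * (x - 1), 4 * (y + 2)]

def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient (direction of fastest decrease)."""
    p = list(start)
    for _ in range(steps):
        g = grad(p)
        p = [p_i - learning_rate * g_i for p_i, g_i in zip(p, g)]
    return p

minimum = gradient_descent(gradient, [0.0, 0.0])
print(minimum, f(minimum))
```

Flipping the sign of the update (adding the gradient instead of subtracting it) turns the same loop into a simple hill climber.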

#### Optimization
Back in Calculus I/II, the term critical point likely came up a lot. A critical point is a point where the derivative of a function is 0. Because the slope there is 0, such a point is a maximum (local or global), a minimum (local or global), or a saddle point. All of these are worth watching for in machine learning and system identification, since they can act as traps in which our models get stuck.
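
For a function of one variable, the second derivative test is one common way to tell these cases apart. The sketch below assumes f(x) = x^3 - 3x, whose derivative 3x^2 - 3 is zero at x = -1 and x = 1; the function and the helper names are illustrative.

```python
def f_prime(x):
    """First derivative of the assumed example f(x) = x**3 - 3*x."""
    return 3 * x ** 2 - 3

def f_second(x):
    """Second derivative of the same function."""
    return 6 * x

def classify_critical_point(second_derivative_value, tol=1e-12):
    """Second derivative test for a point where the first derivative is 0."""
    if second_derivative_value > tol:
        return "local minimum"
    if second_derivative_value < -tol:
        return "local maximum"
    return "inconclusive"

critical_points = [-1.0, 1.0]  # solutions of f_prime(x) == 0
labels = {x: classify_critical_point(f_second(x)) for x in critical_points}
print(labels)
```

The test is inconclusive when the second derivative is also 0. For example, x^3 has zero slope at x = 0 but neither a maximum nor a minimum there, which is the one-dimensional analogue of a saddle point.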