# Unconstrained Non-Linear Programming
## One-dimensional problems set-up
Let us first consider just one decision variable. The objective is to search for a maximum or minimum of a non-linear function. Without loss of generality, let us consider that the objective is to minimize the objective function:

$\min f(x) \quad x \in \mathbb{R}$

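For a twice-differentiable function $f(x)$, a (local or global) minimum $x^*$ can be identified through the following two properties. The first is necessary for any minimum. The second is sufficient but not strictly necessary: for instance, $f(x)=x^4$ has a minimum at $x=0$ where the second derivative is zero.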

1. **First derivative test**: The first derivative of a function at a point $x^*$ represents the slope of the line tangent to $f(x)$ at $x^*$. At a minimum, the first derivative is equal to zero, $\frac{df(x^*)}{dx}=0$, meaning that the tangent is horizontal.

2. **Second derivative test**: The second derivative is positive, $\frac{d^2f(x^*)}{dx^2} > 0$, meaning that the slope of the tangent is increasing around $x^*$. Since this slope is zero at $x^*$, it must be less than zero for $x < x^*$, meaning that the function decreases before $x^*$, and greater than zero for $x > x^*$, meaning that the function increases after $x^*$. Therefore $f(x)$ takes its minimum value at $x^*$, at least in its proximity.

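A point where the first derivative is zero is called a critical point, so by definition $x^*$ is a critical point. For a maximum, the second derivative is negative and, using the same logic, $f(x)$ increases before $x^*$ and decreases after it, so $f(x^*)$ is the largest value of the function in the proximity of $x^*$.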
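To make the two tests concrete, here is a minimal Python sketch that approximates the derivatives with central finite differences. This is an assumption made for illustration; the text above works with exact derivatives. The names `first_derivative`, `second_derivative` and `classify_point`, the step sizes `h` and the tolerance `tol` are hypothetical choices, not part of the method.

```python
def first_derivative(f, x, h=1e-5):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of d^2f/dx^2 at x."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def classify_point(f, x, tol=1e-6):
    """Apply the first and second derivative tests at x (numerically)."""
    if abs(first_derivative(f, x)) > tol:
        return "not a critical point"  # first derivative test fails
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "local minimum"  # slope increasing through zero
    if d2 < -tol:
        return "local maximum"  # slope decreasing through zero
    return "inconclusive"  # second derivative is (numerically) zero
```

For example, `classify_point(lambda x: (x - 2)**2 + 1, 2.0)` reports a local minimum, while `classify_point(lambda x: x**4, 0.0)` is inconclusive even though $x = 0$ is a minimum, because the second derivative is zero there.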

### Bi-section algorithm
Since the sign of the first derivative changes at a minimum (from negative to positive), it is possible to search for a critical point if we find two initial points at which the first derivative has different signs. The simplest algorithm that uses this property is known as the bi-section method and it can be formulated as follows:

**Initialisation**

1. Choose an acceptable error $\epsilon$
2. Find an initial left point $x_L$ such that $\frac{df(x_L)}{dx} < 0$
3. Find an initial right point $x_R$ such that $\frac{df(x_R)}{dx} > 0$

**Iteration**

1. Bisect the range $[x_L, x_R]$ at its midpoint $x'=\frac{x_L + x_R}{2}$
2. If $\frac{df(x')}{dx} < 0$ then make $x_L=x'$; else if $\frac{df(x')}{dx} > 0$ then make $x_R=x'$; otherwise $\frac{df(x')}{dx} = 0$ and $x'$ is the solution
3. If $x_R - x_L \leq 2\epsilon$ then the solution is $\frac{x_L + x_R}{2}$, else repeat

That is, we start with two initial values at which the first derivative has different signs. If the function is continuously differentiable, then there must be a value somewhere in between where the first derivative is zero. Then, we bisect the range and keep the half in which the first derivative still changes sign. The process stops either when we find a critical point exactly or when the distance between both points is at most $2\epsilon$, so that their midpoint is within $\epsilon$ of the critical point.
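The steps above can be sketched in Python as follows. This is a minimal sketch, assuming the derivative `df` is available as a callable; the function name `bisection` and the `max_iter` safeguard are hypothetical additions, not part of the algorithm as stated.

```python
def bisection(df, x_left, x_right, eps=1e-6, max_iter=200):
    """Bi-section search for a critical point of f, given its first derivative df.

    Requires df(x_left) < 0 < df(x_right): f decreases at x_left and increases
    at x_right, so a point with df(x) = 0 lies in between.
    """
    if not (df(x_left) < 0 < df(x_right)):
        raise ValueError("need df(x_left) < 0 and df(x_right) > 0")
    for _ in range(max_iter):
        if x_right - x_left <= 2 * eps:
            break  # the midpoint is within eps of the critical point
        x_mid = (x_left + x_right) / 2  # bisect the range
        slope = df(x_mid)
        if slope < 0:
            x_left = x_mid  # the sign change lies to the right of x_mid
        elif slope > 0:
            x_right = x_mid  # the sign change lies to the left of x_mid
        else:
            return x_mid  # exact critical point found
    return (x_left + x_right) / 2
```

For $f(x) = (x-2)^2 + 1$, whose derivative is $2(x-2)$, `bisection(lambda x: 2 * (x - 2), 0.0, 5.0)` returns a value within $10^{-6}$ of the minimum at $x = 2$. The `max_iter` cap only guards against an endless loop if `eps` is chosen smaller than floating-point resolution allows.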

## Multi-dimensional problems set-up
For multi-dimensional problems the logic is the same, except that it is necessary to consider the partial derivatives with respect to each decision variable, or dimension, of the problem.
Thus, given an n-dimensional function: 

$f(x) \quad x=[x_1, x_2, ..., x_n]$ 

The **gradient vector** is defined as a vector containing the first partial derivatives: 

$\nabla f(x) = \left[\begin{array}{c}
\dfrac{\partial f(x)}{\partial x_1}\\
\dfrac{\partial f(x)}{\partial x_2}\\
\vdots \\
\dfrac{\partial f(x)}{\partial x_n}
\end{array}\right]
$

Each component of the gradient is the slope of $f(x)$ along the corresponding coordinate direction. At a minimum, the gradient must be zero, as an extension of the first derivative test for one-dimensional problems.

1. **Gradient test**: The gradient must be zero at a critical point $x^*$, $\nabla f(x^*) = [0, ..., 0]^T$.
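A minimal sketch of the gradient test in Python, assuming the partial derivatives are approximated by central finite differences rather than computed exactly. The names `gradient` and `is_critical_point`, the step `h`, the tolerance `tol` and the example function are hypothetical choices for illustration.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f at point x (a list of floats) by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h  # step forward along x_i
        x_minus[i] -= h  # step backward along x_i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def is_critical_point(f, x, tol=1e-6):
    """Gradient test: every partial derivative is numerically zero."""
    return all(abs(g) <= tol for g in gradient(f, x))


# Hypothetical example function with its minimum at x* = [1, -3]
def f(x):
    return (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2


print(is_critical_point(f, [1.0, -3.0]))  # True
print(is_critical_point(f, [0.0, 0.0]))  # False
```

For this $f(x) = (x_1 - 1)^2 + 2(x_2 + 3)^2$, the gradient vanishes at $x^* = [1, -3]^T$ but equals approximately $[-2, 12]^T$ at the origin.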

Now, to know whether the function increases (or decreases) **in every direction** around $x^*$, we need to define the **Hessian**:

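$\text{Hess} f(x)=
  \begin{bmatrix}
    \dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
    \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
    \vdots & \vdots & \ddots & \vdots \\
    \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
  \end{bmatrix}$

$\text{Hess} f(x)=
  \begin{bmatrix}
    h_{11} & h_{12} & \cdots & h_{1n} \\
    h_{21} & h_{22} & \cdots & h_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    h_{n1} & h_{n2} & \cdots & h_{nn}
  \end{bmatrix}$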

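That is, the Hessian is an $n \times n$ matrix that contains the second-order partial derivatives, $h_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$. When these derivatives are continuous, the Hessian is symmetric, $h_{ij} = h_{ji}$.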
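The Hessian can also be approximated numerically. The sketch below assumes $f$ is smooth enough for central finite differences and estimates each $h_{ij}$ from four evaluations of $f$; the names `hessian` and `f_shifted`, the parameter `step` and the example function are hypothetical.

```python
def hessian(f, x, step=1e-4):
    """Approximate the n x n Hessian of f at x (a list of floats) by central differences."""
    n = len(x)

    def f_shifted(i, di, j, dj):
        """Evaluate f after moving x by di along x_i and then by dj along x_j."""
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # h_ij ~ [f(++) - f(+-) - f(-+) + f(--)] / (4 step^2)
            H[i][j] = (f_shifted(i, step, j, step) - f_shifted(i, step, j, -step)
                       - f_shifted(i, -step, j, step) + f_shifted(i, -step, j, -step)) / (4 * step**2)
    return H


# Hypothetical example: the exact Hessian is [[2, 3], [3, 4]] at every point
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1] + 2 * x[1] ** 2


print(hessian(f, [1.0, 1.0]))  # approximately [[2.0, 3.0], [3.0, 4.0]]
```

For a quadratic function such as this $f(x) = x_1^2 + 3x_1x_2 + 2x_2^2$, the central differences are exact apart from rounding error, and the estimated matrix is symmetric as expected.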

To determine how the function increases or decreases in any direction around a given point, we can check the **Hessian determinants**, which are the determinants of the upper-left $i \times i$ submatrices of the Hessian (its leading principal minors), for $i = 1, \dots, n$:

$H_1 = \begin{vmatrix} h_{11} \end{vmatrix}$

$H_2 = \begin{vmatrix}
    h_{11} & h_{12} \\
    h_{21} & h_{22}
\end{vmatrix}$

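$\vdots$

$H_n = \begin{vmatrix}
    h_{11} & \cdots & h_{1n} \\
    \vdots & \ddots & \vdots \\
    h_{n1} & \cdots & h_{nn}
\end{vmatrix}$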

An $n$-dimensional function has $n$ Hessian determinants. The Hessian determinants can help us decide whether a critical point is a minimum or a maximum, according to the following definitions:

- **Positive definite Hessian:** all determinants are greater than zero, $H_i > 0 \ \forall i \in [1,..,n]$.
- **Positive semi-definite Hessian:** all determinants are greater than or equal to zero, $H_i \geq 0 \ \forall i \in [1,..,n]$. Strictly, this condition must hold for every principal minor (the determinant of any submatrix obtained by keeping the same set of row and column indices), not only for the leading ones $H_i$.
- **Negative definite Hessian:** the determinants alternate in sign, starting with a negative one, $(-1)^i H_i > 0 \ \forall i \in [1,..,n]$, that is, $H_1 < 0$, $H_2 > 0$, $H_3 < 0$, and so on.
- **Negative semi-definite Hessian:** $(-1)^i H_i \geq 0 \ \forall i \in [1,..,n]$, again checked on every principal minor, not only on the leading ones.
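The Hessian determinants and the two definite cases can be checked in a few lines of Python. This sketch computes each $H_i$ by Gaussian elimination (an implementation choice, not prescribed by the text) and applies Sylvester's criterion. Because it only uses the leading determinants, it is reliable for the definite cases only; semi-definiteness would need every principal minor. The names `determinant`, `hessian_determinants` and `definiteness` are hypothetical.

```python
def determinant(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]  # work on a float copy
    n = len(A)
    det = 1.0
    for col in range(n):
        # choose the row with the largest pivot in this column
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= A[col][col]
        # eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det


def hessian_determinants(H):
    """Leading principal minors H_1, ..., H_n (upper-left k x k submatrices)."""
    return [determinant([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]


def definiteness(H):
    """Sylvester's criterion for a symmetric matrix; covers the definite cases only."""
    minors = hessian_determinants(H)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "neither positive nor negative definite"
```

For instance, `definiteness([[2, 1], [1, 2]])` gives `"positive definite"` ($H_1 = 2$, $H_2 = 3$), while `[[1, 0], [0, -1]]` has $H_2 = -1 < 0$ and is neither.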

Now, the second derivative test becomes: 

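2. **Second partial derivative test**: at a critical point $x^*$,

    - If the Hessian is negative semi-definite for every $x$ (the function is concave), $x^*$ is a global maximum
    - If the Hessian is positive semi-definite for every $x$ (the function is convex), $x^*$ is a global minimum
    - If the Hessian is negative definite at $x^*$, $x^*$ is a (strict) local maximum
    - If the Hessian is positive definite at $x^*$, $x^*$ is a (strict) local minimum
    - If the Hessian at $x^*$ is neither positive nor negative semi-definite, $x^*$ is a saddle point; if it is only semi-definite at $x^*$, the test is inconclusive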
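As a short worked example, assuming for illustration the function $f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$ (not taken from the text above), both tests can be applied by hand:

$\nabla f(x) = \left[\begin{array}{c} 2x_1 + x_2 \\ x_1 + 2x_2 \end{array}\right] = \left[\begin{array}{c} 0 \\ 0 \end{array}\right] \Rightarrow x^* = [0, 0]^T$

$\text{Hess} f(x) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad H_1 = 2 > 0, \quad H_2 = 2 \cdot 2 - 1 \cdot 1 = 3 > 0$

Since the Hessian does not depend on $x$, it is positive definite at every point, so $f$ is (strictly) convex and $x^* = [0, 0]^T$ is its global minimum, with $f(x^*) = 0$.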




