# 3.3 Unconstrained Optimization
<a target="_blank" href="https://colab.research.google.com/github/SaajanM/mat422-homework/blob/main/3.3%20Unconstrained%20Optimization/unconstrained_opt.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

In [None]:
# Install numpy, scipy, and matplotlib into the current Jupyter kernel
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install scipy
!{sys.executable} -m pip install matplotlib

In [None]:
# Import numerical, plotting, and interpolation packages (duplicate imports removed)
import math
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.pyplot as pyplt  # second alias kept in case later cells use it
from mpl_toolkits.mplot3d import Axes3D
from scipy import interpolate

$\newcommand\norm[1]{\left\lVert#1\right\rVert}$
$\newcommand\argmax{\text{arg}\,\text{max}}$
$\newcommand\argmin{\text{arg}\,\text{min}}$

## Section 3.3.1 Necessary and Sufficient Conditions for Local Minimizers

Unconstrained optimization deals with the following optimization problem:
$$
\min_{\mathbf{x}\in\mathbb{R}^d} f(\mathbf{x})
$$

where $f:\mathbb{R}^d\to\mathbb{R}$ is a real-valued function of $d$ real variables. In this subsection we discuss several notions of what counts as a "solution" and derive conditions that characterize them.

Ideally we wish to find a global minimizer to the optimization problem above:

**Definition:** $\mathbf{x}^\ast \in\mathbb{R}^d$ is a **global minimizer** of $f$ if:
$$
f(\mathbf{x})\geq f(\mathbf{x}^\ast), \forall{\mathbf{x}\in\mathbb{R}^d}
$$

Global minimizers are often difficult to find numerically unless $f$ has additional structure (convexity, for example), so we also define a weaker, local notion of minimizer.

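**Definition:** $\mathbf{x}^\ast$ is a **local minimizer** of $f$ if there is an open ball $B_\delta(\mathbf{x}^\ast)$ with $\delta>0$ such that $f(\mathbf{x})\geq f(\mathbf{x}^\ast)$ for all $\mathbf{x}\in B_\delta(\mathbf{x}^\ast)$. It is a **strict** local minimizer if the inequality is strict for every $\mathbf{x}\neq\mathbf{x}^\ast$ in the ball.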
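To see the difference between local and global minimizers, the sketch below scans a hypothetical one-dimensional function $f(x) = x^4 - 4x^2 + x$ (chosen only for illustration) on a fine grid. It has two "valleys", so it has two local minimizers, but only the deeper one is global. The grid scan is a rough numerical illustration, not a proof.

```python
import numpy as np

def f(x):
    # hypothetical example: a quartic with two valleys of different depth
    return x**4 - 4 * x**2 + x

# Evaluate f on a fine grid over [-3, 3]; f grows without bound outside this range
xs = np.linspace(-3.0, 3.0, 60001)
ys = f(xs)

# An interior grid point lower than both neighbours approximates a local minimizer
is_local = (ys[1:-1] < ys[:-2]) & (ys[1:-1] < ys[2:])
local_minimizers = xs[1:-1][is_local]

# The grid point with the smallest value approximates the global minimizer
global_minimizer = xs[np.argmin(ys)]

print("approximate local minimizers:", local_minimizers)
print("approximate global minimizer:", global_minimizer)
```

Both grid points near $x\approx -1.47$ and $x\approx 1.35$ are local minimizers, but only the one near $-1.47$ is global.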

We will characterize local minimizers in terms of the **gradient** and **Hessian** of the function.

But first, let us define a **descent direction**: $\mathbf{v}$ is a descent direction of $f$ at $\mathbf{x}_0$ if there is an $\alpha^\ast > 0$ such that $f(\mathbf{x}_0 + \alpha\mathbf{v}) < f(\mathbf{x}_0)$ for all $\alpha\in(0,\alpha^\ast)$.

In other words, every sufficiently small step in the direction $\mathbf{v}$ strictly decreases $f$.

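If $f$ is continuously differentiable, then $\mathbf{v}$ is a descent direction whenever the directional derivative in direction $\mathbf{v}$ is negative, i.e. $\nabla f(\mathbf{x}_0)^T\mathbf{v} < 0$.

In particular, at every point where the gradient is nonzero, $f$ has a descent direction. One such direction is the negative gradient, since $\nabla f(\mathbf{x}_0)^T\left(-\nabla f(\mathbf{x}_0)\right) = -\norm{\nabla f(\mathbf{x}_0)}^2 < 0$.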
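A small numerical check of the descent-direction idea, using a hypothetical example $f(\mathbf{x}) = x_1^2 + 3x_2^2$ at $\mathbf{x}_0 = (1, 1)$ with $\mathbf{v} = -\nabla f(\mathbf{x}_0) = (-2, -6)$. Expanding by hand gives $f(\mathbf{x}_0 + \alpha\mathbf{v}) = 4 - 40\alpha + 112\alpha^2$, so $f$ decreases exactly for $0 < \alpha < 5/14 \approx 0.357$. This shows why the definition only asks for decrease up to some $\alpha^\ast$.

```python
import numpy as np

def f(x):
    # hypothetical example function f(x) = x1^2 + 3 x2^2
    return x[0]**2 + 3 * x[1]**2

def grad_f(x):
    # its gradient, computed by hand
    return np.array([2 * x[0], 6 * x[1]])

x0 = np.array([1.0, 1.0])
v = -grad_f(x0)  # candidate descent direction: the negative gradient

# Directional derivative of f at x0 in direction v; equals -||grad f(x0)||^2 here
dir_deriv = grad_f(x0) @ v
print("directional derivative:", dir_deriv)

# Small steps along v decrease f, but a step that is too large overshoots
decreases = {alpha: f(x0 + alpha * v) < f(x0) for alpha in [0.001, 0.1, 0.3, 0.4]}
print(decreases)
```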

The following theorem extends the single-variable result that the derivative of a differentiable function is zero at an interior local minimizer.

**First Order Necessary Condition:** If $f$ is continuously differentiable and $\mathbf{x}_0$ is a local minimizer, then $\nabla f(\mathbf{x}_0) = \mathbf{0}$.
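The sketch below checks the first-order condition numerically with central finite differences. The function $f(\mathbf{x}) = (x_1-1)^2 + 2(x_2+2)^2 + 5$ and the helper `numerical_gradient` are hypothetical, chosen so the minimizer $(1, -2)$ is known in advance. Keep in mind that a zero gradient is only necessary: it does not by itself prove a point is a minimizer.

```python
import numpy as np

def f(x):
    # hypothetical example with a known minimizer at (1, -2)
    return (x[0] - 1)**2 + 2 * (x[1] + 2)**2 + 5

def numerical_gradient(func, x, h=1e-6):
    # hypothetical helper: central finite differences, one coordinate at a time
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x_star = np.array([1.0, -2.0])
print(numerical_gradient(f, x_star))                 # approximately [0, 0]
print(numerical_gradient(f, np.array([0.0, 0.0])))  # approximately [-2, 8]
```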

If $f$ is twice continuously differentiable, its Hessian gives finer information about local minimizers.

**Definition:** A symmetric $d\times d$ matrix $H$ is **positive semidefinite** (PSD) if $\mathbf{x}^TH\mathbf{x} \geq 0$ for all $\mathbf{x}\in\mathbb{R}^d$, and **positive definite** (PD) if $\mathbf{x}^TH\mathbf{x} > 0$ for all $\mathbf{x}\neq\mathbf{0}$.
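In practice, a symmetric matrix is PSD exactly when all of its eigenvalues are nonnegative (and PD when all are positive). A minimal sketch of that test follows; `is_psd` and the tolerance `tol` are hypothetical choices, with `tol` absorbing floating-point round-off.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    # hypothetical helper: symmetric H is PSD iff all eigenvalues are >= 0
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("H must be symmetric")
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3: PSD (indeed PD)
B = np.array([[1.0, 1.0], [1.0, 1.0]])    # eigenvalues 0 and 2: PSD but not PD
C = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3: not PSD

print(is_psd(A), is_psd(B), is_psd(C))    # True True False

# Spot-check the definition directly: x^T C x < 0 for x = (1, -1)
x = np.array([1.0, -1.0])
print(x @ C @ x)                          # -2.0
```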

**Second Order Necessary Condition:** If $f$ is twice continuously differentiable and $\mathbf{x}_0$ is a local minimizer, then the Hessian $\mathbf{H}_f(\mathbf{x}_0)$ is PSD.

**Second Order Sufficient Condition:** If $f$ is twice continuously differentiable, $\nabla f(\mathbf{x}_0) = \mathbf{0}$, and the Hessian $\mathbf{H}_f(\mathbf{x}_0)$ is positive definite, then $\mathbf{x}_0$ is a strict local minimizer.
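The second-order conditions can be combined into a simple classifier for stationary points. The sketch uses a hypothetical example $f(x, y) = x^3 - 3x + y^2$, whose stationary points are $(\pm 1, 0)$ and whose Hessian is $\operatorname{diag}(6x, 2)$. At $(1, 0)$ the Hessian is PD, so the sufficient condition applies; at $(-1, 0)$ it has a negative eigenvalue, so the necessary condition fails and the point cannot be a local minimizer. The function name `classify` and its messages are illustrative only.

```python
import numpy as np

def grad_f(p):
    # gradient of the hypothetical example f(x, y) = x^3 - 3x + y^2
    x, y = p
    return np.array([3 * x**2 - 3, 2 * y])

def hess_f(p):
    # Hessian of the same example
    x, y = p
    return np.array([[6 * x, 0.0], [0.0, 2.0]])

def classify(p, tol=1e-10):
    if not np.allclose(grad_f(p), 0):
        return "not stationary"
    eig = np.linalg.eigvalsh(hess_f(p))
    if np.all(eig > tol):
        return "strict local minimizer (second-order sufficient condition holds)"
    if np.any(eig < -tol):
        return "not a local minimizer (Hessian not PSD)"
    return "inconclusive (Hessian PSD but singular)"

for p in [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 0.0])]:
    print(p, classify(p))
```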

## Section 3.3.2 Convexity and Global Minimizers

Here we restrict attention to convex functions. Geometrically, a function is convex if the line segment between any two points on its graph lies on or above the graph. Convex functions have a major benefit: every local minimizer is a global minimizer.

**Definition:** A set $D\subseteq\mathbb{R}^d$ is a **convex set** if for all $\mathbf{x},\mathbf{y}\in D$ and all $\alpha\in[0,1]$, we have $(1-\alpha)\mathbf{x}+\alpha\mathbf{y}\in D$.

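We can extend this to **convex functions**: a function $f: D\to\mathbb{R}$ on a convex set $D$ is convex if evaluating $f$ at a linear interpolation of two arguments gives a value no larger than the same interpolation of the outputs:
$$
f((1-\alpha)\mathbf{x}+\alpha\mathbf{y}) \leq (1-\alpha)f(\mathbf{x}) + \alpha f(\mathbf{y}), \quad \forall\,\mathbf{x},\mathbf{y}\in D,\ \alpha\in[0,1]
$$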

For example, every affine function $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x} + b$ is convex: the defining inequality holds with equality.
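To make the convexity inequality concrete, the sketch below measures the "convexity gap" $(1-\alpha)f(\mathbf{x}) + \alpha f(\mathbf{y}) - f((1-\alpha)\mathbf{x}+\alpha\mathbf{y})$ at random points for three hypothetical test functions in $\mathbb{R}^3$. For a convex function the gap is never negative, for an affine function it is zero up to round-off, and for a non-convex function such as $\sum_i \sin z_i$ it is negative somewhere. Random sampling can only refute convexity, never prove it.

```python
import numpy as np

rng = np.random.default_rng(0)

def convexity_gap(f, x, y, alpha):
    # (1 - a) f(x) + a f(y) - f((1 - a) x + a y); nonnegative for a convex f
    return (1 - alpha) * f(x) + alpha * f(y) - f((1 - alpha) * x + alpha * y)

a_vec, b = np.array([2.0, -1.0, 0.5]), 3.0  # hypothetical affine coefficients
examples = {
    "squared norm": lambda z: float(z @ z),
    "affine": lambda z: float(a_vec @ z + b),
    "sum of sines": lambda z: float(np.sum(np.sin(z))),
}

results = {}
for name, f in examples.items():
    gaps = np.array([
        convexity_gap(f, 3 * rng.normal(size=3), 3 * rng.normal(size=3), rng.uniform())
        for _ in range(1000)
    ])
    results[name] = gaps
    print(f"{name:>12}: min gap = {gaps.min():.2e}, max |gap| = {np.abs(gaps).max():.2e}")
```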

We can prove that other functions are convex by examining their Hessian, but we start with a first-order condition, which assumes $f$ is continuously differentiable.

**First-Order Convexity Condition:** A continuously differentiable $f:\mathbb{R}^d\to\mathbb{R}$ is convex if and only if for all $\mathbf{x},\mathbf{y}\in\mathbb{R}^d$,
$$
    f(\mathbf{y})\geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y}-\mathbf{x})
$$
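Geometrically, the first-order condition says the tangent plane at any point never lies above the graph. The sketch spot-checks this for log-sum-exp, $f(\mathbf{x}) = \log\sum_i e^{x_i}$, a standard convex function whose gradient is the softmax vector. The helper names `lse` and `grad_lse` are hypothetical; passing the check on random pairs illustrates the condition but does not prove convexity.

```python
import numpy as np

def lse(x):
    # log-sum-exp, shifted by max(x) for numerical stability
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def grad_lse(x):
    # gradient of log-sum-exp is the softmax vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(1)
worst = np.inf
for _ in range(1000):
    x, y = 2 * rng.normal(size=4), 2 * rng.normal(size=4)
    # f(y) - [f(x) + grad f(x)^T (y - x)] should be >= 0 for a convex f
    worst = min(worst, lse(y) - (lse(x) + grad_lse(x) @ (y - x)))
print("smallest gap over 1000 random pairs:", worst)
```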

Building on this, we get the **second-order convexity condition**: a twice continuously differentiable $f:\mathbb{R}^d\to\mathbb{R}$ is convex if and only if its Hessian is PSD at every point.
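For log-sum-exp the Hessian has the closed form $\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T$ with $\mathbf{p} = \operatorname{softmax}(\mathbf{x})$. The sketch below (helper names hypothetical) evaluates it at random points and records the smallest eigenvalue. It always has a zero eigenvalue in the direction $(1,\dots,1)$, because $f(\mathbf{x} + t\mathbf{1}) = f(\mathbf{x}) + t$ is linear along that line, so the Hessian is PSD but not PD: the function is convex but not strictly convex.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hess_lse(x):
    # Hessian of log-sum-exp: diag(p) - p p^T with p = softmax(x)
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(2)
min_eig = min(np.linalg.eigvalsh(hess_lse(3 * rng.normal(size=5))).min() for _ in range(500))
print("smallest Hessian eigenvalue seen:", min_eig)  # zero up to round-off
```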

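Consequently, for a continuously differentiable convex $f$, a zero gradient is both necessary and sufficient for a point to be a global minimizer: if $\nabla f(\mathbf{x}^\ast) = \mathbf{0}$, the first-order convexity condition gives $f(\mathbf{y}) \geq f(\mathbf{x}^\ast)$ for all $\mathbf{y}\in\mathbb{R}^d$.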
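A convex quadratic makes this concrete. For $f(\mathbf{x}) = \tfrac12\mathbf{x}^TA\mathbf{x} - \mathbf{b}^T\mathbf{x}$ with $A$ symmetric positive definite, $\nabla f(\mathbf{x}) = A\mathbf{x} - \mathbf{b}$, so the global minimizer is the solution of $A\mathbf{x} = \mathbf{b}$. The sketch uses a randomly generated $A$ and $\mathbf{b}$ (purely illustrative) and confirms that no nearby random point does better.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)  # symmetric positive definite, so f below is convex
b = rng.normal(size=4)

def f(x):
    return 0.5 * x @ A @ x - b @ x

# For this convex f, grad f(x) = A x - b, so the unique stationary point solves A x = b
x_star = np.linalg.solve(A, b)
print("gradient at x*:", A @ x_star - b)

# No random point does better than x*, as the first-order condition predicts
trials = x_star + rng.normal(size=(1000, 4))
print(all(f(t) >= f(x_star) for t in trials))
```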

## Section 3.3.3 Gradient Descent

**Gradient descent** is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It is useful because, for most nonlinear functions, the stationary points (the solutions of $\nabla f(\mathbf{x}) = \mathbf{0}$) cannot be found in closed form.

Essentially, gradient descent repeatedly takes steps in the direction of the negative gradient. In **steepest descent**, the step size $\alpha_k$ at the $k$-th iteration is chosen by an exact line search:
$$
\alpha_k = \argmin_{\alpha > 0} f\left(\mathbf{x}^k - \alpha\nabla f(\mathbf{x}^k)\right), \qquad \mathbf{x}^{k+1} = \mathbf{x}^k - \alpha_k\nabla f(\mathbf{x}^k)
$$
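For a quadratic $f(\mathbf{x}) = \tfrac12\mathbf{x}^TA\mathbf{x} - \mathbf{b}^T\mathbf{x}$ with $A$ symmetric positive definite, the line search has a closed form: writing $\mathbf{g} = \nabla f(\mathbf{x}^k) = A\mathbf{x}^k - \mathbf{b}$, setting the derivative of $\alpha \mapsto f(\mathbf{x}^k - \alpha\mathbf{g})$ to zero gives $\alpha_k = \mathbf{g}^T\mathbf{g} / \mathbf{g}^TA\mathbf{g}$. The sketch below is a minimal steepest-descent loop under that quadratic assumption; the function name, tolerance, and example data are hypothetical. For a general $f$, $\alpha_k$ would instead come from a numerical one-dimensional minimization.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent with exact line search for f(x) = 0.5 x^T A x - b^T x.

    Assumes A is symmetric positive definite (so f is convex with a unique minimizer).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = A @ x - b                  # gradient of f at x^k
        if np.linalg.norm(g) < tol:    # stop once the gradient is (nearly) zero
            return x, k
        alpha = (g @ g) / (g @ A @ g)  # exact minimizer of alpha -> f(x^k - alpha g)
        x = x - alpha * g              # x^{k+1} = x^k - alpha_k grad f(x^k)
    return x, max_iter

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_min, iters = steepest_descent_quadratic(A, b, x0=[0.0, 0.0])
print(x_min, iters)
print(np.linalg.solve(A, b))  # closed-form minimizer (0.6, -0.8) for comparison
```

The iterates zigzag toward the minimizer, since consecutive exact-line-search gradients are orthogonal; the more ill-conditioned $A$ is, the slower this convergence.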