# MATH 405/607 

# Method of Steepest Descent 
# (Also known as Gradient Descent)

* Optimization problems & Root finding
* Gradient Descent Scheme
* Error Analysis
* Global Convergence

### Literature 

* [Boyd and Vandenberghe, Convex Optimization, Chapter 9](https://web.stanford.edu/~boyd/cvxbook/)

```julia
include("math405.jl")
```

## Optimization Problems

**Motivation:** Numerically find the minimum ("optimum") points of a function.

**Fundamental Principle:** At each iteration, move in the direction opposite to the gradient.

In other words, step in the direction in which the function decreases fastest (hence "steepest" descent).


Simple example: find the minimum of
$$f(x,y) = x^2 + y^2,$$
which is attained at $(x,y) = (0,0)$.


![Gradient descent animation](https://blog.paperspace.com/content/images/2018/05/68747470733a2f2f707669676965722e6769746875622e696f2f6d656469612f696d672f70617274312f6772616469656e745f64657363656e742e676966.gif)

### Formulation
Let $f : \mathbb{R}^M \to \mathbb{R}$ be a function whose minimum we want to find.

We start from some initial guess
$$U_0 = \begin{pmatrix} 
    x_{1, 0} \\ x_{2, 0} \\ \vdots \\ x_{M, 0}
    \end{pmatrix}.$$

The gradient of $f$ at the current iterate $U_n$ is
$$\nabla f(U_n)= 
    \begin{pmatrix} 
    \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_M}
    \end{pmatrix} \, \Bigg|_{\,U_n}$$

Then, we define the Gradient Descent Scheme:

$$ U_{n+1} = U_n + h\big( -\nabla f(U_n)\big)$$

where $h > 0$ is the step size (also called the learning rate).
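To see the role of $h$, here is a short worked sketch (not part of the original notes) applying the scheme to the simple example $f(x,y) = x^2 + y^2$. There $\nabla f(U) = 2U$, so

$$U_{n+1} = U_n - 2h\,U_n = (1-2h)\,U_n \quad\Longrightarrow\quad U_n = (1-2h)^n\,U_0.$$

Hence the iterates converge to the minimizer $(0,0)$ from every starting point if and only if $|1-2h| < 1$, i.e. $0 < h < 1$. The choice $h = 1/2$ reaches the minimizer in a single step, while for $h \geq 1$ the iterates fail to converge (for $h > 1$ they grow without bound).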

The iteration stops once $\lVert \nabla f(U_n) \rVert \leq \epsilon$ for some small tolerance $\epsilon > 0$.
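A minimal Julia sketch of the scheme and its exit condition, assuming the gradient is supplied as a function. The names `gradient_descent`, `gradf`, `tol` and `maxiter` are hypothetical (they are not taken from `math405.jl`), and the iteration cap `maxiter` is an added safeguard that the notes do not mention.

```julia
using LinearAlgebra   # provides norm

# Fixed-step gradient descent: U_{n+1} = U_n + h * (-∇f(U_n)).
# gradf   : function returning the gradient ∇f at a point (a vector)
# U0      : initial guess
# h       : step size
# tol     : tolerance ϵ in the exit test ‖∇f(U_n)‖ ≤ ϵ
# maxiter : safety cap on the number of steps (hypothetical addition)
function gradient_descent(gradf, U0; h = 0.1, tol = 1e-8, maxiter = 10_000)
    U = float.(U0)                # work on a floating-point copy of U0
    for n in 0:maxiter
        g = gradf(U)
        if norm(g) <= tol         # exit condition ‖∇f(U_n)‖ ≤ ϵ
            return U, n           # n = number of steps taken
        end
        U = U .- h .* g           # one gradient descent step
    end
    error("gradient_descent: no convergence within $maxiter steps")
end

# Example: f(x, y) = x^2 + y^2, whose gradient is (2x, 2y).
gradf(U) = 2 .* U
Ustar, nsteps = gradient_descent(gradf, [1.0, 2.0]; h = 0.1)
println("approximate minimizer: ", Ustar, " after ", nsteps, " steps")
```

With $h = 0.1$ each step multiplies the iterate by $1 - 2h = 0.8$, so convergence here is steady but not fast; raising `h` towards $1/2$ speeds it up, while `h` $\geq 1$ makes the function throw its no-convergence error.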


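### Advantages of the Gradient Descent Scheme

* Steady (linear) convergence for well-behaved functions, e.g. smooth, strongly convex $f$ with a suitably small step size $h$
* Robust for less "nice" functions, whose higher-order derivatives do not exist or have unwanted zeros (a problem for Newton's method, which divides by $f''$)
* Moderate computational cost (requires only first-order derivatives)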


### Comparison to Newton's Method in Optimization (1-D Case)
Newton's method for optimization is defined as
$$U_{n+1} = U_n - \frac{f'(U_n)}{f''(U_n)}$$
(i.e. Newton's root-finding method applied to $f'$: it finds a root of the derivative rather than of $f$ itself).

By contrast, the 1-D gradient descent method is
$$ U_{n+1} = U_n - hf'(U_n)$$
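Comparing the two updates, Newton's method is gradient descent with the variable step $h_n = 1/f''(U_n)$, i.e. it uses curvature information. A minimal Julia sketch comparing the two, assuming the hypothetical test function $f(x) = x^2 + e^x$ (chosen only for illustration; $f''(x) = 2 + e^x > 0$, so it has a unique minimizer) and a fixed step $h = 0.1$. The names `fp`, `fpp`, `newton_opt` and `gd_1d` are made up for this example.

```julia
# Hypothetical test function for illustration: f(x) = x^2 + exp(x).
fp(x)  = 2x + exp(x)    # f'(x)
fpp(x) = 2 + exp(x)     # f''(x) > 0, so f is strictly convex

# Newton's method for optimization: Newton root-finding applied to f'.
function newton_opt(x; tol = 1e-10, maxiter = 100)
    for n in 1:maxiter
        x -= fp(x) / fpp(x)          # U_{n+1} = U_n - f'(U_n)/f''(U_n)
        if abs(fp(x)) <= tol
            return x, n
        end
    end
    error("newton_opt: no convergence within $maxiter steps")
end

# 1-D gradient descent with a fixed step size h.
function gd_1d(x; h = 0.1, tol = 1e-10, maxiter = 10_000)
    for n in 1:maxiter
        x -= h * fp(x)               # U_{n+1} = U_n - h f'(U_n)
        if abs(fp(x)) <= tol
            return x, n
        end
    end
    error("gd_1d: no convergence within $maxiter steps")
end

xN, nN = newton_opt(0.0)
xG, nG = gd_1d(0.0)
println("Newton:           x = ", xN, " after ", nN, " iterations")
println("Gradient descent: x = ", xG, " after ", nG, " iterations")
```

Both should reach the same minimizer, with Newton needing only a handful of iterations (quadratic convergence near the minimizer) and gradient descent many more (linear convergence), at the price of evaluating $f''$ at every Newton step.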