# Second Order Methods

Second-order methods use second-derivative (curvature) information in addition to the gradient, so they provide both a direction and a step size, whereas first-order methods provide only a direction.

## Newton's Method

We use a quadratic approximation of the function about a design point (the second-order Taylor series expansion). Minimizing that approximation uses the derivative to find the direction and the second derivative to obtain an approximate step size.


### Univariate Case

**Second-order Taylor series expansion**

$q(x) = f(x^k) + (x - x^k) * f'(x^k) + (x - x^k)^2/2 * f''(x^k)$ where $x^k$ is our design point


**Set derivative to zero and solve for root == Newton's method**

- $d/dx(q(x)) = f'(x^k) + (x - x^k) * f''(x^k) = 0$
- $x^{(k+1)} = x^k - f'(x^k) / f''(x^k)$


**Note:** Instability can occur when the second derivative is zero or very close to it, because the step $f'(x^k) / f''(x^k)$ then becomes very large (this may lead to oscillations).
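The univariate update can be sketched in a few lines of Julia. This is a minimal illustration, not a reference implementation: the function name `newton_univariate`, the tolerances, the near-zero curvature guard, and the test function $f(x) = x^4 - 3x^2 + 2$ are all assumptions chosen for the example.

```julia
# Univariate Newton's method: iterate x ← x - f'(x) / f''(x).
# `df` and `d2f` are the first and second derivatives, supplied by the caller.
function newton_univariate(df, d2f, x; ϵ=1e-8, k_max=100)
    for k in 1:k_max
        curvature = d2f(x)
        # Guard against the instability noted above: a near-zero second
        # derivative would make the step arbitrarily large.
        abs(curvature) < 1e-12 && error("second derivative is nearly zero at x = $x")
        Δ = -df(x) / curvature
        x += Δ
        abs(Δ) < ϵ && break   # stop once the step is negligible
    end
    return x
end

# Hypothetical test function f(x) = x^4 - 3x^2 + 2,
# which has a local minimum at x = sqrt(3/2).
fprime(x)  = 4x^3 - 6x
fsecond(x) = 12x^2 - 6

x_star = newton_univariate(fprime, fsecond, 2.0)
println("x* ≈ ", x_star)
```

Starting from `2.0` the iterates approach $\sqrt{3/2}$. Starting near $x = \pm\sqrt{1/2}$, where $f''$ vanishes, would trigger the guard instead.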

### Multivariate Case

**Second-order Taylor series expansion**

$f(x) \approx q(x) = f(x^k) + (g^k)^T * (x - x^k) + 1/2 * (x - x^k)^T * H^k * (x - x^k)$, where $g^k$ and $H^k$ are the gradient and Hessian at $x^k$

**Set gradient to zero and solve == Newton's Method**

- $∇q(x) = g^k + H^k * (x - x^k) = 0$
- $x^{(k+1)} = x^k - (H^k)^{-1} * g^k$

In [11]:
using LinearAlgebra

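# Newton's method for minimizing f, given functions for its gradient ∇f and Hessian H.
# Stops when the step norm falls below ϵ or after k_max iterations.
function newtons_method(∇f, H, x, ϵ, k_max)
    k, Δ = 1, fill(Inf, length(x))
    while norm(Δ) > ϵ && k ≤ k_max
        Δ = -H(x) \ ∇f(x)  # Newton step: solve a linear system instead of inverting H
        x += Δ
        k += 1
    end
    return x
end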

newtons_method (generic function with 2 methods)

In [12]:
# Example: Booth function
booth(x) = (x[1] + 2x[2] - 7)^2 + (2x[1] + x[2] - 5)^2

# Gradient of booth function
∇booth(x) = [10x[1] + 8x[2] - 34, 8x[1] + 10x[2] - 38]

# Hessian of booth function
Hbooth(x) = [10 8; 8 10]

# Initial guess
x0 = [9., 8.]
x = newtons_method(∇booth, Hbooth, x0, 1e-6, 1000)

println("x = ", x)

x = [0.9999999999999982, 3.0000000000000018]
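
Because the Booth function is quadratic, its second-order expansion is exact, so a single Newton step from any starting point reaches the minimizer. As a hand check of the result above, starting from $x^1 = [9, 8]^T$:

- $g^1 = [10 * 9 + 8 * 8 - 34, 8 * 9 + 10 * 8 - 38]^T = [120, 114]^T$
- $(H^1)^{-1} = 1/36 * \begin{bmatrix} 10 & -8 \\ -8 & 10 \end{bmatrix}$, since $\det H^1 = 10 * 10 - 8 * 8 = 36$
- $(H^1)^{-1} * g^1 = 1/36 * [10 * 120 - 8 * 114, -8 * 120 + 10 * 114]^T = 1/36 * [288, 180]^T = [8, 5]^T$
- $x^2 = x^1 - (H^1)^{-1} * g^1 = [9, 8]^T - [8, 5]^T = [1, 3]^T$

This agrees with the printed result up to floating-point rounding.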


## Secant Method

When the second derivative is unknown, approximate it using the difference between the two most recent first derivatives.

**Univariate case**

- $f''(x^k) \approx (f'(x^k) - f'(x^{(k-1)})) / (x^k - x^{(k-1)})$
- $x^{(k+1)} = x^k - \frac{x^k - x^{(k-1)}}{f'(x^k) - f'(x^{(k-1)})} * f'(x^k)$

Because the approximation uses derivatives at two points, the method needs two initial points (or two prior iterates) before the first second-order step can be taken.
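A minimal Julia sketch of the univariate secant update follows. The function name `secant_method`, the stopping rule, the two starting points, and the test derivative (from the hypothetical function $f(x) = x^4 - 3x^2 + 2$) are assumptions made for illustration.

```julia
# Secant method for minimization: Newton's method with f'' replaced by a
# finite-difference estimate built from the last two derivative values.
# `df` is the first derivative; x0 and x1 are the two required starting points.
function secant_method(df, x0, x1; ϵ=1e-8, k_max=100)
    g0, g1 = df(x0), df(x1)
    for k in 1:k_max
        denom = g1 - g0
        # Equal derivative values give a zero curvature estimate;
        # stop rather than divide by zero.
        denom == 0 && break
        x_next = x1 - (x1 - x0) / denom * g1
        x0, g0 = x1, g1
        x1, g1 = x_next, df(x_next)
        abs(x1 - x0) < ϵ && break   # stop once the step is negligible
    end
    return x1
end

# Hypothetical example: f'(x) for f(x) = x^4 - 3x^2 + 2 (local minimum at sqrt(3/2)).
fprime(x) = 4x^3 - 6x

x_star = secant_method(fprime, 2.0, 1.8)
println("x* ≈ ", x_star)
```

Only first derivatives are evaluated, one new evaluation per iteration, at the cost of somewhat slower convergence than Newton's method.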

## Quasi-Newton Methods

$x^{(k+1)} = x^k - \alpha^k * Q^k * g^k$

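where $\alpha^k$ is the step factor and $Q^k$ approximates the inverse of the Hessian at the design point $x^k$. The updates use the change in gradient and the change in design point between iterations:

- $\gamma^{(k+1)} = g^{(k+1)} - g^k$
- $\delta^{(k+1)} = x^{(k+1)} - x^k$

$Q^k$ is initialized to the identity matrix and is then updated at each iteration.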

### Davidon-Fletcher-Powell (DFP)

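$Q \leftarrow Q - (Q * \gamma * \gamma^T * Q) / (\gamma^T * Q * \gamma) + (\delta * \delta^T) / (\delta^T * \gamma)$

Here the iteration superscripts on $Q$, $\gamma$, and $\delta$ are omitted for brevity.

- $Q$ remains symmetric positive definite (provided $\delta^T * \gamma > 0$)
- If $f(x) = 1/2 * x^T * A * x + b^T * x + c$ with $A$ symmetric positive definite, then $Q$ converges to $A^{-1}$ (with exact line searches, after at most $n$ updates)
- For high-dimensional problems, storing and updating the dense $n \times n$ matrix $Q$ can be expensive in both memory and compute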
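
The DFP update and the quadratic property above can be checked with a short Julia sketch. The names `dfp_update` and `dfp_quadratic_demo` are hypothetical, and the exact line search formula is valid only for a quadratic objective. `A` and `b` are taken from the Booth function's gradient, $∇f(x) = A * x + b$.

```julia
using LinearAlgebra

# One DFP update of the inverse-Hessian approximation Q, given
# δ = x_new - x_old and γ = g_new - g_old.
function dfp_update(Q, δ, γ)
    Qγ = Q * γ
    return Q - (Qγ * (γ' * Q)) / dot(γ, Qγ) + (δ * δ') / dot(δ, γ)
end

# Run `steps` quasi-Newton iterations on f(x) = 1/2 xᵀAx + bᵀx + c,
# using the exact line search that is available in closed form for a quadratic.
function dfp_quadratic_demo(A, b, x; steps=length(x))
    Q = Matrix{Float64}(I, length(x), length(x))   # Q starts as the identity
    g = A * x + b
    for k in 1:steps
        d = -Q * g                                 # search direction
        α = -dot(g, d) / dot(d, A * d)             # exact minimizer along d
        x_new = x + α * d
        g_new = A * x_new + b
        Q = dfp_update(Q, x_new - x, g_new - g)
        x, g = x_new, g_new
    end
    return x, Q
end

A = [10.0 8.0; 8.0 10.0]
b = [-34.0, -38.0]
x_final, Q_final = dfp_quadratic_demo(A, b, [9.0, 8.0])

println("x = ", x_final)
println("Q ≈ inv(A)? ", isapprox(Q_final, inv(A); atol=1e-8))
```

After $n = 2$ updates on this two-dimensional quadratic, `Q_final` should match `inv(A)` up to rounding and `x_final` should sit at the minimizer.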

### Broyden-Fletcher-Goldfarb-Shanno (BFGS)

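BFGS is an alternative to DFP that works with an approximate line search, but it still stores a dense $n \times n$ matrix. Limited-memory BFGS (L-BFGS) stores only the last $m$ values of $\delta$ and $\gamma$ instead of the full inverse Hessian approximation.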

$Q \leftarrow Q - (\delta * \gamma^T * Q + Q * \gamma * \delta^T) / (\delta^T * \gamma) + (1 + (\gamma^T * Q * \gamma) / (\delta^T * \gamma)) * (\delta * \delta^T) / (\delta^T * \gamma)$
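
A minimal Julia sketch of the BFGS update, wrapped in a simple quasi-Newton loop, is shown below. The function names `bfgs_update` and `bfgs_minimize`, the backtracking (Armijo) line search and its constants, and the curvature safeguard are illustrative assumptions rather than part of the method's definition. The Booth function and its gradient are the ones used earlier.

```julia
using LinearAlgebra

# One BFGS update of the inverse-Hessian approximation Q, given
# δ = x_new - x_old and γ = g_new - g_old.
function bfgs_update(Q, δ, γ)
    δγ = dot(δ, γ)
    Qγ = Q * γ
    return Q - (δ * (γ' * Q) + Qγ * δ') / δγ +
           (1 + dot(γ, Qγ) / δγ) * (δ * δ') / δγ
end

# Quasi-Newton minimization: x ← x - α Q g, with Q updated by BFGS.
function bfgs_minimize(f, ∇f, x; ϵ=1e-8, k_max=100)
    Q = Matrix{Float64}(I, length(x), length(x))   # Q starts as the identity
    g = ∇f(x)
    for k in 1:k_max
        norm(g) < ϵ && break
        d = -Q * g
        # Approximate line search: halve α until a sufficient decrease is obtained.
        α = 1.0
        while f(x + α * d) > f(x) + 1e-4 * α * dot(g, d) && α > 1e-12
            α /= 2
        end
        x_new = x + α * d
        g_new = ∇f(x_new)
        δ, γ = x_new - x, g_new - g
        # Update only when δᵀγ > 0, which keeps Q positive definite.
        if dot(δ, γ) > 1e-12
            Q = bfgs_update(Q, δ, γ)
        end
        x, g = x_new, g_new
    end
    return x
end

booth(x) = (x[1] + 2x[2] - 7)^2 + (2x[1] + x[2] - 5)^2
∇booth(x) = [10x[1] + 8x[2] - 34, 8x[1] + 10x[2] - 38]

x_min = bfgs_minimize(booth, ∇booth, [9.0, 8.0])
println("x ≈ ", x_min)
```

Unlike `newtons_method`, this loop never evaluates the Hessian. It builds curvature information from successive gradients only.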
