# 0. `BFGS` Description
<font color="steelblue" size="4">

Website
-------
1. `Code`: https://github.com/trsav/bfgs/blob/master/BFGS.py

Newton method
-------------
1. The `classic Newton method` approximates the function to be optimised, $f(x)$, by a quadratic model built from its second-order `Taylor series expansion`:
$\begin{equation}
f(x+p) \approx f(x) + \nabla f(x)^Tp + \frac{1}{2}p^T\nabla^2f(x)p
\end{equation}$
2. `Minimising this model` with respect to $p$ at the current iterate $x_k$ gives `the optimal search direction`:
$\begin{equation}
p_k = -\nabla^2f(x_{k})^{-1} \nabla f(x_k)
\end{equation}$
3. The `step length` $\alpha$ is then computed by a `backtracking line search` that enforces the `Wolfe conditions`, which ensure sufficient decrease.
4. The `inverse of the Hessian matrix` $\nabla^2 f(x_k)^{-1}$ is `computationally expensive` to obtain, because of the limitations of finite-difference estimates of the Hessian and the cost of inverting a large matrix.
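
As a small illustration of the Newton direction, the sketch below applies one full Newton step ($\alpha = 1$) to a quadratic function, for which the quadratic model is exact and a single step lands on the minimiser. The matrix `A`, the vector `b`, and the starting point are made-up values for illustration, and the linear system is solved directly rather than forming the inverse explicitly.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])


def grad(x):
    """Gradient of the quadratic: A x - b."""
    return A @ x - b


def hess(x):
    """Hessian of the quadratic: constant and equal to A."""
    return A


x_k = np.array([5.0, -3.0])                      # arbitrary starting point
p_k = -np.linalg.solve(hess(x_k), grad(x_k))     # Newton direction without an explicit inverse
x_next = x_k + p_k                               # full step, alpha = 1

print(x_next)
```

For a non-quadratic function such as the Rosenbrock function, the same direction would normally be combined with a line search, since a full step is not guaranteed to decrease $f$.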

BFGS
----
1. For this reason, an approximation $H$ to the `inverse of the Hessian` is used. This approximation is updated at each iteration based on:
    - `the change in` $x$: $s = x_{k+1} - x_k$
    - `the change in` $\nabla f$: $y = \nabla f(x_{k+1}) - \nabla f(x_k)$
$\begin{equation}
H_{k+1} = \left(I-\frac{sy^T}{y^Ts}\right)H_k\left(I-\frac{ys^T}{y^Ts}\right) + \frac{ss^T}{y^Ts}
\end{equation}$
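
The update formula can be sketched in a few lines of NumPy. The function name `bfgs_inverse_update` and the sample vectors are assumptions made for illustration; the check at the end uses the fact that the updated matrix maps the gradient change $y$ back onto the step $s$ (the secant condition).

```python
import numpy as np


def bfgs_inverse_update(H, s, y):
    """Return H_{k+1} from H_k, the step s and the gradient change y.

    Assumes y @ s > 0, which keeps the approximation positive definite.
    """
    rho = 1.0 / (y @ s)
    identity = np.eye(len(s))
    left = identity - rho * np.outer(s, y)
    right = identity - rho * np.outer(y, s)
    return left @ H @ right + rho * np.outer(s, s)


H0 = np.eye(2)                  # identity as the initial approximation
s = np.array([0.1, -0.2])       # illustrative change in x
y = np.array([0.3, -0.1])       # illustrative change in the gradient
H1 = bfgs_inverse_update(H0, s, y)

print(H1 @ y)                   # should reproduce s up to rounding
```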

</font>

# 1. Example 
<font color="steelblue" size="4">

1. When the `BFGS algorithm` is tested on the `Rosenbrock function` in 2 dimensions, an optimal solution is found in `34 iterations`.
2. The code initialises the `inverse-Hessian approximation` $H_0$ to the `identity matrix`.
    - If the problem is two-dimensional, the code can produce a trajectory plot of the optimisation scheme.
    - The `central difference method` is used to calculate gradients.
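
The pieces listed above can be combined into a compact loop. The following is only a sketch under stated assumptions, not the linked implementation: the names `central_gradient` and `bfgs_sketch` are hypothetical, the starting point $(-1.2, 1)$ is a common but arbitrary choice, and the line search checks only the sufficient-decrease (Armijo) condition rather than the full Wolfe conditions, so the iteration count need not match the figure quoted above.

```python
import numpy as np


def rosenbrock(x):
    """Rosenbrock function for a 1-D array x."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)


def central_gradient(f, x, h=np.cbrt(np.finfo(float).eps)):
    """Central-difference gradient estimate of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad


def bfgs_sketch(f, x0, tol=1e-6, max_iter=500):
    """Minimise f from x0; returns the final point and the iterations used."""
    x = np.asarray(x0, dtype=float)
    identity = np.eye(x.size)
    H = identity.copy()                      # H_0 = identity
    g = central_gradient(f, x)
    n_iter = 0
    while n_iter < max_iter and np.linalg.norm(g) >= tol:
        p = -H @ g                           # quasi-Newton search direction
        fx, slope = f(x), g @ p
        alpha = 1.0
        for _ in range(60):                  # backtracking, Armijo condition only
            if f(x + alpha * p) <= fx + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        s = alpha * p
        g_new = central_gradient(f, x + s)
        y = g_new - g
        ys = y @ s
        # Skip the update when the curvature y^T s is not safely positive.
        if ys > 1e-10 * np.linalg.norm(y) * np.linalg.norm(s):
            rho = 1.0 / ys
            H = ((identity - rho * np.outer(s, y)) @ H
                 @ (identity - rho * np.outer(y, s)) + rho * np.outer(s, s))
        x, g = x + s, g_new
        n_iter += 1
    return x, n_iter


x_opt, n_iter = bfgs_sketch(rosenbrock, [-1.2, 1.0])
print(x_opt, n_iter)
```

Skipping the update when $y^Ts$ is not positive is one simple safeguard; a line search that also enforces the curvature condition makes that case impossible.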

</font>

In [22]:
import numpy as np
import matplotlib.pyplot as plt

## 1.1. Rosenbrock function
<font color="steelblue" size="4">

`Website`
---------
1. https://www.sfu.ca/~ssurjano/rosen.html

`Rosenbrock function formula`:
-----------------------------
$\begin{equation}
f(\vec{x}) = \sum_{i=1}^{d-1}\left[ 100(x_{i+1}-x_i^2)^2 + (x_i-1)^2 \right]
\end{equation}$
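
The sum can also be evaluated directly with array slicing, which gives a convenient reference value when checking other implementations. The name `rosenbrock_direct` is an assumption used only in this sketch.

```python
import numpy as np


def rosenbrock_direct(x):
    """Evaluate the Rosenbrock sum term by term using shifted slices of x."""
    x = np.asarray(x, dtype=float)
    head, tail = x[:-1], x[1:]          # x_i and x_{i+1} for i = 1 .. d-1
    return float(np.sum(100.0 * (tail - head ** 2) ** 2 + (head - 1.0) ** 2))


print(rosenbrock_direct([1.0, 1.0, 1.0]))   # global minimum, value 0.0
print(rosenbrock_direct([0.0, 0.0]))        # value 1.0
```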

`Vectorize`:
------------
$\begin{aligned}
dia &=
\begin{bmatrix}
x_1 & 0 & \cdots & 0 \\
0 & x_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & x_N \\
\end{bmatrix}_{N \times N}
\newline
offdia &=
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0 \\
\end{bmatrix}_{N \times N}
\newline
-dia+offdia &= 
\begin{bmatrix}
-x_1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -x_2 & 1 & \cdots & 0 & 0  \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -x_{N-1} & 1 \\
0 & 0 & 0 & \cdots & 0 & -x_N\\
\end{bmatrix}_{N \times N}
\newline
(-dia+offdia)\vec{x} &= 
\begin{bmatrix}
-x_1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -x_2 & 1 & \cdots & 0 & 0  \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -x_{N-1} & 1 \\
0 & 0 & 0 & \cdots & 0 & -x_N\\
\end{bmatrix}_{N \times N}
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_{N-1} \\
x_N 
\end{bmatrix}=
\begin{bmatrix}
x_2 - x_1^2 \\
x_3 - x_2^2 \\
\vdots \\
x_N - x_{N-1}^2 \\
-x_N^2
\end{bmatrix}
\end{aligned}$
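
The identity above is easy to check numerically. The short sketch below builds both matrices for an arbitrary three-element vector and compares the matrix-vector product with the expected entries; note that the product must be a matrix-vector product (`@`), and that the last entry, $-x_N^2$, does not belong to the Rosenbrock sum and has to be dropped.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])                  # arbitrary example vector

dia = np.diag(x)                               # x on the main diagonal
offdia = np.diag(np.ones(x.size - 1), 1)       # ones on the first superdiagonal

residual = (-dia + offdia) @ x                 # entries 1, -1, -9 for this x
expected = np.append(x[1:] - x[:-1] ** 2, -x[-1] ** 2)

print(residual)
print(np.allclose(residual, expected))
```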

</font>

In [30]:
def rosenbrock_fun(x_array: np.ndarray):
    '''
    Description
    -----------
        1. Evaluate the Rosenbrock function.

    Parameters
    ----------
        1. x_array: np.ndarray
            One-dimensional array of coordinates.
    '''
    num_dims = x_array.shape[0]
    dia = np.diag(x_array)
    offdia = np.ones(num_dims - 1)
    offdia = np.diag(offdia, 1)
    """
    Input
    -----
        x_array: np.array([98, 99, 100])
    Output
    ------
        operator (-dia + offdia):
            [[ -98.    1.    0.]
            [   0.  -99.    1.]
            [   0.    0. -100.]]
    """
    # Apply the operator to x so that entry i holds x_{i+1} - x_i^2.
    # Squaring the operator element-wise would not give these residuals.
    residual = (-dia + offdia) @ x_array
    first_term = 100 * np.power(residual, 2)
    second_term = np.power(x_array - 1, 2)
    # Drop the last entry (i = N), which is not part of the sum.
    result = np.sum((first_term + second_term)[:-1])
    return result

In [29]:
x_array = np.array([98, 99, 100])

print(rosenbrock_fun(x_array=x_array))

18445461613.0


## 1.2. Central finite difference calculation
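
For reference, the central difference estimate of the $i$-th gradient component, with $e_i$ denoting the $i$-th unit vector (notation introduced here), is:
$\begin{equation}
\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}
\end{equation}$
Its truncation error shrinks as $h^2$ while the rounding error grows as $\epsilon / h$, where $\epsilon$ is the machine epsilon. Balancing the two gives a step size proportional to $\epsilon^{1/3}$, which is consistent with the choice `h = np.cbrt(np.finfo(float).eps)` in the code.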

In [None]:
def gradient(func, x_array):
    '''
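    Description
    -----------
        1. Approximate the gradient of func by central finite differences.

    Parameters
    ----------
        1. func:
            In this example, the Rosenbrock function.
        2. x_array: np.array
            One-dimensional array.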
    '''
    h = np.cbrt( np.finfo(float).eps )
    num_dims = x_array.shape[0]
    nabla = np.zeros(num_dims)
    