# Numerical Optimization 2024
## Project 1, Phase 2 (group)
**Team members:** Aral Cimcim, Lena Libon, Marianna Giannimara, Julia Stepanova, Tomáš Zima 

**Exercises solved:** 1, 2, 3, 4 (including the bonus task), 5, 6 (all exercises)

**Notes:**

- "Function 1" refers to the Rosenbrock function and "function 2" to $f(x)$.
- If a procedure exceeds $10^{4}$ iterations without reaching the stopping criterion, we stop it.
- BFGS method: computing $\rho_k$ sometimes raises an error, because the denominator is so small that it is treated as 0 in machine precision, which causes a division by zero. In this case, we set $\rho_k$ to a default value of $10^{10}$.
- SR1 method: if the denominator in the computation of $H_{k + 1}$ is very close to 0, we stop and return the current result.
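
The two quasi-Newton safeguards from the notes can be sketched as follows. This is a minimal illustration, not the code in `qn.py`: the names `bfgs_update` and `sr1_update` and the thresholds `eps` are assumptions introduced here. We write $s_k = x_{k+1} - x_k$, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ and $\rho_k = 1 / (y_k^T s_k)$, and both functions update an approximation $H_k$ of the inverse Hessian.

```python
import numpy as np


def bfgs_update(H, s, y, rho_default=1e10, eps=1e-300):
    """Inverse-Hessian BFGS update with a fallback value for rho_k.

    `eps` is an assumed threshold below which y^T s is treated as zero.
    """
    denom = y @ s
    # Use the default value instead of dividing by (almost) zero.
    rho = rho_default if abs(denom) < eps else 1.0 / denom
    identity = np.eye(len(s))
    V = identity - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)


def sr1_update(H, s, y, eps=1e-8):
    """Inverse-Hessian SR1 update; returns None if the denominator is tiny.

    The relative test with `eps` is an assumption of this sketch.
    """
    r = s - H @ y
    denom = r @ y
    if abs(denom) <= eps * np.linalg.norm(r) * np.linalg.norm(y):
        return None  # the caller stops and returns the current iterate
    return H + np.outer(r, r) / denom
```

Both updates satisfy the secant equation $H_{k+1} y_k = s_k$ whenever the regular branch is taken, which is a quick way to check an implementation.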

In [1]:
import numpy as np
import pandas as pd
from utils import rosenbrock_function, f
from utils import closest_index
from derivatives_approximation import approximate_gradient, approximate_hessian

In [2]:
starting_points_1 = [np.array([1.2, 1.2], dtype=np.longdouble),
                     np.array([-1.2, 1], dtype=np.longdouble),
                     np.array([0.2, 0.8], dtype=np.longdouble)]

starting_points_2 = [np.array([-0.2, 1.2], dtype=np.longdouble),
                     np.array([3.8, 0.1], dtype=np.longdouble),
                     np.array([1.9, 0.6], dtype=np.longdouble)]

In [3]:
function_1, gradient_1, hessian_1, minima_1 = rosenbrock_function()
function_2, gradient_2, hessian_2, minima_2 = f()

## 1. Newton method
### Standard Newton Method

In [4]:
from newton_method import newton_method

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

# Function 1
for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]

    # Exact derivatives
    x_tilde_exact, n_iters_exact = newton_method(starting_point, function_1, gradient_1, hessian_1)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = newton_method(starting_point, function_1, approximate_gradient(function_1),
                                                   approximate_hessian(function_1))
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

# Function 2
for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]

    # Exact derivatives
    x_tilde_exact, n_iters_exact = newton_method(starting_point, function_2, gradient_2, hessian_2)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = newton_method(starting_point, function_2, approximate_gradient(function_2),
                                                   approximate_hessian(function_2))
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,7,"[1.0000000000599822, 1.0000000001197604]","[1, 1]",2.056256e-10,1.339419e-10
1,1,"[1.2, 1.2]",Approx,7,"[1.0000000009508754, 1.0000000018992754]","[1, 1]",2.93392e-09,2.124008e-09
2,1,"[-1.2, 1.0]",Exact,19,"[0.9999998517851633, 0.9999997031166005]","[1, 1]",1.464394e-07,3.318243e-07
3,1,"[-1.2, 1.0]",Approx,19,"[0.9999998986320103, 0.9999997970308782]","[1, 1]",1.189923e-07,2.268743e-07
4,1,"[0.2, 0.8]",Exact,11,"[0.9999997847485415, 0.999999569184952]","[1, 1]",3.119441e-07,4.815961e-07
5,1,"[0.2, 0.8]",Approx,11,"[0.9999996482220533, 0.9999992958066573]","[1, 1]",4.663005e-07,7.871696e-07
6,2,"[-0.2, 1.2]",Exact,10000,"[-0.1571651902032335, 0.7241271495570358]","[0, 1]",25.51276,0.3175008
7,2,"[-0.2, 1.2]",Approx,10000,"[-0.15714388004857385, 0.7241981838337922]","[0, 1]",25.51406,0.3174285
8,2,"[3.8, 0.1]",Exact,10000,"[1.4508600693076394, 0.06064898214526147]","[0, 1]",33.68963,1.728403
9,2,"[3.8, 0.1]",Approx,10000,"[1.4505038790569782, 0.06066314145867495]","[0, 1]",33.67917,1.728096


The results obtained with the exact and with the approximated gradient and Hessian are very close to each other, because the approximations are accurate enough that they barely affect the iterates. For function 2, the method did not converge within $10^{4}$ iterations. One possible reason can be seen in a plot of the function: in the area around the two minima, the function is very flat. The backtracking line search then leads to very small step sizes, so more than 10,000 steps would be needed to approach the minima.
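
To illustrate how close an approximated gradient can be to the exact one, here is a small sketch using central differences on the standard two-dimensional Rosenbrock function. The implementation in `derivatives_approximation` is not shown in this notebook, so the function `central_difference_gradient` and the step size `h` are assumptions made for this example only.

```python
import numpy as np


def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosenbrock_gradient(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])


def central_difference_gradient(func, h=1e-6):
    """Return a function that approximates the gradient of `func`."""
    def gradient(x):
        x = np.asarray(x, dtype=float)
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            # Symmetric difference quotient in coordinate direction i.
            g[i] = (func(x + e) - func(x - e)) / (2.0 * h)
        return g
    return gradient


x0 = np.array([-1.2, 1.0])
exact = rosenbrock_gradient(x0)
approx = central_difference_gradient(rosenbrock)(x0)
print(np.linalg.norm(exact - approx))
```

The printed difference is several orders of magnitude smaller than the gradient norm itself, which is consistent with the nearly identical "Exact" and "Approx" rows in the table.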

### Newton method with Hessian modification

In [4]:
from newton_method import newton_method

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

# Function 1
for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]

    # Exact derivatives
    x_tilde_exact, n_iters_exact = newton_method(starting_point, function_1, gradient_1, hessian_1,
                                                 hessian_modification=True)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = newton_method(starting_point, function_1, approximate_gradient(function_1),
                                                   approximate_hessian(function_1), hessian_modification=True)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

# Function 2
for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]

    # Exact derivatives
    x_tilde_exact, n_iters_exact = newton_method(starting_point, function_2, gradient_2, hessian_2,
                                                 hessian_modification=True)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = newton_method(starting_point, function_2, approximate_gradient(function_2),
                                                   approximate_hessian(function_2), hessian_modification=True)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,7,"[1.0000000000599698, 1.0000000001197356]","[1, 1]",2.056569e-10,1.339141e-10
1,1,"[1.2, 1.2]",Approx,7,"[1.0000000009512635, 1.0000000019000514]","[1, 1]",2.934861e-09,2.124876e-09
2,1,"[-1.2, 1.0]",Exact,19,"[0.9999998517852049, 0.9999997031166837]","[1, 1]",1.464393e-07,3.318242e-07
3,1,"[-1.2, 1.0]",Approx,19,"[0.999999887509527, 0.9999997747592314]","[1, 1]",1.317305e-07,2.517688e-07
4,1,"[0.2, 0.8]",Exact,7,"[1.0000001464050128, 1.000000292474299]","[1, 1]",4.323556e-07,3.270713e-07
5,1,"[0.2, 0.8]",Approx,7,"[1.000000232906024, 1.0000004651846872]","[1, 1]",7.276792e-07,5.202326e-07
6,2,"[-0.2, 1.2]",Exact,9,"[-6.580251882233632e-17, 1.0000000000000033]","[0, 1]",2.969177e-14,3.331319e-15
7,2,"[-0.2, 1.2]",Approx,9,"[9.653683373658318e-14, 0.9999999999913731]","[0, 1]",6.981904e-11,8.627417e-12
8,2,"[3.8, 0.1]",Exact,9,"[4.000000001708678, 2.0731565568267324e-11]","[4, 0]",1.030986e-07,1.708803e-09
9,2,"[3.8, 0.1]",Approx,8,"[4.000000000004157, 1.437169610616674e-13]","[4, 0]",6.993077e-10,4.159159e-12


When using Hessian modification (Cholesky with added multiple of identity), we also achieve convergence for the problems with function 2.
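
A minimal sketch of "Cholesky with added multiple of identity", in the form described in Nocedal and Wright's *Numerical Optimization*, is given below. It is not the code in `newton_method.py`; the function name `cholesky_added_identity` and the values of `beta` and `max_tries` are assumptions of this sketch. The idea is to increase a shift $\tau$ until $A + \tau I$ admits a Cholesky factorization, which guarantees that the modified matrix is positive definite and the resulting Newton direction is a descent direction.

```python
import numpy as np


def cholesky_added_identity(A, beta=1e-3, max_tries=100):
    """Find tau >= 0 such that A + tau * I is positive definite.

    Returns the Cholesky factor L (L @ L.T == A + tau * I) and tau.
    """
    min_diag = np.min(np.diag(A))
    tau = 0.0 if min_diag > 0 else -min_diag + beta
    identity = np.eye(A.shape[0])
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(A + tau * identity)
            return L, tau
        except np.linalg.LinAlgError:
            # Factorization failed: increase the shift and try again.
            tau = max(2.0 * tau, beta)
    raise RuntimeError("no positive definite modification found")


A = np.array([[1.0, 2.0], [2.0, 1.0]])  # indefinite: eigenvalues 3 and -1
L, tau = cholesky_added_identity(A)
print(tau, np.linalg.eigvalsh(A + tau * np.eye(2)))
```

For the indefinite example matrix, the shift has to exceed 1 before the factorization succeeds, after which both eigenvalues of the shifted matrix are positive.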

## 2. Linear CG vs. SD

In [6]:
from scipy.linalg import hilbert
from linear_cg import linear_cg
from steepest_descent import steepest_descent


def problem2(Q, b):
    return lambda x: 0.5 * x.T @ Q @ x - b.T @ x


def gradient_problem2(Q, b):
    # Gradient Q x - b; x @ Q equals Q @ x here because Q is symmetric.
    return lambda x: x @ Q - b


size_hilbert = [5, 8, 12, 20, 30]

### Linear CG

In [7]:
results = pd.DataFrame(
    columns=["n", "Starting point", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

for n in size_hilbert:
    # np.longfloat was removed in NumPy 2.0; np.longdouble is the same type.
    Q = hilbert(n).astype(np.longdouble)

    starting_value = np.zeros((n,), dtype=np.longdouble)

    x_tilde, n_iters = linear_cg(starting_value, Q)

    Q_solve = hilbert(n)
    b_solve = np.ones((n,))
    actual_solution = np.linalg.solve(Q_solve, b_solve)

    new_data = pd.DataFrame({"n": [n],
                             "Starting point": [starting_value],
                             "Number of iterations": [n_iters],
                             "x_k": [x_tilde],
                             "x*": [actual_solution],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_problem2(Q_solve, b_solve)(x_tilde))],
                             "||x* - x_k||": [np.linalg.norm(actual_solution - x_tilde)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,n,Starting point,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,5,"[0.0, 0.0, 0.0, 0.0, 0.0]",6,"[4.999999783377816, -119.99999989795843, 630.0...","[5.000000000055515, -120.00000000109249, 630.0...",7.468829e-08,3.583304e-07
1,8,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]",19,"[-8.000011509945448, 504.0002096724077, -7560....","[-8.000000518189353, 504.00002489574075, -7560...",4.865258e-09,0.02652634
2,12,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",34,"[-9.610831026496056, 815.4744265639367, -16497...","[-12.875802092659683, 1827.0048985656138, -635...",6.762035e-07,413313200.0
3,20,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",67,"[-10.97319907705808, 1050.8207582621808, -2395...","[-30.619818826132814, 5796.311082714353, -2676...",7.678289e-07,11642770000.0
4,30,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",140,"[-12.554788420176621, 1387.2244097104976, -366...","[-70.91100734459087, 12138.613689596617, -5094...",2.920172e-07,13439560000.0


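The Hilbert matrix is ill-conditioned, and its condition number grows rapidly with the size of the matrix. This explains why both the number of iterations and the error increase with $n$. Note that the reference solution $x^*$ is itself computed in double precision with `np.linalg.solve`, so for $n \geq 12$ it is unreliable as well, and the large values of $\|x^* - x_k\|$ partly reflect errors in the reference.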
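
The growth of the condition number can be checked directly. The sketch below builds the Hilbert matrix with NumPy only (the helper `hilbert_matrix` is introduced here for illustration; the notebook itself uses `scipy.linalg.hilbert`) and prints the 2-norm condition number for a few sizes.

```python
import numpy as np


def hilbert_matrix(n):
    """Hilbert matrix with entries H[i, j] = 1 / (i + j + 1), zero-based."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)


for n in [5, 8, 12]:
    print(n, f"{np.linalg.cond(hilbert_matrix(n)):.2e}")
```

The condition number is on the order of $10^{5}$ for $n = 5$ and $10^{10}$ for $n = 8$. For $n = 12$ it is already comparable to the inverse of double-precision machine epsilon, so the computed value itself is only a rough estimate.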

### Steepest Descent (SD) using exact line search
For this exercise, we increase the maximum number of iterations to $10^{5}$.

In [8]:
results = pd.DataFrame(
    columns=["n", "Starting point", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

for n in size_hilbert:
    # np.longfloat was removed in NumPy 2.0; np.longdouble is the same type.
    Q = hilbert(n).astype(np.longdouble)

    starting_value = np.zeros((n,), dtype=np.longdouble)

    x_tilde, n_iters = steepest_descent(starting_value, Q, gradient_problem2(Q, np.ones((n,))), max_iterations=1e5)

    Q_solve = hilbert(n)
    b_solve = np.ones((n,))
    actual_solution = np.linalg.solve(Q_solve, b_solve)

    new_data = pd.DataFrame({"n": [n],
                             "Starting point": [starting_value],
                             "Number of iterations": [n_iters],
                             "x_k": [x_tilde],
                             "x*": [actual_solution],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_problem2(Q_solve, b_solve)(x_tilde))],
                             "||x* - x_k||": [np.linalg.norm(actual_solution - x_tilde)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,n,Starting point,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,5,"[0.0, 0.0, 0.0, 0.0, 0.0]",100000,"[0.03283056999826649, -26.159994815897374, 222...","[5.000000000055515, -120.00000000109249, 630.0...",0.005397,804.1746
1,8,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]",100000,"[2.9306486044716613, -37.91048229518706, 42.02...","[-8.000000518189353, 504.00002489574075, -7560...",0.005374,314759.5
2,12,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",100000,"[-2.35713857488014, 54.74113206747995, -222.86...","[-12.875802092659683, 1827.0048985656138, -635...",0.00622,413345400.0
3,20,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",100000,"[-0.9861783641237102, -0.2909342583787817, 80....","[-30.619818826132814, 5796.311082714353, -2676...",0.009334,11642780000.0
4,30,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",100000,"[2.8501473551383643, -49.987117094868665, 144....","[-70.91100734459087, 12138.613689596617, -5094...",0.009628,13439580000.0


The SD algorithm does not reach the stopping criterion within the maximum number of iterations for any $n$. One reason is that the Hilbert matrix is ill-conditioned: with exact line search, the convergence rate of SD deteriorates as the condition number grows.
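
For a quadratic $f(x) = \frac{1}{2} x^T Q x - b^T x$ with gradient $g_k = Q x_k - b$, the exact line search step along $-g_k$ has the closed form $\alpha_k = \frac{g_k^T g_k}{g_k^T Q g_k}$. The following sketch applies it to a small, well-conditioned example. It is not the code in `steepest_descent.py`; the function name, the tolerance `tol` and the test matrix are assumptions of this sketch.

```python
import numpy as np


def steepest_descent_quadratic(Q, b, x0, tol=1e-6, max_iterations=10_000):
    """Minimize 0.5 * x^T Q x - b^T x using exact line search steps."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iterations):
        g = Q @ x - b
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ Q @ g)  # exact minimizer along -g
        x = x - alpha * g
    return x, max_iterations


Q = np.array([[2.0, 0.0], [0.0, 1.0]])  # condition number 2
b = np.array([1.0, 1.0])
x, iters = steepest_descent_quadratic(Q, b, np.zeros(2))
print(x, iters)
```

With condition number $\kappa$, the error in the $Q$-norm shrinks by a factor of at most $(\kappa - 1)/(\kappa + 1)$ per step. For $\kappa = 2$ this is $1/3$ and the method converges in a few iterations, whereas for the Hilbert matrices above the factor is extremely close to 1.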

## 3. Nonlinear CG
### Fletcher-Reeves method (F-R)

In [5]:
from nonlinear_cg import fletcher_reeves

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]

    x_tilde_exact, n_iters_exact = fletcher_reeves(function_1, gradient_1, starting_point)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    x_tilde_approx, n_iters_approx = fletcher_reeves(function_1, approximate_gradient(function_1), starting_point)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]

    x_tilde_exact, n_iters_exact = fletcher_reeves(function_2, gradient_2, starting_point)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    x_tilde_approx, n_iters_approx = fletcher_reeves(function_2, approximate_gradient(function_2), starting_point)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,263,"[0.9999999677057821, 0.9999999374742284]","[1, 1]",9.806462e-07,7.037321e-08
1,1,"[1.2, 1.2]",Approx,316,"[0.9999999111859438, 0.9999998202071866]","[1, 1]",8.131015e-07,2.005328e-07
2,1,"[-1.2, 1.0]",Exact,441,"[1.0000003177387122, 1.0000006351334447]","[1, 1]",7.761664e-07,7.101777e-07
3,1,"[-1.2, 1.0]",Approx,374,"[1.0000002247767923, 1.0000004524806265]","[1, 1]",9.289149e-07,5.052359e-07
4,1,"[0.2, 0.8]",Exact,454,"[0.9999998249418696, 0.9999996472811941]","[1, 1]",8.650422e-07,3.937714e-07
5,1,"[0.2, 0.8]",Approx,454,"[0.99999991313003, 0.9999998279376654]","[1, 1]",9.089692e-07,1.927481e-07
6,2,"[-0.2, 1.2]",Exact,1671,"[-1.3466633950665309e-09, 1.0000001144395139]","[0, 1]",9.29596e-07,1.144474e-07
7,2,"[-0.2, 1.2]",Approx,383,"[1.1627774411461685e-09, 1.000000077031658]","[0, 1]",7.975775e-07,7.704043e-08
8,2,"[3.8, 0.1]",Exact,397,"[4.000000104021418, -2.1087450284248836e-10]","[4, 0]",8.074915e-07,1.040216e-07
9,2,"[3.8, 0.1]",Approx,422,"[3.9999999225388785, 2.2605289341192234e-10]","[4, 0]",9.327258e-07,7.746145e-08


We can see that with this method, all of our experiments converge. 

### Polak-Ribière method (P-R)

In [4]:
from nonlinear_cg import polak_ribiere

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]

    x_tilde_exact, n_iters_exact = polak_ribiere(function_1, gradient_1, starting_point)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    x_tilde_approx, n_iters_approx = polak_ribiere(function_1, approximate_gradient(function_1), starting_point)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]

    x_tilde_exact, n_iters_exact = polak_ribiere(function_2, gradient_2, starting_point)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    x_tilde_approx, n_iters_approx = polak_ribiere(function_2, approximate_gradient(function_2), starting_point)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,347,"[1.000000384726292, 1.0000007695605806]","[1, 1]",7.266333e-07,8.603707e-07
1,1,"[1.2, 1.2]",Approx,279,"[1.00000015128884, 1.0000003037933134]","[1, 1]",3.046996e-07,3.393799e-07
2,1,"[-1.2, 1.0]",Exact,218,"[0.9999993236921857, 0.9999986435530281]","[1, 1]",7.872389e-07,1.515698e-06
3,1,"[-1.2, 1.0]",Approx,222,"[0.9999993909917241, 0.9999987780148405]","[1, 1]",8.756121e-07,1.365335e-06
4,1,"[0.2, 0.8]",Exact,153,"[0.9999996643726988, 0.9999993256548584]","[1, 1]",8.374466e-07,7.53251e-07
5,1,"[0.2, 0.8]",Approx,208,"[0.999999986410885, 0.9999999706228115]","[1, 1]",9.591722e-07,3.236794e-08
6,2,"[-0.2, 1.2]",Exact,39,"[-1.6955106132115354e-09, 0.9999999463277316]","[0, 1]",7.535167e-07,5.369904e-08
7,2,"[-0.2, 1.2]",Approx,39,"[-1.695065851862996e-09, 0.9999999462082684]","[0, 1]",7.541516e-07,5.381843e-08
8,2,"[3.8, 0.1]",Exact,59,"[3.9999994613319734, 4.0141135588671265e-10]","[4, 0]",8.93935e-07,5.386682e-07
9,2,"[3.8, 0.1]",Approx,60,"[3.9999998505931518, 4.568177524007898e-12]","[4, 0]",2.867492e-07,1.494068e-07


We changed the line search parameters $\rho$ and $c$ because the algorithm did not converge with the default values. Our guess is that with the default parameters, the accepted steps were too long and the algorithm overshot the minimizer.
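
The roles of $\rho$ and $c$ can be seen in a basic backtracking line search: $\rho$ is the factor by which the step size is shrunk, and $c$ controls how much decrease the Armijo condition $f(x + \alpha p) \leq f(x) + c \, \alpha \, \nabla f(x)^T p$ demands. The sketch below is illustrative only. The function `backtracking` and the parameter values are assumptions made here, not the values used in `nonlinear_cg.py`.

```python
import numpy as np


def backtracking(f, grad_fx, x, p, alpha0=1.0, rho=0.5, c=1e-4, max_steps=60):
    """Shrink alpha by the factor rho until the Armijo condition holds."""
    fx = f(x)
    slope = grad_fx @ p  # directional derivative, negative for descent
    alpha = alpha0
    for _ in range(max_steps):
        if f(x + alpha * p) <= fx + c * alpha * slope:
            break
        alpha *= rho
    return alpha


def quadratic(x):
    return x @ x


x = np.array([1.0, 0.0])
g = 2.0 * x  # gradient of x^T x
alpha_half = backtracking(quadratic, g, x, -g, rho=0.5)
alpha_fine = backtracking(quadratic, g, x, -g, rho=0.9)
print(alpha_half, alpha_fine)
```

A value of $\rho$ closer to 1 tests more candidate step sizes and tends to accept longer steps, while a larger $c$ rejects steps that do not decrease $f$ enough. Tuning both changes which steps the nonlinear CG iteration accepts.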

## 4. QN methods
### BFGS method

In [7]:
from qn import bfgs

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

# Function 1
for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]
    H0 = np.eye(2)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = bfgs(starting_point, function_1, gradient_1, H0)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = bfgs(starting_point, function_1, approximate_gradient(function_1), H0)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

# Function 2
for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]
    H0 = np.eye(2).astype(np.longdouble)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = bfgs(starting_point, function_2, gradient_2, H0)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = bfgs(starting_point, function_2, approximate_gradient(function_2), H0)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,19,"[0.9999999999158601, 0.9999999994759363]","[1, 1]",1.589609e-07,5.307752e-10
1,1,"[1.2, 1.2]",Approx,19,"[0.9999999997159227, 0.9999999990760621]","[1, 1]",1.586031e-07,9.666236e-10
2,1,"[-1.2, 1.0]",Exact,20,"[1.0000000020769013, 1.0000000038753096]","[1, 1]",1.282746e-07,4.396765e-09
3,1,"[-1.2, 1.0]",Approx,20,"[1.00000000187677, 1.00000000347517]","[1, 1]",1.278593e-07,3.949566e-09
4,1,"[0.2, 0.8]",Exact,26,"[0.9999999999589824, 0.9999999999170179]","[1, 1]",3.519398e-10,9.256603e-11
5,1,"[0.2, 0.8]",Approx,26,"[0.9999999997590051, 0.9999999995170635]","[1, 1]",2.156834e-10,5.397279e-10
6,2,"[-0.2, 1.2]",Exact,15,"[-1.838329247262121e-09, 1.0000000000506375]","[0, 1]",5.523264e-07,1.839027e-09
7,2,"[-0.2, 1.2]",Approx,15,"[-1.8383292559015527e-09, 1.0000000000506375]","[0, 1]",5.523264e-07,1.839027e-09
8,2,"[3.8, 0.1]",Exact,9,"[4.00000000091774, -1.7046464849810893e-10]","[4, 0]",8.177586e-07,9.334368e-10
9,2,"[3.8, 0.1]",Approx,9,"[4.00000000091774, -1.7046464095759664e-10]","[4, 0]",8.177585e-07,9.334368e-10


### SR1 method with line-search

In [11]:
from qn import sr1_line_search

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

# Function 1
for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]
    H0 = np.eye(2)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = sr1_line_search(starting_point, function_1, gradient_1, H0)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = sr1_line_search(starting_point, function_1, approximate_gradient(function_1), H0)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

# Function 2
for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]
    H0 = np.eye(2).astype(np.longdouble)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = sr1_line_search(starting_point, function_2, gradient_2, H0)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = sr1_line_search(starting_point, function_2, approximate_gradient(function_2), H0)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,7,"[1.183312829353713, 1.4016219878155516]","[1, 1]",0.4039778,0.4414791
1,1,"[1.2, 1.2]",Approx,7,"[1.183312829233458, 1.4016219875302958]","[1, 1]",0.4039778,0.4414791
2,1,"[-1.2, 1.0]",Exact,14,"[1.1311877867839712, 1.2741605675979062]","[1, 1]",2.925799,0.3039313
3,1,"[-1.2, 1.0]",Approx,14,"[1.131187789758555, 1.2741605743988376]","[1, 1]",2.925799,0.3039313
4,1,"[0.2, 0.8]",Exact,17,"[0.8207762417137141, 0.6658203135955328]","[1, 1]",2.71935,0.379206
5,1,"[0.2, 0.8]",Approx,17,"[0.8207762302185185, 0.6658202947244001]","[1, 1]",2.71935,0.3792061
6,2,"[-0.2, 1.2]",Exact,13,"[-1.9161842658662725e-10, 0.9999999999537394]","[0, 1]",5.767878e-08,1.971235e-10
7,2,"[-0.2, 1.2]",Approx,13,"[-1.9161842698424966e-10, 0.9999999999537394]","[0, 1]",5.767878e-08,1.971235e-10
8,2,"[3.8, 0.1]",Exact,10,"[4.000000000176488, 4.9345753299417837e-14]","[4, 0]",5.968048e-10,1.764882e-10
9,2,"[3.8, 0.1]",Approx,10,"[4.000000000176489, 4.934557449936535e-14]","[4, 0]",5.968066e-10,1.764891e-10


### SR1 method within the trust-region framework

In [12]:
from qn import sr1_trust_region

results = pd.DataFrame(
    columns=["Function", "Starting point", "Kind of derivative", "Number of iterations", "x_k", "x*", "||grad f(x_k)||",
             "||x* - x_k||"])

# Function 1
for i in range(len(starting_points_1)):
    starting_point = starting_points_1[i]
    B0 = np.eye(2).astype(np.longdouble)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = sr1_trust_region(starting_point, function_1, gradient_1, B0)
    closest_i_exact = closest_index(x_tilde_exact, minima_1)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = sr1_trust_region(starting_point, function_1, approximate_gradient(function_1), B0)
    closest_i_approx = closest_index(x_tilde_approx, minima_1)

    new_data = pd.DataFrame({"Function": [1, 1],
                             "Starting point": [starting_points_1[i], starting_points_1[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_1[closest_i_exact], minima_1[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_1(x_tilde_exact)),
                                                 np.linalg.norm(gradient_1(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_1[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_1[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

# Function 2
for i in range(len(starting_points_2)):
    starting_point = starting_points_2[i]
    B0 = np.eye(2).astype(np.longdouble)

    # Exact derivatives
    x_tilde_exact, n_iters_exact = sr1_trust_region(starting_point, function_2, gradient_2, B0)
    closest_i_exact = closest_index(x_tilde_exact, minima_2)

    # Approximated derivatives
    x_tilde_approx, n_iters_approx = sr1_trust_region(starting_point, function_2, approximate_gradient(function_2), B0)
    closest_i_approx = closest_index(x_tilde_approx, minima_2)

    new_data = pd.DataFrame({"Function": [2, 2],
                             "Starting point": [starting_points_2[i], starting_points_2[i]],
                             "Kind of derivative": ["Exact", "Approx"],
                             "Number of iterations": [n_iters_exact, n_iters_approx],
                             "x_k": [x_tilde_exact, x_tilde_approx],
                             "x*": [minima_2[closest_i_exact], minima_2[closest_i_approx]],
                             "||grad f(x_k)||": [np.linalg.norm(gradient_2(x_tilde_exact)),
                                                 np.linalg.norm(gradient_2(x_tilde_approx))],
                             "||x* - x_k||": [np.linalg.norm(minima_2[closest_i_exact] - x_tilde_exact),
                                              np.linalg.norm(minima_2[closest_i_approx] - x_tilde_approx)]})

    results = pd.concat([results, new_data], ignore_index=True)

results

Unnamed: 0,Function,Starting point,Kind of derivative,Number of iterations,x_k,x*,||grad f(x_k)||,||x* - x_k||
0,1,"[1.2, 1.2]",Exact,11,"[1.0101074523433107, 1.0199006604862149]","[1, 1]",0.206041,0.02232
1,1,"[1.2, 1.2]",Approx,11,"[1.0101076831823723, 1.0199010438336809]","[1, 1]",0.206079,0.022321
2,1,"[-1.2, 1.0]",Exact,49,"[0.9989497814566656, 0.9978964762177491]","[1, 1]",0.00094,0.002351
3,1,"[-1.2, 1.0]",Approx,26,"[0.2643158003317542, 0.05210044941128574]","[1, 1]",3.57567,1.199894
4,1,"[0.2, 0.8]",Exact,626,"[0.9989078602625593, 0.9978129427846919]","[1, 1]",0.000994,0.002445
5,1,"[0.2, 0.8]",Approx,33,"[1.000599423074193, 1.0012014254544566]","[1, 1]",0.000542,0.001343
6,2,"[-0.2, 1.2]",Exact,10,"[2.6130973964451267e-07, 0.9999493856950454]","[0, 1]",0.000405,5.1e-05
7,2,"[-0.2, 1.2]",Approx,10,"[2.613097443775024e-07, 0.9999493856949708]","[0, 1]",0.000405,5.1e-05
8,2,"[3.8, 0.1]",Exact,202,"[3.9999143352492337, 1.02035607819751e-07]","[4, 0]",0.000322,8.6e-05
9,2,"[3.8, 0.1]",Approx,202,"[3.999907097983414, 1.0978270454627371e-07]","[4, 0]",0.000345,9.3e-05


The runs in which the algorithm fails are those where the denominator of the SR1 update becomes (numerically) 0, at which point the search is stopped and the current iterate is returned. We attribute this to the SciPy minimization function we are using. The results are also not exact and contain rounding errors due to machine precision. In most cases, however, the method converges to the correct solution rather quickly.
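To illustrate the denominator issue, here is a minimal sketch of an SR1 update of the Hessian approximation with a common safeguard. It is not the code in `qn.py`: the function name `sr1_update`, the tolerance `r`, and the choice to skip the update (instead of stopping the search, as we do) are assumptions made for illustration only.

```python
import numpy as np


def sr1_update(B, s, y, r=1e-8):
    """Hypothetical helper: one SR1 update of the Hessian approximation B.

    s is the step x_{k+1} - x_k, y is the gradient difference.
    Returns (B_new, updated); the update is skipped when the
    denominator (y - B s)^T s is too small relative to ||s|| * ||y - B s||.
    """
    v = y - B @ s
    denom = v @ s
    # Safeguard: avoid dividing by a (numerically) zero denominator.
    if abs(denom) <= r * np.linalg.norm(s) * np.linalg.norm(v):
        return B, False
    return B + np.outer(v, v) / denom, True


# Small example: after the update, the secant equation B_new s = y holds.
B0 = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 0.0])
B1, updated = sr1_update(B0, s, y)
print(updated, B1 @ s)
```

Skipping the update keeps the previous approximation and lets the iteration continue, whereas our implementation stops and returns the current iterate, which is why some runs above end far from the minimizer.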

## 5. Derivatives approximation

The results obtained with the approximated gradient and Hessian are already included in the section of each algorithm. They are marked with "Approx" in the "Kind of derivative" column.
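As a rough illustration of what such an approximation can look like, the following sketch builds a gradient function from central finite differences. It is not the content of `derivatives_approximation.py`: the name `central_difference_gradient`, the step size `h`, and the standard two-dimensional Rosenbrock form used for the check are assumptions.

```python
import numpy as np


def central_difference_gradient(func, h=1e-6):
    """Hypothetical helper: return a function approximating the gradient of func."""
    def gradient(x):
        x = np.asarray(x, dtype=float)
        grad = np.zeros_like(x)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = h
            # Central difference in coordinate i: (f(x + h e_i) - f(x - h e_i)) / (2h)
            grad[i] = (func(x + step) - func(x - step)) / (2.0 * h)
        return grad
    return gradient


def rosenbrock(x):
    # Standard 2D Rosenbrock function (assumed form).
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


approx_grad = central_difference_gradient(rosenbrock)
print(approx_grad(np.array([-1.2, 1.0])))
```

For comparison, the analytic gradient of this Rosenbrock form at $(-1.2, 1)$ is $(-215.6, -88)$, so the printed values should be close to it. This is consistent with the tables above, where "Exact" and "Approx" runs mostly agree to several digits.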

## 6. Outperforming the NM with a QN

See Task6_bonus.ipynb