<h1>Bonus Assignment</h1>
<h3>Goal: Outperform Newton's Method (NM) with a Quasi-Newton Method (QN)</h3>

<h3>Our Function: Himmelblau's Function</h3>

$$
f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2
$$
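
<p>Differentiating $f$ gives the gradient and Hessian that <code>grad_f</code> and <code>hessian_f</code> implement. The shorthand $u = x^2 + y - 11$ and $v = x + y^2 - 7$ is introduced here only to keep the expressions short:</p>

$$
\nabla f(x, y) =
\begin{pmatrix}
4xu + 2v \\
2u + 4yv
\end{pmatrix},
\qquad
\nabla^2 f(x, y) =
\begin{pmatrix}
4u + 8x^2 + 2 & 4x + 4y \\
4x + 4y & 4v + 8y^2 + 2
\end{pmatrix}
$$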

import numpy as np

def f(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

def grad_f(x):
    df_dx = 2 * (2 * x[0] * (x[0]**2 + x[1] - 11) + x[0] + x[1]**2 - 7)
    df_dy = 2 * (x[0]**2 + 2 * x[1] * (x[0] + x[1]**2 - 7) + x[1] - 11)
    return np.array([df_dx, df_dy])

def hessian_f(x):
    d2f_dx2 = 4 * (x[0]**2 + x[1] - 11) + 8 * x[0]**2 + 2
    d2f_dy2 = 4 * (x[0] + x[1]**2 - 7) + 8 * x[1]**2 + 2
    d2f_dxdy = 4 * x[0] + 4 * x[1]
    return np.array([[d2f_dx2, d2f_dxdy], [d2f_dxdy, d2f_dy2]])

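def backtracking_line_search(f, x, grad, p, alpha=0.3, beta=0.5):
    # Armijo backtracking: shrink t until f decreases sufficiently along p.
    # The test only enforces a decrease if p is a descent direction,
    # i.e. np.dot(grad, p) < 0.
    t = 1.0
    while f(x + t * p) > f(x) + alpha * t * np.dot(grad, p):
        t *= beta
    return t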

def newton_method(f, grad, hessian, x0, max_iter=1000, tol=1e-6):
    x = x0.copy()
    for i in range(max_iter):
        grad_x = grad(x)
        if np.linalg.norm(grad_x) < tol:
            break
        hess_x = hessian(x)
        try:
            p = np.linalg.solve(hess_x, -grad_x)
        except np.linalg.LinAlgError:
            break  # In case Hessian is singular, break
        t = backtracking_line_search(f, x, grad_x, p)
        x += t * p
    return x, np.linalg.norm(grad_x), i

def bfgs_update(H, s, y):
    ys = np.dot(y, s)
    if ys < 1e-10:  # Prevent division by zero or very small values
        return H
    rho = 1.0 / ys
    I = np.eye(len(H))
    V = I - rho * np.outer(s, y)
    H = V.T @ H @ V + rho * np.outer(s, s)
    return H

def quasi_newton_method(f, grad, x0, max_iter=1000, tol=1e-6):
    x = x0.copy()
    n = len(x)
    H = np.eye(n)
    for i in range(max_iter):
        grad_x = grad(x)
        if np.linalg.norm(grad_x) < tol:
            break
        p = -np.dot(H, grad_x)
        t = backtracking_line_search(f, x, grad_x, p)
        s = t * p
        x_next = x + s
        y = grad(x_next) - grad_x
        if np.dot(y, s) > 1e-10:
            H = bfgs_update(H, s, y)
        x = x_next
    return x, np.linalg.norm(grad(x)), i

x0_list = [(1.0, 1.0), (1.2, 1.2), (-1.2, 1), (0.2, 0.8)]
for i, x0 in enumerate(x0_list):
    print(f"Starting point {i+1}: {x0}")
    x, grad_norm, num_iter = newton_method(f, grad_f, hessian_f, np.array(x0))
    print(f"Newton Method: Iterations: {num_iter}, Final iterate: {x}, Gradient norm: {grad_norm}")

    x, grad_norm, num_iter = quasi_newton_method(f, grad_f, np.array(x0))
    print(f"Quasi-Newton Method: Iterations: {num_iter}, Final iterate: {x}, Gradient norm: {grad_norm}\n")

Starting point 1: (1.0, 1.0)
Newton Method: Iterations: 999, Final iterate: [-0.94897959 -2.45918367], Gradient norm: 44.335083754394404
Quasi-Newton Method: Iterations: 15, Final iterate: [3.         2.00000001], Gradient norm: 4.390315402132542e-07

Starting point 2: (1.2, 1.2)
Newton Method: Iterations: 6, Final iterate: [3. 2.], Gradient norm: 2.6752441484954133e-11
Quasi-Newton Method: Iterations: 17, Final iterate: [3.         1.99999999], Gradient norm: 7.085236158428695e-07

Starting point 3: (-1.2, 1)
Newton Method: Iterations: 999, Final iterate: [-1.2  1. ], Gradient norm: 53.11210543746123
Quasi-Newton Method: Iterations: 11, Final iterate: [-2.80511809  3.13131251], Gradient norm: 3.8978054414003885e-07

Starting point 4: (0.2, 0.8)
Newton Method: Iterations: 999, Final iterate: [-0.59097274 -1.66551889], Gradient norm: 20.859414016004283
Quasi-Newton Method: Iterations: 20, Final iterate: [3.00000001 1.99999998], Gradient norm: 6.631955937915146e-07
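
<p>The quasi-Newton runs above rely on <code>bfgs_update</code>, which builds <code>V</code> from <code>np.outer(s, y)</code>. The textbook BFGS update of the inverse-Hessian approximation is $H_{+} = (I - \rho s y^T) H (I - \rho y s^T) + \rho s s^T$, which in the <code>V.T @ H @ V</code> form corresponds to $V = I - \rho\, y s^T$, that is <code>np.outer(y, s)</code>. The sketch below compares the two variants by checking the secant condition $H_{+} y = s$. The function names and the vectors <code>s</code> and <code>y</code> are made up for illustration and are not part of the notebook.</p>

```python
import numpy as np

def textbook_bfgs_update(H, s, y):
    """Textbook BFGS update of an inverse-Hessian approximation H."""
    rho = 1.0 / np.dot(y, s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)  # order matters: y first, then s
    return V.T @ H @ V + rho * np.outer(s, s)

def notebook_style_update(H, s, y):
    """Same formula, but with V built from outer(s, y) as in bfgs_update."""
    rho = 1.0 / np.dot(y, s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V.T @ H @ V + rho * np.outer(s, s)

# Hypothetical step s and gradient change y, chosen only for illustration.
s = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
H0 = np.eye(2)

H_textbook = textbook_bfgs_update(H0, s, y)
H_notebook = notebook_style_update(H0, s, y)

# The secant condition requires H_new @ y == s.
print("s:             ", s)
print("textbook H @ y:", H_textbook @ y)  # [1. 0.], matches s
print("notebook H @ y:", H_notebook @ y)  # [1. 2.], does not match s
```

<p>Both variants keep the approximation symmetric, but only the textbook form satisfies the secant condition in this example. If it were substituted into <code>bfgs_update</code>, the iteration counts reported above could change; they have not been re-run here.</p>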



<p>The quasi-Newton method converged from all four starting points, taking between 11 and 20 iterations and ending with a gradient norm below the tolerance of 1e-6 in every case. Three runs reached the minimizer (3, 2). The run from (-1.2, 1) reached a different minimizer of the same function, approximately (-2.8051, 3.1313).</p>
<p>Newton's method converged only from (1.2, 1.2), where it was the faster and more accurate of the two: 6 iterations and a gradient norm of about 2.7e-11, against 17 iterations and about 7.1e-07 for the quasi-Newton method. From (1.0, 1.0), (-1.2, 1) and (0.2, 0.8) it did not converge. The reported count of 999 is the last loop index, so all 1000 iterations were used, and the final gradient norms (about 44.3, 53.1 and 20.9) are far from zero. From (-1.2, 1) the final iterate is the starting point itself, so no progress was made at all.</p>
<p>A likely cause is the curvature of the function at these three starting points: the Hessian is negative definite there, so the Newton step is an ascent direction. The sufficient-decrease test in the backtracking line search then no longer enforces a decrease in f, and the iterates stall. At (1.2, 1.2) the Hessian is indefinite, but the Newton iteration still reached the minimizer (3, 2).</p>
<p>Overall, the quasi-Newton method was the more robust of the two on these starting points, while Newton's method was faster in the one case where it converged.</p>
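
<p>One common way to make Newton's method usable from such starting points is to take the Newton step only where the Hessian is positive definite and to fall back to the steepest-descent direction elsewhere. The following is a minimal sketch of that idea, not the method used in the notebook. The name <code>safeguarded_newton</code> and the eigenvalue test are assumptions made for illustration, and the function, gradient and Hessian are restated so that the block runs on its own.</p>

```python
import numpy as np

def f(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

def grad_f(x):
    u = x[0]**2 + x[1] - 11
    v = x[0] + x[1]**2 - 7
    return np.array([4 * x[0] * u + 2 * v, 2 * u + 4 * x[1] * v])

def hessian_f(x):
    u = x[0]**2 + x[1] - 11
    v = x[0] + x[1]**2 - 7
    off = 4 * x[0] + 4 * x[1]
    return np.array([[4 * u + 8 * x[0]**2 + 2, off],
                     [off, 4 * v + 8 * x[1]**2 + 2]])

def safeguarded_newton(x0, max_iter=1000, tol=1e-6, alpha=0.3, beta=0.5):
    """Newton's method with a steepest-descent fallback (hypothetical helper)."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            return x, k
        H = hessian_f(x)
        # Take the Newton step only where the Hessian is positive definite;
        # elsewhere the step may point uphill, so use -g instead.
        if np.all(np.linalg.eigvalsh(H) > 0):
            p = np.linalg.solve(H, -g)
        else:
            p = -g
        # Armijo backtracking; p is a descent direction in both branches.
        t = 1.0
        while f(x + t * p) > f(x) + alpha * t * np.dot(g, p):
            t *= beta
        x = x + t * p
    return x, max_iter

for x0 in [(1.0, 1.0), (1.2, 1.2), (-1.2, 1.0), (0.2, 0.8)]:
    eigs = np.linalg.eigvalsh(hessian_f(np.array(x0)))
    x, iters = safeguarded_newton(x0)
    print(f"start {x0}: Hessian eigenvalues {eigs}, "
          f"final iterate {x}, iterations {iters}")
```

<p>The printed eigenvalues show where the plain Newton step is unreliable: both are negative at (1.0, 1.0), (-1.2, 1) and (0.2, 0.8), and they have mixed signs at (1.2, 1.2). The eigenvalue test is convenient for a 2x2 problem; for larger problems an attempted Cholesky factorization is the usual cheaper check.</p>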