### Convex Optimization ~  Ex 5

#### Init and defining variables

In [1]:
import numpy as np
from numpy.linalg import inv
import time
#Calculation of gradient 
def gradf(x,y):
    theta_x = -400*y*x + 400*x**3 + 2*x - 2
    theta_y = 200*y - 200*x**2
    return np.array([theta_x, theta_y])

# Hessian matrix of f
def hessian(x, y):
    theta_yy = 200
    theta_xy = -400*x  # the Hessian is symmetric: theta_xy equals theta_yx
    theta_xx = 1200*x**2 - 400*y + 2
    return np.array([[theta_xx, theta_xy], [theta_xy, theta_yy]])

#calculation of f
def fx(x,y):
    return 100*(y-x**2)**2 + (1-x)**2

#init x0
x0 = np.array([-1.2, 1])
#hessian(x0[0],x0[1])# testing output - > array([[1330.,  480.], [ 480.,  200.]]) , 2x2
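
Hand-derived derivatives are easy to get wrong, so a quick numerical check can be useful. The sketch below compares `gradf` and `hessian` against central finite differences at `x0`. The helpers `fd_gradient` and `fd_hessian`, the step `h` and the tolerance are assumptions made for this illustration and are not part of the original notebook.

```python
import numpy as np

def fx(x, y):
    return 100*(y - x**2)**2 + (1 - x)**2

def gradf(x, y):
    return np.array([-400*y*x + 400*x**3 + 2*x - 2, 200*y - 200*x**2])

def hessian(x, y):
    return np.array([[1200*x**2 - 400*y + 2, -400*x], [-400*x, 200]])

def fd_gradient(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient (hypothetical helper)."""
    return np.array([(f(x + h, y) - f(x - h, y)) / (2*h),
                     (f(x, y + h) - f(x, y - h)) / (2*h)])

def fd_hessian(g, x, y, h=1e-6):
    """Central differences of the analytic gradient, one column per variable."""
    col_x = (g(x + h, y) - g(x - h, y)) / (2*h)
    col_y = (g(x, y + h) - g(x, y - h)) / (2*h)
    return np.column_stack([col_x, col_y])

x0 = np.array([-1.2, 1.0])
grad_ok = np.allclose(gradf(*x0), fd_gradient(fx, *x0), rtol=1e-5)
hess_ok = np.allclose(hessian(*x0), fd_hessian(gradf, *x0), rtol=1e-5)
print("gradient matches:", grad_ok, "hessian matches:", hess_ok)
```

If either flag comes back `False`, the analytic formula is the first place to look.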

#### Method Newton-Raphson

This is how our function looks: 

<img src="rbrok.png" />

Looking at the plot, we can see that the global minimum lies inside a long, narrow, parabolic-shaped, nearly flat valley. The flatness of the valley floor makes it difficult to converge to the global minimum.
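
For reference, the function and the derivatives implemented by `fx`, `gradf` and `hessian` can be written out as follows. The formulas are transcribed from the code in the first cell. The expression for f is the one commonly known as the Rosenbrock function; that name does not appear in the original notebook.

```latex
\begin{aligned}
f(x, y) &= 100\,(y - x^2)^2 + (1 - x)^2 \\
\nabla f(x, y) &= \begin{bmatrix} 400x^3 - 400xy + 2x - 2 \\ 200y - 200x^2 \end{bmatrix} \\
H(x, y) &= \begin{bmatrix} 1200x^2 - 400y + 2 & -400x \\ -400x & 200 \end{bmatrix}
\end{aligned}
```

Setting the second gradient component to zero gives y = x^2, and the first component then reduces to 2x - 2 = 0, so the only stationary point is (1, 1), where f = 0.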

Possible problems:

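    1) Saddle points: the NR method searches locally for a point where the gradient vanishes, so in general it can be attracted to a saddle point instead of a minimum.
    2) H may not be positive definite: the Newton step is then not guaranteed to be a descent direction, and if H is singular it cannot be inverted at all.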
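
The second concern can be checked numerically by looking at the eigenvalues of the Hessian. The sketch below does this at the starting point and at (0, 1); the second point and the helper `is_positive_definite` are chosen only for illustration and do not come from the original notebook.

```python
import numpy as np

def hessian(x, y):
    return np.array([[1200*x**2 - 400*y + 2, -400*x], [-400*x, 200]])

def is_positive_definite(H):
    """True if all eigenvalues of the symmetric matrix H are positive."""
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

H_start = hessian(-1.2, 1.0)   # Hessian at the starting point x0
H_other = hessian(0.0, 1.0)    # illustrative point above the valley floor

eig_start = np.linalg.eigvalsh(H_start)
eig_other = np.linalg.eigvalsh(H_other)

print("x0     :", eig_start, is_positive_definite(H_start))
print("(0, 1) :", eig_other, is_positive_definite(H_other))
```

For this function the determinant works out to det H = 80000(x^2 - y) + 400, so the Hessian is singular on the curve y = x^2 + 0.005 and indefinite above it. At (0, 1) the matrix is diagonal with entries -398 and 200, which is why the check fails there.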

### NR method calculating pk by solving the linear system

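Instead of inverting the Hessian, we compute the direction pk by solving the linear system H(xk) pk = -∇f(xk) with the [solve function](https://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.linalg.solve.html). According to its documentation, it works as follows:

The solution to the system of linear equations is computed using an LU decomposition with partial pivoting and row interchanges.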

reference: https://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.linalg.solve.html
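
As a small worked example, the sketch below computes a single Newton direction at the starting point with `np.linalg.solve` and checks the residual of the linear system. It only re-uses the `gradf` and `hessian` definitions from the first cell; the variable names `p`, `x1` and `residual` are picked for this illustration.

```python
import numpy as np

def gradf(x, y):
    return np.array([-400*y*x + 400*x**3 + 2*x - 2, 200*y - 200*x**2])

def hessian(x, y):
    return np.array([[1200*x**2 - 400*y + 2, -400*x], [-400*x, 200]])

x0 = np.array([-1.2, 1.0])
H = hessian(x0[0], x0[1])
g = gradf(x0[0], x0[1])

# Solve H p = -g for the Newton direction p without forming the inverse of H
p = np.linalg.solve(H, -g)
x1 = x0 + p

# The residual of the linear system should be close to machine precision
residual = np.linalg.norm(H @ p + g)
print("direction:", p)
print("next point:", x1)
print("residual:", residual)
```

Working the 2x2 system by hand gives p = (880/35600, 13552/35600), roughly (0.0247, 0.3807), so the first step moves mostly in the y direction.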

In [4]:
start_time = time.time()
# First Newton step from x0: solve H p = -grad for the direction p
p = np.linalg.solve(hessian(x0[0], x0[1]), -gradf(x0[0], x0[1]))
xold = x0
xnew = xold + p
tol = np.linalg.norm(xold - xnew, 2)  # length of the last step

k = 0  # counts the steps taken inside the loop (the first step is not included)
while tol > 1e-10:
    xold = xnew
    hess = hessian(xold[0], xold[1])
    grad = gradf(xold[0], xold[1])
    p = np.linalg.solve(hess, -grad)  # Newton direction
    xnew = xold + p
    tol = np.linalg.norm(xold - xnew, 2)
    k += 1
elapsed_time = time.time() - start_time
print("iterations: ", k)
print("X that minimizes our function: ", xnew)
print("f value on that X: ", fx(xnew[0], xnew[1]))
print("Time to run", elapsed_time)

iterations:  6
X that minimizes our function:  [1. 1.]
f value on that X:  0.0
Time to run 0.00091552734375


### NR method with a shift calculating pk by solving the linear system

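The Hessian is a 2x2 square matrix that we need to invert. It is invertible only if none of its eigenvalues is zero. Adding __a__ times the identity matrix shifts every eigenvalue by __a__, which, for a suitable __a__, moves them away from zero. That's why we add this __a__.

We start with the initial value of __a__ and reduce it as the iterations increase.

After every iteration we divide the current __a__ by the number of iterations k so far. Because the divisions accumulate, __a__ equals its initial value divided by k! after k iterations.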
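
Both statements can be illustrated with a few lines of NumPy. The sketch below shows that adding `a*np.eye(2)` shifts each eigenvalue of the Hessian by exactly `a`, and that the repeated division `a = a/k` produces the sequence a0/k!. The names `shifted`, `plain`, `schedule` and `factorial_form` are introduced only for this example.

```python
import math
import numpy as np

def hessian(x, y):
    return np.array([[1200*x**2 - 400*y + 2, -400*x], [-400*x, 200]])

a0 = 0.007
H = hessian(-1.2, 1.0)

# Eigenvalues before and after the shift H + a0*I
plain = np.linalg.eigvalsh(H)
shifted = np.linalg.eigvalsh(H + a0*np.eye(2))
shift_matches = np.allclose(shifted, plain + a0)
print("eigenvalues shifted by a0:", shift_matches)

# Value of a after each of the first five passes through the loop
schedule = []
a = a0
for k in range(1, 6):
    a = a / k
    schedule.append(a)

factorial_form = [a0 / math.factorial(k) for k in range(1, 6)]
print(schedule)
```

The shift therefore fades very quickly, so after a handful of iterations the method behaves almost like the unshifted NR method.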

In [6]:
x0 = np.array([-1.2, 1])
start_time = time.time()
# The first step from x0 uses the unshifted Hessian
p = np.linalg.solve(hessian(x0[0], x0[1]), -gradf(x0[0], x0[1]))  # direction
xold = x0
xnew = xold + p
tol = np.linalg.norm(xold - xnew, 2)
a = 0.007  # initial value of the shift a
k = 0
while tol > 1e-10:
    xold = xnew
    hess = hessian(xold[0], xold[1]) + a*np.eye(2)  # shifted Hessian H + a*I
    grad = gradf(xold[0], xold[1])
    p = np.linalg.solve(hess, -grad)
    xnew = xold + p
    tol = np.linalg.norm(xold - xnew, 2)
    k += 1
    a = a/k  # reduce a as iterations increase
elapsed_time = time.time() - start_time
print("iterations: ", k)
print("X that minimizes our function: ", xnew)
print("f value on that X: ", fx(xnew[0], xnew[1]))
print("Time to run", elapsed_time)

iterations:  8
X that minimizes our function:  [1. 1.]
f value on that X:  0.0
Time to run 0.002198934555053711


### NR method using inverse of Hessian matrix

In our case the Hessian is just a 2x2 matrix, so inverting it is not computationally expensive. For bigger matrices, however, the first method (solving the linear system) is preferred, as it is cheaper and numerically more stable than forming the inverse explicitly.
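
To make the comparison concrete on something larger than 2x2, the sketch below solves one linear system both ways and prints the residuals. The 200x200 symmetric positive definite matrix is randomly generated for this illustration only; it is not related to the Hessian of our function, and the size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up symmetric positive definite test matrix and right-hand side
M = rng.standard_normal((n, n))
A = M @ M.T + n*np.eye(n)
b = rng.standard_normal(n)

# Method 1: solve the linear system directly
p_solve = np.linalg.solve(A, b)

# Method 2: form the inverse explicitly, then multiply
p_inv = np.linalg.inv(A) @ b

res_solve = np.linalg.norm(A @ p_solve - b)
res_inv = np.linalg.norm(A @ p_inv - b)
print("residual with solve:", res_solve)
print("residual with inv  :", res_inv)
```

Both routes give the same answer up to rounding on a well-conditioned matrix like this one. Wrapping each call in a timer would show how the cost of the two approaches compares on a given machine.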

In [19]:
start_time = time.time()
p = -np.dot(np.linalg.inv(hessian(x0[0], x0[1])), gradf(x0[0], x0[1]))  # direction
xold = x0
xnew = xold + p
tol = np.linalg.norm(xold - xnew, 2)

k = 0
while tol > 1e-10:
    xold = xnew
    # The Hessian might not be invertible (np.linalg.inv would raise LinAlgError),
    # so a try-except could be added here. In our case it is invertible.
    p = -np.dot(np.linalg.inv(hessian(xold[0], xold[1])), gradf(xold[0], xold[1]))
    xnew = xold + p
    tol = np.linalg.norm(xold - xnew, 2)
    k += 1
elapsed_time = time.time() - start_time
print(k)  # number of iterations
print(xnew)  # X that minimizes our function
print(xnew[0], xnew[1])
print("Time to run", elapsed_time)

6
[1. 1.]
1.0 1.0
Time to run 0.0009098052978515625
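
The code comment above mentions that a try-except could guard against a non-invertible Hessian. One possible shape for such a guard is sketched below. The helper `safe_newton_direction`, its fallback to a shifted matrix and the default `a=1e-3` are assumptions made for this illustration, and the singular matrix is a made-up example rather than a Hessian of our function.

```python
import numpy as np

def safe_newton_direction(H, g, a=1e-3):
    """Newton direction via the inverse, with a shifted fallback (hypothetical helper).

    If np.linalg.inv reports a singular matrix, retry with H + a*I.
    """
    try:
        H_inv = np.linalg.inv(H)
    except np.linalg.LinAlgError:
        # Exactly singular matrix: shift the eigenvalues by a and retry
        H_inv = np.linalg.inv(H + a*np.eye(H.shape[0]))
    return -H_inv @ g

H_regular = np.array([[1330.0, 480.0], [480.0, 200.0]])  # Hessian at x0
H_singular = np.array([[1.0, 2.0], [2.0, 4.0]])           # made-up singular matrix
g = np.array([1.0, 1.0])

p_regular = safe_newton_direction(H_regular, g)
p_fallback = safe_newton_direction(H_singular, g)
print("regular :", p_regular)
print("fallback:", p_fallback)
```

Note that `np.linalg.inv` only raises for matrices it detects as exactly singular. A nearly singular Hessian passes through silently and can still produce a very large step, so this guard is not a complete safeguard.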


### Newton-Raphson with a shift using Inverse of matrix

In [20]:
x0 = np.array([-1.2, 1])
start_time = time.time()
# The first step from x0 uses the unshifted Hessian
p = -np.dot(np.linalg.inv(hessian(x0[0], x0[1])), gradf(x0[0], x0[1]))  # direction
xold = x0
xnew = xold + p
tol = np.linalg.norm(xold - xnew, 2)
a = 0.007  # initial value of the shift a
k = 0
while tol > 1e-10:
    xold = xnew
    # The shifted Hessian might still not be invertible, so a try-except could be added here.
    # In our case it is invertible.
    p = -np.dot(np.linalg.inv(hessian(xold[0], xold[1]) + a*np.eye(2)), gradf(xold[0], xold[1]))
    xnew = xold + p
    tol = np.linalg.norm(xold - xnew, 2)
    k += 1
    a = a/k  # reduce a as iterations increase
elapsed_time = time.time() - start_time
print(k)  # number of iterations
print(xnew)  # X that minimizes our function
print(xnew[0], xnew[1])
print("Time to run", elapsed_time)

8
[1. 1.]
1.0 1.0
Time to run 0.002001523971557617
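
The four cells above differ only in how the direction is computed and whether the shift is applied, so they can be folded into one function. The sketch below is one possible way to do that. The function name `newton_raphson` and its parameters `shift`, `use_inverse`, `tol` and `max_iter` are assumptions for this illustration; the iteration cap in particular is not in the original cells.

```python
import numpy as np

def fx(x, y):
    return 100*(y - x**2)**2 + (1 - x)**2

def gradf(x, y):
    return np.array([-400*y*x + 400*x**3 + 2*x - 2, 200*y - 200*x**2])

def hessian(x, y):
    return np.array([[1200*x**2 - 400*y + 2, -400*x], [-400*x, 200]])

def newton_raphson(x0, shift=0.0, use_inverse=False, tol=1e-10, max_iter=100):
    """NR iteration mirroring the cells above (hypothetical wrapper).

    Returns the final point and the number of steps taken inside the loop.
    """
    def direction(x, a):
        H = hessian(x[0], x[1]) + a*np.eye(2)
        g = gradf(x[0], x[1])
        if use_inverse:
            return -np.linalg.inv(H) @ g
        return np.linalg.solve(H, -g)

    xold = np.asarray(x0, dtype=float)
    xnew = xold + direction(xold, 0.0)  # first step without shift, as in the cells
    a, k = shift, 0
    while np.linalg.norm(xold - xnew, 2) > tol and k < max_iter:
        xold = xnew
        xnew = xold + direction(xold, a)
        k += 1
        a = a / k  # reduce a as iterations increase
    return xnew, k

results = {}
for shift in (0.0, 0.007):
    for use_inverse in (False, True):
        x, k = newton_raphson([-1.2, 1.0], shift=shift, use_inverse=use_inverse)
        results[(shift, use_inverse)] = (x, k)
        print("shift:", shift, "inverse:", use_inverse,
              "iterations:", k, "x:", x, "f:", fx(x[0], x[1]))
```

For comparison, the cells above report 6 iterations without the shift and 8 with it, for both ways of computing the direction.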
