$$\mathbf{Exercise-2}$$

The given function is: $$f(x)=\sqrt{x_1^2+4}+\sqrt{x_2^2+4}$$
The gradient of $f(x)$ can be written as:
$$\nabla f(x)=\begin{bmatrix}
  \frac{x_1}{\sqrt{x_1^2 +4}} & \frac{x_2}{\sqrt{x_2^2 +4}}
\end{bmatrix}$$
The Hessian matrix can be written as:
$$\nabla^2 f(x)=\begin{bmatrix}
  \frac{4}{(x_1^2 +4)^{3/2}} & 0\\
  0 & \frac{4}{(x_2^2 +4)^{3/2}}
\end{bmatrix}$$
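
The formulas above can be sanity-checked numerically. The sketch below is not part of the exercise; the helper names `fd_grad` and `fd_hess` and the step size `h` are assumptions chosen for illustration. It compares the analytic gradient and Hessian with central finite differences at the point $[1,2]$.

```python
import numpy as np

def f(x):
    # f(x) = sqrt(x1^2 + 4) + sqrt(x2^2 + 4)
    return np.sum(np.sqrt(x**2 + 4.0))

def grad(x):
    # analytic gradient: x_i / sqrt(x_i^2 + 4)
    return x / np.sqrt(x**2 + 4.0)

def hess(x):
    # analytic Hessian: diagonal with entries 4 / (x_i^2 + 4)^(3/2)
    return np.diag(4.0 / (x**2 + 4.0)**1.5)

def fd_grad(fun, x, h=1e-6):
    # central differences of fun, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

def fd_hess(gfun, x, h=1e-6):
    # central differences of the analytic gradient, column by column
    H = np.zeros((len(x), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        H[:, j] = (gfun(x + e) - gfun(x - e)) / (2.0 * h)
    return H

x0 = np.array([1.0, 2.0])
grad_err = np.max(np.abs(grad(x0) - fd_grad(f, x0)))
hess_err = np.max(np.abs(hess(x0) - fd_hess(grad, x0)))
print('max gradient error:', grad_err, ' max Hessian error:', hess_err)
```

Both errors should be tiny compared with the entries themselves if the derivatives are right.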

In [None]:
import numpy as np

In [None]:
def evalf(x):
  assert type(x) is np.ndarray
  assert len(x)==2
  return (((x[0]**2)+4)**(1/2))+(((x[1]**2)+4)**(1/2))

In [None]:
def evalg(x):
  assert type(x) is np.ndarray
  assert len(x)==2
  return np.array([x[0]/(x[0]**2+4)**(1/2),x[1]/(x[1]**2+4)**(1/2)])

In [None]:
evalg(np.array([1,2]))

array([0.4472136 , 0.70710678])

In [None]:
def evalh(x):
  assert type(x) is np.ndarray
  assert len(x)==2
  return np.array([[4/(x[0]**2+4)**(3/2),0],[0,4/(x[1]**2+4)**(3/2)]])

In [None]:
evalh(np.array([1,2]))

array([[0.35777088, 0.        ],
       [0.        , 0.1767767 ]])

In [None]:
def d_k(x):
  assert type(x) is np.ndarray
  assert len(x) == 2
  if np.linalg.det(evalh(x)) == 0:
    print('determinant of hessian matrix at point',x,' =',np.linalg.det(evalh(x)))
    raise ValueError('Inverse does not exist. Please check!')
  return np.linalg.inv(evalh(x))

In [None]:
def compute_steplength_backtracking_scaled_direction(x, gradf, alpha_start, rho, gamma):
  assert type(x) is np.ndarray and len(x) == 2
  assert type(gradf) is np.ndarray and len(gradf) == 2
  assert type(alpha_start) is float and alpha_start>=0.
  assert type(rho) is float and rho>=0.
  assert type(gamma) is float and gamma>=0.
  alpha = alpha_start
  D_k = d_k(x) #scaling matrix: inverse of the Hessian at x
  p = -np.matmul(D_k, gradf) #scaled descent direction
  f_x = evalf(x)
  slope = np.dot(gradf, p) #directional derivative of f along p
  #shrink alpha until the sufficient decrease (Armijo) condition holds
  while evalf(x + alpha*p) > f_x + gamma*alpha*slope:
    alpha = alpha*rho
  return alpha


In [None]:
BACKTRACKING_LINE_SEARCH = 1
CONSTANT_STEP_LENGTH = 2

In [None]:

def find_minimizer_gdscaling(start_x, tol, line_search_type,*args):
  #Input: start_x is a numpy array of size 2, tol denotes the tolerance and is a positive float value
  assert type(start_x) is np.ndarray and len(start_x) == 2 #do not allow arbitrary arguments 
  assert type(tol) is float and tol>=0 
  x = start_x
  g_x = evalg(x)

  #initialization for backtracking line search
  if(line_search_type == BACKTRACKING_LINE_SEARCH):
    alpha_start = args[0]
    rho = args[1]
    gamma = args[2]
    #print('Params for Backtracking LS: alpha start:', alpha_start, 'rho:', rho,' gamma:', gamma)

  k = 0
  while (np.linalg.norm(g_x) > tol): #continue as long as the norm of gradient is not close to zero upto a tolerance tol
    print('iter:',k, ' x:', x, ' f(x):', evalf(x), ' grad at x:', g_x, ' gradient norm:', np.linalg.norm(g_x))
    D_k = d_k(x)
    if line_search_type == BACKTRACKING_LINE_SEARCH:
      
      step_length = compute_steplength_backtracking_scaled_direction(x, g_x, alpha_start, rho, gamma) #call the new function you wrote to compute the steplength
      #raise ValueError('BACKTRACKING LINE SEARCH NOT YET IMPLEMENTED')
    elif line_search_type == CONSTANT_STEP_LENGTH: #do a gradient descent with constant step length
      step_length = 1.0
    else:  
      raise ValueError('Line search type unknown. Please check!')
    
    #implement the scaled gradient descent step here
    x = np.subtract(x, np.multiply(step_length,np.matmul(D_k, g_x))) #update x = x - step_length*D_k*g_x
    k += 1 #increment iteration
    g_x = evalg(x) #compute gradient at new point
  return x, k

#Que.2

In [None]:
my_start_x = np.array([2.0,2.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gdscaling(my_start_x, my_tol,BACKTRACKING_LINE_SEARCH,1.0,0.5,0.5)
#print(x_opt_bls)

print('\n\nminimizer of the function by using backtracking line search with scaling=',x_opt_bls)
print('the number of iterations using backtracking line search computation with scaling=',k)
print('minimum value of the function by using backtracking line search with scaling=',evalf(x_opt_bls))

iter: 0  x: [2. 2.]  f(x): 5.656854249492381  grad at x: [0.70710678 0.70710678]  gradient norm: 0.9999999999999999


minimizer of the function by using backtracking line search with scaling= [0. 0.]
the number of iterations using backtracking line search computation with scaling= 1
minimum value of the function by using backtracking line search with scaling= 4.0


In [None]:
my_start_x = np.array([2.0,2.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gdscaling(my_start_x, my_tol,CONSTANT_STEP_LENGTH)
#print(x_opt_bls)

print('\n\nminimizer of the function by using constant step length with scaling=',x_opt_bls)
print('the number of iterations using constant step length with scaling=',k)
print('minimum value of the function by using constant step length with scaling=',evalf(x_opt_bls))

iter: 0  x: [2. 2.]  f(x): 5.656854249492381  grad at x: [0.70710678 0.70710678]  gradient norm: 0.9999999999999999
iter: 1  x: [-2. -2.]  f(x): 5.656854249492381  grad at x: [-0.70710678 -0.70710678]  gradient norm: 0.9999999999999999
iter: 2  x: [2. 2.]  f(x): 5.656854249492381  grad at x: [0.70710678 0.70710678]  gradient norm: 0.9999999999999999
iter: 3  x: [-2. -2.]  f(x): 5.656854249492381  grad at x: [-0.70710678 -0.70710678]  gradient norm: 0.9999999999999999
iter: 4  x: [2. 2.]  f(x): 5.656854249492381  grad at x: [0.70710678 0.70710678]  gradient norm: 0.9999999999999999
iter: 5  x: [-2. -2.]  f(x): 5.656854249492381  grad at x: [-0.70710678 -0.70710678]  gradient norm: 0.9999999999999999
iter: 6  x: [2. 2.]  f(x): 5.656854249492381  grad at x: [0.70710678 0.70710678]  gradient norm: 0.9999999999999999
iter: 7  x: [-2. -2.]  f(x): 5.656854249492381  grad at x: [-0.70710678 -0.70710678]  gradient norm: 0.9999999999999999
iter: 8  x: [2. 2.]  f(x): 5.656854249492381  grad at x:

KeyboardInterrupt: ignored

#Observations:

With a constant step length, the iterate x oscillates between $[2,2]$ and $[-2,-2]$, so the method does not converge to the minimizer. Newton's method with backtracking line search terminates after 1 iteration, with minimizer $[0,0]$ and minimum value 4.
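
One way to see why the unit step cycles, derived here from the gradient and Hessian formulas given at the top rather than taken from the exercise statement: the Hessian is diagonal, so a Newton step with step length 1 updates each coordinate independently,

$$x_i^{+}=x_i-\frac{(x_i^2+4)^{3/2}}{4}\cdot\frac{x_i}{\sqrt{x_i^2+4}}=x_i-\frac{x_i(x_i^2+4)}{4}=-\frac{x_i^3}{4}$$

so $|x_i^{+}|=|x_i|^3/4$. This equals $|x_i|$ exactly when $|x_i|=2$, which gives the cycle $2 \to -2 \to 2$ seen in the log. The magnitude shrinks when $|x_i|<2$ and grows when $|x_i|>2$.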

#Que.3

In [None]:
def compute_steplength_backtracking(x, gradf, alpha_start, rho, gamma): #add appropriate arguments to the function 
  assert type(x) is np.ndarray and len(x) == 2
  assert type(gradf) is np.ndarray and len(gradf) == 2 
  assert type(alpha_start) is float and alpha_start>=0. 
  assert type(rho) is float and rho>=0.
  assert type(gamma) is float and gamma>=0. 
  
  alpha = alpha_start
  #backtracking line search: shrink alpha until the sufficient decrease (Armijo) condition holds
  while evalf(x - alpha*gradf) > evalf(x) - gamma*alpha*np.dot(gradf, gradf):
    alpha = rho*alpha
  return alpha
  

In [None]:
def find_minimizer_gd(start_x, tol, line_search_type,*args):
  #Input: start_x is a numpy array of size 2, tol denotes the tolerance and is a positive float value
  assert type(start_x) is np.ndarray and len(start_x) == 2 #do not allow arbitrary arguments 
  assert type(tol) is float and tol>=0 
  x = start_x
  g_x = evalg(x)
  #initialization for backtracking line search
  if(line_search_type == BACKTRACKING_LINE_SEARCH):
    alpha_start = args[0]
    rho = args[1]
    gamma = args[2]
    print('Params for Backtracking LS: alpha start:', alpha_start, 'rho:', rho,' gamma:', gamma)

  k = 0
  #print('iter:',k, ' x:', x, ' f(x):', evalf(x), ' grad at x:', g_x, ' gradient norm:', np.linalg.norm(g_x))

  while (np.linalg.norm(g_x) > tol): #continue as long as the norm of gradient is not close to zero upto a tolerance tol
  

    if line_search_type == BACKTRACKING_LINE_SEARCH:
      step_length = compute_steplength_backtracking(x,g_x, alpha_start,rho, gamma) #call the new function you wrote to compute the steplength
      #raise ValueError('BACKTRACKING LINE SEARCH NOT YET IMPLEMENTED')
    elif line_search_type == CONSTANT_STEP_LENGTH: #do a gradient descent with constant step length
      step_length = 1
    else:  
      raise ValueError('Line search type unknown. Please check!')
    
    #implement the gradient descent steps here   
    x = np.subtract(x, np.multiply(step_length,g_x)) #update x = x - step_length*g_x
    k += 1 #increment iteration
    g_x = evalg(x) #compute gradient at new point

    #print('iter:',k, ' x:', x, ' f(x):', evalf(x), ' grad at x:', g_x, ' gradient norm:', np.linalg.norm(g_x))
  return x ,k  


In [None]:
my_start_x = np.array([2.0,2.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gd(my_start_x, my_tol,BACKTRACKING_LINE_SEARCH,1.0,0.5,0.5)
#print(x_opt_bls)

print('\n\nminimizer of the function by using backtracking line search without scaling=',x_opt_bls)
print('the number of iterations using backtracking line search computation without scaling=',k)
print('minimum value of the function by using backtracking line search without scaling=',evalf(x_opt_bls))

Params for Backtracking LS: alpha start: 1.0 rho: 0.5  gamma: 0.5


minimizer of the function by using backtracking line search without scaling= [7.62525638e-10 7.62525638e-10]
the number of iterations using backtracking line search computation without scaling= 32
minimum value of the function by using backtracking line search without scaling= 4.0


#Observations:
In the previous question, Newton's method with a constant step length does not converge (x oscillates between $[2,2]$ and $[-2,-2]$), while backtracking line search with scaling terminates in 1 iteration. In this question, backtracking line search without scaling takes 32 iterations to terminate. From the starting point $[2,2]$, backtracking with scaling is therefore much faster than backtracking without scaling.

****
$\textbf{Constant step length line search in Newton's Method :}$

Here x does not converge to the minimizer; it oscillates between $[2,2]$ and $[-2,-2]$.
****
$\textbf{Backtracking line search with scaling in Newton's Method:}$

Minimizer: $[0.0,0.0]$

Minimum function value: $4.0 $

No. of Iterations: $1$
****
****
$\textbf{Backtracking line search without scaling:}$

Minimizer: $[7.62525638e-10,7.62525638e-10]$

Minimum function value: $4.0 $

No. of Iterations: $32$
****
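
A rough check of the 32-iteration count, as a sketch under an assumption that is not logged in the run above: that the backtracking search accepts the initial step length 1.0 at every iteration. Both coordinates stay equal throughout, so the 1D recursion below follows a single coordinate. The name `grad_1d` is introduced here for illustration only.

```python
import numpy as np

def grad_1d(x):
    # derivative of sqrt(x^2 + 4), i.e. one coordinate of the gradient of f
    return x / np.sqrt(x**2 + 4.0)

x = 2.0        # same starting coordinate as the run above
ratios = []    # x_{k+1} / x_k at every iteration
for _ in range(32):
    x_new = x - 1.0 * grad_1d(x)   # assumed: unit step accepted every time
    ratios.append(x_new / x)
    x = x_new

print('first ratio:', ratios[0], ' last ratio:', ratios[-1], ' final x:', x)
```

The second derivative of $\sqrt{x^2+4}$ at 0 is $1/2$, so once x is small each unit step roughly halves the distance to the minimizer. That is linear convergence, which fits the tens of iterations needed here to reach a gradient norm of $10^{-9}$.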


#Que.4

In [None]:
my_start_x = np.array([8.0,8.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gdscaling(my_start_x, my_tol,CONSTANT_STEP_LENGTH)
#print(x_opt_bls)

print('\n\nminimizer of the function by using constant step length with scaling=',x_opt_bls)
print('the number of iterations using constant step length with scaling=',k)
print('minimum value of the function by using constant step length with scaling=',evalf(x_opt_bls))

iter: 0  x: [8. 8.]  f(x): 16.492422502470642  grad at x: [0.9701425 0.9701425]  gradient norm: 1.3719886811400708
iter: 1  x: [-128. -128.]  f(x): 256.03124809288414  grad at x: [-0.99987795 -0.99987795]  gradient norm: 1.414040960485301
iter: 2  x: [524288. 524288.]  f(x): 1048576.0000076294  grad at x: [1. 1.]  gradient norm: 1.4142135623628054
iter: 3  x: [-3.6028797e+16 -3.6028797e+16]  f(x): 7.205759403792794e+16  grad at x: [-1. -1.]  gradient norm: 1.4142135623730951
iter: 4  x: [1.16920131e+49 1.16920131e+49]  f(x): 2.3384026197294447e+49  grad at x: [1. 1.]  gradient norm: 1.4142135623730951
iter: 5  x: [-3.99583814e+146 -3.99583814e+146]  f(x): 7.99167628880894e+146  grad at x: [-1. -1.]  gradient norm: 1.4142135623730951
determinant of hessian matrix at point [-3.99583814e+146 -3.99583814e+146]  = 0.0




ValueError: ignored

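We get this error because we use $\mathbf{D}^k = (\nabla^2f(\mathbf{x}))^{-1}$, which requires the Hessian to be non-singular. Mathematically, the Hessian of $f$ is positive definite at every point. However, from the starting point $[8.0,8.0]$ the constant-step iterates do not converge: their magnitude grows at every iteration, as the log above shows. At $x \approx [-4\times10^{146},-4\times10^{146}]$ the term $(x_i^2+4)^{3/2}$ overflows in floating point, the Hessian entries $\frac{4}{(x_i^2+4)^{3/2}}$ evaluate to 0, and the computed determinant is 0.0, so `d_k` raises the `ValueError`.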
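
The floating-point effect behind the zero determinant can be reproduced in isolation. This is a minimal sketch, assuming float64 arithmetic as in the run above; the variable names `denom` and `h_diag` are introduced here for illustration.

```python
import numpy as np

# point printed at iter 5 of the constant step length run above
x = np.array([-3.99583814e146, -3.99583814e146])

with np.errstate(over='ignore'):
    # (x^2 + 4)^(3/2) exceeds the largest float64 (about 1.8e308) and becomes inf
    denom = (x**2 + 4.0)**1.5

h_diag = 4.0 / denom                  # diagonal Hessian entries: 4 / inf = 0.0
det = np.linalg.det(np.diag(h_diag))  # determinant of the computed Hessian

print('denominator:', denom)
print('Hessian diagonal:', h_diag, ' determinant:', det)
```

The exact Hessian entries at this point are positive but far smaller than any representable float64 value, so the matrix is singular only as computed, not mathematically.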

In [None]:
my_start_x = np.array([8.0,8.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gdscaling(my_start_x, my_tol,BACKTRACKING_LINE_SEARCH,1.0,0.5,0.5)
#print(x_opt_bls)

print('\n\nminimizer of the function by using backtracking line search with scaling=',x_opt_bls)
print('the number of iterations using backtracking line search computation with scaling=',k)
print('minimum value of the function by using backtracking line search with scaling=',evalf(x_opt_bls))

iter: 0  x: [8. 8.]  f(x): 16.492422502470642  grad at x: [0.9701425 0.9701425]  gradient norm: 1.3719886811400708
iter: 1  x: [-0.5 -0.5]  f(x): 4.123105625617661  grad at x: [-0.24253563 -0.24253563]  gradient norm: 0.3429971702850177
iter: 2  x: [-0.234375 -0.234375]  f(x): 4.027372165879384  grad at x: [-0.11639103 -0.11639103]  gradient norm: 0.16460177506779788
iter: 3  x: [-0.11557817 -0.11557817]  f(x): 4.006673590120265  grad at x: [-0.05769283 -0.05769283]  gradient norm: 0.08158998647858538
iter: 4  x: [-0.0575961 -0.0575961]  f(x): 4.001658311393147  grad at x: [-0.02878611 -0.02878611]  gradient norm: 0.04070971277400298
iter: 5  x: [-0.02877417 -0.02877417]  f(x): 4.000413954866833  grad at x: [-0.01438559 -0.01438559]  gradient norm: 0.020344301811841013
iter: 6  x: [-0.0143841 -0.0143841]  f(x): 4.000103449894273  grad at x: [-0.00719187 -0.00719187]  gradient norm: 0.010170834833290178
iter: 7  x: [-0.00719168 -0.00719168]  f(x): 4.000025860048939  grad at x: [-0.00359

Here, backtracking line search with scaling takes 13 iterations to terminate; the minimizer is approximately $[0,0]$ and the optimal value is 4.0.
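
The log above prints the iterates but not the step lengths that the line search accepts. The sketch below recomputes them for the first three iterations with the same parameters (alpha_start 1.0, rho 0.5, gamma 0.5). It is a standalone re-implementation for illustration, not the notebook's own functions: `newton_direction` and `backtrack` are hypothetical names, and the inverse Hessian is applied in closed form because the Hessian is diagonal.

```python
import numpy as np

def f(x):
    return np.sum(np.sqrt(x**2 + 4.0))

def grad(x):
    return x / np.sqrt(x**2 + 4.0)

def newton_direction(x):
    # -H^{-1} g, using the diagonal Hessian entries 4 / (x_i^2 + 4)^(3/2)
    return -grad(x) * (x**2 + 4.0)**1.5 / 4.0

def backtrack(x, d, alpha_start=1.0, rho=0.5, gamma=0.5):
    # shrink alpha until the sufficient decrease (Armijo) condition holds
    alpha = alpha_start
    slope = grad(x) @ d
    while f(x + alpha * d) > f(x) + gamma * alpha * slope:
        alpha *= rho
    return alpha

x = np.array([8.0, 8.0])
alphas = []
for _ in range(3):
    d = newton_direction(x)
    alpha = backtrack(x, d)
    alphas.append(alpha)
    x = x + alpha * d

print('accepted step lengths:', alphas)
print('x after 3 iterations:', x)
```

In these first iterations the full Newton step (alpha = 1) is rejected each time, so the iterates move only part of the way along the Newton direction. This matches the roughly halving values of x in the log.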

#Que.5

In [None]:
my_start_x = np.array([8.0,8.0])
alpha_start = 1.0
rho = 0.5
gamma = 0.5
my_tol= 1e-9

x_opt_bls, k = find_minimizer_gd(my_start_x, my_tol,BACKTRACKING_LINE_SEARCH,1.0,0.5,0.5)
#print(x_opt_bls)

print('\n\nminimizer of the function by using backtracking line search without scaling=',x_opt_bls)
print('the number of iterations using backtracking line search computation without scaling=',k)
print('minimum value of the function by using backtracking line search without scaling=',evalf(x_opt_bls))

Params for Backtracking LS: alpha start: 1.0 rho: 0.5  gamma: 0.5


minimizer of the function by using backtracking line search without scaling= [8.3177047e-10 8.3177047e-10]
the number of iterations using backtracking line search computation without scaling= 39
minimum value of the function by using backtracking line search without scaling= 4.0


$\textbf{Backtracking line search with scaling in Newton's Method:}$

Minimizer: $[2.83764947e-12,2.83764947e-12]$

Minimum function value: $ 4.0 $

No. of Iterations: $13$
****
$\textbf{Backtracking line search (without scaling) in Gradient Descent Method:}$

Minimizer: $[8.3177047e-10,8.3177047e-10]$

Minimum function value: $ 4.0 $

No. of Iterations: $39$
****

Here, the major difference is the number of iterations each method takes to terminate from the starting point $[8.0,8.0]$: backtracking line search without scaling takes 39 iterations, while backtracking line search with scaling takes only 13.