Here, the function under consideration is:

$$f(\mathbf{x}) = 100x_1^2 + 0.001x_2^4.$$

$$\Rightarrow \ \text{Hessian} = \nabla^2 f(\mathbf{x}) =
\begin{bmatrix}
  f_{x_1x_1}(\mathbf{x}) & f_{x_1x_2}(\mathbf{x}) \\
  f_{x_2x_1}(\mathbf{x}) & f_{x_2x_2}(\mathbf{x})
\end{bmatrix}
=
\begin{bmatrix}
  200 & 0 \\ 0 & 0.012x_2^2
\end{bmatrix}$$
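For reference, the gradient implemented in `evalg` and the resulting full Newton step can be worked out by hand. This assumes $x_2^k \neq 0$, so that the Hessian is invertible:

$$
\nabla f(\mathbf{x}) = \begin{bmatrix} 200x_1 \\ 0.004x_2^3 \end{bmatrix},
\qquad
\mathbf{x}^{k+1} = \mathbf{x}^k - \left[\nabla^2 f(\mathbf{x}^k)\right]^{-1}\nabla f(\mathbf{x}^k)
= \begin{bmatrix} x_1^k - \dfrac{200x_1^k}{200} \\ x_2^k - \dfrac{0.004(x_2^k)^3}{0.012(x_2^k)^2} \end{bmatrix}
= \begin{bmatrix} 0 \\ \tfrac{2}{3}x_2^k \end{bmatrix}.
$$

A unit Newton step therefore solves the $x_1$-component exactly in one iteration, while $x_2$ only shrinks by a factor of $2/3$ per iteration. This is linear rather than quadratic convergence, because the Hessian $\mathrm{diag}(200, 0)$ is singular at the minimizer $(0, 0)$.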

In [61]:
import numpy as np 
 
#Hessian of f: diag(200, 0.012*x2^2); it is singular whenever x2 == 0
def evalh(x): 
  assert type(x) is np.ndarray 
  assert len(x) == 2 
  return np.array([[200, 0], [0, 0.012*x[1]**2]])

In [62]:
def evalf(x):  
  #Input: x is a numpy array of size 2 
  assert type(x) is np.ndarray and len(x) == 2 #do not allow arbitrary arguments 
  #after checking if the argument is valid, we can compute the objective function value
  #compute the function value and return it 
  return 100*x[0]**2 + 0.001*x[1]**4

In [63]:
def evalg(x):  
  #Input: x is a numpy array of size 2 
  assert type(x) is np.ndarray and len(x) == 2 #do not allow arbitrary arguments 
  #after checking if the argument is valid, we can compute the gradient value
  #compute the gradient value and return it 
  return np.array([200*x[0], 0.004*x[1]**3])
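A quick way to guard against sign or coefficient slips in `evalg` and `evalh` is to compare them with central finite differences. The standalone sketch below restates the same formulas under hypothetical names (`grad`, `hess`, `fd_grad`, `fd_hess`). The step size `h = 1e-6` and the test point are arbitrary assumptions.

```python
import numpy as np

def f(x):
    return 100.0 * x[0]**2 + 0.001 * x[1]**4

def grad(x):
    # analytic gradient (same formula as evalg)
    return np.array([200.0 * x[0], 0.004 * x[1]**3])

def hess(x):
    # analytic Hessian (same formula as evalh)
    return np.array([[200.0, 0.0], [0.0, 0.012 * x[1]**2]])

def fd_grad(fun, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

def fd_hess(gradfun, x, h=1e-6):
    """Central-difference approximation of the Hessian, column by column, from the gradient."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x)
        e[i] = h
        H[:, i] = (gradfun(x + e) - gradfun(x - e)) / (2 * h)
    return H

x_test = np.array([0.7, -1.3])
grad_ok = np.allclose(grad(x_test), fd_grad(f, x_test), rtol=1e-6, atol=1e-6)
hess_ok = np.allclose(hess(x_test), fd_hess(grad, x_test), rtol=1e-6, atol=1e-6)
print(grad_ok, hess_ok)
```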

In [64]:
def compute_D_k(x):
  assert type(x) is np.ndarray
  assert len(x) == 2
  H = evalh(x)
  if np.linalg.det(H) == 0:  #the Hessian is singular when x2 == 0
    raise ValueError('Hessian is singular (zero determinant), so its inverse does not exist. Please check!')
  return np.linalg.inv(H)  #computing inverse of Hessian.
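Forming the explicit inverse and testing `det == 0` works for this 2×2 diagonal Hessian. However, an exact zero test does not catch a nearly singular Hessian as $x_2 \to 0$. A common alternative is to solve $\nabla^2 f(\mathbf{x})\,\mathbf{d} = -\nabla f(\mathbf{x})$ for the Newton direction with `np.linalg.solve`, and to flag (near-)singularity with `np.linalg.cond`. The sketch below is not the notebook's own method. Its helper names are illustrative, and the cutoff of $10^{12}$ is an arbitrary, hypothetical choice.

```python
import numpy as np

def hess(x):
    # Hessian of f(x) = 100*x1^2 + 0.001*x2^4
    return np.array([[200.0, 0.0], [0.0, 0.012 * x[1]**2]])

def grad(x):
    return np.array([200.0 * x[0], 0.004 * x[1]**3])

def newton_direction(x, cond_limit=1e12):
    """Solve H(x) d = -g(x) instead of forming inv(H(x)) explicitly."""
    H = hess(x)
    # the condition number grows without bound as H approaches singularity
    if np.linalg.cond(H) > cond_limit:
        raise ValueError(f'Hessian is singular or ill-conditioned at x = {x}')
    return np.linalg.solve(H, -grad(x))

d = newton_direction(np.array([1.0, 1.0]))
print(d)

# At x2 = 0 the Hessian is diag(200, 0), which is singular
try:
    newton_direction(np.array([1.0, 0.0]))
    singular_detected = False
except ValueError:
    singular_detected = True
```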

In [65]:
def compute_steplength_backtracking_Newton_method(x, gradf, alpha_start, rho, gamma):
  assert type(x) is np.ndarray and len(x) == 2 
  assert type(gradf) is np.ndarray and len(gradf) == 2 
  assert type(alpha_start) is float and alpha_start>0. 
  assert type(rho) is float and 0.<rho<1.
  assert type(gamma) is float and 0.<gamma<1. 
  alpha = alpha_start
  D_k = compute_D_k(x)                #inverse Hessian at x
  direction = -np.matmul(D_k, gradf)  #Newton direction d = -D_k*gradf
  f_x = evalf(x)
  #Armijo condition: shrink alpha until f(x + alpha*d) <= f(x) + gamma*alpha*gradf^T d
  while evalf(x + alpha*direction) > f_x + gamma*alpha*np.dot(gradf, direction):
    alpha = rho*alpha
  return alpha

In [66]:
#line search type 
CONSTANT_STEP_LENGTH = 1
BACKTRACKING_LINE_SEARCH = 2
BACKTRACKING_LINE_SEARCH_SCALING = 3

In [67]:
#Newton's method (with a unit step or with backtracking line search) to find the minimizer
 
def find_minimizer_Newton_method(start_x, tol, line_search_type, *args):
  #Input: start_x is a numpy array of size 2, tol denotes the tolerance and is a positive float value
  assert type(start_x) is np.ndarray and len(start_x) == 2 #do not allow arbitrary arguments 
  assert type(tol) is float and tol>=0 
  x = start_x
  g_x = evalg(x)
 
  #initialization for backtracking line search
  if(line_search_type == BACKTRACKING_LINE_SEARCH):
    alpha_start = args[0]
    rho = args[1]
    gamma = args[2]
    #print('Params for Backtracking LS: alpha start:', alpha_start, 'rho:', rho,' gamma:', gamma)
 
  k = 0
  while (np.linalg.norm(g_x) > tol): #continue as long as the norm of gradient is not close to zero up to a tolerance tol
    D_k = compute_D_k(x)  #inverse Hessian at the current iterate
    if line_search_type == CONSTANT_STEP_LENGTH: #pure Newton step (unit step length)
      step_length = 1.0
    elif line_search_type == BACKTRACKING_LINE_SEARCH:
      step_length = compute_steplength_backtracking_Newton_method(x, g_x, alpha_start, rho, gamma)
    else:  
      raise ValueError('Line search type unknown. Please check!')    
    #Newton update: x = x - step_length*D_k*g_x
    x = np.subtract(x, np.multiply(step_length, np.matmul(D_k, g_x)))
    k += 1 #increment iteration
    g_x = evalg(x) #compute gradient at new point
  return x, k, evalf(x)

In [68]:
my_start_x = np.array([1.0,1.0])
my_tol= 1e-9

**3. ANSWER :**

In [69]:
print("For Newton's Method with CONSTANT_STEP_LENGTH procedure :")
x_opt, k, f_value = find_minimizer_Newton_method(my_start_x, my_tol, CONSTANT_STEP_LENGTH)
print("Minimizer = ",x_opt,",  Iteration = ",k,",  Minimum function value = ", f_value) 
 
print("\nFor Newton's Method with BACKTRACKING_LINE_SEARCH :")
x_opt_bls, k, f_value = find_minimizer_Newton_method(my_start_x, my_tol, BACKTRACKING_LINE_SEARCH, 1.0, 0.5,0.5)
print("Minimizer = ",x_opt_bls,",  Iteration = ", k , ",  Minimum function value = ",f_value)

For Newton's Method with CONSTANT_STEP_LENGTH procedure :
Minimizer =  [0.         0.00513823] ,  Iteration =  13 ,  Minimum function value =  6.970349091039817e-13

For Newton's Method with BACKTRACKING_LINE_SEARCH :
Minimizer =  [0.         0.00513823] ,  Iteration =  13 ,  Minimum function value =  6.970349091039817e-13


**COMMENTS :** \
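**a) The number of iterations is the same (13) in both cases. With the backtracking search started at $\alpha = 1$, the full Newton step already satisfies the Armijo condition at every iteration. The line search therefore never shortens the step, and both variants generate identical iterates.** \

**b) Consequently, both variants return the same approximate minimizer (0, 0.00513823) and the same minimum function value 6.970349091039817e-13. The exact minimizer is (0, 0), where f = 0.**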
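The count of 13 iterations follows from the closed-form Newton update for this function. The first step sends $x_1$ to 0, and afterwards $x_2^{k+1} = \tfrac{2}{3}x_2^k$. The self-contained sketch below restates $f$ and its gradient under the hypothetical names `f` and `grad`. It re-runs the full Newton step with the same tolerance and records whether the Armijo test with $\gamma = 0.5$ accepts $\alpha = 1$ at every iteration. This is meant as a consistency check, not as the notebook's own code.

```python
import numpy as np

def f(x):
    return 100.0 * x[0]**2 + 0.001 * x[1]**4

def grad(x):
    return np.array([200.0 * x[0], 0.004 * x[1]**3])

x = np.array([1.0, 1.0])
tol, gamma = 1e-9, 0.5
k = 0
full_step_always_accepted = True
while np.linalg.norm(grad(x)) > tol:
    g = grad(x)
    # Newton direction -H^{-1} g for H = diag(200, 0.012*x2^2)
    d = np.array([-g[0] / 200.0, -g[1] / (0.012 * x[1]**2)])
    # Armijo test at alpha = 1, the starting step of the backtracking search
    if f(x + d) > f(x) + gamma * np.dot(g, d):
        full_step_always_accepted = False
    x = x + d  # take the full Newton step
    k += 1

print(k, x, full_step_always_accepted)
```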

**4. SOLUTION :**

In [70]:
def compute_D_k_diagonal(x):
  assert type(x) is np.ndarray
  assert len(x) == 2
  #D_k = diag(1/f_x1x1, 1/f_x2x2)
  return np.array([[1/200, 0],[0, 1/(0.012*x[1]**2)]])    ## the Hessian of this f is diagonal, so this equals the inverse Hessian.
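The mixed second derivatives of this $f$ are zero, so the diagonal scaling in `compute_D_k_diagonal` is exactly the inverse Hessian. Gradient descent with this scaling therefore takes the same steps as Newton's method. The standalone check below uses the hypothetical helper names `hess` and `diagonal_scaling` to confirm this numerically. For contrast, it also shows that the two differ for an assumed matrix with non-zero off-diagonal entries.

```python
import numpy as np

def hess(x):
    # Hessian of f(x) = 100*x1^2 + 0.001*x2^4
    return np.array([[200.0, 0.0], [0.0, 0.012 * x[1]**2]])

def diagonal_scaling(H):
    # D = diag(1/H_11, 1/H_22): inverts only the diagonal of H
    return np.diag(1.0 / np.diag(H))

H = hess(np.array([1.0, 0.5]))
same_for_f = np.allclose(diagonal_scaling(H), np.linalg.inv(H))

# An assumed matrix with non-zero off-diagonal entries, for contrast
H_coupled = np.array([[2.0, 1.0], [1.0, 2.0]])
same_for_coupled = np.allclose(diagonal_scaling(H_coupled), np.linalg.inv(H_coupled))

print(same_for_f, same_for_coupled)
```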

In [71]:
#Compute the step length by backtracking line search without scaling (search direction -gradf).
def compute_steplength_backtracking(x, gradf, alpha_start, rho, gamma):
  assert type(x) is np.ndarray and len(x) == 2 
  assert type(gradf) is np.ndarray and len(gradf) == 2 
  
  alpha = alpha_start
  f_x = evalf(x)
  #Armijo condition: shrink alpha until f(x - alpha*gradf) <= f(x) - gamma*alpha*gradf^T gradf
  while evalf(np.add(x,-alpha*gradf)) > f_x - gamma*alpha*np.dot(gradf, gradf):
    alpha = rho*alpha
  return alpha

In [72]:
def compute_steplength_backtracking_scaling(x, gradf, direction, alpha_start, rho, gamma):
  assert type(x) is np.ndarray and len(x) == 2 
  assert type(gradf) is np.ndarray and len(gradf) == 2 
  assert type(direction) is np.ndarray and len(direction) == 2 
  assert type(alpha_start) is float and alpha_start>0. 
  assert type(rho) is float and 0.<rho<1.
  assert type(gamma) is float and 0.<gamma<1. 
  alpha = alpha_start
  f_x = evalf(x)
  #direction = -D_k*gradf is computed by the caller
  #Armijo condition: shrink alpha until f(x + alpha*direction) <= f(x) + gamma*alpha*gradf^T direction
  while evalf(np.add(x,alpha*direction)) > f_x + gamma*alpha*np.dot(gradf, direction):
    alpha = rho*alpha
  return alpha
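The backtracking routines in this notebook apply the same Armijo test and differ only in the search direction they use. A single helper that takes the direction as an argument could serve all of them. The sketch below uses the hypothetical name `armijo_backtracking` and the same parameters as the runs in this notebook ($\alpha_{start} = 1$, $\rho = 0.5$, $\gamma = 0.5$). At the starting point $(1, 1)$ it compares the step accepted along the Newton (inverse-Hessian-scaled) direction with the step accepted along the plain negative gradient.

```python
import numpy as np

def f(x):
    return 100.0 * x[0]**2 + 0.001 * x[1]**4

def grad(x):
    return np.array([200.0 * x[0], 0.004 * x[1]**3])

def armijo_backtracking(fun, x, g, d, alpha_start=1.0, rho=0.5, gamma=0.5):
    """Shrink alpha by rho until fun(x + alpha*d) <= fun(x) + gamma*alpha*g^T d."""
    slope = np.dot(g, d)
    if slope >= 0:
        raise ValueError('d is not a descent direction')
    alpha = alpha_start
    f_x = fun(x)
    while fun(x + alpha * d) > f_x + gamma * alpha * slope:
        alpha *= rho
    return alpha

x0 = np.array([1.0, 1.0])
g0 = grad(x0)
d_newton = -np.array([g0[0] / 200.0, g0[1] / (0.012 * x0[1]**2)])  # -H^{-1} g
d_steepest = -g0                                                    # -g

alpha_newton = armijo_backtracking(f, x0, g0, d_newton)
alpha_steepest = armijo_backtracking(f, x0, g0, d_steepest)
print(alpha_newton, alpha_steepest)
```

The Newton direction accepts the full step, while the unscaled direction has to be halved until $\alpha = 1/256$. This reflects how the large curvature along $x_1$ restricts the step length when no scaling is used.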

In [73]:
#complete the code for gradient descent to find the minimizer
def find_minimizer_gd(start_x, tol, line_search_type, *args):
  #Input: start_x is a numpy array of size 2, tol denotes the tolerance and is a positive float value
  assert type(start_x) is np.ndarray and len(start_x) == 2 #do not allow arbitrary arguments 
  assert type(tol) is float and tol>=0 
  x = start_x
  g_x = evalg(x)
 
  #initialization for backtracking line search
  if(line_search_type == BACKTRACKING_LINE_SEARCH):
    alpha_start = args[0]
    rho = args[1]
    gamma = args[2]
    #print('Params for Backtracking LS: alpha start:', alpha_start, 'rho:', rho,' gamma:', gamma)
 
  k = 0
  #print('iter:',k, ' x:', x, ' f(x):', evalf(x), ' grad at x:', g_x, ' gradient norm:', np.linalg.norm(g_x))
 
  while (np.linalg.norm(g_x) > tol): #continue as long as the norm of gradient is not close to zero up to a tolerance tol
  
    if line_search_type == BACKTRACKING_LINE_SEARCH:
      step_length = compute_steplength_backtracking(x,g_x, alpha_start,rho, gamma) #step length from backtracking line search
    elif line_search_type == CONSTANT_STEP_LENGTH: #do a gradient descent with constant step length
      #note: for this f, x1 <- (1 - 200*step_length)*x1, so a constant step must be below 0.01 to converge
      step_length = 0.1
    else:  
      raise ValueError('Line search type unknown. Please check!')
    
    #gradient descent update
    x = np.subtract(x, np.multiply(step_length,g_x)) #update x = x - step_length*g_x
    k += 1 #increment iteration
    g_x = evalg(x) #compute gradient at new point
 
    #print('iter:',k, ' x:', x, ' f(x):', evalf(x), ' grad at x:', g_x, ' gradient norm:', np.linalg.norm(g_x))
  return x , k , evalf(x)

In [74]:
#complete the code for gradient descent with scaling to find the minimizer
 
def find_minimizer_gdscaling(start_x, tol, line_search_type, *args):
  #Input: start_x is a numpy array of size 2, tol denotes the tolerance and is a positive float value
  assert type(start_x) is np.ndarray and len(start_x) == 2 #do not allow arbitrary arguments 
  assert type(tol) is float and tol>=0 
  x = start_x
  g_x = evalg(x)  
  D_k = compute_D_k_diagonal(x)

  #initialization for backtracking line search
  if(line_search_type == BACKTRACKING_LINE_SEARCH_SCALING):
    alpha_start = args[0]
    rho = args[1]
    gamma = args[2]
 
  k = 0
  while (np.linalg.norm(g_x) > tol):
    direction = -np.matmul(D_k, g_x)        # direction varies with each iteration.
    if line_search_type == BACKTRACKING_LINE_SEARCH_SCALING:
      step_length = compute_steplength_backtracking_scaling(x, g_x, direction, alpha_start, rho, gamma)
    else:  
      raise ValueError('Line search type unknown. Please check!')
    #scaled gradient descent update
    x = np.add(x, np.multiply(step_length,direction)) #update x = x + step_length*direction
    k += 1 #increment iteration
    D_k = compute_D_k_diagonal(x)           # recompute the scaling matrix at the new point
    g_x = evalg(x) #compute gradient at new point
  return x , k , evalf(x)

**4. ANSWER :**

In [75]:
#check gradient descent with scaling and backtracking line search 
print("\nFor BACKTRACKING_LINE_SEARCH_WITH_SCALING:")
x_opt_bls, k, f_value = find_minimizer_gdscaling(my_start_x, my_tol, BACKTRACKING_LINE_SEARCH_SCALING, 1.0, 0.5,0.5)
print("Minimizer = ",x_opt_bls,",Iteration = ", k , ", Minimum function value = ",f_value)


For BACKTRACKING_LINE_SEARCH_WITH_SCALING:
Minimizer =  [0.         0.00513823] ,Iteration =  13 , Minimum function value =  6.970349091039817e-13


In [None]:
#check gradient descent without scaling and with backtracking line search 
print("\nFor BACKTRACKING_LINE_SEARCH WITHOUT_SCALING :")
x_opt_bls, k, f_value = find_minimizer_gd(my_start_x, my_tol, BACKTRACKING_LINE_SEARCH, 1.0, 0.5,0.5)
print("Minimizer = ",x_opt_bls,",Iteration = ", k , ", Minimum function value = ",f_value)


For BACKTRACKING_LINE_SEARCH WITHOUT_SCALING :
Minimizer =  [1.41173892e-12 6.21297178e-03] ,Iteration =  226638545 , Minimum function value =  1.4900386194863004e-12


From the previous question (Question 3), we have the following outputs for the two variants of Newton's Method: \
**1. For Newton's Method with CONSTANT_STEP_LENGTH procedure :** \
Minimizer =  [0. , 0.00513823] , 
Minimum function value =  6.970349091039817e-13, 
Iteration =  13 

**2. For Newton's Method with BACKTRACKING_LINE_SEARCH :** \
Minimizer =  [0.  , 0.00513823] , 
Minimum function value =  6.970349091039817e-13, 
Iteration =  13  \
 \\
Now, from Question 4, we have the following outputs:

**1. For gradient descent algorithm (with scaling using a diagonal matrix) with backtracking line search:** \
Minimizer =  [0.  ,  0.00513823] ,
Minimum function value =  6.970349091039817e-13,
Iteration =  13, \\
**2. For gradient descent algorithm (without scaling) with backtracking line search:** \
Minimizer =  [1.41173892e-12 6.21297178e-03] ,
Minimum function value =  1.4900386194863004e-12,
Iteration =  226638545 \\
 \\
**COMMENTS:** \\
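1. The number of iterations, the minimizer and the minimum function value are the same for both variants of Newton's Method and for gradient descent with diagonal scaling and backtracking line search. This is expected: the Hessian of $f$ is diagonal, so the diagonal scaling matrix $D_k = \mathrm{diag}(1/200,\ 1/(0.012x_2^2))$ is exactly the inverse Hessian, and the scaled gradient step coincides with the Newton step. \
2. In contrast, gradient descent without scaling, with backtracking line search, needs a far larger number of iterations (226638545). It also stops at a slightly different point, $(1.41\times 10^{-12},\ 6.21\times 10^{-3})$ with $f \approx 1.49\times 10^{-12}$. This is because each method terminates at whichever iterate first satisfies $\|\nabla f(\mathbf{x})\| \le 10^{-9}$; both points approximate the true minimizer $(0, 0)$, where $f = 0$. \
3. **So, scaling with a diagonal matrix, or applying Newton's Method, greatly accelerates convergence on this badly scaled function. The curvature along $x_1$ (200) is far larger than the curvature along $x_2$ ($0.012x_2^2$, which vanishes at the minimizer), and scaling compensates for this imbalance.**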
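A rough estimate shows why the unscaled method needs so many iterations. Assume, for simplicity, a constant step length $\alpha$, and assume the $x_1$-component has already become negligible. The unscaled update of $x_2$ and its approximate solution (valid while $0.004\,\alpha\,(x_2^k)^2 \ll 1$) are then

$$
x_2^{k+1} = x_2^k - 0.004\,\alpha\,(x_2^k)^3
\quad\Longrightarrow\quad
\frac{1}{(x_2^{k})^2} \approx \frac{1}{(x_2^{0})^2} + 0.008\,\alpha\,k .
$$

The stopping test $0.004\,(x_2^k)^3 \le 10^{-9}$ requires $x_2^k \lesssim 6.3\times 10^{-3}$, i.e. $1/(x_2^k)^2 \approx 2.5\times 10^{4}$. Since $x_2^0 = 1$, this gives

$$
k \approx \frac{2.5\times 10^{4}}{0.008\,\alpha} \approx \frac{3.1\times 10^{6}}{\alpha}.
$$

For step lengths of order $10^{-2}$, this gives roughly $3\times 10^{8}$ iterations, the same order of magnitude as the count observed above. The step lengths actually chosen by the line search were not recorded, so this is only an order-of-magnitude check. By contrast, the scaled (Newton) update multiplies $x_2$ by $2/3$ per iteration, and $(2/3)^{13} \approx 5.1\times 10^{-3}$ already meets the tolerance.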



