# Steepest descent and Newton's method

## Let us define the same function as in the previous lesson for testing

In [112]:
def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

## Automatic differentiation in Python

Import the automatic differentiation package for Python.

The package needs to be installed first by typing
```
pip install ad
```

In [6]:
import ad

You can ask for the gradient and the Hessian using the `ad.gh` function. Let us do that for the function `f_simple` that we defined.

In [115]:
grad_f, hess_f = ad.gh(f_simple)

In [116]:
print("At the point (1,2) gradient is ", grad_f([1,2]), " and hessian is ", hess_f([1,2]))

At the point (1,2) gradient is  [-16.0, 14.0]  and hessian is  [[4.0, 0.0], [0.0, 2.0]]
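
As a sanity check on the automatic differentiation result, the gradient can also be approximated with central finite differences. The sketch below does not use `ad`; the helper name `numerical_gradient` and the step `h` are assumptions made for illustration only.

```python
import numpy as np

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences (hypothetical helper)."""
    x = np.array(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h  # perturb only coordinate i
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

# Should agree with grad_f([1,2]) from ad up to rounding error
print(numerical_gradient(f_simple, [1.0, 2.0]))
```

If the two results differ by much more than rounding error, either the function or the differentiation call is likely wrong.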


## Base algorithm for the steepest descent and Newton's algorithms
**Input:** function $f$ to be optimized, starting point $x_0$, step length rule $\alpha$, stopping rule $\mathit{stop}$  
**Output:** A solution $x^*$ that is close to a locally optimal solution
```
set x = x_0, f_old as a big number and f_new as f(x_0)
while a stopping criterion has not been met:
    f_old = f_new
    determine search direction d_h according to the method
    determine the step length alpha
    set x = x + alpha * d_h
    f_new = f(x)
return x
```

The steepest descent algorithm and the Newton algorithm differ only in how the search direction is determined. Different stopping rules and step length rules can be mixed and matched with both algorithms.
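
The base algorithm can be sketched as one Python function that takes the search direction as a parameter. This is a minimal sketch under stated assumptions: the names `descent`, `direction` and `max_iter` are hypothetical, the step length is constant, and the gradient of `f_simple` is written by hand instead of being computed with `ad`.

```python
import numpy as np

def descent(f, direction, start, step=0.2, precision=1e-4, max_iter=1000):
    """Base loop: the search direction is supplied as a function of x."""
    x = np.array(start, dtype=float)
    f_old = float('inf')
    f_new = f(x)
    iterations = 0
    # Stop when the objective value changes by no more than precision
    while abs(f_old - f_new) > precision and iterations < max_iter:
        f_old = f_new
        d = direction(x)   # the only method-specific part
        x = x + step * d   # constant step length
        f_new = f(x)
        iterations += 1
    return x, f_new, iterations

def f_simple(x):
    return (x[0] - 10.0)**2 + (x[1] + 5.0)**2 + x[0]**2

def neg_gradient(x):
    # Hand-derived negative gradient of f_simple (assumption: replaces ad.gh here)
    return -np.array([2.0 * (x[0] - 10.0) + 2.0 * x[0], 2.0 * (x[1] + 5.0)])

x_min, f_min, n_iter = descent(f_simple, neg_gradient, [2.0, -10.0])
print(x_min, f_min, n_iter)
```

Passing a different `direction` function changes the method without touching the loop, which is the sense in which the two algorithms share the same base.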

## Steepest Descent algorithm for unconstrained optimization

In the steepest descent algorithm, the search direction is determined by the negative of the gradient $-\nabla f(x)$.

### Code in Python

Let us use a simple stopping rule: we stop when the change in the objective function value between two consecutive iterations is no bigger than `precision`. The step size is constant.

In [100]:
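import numpy as np

def steepest_descent(f, start, step, precision):
    grad_f = ad.gh(f)[0]  # gradient function, built once instead of on every iteration
    f_old = float('Inf')
    x = np.array(start)
    steps = []  # iterates visited, kept for plotting
    f_new = f(x)
    # Stop when the objective value changes by no more than precision
    while abs(f_old - f_new) > precision:
        f_old = f_new
        d = -np.array(grad_f(x))  # steepest descent direction
        x = x + d * step
        f_new = f(x)
        steps.append(list(x))
    return x, f_new, steps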

### Solve the problem using the Python function

In [117]:
start = [2.0,-10.0]
(x_value,f_value,steps) = steepest_descent(f_simple,start,0.2,0.0001)
print("Optimal solution is ", x_value)

Optimal solution is  [ 5.         -5.00653035]
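
The printed values can be checked by hand. Writing $x = (x_1, x_2)$ for the two coordinates (`x[0]` and `x[1]` in the code), the gradient of `f_simple` is $\nabla f(x) = (4x_1 - 20,\ 2x_2 + 10)$, so with the constant step $0.2$ each iteration performs

$$x_1 \leftarrow x_1 - 0.2\,(4x_1 - 20) = 0.2\,x_1 + 4, \qquad x_2 \leftarrow x_2 - 0.2\,(2x_2 + 10) = 0.6\,x_2 - 2.$$

The fixed points are $x_1 = 5$ and $x_2 = -5$, and the distance to them shrinks by the factors $0.2$ and $0.6$ per iteration. Starting from $x_2 = -10$, the second coordinate after $k$ iterations is $-5 - 5\cdot 0.6^k$. The reported value $-5.00653035$ corresponds to $k = 13$, since $5 \cdot 0.6^{13} \approx 0.00653035$. The first coordinate converges much faster, which is why it already prints as $5$.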


Plot the steps taken by the algorithm

In [102]:
import matplotlib.pyplot as plt

def plot_2d_steps(steps,start):
    myvec = np.array([start]+steps).transpose()
    plt.plot(myvec[0,],myvec[1,],'ro')
    for label,x,y in zip([str(i) for i in range(len(steps)+1)],myvec[0,],myvec[1,]):
        plt.annotate(label,xy = (x, y))
    return plt

In [111]:
plot_2d_steps(steps,start).show()

## Newton's method

Newton's method is based on setting the search direction as $-[Hf(x)]^{-1}\nabla f(x)$, where $Hf(x)$ is the Hessian of $f$ at $x$.

In the one-dimensional case, this follows from the second-order Taylor approximation of $f$ around the current iterate $x_n$:
$$f(x_n+\Delta x)\approx f(x_n)+f'(x_n)\Delta x+\frac12f''(x_n)\Delta x^2.$$

We want to find the $\Delta x$ that minimizes this approximation and, thus, we seek to solve the equation that sets the derivative of this expression with respect to $\Delta x$ equal to zero:

$$ 0 = \frac{d}{d\Delta x} \left(f(x_n)+f'(x_n)\Delta x+\frac 1 2 f''(x_n) \Delta x^2\right) = f'(x_n)+f'' (x_n) \Delta x.$$

The solution of the above equation is $\Delta x=-f'(x_n)/f''(x_n)$, which is a minimum of the approximation when $f''(x_n)>0$. Thus, the next approximation of the minimizer is $x_{n+1}=x_n-f'(x_n)/f''(x_n)$.
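
The one-dimensional update can be tried out directly. The following is a minimal sketch: the function name `newton_1d`, the example $f(x) = e^x - 2x$ and the stopping rule on the size of the update are assumptions chosen for illustration, and the derivatives are written by hand.

```python
import math

def newton_1d(df, d2f, x, precision=1e-10, max_iter=50):
    """Repeat x <- x - f'(x)/f''(x) until the update is no bigger than precision."""
    for _ in range(max_iter):
        delta = -df(x) / d2f(x)  # minimizer of the local quadratic approximation
        x = x + delta
        if abs(delta) <= precision:
            break
    return x

# Hypothetical example: f(x) = exp(x) - 2x has its minimum at x = ln 2
df = lambda x: math.exp(x) - 2.0   # f'(x)
d2f = lambda x: math.exp(x)        # f''(x), positive everywhere
x_min = newton_1d(df, d2f, 0.0)
print(x_min, math.log(2.0))
```

Because $f''(x) > 0$ everywhere for this example, every step moves toward a minimum of the local quadratic approximation.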


In [92]:
def newton(f, start, step, precision):
    grad_f, hess_f = ad.gh(f)  # gradient and Hessian functions, built once
    f_old = float('Inf')
    x = np.array(start)
    steps = []  # iterates visited, kept for plotting
    f_new = f(x)
    while abs(f_old - f_new) > precision:
        f_old = f_new
        # Newton direction: solve H d = -grad instead of forming the inverse explicitly
        d = -np.linalg.solve(np.array(hess_f(x)), np.array(grad_f(x)))
        x = x + d * step
        f_new = f(x)
        steps.append(list(x))
    return x, f_new, steps

In [118]:
start = [2.0,-10.0]
(x_value,f_value,steps) = newton(f_simple,start,1,0.01)
print("Optimal solution is ", x_value)

Optimal solution is  [ 5. -5.]
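
This result can be verified by hand for the first iteration. At the starting point $(2, -10)$, the gradient and Hessian of `f_simple` are

$$\nabla f = \begin{pmatrix} 4\cdot 2 - 20 \\ 2\cdot(-10) + 10 \end{pmatrix} = \begin{pmatrix} -12 \\ -10 \end{pmatrix}, \qquad Hf = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix},$$

so the Newton direction is

$$d = -[Hf]^{-1}\nabla f = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \qquad x + d = \begin{pmatrix} 5 \\ -5 \end{pmatrix}.$$

Since `f_simple` is quadratic, its second-order Taylor approximation is exact, and a single Newton step with step length 1 lands on the minimizer. The loop then runs one more iteration only to detect that the function value no longer changes.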


In [98]:
plot_2d_steps(steps,start).show()

In [20]:
import test_problems

In [21]:
test_problems.rosenbrock(start)

12101.0

In [25]:
(x_value,f_value,steps) = steepest_descent(test_problems.rosenbrock,start,0.1,1e-10)

In [30]:
print([start] + steps)

[[2.0, -10.0], [-438.20000000000005, 210.0], [3385118541.1600018, 3853942.8000000007], [-1.5516066618863538e+30, 2.2918055061862639e+20], [1.4941868092114043e+92, 4.8149664664202277e+61], [nan, nan]]
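
The output above shows steepest descent diverging: with the constant step 0.1 the iterates grow in magnitude until they become `nan`. One common remedy is a step length rule that shrinks the step until the objective value decreases sufficiently (backtracking with the Armijo condition). The sketch below is not part of the lesson code. The names `f_steep`, `grad_f_steep`, `backtracking_descent` and all parameter values are hypothetical, the gradient is written by hand rather than computed with `ad`, and `f_steep` is a stand-in quadratic on which a constant step of 0.1 also diverges; it is not `test_problems.rosenbrock`.

```python
import numpy as np

def f_steep(x):
    # Hypothetical test function: steep in x[0], flat in x[1]
    return 50.0 * x[0]**2 + x[1]**2

def grad_f_steep(x):
    # Hand-derived gradient of f_steep
    return np.array([100.0 * x[0], 2.0 * x[1]])

def backtracking_descent(f, grad, start, step0=1.0, shrink=0.5, c=1e-4,
                         precision=1e-10, max_iter=10000):
    """Steepest descent where each step is halved until the Armijo condition holds."""
    x = np.array(start, dtype=float)
    f_x = f(x)
    for _ in range(max_iter):
        g = grad(x)
        d = -g
        step = step0
        # Armijo condition: require a decrease proportional to step * ||g||^2
        while f(x + step * d) > f_x - c * step * g.dot(g):
            step *= shrink
        x_new = x + step * d
        f_new = f(x_new)
        converged = abs(f_x - f_new) <= precision
        x, f_x = x_new, f_new
        if converged:
            break
    return x, f_x

x_bt, f_bt = backtracking_descent(f_steep, grad_f_steep, [2.0, -10.0])
print(x_bt, f_bt)
```

Every accepted step decreases the objective value, so the iterates cannot blow up the way they do with a fixed step that is too long.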
