# 9

This part focuses on the optimization of multivariate functions. Let's start with the initializations as usual.

In [None]:
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

mpl.rcParams['figure.figsize'] = (5.0, 5.0)

%matplotlib inline

<div class="alert alert-info">
    
## Derivatives of Multivariate Functions
</div>

In the lecture, we have discussed the gradient and the Hessian of multivariate functions. Let's consider the following function $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ defined by

$$
f(x,y) = \sin(x^2) - \cos(y)^4 + \frac{1}{2}xy
$$

We can plot the graph of this function, i.e. the surface over the $(x,y)$-plane whose height is given by $f(x,y)$, over the rectangle $[-1,1]^2$:

In [None]:
def func(x, y):
    """evaluate f at (x,y)"""
    return np.sin(x**2) - np.cos(y)**4 + 0.5*x*y

# define a grid 
X, Y = np.mgrid[-1:1:50j,-1:1:50j]

# make a surface plot of the graph of f
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(X, Y, func(X, Y), cmap="viridis"); 

Looking closely, it appears that $f$ has a minimum at $(0,0)$. Let's verify that this is really the case. To do so, we need two ingredients:

1. The gradient $\nabla f(x,y) = (\frac{\partial f}{\partial x}(x,y), \frac{\partial f}{\partial y}(x,y))$, which gives the local rates of change in the coordinate directions and can also be interpreted as the direction of steepest ascent. A necessary condition for a minimum (or any other local extremum) at $(x,y)$ is that

$$
\nabla f(x,y) = 0.
$$

2. If the gradient is zero, a sufficient condition for a minimum at $(x,y)$ is that the Hessian matrix of $f$,
$$
H_f(x,y) = \begin{pmatrix} 
\frac{\partial^2 f}{\partial x\partial x}(x,y) &
\frac{\partial^2 f}{\partial x\partial y}(x,y) \\
\frac{\partial^2 f}{\partial y\partial x}(x,y) &
\frac{\partial^2 f}{\partial y\partial y}(x,y)
\end{pmatrix},
$$
is positive definite. Since it is symmetric ($\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2 f}{\partial y\partial x}$), this is equivalent to all of its eigenvalues being positive.
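
The eigenvalue criterion can be sketched in a few lines of NumPy. The helper name `is_positive_definite` and the two example matrices are illustrative assumptions, not part of the lecture material; `np.linalg.eigvalsh` is used because it is intended for symmetric matrices and returns real eigenvalues.

```python
import numpy as np

def is_positive_definite(A):
    """Return True if the symmetric matrix A has only positive eigenvalues."""
    # eigvalsh assumes a symmetric (Hermitian) matrix and returns real eigenvalues
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

# two hypothetical example matrices
A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3

print(is_positive_definite(A))  # True
print(is_positive_definite(B))  # False
```

A matrix like `B`, with eigenvalues of both signs, corresponds to a saddle point rather than a minimum.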

<div class="alert alert-success">

**Task 1:** Verify, **analytically**, that $(0,0)$ is indeed a minimum of $f$.

Steps:
- Compute gradient and Hessian matrix of $f$ analytically.
- Validate that $\nabla f(0,0) = 0$.
- Evaluate $H_f(0,0)$ and compute its eigenvalues, by finding the roots of its characteristic polynomial.
- Check that all eigenvalues are positive.

Note: there is no implementation in this task -- use pen and paper, or a symbolic algebra tool of your choice.
</div>

**Solution:**

Gradient:
$$
\nabla f(x,y) = \begin{pmatrix}
2x\cos(x^2) + \frac{1}{2}y \\
4\cos(y)^3\sin(y) + \frac{1}{2}x
\end{pmatrix}
$$

Gradient at $(0,0)$:
$$
\nabla f(0,0) = (0,0)
$$

Hessian:
$$
H_f(x,y) = \begin{pmatrix}
2\cos(x^2)-4x^2\sin(x^2) & \frac{1}{2} \\
\frac{1}{2} & 4\cos(y)^4 - 12\cos(y)^2\sin(y)^2
\end{pmatrix}
$$

Hessian at $(0,0)$:
$$
H_f(0,0) = \begin{pmatrix}
2 & \frac{1}{2} \\
\frac{1}{2} & 4
\end{pmatrix}
$$

Characteristic polynomial of $H_f(0,0)$:
$$
\chi(H_f(0,0))\ =\ (2-\lambda)(4-\lambda) - \frac{1}{4}\ =\ \lambda^2 - 6\lambda + \frac{31}{4}
$$

Roots of $\chi(H_f(0,0))$:
$$
\lambda_{1} = 1.881966
$$
$$
\lambda_{2} = 4.118034
$$
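
As a quick cross-check of the pen-and-paper result, the roots can also be computed numerically. The sketch below assumes the general form $\lambda^2 - \operatorname{tr}(H)\,\lambda + \det(H)$ of the characteristic polynomial of a $2\times 2$ matrix; the variable names `H0`, `coeffs` and `roots` are chosen here for illustration.

```python
import numpy as np

# Hessian of f at (0, 0), as computed above
H0 = np.array([[2.0, 0.5],
               [0.5, 4.0]])

# characteristic polynomial of a 2x2 matrix: lambda^2 - tr(H) * lambda + det(H)
coeffs = [1.0, -np.trace(H0), np.linalg.det(H0)]

# roots of the polynomial, sorted in ascending order
roots = np.sort(np.roots(coeffs).real)

print("coefficients:", coeffs)
print("roots:", roots)
```

Both roots should agree with $3 \pm \frac{\sqrt{5}}{2}$, i.e. with the two eigenvalues listed above.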

<div class="alert alert-success">
    
**Task 2:** Verify, **numerically**, that $(0,0)$ is indeed a minimum of $f$.

Steps:
- Implement functions for the gradient (`gradient`) and the Hessian (`hessian`) of $f$ using the formulas from Task 1.
- Pass the assertions to validate that the gradient vanishes and the Hessian is positive definite.
</div>

In [None]:
def gradient(x, y):
    """evaluate the gradient of f at (x,y)"""
    dx = 2*x*np.cos(x**2) + y/2.0
    dy = 4*np.cos(y)**3*np.sin(y) + x/2.0
    return np.array([dx, dy])

def hessian(x, y):
    """evaluate the Hessian matrix of f at (x,y)"""
    dx = 2*np.cos(x**2) - 4*x**2*np.sin(x**2)
    dy = 4*np.cos(y)**4 - 12*np.cos(y)**2*np.sin(y)**2
    xdy = 1/2.0
    return np.array([[dx, xdy], [xdy, dy]])

# TODO: verify (numerically) that $(0,0)$ is a minimum
assert np.allclose(gradient(0, 0), 0.0)

eig, _ = np.linalg.eig(hessian(0, 0))
assert np.all( eig > 0 )
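
An independent way to catch mistakes in a hand-derived gradient is to compare it against finite differences. The following is a minimal sketch: `numerical_gradient`, the step size `h` and the test points are assumptions made for this example, and `func` and `gradient` are repeated only to keep the block self-contained.

```python
import numpy as np

def func(x, y):
    """evaluate f at (x,y)"""
    return np.sin(x**2) - np.cos(y)**4 + 0.5*x*y

def gradient(x, y):
    """evaluate the analytic gradient of f at (x,y)"""
    dx = 2*x*np.cos(x**2) + y/2.0
    dy = 4*np.cos(y)**3*np.sin(y) + x/2.0
    return np.array([dx, dy])

def numerical_gradient(f, x, y, h=1e-5):
    """central-difference approximation of the gradient of f at (x,y)"""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2*h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return np.array([dfdx, dfdy])

# compare both gradients at a few arbitrarily chosen points
for x, y in [(0.0, 0.0), (0.3, -0.7), (-0.9, 0.5)]:
    assert np.allclose(gradient(x, y), numerical_gradient(func, x, y), atol=1e-6)
```

The same idea applies to the Hessian by differencing `gradient` instead of `func`.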

Let's look at two ways to visualize $f$ and its derivative.
    
One possibility is to use the contour plot. A *contour* or *level set* is the pre-image (German: *Urbild*) of some $v\in\mathbb R$ under $f$. In other words, given a value $v$, we can identify the set of all points $(x,y)$ such that $f(x,y) = v$. More formally, the level set is 

$$
I_f(v) := \big\{\ (x,y) \in \mathbb{R}^2\ |\ f(x,y) = v\ \big\}.
$$

Matplotlib's `contour` function takes a grid of sample points for $x$ and $y$, as well as the function value $f(x,y)$ at these points, and computes $I_f(v)$ for a set of suitably chosen values of $v$:

In [None]:
plt.contour(X, Y, func(X, Y));

Contours are quite analogous to height lines on a geographical map.

To visualize the gradients, we can use the `pyplot.quiver` function, which draws arrows at a selected set of points.

<div class="alert alert-success">

**Task 3:** Use the `pyplot.quiver` function to visualize $\nabla f$ on a $25\times 25$-grid over $[-1,1]^2$, on top of the contours.

Hint: Use `numpy.mgrid` as above to obtain the grid.
</div>

In [None]:
# obtain the 25x25 grid
x_pts, y_pts = np.mgrid[-1:1:25j, -1:1:25j]
# evaluate the gradient at the grid points
dx, dy = gradient(x_pts, y_pts)
# plot the contours of f
plt.contour(X, Y, func(X, Y))
# draw the gradient vectors on top of the contours
plt.quiver(x_pts, y_pts, dx, dy);

<div class="alert alert-info">

## Numerical Optimization - Steepest Descent Method
</div>

In the lecture, we have seen that the steepest descent method can be used to find the minimum of a function.
The task in this exercise is to apply the steepest descent method to $f$: starting from some initial position $(x_0,y_0)$, we choose a step size $\alpha$ and perform the iteration

$$
(x_{i+1}, y_{i+1}) = (x_i,y_i) - \alpha \nabla f(x_i, y_i),
$$

i.e. we move along the negative gradient to decrease the function value. The iteration is terminated once the change between successive iterates becomes small, i.e. when $\|\nabla f(x_i,y_i)\| < \varepsilon$ for some small $\varepsilon > 0$.
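
How the choice of $\alpha$ influences convergence can be estimated from a local model. Close to the minimum, the iteration behaves approximately like the linear map $p \mapsto (I - \alpha H)\,p$ with $H = H_f(0,0)$ from Task 1, so the error shrinks per step roughly by the largest value of $|1 - \alpha\lambda_i|$ over the eigenvalues $\lambda_i$ of $H$. The sketch below evaluates this factor; it is only a local approximation, and the helper `contraction_factor` and the tested step sizes are assumptions for this example.

```python
import numpy as np

# Hessian of f at the minimum (0, 0), taken from Task 1
H = np.array([[2.0, 0.5],
              [0.5, 4.0]])

def contraction_factor(alpha, H):
    """largest |1 - alpha*lambda| over the eigenvalues of the symmetric matrix H"""
    eigvals = np.linalg.eigvalsh(H)
    return np.max(np.abs(1.0 - alpha * eigvals))

for alpha in (0.05, 0.1, 0.3, 0.5):
    rho = contraction_factor(alpha, H)
    status = "contracts" if rho < 1 else "does not contract"
    print(f"alpha = {alpha:4.2f}: factor = {rho:.4f} ({status} near the minimum)")
```

In this model, a factor below 1 means the error decreases in every step, and the smaller the factor, the fewer iterations are needed. A factor above 1 indicates that the step size is too large for the iteration to settle at $(0,0)$.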

<div class="alert alert-success">

**Task 4:** Write a function `gradient_descent` below that performs the gradient descent procedure from the initial position $x_0,y_0$ and returns the sequence of the $(x_i,y_i)$. Use the provided visualization to validate that the iteration reaches $f$'s minimum at $(0,0)$.
</div>

In [None]:
def gradient_descent(x0, y0, alpha = 0.1, eps=1e-6):
    """perform gradient descent starting at x0,y0, with step size alpha and error tolerance eps."""
    path = [(x0, y0)]
    
    grad = gradient(x0,y0)
    
    while np.any(abs(grad) > eps):
        x0 = x0 - alpha * grad[0]
        y0 = y0 - alpha * grad[1]
        path.append((x0,y0))
        grad = gradient(x0,y0)
        
    return np.array(path)


plt.contour(X, Y, func(X, Y), colors='gray', linestyles='solid', linewidths=.5)

path = gradient_descent(0.1, 0.9)
plt.plot( path[:,0], path[:,1], 'r');

print("Number of iterations:", len(path))

Let's do an experiment: we perform the gradient descent for $(x_0,y_0)$ on a uniformly spaced grid, and record the path for every choice of starting point. Then, we plot all of these paths.

In [None]:
X0, Y0 = np.mgrid[-1:1:25j,-1:1:25j]

paths = []

for x0, y0 in zip(X0.ravel(), Y0.ravel()):
    paths.append( gradient_descent(x0, y0) )

plt.contour(X, Y, func(X, Y), colors='gray', linestyles='solid', linewidths=.5)

for path in paths:
    plt.plot( path[:,0], path[:,1], 'r', lw=1, alpha=0.5 )

Going one step further, we can color-code the lengths of the paths:

In [None]:
plt.contour(X, Y, func(X, Y), colors='gray', linestyles='solid', linewidths=.5)

minlen = min([len(p) for p in paths])
maxlen = max([len(p) for p in paths])
print( "minimal path length:", minlen )
print( "maximal path length:", maxlen )

import matplotlib.colors as colors
import matplotlib.cm as cmx

cmap = plt.get_cmap('viridis')
cnorm  = colors.Normalize(vmin=minlen, vmax=maxlen)
smap = cmx.ScalarMappable(norm=cnorm, cmap=cmap)

for path in paths:
    plt.plot( path[:,0], path[:,1], lw=1, alpha=0.5, color=smap.to_rgba(len(path)))
    
plt.colorbar(smap, ax=plt.gca());

As we can see, some of the paths converge fairly quickly, while others take much longer.

<div class="alert alert-info">

## Numerical Optimization - Newton's Method
</div>

In the lecture, we have seen that Newton's method converges faster than the steepest descent method. The task in this exercise is to use it to find the minimum of $f(x,y)$.
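
For reference, the damped Newton iteration can be written in the notation used above as follows. This is the standard textbook form; the notation in the lecture may differ slightly.

$$
\begin{pmatrix} x_{i+1} \\ y_{i+1} \end{pmatrix}
= \begin{pmatrix} x_i \\ y_i \end{pmatrix}
- \alpha\, H_f(x_i,y_i)^{-1}\, \nabla f(x_i,y_i), \qquad 0 < \alpha \le 1.
$$

For $\alpha = 1$ this is the classical (undamped) Newton step. In practice, one solves the linear system $H_f(x_i,y_i)\, s = \nabla f(x_i,y_i)$ for the step $s$ instead of forming the inverse explicitly.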

<div class="alert alert-success">
    
**Task 5:** Write a function `Newton_method` that computes the minimum of $f$ using the damped Newton method, starting from an initial position $(x_0,y_0)$. Try different values of $\alpha$ to find a setting that gives good convergence. Compare your results with those obtained by the steepest-descent method.
</div>

In [None]:
# Wiki https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
def Newton_method(x0, y0, alpha = 0.1, eps=1e-6): # alpha: step size (damping), eps: error tolerance
    """
    Perform the damped Newton method to minimize f.
    
    Parameters:
    x0, y0 (float): initial guess for x and y.
    alpha (float): damping factor.
    eps (float): error tolerance for the stopping criterion.
    
    Returns:
    numpy.ndarray: array of iterates (x, y).
    """
    # initialize the starting point and the path
    x = np.array([x0, y0], dtype=float)
    path = [x.copy()]
    
    while True:
        # Newton step: solve H_f(x) s = grad f(x), using `gradient` and `hessian` from Task 2
        s = np.linalg.solve(hessian(x[0], x[1]), gradient(x[0], x[1]))
        x_next = x - alpha * s
        path.append(x_next.copy())
        
        # stop once the gradient at the new iterate is small enough
        if np.linalg.norm(gradient(x_next[0], x_next[1])) < eps:
            break
        
        x = x_next
    
    return np.array(path)

plt.contour(X, Y, func(X, Y), colors='gray', linestyles='solid', linewidths=.5)

path = Newton_method(0.1, 0.4)
plt.plot( path[:,0], path[:,1], 'r');

print("Number of iterations:", len(path))
""" 
The figure shows the contour lines of the function f(x,y) in gray
and the path of the Newton iterations in red.
The iteration starts at the point (0.1, 0.4).
The red path traces the iterates as they approach the minimum of the function.

Compared with the steepest-descent method:
In the recorded runs (alpha = 0.1 in both cases), the steepest-descent method
needed 67 iterations and Newton's method 124.
Note that the steepest-descent run above starts at (0.1, 0.9), not at (0.1, 0.4),
and that the exact counts depend on the starting point and on alpha.
""" 
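
Why a small damping factor makes Newton's method slow can be seen on a quadratic model. The sketch below assumes the model $q(p) = \frac{1}{2} p^\top A\, p$ with $A = H_f(0,0)$ from Task 1, which approximates $f$ only near the minimum. For such a quadratic, each damped Newton step scales the error by exactly $(1-\alpha)$. The function `damped_newton_quadratic` and the chosen values of `alpha` are illustrative assumptions.

```python
import numpy as np

# quadratic model q(p) = 0.5 * p^T A p with A = H_f(0,0) from Task 1
# (assumption: this only approximates f close to the minimum)
A = np.array([[2.0, 0.5],
              [0.5, 4.0]])

def damped_newton_quadratic(p0, alpha, n_steps):
    """run n_steps damped Newton steps on q; its gradient is A @ p, its Hessian is A"""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        step = np.linalg.solve(A, A @ p)  # Newton direction: Hessian^{-1} * gradient
        p = p - alpha * step
    return p

p0 = np.array([0.1, 0.4])
for alpha in (0.1, 0.5, 1.0):
    p = damped_newton_quadratic(p0, alpha, n_steps=10)
    ratio = np.linalg.norm(p) / np.linalg.norm(p0)
    print(f"alpha = {alpha}: |p_10| / |p_0| = {ratio:.3e}")
```

With $\alpha = 0.1$ the error shrinks only by a factor of $0.9$ per step, whereas $\alpha = 1$ reaches the minimum of the quadratic in a single step. On $f$ itself the Hessian is not positive definite everywhere on $[-1,1]^2$, so large values of $\alpha$ can behave quite differently far away from the minimum, which is why trying several values is worthwhile.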
