Consider the function $f(x_1, x_2) = x_1^2 + 2x_2^2 + x_1 x_2 + 4x_1 - 2x_2 + 5$.

We want to find the minimum. We could do it analytically, since it is just a quadratic function, but quite soon we'll start working with functions whose optimum we can't find with pen and paper. So let's use an iterative algorithm.
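For this particular function the pen-and-paper route is short, and it gives a reference value to compare the iterative result against. A minimal sketch, assuming we write $f$ as $\tfrac{1}{2}x^\top A x + b^\top x + c$ and set the gradient $Ax + b$ to zero; the names `A`, `b` and `x_star` are introduced here only for illustration.

```python
import numpy as np

def f(x1, x2):
    return x1**2 + 2*x2**2 + x1*x2 + 4*x1 - 2*x2 + 5

# f(x) = 0.5 * x^T A x + b^T x + c, so grad f(x) = A x + b.
A = np.array([[2.0, 1.0],
              [1.0, 4.0]])   # Hessian of f (illustrative name)
b = np.array([4.0, -2.0])    # linear coefficients (illustrative name)

# Setting the gradient to zero gives the linear system A x = -b.
x_star = np.linalg.solve(A, -b)

print(x_star)      # approximately [-2.5714, 1.1429], i.e. (-18/7, 8/7)
print(f(*x_star))  # approximately -1.2857, i.e. -9/7
```

Both eigenvalues of `A` are positive, so this stationary point is the unique minimum.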

0. Open up VS Code, or your favorite IDE.
1. Pick a random starting point.
2. In what direction should you move so that the function value decreases the fastest?
3. Move in that direction. Consider first multiplying the direction by some small value, say 0.1. This is the so-called 'learning rate' / 'step size' $\alpha$, which we'll learn about later; it basically controls how big a step you take. If the step size is too big, you may overshoot the optimum and diverge; if it is too small, convergence may take too long. Often, though, the **"Let it be late, let it be almond"** principle holds, i.e. a slow but safe step size is usually the better choice.
4. Keep iterating like that. Can you come up with a stopping criterion? (E.g. if the improvement / change is smaller than some threshold, just stop the algorithm.)
5. Plot and print interesting stuff, e.g. function value vs. iteration number.
6. Play around with $\alpha$ and the starting point to see how they affect convergence.
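
Steps 2 to 4 can be summarized as a single update rule. A sketch, assuming the direction of fastest decrease is taken to be the negative gradient and writing $x^{(k)}$ for the point after $k$ steps (notation introduced here):

$$
\nabla f(x_1, x_2) = \begin{pmatrix} 2x_1 + x_2 + 4 \\ x_1 + 4x_2 - 2 \end{pmatrix},
\qquad
x^{(k+1)} = x^{(k)} - \alpha \, \nabla f\big(x^{(k)}\big)
$$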



In [38]:
import numpy as np
import plotly.express as px

In [39]:
def f(x1, x2):
    return x1**2 + 2*x2**2 + x1*x2 + 4*x1 - 2*x2 + 5

def grad_f(x1, x2):
    df_dx1 = 2*x1 + x2 + 4
    df_dx2 = 4*x2 + x1 - 2
    return np.array([df_dx1, df_dx2])
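
A hand-derived gradient is easy to get wrong, so it can be worth checking it against a finite-difference approximation. A minimal sketch; `numerical_grad` and its step `h` are hypothetical names and values chosen for illustration.

```python
import numpy as np

def f(x1, x2):
    return x1**2 + 2*x2**2 + x1*x2 + 4*x1 - 2*x2 + 5

def grad_f(x1, x2):
    return np.array([2*x1 + x2 + 4, 4*x2 + x1 - 2])

def numerical_grad(x1, x2, h=1e-5):
    """Central-difference approximation of the gradient (h is an assumed step)."""
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return np.array([d1, d2])

point = (-50.0, 9.0)
print(grad_f(*point))          # analytic gradient: [-87. -16.]
print(numerical_grad(*point))  # should agree up to rounding error
```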

In [None]:
MAX_ITERATIONS = 10_000
STARTING_POINT = np.array([-50, 9])
TOLERANCE = 1e-6

step_size = 0.1

In [35]:
# Step against the gradient; `direction` avoids shadowing the built-in `dir`.
direction = -grad_f(STARTING_POINT[0], STARTING_POINT[1])

new_point = STARTING_POINT + step_size * direction

In [36]:
new_point

array([-41.3,  10.6])

In [46]:
current_point = STARTING_POINT.copy()

func_vals = []  # function value after each step

for i in range(MAX_ITERATIONS):
    # Steepest descent: step against the gradient.
    direction = -grad_f(current_point[0], current_point[1])
    current_point = current_point + step_size * direction

    func_vals.append(f(current_point[0], current_point[1]))

    # Stop once the function value barely changes between steps.
    if len(func_vals) >= 2:
        improvement = abs(func_vals[-1] - func_vals[-2])
        if improvement < TOLERANCE:
            print(f"Converged after {i} iterations.")
            break

Converged after 59 iterations.
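
The loop above stops when the function value barely changes. Another common criterion, sketched here as an alternative rather than as this notebook's method, is to stop when the gradient norm is small, since the gradient vanishes at a minimum. `GRAD_TOLERANCE` is a hypothetical name and threshold.

```python
import numpy as np

def grad_f(x1, x2):
    return np.array([2*x1 + x2 + 4, 4*x2 + x1 - 2])

GRAD_TOLERANCE = 1e-6  # hypothetical threshold on the gradient norm
step_size = 0.1
point = np.array([-50.0, 9.0])

for i in range(10_000):
    gradient = grad_f(point[0], point[1])
    # A small gradient norm signals that we are close to a stationary point.
    if np.linalg.norm(gradient) < GRAD_TOLERANCE:
        print(f"Gradient norm below tolerance after {i} iterations.")
        break
    point = point - step_size * gradient

print(point)  # close to the analytic minimizer (-18/7, 8/7)
```

This criterion does not depend on the scale of the function values, but it does require an extra norm computation per step.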


In [28]:
np.round(current_point, 2)

array([-2.57,  1.14])

In [17]:
8 / 7  # x2-coordinate of the analytic minimizer (-18/7, 8/7), for comparison

1.1428571428571428

In [47]:
fig = px.line(x=list(range(1, len(func_vals) + 1)),
              y=func_vals,
              title="Function value over iterations",
              labels={"x": "Iteration", "y": "f(x1, x2)"})
fig.show()
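
To play around with $\alpha$ as step 6 suggests, it helps to wrap the loop in a function. A minimal sketch; `gradient_descent`, its `blowup` guard and the chosen step sizes are hypothetical and only for illustration. For a quadratic like this one, gradient descent converges only if $\alpha < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian, which the last lines compute.

```python
import numpy as np

def f(x1, x2):
    return x1**2 + 2*x2**2 + x1*x2 + 4*x1 - 2*x2 + 5

def grad_f(x1, x2):
    return np.array([2*x1 + x2 + 4, 4*x2 + x1 - 2])

def gradient_descent(start, step_size, max_iterations=10_000,
                     tolerance=1e-6, blowup=1e12):
    """Hypothetical helper: returns (final point, function values, status)."""
    point = np.asarray(start, dtype=float)
    values = [f(point[0], point[1])]
    for _ in range(max_iterations):
        point = point - step_size * grad_f(point[0], point[1])
        values.append(f(point[0], point[1]))
        if values[-1] > blowup:  # assumed guard against divergence
            return point, values, "diverged"
        if abs(values[-1] - values[-2]) < tolerance:
            return point, values, "converged"
    return point, values, "hit iteration limit"

for alpha in (0.01, 0.1, 0.4, 0.5):
    point, values, status = gradient_descent([-50, 9], alpha)
    print(f"alpha={alpha}: {status} after {len(values) - 1} steps")

# Largest stable step size for this quadratic: 2 / lambda_max of the Hessian.
hessian = np.array([[2.0, 1.0], [1.0, 4.0]])
print(2 / np.linalg.eigvalsh(hessian).max())  # about 0.453
```

With these settings the three smaller step sizes should converge (0.01 taking the most steps), while 0.5 exceeds the bound and diverges.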