Suppose we want to minimize $f(x,y) = (x-1)^2 + (y-2)^2$ using gradient descent.  We might set up a two-variable version of gradient descent like this:

In [2]:
import numpy as np

def grad_descent(f, fx, fy, x0, y0, alpha, tol=1e-6, ftol=1e-6):
    """
    Gradient descent algorithm for minimizing a function f(x, y).
    
    Parameters:
    f: callable
        The function to minimize.
    fx: callable
        Partial derivative of f with respect to x.
    fy: callable
        Partial derivative of f with respect to y.
    x0: float
        Initial value of x.
    y0: float
        Initial value of y.
    alpha: float
        Learning rate (step size).
    tol: float
        Tolerance for the gradient norm to stop the algorithm.
    ftol: float
        Tolerance for the change in function value to stop the algorithm.
    
    Returns:
    x, y: float
        The coordinates of the minimum point.
    fval: float
        The value of the function at the minimum point.
    """
    # Initialize x and y with the starting values x0 and y0
    x = x0
    y = y0
    # Compute the initial value of the function
    fval = f(x, y)
    
    while True:
        # Compute the partial derivatives (gradients) at the current point
        dfx = fx(x, y)
        dfy = fy(x, y)
        # Compute the norm of the gradient vector
        norm = np.sqrt(dfx**2 + dfy**2)
        
        # If the gradient norm is smaller than the tolerance, stop the algorithm
        if norm < tol:
            break
        
        # Update x and y by moving in the direction opposite to the gradient
        x -= alpha * dfx
        y -= alpha * dfy
        
        # Compute the new value of the function after the update
        fnew = f(x, y)
        
        # If the change in function value is smaller than the tolerance, stop the algorithm
        if abs(fnew - fval) < ftol:
            break
        
        # Update the function value for the next iteration
        fval = fnew
    
    # Return the final point and the last stored function value.
    # Note: if the loop stopped on the ftol test, fval is the value one step before (x, y).
    return x, y, fval


Let's see how it works:

In [2]:
f = lambda x, y: (x - 1)**2 + (y - 2)**2
fx = lambda x, y: 2 * (x - 1)
fy = lambda x, y: 2 * (y - 2)
x0 = 0
y0 = 0
alpha = 0.1
x, y, fval = grad_descent(f, fx, fy, x0, y0, alpha)
print(f"Minimum at x = {x}, y = {y}, f(x,y) = {fval}")

Minimum at x = 0.9994929397599087, y = 1.9989858795198174, f(x,y) = 2.0086725553238508e-06
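
The result stops near $(1, 2)$ rather than exactly on it. For this particular $f$, one update gives $x_{k+1} - 1 = (1 - 2\alpha)(x_k - 1)$, and likewise for $y$, so with $\alpha = 0.1$ each step should shrink the distance to the minimum by a factor of $0.8$. The short check below is only an illustrative sketch (it repeats the update by hand instead of calling `grad_descent`), and the names `errors` and `ratios` are introduced here for the example.

```python
# Illustrative check of the per-step shrink factor for
# f(x, y) = (x - 1)^2 + (y - 2)^2 with alpha = 0.1.
alpha = 0.1
x, y = 0.0, 0.0

errors = []  # distance from (x, y) to the minimum (1, 2) after each step
for _ in range(5):
    x -= alpha * 2 * (x - 1)   # same update as x -= alpha * fx(x, y)
    y -= alpha * 2 * (y - 2)   # same update as y -= alpha * fy(x, y)
    errors.append(((x - 1)**2 + (y - 2)**2) ** 0.5)

# Ratio of consecutive distances; each should be close to 1 - 2*alpha
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(ratios)
```

Because the distance shrinks geometrically and never reaches zero, the loop ends when one of the tolerances is met, not at the exact minimum.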


This works, but it will get messy if we want to extend it to more input variables.  To make it easier to adapt to higher dimensions, we'll think of our function as taking a vector, $\mathbf{x} = \langle x_1, x_2 \rangle$, as input.  So our function becomes $$f(\mathbf{x}) = (x_1-1)^2 + (x_2-2)^2.$$

If we want to take it a step further, we could vectorize the computation of the output too.  Let $\mathbf{x}_0 = \langle 1, 2 \rangle$; then we can rewrite $$f(\mathbf{x}) = (\mathbf{x}-\mathbf{x}_0) \cdot (\mathbf{x}-\mathbf{x}_0)$$ using the dot product.  Writing the formula itself in vector form takes some adjustment and practice, so we'll stick to vectorizing the input for now.

We can write our new version of $f$ in Python like this:

In [11]:
def f_vec(x):
    return (x[0] - 1)**2 + (x[1] - 2)**2

def grad_f_vec(x):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

x_in = np.array([0, 0])

print(f"Function value at x = {x_in}: {f_vec(x_in)}")
print(f"Gradient at x = {x_in}: {grad_f_vec(x_in)}")

Function value at x = [0 0]: 5
Gradient at x = [0 0]: [-2 -4]
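
For comparison, the dot-product form of $f$ mentioned earlier could be sketched as follows. This is only an illustration: the names `center`, `f_dot`, and `grad_f_dot` are made up for this example, with `center` standing in for the vector $\langle 1, 2 \rangle$. The gradient of $(\mathbf{x}-\mathbf{c}) \cdot (\mathbf{x}-\mathbf{c})$ is $2(\mathbf{x}-\mathbf{c})$, which is what `grad_f_dot` returns.

```python
import numpy as np

# Hypothetical name: `center` plays the role of the vector <1, 2> in the text
center = np.array([1.0, 2.0])

def f_dot(x):
    # (x - center) . (x - center), written with the dot product
    d = x - center
    return np.dot(d, d)

def grad_f_dot(x):
    # Gradient of the squared distance to `center`
    return 2 * (x - center)

x_in = np.array([0, 0])
print(f"Function value at x = {x_in}: {f_dot(x_in)}")
print(f"Gradient at x = {x_in}: {grad_f_dot(x_in)}")
```

Nothing in this version refers to `x[0]` or `x[1]`, so it would work unchanged for a `center` of any length.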


Now, using the vectorized version of our function and its gradient, we can write gradient descent like this:

In [9]:
def grad_descent_vec(f, gradf, x0, alpha, tol=1e-6, ftol=1e-6):
    """
    Gradient descent algorithm for minimizing a function f(x).
    
    Parameters:
    f: callable
        The function to minimize.
    gradf: callable
        The gradient of the function f.
    x0: numpy.ndarray
        Initial guess for the minimum point.
    alpha: float
        Learning rate (step size).
    tol: float
        Tolerance for the gradient norm to stop the algorithm.
    ftol: float
        Tolerance for the change in function value to stop the algorithm.
    
    Returns:
    x: numpy.ndarray
        The coordinates of the minimum point.
    fval: float
        The value of the function at the minimum point.
    """
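    # Copy x0 into a float array so the in-place update below
    # does not overwrite the caller's array (and works for integer input)
    x = np.array(x0, dtype=float)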
    # Compute the initial value of the function
    fval = f(x)
    
    while True:
        # Compute the gradient of the function at the current point
        dfx = gradf(x)
        # Compute the norm (magnitude) of the gradient vector
        norm = np.linalg.norm(dfx)
        
        # If the gradient norm is smaller than the tolerance, stop the algorithm
        if norm < tol:
            break
        
        # Update x by moving in the direction opposite to the gradient
        # The step size is determined by the learning rate alpha
        x -= alpha * dfx
        
        # Compute the new value of the function after the update
        fnew = f(x)
        
        # If the change in function value is smaller than the tolerance, stop the algorithm
        if abs(fnew - fval) < ftol:
            break
        
        # Update the function value for the next iteration
        fval = fnew
    
    # Return the final point and the last stored function value.
    # Note: if the loop stopped on the ftol test, fval is the value one step before x.
    return x, fval

Now we've made a tradeoff.  This new approach is more abstract, but it's also completely general.  We could use this same code to minimize a function with ANY number of input variables from one to thousands!

Let's see it in action:

In [13]:
def f_vec(x):
    return (x[0] - 1)**2 + (x[1] - 2)**2

def grad_f_vec(x):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

x_start = np.array([0., 0.])

xmin, fmin = grad_descent_vec(f_vec, grad_f_vec, x_start, alpha=0.1)
print(f"Minimum at x = {xmin}, f(x) = {fmin}")

Minimum at x = [0.99949294 1.99898588], f(x) = 2.0086725553238508e-06
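
To illustrate the claim that the vectorized approach handles any number of inputs, here is a sketch with five variables. So that the cell runs on its own, it uses a condensed stand-in called `descend` (a hypothetical name) in place of `grad_descent_vec`: it keeps only the gradient-norm test and adds an iteration cap `max_iter`, which is an assumption of this example and not part of the code above. The test function `f5` and its minimum `target` are also invented for the illustration.

```python
import numpy as np

def descend(gradf, x0, alpha, tol=1e-6, max_iter=10_000):
    # Condensed gradient descent: stop when the gradient norm drops below tol
    # or after max_iter steps (the cap is an addition for this sketch).
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    for _ in range(max_iter):
        g = gradf(x)
        if np.linalg.norm(g) < tol:
            break
        x -= alpha * g
    return x

# Five-variable analogue of the example: minimum at <1, 2, 3, 4, 5>
target = np.arange(1.0, 6.0)

def f5(x):
    return np.sum((x - target)**2)

def grad_f5(x):
    return 2 * (x - target)

xmin5 = descend(grad_f5, np.zeros(5), alpha=0.1)
print(f"Minimum at x = {xmin5}, f(x) = {f5(xmin5)}")
```

Only the function and gradient definitions changed; the descent loop itself never mentions the number of variables.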


## Monte Carlo in Multiple Variables

In [14]:
def monte_carlo_minimize_iterative(f, lower_bounds, upper_bounds, max_iterations=1000):
    """
    Perform Monte Carlo minimization of a function f by generating one new point per iteration.

    Parameters:
    f: callable
        The function to minimize. It should take a numpy array as input.
    lower_bounds: numpy.ndarray
        An array specifying the lower bounds for each dimension.
    upper_bounds: numpy.ndarray
        An array specifying the upper bounds for each dimension.
    max_iterations: int
        The maximum number of iterations to perform.

    Returns:
    xmin: numpy.ndarray
        The coordinates of the point where f is minimized.
    fmin: float
        The minimum value of the function.
    """
    # Initialize variables to track the minimum
    fmin = float('inf')
    xmin = None
    
    # Perform iterative sampling
    for _ in range(max_iterations):
        # Generate a single random sample within the bounds
        sample = np.random.uniform(lower_bounds, upper_bounds)
        
        # Evaluate f at the sample point
        fval = f(sample)
        if fval < fmin:
            fmin = fval
            xmin = sample
    
    return xmin, fmin

Now let's minimize our function above in the domain $-1 \leq x_1 \leq 3$ and $0 \leq x_2 \leq 4$:

In [15]:
def f_vec(x):
    return (x[0] - 1)**2 + (x[1] - 2)**2

lower_bounds = np.array([-1,0])
upper_bounds = np.array([3,4])
xmin, fmin = monte_carlo_minimize_iterative(f_vec, lower_bounds, upper_bounds)
print(f"Minimum at x = {xmin}, f(x) = {fmin}")

Minimum at x = [0.99548816 1.99658483], f(x) = 3.20200629351652e-05
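
Because the samples are random, the printed minimum will differ from run to run. One possible variation, sketched below, draws all the samples in a single call and uses a seeded generator so the result is repeatable. The name `monte_carlo_minimize_batch` and the `n_samples` and `seed` parameters are assumptions made for this example, not part of the function above.

```python
import numpy as np

def monte_carlo_minimize_batch(f, lower_bounds, upper_bounds, n_samples=1000, seed=0):
    # Seeded generator so repeated runs give the same samples
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower_bounds, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)

    # One row per sample, one column per dimension
    samples = rng.uniform(lower, upper, size=(n_samples, lower.size))

    # Evaluate f at every sample and keep the best one
    values = np.array([f(s) for s in samples])
    best = np.argmin(values)
    return samples[best], values[best]

def f_vec(x):
    return (x[0] - 1)**2 + (x[1] - 2)**2

lower_bounds = np.array([-1, 0])
upper_bounds = np.array([3, 4])
xmin, fmin = monte_carlo_minimize_batch(f_vec, lower_bounds, upper_bounds)
print(f"Minimum at x = {xmin}, f(x) = {fmin}")
```

Calling it again with the same `seed` returns the same point, while a different `seed` gives a different, comparably good estimate.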


In [16]:
x = np.arange(0,20)
print(x)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [18]:
x_every_other = x[::2]
print(x_every_other)

[ 0  2  4  6  8 10 12 14 16 18]


In [19]:
x_every_other_from_second = x[1::2]
print(x_every_other_from_second)

[ 1  3  5  7  9 11 13 15 17 19]
