# Task 3: Non-linear Conjugate Gradient Methods

This Jupyter Notebook is designed to explore advanced optimization techniques using the conjugate gradient method applied to two specific functions: the well-known Rosenbrock function and a custom-designed function. The aim is to compare the effectiveness of different conjugate gradient variants, namely Fletcher-Reeves and Polak-Ribiere, in finding global minima of these functions.


## Theoretical Background

### Target Functions


#### Rosenbrock Function

The Rosenbrock function is formulated as:
$$ f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 $$

To find the global minimum analytically, we start by setting the gradient of the function to zero. The gradient components are:
$$ \frac{\partial f}{\partial x_1} = -400x_1(x_2 - x_1^2) - 2(1 - x_1) $$
$$ \frac{\partial f}{\partial x_2} = 200(x_2 - x_1^2) $$

Setting these derivatives to zero, we derive:

1. From $\frac{\partial f}{\partial x_2} = 0$, we find that $x_2 = x_1^2$.
2. Substituting $x_2 = x_1^2$ into $\frac{\partial f}{\partial x_1}$ and simplifying, we get:
   $$ -400x_1(x_1^2 - x_1^2) - 2(1 - x_1) = 0 $$
   $$ -2(1 - x_1) = 0 $$
   Leading to $x_1 = 1$.

Given $x_1 = 1$, and substituting back, we find $x_2 = 1^2 = 1$.

Therefore, the global minimum is at $(x_1, x_2) = (1, 1)$ where $f(x) = 0$, lying at the bottom of a long, narrow, parabolic-shaped valley.


In [10]:
import numpy as np

def safe_rosenbrock(x):
    """Calculate the Rosenbrock function value at x."""
    # Clipping x values to prevent excessive values
    x = np.clip(x, -10, 10)
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2


In [11]:
import numpy as np

def safe_grad_rosenbrock(x):
    """Calculate the gradient of the Rosenbrock function at x."""
    x = np.clip(x, -10, 10)
    grad = np.zeros(2)
    grad[0] = -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0])
    grad[1] = 200 * (x[1] - x[0]**2)
    return grad


In [56]:
def hessian_rosenbrock(x):
    """ Calculate the Hessian matrix of the Rosenbrock function at x. """
    x1, x2 = x
    return np.array([[1200 * x1**2 - 400 * x2 + 2, -400 * x1],
                     [-400 * x1, 200]])


#### Custom Function

The custom function is defined as:
$$ f(x) = 150(x_1 x_2)^2 + (0.5 x_1 + 2 x_2 - 2)^2 $$

To analyze the global minimum, we set the gradient to zero. The gradient components are:
$$ \frac{\partial f}{\partial x_1} = 300 x_1 x_2^2 + (0.5 x_1 + 2 x_2 - 2) $$
$$ \frac{\partial f}{\partial x_2} = 300 x_1^2 x_2 + 4(0.5 x_1 + 2 x_2 - 2) $$

Setting these derivatives to zero, we simplify and solve:

1. From $\frac{\partial f}{\partial x_1} = 0$ and $\frac{\partial f}{\partial x_2} = 0$, assuming $x_1 = 0$, we find:
   $$ 2 x_2 - 2 = 0 $$
   Leading to $x_2 = 1$.
2. Substituting $x_1 = 0$ into $\frac{\partial f}{\partial x_2}$ yields:
   $$ 4(2 x_2 - 2) = 0 $$
   Confirms that $x_2 = 1$.

Therefore, the point $(x_1, x_2) = (0, 1)$ represents a critical point. Further analysis, potentially including second derivative tests or numerical verification, would confirm its nature as a global minimum.


To determine if there is another point where $f(x_1, x_2) = 0$ for the custom function:
$$ f(x) = 150(x_1 x_2)^2 + (0.5 x_1 + 2 x_2 - 2)^2 $$

we need to find conditions under which both terms in the function evaluate to zero because this is the only way their sum can be zero. Here's the breakdown:

**Analyzing Each Term for Zero:**

1. **The first term $150(x_1 x_2)^2$ equals zero when:**
   - $x_1 = 0$ or
   - $x_2 = 0$

2. **The second term $(0.5 x_1 + 2 x_2 - 2)^2$ equals zero when:**
   - $0.5 x_1 + 2 x_2 - 2 = 0$

**Solving for Conditions:**

Given the conditions from the second term, we rearrange the equation:
$$ 0.5 x_1 + 2 x_2 = 2 $$
$$ x_2 = 1 - 0.25 x_1 $$

Now, substitute this expression for $x_2$ into the condition from the first term:
$$ x_1 \cdot (1 - 0.25 x_1) = 0 $$

This equation is satisfied when $x_1 = 0$ or $x_1 = 4$. Let's explore both:

- **If $x_1 = 0$**:
  - Substituting $x_1 = 0$ in $x_2 = 1 - 0.25 \cdot 0$:
  - $x_2 = 1$
  - This point $(0, 1)$ was already identified as a global minimizer where $f(x) = 0$.

- **If $x_1 = 4$**:
  - Substituting $x_1 = 4$ in $x_2 = 1 - 0.25 \cdot 4$:
  - $x_2 = 0$
  - This results in the point $(4, 0)$, and checking $f(4, 0)$:
  - $f(4, 0) = 150(4 \cdot 0)^2 + (0.5 \cdot 4 + 2 \cdot 0 - 2)^2 = 0$
  - This shows $f(4, 0)$ is zero, too, constituting a second global minimizer candidate.



To verify whether $(4, 0)$ is also a minimum, we evaluate the gradient components at this point.

At $(4, 0)$ we get:

$$ \frac{\partial f}{\partial x_1} = 300 \cdot 4 \cdot 0^2 + 0.5(0.5 \cdot 4 + 2 \cdot 0 - 2) = 0 $$
$$ \frac{\partial f}{\partial x_2} = 300 \cdot 4^2 \cdot 0 + 2(0.5 \cdot 4 + 2 \cdot 0 - 2) = 0 $$

Both derivatives are zero, confirming that $(4, 0)$ is a critical point. As with $(0,1)$, further analysis would confirm its nature as a minimum.


In [12]:
import numpy as np

def custom_function(x):
    """Calculate the custom function value at x."""
    return 150 * (x[0] * x[1])**2 + (0.5 * x[0] + 2 * x[1] - 2)**2


In [13]:
import numpy as np

def grad_custom_function(x):
    """Calculate the gradient of the custom function at x."""
    grad = np.zeros(2)
    grad[0] = 300 * x[0] * x[1]**2 + 2 * (0.5 * x[0] + 2 * x[1] - 2) * 0.5
    grad[1] = 300 * x[0]**2 * x[1] + 2 * (0.5 * x[0] + 2 * x[1] - 2) * 2
    return grad


In [53]:
def hessian_custom_function(x):
    """Calculate the Hessian matrix of the custom function at x. """
    x1, x2 = x
    return np.array([[300 * x2**2 + 0.5, 600 * x1 * x2 + 2],
                     [600 * x1 * x2 + 2, 300 * x1**2 + 4]])

## Methodological Approach

In this section, we describe the optimization algorithms employed in this study: the Fletcher-Reeves (FR) and Polak-Ribiere (PR) conjugate gradient methods. These methods are variants of the conjugate gradient technique, a popular approach for solving nonlinear optimization problems without constraints. The choice of these methods is motivated by their efficiency in handling large-scale problems and their robustness in navigating complex function landscapes.


### Line Search Methods

A crucial component in optimization algorithms that use gradient information is the line search technique. The line search aims to find an acceptable step size that satisfies certain conditions, improving the convergence of the method.

#### Backtracking Line Search

Backtracking line search is a type of adaptive step size technique used to find a step size that meets the Armijo condition, a fundamental criterion in ensuring sufficient decrease in the function value. This method starts with an initial guess for the step size and iteratively scales it down until the Armijo condition is satisfied.


In [None]:
def line_search(func, grad_func, x, d, alpha_init=1.0, rho=0.9, c=1e-4, max_iter=50):
    """
    Conducts a backtracking line search to find the step size that satisfies the Armijo condition.
    
    This line search method reduces the step size alpha iteratively until a decrease in the function
    value satisfies the Armijo condition, which ensures sufficient decrease.

    Args:
        func (callable): The objective function to minimize. It should take a single numpy array argument.
        grad_func (callable): The gradient of the objective function. It should take a single numpy array argument.
        x (np.array): The current point in the parameter space where the function is evaluated.
        d (np.array): The current search direction along which the line search is performed.
        alpha_init (float): The initial step size for the line search.
        rho (float): The factor by which the step size is reduced in each iteration (0 < rho < 1).
        c (float): The Armijo constant used in the sufficient decrease condition (0 < c < 1).
        max_iter (int): The maximum number of iterations to perform if the Armijo condition is not met.

    Returns:
        tuple: A tuple containing:
            - alpha (float): The step size that satisfies the Armijo condition or the step size at the end of max_iter iterations.
            - log (list of dicts): A log of each iteration's details including the iteration number, alpha value, function value, and target Armijo condition value.
    """
    alpha = alpha_init
    log = []
    iteration = 0
    while True:
        f_current = func(x)
        grad_current = grad_func(x)
        f_test = func(x + alpha * d)
        armijo_condition = f_current + c * alpha * np.dot(grad_current, d)
        log.append({
            'iteration': iteration + 1,
            'alpha': alpha,
            'function_value': f_test,
            'target_value': armijo_condition
        })
#        if f_test <= armijo_condition or iteration >= max_iter:
#            break
        if f_test <= armijo_condition:
            break
        alpha *= rho
        iteration += 1
    return alpha, log


### Optimization Algorithms

With the line search method defined, we can now incorporate it into the conjugate gradient algorithm, an efficient method for solving large-scale optimization problems. The conjugate gradient method uses line search to determine the optimal step size in each iteration, enhancing the algorithm's overall effectiveness.

Conjugate gradient methods are iterative techniques that build a sequence of conjugate directions, along which the function is minimized. The general approach involves computing a search direction that is a linear combination of the steepest descent direction and the previous search direction.

#### Fletcher-Reeves Method (FR)
The FR method updates the search direction using:
$$ \beta_{k+1}^{FR} = \frac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2} $$
where \( \nabla f(x_k) \) is the gradient of the function at step \( k \).

#### Polak-Ribiere Method (PR)
The PR method enhances the FR update by incorporating the gradient change:
$$ \beta_{k+1}^{PR} = \frac{\nabla f(x_{k+1})^T (\nabla f(x_{k+1}) - \nabla f(x_k))}{\|\nabla f(x_k)\|^2} $$
This modification can potentially lead to faster convergence by adjusting the direction based on the gradient's latest changes.


In [15]:
import numpy as np

def conjugate_gradient_method(func, grad_func, x0, known_solution, method='FR', max_iter=10000, tol=1e-6, alpha_init=1.0, rho=0.9, c=1e-4, ls_max_iter=50):
    """
    Implements the conjugate gradient optimization algorithm using either Fletcher-Reeves or Polak-Ribiere update rules, with backtracking line search.

    Args:
        func (callable): The objective function to minimize.
        grad_func (callable): The gradient of the objective function.
        x0 (np.array): Initial guess for the parameters.
        known_solution (np.array): The known solution or global minimum for the function, used for calculating distance.
        method (str): 'FR' for Fletcher-Reeves or 'PR' for Polak-Ribiere.
        max_iter (int): Maximum number of iterations before termination.
        tol (float): Tolerance for convergence, based on the norm of the gradient.
        alpha_init (float): Initial step size for the line search.
        rho (float): Contraction factor in the line search, typically between 0.1 and 0.9.
        c (float): The Armijo rule constant in the line search.
        ls_max_iter (int): Maximum number of iterations allowed in the line search.

    Returns:
        tuple: A tuple containing:
               - Final parameter values (np.array)
               - Number of iterations performed (int)
               - Detailed log of the optimization process (list of dicts)
               - Final gradient norm (float)
               - Distance to the known solution (float)
    """
    x = x0
    g = grad_func(x)
    d = -g
    overall_log = []
    for i in range(max_iter):
        alpha, ls_log = line_search(func, grad_func, x, d, alpha_init, rho, c, ls_max_iter)
        x_new = x + alpha * d
        g_new = grad_func(x_new)
        
        beta = np.dot(g_new, g_new) / np.dot(g, g) if method == 'FR' else np.dot(g_new, g_new - g) / np.dot(g, g)
        
        d = -g_new + beta * d
        if np.linalg.norm(g_new) < tol:
            break
        x = x_new
        g = g_new
        
        overall_log.append({
            'iteration': i + 1,
            'x': x.copy(),
            'gradient_norm': np.linalg.norm(g),
            'alpha': alpha,
            'beta': beta,
            'function_value': func(x),
            'line_search_log': ls_log
        })

    final_gradient_norm = np.linalg.norm(g_new)
    distance_to_solution = np.linalg.norm(x - known_solution)

    return x, i, overall_log, final_gradient_norm, distance_to_solution


### Gradient and Hessian Approximation

#### Gradient Approximation Using Central Difference

In [32]:
import numpy as np

def finite_difference_gradient(func, x, h=1e-5):
    """
    Approximates the gradient of a function at a given point using the central difference formula.
    
    Args:
        func (callable): The function for which the gradient is to be approximated.
        x (np.array): The point at which the gradient is to be approximated.
        h (float): The step size for the finite difference approximation.

    Returns:
        np.array: The approximated gradient as a numpy array.
    """
    n = len(x)
    grad = np.zeros(n)
    for i in range(n):
        x_plus = np.array(x, dtype=float)
        x_minus = np.array(x, dtype=float)
        x_plus[i] += h  # Increment x[i] by h
        x_minus[i] -= h  # Decrement x[i] by h
        grad[i] = (func(x_plus) - func(x_minus)) / (2 * h)  # Central difference for derivative
    return grad


#### Hessian Approximation Using Finite Differences

In [33]:
def finite_difference_hessian(func, x, h=1e-4):
    """
    Approximates the Hessian matrix of a function at a given point using finite differences.
    
    Args:
        func (callable): The function for which the Hessian is to be approximated.
        x (np.array): The point at which the Hessian is to be approximated.
        h (float): The step size for the finite difference approximation.

    Returns:
        np.array: The approximated Hessian matrix as a 2D numpy array.
    """
    n = len(x)
    hessian = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            x_ij = np.array(x, dtype=float)
            if i == j:
                # Diagonal entries
                x_ij[i] += h
                f_plus = func(x_ij)
                x_ij[i] = x[i] - h
                f_minus = func(x_ij)
                hessian[i, j] = (f_plus - 2 * func(x) + f_minus) / h**2
            else:
                # Off-diagonal entries, use central difference for mixed partial derivatives
                # f(x + h*ei + h*ej) - f(x + h*ei - h*ej) - f(x - h*ei + h*ej) + f(x - h*ei - h*ej)
                x_plus_plus = np.array(x, dtype=float)
                x_plus_minus = np.array(x, dtype=float)
                x_minus_plus = np.array(x, dtype=float)
                x_minus_minus = np.array(x, dtype=float)
                
                x_plus_plus[i] += h
                x_plus_plus[j] += h
                x_plus_minus[i] += h
                x_plus_minus[j] -= h
                x_minus_plus[i] -= h
                x_minus_plus[j] += h
                x_minus_minus[i] -= h
                x_minus_minus[j] -= h
                
                f_plus_plus = func(x_plus_plus)
                f_plus_minus = func(x_plus_minus)
                f_minus_plus = func(x_minus_plus)
                f_minus_minus = func(x_minus_minus)
                
                hessian[i, j] = (f_plus_plus - f_plus_minus - f_minus_plus + f_minus_minus) / (4 * h**2)
    return hessian


In [54]:
# Map each function to its corresponding gradient function
grad_func_map = {
    custom_function: grad_custom_function,
    safe_rosenbrock: safe_grad_rosenbrock,
}


In [57]:
# Example of defining a global map for Hessian functions
hessian_func_map = {
    custom_function: hessian_custom_function,  # Example: define `analytical_hessian_custom` appropriately
    safe_rosenbrock: hessian_rosenbrock  # Define this function if needed
}


In [48]:
def get_gradient_func(func, use_numerical_gradient=False):
    """
    Retrieves the appropriate gradient function for a given objective function, allowing for the option
    to use either the analytical or numerical gradient.

    Args:
        func (callable): The objective function for which the gradient is required. This function
                         should take a single numpy array as input and return a single value.
        use_numerical_gradient (bool): A flag indicating whether to use a numerical gradient approximation.
                                       If True, the gradient will be estimated using finite differences;
                                       otherwise, the pre-defined analytical gradient will be used.

    Returns:
        callable: A function that computes the gradient of the objective function. This returned function
                  takes a numpy array as input and returns a numpy array representing the gradient at that point.
                  
    Examples:
        # Define an objective function
        def my_function(x):
            return x[0]**2 + x[1]**2 + 3*x[0]*x[1]

        # Get the analytical gradient function
        analytical_grad = get_gradient_func(my_function, use_numerical_gradient=False)
        
        # Get the numerical gradient function
        numerical_grad = get_gradient_func(my_function, use_numerical_gradient=True)
        
        # Compute gradients
        x = np.array([1.0, 2.0])
        print("Analytical Gradient:", analytical_grad(x))
        print("Numerical Gradient:", numerical_grad(x))
    """
    if use_numerical_gradient:
        # Use the numerical gradient approximation via finite differences
        return lambda x: finite_difference_gradient(func, x)
    else:
        # Retrieve the analytical gradient from a predefined map
        try:
            return grad_func_map[func]
        except KeyError:
            raise ValueError("Analytical gradient function not defined for the provided function.")


In [49]:
def get_hessian_func(func, use_numerical_hessian=False, use_quasi_newton=False):
    """
    Retrieves the appropriate Hessian function based on the configuration.

    Args:
        func (callable): The function for which the Hessian is needed.
        use_numerical_hessian (bool): If True, use a numerical method to approximate the Hessian.
        use_quasi_newton (bool): If True, use a quasi-Newton method for the Hessian approximation.

    Returns:
        callable: A function that computes the Hessian matrix for the given function.
    """
    if use_numerical_hessian:
        # Return a lambda function that calculates the numerical Hessian
        return lambda x: finite_difference_hessian(func, x)
    elif use_quasi_newton:
        # If implementing a specific quasi-Newton method like BFGS that approximates the Hessian
        # Normally BFGS would update its Hessian approximation internally, so this might be managed differently
        return None  # Placeholder for quasi-Newton Hessian management
    else:
        # Return the analytical Hessian function mapped to `func`
        return hessian_func_map[func]  # Ensure this map is defined somewhere globally



## Experiment Configuration

We will test the optimization algorithms on two benchmark functions with the following initial points:

- Rosenbrock function: Points `[-1.2, 1]`, `[0, 0]`, and `[2, 2]`.
- Custom function: Points `[0.5, -1]`, `[2, 2]`, and `[-1, -1]`.

Each test will be run with a convergence tolerance of \(1e-6\) and a maximum of 500000 iterations, allowing us to assess the efficiency and effectiveness of each method under various conditions.


In [16]:
# Define initial points for Rosenbrock function
initial_points_rosenbrock = [np.array([1.2, 1.2]), np.array([-1.2, 1.0]), np.array([0.2, 0.8])]

# Define initial points for the custom function
initial_points_custom = [np.array([-0.2, 1.2]), np.array([3.8, 0.1]), np.array([1.9, 0.6])]


In [17]:
import pandas as pd

def run_optimization_tests(method, func, grad, initial_points, known_solution):
    """
    Runs optimization tests for a given function using the specified conjugate gradient method, and compiles the results into a pandas DataFrame.

    Args:
        method (str): Specifies the conjugate gradient method variant ('FR' for Fletcher-Reeves or 'PR' for Polak-Ribiere).
        func (callable): The objective function to minimize.
        grad (callable): The gradient of the objective function.
        initial_points (list of np.array): List of initial points for the optimization.
        known_solution (np.array): The known global minimum of the function, used for calculating the distance to the solution.

    Returns:
        pandas.DataFrame: A DataFrame containing the results from each optimization run, including the starting point, final iterate, number of iterations, final gradient norm, and the distance to the known solution.
    """
    results = []  # Prepare a list to store result dictionaries
    for x0 in initial_points:
        _, num_iterations, overall_log, final_gradient_norm, distance_to_solution = conjugate_gradient_method(
            func, grad, x0, known_solution, method=method, max_iter=700000, tol=1e-6, alpha_init=1.0, rho=0.9, c=1e-4, ls_max_iter=50
        )
        # Append a dictionary of results for this starting point to the results list
        results.append({
            'Starting Point': np.array_str(x0),
            'Final Iterate': np.array_str(overall_log[-1]['x']),
            'Number of Iterations': num_iterations,
            'Final Gradient Norm': final_gradient_norm,
            'Distance to Solution': distance_to_solution
        })
    
    # Convert the list of dictionaries to a DataFrame and return
    return pd.DataFrame(results)


## Optimization Results

Below are the detailed results from each optimization run:


In [18]:
# Assume initial_points_rosenbrock and initial_points_custom are already defined
rosenbrock_solution = np.array([1, 1])
custom_solution = np.array([4, 0])

# Run tests and display results
results_df_rosenbrock_fr = run_optimization_tests('FR', safe_rosenbrock, safe_grad_rosenbrock, initial_points_rosenbrock, rosenbrock_solution)
results_df_custom_fr = run_optimization_tests('FR', custom_function, grad_custom_function, initial_points_custom, custom_solution)
results_df_rosenbrock_pr = run_optimization_tests('PR', safe_rosenbrock, safe_grad_rosenbrock, initial_points_rosenbrock, rosenbrock_solution)
results_df_custom_pr = run_optimization_tests('PR', custom_function, grad_custom_function, initial_points_custom, custom_solution)

print("Results for Rosenbrock Function (Fletcher-Reeves):")
display(results_df_rosenbrock_fr)  # Use display for nicer output in Jupyter Notebook

print("Results for Custom Function (Fletcher-Reeves):")
display(results_df_custom_fr)

print("Results for Rosenbrock Function (Polak-Ribiere):")
display(results_df_rosenbrock_pr)

print("Results for Custom Function (Polak-Ribiere):")
display(results_df_custom_pr)


Results for Rosenbrock Function (Fletcher-Reeves):


Unnamed: 0,Starting Point,Final Iterate,Number of Iterations,Final Gradient Norm,Distance to Solution
0,[1.2 1.2],[0.99999999 0.99999998],358,9.293173e-07,1.826919e-08
1,[-1.2 1. ],[0.99999998 0.99999996],430,9.637434e-07,3.922507e-08
2,[0.2 0.8],[1.0000009 1.0000018],1050,8.450254e-07,2.017737e-06


Results for Custom Function (Fletcher-Reeves):


Unnamed: 0,Starting Point,Final Iterate,Number of Iterations,Final Gradient Norm,Distance to Solution
0,[-0.2 1.2],[8.31564295e-09 9.99999937e-01],671,8.522154e-07,4.123106
1,[3.8 0.1],[3.99999998e+00 2.33822998e-10],417,9.583673e-07,1.880843e-08
2,[1.9 0.6],[4.00000003e+00 1.97289911e-10],426,8.698669e-07,3.473033e-08


Results for Rosenbrock Function (Polak-Ribiere):


Unnamed: 0,Starting Point,Final Iterate,Number of Iterations,Final Gradient Norm,Distance to Solution
0,[1.2 1.2],[1.00000001 1.00000002],42060,9.999805e-07,1.73191e-08
1,[-1.2 1. ],[1.00000001 1.00000002],48743,9.99948e-07,1.904753e-08
2,[0.2 0.8],[1.00000001 1.00000002],59868,9.208781e-07,1.75737e-08


Results for Custom Function (Polak-Ribiere):


Unnamed: 0,Starting Point,Final Iterate,Number of Iterations,Final Gradient Norm,Distance to Solution
0,[-0.2 1.2],[3.99999939e+00 4.53869805e-10],136050,9.930581e-07,6.068122e-07
1,[3.8 0.1],[3.99999939e+00 5.15545954e-11],123414,9.968647e-07,6.098958e-07
2,[1.9 0.6],[3.99999939e+00 4.55919477e-10],200585,9.975122e-07,6.095695e-07


For running and tuning each test individually, we'll augment the testing framework a little to gain more flexibility for:

* retrieving numerical or non-numerical gradient functions
* determine the non-numerical gradient function for a function given

In [44]:
import numpy as np
import pandas as pd
import time

def run_test(func, grad_func, x0, known_solution, method='FR', max_iter=500000, alpha_init=1.0, rho=0.9, c=0.01):
    """
    Runs a single test using the conjugate gradient method and captures key metrics.

    Args:
        func (callable): The function to minimize.
        grad_func (callable): The gradient of the function.
        x0 (np.array): Initial point for the optimization.
        known_solution (np.array): Known global minimum of the function.
        method (str): Optimization method to use ('FR' or 'PR').
        max_iter (int): Maximum number of iterations.
        alpha_init (float): Initial step size for line search.
        rho (float): Reduction factor for line search step size.
        c (float): Armijo constant in line search.

    Returns:
        pd.DataFrame: DataFrame containing the results of the test.
    """
    start = time.time()
    x_min, num_iters, logs, final_gradient_norm, distance_to_solution = conjugate_gradient_method(
        func, grad_func, x0, known_solution, method=method, max_iter=max_iter, alpha_init=alpha_init, rho=rho, c=c
    )
    stop = time.time()

    results = {
        'Starting Point': np.array_str(x0),
        'Known Solution': np.array_str(known_solution),
        'Calculated Solution': np.array_str(logs[-1]['x']),
        'Distance to Solution': distance_to_solution,
        'Final Gradient Norm': final_gradient_norm,
        'Number of Iterations': num_iters,
        'Execution Time (s)': stop - start
    }
    
    return pd.DataFrame([results])

# Example usage:
#x0_rosenbrock = np.array([1.2, 1.2])
#known_rosenbrock = np.array([1, 1])

#df_rosenbrock = run_test(
#    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='FR'
#)
#display(df_rosenbrock)


In [47]:
def run_test_wrap(func, x0, known_solution, method='FR', use_numerical_gradient=False, max_iter=500000, alpha_init=1.0, rho=0.9, c=0.01):
    grad_func = get_gradient_func(func, use_numerical_gradient)
    return run_test(func, grad_func, x0, known_solution, method, max_iter, alpha_init, rho, c)

# Example usage
x0_custom = np.array([-0.2, 1.2])
known_custom = np.array([4.0, 0.0])  # Assuming this is the coordinate of the known solution

# Running test with analytical gradient
df_custom = run_test_wrap(
    custom_function, x0_custom, known_custom, method='PR', use_numerical_gradient=False
)
display(df_custom)

# Running test with numerical gradient
df_custom_numerical = run_test_wrap(
    custom_function, x0_custom, known_custom, method='PR', use_numerical_gradient=True
)
display(df_custom_numerical)


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[4. 0.],[3.99999936e+00 6.41808346e-11],6.380448e-07,9.971666e-07,135564,230.092147


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[4. 0.],[3.99999936e+00 6.41841640e-11],6.380448e-07,9.971526e-07,135564,369.152316


Fletcher-Reevers:

In [20]:
# Rosenbrock with Fletcher-Reevers at (1.2, 1.2):
x0_rosenbrock = np.array([1.2, 1.2])
known_rosenbrock = np.array([1, 1])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='FR'
)
display(df_rosenbrock)


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.2 1.2],[1 1],[1.00000014 1.00000027],3.061061e-07,9.513325e-07,267,0.794757


In [21]:
# Rosenbrock with Fletcher-Reevers at (-1.2, 1.0):
x0_rosenbrock = np.array([-1.2, 1.0])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='FR'
)
display(df_rosenbrock)


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-1.2 1. ],[1. 1.],[1.0000004 1.00000081],9.0426e-07,7.941245e-07,505,1.580579


In [22]:
# Rosenbrock with Fletcher-Reevers at (0.2, 0.8):
x0_rosenbrock = np.array([0.2, 0.8])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='FR'
)
display(df_rosenbrock)


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[0.2 0.8],[1. 1.],[0.99999986 0.99999973],3.050252e-07,8.214121e-07,551,2.379362


In [23]:
# Custom function with Fletcher-Reevers at (-0.2, 1.2)
x0_custom = np.array([-0.2, 1.2])
known_custom = np.array([0.0, 1.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='FR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[0. 1.],[1.66189452e-08 9.99999941e-01],6.115246e-08,9.070521e-07,420,0.251913


In [24]:
# Custom function with Fletcher-Reevers at (3.8, 0.1)
x0_custom = np.array([3.8, 0.1])
known_custom = np.array([4.0, 0.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='FR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[3.8 0.1],[4. 0.],[ 4.0000003e+00 -3.9505335e-10],3.025209e-07,9.559757e-07,457,0.323323


In [25]:
# Custom function with Fletcher-Reevers at (1.9, 0.6)
x0_custom = np.array([1.9, 0.6])
known_custom = np.array([0.0, 1.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='FR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.9 0.6],[0. 1.],[3.81287695e-09 1.00000011e+00],1.124647e-07,9.290424e-07,583,0.29911


Polak-Ribiere:

In [26]:
# Rosenbrock with Polak-Ribiere at (1.2, 1.2):
x0_rosenbrock = np.array([1.2, 1.2])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR'
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.2 1.2],[1. 1.],[1.00000013 1.00000027],2.956403e-07,9.500237e-07,34946,337.684152


In [27]:
# Rosenbrock with Polak-Ribiere at (-1.2, 1.0):
x0_rosenbrock = np.array([-1.2, 1.0])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR'
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-1.2 1. ],[1. 1.],[1.00000012 1.00000024],2.707479e-07,8.703475e-07,42093,465.110381


In [28]:
# Rosenbrock with Polak-Ribiere at (0.2, 0.8):
x0_rosenbrock = np.array([0.2, 0.8])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test(
    safe_rosenbrock, safe_grad_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR'
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[0.2 0.8],[1. 1.],[1.00000012 1.00000024],2.731361e-07,8.776023e-07,52998,427.267237


In [29]:
# Custom function with Polak-Ribiere at (-0.2, 1.2)
x0_custom = np.array([-0.2, 1.2])
known_custom = np.array([4.0, 0.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='PR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[4. 0.],[3.99999936e+00 6.41808346e-11],6.380448e-07,9.971666e-07,135564,406.062963


In [30]:
# Custom function with Polak-Ribiere at (3.8, 0.1)
x0_custom = np.array([3.8, 0.1])
known_custom = np.array([4.0, 0.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='PR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[3.8 0.1],[4. 0.],[3.99999936e+00 4.66040643e-10],6.368097e-07,9.963518e-07,122524,228.600885


In [31]:
# Custom function with Polak-Ribiere at (1.9, 0.6)
x0_custom = np.array([1.9, 0.6])
known_custom = np.array([4.0, 0.0])

df_custom = run_test(
    custom_function, grad_custom_function, x0_custom, known_custom, method='PR'
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.9 0.6],[4. 0.],[ 4.00000064e+00 -6.45558086e-11],6.37491e-07,9.943869e-07,169673,379.543619


With derivative approximation:

Fletcher-Reevers:

In [58]:
x0_custom = np.array([-0.2, 1.2])
known_custom = np.array([4.0, 0.0])  # Assuming this is the coordinate of the known solution

# Running test with analytical gradient
df_custom = run_test_wrap(
    custom_function, x0_custom, known_custom, method='FR', use_numerical_gradient=True
)
display(df_custom)


Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[4. 0.],[1.33944272e-08 9.99999924e-01],4.123106,8.74135e-07,271,0.180095


Polak-Ribiere:

Rosenbrock function:

In [60]:
# Rosenbrock with Polak-Ribiere at (1.2, 1.2), approximated gradient:
x0_rosenbrock = np.array([1.2, 1.2])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test_wrap(
    safe_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR',
     use_numerical_gradient=True
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.2 1.2],[1. 1.],[1.00000024 1.00000047],5.308174e-07,8.674522e-07,33276,516.201523


In [61]:
# Rosenbrock with Polak-Ribiere at (-1.2, 1.0), approximated gradient:
x0_rosenbrock = np.array([-1.2, 1.0])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test_wrap(
    safe_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR',
     use_numerical_gradient=True
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-1.2 1. ],[1. 1.],[1.00000024 1.00000048],5.357696e-07,9.998391e-07,40181,569.576135


In [62]:
# Rosenbrock with Polak-Ribiere at (0.2, 0.8), approximated gradient:
x0_rosenbrock = np.array([0.2, 0.8])
known_rosenbrock = np.array([1.0, 1.0])

df_rosenbrock = run_test_wrap(
    safe_rosenbrock, x0_rosenbrock, known_rosenbrock, method='PR',
     use_numerical_gradient=True
)
display(df_rosenbrock)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[0.2 0.8],[1. 1.],[1.00000023 1.00000046],5.144113e-07,9.999897e-07,51202,1099.009333


Custom function:

In [63]:
# Custom function with Polak-Ribiere at (-0.2, 1.2), approximated gradient:
x0_custom = np.array([-0.2, 1.2])
known_custom = np.array([4.0, 0.0])  # Assuming this is the coordinate of the known solution

df_custom = run_test_wrap(
    custom_function, x0_custom, known_custom, method='PR', use_numerical_gradient=True
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[-0.2 1.2],[4. 0.],[3.99999936e+00 6.41841640e-11],6.380448e-07,9.971526e-07,135564,382.979786


In [64]:
# Custom function with Polak-Ribiere at (3.8, 0.1), approximated gradient:
x0_custom = np.array([3.8, 0.1])
known_custom = np.array([4.0, 0.0])

df_custom = run_test_wrap(
    custom_function, x0_custom, known_custom, method='PR', use_numerical_gradient=True
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[3.8 0.1],[4. 0.],[3.99999936e+00 4.66041725e-10],6.368097e-07,9.963562e-07,122524,353.572155


In [65]:
# Custom function with Polak-Ribiere at (1.9, 0.6)
x0_custom = np.array([1.9, 0.6])
known_custom = np.array([4.0, 0.0])

df_custom = run_test_wrap(
    custom_function, x0_custom, known_custom, method='PR', use_numerical_gradient=True
)
display(df_custom)

Unnamed: 0,Starting Point,Known Solution,Calculated Solution,Distance to Solution,Final Gradient Norm,Number of Iterations,Execution Time (s)
0,[1.9 0.6],[4. 0.],[ 4.00000064e+00 -6.45575857e-11],6.37491e-07,9.943793e-07,169673,537.617304


### Results Analysis

The results from the optimization tests on both the Rosenbrock and Custom functions illustrate key differences in the convergence behavior of the Fletcher-Reeves and Polak-Ribiere methods. For example, the Fletcher-Reeves method showed faster convergence on the Rosenbrock function from closer initial points to the global minimum, while Polak-Ribiere was more robust to poor initial conditions on the Custom function.

#### Observations:
- **Rosenbrock Function**: Detailed observations about each starting point and method's performance.
- **Custom Function**: Discussion on the resilience of the methods against complex landscapes.

These findings suggest that the choice of method can significantly impact the efficiency and success of optimization, especially in complex or ill-conditioned problem spaces.
