# Multivariate Unconstrained Optimization with Python

## Introduction

This notebook explores various numerical methods available in Python's `scipy.optimize` library to solve unconstrained optimization problems. We will use the challenging **Rosenbrock function** as a benchmark to test and compare the performance of these different optimization algorithms, especially in high-dimensional scenarios.

### The Rosenbrock Function

The Rosenbrock function is a classic non-convex test function for optimization algorithms. Its global minimum lies inside a long, narrow, parabolic valley: reaching the valley is easy, but converging to the minimum along its nearly flat floor is difficult for many algorithms.

The function is defined as:

$$
f(\boldsymbol{x}) = \sum_{i=0}^{n-2} \left[ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right]
$$

where $\boldsymbol{x} = [x_0, x_1, \dots, x_{n-1}]$ is a vector of variables. The global minimum is at $\boldsymbol{x} = [1, 1, \dots, 1]$, where $f(\boldsymbol{x}) = 0$.
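
As a quick sanity check on the definition, the sum can be written directly in NumPy and compared with SciPy's built-in `rosen`. This is only a minimal sketch: the helper name `rosenbrock` and the test point are illustrative choices, not part of the benchmark that follows.

```python
import numpy as np
from scipy.optimize import rosen

def rosenbrock(x):
    """Rosenbrock sum over i = 0 .. n-2 (hypothetical helper, for illustration only)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

x_test = np.array([1.3, 0.7, 0.8, 1.9, 1.2])    # arbitrary test point
value_at_minimum = rosenbrock(np.ones(5))        # every term vanishes at x = [1, ..., 1]
matches_scipy = np.isclose(rosenbrock(x_test), rosen(x_test))
print(value_at_minimum, matches_scipy)
```

Slicing with `x[1:]` and `x[:-1]` pairs each $x_i$ with $x_{i+1}$, so no explicit loop over $i$ is needed.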

To effectively use gradient-based optimization methods, we also need the function's gradient (first derivative) and its Hessian (second derivative).

**Gradient (Jacobian):**

The gradient, or Jacobian, of the Rosenbrock function is a vector of its partial derivatives:

$$
\frac{\partial f}{\partial x_j} = \sum_{i=0}^{n-2} \frac{\partial}{\partial x_j} \left[ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right]
$$
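
Only the terms with $i = j$ and $i = j-1$ depend on $x_j$, so the sum collapses. As a worked expansion (added here for illustration), the interior components $1 \le j \le n-2$ are:

$$
\frac{\partial f}{\partial x_j} = 200(x_j - x_{j-1}^2) - 400 x_j (x_{j+1} - x_j^2) - 2(1 - x_j)
$$

The first and last components each receive a contribution from a single term:

$$
\frac{\partial f}{\partial x_0} = -400 x_0 (x_1 - x_0^2) - 2(1 - x_0), \qquad
\frac{\partial f}{\partial x_{n-1}} = 200(x_{n-1} - x_{n-2}^2)
$$

At $\boldsymbol{x} = [1, 1, \dots, 1]$ every component is zero, consistent with the global minimum stated above.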

**Hessian:**

The Hessian is the matrix of second partial derivatives. For the Rosenbrock function, it is symmetric and tridiagonal, because each term of the sum couples only the neighbouring variables $x_i$ and $x_{i+1}$.
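
The structure of the Hessian can be inspected numerically with SciPy's `rosen_hess`. The sketch below checks symmetry, checks that all nonzero entries sit on the three central diagonals, and confirms that `rosen_hess_prod` agrees with an explicit matrix-vector product. The test point `x` and the direction `p` are arbitrary illustrative values.

```python
import numpy as np
from scipy.optimize import rosen_hess, rosen_hess_prod

x = np.array([1.3, 0.7, 0.8, 1.9, 1.2])    # arbitrary test point
p = np.array([1.0, -2.0, 0.5, 0.0, 3.0])   # arbitrary direction vector

H = rosen_hess(x)                           # dense (n, n) array

is_symmetric = np.allclose(H, H.T)
# Keep only the main diagonal and the first sub/super-diagonals, then compare.
band = np.triu(np.tril(H, k=1), k=-1)
is_tridiagonal = np.allclose(H, band)
# The Hessian-vector product helper should agree with the explicit product H @ p.
products_agree = np.allclose(H @ p, rosen_hess_prod(x, p))
print(is_symmetric, is_tridiagonal, products_agree)
```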



-----

## Optimization Methods in `scipy.optimize.minimize`

The `scipy.optimize.minimize` function provides a unified interface for various optimization algorithms. We will explore and compare the following methods:

### Gradient-Free Methods

These methods do not require the gradient of the objective function.

  * **Nelder-Mead:** A simplex-based method that is robust but can be slow for high-dimensional problems.
  * **Powell:** A conjugate direction method that does not require the explicit gradient.
  * **COBYLA (Constrained Optimization BY Linear Approximation):** It's designed for constrained problems but can be used for unconstrained ones.
  * **COBYQA (Constrained Optimization BY Quadratic Approximation):** A newer derivative-free trust-region method based on quadratic models.
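
A gradient-free call needs only the objective and a starting point. The following minimal sketch runs `Nelder-Mead` on the 2-dimensional Rosenbrock function from the zero vector, leaving all solver options at their defaults.

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.zeros(2)                                  # starting point
res = minimize(rosen, x0, method='Nelder-Mead')   # no jac or hess argument is passed
print(res.x, res.nfev)
```

`res.nfev` reports the number of objective evaluations, which is the main cost measure for derivative-free methods.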

### Gradient-Based Methods

These methods utilize the gradient of the objective function to find the minimum more efficiently.

  * **CG (Conjugate Gradient):** An iterative method that is effective for large-scale problems.
  * **BFGS (Broyden–Fletcher–Goldfarb–Shanno):** A popular quasi-Newton method that approximates the inverse Hessian matrix.
  * **L-BFGS-B (Limited-memory BFGS with Bounds):** A memory-efficient version of BFGS suitable for problems with a large number of variables.
  * **TNC (Truncated Newton Conjugate-Gradient):** A truncated Newton algorithm that uses a conjugate gradient method to solve the Newton equations.
  * **SLSQP (Sequential Least Squares Programming):** Suitable for constrained optimization but can handle unconstrained problems as well.
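
To illustrate what supplying the gradient buys, the sketch below runs `BFGS` twice on the 2-dimensional Rosenbrock function: once with the analytic gradient `rosen_der`, and once with `jac` omitted, in which case SciPy falls back to a finite-difference approximation. The variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.zeros(2)

with_grad = minimize(rosen, x0, method='BFGS', jac=rosen_der)
without_grad = minimize(rosen, x0, method='BFGS')   # jac omitted: finite differences

print('analytic gradient :', with_grad.nfev, 'function evaluations')
print('finite differences:', without_grad.nfev, 'function evaluations')
```

Each finite-difference gradient costs extra objective evaluations, so the second run is expected to report a noticeably larger `nfev`.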

### Hessian-Based Methods

These methods use the Hessian matrix (or its approximation), which provides information about the local curvature of the function.

  * **Newton-CG (Newton-Conjugate Gradient):** A Newton method where the search direction is found using a conjugate gradient algorithm.
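  * **dogleg:** A trust-region method that combines the steepest-descent and Newton steps; it requires the Hessian to be positive definite.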
  * **trust-ncg:** A trust-region algorithm that uses a Newton-conjugate gradient approach.
  * **trust-krylov:** A trust-region method that uses a Krylov subspace method to solve the subproblem.
  * **trust-exact:** A trust-region method that solves the subproblem almost exactly.
  * **trust-constr:** A trust-region method for constrained optimization that can also be used for unconstrained problems.
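
Several of these methods accept either the full Hessian (`hess`) or a Hessian-vector product (`hessp`). The sketch below runs `trust-ncg` both ways on a 5-dimensional Rosenbrock problem. The starting point and the tightened `gtol` are illustrative assumptions, not the settings used in the benchmark.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess, rosen_hess_prod

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])   # illustrative starting point
opts = {'gtol': 1e-8}                       # tighter than the default, for a clean comparison

# Variant 1: the solver receives the full n-by-n Hessian.
full_hess = minimize(rosen, x0, method='trust-ncg',
                     jac=rosen_der, hess=rosen_hess, options=opts)

# Variant 2: the solver only receives a function computing H(x) @ p.
hess_prod = minimize(rosen, x0, method='trust-ncg',
                     jac=rosen_der, hessp=rosen_hess_prod, options=opts)

print(full_hess.x, hess_prod.x)
```

The `hessp` variant never forms the $n \times n$ matrix, which matters when $n$ is large and the Hessian is expensive to store.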

-----

## Implementation and Comparison

Let's implement the Rosenbrock function and its derivatives in Python and then use these to test the various optimization methods.

### 1\. Importing Libraries

In [1]:
import numpy as np
import pandas as pd
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess, rosen_hess_prod
import time

### 2\. Setting up the Benchmark

We will test the optimizers on the Rosenbrock function in the following dimensions:


In [2]:
# Dimensions to test
dimensions = [2, 10, 100, 500]

We will use the same starting point, the zero vector, for every test, and run the methods listed below.

In [3]:
# List of methods to be tested
methods = [
    'Nelder-Mead', 'Powell',
    'CG', 'BFGS', 'Newton-CG', 'L-BFGS-B', 'TNC',
    'dogleg', 'trust-ncg', 'trust-krylov', 'trust-exact',
    'SLSQP', 'trust-constr'    
]
# Not included in this run: 'COBYLA' and 'COBYQA' (gradient-free, like 'Nelder-Mead' and 'Powell').
# 'COBYLA', 'COBYQA', 'SLSQP' and 'trust-constr' are designed for constrained optimization.

# Analytical solution
analytical_solution = lambda n: np.ones(n)

# List of per-run result records (converted to a DataFrame after the loop)
results = []

### 3\. Running the Optimization Tests

Now, we'll loop through each method and dimension, run the optimization, and collect the performance metrics.

In [4]:
for n in dimensions:
    print(f"\n\n--- Testing for n = {n} ---")
    x0 = np.zeros(n)  # Initial guess
    solution = analytical_solution(n)

    for method in methods:
        print(f"  Testing method: {method}")

        # Determine which arguments are needed for the method
        kwargs = {'fun': rosen, 'x0': x0, 'method': method, 'options': {'maxiter': 10000}}

        # Methods that require Jacobian
        if method in ['CG', 'BFGS', 'L-BFGS-B', 'TNC', 'SLSQP', 'Newton-CG', 'dogleg', 'trust-ncg', 'trust-krylov', 'trust-exact', 'trust-constr']:
            kwargs['jac'] = rosen_der

        # Methods that require Hessian
        if method in ['Newton-CG', 'dogleg', 'trust-ncg', 'trust-krylov', 'trust-exact', 'trust-constr']:
            kwargs['hess'] = rosen_hess

            # Some methods can use the Hessian-vector product
            if method in ['Newton-CG', 'trust-ncg', 'trust-krylov', 'trust-constr']:
                # In fact, this function will not be used as "hess" has been provided
                kwargs['hessp'] = rosen_hess_prod

        try:
            start_time = time.perf_counter()
            result = minimize(**kwargs)
            end_time = time.perf_counter()

            success = result.success
            if success:
                print('   ', result.message)
            else:
                raise Exception(result.message)

            # Calculate metrics
            error     = np.linalg.norm(result.x - solution)
            fun_val   = result.fun
            grad_norm = np.linalg.norm(result.jac) if hasattr(result, 'jac') and result.jac is not None else 'N/A'
            nfev      = result.nfev
            njev      = result.njev if hasattr(result, 'njev') else 'N/A'
            nhev      = result.nhev if hasattr(result, 'nhev') else 'N/A'
            nit       = result.nit
            exec_time = end_time - start_time

            results.append({
                'Success':            success,
                'Dimension':          n,
                'Method':             method,
                'Error':              error,
                'Function Value':     fun_val,
                'Gradient Norm':      grad_norm,
                'Func Evals':         nfev,
                'Jac Evals':          njev,
                'Hess Evals':         nhev,
                'Iterations':         nit,
                'Execution Time (s)': exec_time
            })

        except Exception as e:
            print(f"    >>> Method {method} failed for n={n}: {e}")
            results.append({
                # Always False here: `success` is undefined or stale if minimize() itself raised
                'Success':            False,
                'Dimension':          n,
                'Method':             method,
                'Error':              'Failed',
                'Function Value':     'Failed',
                'Gradient Norm':      'Failed',
                'Func Evals':         'Failed',
                'Jac Evals':          'Failed',
                'Hess Evals':         'Failed',
                'Iterations':         'Failed',
                'Execution Time (s)': 'Failed'
            })

results_df = pd.DataFrame(results)



--- Testing for n = 2 ---
  Testing method: Nelder-Mead
    Optimization terminated successfully.
  Testing method: Powell
    Optimization terminated successfully.
  Testing method: CG
    Optimization terminated successfully.
  Testing method: BFGS
    Optimization terminated successfully.
  Testing method: Newton-CG
    Optimization terminated successfully.
  Testing method: L-BFGS-B
    CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
  Testing method: TNC
    Converged (|f_n-f_(n-1)| ~= 0)
  Testing method: dogleg
    >>> Method dogleg failed for n=2: A linalg error occurred, such as a non-psd Hessian.
  Testing method: trust-ncg
    Optimization terminated successfully.
  Testing method: trust-krylov
    Optimization terminated successfully.
  Testing method: trust-exact
    Optimization terminated successfully.
  Testing method: SLSQP
    Optimization terminated successfully
  Testing method: trust-constr
    `gtol` termination condition is satisfied.


--- Testing for n = 10 ---
  Testing method: Nelder-Mead
    Optimization terminated successfully.
  Testing method: Powell
    Optimization terminated successfully.
  Testing method: CG
    Optimization terminated successfully.
  Testing method: BFGS
    Optimization terminated successfully.
  Testing method: Newton-CG
    Optimization terminated successfully.
  Testing method: L-BFGS-B
    CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
  Testing method: TNC
    >>> Method TNC failed for n=10: Max. number of function evaluations reached
  Testing method: dogleg
    Optimization terminated successfully.
  Testing method: trust-ncg
    Optimization terminated successfully.
  Testing method: trust-krylov
    Optimization terminated successfully.
  Testing method: trust-exact
    Optimization terminated successfully.
  Testing method: SLSQP
    Optimization terminated successfully
  Testing method: trust-constr
    `gtol` termination condition is satisfied.


--- Testing for n = 100 ---
  Testing method: Nelder-Mead
    >>> Method Nelder-Mead 

-----

## Results and Discussion

Now let's display the results in a pandas DataFrame for easy comparison.

In [5]:
# Displaying the results
results_df

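|   | Success | Dimension | Method | Error | Function Value | Gradient Norm | Func Evals | Jac Evals | Hess Evals | Iterations | Execution Time (s) |
|---|---------|-----------|--------|-------|----------------|---------------|------------|-----------|------------|------------|--------------------|
| 0 | True | 2 | Nelder-Mead | 0.000012 | 0.0 | | 146 | | | 79 | 0.004483 |
| 1 | True | 2 | Powell | 0.0 | 0.0 | | 423 | | | 16 | 0.009118 |
| 2 | True | 2 | CG | 0.000018 | 0.0 | 0.000008 | 42 | 42 | | 18 | 0.003916 |
| 3 | True | 2 | BFGS | 0.000002 | 0.0 | 0.000005 | 24 | 24 | | 19 | 0.009483 |
| 4 | True | 2 | Newton-CG | 0.000087 | 0.0 | 0.01421 | 53 | 53 | 33 | 33 | 0.016279 |
| 5 | True | 2 | L-BFGS-B | 0.000001 | 0.0 | 0.000005 | 25 | 25 | | 21 | 0.001903 |
| 6 | True | 2 | TNC | 0.00003 | 0.0 | 0.000076 | 46 | | | 13 | 0.002699 |
| 7 | False | 2 | dogleg | Failed | Failed | Failed | Failed | Failed | Failed | Failed | Failed |
| 8 | True | 2 | trust-ncg | 0.000195 | 0.0 | 0.000078 | 22 | 18 | 59 | 21 | 0.003191 |
| 9 | True | 2 | trust-krylov | 0.0 | 0.0 | 0.000022 | 24 | 24 | 55 | 23 | 0.004699 |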
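
The `dogleg` failure reported above is consistent with that method's requirement of a positive-definite Hessian, which the Rosenbrock function does not satisfy everywhere. The sketch below inspects the Hessian eigenvalues at two hand-picked points; these are illustrative choices, not the iterates the solver actually visited.

```python
import numpy as np
from scipy.optimize import rosen_hess

# A point away from the valley floor (x1 much larger than x0**2).
eig_off_valley = np.linalg.eigvalsh(rosen_hess(np.array([0.0, 1.0])))
# The global minimum.
eig_at_minimum = np.linalg.eigvalsh(rosen_hess(np.array([1.0, 1.0])))

print('eigenvalues at [0, 1]:', eig_off_valley)
print('eigenvalues at [1, 1]:', eig_at_minimum)
```

A negative eigenvalue at the first point means the Hessian is indefinite there, so a solver that assumes positive definiteness can break down if an iterate lands in such a region.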


### Key Observations

  * **Performance with Increasing Dimensions:** Gradient-based methods, particularly those using Hessian information (like `Newton-CG` and the `trust-` family), tend to maintain their efficiency or scale better as the number of dimensions increases. In contrast, gradient-free methods like `Nelder-Mead` and `Powell` can become significantly slower and less accurate.
  * **Accuracy:** Methods that converge to the true minimum will have an error and function value close to zero. The `Gradient Norm` is also a good indicator of convergence; a smaller norm indicates that the algorithm has found a point where the gradient is close to zero, a necessary condition for a minimum.
  * **Computational Cost:** The number of function, Jacobian, and Hessian evaluations (`Func Evals`, `Jac Evals`, `Hess Evals`) gives a direct measure of the computational effort. Methods that require fewer evaluations are generally faster. Note that Hessian-based methods also incur `Hess Evals`, and each Hessian evaluation can be expensive for complex functions.
  * **Execution Time:** This is a practical measure of the algorithm's speed. It's influenced by the number of evaluations and the internal computations of the algorithm.
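
Because failed runs are stored as the string `'Failed'`, the metric columns hold mixed types. One way to compare runs numerically, sketched here, is to coerce those columns with `pd.to_numeric`. The four rows are copied from the `n = 2` results shown above, and the variable names are illustrative.

```python
import pandas as pd

rows = [
    {'Method': 'Nelder-Mead', 'Dimension': 2, 'Execution Time (s)': 0.004483},
    {'Method': 'BFGS',        'Dimension': 2, 'Execution Time (s)': 0.009483},
    {'Method': 'L-BFGS-B',    'Dimension': 2, 'Execution Time (s)': 0.001903},
    {'Method': 'dogleg',      'Dimension': 2, 'Execution Time (s)': 'Failed'},
]
df = pd.DataFrame(rows)

# Turn 'Failed' into NaN so that the column becomes numeric.
df['Execution Time (s)'] = pd.to_numeric(df['Execution Time (s)'], errors='coerce')

# One row per method, one column per dimension.
timing = df.pivot(index='Method', columns='Dimension', values='Execution Time (s)')
fastest = timing[2].idxmin()   # NaN entries are skipped by idxmin
print(timing)
print('fastest for n = 2:', fastest)
```

The same pattern applies to the full `results_df`, where the pivot would gain one column per tested dimension.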

### Method-Specific Insights

  * **`Nelder-Mead` and `Powell`** are simple to use as they don't require derivatives. However, their performance degrades significantly in high-dimensional spaces.
  * **`CG` and `BFGS`** often provide a good balance between speed and accuracy. `L-BFGS-B` is particularly effective for very large-scale problems due to its limited memory usage.
  * **`Newton-CG` and the `trust-` methods** can be very fast and accurate if the Hessian is available and not too expensive to compute. They often converge in fewer iterations than other methods because the Hessian provides valuable curvature information.
  * **`COBYLA` and `COBYQA`** are derivative-free and were not part of the runs above. On smooth functions like Rosenbrock, derivative-free methods generally need more function evaluations than gradient-based methods to reach the same accuracy.
  * **`SLSQP`**, while designed for constrained optimization, is a robust general-purpose optimizer.

## Conclusion

Choosing the right optimization method is crucial in civil engineering applications. This notebook demonstrates how to benchmark and compare different optimizers in `scipy.optimize`.

The table below compares the `scipy.optimize.minimize` methods discussed above, based on the official documentation:

| Method Name      | Algorithm Name                              | Gradient-free | Supports Bounds | Supports Constraints | Requires Gradient | Requires Hessian | Trust-region | Suitable Scale | Additional Notes |
|------------------|---------------------------------------------|:-------------:|:---------------:|:-------------------:|:-----------------:|:---------------:|:------------:|:--------------:|-----------------|
| Nelder-Mead      | Nelder-Mead Simplex                         | True      | True            | False               | False             | False           | False        | Small/Medium   | Simplex-based, supports bounds, robust but slow for high dimensions[1][2] |
| Powell           | Powell’s Direction Set                      | True      | True        | False               | False             | False           | False        | Small/Medium   | Supports bounds, not constraints[2] |
| CG               | Nonlinear Conjugate Gradient                | False         | False           | False               | True          | False           | False        | Large          | Needs gradient, unconstrained only[2] |
| BFGS             | Broyden–Fletcher–Goldfarb–Shanno            | False         | False           | False               | True          | False           | False        | Small/Medium   | Quasi-Newton, unconstrained[2] |
| Newton-CG        | Newton Conjugate Gradient                   | False         | False           | False               | True          | Optional*        | False        | Large          | Needs gradient, Hessian or Hessian-vector product[2][1] |
| L-BFGS-B         | Limited-memory BFGS with Bounds             | False         | True        | False               | True          | False           | False        | Large          | For large-scale, supports simple bounds[2] |
| TNC              | Truncated Newton Conjugate-Gradient         | False         | True        | False               | True          | False           | False        | Large          | Gradient-based, supports bounds[2] |
| COBYLA           | Constrained Optimization BY Linear Approx.  | True      | True            | True            | False             | False           | False        | Small/Medium   | Only inequality constraints; bounds accepted in recent SciPy releases[2] |
| COBYQA           | Constrained Optimization BY Quadratic Approx.| True      | True        | True            | False             | False           | True         | Small/Medium   | Derivative-free, trust-region SQP[2] |
| SLSQP            | Sequential Least Squares Programming        | False         | True        | True            | True          | False           | False        | Small/Medium   | Handles bounds and constraints, gradient-based[2] |
| trust-constr     | Trust-Region Constrained Algorithm          | False         | True        | True            | True          | Optional*        | True     | Large          | Most versatile, supports all constraints, can use Hessian or approximation[3][1] |
| dogleg           | Trust-Region Dogleg                         | False         | False           | False               | True          | True        | True     | Small/Medium   | Needs gradient and Hessian, unconstrained[2] |
| trust-ncg        | Trust-Region Newton-Conjugate Gradient      | False         | False           | False               | True          | True/**Hessp**| True    | Large          | Needs gradient, Hessian or Hessian-vector product[4][1][5] |
| trust-krylov     | Trust-Region Krylov Subspace                | False         | False           | False               | True          | True/**Hessp**| True    | Medium/Large   | Needs gradient, Hessian or Hessian-vector product[2] |
| trust-exact      | Trust-Region Exact Hessian                  | False         | False           | False               | True          | True        | True     | Small/Medium   | Needs gradient and exact Hessian[2] |

**Legend & Notes:**
- **Gradient-free:** Does not require the user to provide derivatives.
- **Supports Bounds:** Can enforce variable bounds directly.
- **Supports Constraints:** Can enforce general (equality/inequality) constraints.
- **Requires Gradient:** Needs the gradient vector (either provided or approximated).
- **Requires Hessian:** Needs the Hessian matrix (or a Hessian-vector product, marked as *Optional* if not always required).
- **Trust-region:** Uses a trust-region approach for step selection.
- **Suitable Scale:** Typical problem size for which the method is efficient.
- *Optional*: Some methods can use either the full Hessian or just a Hessian-vector product (e.g., `trust-ncg`, `trust-krylov`, `trust-constr`)[2][6][4][1][5].
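
To make the "Supports Bounds" column concrete, the sketch below runs `L-BFGS-B` on the 2-dimensional Rosenbrock function inside a box that excludes the unconstrained minimum $[1, 1]$. The box limits are an illustrative assumption chosen for this example.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

bounds = [(0.0, 0.5), (0.0, 0.5)]   # illustrative box: 0 <= x_i <= 0.5
res = minimize(rosen, np.zeros(2), method='L-BFGS-B', jac=rosen_der, bounds=bounds)
print(res.x, res.fun)
```

Since $[1, 1]$ lies outside the box, the solver has to stop on the boundary. Within this box the best attainable point is $x_0 = 0.5$, $x_1 = x_0^2 = 0.25$, where $f = 0.25$.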

This table is based on the official SciPy documentation and the latest available information[2][6][3][4][1][5].

* [1] https://docs.scipy.org/doc/scipy/tutorial/optimize.html
* [2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
* [3] https://docs.scipy.org/doc/scipy/reference/optimize.minimize-trustconstr.html
* [4] https://docs.scipy.org/doc/scipy/reference/optimize.minimize-trustncg.html
* [5] https://docs.scipy.org/doc/scipy-1.4.1/reference/tutorial/optimize.html
* [6] https://docs.scipy.org/doc/scipy-1.11.3/tutorial/optimize.html
* [7] https://docs.scipy.org/doc/scipy/reference/optimize.html
* [8] https://stackoverflow.com/questions/58925576/how-to-choose-proper-method-for-scipy-optimize-minimize
* [9] https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html
* [10] https://stackoverflow.com/questions/46939100/i-need-some-intuition-about-the-trust-radius-in-scipys-trust-ncg-minimization
* [11] https://stackoverflow.com/questions/65400336/how-to-reuse-jacobian-and-inverse-hessian-with-scipy-minimize
* [12] https://stackoverflow.com/questions/58925576/how-to-choose-proper-method-for-scipy-optimize-minimize/64004381
* [13] https://discourse.julialang.org/t/is-there-a-julia-equivalent-of-scipy-optimize-minimize-method-tnc/45154
* [14] https://docs.jax.dev/en/latest/_autosummary/jax.scipy.optimize.minimize.html
* [15] https://github.com/scipy/scipy/issues/13754
* [16] https://docs.scipy.org/doc/scipy-1.3.3/reference/generated/scipy.optimize.minimize.html
* [17] https://stackoverflow.com/questions/41121061/is-there-a-bug-in-scipy-0-18-1s-scipy-optimize-minimize
* [18] https://stackoverflow.com/questions/48790279/alternatives-to-scipy-optimize-newton-cg-for-minimization-if-i-have-the-hessian
* [19] https://zfit.readthedocs.io/en/0.20.1/user_api/minimize/_generated/minimizers/zfit.minimize.ScipyLBFGSBV1.html
* [20] https://scipy-lectures.org/advanced/mathematical_optimization/