*Credit*: These examples originally came from [The Kitchin Research Group Blog](http://kitchingroup.cheme.cmu.edu/blog/2018/11/03/Constrained-optimization-with-Lagrange-multipliers-and-autograd/). I have adapted them to be more verbose, use notation consistent with [Mathematics for Machine Learning](https://mml-book.github.io/) and add some colour. I've also repeated some of the documentation pertaining to the `minimize` function [from the SciPy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html).

# Constrained Optimization

In the previous notebook we saw that the `scipy.optimize` package provides several unconstrained optimization algorithms. It is rarer to need constrained optimization algorithms in machine learning, but if we need them, they are available. 

## Method of Lagrange multipliers

Let's first use Python to help us implement the method of Lagrange multipliers. Consider the following constrained optimization problem in three variables:

$$
\begin{align}
\min_{x_1, x_2, x_3} \quad & f(\mathbf{x}) = x_1^2 + x_2^2 + x_3^2\\
\text{subject to} \quad & h(\mathbf{x}) = 2x_1 - x_2 + x_3 - 3 = 0 .
\end{align}
$$

Geometrically, this corresponds to finding the point on the plane $2x_1 - x_2 + x_3 = 3$ that is closest to the origin $\mathbf{x} = \mathbf{0}$.


We can convert this into an unconstrained problem by using the method of Lagrange multipliers. The Lagrangian is
$$
\mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda h(\mathbf{x}) .
$$

Note that because $h(\cdot)$ is an equality rather than an inequality constraint, we leave $\lambda$ unconstrained rather than restricting it to be non-negative.

We take the gradient of the Lagrangian with respect to $\mathbf{x}$ and set it to zero, $\nabla_\mathbf{x} \mathcal{L}(\mathbf{x}, \lambda) = \mathbf{0}$. This gives us three equations:

$$
\begin{align}
\frac{\partial \mathcal{L}}{\partial x_1} & = 0\\
\frac{\partial \mathcal{L}}{\partial x_2} & = 0\\
\frac{\partial \mathcal{L}}{\partial x_3} & = 0 .
\end{align}
$$

We also have a fourth equation corresponding to the equality constraint:
$$
h(\mathbf{x}) = 2x_1 - x_2 + x_3 - 3 = 0
$$

### Exercise

On paper, work out the three partial derivatives above, corresponding to $\nabla_\mathbf{x} \mathcal{L}(\mathbf{x}, \lambda) = \mathbf{0}$.

To recap, we have a system of four equations in four unknowns, and we need to find its root (i.e. the values of $x_1, x_2, x_3$ and $\lambda$ at which every equation equals zero). Of course, you could work this out by hand, but we're interested in seeing how Python can help us out.

Let's use `autograd` from Unit 3 to compute the gradients. When you worked these out by hand, you would have seen that the partial derivatives are pretty easy. But `autograd` is nice when we have more complicated expressions.

In [0]:
import autograd.numpy as np
from autograd import grad

In [0]:
def f(x):
  "Our original objective function"
  x1, x2, x3 = x
  return x1**2 + x2**2 + x3**2

def h(x):
  "Our equality constraint"
  x1, x2, x3 = x
  return 2*x1 - x2 + x3 - 3

def lagrangian(l):
  "The Lagrangian function"
  x1, x2, x3, _lambda = l  # note that lambda is a reserved word in Python
  return f([x1, x2, x3]) + _lambda * h([x1, x2, x3])

# Gradients of the Lagrangian
dLdl = grad(lagrangian)  # courtesy of autograd, nice!

def obj(l):
  x1, x2, x3, _lambda = l
  dLdx1, dLdx2, dLdx3, dLdlam = dLdl(l)
  # return the terms that should be set to zero (note we throw away dLdlam)
  return [dLdx1, dLdx2, dLdx3, h([x1, x2, x3])]

So we now have a function that returns each of the expressions that must equal
zero. The last step is to solve this thing.

`scipy.optimize` provides a generic tool for root finding called `fsolve`. It has a similar interface to `minimize`.
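As a minimal sketch of the `fsolve` calling convention: you pass a function that returns one residual per unknown, plus an initial guess, and `fsolve` searches for the input at which all residuals vanish. The toy system and the names `residuals` and `root` below are illustrative assumptions, not part of the original example.

```python
import numpy as np
from scipy.optimize import fsolve

def residuals(v):
    # Hypothetical toy system: x + y = 3 and x - y = 1
    x, y = v
    return [x + y - 3, x - y - 1]

# The second argument is the initial guess, just like x0 in minimize
root = fsolve(residuals, np.array([0.0, 0.0]))
print(root)
```

The returned array has one entry per unknown, in the same order as the initial guess.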

In [0]:
from scipy.optimize import fsolve
l0 = np.array([0.0, 0.0, 0.0, 1.0])  # the initial values of x1, x2, x3, _lambda
x1, x2, x3, _lambda = fsolve(obj, l0)  # note this is the variable 'l0' not 10
print(f'The solution is at {x1, x2, x3} and lambda={_lambda}')
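As a sanity check, the geometric reading of the problem gives a closed form to compare against. The closest point to the origin on a plane $\mathbf{a}^\top \mathbf{x} = b$ is the orthogonal projection $\mathbf{x}^* = b\,\mathbf{a} / \|\mathbf{a}\|^2$. The names `a`, `b`, `x_star` and `lambda_star` are introduced only for this sketch, which uses plain NumPy.

```python
import numpy as np

a = np.array([2.0, -1.0, 1.0])  # normal vector of the plane 2*x1 - x2 + x3 = 3
b = 3.0

# Orthogonal projection of the origin onto the plane
x_star = b * a / (a @ a)

# Stationarity of f + lambda*h reads 2*x + lambda*a = 0, so lambda = -2*b / (a @ a)
lambda_star = -2.0 * b / (a @ a)

print(x_star, lambda_star)
```

The values found by `fsolve` above should agree with these up to solver tolerance.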

### Exercise

It was kind of wasteful to compute $\frac{\partial \mathcal{L}}{\partial \lambda}$ and then throw it away. Can you rewrite the code to only compute the necessary partial derivatives?

### Exercise

What happens if we formulate the Lagrangian as $\mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda h(\mathbf{x})$ (i.e. we subtract rather than add the penalty term)? You may wish to check out [this Stack Exchange post](https://math.stackexchange.com/q/1099429) for a discussion.

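Now, how do we know whether the solution is a minimum? Because the constraint is linear, the Hessian of the Lagrangian with respect to $\mathbf{x}$ is just the Hessian of the original objective $f(\mathbf{x})$, so we can check that this Hessian is positive definite. `autograd` makes this really easy.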

In [0]:
from autograd import hessian
hess = hessian(f)
hess(np.array([x1, x2, x3]))

A positive definite Hessian will have all positive eigenvalues. The eigenvalues should be obvious from the diagonal structure of the Hessian. But we can check this as follows:

In [0]:
eigvals, eigvecs = np.linalg.eig(hess(np.array([x1, x2, x3])))
print(eigvals)
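For a symmetric matrix such as a Hessian, the same check can be wrapped in a small helper. This is only a sketch: `is_positive_definite` is a name made up here, and it uses `np.linalg.eigvalsh` (NumPy's eigenvalue routine for symmetric matrices) in place of `eig`.

```python
import numpy as np

def is_positive_definite(H, tol=1e-12):
    """Return True if the symmetric matrix H has only positive eigenvalues."""
    eigvals = np.linalg.eigvalsh(H)  # real eigenvalues in ascending order
    return bool(eigvals[0] > tol)    # the smallest one decides

H_f = 2.0 * np.eye(3)  # Hessian of f(x) = x1^2 + x2^2 + x3^2, written out by hand
print(is_positive_definite(H_f))
```

The small tolerance `tol` is a design choice for this sketch, so that eigenvalues that are zero up to rounding error are not counted as positive.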

## Constrained minimization of multivariate scalar functions (`minimize`)

The `minimize` function provides algorithms for **constrained** minimization, namely `trust-constr`, `SLSQP` and `COBYLA`. They require the constraints to be defined using slightly different structures. The method `trust-constr` requires the constraints to be defined as a sequence of objects [LinearConstraint](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.LinearConstraint.html#scipy.optimize.LinearConstraint) or [NonlinearConstraint](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.NonlinearConstraint.html#scipy.optimize.NonlinearConstraint). Methods `SLSQP` and `COBYLA`, on the other hand, require constraints to be defined as a sequence of dictionaries, with keys `type`, `fun`, and `jac`.
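To make the two conventions concrete, here is a sketch of how the plane constraint used earlier in this notebook could be written in each style. The variable names are illustrative, and nothing is solved here. The pair of inequalities at the end reflects the note in the SciPy documentation that `COBYLA` only supports inequality constraints; treat that part as an assumption to check against your SciPy version.

```python
import numpy as np
from scipy.optimize import LinearConstraint

A = np.array([[2.0, -1.0, 1.0]])  # coefficients of 2*x1 - x2 + x3

# trust-constr style: lb <= A @ x <= ub, with lb == ub for an equality
plane_linear = LinearConstraint(A, lb=3.0, ub=3.0)

def h(x):
    "Equality constraint written as h(x) = 0"
    x1, x2, x3 = x
    return 2*x1 - x2 + x3 - 3

# SLSQP style: a dictionary; the 'jac' entry is optional
plane_dict = {'type': 'eq', 'fun': h, 'jac': lambda x: A[0]}

# Inequality-only style: h(x) >= 0 and -h(x) >= 0 together imply h(x) == 0
plane_ineq = [{'type': 'ineq', 'fun': h},
              {'type': 'ineq', 'fun': lambda x: -h(x)}]
```

In the dictionary form, `'eq'` means the function must equal zero and `'ineq'` means it must be non-negative.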

## Sequential Least SQuares Programming (SLSQP) Algorithm (`method='SLSQP'`)


We've already formulated the objective and constraints, so let's see how easy it is to use the method of Sequential Least SQuares Programming (SLSQP) provided by `scipy.optimize`.

In [0]:
from scipy.optimize import minimize
sol = minimize(f, [1, -0.5, 0.5], constraints={'type': 'eq', 'fun': h},
               options={'disp': True})

We note that the solution is the same as above. That was a lot easier than forming the Lagrangian ourselves!
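The object returned by `minimize` is an `OptimizeResult`; its `x`, `fun` and `success` fields hold the minimizer, the objective value and a convergence flag. The sketch below repeats the problem in self-contained form and, as a deliberate change from the cell above, starts from the origin so that the solver has some work to do. The name `sol_from_origin` is introduced only for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    "Objective: squared distance from the origin"
    x1, x2, x3 = x
    return x1**2 + x2**2 + x3**2

def h(x):
    "Equality constraint h(x) = 0"
    x1, x2, x3 = x
    return 2*x1 - x2 + x3 - 3

sol_from_origin = minimize(f, np.zeros(3), method='SLSQP',
                           constraints={'type': 'eq', 'fun': h})

print(sol_from_origin.success)  # did the solver report convergence?
print(sol_from_origin.x)        # the minimizer
print(sol_from_origin.fun)      # the objective value at the minimizer
```

Checking `success` before trusting `x` is a good habit, since `minimize` returns a result object even when the solver fails to converge.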

### Exercise

The `scipy.optimize` default solver for constrained problems is SLSQP. Try solving this problem using the other constrained minimization algorithms available through `minimize`: `trust-constr` and `COBYLA`. You will need to specify a `method` argument.