# Lecture 8, Optimality conditions

We now move on to constrained optimization problems, i.e., the full problem
$$
\begin{align}
\min \quad &f(x)\\
\text{s.t.} \quad & g_j(x) \geq 0\text{ for all }j=1,\ldots,J\\
& h_k(x) = 0\text{ for all }k=1,\ldots,K\\
&x\in \mathbb R^n.
\end{align}
$$
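As a small illustration of this problem format, the sketch below stores the constraints as lists of Python functions and tests whether a point is feasible. The helper `is_feasible`, the tolerance `tol` and the two-dimensional example problem are hypothetical and chosen only for illustration.

```python
def is_feasible(x, inequalities, equalities, tol=1e-9):
    """Return True if g_j(x) >= 0 for every inequality constraint and
    h_k(x) = 0 (up to the tolerance tol) for every equality constraint."""
    return (all(g(x) >= -tol for g in inequalities)
            and all(abs(h(x)) <= tol for h in equalities))

# Hypothetical example problem in R^2
def g1(x):
    # Inequality constraint g_1(x) = x_1 - 1 >= 0
    return x[0] - 1.0

def h1(x):
    # Equality constraint h_1(x) = x_1 + x_2 - 2 = 0
    return x[0] + x[1] - 2.0

inequalities = [g1]
equalities = [h1]

print(is_feasible([1.0, 1.0], inequalities, equalities))  # satisfies both constraints
print(is_feasible([0.0, 2.0], inequalities, equalities))  # violates g_1(x) >= 0
print(is_feasible([2.0, 2.0], inequalities, equalities))  # violates h_1(x) = 0
```

Equality constraints are compared against a tolerance because an exact test for zero is rarely meaningful in floating-point arithmetic.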

To identify which points are optimal, we want conditions similar to the gradient condition for unconstrained problems:

>If $x$ is a local optimum of a differentiable function $f$, then $\nabla f(x)=0$.

If $\nabla f(x)d<0$, then $d$ is a descent direction of $f$ at $x$. The following theorem follows easily from this observation.

**Theorem:** Consider an optimization problem
$$
\begin{align}
\min &\  f(x)\\
\text{s.t. }&\ x\in S
\end{align}
$$
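and let $x^*\in S$. If $x^*$ is a local minimizer, then the set $\{d\in B(0,1):\nabla f(x^*)d<0 \}\cap D$ is empty, where $D$ denotes the set of feasible directions at $x^*$ and $B(0,1)$ is the unit ball. In other words, no feasible direction at $x^*$ satisfies $\nabla f(x^*)d<0$.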
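The theorem can be probed numerically by sampling directions. The sketch below assumes that $D$ is the set of feasible directions and uses a hypothetical problem: minimize $x_1^2+x_2^2$ over $S=\{x: x_1\geq 1\}$, whose minimizer is $(1,0)$. A direction is treated as feasible if a small step along it stays in $S$, which is only a heuristic test, so the result is evidence rather than a proof.

```python
import math
import random

def grad_f(x):
    # Analytic gradient of the hypothetical objective f(x) = x_1^2 + x_2^2
    return [2.0 * x[0], 2.0 * x[1]]

def in_S(x):
    # Hypothetical feasible set S = {x : x_1 >= 1}
    return x[0] >= 1.0

def feasible_descent_directions(x, n_samples=2000, step=1e-3, seed=0):
    """Sample unit directions d and keep those with grad_f(x) . d < 0
    for which the point x + step * d is still in S."""
    rng = random.Random(seed)
    gradient = grad_f(x)
    found = []
    for _ in range(n_samples):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        d = [math.cos(angle), math.sin(angle)]
        is_descent = gradient[0] * d[0] + gradient[1] * d[1] < 0.0
        stays_feasible = in_S([x[0] + step * d[0], x[1] + step * d[1]])
        if is_descent and stays_feasible:
            found.append(d)
    return found

at_minimizer = feasible_descent_directions([1.0, 0.0])
at_other_point = feasible_descent_directions([1.0, 1.0])

print(len(at_minimizer))        # number of feasible descent directions found at (1, 0)
print(len(at_other_point) > 0)  # whether any were found at the non-optimal point (1, 1)
```

At the minimizer every descent direction points out of $S$, so the sample finds none. At the feasible but non-optimal point $(1,1)$ some sampled directions are both feasible and descending.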

## KKT conditions


Unfortunately, the set $D$ is rarely easy to describe explicitly. We therefore need a way to characterize $D$, or even better the set $\{d\in B(0,1):\nabla f(x^*)d<0 \}\cap D$, directly in terms of the constraint functions. This is what the KKT conditions do:

**Theorem (Kuhn-Tucker Necessary Conditions)** Let $x^*$ be a local minimum for the problem
$$
\begin{align}
\min \quad &f(x)\\
\text{s.t.} \quad & g_j(x) \geq 0\text{ for all }j=1,\ldots,J\\
& h_k(x) = 0\text{ for all }k=1,\ldots,K\\
&x\in \mathbb R^n.
\end{align}
$$

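and assume that $x^*$ is regular, i.e., the gradients of the equality constraints and of the active inequality constraints at $x^*$ are linearly independent. Then there exist unique Lagrange multiplier vectors $\lambda^* = (\lambda^*_1,\ldots,\lambda_K^*)$ and $\mu^*=(\mu_1^*,\ldots,\mu_J^*)$ such that
$$
\begin{align}
&\nabla_xL(x^*,\lambda^*,\mu^*) = 0\\
&\mu_j^*\geq0,\text{ for all }j=1,\ldots,J\\
&\mu_j^*=0,\text{ for all }j\notin A(x^*),
\end{align}
$$
where $$L(x,\lambda,\mu) = f(x)+\sum_{k=1}^K\lambda_kh_k(x) - \sum_{j=1}^J\mu_jg_j(x)$$ and $A(x^*)$ is the set of active inequality constraints at $x^*$, i.e., the indices $j$ with $g_j(x^*)=0$. The minus sign in front of the inequality term is what makes $\mu_j^*\geq0$ the correct sign condition for constraints written as $g_j(x)\geq 0$. If in addition $f$, $h$ and $g$ are twice continuously differentiable, it holds that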
$$
y'H_{x}L(x^*,\lambda^*,\mu^*)y\geq0, \text{ for all }y\in V(x^*),
$$
where
$$
V(x^*) = \{y:\nabla h_k(x^*)'y=0, \text{ for all }k=1,\ldots,K, \text{ and }\nabla g_j(x^*)'y=0, \text{ for all }j\in A(x^*)\}.
$$
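The restriction to $V(x^*)$ in the second-order condition matters, as the following sketch suggests. It uses a hypothetical objective $f(x)=x_1^2-x_2^2$ at $x^*=(0,0)$ with a single linear equality constraint, so that $\nabla f(x^*)=0$, the multiplier is zero and the Hessian of the Lagrangian equals the Hessian of $f$. In both cases $V(x^*)$ is one-dimensional, so it suffices to evaluate the quadratic form on one spanning vector.

```python
def quadratic_form(H, y):
    # Computes y' H y for a square matrix H (list of rows) and a vector y
    n = len(y)
    return sum(y[i] * H[i][j] * y[j] for i in range(n) for j in range(n))

# Hessian of the Lagrangian at x* = (0, 0) for f(x) = x_1^2 - x_2^2.
# The constraints below are linear, so they do not contribute to the Hessian.
H = [[2.0, 0.0],
     [0.0, -2.0]]

# Case A: h(x) = x_2 = 0, so V(x*) = {y : y_2 = 0} is spanned by (1, 0)
case_a = quadratic_form(H, [1.0, 0.0])

# Case B: h(x) = x_1 = 0, so V(x*) = {y : y_1 = 0} is spanned by (0, 1)
case_b = quadratic_form(H, [0.0, 1.0])

print(case_a >= 0)  # is the second-order necessary condition satisfied in case A?
print(case_b >= 0)  # is it satisfied in case B?
```

The Hessian is indefinite in both cases, yet the condition holds in case A, where $f(x_1,0)=x_1^2$ is minimized at the origin. It fails in case B, where $f(0,x_2)=-x_2^2$ decreases along the constraint, so the origin cannot be a local minimum there.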

**Example (page 285, Bertsekas: Nonlinear Programming)** Consider the optimization problem
$$
\begin{align}
\min &\qquad \frac12 (x_1^2+x^2_2+x^2_3)\\
\text{s.t.}&\qquad x_1+x_2+x_3\leq -3.
\end{align}
$$
Let us verify the Kuhn-Tucker necessary conditions for the local optimum $x^*=(-1,-1,-1)$.
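Worked by hand, the check is short. The sketch below takes the inequality constraint in the form $g(x)=-3-x_1-x_2-x_3\geq 0$, i.e. $x_1+x_2+x_3\leq -3$, for which $x^*$ is feasible, and lets the multiplier of a constraint $g(x)\geq 0$ enter the Lagrangian with a minus sign. Both are assumptions of this sketch.
$$
\begin{align}
\nabla f(x^*) &= x^* = (-1,-1,-1)', \qquad \nabla g(x^*) = (-1,-1,-1)',\\
\nabla_x L(x^*,\mu) &= \nabla f(x^*) - \mu\nabla g(x^*) = (\mu-1)(1,1,1)' = 0 \iff \mu = 1,\\
g(x^*) &= -3-(-3) = 0.
\end{align}
$$
Under these assumptions the constraint is active, the multiplier $\mu^*=1$ is nonnegative and the gradient of the Lagrangian vanishes. Since the Hessian of the Lagrangian is the identity matrix, the second-order condition holds as well.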

In [76]:
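def f(x):
    # Objective: half of the squared Euclidean norm
    return 0.5*sum([i**2 for i in x])
def g(x):
    # Inequality constraint x_1+x_2+x_3 <= -3, written in the form g(x) >= 0
    return -3-sum(x)
def h(x):
    # The example has no equality constraints; this dummy is identically zero
    return 0*sum(x)

In [77]:
import numpy as np
import ad
def grad_x_L(x,lambda_,mu,f,g,h):
    # Gradient of L = f + lambda*h - mu*g with respect to x, via automatic differentiation.
    # The minus sign pairs the condition mu >= 0 with constraints of the form g(x) >= 0.
    return np.array(ad.gh(f)[0](x))+lambda_*np.array(ad.gh(h)[0](x))-mu*np.array(ad.gh(g)[0](x))

In [78]:
mu = 1
lambda_ = 10 #Does not play a role. Think why?
x_opt = [-1,-1,-1]

# Stationarity: the gradient of the Lagrangian should vanish at x_opt
print(grad_x_L(x_opt,lambda_,mu,f,g,h))

# The constraint should be active at x_opt, i.e. g(x_opt) = 0
print(g(x_opt))

[0. 0. 0.]
0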
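The same check can be repeated without the `ad` package, using central finite differences in plain Python. This is a minimal sketch under stated assumptions: the constraint is taken as $x_1+x_2+x_3\leq -3$ written as $g(x)=-3-x_1-x_2-x_3\geq 0$, the Lagrangian is $L=f-\mu g$, and the helper names `numerical_gradient` and `satisfies_kkt` as well as the tolerances are hypothetical.

```python
def f(x):
    # Objective of the example: half of the squared Euclidean norm
    return 0.5 * sum(xi ** 2 for xi in x)

def g(x):
    # Assumed constraint x_1 + x_2 + x_3 <= -3, written in the form g(x) >= 0
    return -3.0 - sum(x)

def numerical_gradient(func, x, eps=1e-6):
    # Central finite-difference approximation of the gradient of func at x
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += eps
        backward[i] -= eps
        grad.append((func(forward) - func(backward)) / (2.0 * eps))
    return grad

def satisfies_kkt(x, mu, tol=1e-6):
    # First-order conditions for a single constraint g(x) >= 0 with L = f - mu * g
    grad_f = numerical_gradient(f, x)
    grad_g = numerical_gradient(g, x)
    stationary = all(abs(df - mu * dg) <= tol for df, dg in zip(grad_f, grad_g))
    feasible = g(x) >= -tol
    nonnegative = mu >= 0
    complementary = abs(mu * g(x)) <= tol  # mu must vanish if the constraint is inactive
    return stationary and feasible and nonnegative and complementary

print(satisfies_kkt([-1.0, -1.0, -1.0], mu=1.0))  # the candidate optimum with mu = 1
print(satisfies_kkt([-2.0, -2.0, -2.0], mu=0.0))  # a feasible point with an inactive constraint
```

The second point is feasible, but its constraint is inactive, so the multiplier must be zero and the gradient of $f$ alone would have to vanish, which it does not.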
