# 4 Optimality Conditions

If a function $f:\ \mathbb R^n\rightarrow \mathbb R$ is continuously differentiable, candidates for its local minima can be found from its derivatives.

**Theorem** Suppose $f:\ \mathbb R^n \rightarrow \mathbb R$ is continuously differentiable over an open set. If a point $x_0$ in the open set has $\nabla f(x_0)\neq 0$, then $x_0$ is not a local minimum (or local maximum). 

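Proof: Let $d = \nabla f(x_0)\neq 0$ and choose a sufficiently small $r_0>0$ such that $f$ is continuously differentiable over the ball $\{ x:\ \Vert x - x_0\Vert \leqslant r_0\}$. Since $\nabla f(x_0)^Td = d^Td>0$, by the continuity of the gradient we may shrink $r_0$ so that $\nabla f(x)^Td>0$ throughout the ball. Recall that Taylor's theorem (in its mean value form) states that for every $r\in(0,r_0]$ there exists some $\alpha \in (0,1)$ such that 
$$f\left(x_0 - r\frac{d}{\Vert d \Vert}\right) = f(x_0) -  r \nabla f\left(x_0-\alpha r\frac{d}{\Vert d\Vert}\right)^T\frac{d}{\Vert d\Vert} < f(x_0).$$
Hence every neighborhood of $x_0$ contains a point with a smaller value, so $x_0$ is not a local minimum. Applying the same argument to $-f$ shows that $x_0$ is not a local maximum either.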
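
The descent step used in the proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical quadratic test function `f` and a central-difference gradient `grad` (neither is part of the notes): a short step of length $r$ along $-d/\Vert d\Vert$ lowers the function value whenever the gradient is nonzero.

```python
def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def f(x):
    # Hypothetical test function, chosen only for illustration.
    return (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2


x0 = [0.0, 0.0]
d = grad(f, x0)                              # d approximates grad f(x0), nonzero here
norm_d = sum(di * di for di in d) ** 0.5

# Step of length r along -d/||d||, as in the proof.
r = 1e-3
x_new = [xi - r * di / norm_d for xi, di in zip(x0, d)]
decrease = f(x0) - f(x_new)
print(decrease > 0)
```

The step length `r` plays the role of the radius in the proof: it has to be small enough that the gradient keeps a positive inner product with $d$ along the segment.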


**Theorem** Suppose $f:\ \mathbb R^n \rightarrow \mathbb R$ is continuously differentiable over an open set. If on the open set it has some local minimum (or local maximum) $x_*$, then $\nabla f(x_*) = 0$.

Proof: This is the contrapositive of the previous theorem. 

**Theorem** Suppose $f:\ \mathbb R^n \rightarrow \mathbb R$ is continuously differentiable and convex over an open convex set. Then a point $x_*$ in the set is a global minimum iff $\nabla f(x_*) = 0$.

Proof: The $\Rightarrow $ side follows from the previous theorem, since a global minimum is also a local minimum. The $\Leftarrow$ side follows from the first-order characterization of convexity: for every $y$ in the set, $f(y)\geqslant f(x_*) + \nabla f(x_*)^T(y-x_*) = f(x_*)$.


**Theorem**  Suppose $f:\ \mathbb R^n \rightarrow \mathbb R$ is twice continuously differentiable over an open set. If a point $x_*$ in the open set is a local minimum, then $\nabla f(x_*) = 0 $ and $\nabla^2 f(x_*)\succeq 0$. Conversely, if  $\nabla f(x_*) = 0$ and $\nabla^2 f(x_*)\succ 0$, then $x_*$ is a local minimum.

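Proof: $\Rightarrow $ side: $\nabla f(x_*) = 0$ holds by the first-order theorem above. Suppose, for contradiction, that $\nabla^2 f(x_*)\succeq 0$ fails, so that there exists some $d$ such that $d^T\nabla ^2 f(x_*)d<0$. The continuity of the second order derivative guarantees that $d^T\nabla^2 f(x)d < 0$ for all $x$ in a neighborhood of $x_*$. Then Taylor's theorem states that for every sufficiently small $r>0$ there exists some $\alpha\in(0,1)$ such that 
$$f\left(x_* + r\frac{d}{\Vert d \Vert}\right)= f(x_*) + r\nabla f(x_*)^T\frac{d}{\Vert d\Vert} + \frac{r^2}{2\Vert d\Vert^2 }d^T\nabla^2 f\left(x_* + \alpha r\frac{d}{\Vert d \Vert}\right) d<f(x_*),$$
where the first-order term vanishes because $\nabla f(x_*) = 0$. This contradicts the local minimality of $x_*$.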

$\Leftarrow $ side is also a direct application of Taylor's theorem and the continuity of $\nabla^2 f(x)$.
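
As a rough numerical companion to the second-order conditions, the sketch below classifies a stationary point in $\mathbb R^2$ by the smallest eigenvalue of a finite-difference Hessian. The helper names `hessian_2d` and `eigenvalues_sym_2x2` and the two test functions `bowl` and `saddle` are assumptions made for illustration; both functions have zero gradient at the origin.

```python
import math


def hessian_2d(f, x, h=1e-4):
    """Finite-difference Hessian entries (fxx, fxy, fyy) of f: R^2 -> R at x."""
    def shifted(dx, dy):
        return f([x[0] + dx, x[1] + dy])
    fxx = (shifted(h, 0) - 2 * shifted(0, 0) + shifted(-h, 0)) / h**2
    fyy = (shifted(0, h) - 2 * shifted(0, 0) + shifted(0, -h)) / h**2
    fxy = (shifted(h, h) - shifted(h, -h)
           - shifted(-h, h) + shifted(-h, -h)) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    return mean - rad, mean + rad


def bowl(x):        # hypothetical example with positive definite Hessian
    return x[0] ** 2 + 3 * x[1] ** 2


def saddle(x):      # hypothetical example with indefinite Hessian
    return x[0] ** 2 - x[1] ** 2


origin = [0.0, 0.0]
lo_bowl, hi_bowl = eigenvalues_sym_2x2(*hessian_2d(bowl, origin))
lo_saddle, hi_saddle = eigenvalues_sym_2x2(*hessian_2d(saddle, origin))

# Positive smallest eigenvalue: sufficient condition holds (local minimum).
# Negative smallest eigenvalue: necessary condition fails (not a local minimum).
print(lo_bowl > 0, lo_saddle < 0)
```

A smallest eigenvalue equal to zero falls in the gap between the necessary and the sufficient condition, and this test alone cannot decide that case.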

## Fritz John (FJ) Necessary Conditions

The Fritz John necessary conditions apply Lagrange multipliers to constrained optimization problems. Suppose $f,g_i,h_j$ are 
continuously differentiable on the region and the optimization problem 
$$\min_x \{f(x):\quad g_i(x)\leqslant 0,\quad h_j(x) = 0\}$$
has a local minimum $x_*$.  Then there exist $u,\lambda_i,\mu_j\in \mathbb R$, not all zero, such that 
$$\begin{aligned}
g_i(x_*)& \leqslant 0 \\
h_j(x_*)& = 0\\
u\nabla f(x_*) + \sum \lambda_i \nabla g_i(x_*) + \sum \mu_j \nabla h_j(x_*)  & = 0\\
u &\geqslant 0\\
\lambda_i &\geqslant 0\\
\lambda_i g_i(x_*) &= 0.
\end{aligned}$$

The first two conditions are the original constraints (feasibility). The third states that the gradient of the Lagrangian, obtained after introducing the Lagrange multipliers, vanishes. The fourth and fifth fix the signs of the multipliers attached to the objective and to the inequality constraints, while the last implies that either $\lambda_i = 0$ or 
$g_i(x_*) = 0$, known as complementary slackness.

The converse does not hold, i.e. a Fritz John solution might not be a local minimum.
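
A small worked example may make this concrete (it is chosen here for illustration and is not taken from the notes). Consider
$$\min_x\{x:\quad x^3\leqslant 0\}$$
with $f(x) = x$ and $g(x) = x^3$. At $x_* = 0$ the constraint is active and $\nabla g(0) = 0$, so the choice $u = 0$, $\lambda = 1$ satisfies
$$u\cdot 1 + \lambda \cdot 3x_*^2 = 0,\quad u\geqslant 0,\quad \lambda\geqslant 0,\quad \lambda g(x_*) = 0.$$
However, every $x<0$ is feasible with a smaller objective value, so $x_* = 0$ is a Fritz John solution that is not a local minimum.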

## Karush-Kuhn-Tucker (KKT) conditions

Suppose $f,g_i,h_j$ are 
continuously differentiable on the region and the optimization problem 
$$\min_x \{f(x):\quad g_i(x)\leqslant 0,\quad h_j(x) = 0\}$$
has a local minimum $x_*$.  Denote by $I=\{i\in\{1,2,\dotsc\}:\ g_i(x_*) = 0\}$ the active set. If 
the vectors $\{\nabla g_i(x_*)\}_{i\in I}\cup \{\nabla h_j(x_*)\}$ are linearly independent, then there exist $\lambda_i,\mu_j\in \mathbb R$  such that 
$$\begin{aligned}
g_i(x_*)& \leqslant 0 \\
h_j(x_*)& = 0\\
\nabla f(x_*) + \sum \lambda_i \nabla g_i(x_*) + \sum \mu_j \nabla h_j(x_*)  & = 0\\
\lambda_i &\geqslant 0\\
\lambda_i g_i(x_*) &= 0.
\end{aligned}$$
We call such an $x_*$ a Karush-Kuhn-Tucker (KKT) point. 

The converse does not hold, i.e. a KKT point might not be a local minimum.
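
To illustrate, here is a minimal sketch of a KKT checker for problems with inequality constraints only. The function `kkt_violation` and the toy problem $\min\{-x^2:\ x^2-1\leqslant 0\}$ are assumptions introduced for this example. The point $x=0$ with $\lambda=0$ satisfies the KKT system, yet nearby feasible points have a smaller objective value.

```python
def kkt_violation(grad_f, gs, grad_gs, x, lam):
    """Largest violation of the KKT system for constraints g_i(x) <= 0.

    grad_f and each grad_gs[i] map a point to a list; each gs[i] returns a float.
    """
    # Stationarity: grad f(x) + sum_i lam_i * grad g_i(x) = 0.
    station = list(grad_f(x))
    for l, grad_g in zip(lam, grad_gs):
        gi = grad_g(x)
        station = [s + l * gi[k] for k, s in enumerate(station)]
    viol = max(abs(s) for s in station)
    # Feasibility g_i <= 0, sign lam_i >= 0, complementary slackness lam_i * g_i = 0.
    for l, g in zip(lam, gs):
        viol = max(viol, g(x), -l, abs(l * g(x)))
    return viol


# Hypothetical toy problem: minimize -x^2 subject to x^2 - 1 <= 0.
f = lambda x: -x[0] ** 2
grad_f = lambda x: [-2 * x[0]]
g = lambda x: x[0] ** 2 - 1
grad_g = lambda x: [2 * x[0]]

x_star, lam = [0.0], [0.0]
violation = kkt_violation(grad_f, [g], [grad_g], x_star, lam)
is_kkt = violation < 1e-12
nearby_is_better = f([0.1]) < f(x_star)
print(is_kkt, nearby_is_better)
```

Here $x=0$ is in fact a local maximum of the objective, which is why the first-order system cannot distinguish it from a minimum.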

Note that the linear independence assumption is what guarantees that a local minimum is a KKT point. Without it, the KKT system might have no solution at all, even though the problem has a minimum. Compared with the FJ necessary conditions, the KKT conditions ensure $u>0$ in FJ, so that after dividing by $u$ one may take $u=1$.

Example: Solve
$$\min \{x_1:\quad (x_1 - 1)^2+(x_2 - 1)^2\leqslant 1,\quad (x_1 - 1)^2+(x_2 +1)^2\leqslant 1\}$$
The feasible set consists of the single point $[1,0]^T$, so the minimum is $1$. But the KKT system is 
$$\begin{aligned}
(x_1 - 1)^2+(x_2 - 1)^2&\leqslant 1\\
(x_1 - 1)^2+(x_2 +1)^2&\leqslant 1\\
1 + 2\lambda_1 (x_1 - 1) + 2\lambda_2 (x_1 - 1) &= 0\\ 
2\lambda_1 (x_2 - 1) + 2\lambda_2 (x_2 + 1) &= 0\\
& \dotsm
\end{aligned}$$
The first two force $[x_1,x_2] = [1,0]$, but then the third equation reduces to $1 = 0$, which is impossible. So there is no KKT solution here, because $\nabla g_1([1,0]^T) = [0,-2]^T$ and $\nabla g_2([1,0]^T) = [0,2]^T$ are linearly dependent, where $g_1,g_2$ denote the two constraint functions.
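
The absence of multipliers in this example can also be checked numerically. The sketch below, with the hypothetical helper `stationarity_residual`, computes $\nabla f + \lambda_1\nabla g_1+\lambda_2\nabla g_2$ at the only feasible point $[1,0]^T$ directly from the two constraint functions and scans a grid of nonnegative multipliers. The grid is an arbitrary choice for illustration.

```python
def stationarity_residual(lam1, lam2, x=(1.0, 0.0)):
    """Return grad f + lam1 * grad g1 + lam2 * grad g2 at x for the example above."""
    x1, x2 = x
    grad_f = (1.0, 0.0)                        # f(x) = x1
    grad_g1 = (2 * (x1 - 1), 2 * (x2 - 1))     # g1 = (x1-1)^2 + (x2-1)^2 - 1
    grad_g2 = (2 * (x1 - 1), 2 * (x2 + 1))     # g2 = (x1-1)^2 + (x2+1)^2 - 1
    return tuple(df + lam1 * a + lam2 * b
                 for df, a, b in zip(grad_f, grad_g1, grad_g2))


# Scan nonnegative multipliers on a grid; the first component never changes,
# because both constraint gradients have zero first component at [1, 0].
grid = [0.1 * k for k in range(101)]
smallest = min(abs(stationarity_residual(l1, l2)[0]) for l1 in grid for l2 in grid)
print(smallest)
```

The first component of the residual does not depend on the multipliers at all, which is the numerical counterpart of the linear dependence of the two constraint gradients.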

### Global Minimum

#### Weierstrass Theorem

If $f$ is continuous on a nonempty compact set $C\subset\mathbb R^n$, then it has a global minimum over $C$.

#### Coerciveness

A continuous function $f:\mathbb R^n\rightarrow \mathbb R$ is called coercive if 
$$\lim_{\Vert x\Vert\rightarrow \infty}f(x) = \infty$$

If $f$ is coercive, and $S\subset \mathbb R^n$ is an arbitrary nonempty closed set, then 
$f$ has a global minimum over $S$.
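
Two simple cases, added here for illustration, show what coerciveness buys. The function $f(x) = \Vert x\Vert^2$ is coercive, so it attains a global minimum over every nonempty closed set, including unbounded ones such as a half-space. In contrast, $f(x) = e^{x}$ on $\mathbb R$ is continuous but not coercive, since $f(x)\rightarrow 0$ as $x\rightarrow -\infty$, and over the closed set $S = \mathbb R$ the infimum
$$\inf_{x\in \mathbb R} e^{x} = 0$$
is not attained.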

## Slater Condition

**Definition** Suppose $f,g_i,h_j$ are 
continuously differentiable on the region, and consider the optimization problem 
$$\min_x \{f(x):\quad g_i(x)\leqslant 0,\quad h_j(x) = 0\}.$$
Let $x_*$ be a local minimum of the problem and let $I = \{i:\ g_i(x_*) = 0\}$ be the active set. If there is at least one $x'$ in the feasible set such that $g_i(x')<0$ for all $i\in I$, we say that the Slater condition is satisfied.

More generally, the Slater condition allows $g_i(x') = 0$ if $g_i$ is affine.

### Necessary KKT

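**Theorem** Suppose that in the problem the $g_i$ are all **convex** and the $h_j$ are all **affine**. If $x_*$ is a local minimum and the Slater condition is satisfied, then $x_*$ is a KKT point, i.e. the KKT system holds at $x_*$ for some multipliers $\lambda_i,\mu_j$, even if the active gradients $\nabla g_i(x_*)$, $i\in I$, are not linearly independent. 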

<font color = red> **Note** The target function $f$ is NOT necessarily convex.</font>

Proof: As $h_j$ are affine, without loss of generality we may assume that $\nabla h_j$ are linearly independent. 

From the FJ necessary conditions, we have $u\nabla f(x_*) +\sum \lambda_i\nabla g_i(x_*) + \sum \mu_j  \nabla h_j(x_*) = 0$ with $u,\lambda_i \geqslant 0$. It suffices to show that $u>0$, since we may then divide by $u$. Suppose instead that $u = 0$. On the one hand, this implies that 
$$\sum_{i\in I} \lambda_i \nabla g_i(x_*) + \sum \mu_j \nabla h_j(x_*) = 0\quad (\star)$$
where  $I = \{i:\ g_i(x_*) = 0\}$ is the active set, because $\lambda_i = 0$ whenever $i\notin I$ by complementary slackness. 

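On the other hand, since $u,\lambda_i,\mu_j$ are not simultaneously zero and $u = 0$, the linear independence of the $\nabla h_j$ forces $I$ to be nonempty with at least one $i\in I$ such that $\lambda_i > 0$ (otherwise $(\star)$ would give $\sum \mu_j\nabla h_j(x_*) = 0$ with some $\mu_j\neq 0$). By the convexity of $g_i$, the Slater condition gives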

$$0>g_i(x') \geqslant  g_i(x_*) + \nabla g_i(x_*)^T(x' - x_*)= \nabla g_i(x_*)^T(x' - x_*)\quad \forall i\in I.$$ 

Since $h_j$ is affine, $\nabla h_j$ does not depend on $x$, and since $0 = h_j(x_*) = h_j(x')$, 
$$0=\nabla h_j(x_*)^T (x' - x_*)\quad \forall j.$$ 

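Consequently, multiplying these relations by $\lambda_i\geqslant 0$ (at least one of which is positive) and by $\mu_j$ respectively and summing up, we obtain
$$\sum_{i\in I} \lambda_i \nabla g_i(x_*)^T(x' - x_*) + \sum \mu_j \nabla h_j(x_*)^T(x' - x_*) < 0,$$
which contradicts $(\star)$, since taking the inner product of $(\star)$ with $x' - x_*$ gives zero on the left-hand side.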



## Convex KKT

Suppose $f,g_i,h_j$ are 
continuously differentiable on the open and convex set $X$, and consider the convex optimization problem 
$$\min_x \{f(x):\quad g_i(x)\leqslant 0,\quad h_j(x) = 0,\quad x\in X\}$$
where $f,g_i$ are convex and $h_j$ are affine. Suppose $x_*$ is a KKT point, i.e. there exist $\lambda_i,\mu_j$ such that 
$$\begin{aligned}
g_i(x_*)& \leqslant 0 \\
h_j(x_*)& = 0\\
x_*&\in X\\
\nabla f(x_*) + \sum \lambda_i \nabla g_i(x_*) + \sum \mu_j \nabla h_j(x_*)  & = 0\\
\lambda_i &\geqslant 0\\
\lambda_i g_i(x_*) &= 0.
\end{aligned}$$
Then $x_*$ is a global minimum.

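Proof: Let $x_*$ be a KKT point with multipliers $\lambda_i,\mu_j$ and define $L(x) = f(x) + \sum \lambda_i g_i(x) + \sum \mu_j h_j(x)$. Since $\lambda_i\geqslant 0$, $f,g_i$ are convex and $h_j$ are affine, $L$ is convex on $X$, and the stationarity condition reads $\nabla L(x_*) = 0$. We already know that, by convexity, this is sufficient for $x_*$ to minimize $L$ over $X$. Hence for every feasible $x$,
$$f(x)\geqslant f(x) + \sum \lambda_i g_i(x) + \sum \mu_j h_j(x) = L(x)\geqslant L(x_*) = f(x_*),$$
where the first inequality uses $\lambda_i g_i(x)\leqslant 0$ and $h_j(x) = 0$, and the last equality uses $h_j(x_*) = 0$ and complementary slackness.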
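
The sufficiency statement can be sanity-checked on a small convex problem. The sketch below assumes the hypothetical instance $\min\{x_1^2+x_2^2:\ 1-x_1-x_2\leqslant 0\}$, whose KKT system is solved by $x_* = [\frac12,\frac12]^T$ with $\lambda = 1$, and compares $f(x_*)$ against randomly sampled feasible points. The sampling box and seed are arbitrary choices.

```python
import random


def f(x):
    return x[0] ** 2 + x[1] ** 2          # convex objective


def g(x):
    return 1.0 - x[0] - x[1]              # affine (hence convex) constraint g(x) <= 0


# Candidate KKT pair for this hypothetical instance.
x_star, lam = (0.5, 0.5), 1.0

# Stationarity: grad f + lam * grad g = (2 x1 - lam, 2 x2 - lam).
stationarity = [2 * x_star[0] - lam, 2 * x_star[1] - lam]
slackness = lam * g(x_star)

# Compare against random feasible points in the box [-3, 3]^2.
rng = random.Random(0)
samples = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(10000)]
feasible = [x for x in samples if g(x) <= 0]
best_sampled = min(f(x) for x in feasible)
print(best_sampled >= f(x_star))
```

Random sampling is of course no proof. It only confirms that no sampled feasible point beats the KKT point, as the theorem predicts.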

### Examples

#### Log Determinant

Given $A\in \mathcal S_{++}^n$ and $b>0$, solve in closed form
$$\min\{-\log {\rm det} Z:\ A\bullet Z\leqslant b,\ Z\in \mathcal S_{++}^n\}$$
where $A\bullet Z = {\rm tr}(A^*Z) = {\rm vec}(A)^*{\rm vec}(Z)$ is the standard inner product and $^*$ stands for the conjugate transpose.

Solution: We have shown in previous courses that the objective function is convex. Note that 
$$-\nabla \log {\rm det} Z = -\frac{\partial}{\partial {\rm det}Z}\log {\rm det} Z  \cdot 
\frac{\partial}{\partial Z}{\rm det}Z =-\frac{{\rm adj}Z}{ {\rm det}Z} = -Z^{-1} $$

The KKT condition requires $-Z^{-1}+\lambda A = 0$. Since $Z^{-1}\neq 0$, we must have $\lambda > 0$, leading to $Z = \lambda^{-1}A^{-1}$. Complementary slackness with $\lambda > 0$ then forces the constraint to be active, so $$b = A\bullet Z = \lambda^{-1}A\bullet A^{-1} = \lambda^{-1}{\rm tr}(I) = n\lambda^{-1},$$
 and we obtain $\lambda = \frac nb$ and $Z = \frac bnA^{-1}$.
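
The closed form can be checked in the special case where $A$ is diagonal, which keeps the sketch free of linear algebra libraries. This restriction, the data `a`, `b` and the comparison point `z_other` are assumptions made for illustration only. With $A = {\rm diag}(a)$ the formula gives $Z = {\rm diag}\left(\frac{b}{n a_i}\right)$.

```python
import math


def closed_form_diag(a, b):
    """Diagonal of Z = (b/n) A^{-1} when A = diag(a) with all a_i > 0."""
    n = len(a)
    return [b / (n * ai) for ai in a]


def objective(z):
    """-log det of diag(z)."""
    return -sum(math.log(zi) for zi in z)


a, b = [1.0, 2.0, 4.0], 3.0                  # hypothetical data
z = closed_form_diag(a, b)
lam = len(a) / b                             # multiplier lambda = n / b

# Stationarity -Z^{-1} + lambda * A = 0, entry by entry on the diagonal.
stationarity = [-1.0 / zi + lam * ai for zi, ai in zip(z, a)]
trace_AZ = sum(ai * zi for ai, zi in zip(a, z))   # should equal b

# Another feasible diagonal matrix with the same trace value, for comparison.
z_other = [1.3, 0.4, 0.225]
trace_other = sum(ai * zi for ai, zi in zip(a, z_other))
print(objective(z) < objective(z_other))
```

The comparison only covers one diagonal competitor. Global optimality over all of $\mathcal S_{++}^n$ rests on the convexity argument, not on this check.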

 
#### Water Filling

Given $\alpha\in\mathbb R^n$ with $\alpha_i>0$ and $e = [1,1,\dotsc,1]^T$, solve
$$\min\{-\sum_{i=1}^n \log(x_i+\alpha_i):\quad x\geqslant 0,\quad e^Tx = 1\}.$$

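Solution: This is a convex optimization problem. With multipliers $\lambda_i\geqslant 0$ for the constraints $-x_i\leqslant 0$ and $\mu$ for $e^Tx = 1$, the stationarity condition reads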
$$-\frac{1}{x_i+\alpha_i} - \lambda_i +\mu  = 0.$$
The complementary slackness requires that $\lambda_i = 0$ or $x_i = 0$. If $\lambda_i = 0$, then $0\leqslant x_i = \frac{1}{\mu} - \alpha_i$. If $x_i = 0$, then $0\leqslant \lambda_i = \mu - \frac{1}{\alpha_i}$. 

Conversely, given $\mu$ and $i$, if $\frac{1}{\mu} \geqslant \alpha_i$, then $x_i = \frac{1}{\mu} - \alpha_i$. If $\frac{1}{\mu} < \alpha_i$, then $x_i = 0$. Hence we summarize

$$x_i = \max\{\frac{1}{\mu}-\alpha_i,0\}.$$

Substituting back into $e^Tx = 1$ yields $\sum_{i=1}^n\max\{\frac{1}{\mu}-\alpha_i,0\} = 1$, which determines $\mu$ and of which a geometric 
characterization is illustrated below.
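
The scalar equation for $\mu$ has no simple closed form in general, but its left-hand side is nondecreasing in the water level $\frac1\mu$, so it can be solved by bisection. The following is a minimal sketch under that observation. The function name `water_filling`, the tolerance and the sample data are assumptions for illustration.

```python
def water_filling(alpha, total=1.0, tol=1e-12):
    """Solve sum_i max(level - alpha_i, 0) = total for level = 1/mu by bisection.

    Assumes every alpha_i > 0. Returns the allocation x and the multiplier mu.
    """
    lo = min(alpha)              # at this level the allocation sums to 0
    hi = min(alpha) + total      # at this level it sums to at least `total`
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(max(mid - a, 0.0) for a in alpha) < total:
            lo = mid
        else:
            hi = mid
    level = (lo + hi) / 2
    return [max(level - a, 0.0) for a in alpha], 1.0 / level


# Hypothetical data: the third entry lies above the final water level.
x, mu = water_filling([0.1, 0.5, 2.0])
print(x, mu)
```

For this data the water level settles at $0.8$: the first two coordinates receive $0.7$ and $0.3$, while the third stays at zero because $\alpha_3 = 2$ lies above the level.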