# Penalties and Log Barriers

Based on homework by M. Toussaint, available at https://www.user.tu-berlin.de/mtoussai/teaching/13-Optimization/

```julia
using Plots
using LinearAlgebra

plotly()
```

## (1) Equality Constraint Penalties and Augmented Lagrangian

(We don't need to know what the Lagrangian is (yet) to solve this exercise.) In the lecture we discussed the squared penalty method for inequality constraints. There is a straightforward version for equality constraints: Instead of
$$\min _{x} f(x) \quad \text { s.t. } \quad h(x)=0 \tag{1}$$
we address
$$ \min _{x} f(x)+\mu \sum_{i=1}^{m} h_{i}(x)^{2} \tag{2}$$
such that the squared penalty pulls the solution onto the constraint $h(x)=0$. Assume that if we minimize (2) we end up at a solution $x_{1}$ for which each $h_{i}\left(x_{1}\right)$ is reasonably small, but not exactly zero. We also mentioned the idea that we could add an additional term which counteracts the violation of the constraint. This can be realized by minimizing
$$ \min _{x} f(x)+\mu \sum_{i=1}^{m} h_{i}(x)^{2}+\sum_{i=1}^{m} \lambda_{i} h_{i}(x) \tag{3}$$
for a "good choice" of each $\lambda_{i}$. It turns out we can infer this "good choice" from the solution $x_{1}$ of (2).
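To make the residual violation concrete, here is a minimal sketch on a toy problem that is an assumption of this note, not part of the homework: $f(x) = x^\top x$ with the single constraint $h(x) = x_1 + x_2 - 1$. Because the penalized objective is quadratic, its minimizer follows from one linear solve. The helper name `penalty_argmin` is hypothetical.

```julia
using LinearAlgebra

# Toy problem (illustrative assumption, not from the homework):
#   f(x) = x'x,  h(x) = a'x - b  with a = [1, 1] and b = 1.
# The constrained minimizer is x = [0.5, 0.5].
a = [1.0, 1.0]
b = 1.0
h(x) = dot(a, x) - b

# Minimizer of f(x) + mu*h(x)^2. Setting the gradient
# 2x + 2mu*(a'x - b)*a to zero gives (2I + 2mu*a*a') x = 2mu*b*a.
function penalty_argmin(mu)
    A = 2.0 * I + 2.0 * mu * (a * a')
    return A \ (2.0 * mu * b * a)
end

# The violation h(x) shrinks as mu grows but does not reach zero.
for mu in (1.0, 10.0, 100.0)
    x = penalty_argmin(mu)
    println("mu = ", mu, ":  x = ", x, ",  h(x) = ", h(x))
end
```

For this particular toy problem the violation works out to $h(x) = -1/(1+2\mu)$, so it only vanishes in the limit $\mu \to \infty$, which is what motivates the additional $\lambda$ term.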

#### Task:
Prove that setting $\lambda_{i}=2 \mu h_{i}\left(x_{1}\right)$ will, if we assume that the gradients $\nabla f(x)$ and $\nabla h(x)$ are (locally) constant, ensure that the minimum of (3) satisfies the constraints $h(x)=0$ exactly.

Tip: Think intuitively. Think about how the gradient that arises from the penalty in (2) is now generated via the $\lambda_{i}$.


#### Result:

First, recall where $\lambda_i = 2 \mu h_i(x_1)$ comes from. Setting the gradient of (2) to zero gives its optimality condition:
$$ \nabla_{x} f(x)+\mu \nabla_{x} \sum_{i=1}^{m} h_{i}(x)^{2} = 0 $$
$$ \nabla_{x} f(x)+\mu \sum_{i=1}^{m} 2 h_{i}(x) \nabla_{x} h_i(x) = 0 $$

Likewise, setting the gradient of (3) to zero gives its optimality condition:

$$ \nabla_{x} f(x)+\mu \nabla_{x} \sum_{i=1}^{m} h_{i}(x)^{2}+ \nabla_{x} \sum_{i=1}^{m} \lambda_{i} h_{i}(x) = 0$$
$$ \nabla_{x} f(x)+\mu \sum_{i=1}^{m} 2 h_{i}(x) \nabla_{x} h_i(x)+ \sum_{i=1}^{m} \lambda_{i} \nabla_{x} h_{i}(x) = 0$$
$$ \nabla_{x} f(x)+\sum_{i=1}^{m} (2 \mu  h_{i}(x) + \lambda_{i}) \nabla_{x} h_i(x) = 0$$

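Problem (2) is problem (3) with $\lambda_i = 0$, and its minimizer is $x_1$. The factor $2 \mu h_i(x) + \lambda_i$ in the last line plays the role of the new multiplier, which gives the update
$$\lambda_{i}^{new} \leftarrow 2 \mu  h_{i}(x_1) + \lambda_{i}^{old} = 2 \mu h_i(x_1) \quad \text{for } \lambda_{i}^{old} = 0$$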

The key assumption is that $\nabla_x f(x)$ and $\nabla_x h(x)$ are (locally) constant. Let $x_2$ denote the minimizer of (3). Then $\nabla_x f(x_1) = \nabla_x f(x_2)$ and $\nabla_x h_i(x_1) = \nabla_x h_i(x_2)$ for every $i$.

Therefore, the optimality condition of (2) at $x_1$ can be rewritten with the gradients evaluated at $x_2$:
$$ \nabla_{x} f(x_1)+\mu \sum_{i=1}^{m} 2 h_{i}(x_1) \nabla_{x} h_i(x_1) = 0 = \nabla_{x} f(x_2)+\mu \sum_{i=1}^{m} 2 h_{i}(x_1) \nabla_{x} h_i(x_2) $$

The optimality condition of (3) at $x_2$ reads
$$ \nabla_{x} f(x_2)+\mu \sum_{i=1}^{m} 2 h_{i}(x_2) \nabla_{x} h_i(x_2)+ \sum_{i=1}^{m} \lambda_{i} \nabla_{x} h_{i}(x_2) = 0$$
Substituting $\lambda_i = 2 \mu h_i(x_1)$ and grouping terms, we have
$$ \underbrace{\nabla_{x} f(x_2)+\sum_{i=1}^{m} 2 \mu h_i(x_1) \nabla_{x} h_{i}(x_2)}_{=\,0}+\mu \sum_{i=1}^{m} 2 h_{i}(x_2) \nabla_{x} h_i(x_2) = 0$$

The underbraced terms are exactly the optimality condition of (2), rewritten with the gradients at $x_2$ using the assumption that $\nabla_x f(x)$ and $\nabla_x h(x)$ are (locally) constant, so they sum to zero. What remains is
$$\mu \sum_{i=1}^{m} 2 h_{i}(x_2) \nabla_{x} h_i(x_2) = 0$$

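This remaining condition is the stationarity condition of
$$ \min _{x} \mu \sum_{i=1}^{m} h_{i}(x)^{2}$$

which, for a feasible problem, is minimized exactly where $h(x) = 0$. More directly, if the gradients $\nabla_x h_i(x_2)$ are linearly independent, the condition forces every $h_i(x_2) = 0$, as desired.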
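The result can be checked numerically on a case where the gradients really are constant. The sketch below uses a one-dimensional toy problem chosen for illustration (an assumption, not from the homework): $f(x) = x$ and $h(x) = x - 1$, so both derivatives equal 1 everywhere. The names `h_lin` and `argmin_aug` are hypothetical.

```julia
# 1D toy problem with exactly constant gradients (illustrative assumption):
#   f(x) = x,  h(x) = x - 1,  so f'(x) = 1 and h'(x) = 1 everywhere.
h_lin(x) = x - 1.0

# Minimizer of x + mu*(x - 1)^2 + lambda*(x - 1).
# Stationarity 1 + 2mu*(x - 1) + lambda = 0 gives x = 1 - (1 + lambda)/(2mu).
argmin_aug(mu, lambda) = 1.0 - (1.0 + lambda) / (2.0 * mu)

mu = 2.0
x1 = argmin_aug(mu, 0.0)        # minimizer of (2), i.e. lambda = 0
lambda = 2.0 * mu * h_lin(x1)   # multiplier choice from the task
x2 = argmin_aug(mu, lambda)     # minimizer of (3)

println("h(x1)  = ", h_lin(x1))   # nonzero: the pure penalty leaves a violation
println("lambda = ", lambda)
println("h(x2)  = ", h_lin(x2))   # zero up to rounding: constraint is met
```

When $\nabla f$ is not constant (for example a quadratic $f$), a single update generally only reduces the violation, which is why the update is usually repeated.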

## (2) Squared Penalties & Log Barriers

In the last exercise we defined the "hole function" $f_{\text {hole }}^{c}(x)$, where we now assume a conditioning $c=4$. Consider the optimization problem