The method explained below approaches our problem of finding an integral solution from a somewhat different perspective.

# Lagrange Relaxation

Consider the MIP
<br><br>
$
\begin{equation}
    \begin{array}{ll@{}ll}
    \displaystyle \text{max} & c^T x &\\
    \displaystyle \text{s.t.}& A^1 x  & \leq b^1 &\\
    \displaystyle            & A^2 x  & \leq b^2 &\\
                             &     x  & \in Z^p \times R^{n-p}
    \end{array}
\end{equation}
$
<br><br>
The idea is that the constraints have already been split: the "easier" part is collected in $P(A^2,b^2)$, while the "harder" part is in $P(A^1, b^1)$.

We can then replace the "hard" part of the constraints by a penalty term in the objective function, which yields the so-called <b>Lagrange function</b>:
<br><br>
$
\begin{equation}
    \begin{array}{ll@{}ll}
    \displaystyle L(\lambda) & = \text{max}\{c^Tx + \lambda^T(b^1 - A^1 x)|x\in X^2\} &\\
    \displaystyle \text{with} & \lambda \in R^{m_1}_{\geq 0}, &\\
    & X^2 := \{x|A^2x\leq b^2\} \cap (Z^p \times R^{n-p})
    \end{array}
\end{equation}
$
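
To make the definition concrete, here is a minimal sketch that evaluates $L(\lambda)$ by brute force for a tiny, made-up instance. The data `c`, `A1`, `b1` and the choice $X^2 = \{0,1\}^3$ are assumptions for illustration only; enumerating $X^2$ is only feasible because the example is so small.

```python
import itertools
import numpy as np

# Hypothetical toy instance: one "hard" knapsack constraint A1 x <= b1,
# while the "easy" set X2 is simply all binary vectors of length 3.
c = np.array([5.0, 6.0, 3.0])
A1 = np.array([[3.0, 4.0, 2.0]])
b1 = np.array([5.0])
X2 = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]

def lagrange_value(lam):
    """Return L(lam) and a maximizer x in X2, found by enumeration."""
    values = [c @ x + lam @ (b1 - A1 @ x) for x in X2]
    best = int(np.argmax(values))
    return float(values[best]), X2[best]

# Optimal MIP objective, also by enumeration, for comparison.
z_mip = max(float(c @ x) for x in X2 if np.all(A1 @ x <= b1))

for lam in (0.0, 1.0, 1.5, 2.0):
    value, x = lagrange_value(np.array([lam]))
    print(f"lambda={lam:.1f}  L(lambda)={value:.2f}  x={x}  z_MIP={z_mip:.2f}")
```

For every $\lambda \geq 0$ tried here, the printed value of $L(\lambda)$ should not fall below `z_mip`.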

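For every $\lambda \geq 0$, $L(\lambda)$ is an upper bound on the optimal objective of the MIP: any MIP-feasible $x$ lies in $X^2$ and satisfies $\lambda^T(b^1 - A^1x) \geq 0$, so $c^Tx \leq L(\lambda)$. The <b>Lagrange Relaxation</b> looks for the tightest of these bounds by minimizing over $\lambda$: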
<br><br>
$\text{min}_{\lambda \geq 0} L(\lambda) = \text{max}\{c^Tx|A^1x \leq b^1, x \in \text{conv}(X^2)\}$
<br><br>
This yields an approximation at least as close to the true optimum of the MIP as the optimum of the LP relaxation:
<br><br>
$z_{MIP} \leq \text{min}_{\lambda \geq 0} L(\lambda) \leq z_{LP}$
<br><br>
Note that $L(\lambda)$ is convex, so we can optimize it using the <b>subgradient method</b>!

# Subgradient Method

Let $f:R^r\rightarrow R$ be a convex function. Then, $h\in R^r$ is called a <b>subgradient</b> of $f$ at $\lambda$, if
<br><br>
$f(\lambda')\geq f(\lambda) + h^T(\lambda' - \lambda) \>\>\>\> \forall \lambda' \in R^r$
<br><br>
Note that if the function $f$ is minimized at $\lambda^*$, then for any subgradient $h$ of $f$ at $\lambda$
<br><br>
$0 \geq f(\lambda^*) - f(\lambda)\geq h^T(\lambda^* - \lambda)$
<br><br>
This is why a subgradient can be used in a gradient-descent-type algorithm: by the inequality above, $-h$ never points away from $\lambda^*$. The subgradient inequality yields a linear under-estimator of the objective, which makes it possible to also optimize non-smooth functions. This is the case for our function $L(\lambda)$, as it is the maximum of finitely many affine functions.
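
As a small illustration of the definition, the following sketch checks the subgradient inequality numerically for a maximum of three affine functions in one variable. The slopes and offsets are made-up values, and the helper names `f`, `subgradient` and `inequality_holds` are hypothetical. The slope of any piece attaining the maximum is used as the subgradient.

```python
import numpy as np

# Hypothetical convex, piecewise-linear function:
# f(lam) = max_i (slopes[i] * lam + offsets[i])
slopes = np.array([-4.0, 2.0, 5.0])
offsets = np.array([14.0, 5.0, 0.0])

def f(lam):
    return float(np.max(slopes * lam + offsets))

def subgradient(lam):
    # The slope of a piece that attains the maximum is a subgradient at lam.
    return float(slopes[np.argmax(slopes * lam + offsets)])

def inequality_holds(lam, grid):
    # Check f(t) >= f(lam) + h * (t - lam) for all sample points t.
    h = subgradient(lam)
    return all(f(t) >= f(lam) + h * (t - lam) - 1e-12 for t in grid)

grid = np.linspace(-5.0, 5.0, 201)
for lam in (0.0, 1.5, 3.0):
    print(lam, subgradient(lam), inequality_holds(lam, grid))
```

At the kink $\lambda = 1.5$ two pieces are active, so every slope between $-4$ and $2$ would be a valid subgradient; the sketch simply returns the first one.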

Three important questions to finally run this approach remain.
1. <i>How do we know when to stop the procedure?</i>
<br><br>
To keep it simple, we stop once the gap to the optimum, $f(\lambda) - f(\lambda^*)$, falls below a chosen threshold.
<br><br>
2. <i>How do we calculate the subgradient?</i>
<br><br>
This has to be answered specifically for the problem at hand. For the Lagrange relaxation one can show that if $x^\lambda$ is an optimal solution of the maximization problem defining $L(\lambda)$, then $h=b^1-A^1x^\lambda$ is a subgradient of $L$ at $\lambda$.
<br><br>
3. <i>How can we choose the step-size appropriately?</i>
<br><br>
For this approach, it can be shown that if $f$ attains a minimum, there is a bound $H\in R$ with $||h^k||\leq H \>\> \forall k$ on the norm of the subgradients $h^k$ used in every step $k$ of the method, and the step sizes $\mu_k > 0$ satisfy
<br><br>
$\displaystyle \sum_{k=0}^\infty \mu_k = \infty, \sum_{k=0}^\infty \mu_k^2 < \infty$
<br><br>
then the sequence of best objective values found so far, $(\bar f_k)_k$, converges to the minimum $f^*=f(\lambda^*)$. If we choose $\mu_k = \frac{1}{k}$ (for $k \geq 1$), then even $(\lambda^k)_k$ converges to $\lambda^*$.
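
The three answers above can be combined into a minimal sketch of the subgradient method for the Lagrange relaxation. Several things here are assumptions and not taken from the text: the toy data `c`, `A1`, `b1` with $X^2 = \{0,1\}^3$, solving the inner maximization by enumeration, projecting $\lambda$ back onto $\lambda \geq 0$ after each step, and stopping after a fixed number of iterations because $f(\lambda^*)$ is not known in advance. The step size is $\frac{1}{k+1}$ since the loop counter starts at $0$.

```python
import itertools
import numpy as np

# Hypothetical toy instance (same structure as the MIP above).
c = np.array([5.0, 6.0, 3.0])
A1 = np.array([[3.0, 4.0, 2.0]])
b1 = np.array([5.0])
X2 = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]

def solve_inner(lam):
    """Evaluate L(lam) by enumerating X2; return the value and a maximizer."""
    values = [c @ x + lam @ (b1 - A1 @ x) for x in X2]
    best = int(np.argmax(values))
    return float(values[best]), X2[best]

def subgradient_method(lam0, iterations=200):
    lam = np.array(lam0, dtype=float)
    best_value, best_lam = np.inf, lam.copy()
    for k in range(iterations):
        value, x = solve_inner(lam)
        if value < best_value:            # keep the best bound found so far
            best_value, best_lam = value, lam.copy()
        h = b1 - A1 @ x                   # subgradient of L at lam
        mu = 1.0 / (k + 1)                # diminishing step size
        # Step against the subgradient, then project onto lam >= 0 (assumption).
        lam = np.maximum(0.0, lam - mu * h)
    return best_value, best_lam

best_value, best_lam = subgradient_method([0.0])
print("best upper bound:", best_value, "at lambda =", best_lam)
```

Because $L$ is not smooth, individual steps may increase the objective, which is why the sketch tracks the best value seen so far instead of the last iterate.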

Let's see this method in action:

In [1]:
import numpy as np
import matplotlib.pyplot as plt

print("hello world°°°!")

hello world°°°!
