# CHAPTER 6 - Duality

---
---

**Author:** Dr Giordano Scarciotti (g.scarciotti@imperial.ac.uk) - Imperial College London 

**Module:** ELEC70066 - Advanced Optimisation

**Version:** 1.1.4 - 22/02/2023

---
---

The material of this chapter is adapted from $[1]$.

In this chapter we introduce duality. Duality lets us derive necessary and sufficient conditions of optimality, compute lower bounds on the optimal value, and carry out a sensitivity analysis with respect to perturbations of the constraints. Contents:

*   Section 6.1 The Lagrange Dual Function
*   Section 6.2 Lagrange Dual Problem
*   Section 6.3 Geometric Interpretation
*   Section 6.4 Optimality Conditions
*   Section 6.5 Perturbation and Sensitivity Analysis
*   Section 6.6 Duality and Problem Reformulations
*   Section 6.7 Generalised Inequalities

# 6.1 The Lagrange Dual Function

## 6.1.1 Definitions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/DBFv3pffBUk"></iframe>')

Consider a general (i.e. may or may not be convex) optimisation problem 

$$
\begin{array}{lll}
\min & f_0(x) &\\
s.t. & f_i(x) \le 0, & i = 1,\dots,m\\
& h_i(x) = 0,  & i = 1,\dots,p,
\end{array} \tag{1}
$$

Assume that its domain $\mathcal{D}$ is non-empty and denote the optimal value by $p^*$. Define the **Lagrangian** as

$$
\displaystyle L(x,\lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
$$

with domain $\textbf{dom }L = \mathcal{D} \times \mathbb{R}^m\times \mathbb{R}^p$. The vectors $\lambda$ and $\nu$ are called **dual variables** or **Lagrange multiplier vectors** associated with problem $(1)$. We define the **Lagrange dual function**, or simply **dual function**, as the infimum of the Lagrangian over $x$, namely

$$
\displaystyle  g(\lambda, \nu) = \inf_{x\in\mathcal{D}}  L(x,\lambda, \nu) = \inf_{x\in\mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)\right)
$$

Since the dual function is the pointwise infimum of a family of affine functions in $\lambda$ and $\nu$, it is concave (always, no matter whether $(1)$ is convex or not).



The dual function yields lower bounds on the optimal value $p^*$ of problem $(1)$. In fact, suppose that $\tilde x$ is a feasible point for problem $(1)$ and $\lambda \succcurlyeq 0$. Then

$$
\sum_{i=1}^m \lambda_i f_i(\tilde x) + \sum_{i=1}^p \nu_i h_i(\tilde x) \le 0
$$

since the equality constraints are satisfied (so they sum to zero) and the inequality constraints are satisfied (so they sum to a non-positive number). Thus

$$
L(\tilde x,\lambda, \nu) \le f_0(\tilde x)
$$

But since $g$ is the infimum of $L$, we have

$$
g(\lambda, \nu)= \inf_{x \in \mathcal{D}} L(x,\lambda, \nu) \le L(\tilde x,\lambda, \nu) \le f_0(\tilde x).
$$

Since this holds for every feasible $\tilde x$, taking the infimum of the right-hand side over all feasible points gives

$$
g(\lambda, \nu) \le p^*
$$

When the Lagrangian is unbounded below in $x$, the lower bound is $-\infty$, which is not informative. The lower bound is meaningful only when $\lambda \succcurlyeq 0$ and $(\lambda, \nu)\in\textbf{dom }g$, that is when $g(\lambda,\nu)> -\infty$. In this case, we say that $(\lambda,\nu)$ is a **dual feasible** pair. The reason for this terminology will become clear later.



## 6.1.2 Examples

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/fYfpv1vRfQU"></iframe>')

We look now at a few simple examples.



### Least-squares solution of linear equations

Consider the problem

$$
\begin{array}{ll}
\min & x^\top x \\
s.t. & Ax=b,  
\end{array}
$$

where $A\in\mathbb{R}^{p \times n}$. The Lagrangian is $L(x,\nu) = x^\top x + \nu^\top (Ax-b)$. Since this is a convex quadratic function of $x$, we find its minimiser by setting the gradient to zero, $\nabla_x L(x,\nu) = 2x + A^\top \nu =0$, from which it follows that $x = -(1/2)A^\top \nu$. Substituting this back into $L$ gives

$$
g(\nu) = -\frac{1}{4} \nu^\top AA^\top \nu - b^\top \nu,
$$

which is a concave quadratic function. So for any $\nu$ we have

$$
-\frac{1}{4} \nu^\top AA^\top \nu - b^\top \nu \le \inf\{x^\top x : Ax=b \}.
$$
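The bound can be checked numerically. The following is a minimal sketch, assuming randomly generated data $A$ and $b$ (the sizes and the seed are arbitrary choices, not part of the original notes). It compares $g(\nu)$ at many random $\nu$ with the primal optimal value, obtained from the least-norm solution $x^* = A^\top (AA^\top)^{-1} b$, and also evaluates $g$ at the stationary point $\nu^* = -2(AA^\top)^{-1}b$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6                      # illustrative sizes (assumption)
A = rng.standard_normal((p, n))  # random data; full row rank with probability one
b = rng.standard_normal(p)
AAt = A @ A.T

def g(nu):
    """Dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu."""
    return -0.25 * nu @ AAt @ nu - b @ nu

# Primal optimum: least-norm solution of Ax = b.
x_star = A.T @ np.linalg.solve(AAt, b)
p_star = x_star @ x_star

# g(nu) should be a lower bound on p* for every nu.
dual_values = np.array([g(nu) for nu in rng.standard_normal((1000, p))])

# Setting the gradient of g to zero gives nu* = -2 (A A^T)^{-1} b.
nu_star = -2.0 * np.linalg.solve(AAt, b)

print("p*            =", p_star)
print("max sampled g =", dual_values.max())
print("g(nu*)        =", g(nu_star))
```

In this example the best lower bound coincides with $p^*$, which anticipates the notion of strong duality introduced in Section 6.2.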



### Standard form LP

Consider the LP in standard form

$$
\begin{array}{ll}
\min & c^\top x \\
s.t. & Ax=b,\\
& x \succcurlyeq 0  
\end{array} \tag{2}
$$

The Lagrangian is $L(x,\lambda,\nu) = c^\top x + \nu^\top (Ax-b) - \lambda^\top x = -b^\top \nu + (c + A^\top \nu - \lambda)^\top x$. Since $L$ is affine in $x$, its infimum over $x$ is $-\infty$ unless the linear term vanishes. Hence

$$
g(\lambda,\nu) = \left\{\begin{array}{ll}-b^\top\nu & A^\top \nu - \lambda +c =0 \\ -\infty & \text{otherwise.}  \end{array}\right. \tag{3}
$$

Hence, in this case the lower bound is non-trivial, and equal to $-b^\top\nu$, only when $(\lambda,\nu)$ satisfy $\lambda \succcurlyeq 0$ and $A^\top \nu - \lambda +c =0$. We will see that this type of result is fairly common.
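As a small illustration, consider the special case $A = \mathbf{1}^\top$, $b = 1$, i.e. an LP over the probability simplex, for which $p^* = \min_i c_i$ is known in closed form. This instance and the sampling range for $\nu$ are assumptions made only for this sketch. For each sampled $\nu$ we set $\lambda = c + A^\top \nu$ so that the linear term vanishes, and keep the pair only if $\lambda \succcurlyeq 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
c = rng.standard_normal(n)
A = np.ones((1, n))      # single equality constraint 1^T x = 1 (illustrative choice)
b = np.array([1.0])

# On the probability simplex the optimum puts all the mass on the cheapest coordinate.
p_star = c.min()

def dual_function(lam, nu):
    """g(lam, nu) from (3): -b^T nu if A^T nu - lam + c = 0, otherwise -inf."""
    if np.allclose(A.T @ nu - lam + c, 0.0):
        return float(-b @ nu)
    return -np.inf

# Sample nu around the feasibility threshold nu = -min(c).
nus = rng.uniform(-c.min() - 2.0, -c.min() + 2.0, size=(200, 1))
bounds = []
for nu in nus:
    lam = c + A.T @ nu           # makes the linear term of the Lagrangian vanish
    if np.all(lam >= 0):         # keep only dual feasible pairs
        bounds.append(dual_function(lam, nu))
bounds = np.array(bounds)

# The choice nu = -min(c) gives the best bound for this instance.
best = dual_function(c - c.min(), np.array([-c.min()]))
print(p_star, bounds.max(), best)
```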




### Two-way partitioning problem

Consider the non-convex problem

$$
\begin{array}{lll}
\min & x^\top W x & \\
s.t. & x_i^2=1, & i=1,\dots,n 
\end{array}
$$

where $W \in \mathbb{S}^n_+$. This problem consists of partitioning the elements $\{1,\dots,n\}$ into two sets so as to minimise a cost. The matrix coefficient $W_{ij}$ represents the cost of assigning $i$ and $j$ to the same set (both $1$ or both $-1$), while $-W_{ij}$ represents the cost of assigning $i$ and $j$ to different sets (one $1$ and the other $-1$). The number of assignments is finite, but it grows as $2^n$, so a solution by enumeration is impractical for large $n$. The Lagrangian is

$$
L(x,\nu) = x^\top W x + \sum_{i=1}^n \nu_i(x_i^2-1) = x^\top (W + \textbf{diag}(\nu))x - \mathbf{1}^\top \nu.
$$

This is a quadratic form in $x$ plus a constant. If $W + \textbf{diag}(\nu)$ is positive semidefinite, the infimum over $x$ is attained at $x=0$ and equals $-\mathbf{1}^\top \nu$. If $W + \textbf{diag}(\nu)$ has a negative eigenvalue, the Lagrangian is unbounded below. In summary

$$
g(\nu) = \left\{\begin{array}{ll}-\mathbf{1}^\top\nu & W + \textbf{diag}(\nu) \succcurlyeq 0 \\ -\infty & \text{otherwise.}  \end{array}\right.
$$

This dual function provides lower bounds on the optimal value of a difficult problem. For instance, if $\nu = - \lambda_{\min}(W) \mathbf{1}$, then $p^* \ge -\mathbf{1}^\top\nu = n \lambda_{\min}(W)$, where $\lambda_{\min}(W)$ is the minimum eigenvalue of $W$.

Note that this lower bound can be obtained also in another way. The constraints $x_1^2=1$, ..., $x_n^2=1$ imply (but are not implied by) that $\sum_{i=1}^n x_i^2 = n$. If we relax the problem by considering this last constraint, then the modified problem is simply an eigenvalue problem which has solution $n\lambda_{\min}(W)$.
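For small $n$ the bound can be compared with the true optimum found by enumeration. The sketch below is an illustration under stated assumptions: $W$ is a randomly generated positive semidefinite matrix and $n = 8$, so that all $2^n = 256$ sign vectors can be listed.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 8                                  # small enough to enumerate 2^n assignments
M = rng.standard_normal((n, n))
W = M @ M.T / n                        # random positive semidefinite cost matrix (assumption)

# Brute-force primal optimum over all sign vectors x in {-1, +1}^n.
p_star = min(
    np.array(s) @ W @ np.array(s)
    for s in itertools.product((-1.0, 1.0), repeat=n)
)

# Dual feasible point nu = -lambda_min(W) * 1 and the corresponding lower bound.
lam_min = np.linalg.eigvalsh(W)[0]     # eigvalsh returns eigenvalues in ascending order
nu = -lam_min * np.ones(n)
lower_bound = -nu.sum()                # equals n * lambda_min(W)

print("p* (enumeration)  =", p_star)
print("n * lambda_min(W) =", lower_bound)
```

The gap between the two printed values is generally nonzero, since this particular $\nu$ is dual feasible but not necessarily dual optimal.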




## 6.1.3 Conjugate Functions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/J8slAaQTuwg"></iframe>')

The Lagrangian dual function is closely related to the so-called conjugate function.

Let $f : \mathbb{R}^n \to \mathbb{R}$. The function $f^* : \mathbb{R}^n \to \mathbb{R}$, defined as 

$$
f^∗(y) = \sup_{x\in\textbf{dom }f}\left(y^\top x - f(x)  \right),
$$

is called the **conjugate** of the function $f$. The domain of the conjugate function consists of all $y$ for which the supremum is finite, i.e., for which the difference $y^\top x - f(x)$ is bounded above on $\textbf{dom }f$. This definition is illustrated in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1cUpzRrxFzbMbWbXsFVWp4Ohf1gCYJnGM" width="400"/>
</div>

Figure 6.1. *The conjugate function $f^*(y)$ is the maximum gap between the linear function $yx$ and $f(x)$. Source: page $91$ of $[1]$.*

Obviously $f^∗$ is a convex function, since it is the pointwise
supremum of a family of affine functions of $y$. This is true whether
or not $f$ is convex. We now look at some examples.

*   *Affine function*: $f(x) = ax + b$. As a function of $x$, $yx − ax − b$ is bounded if and only if $y = a$, in which case it is constant. Therefore the domain of the conjugate function $f^∗$ is the singleton $\{a\}$, and $f^∗(a) = −b$.
*    *Negative logarithm*: $f(x) = -\log x$, with $\textbf{dom }f = \mathbb{R}_{++}$. The function $xy+\log x$ is unbounded above if $y \ge 0$. For $y < 0$, setting the derivative to zero shows that the maximum is reached at $x = -1/y$. Therefore, $\textbf{dom }f^* = \{y : y < 0\} = -\mathbb{R}_{++}$ and $f^*(y) = -\log(-y)-1$ for $y < 0$.
*    *Log-determinant*: $f(X) = \log \det X^{-1}$ on $\mathbb{S}^n_{++}$. We skip the proof, but recall that $\log \det X^{-1}$ on $\mathbb{S}^n_{++}$ plays the role of $-\log x$ on $\mathbb{R}_{++}$. In fact the conjugate is $f^*(Y) = \log \det (-Y)^{-1}-n$ with $\textbf{dom }f^* = -\mathbb{S}^n_{++}$.
*    *Exponential*: $f(x) = e^x$. The function $xy - e^x$ is unbounded above if $y < 0$. For $y > 0$, $xy - e^x$ reaches its maximum at $x = \log y$, so we have $f^*(y) = y \log y - y$. For $y = 0$, $f^*(y) = \sup_x -e^x = 0$. In summary, $\textbf{dom }f^* = \mathbb{R}_{+}$ and $f^*(y) = y \log y - y$, with the convention $0 \log 0 = 0$.
*    *Negative entropy*: $f(x) = x \log x$, with $\textbf{dom }f = \mathbb{R}_{+}$ (and $f(0) = 0$). The function $xy - x \log x$ is bounded above on $\mathbb{R}_{+}$ for all $y$, hence $\textbf{dom }f^* = \mathbb{R}$. It attains its maximum at $x = e^{y-1}$, and substituting we find $f^*(y) = e^{y-1}$.


---

**Exercise 6.1:**  Show that the conjugate of the *Inverse* function $f(x) = 1/x$ on $\mathbb{R}_{++}$ is $f^∗(y) = −2(−y)^\frac{1}{2}$, with $\textbf{dom }f^* = -\mathbb{R}_{+}$.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

---

*    *Strictly convex quadratic function*: $f(x) = \frac{1}{2} x^\top Qx$, with $Q \in \mathbb{S}^n_{++}$. The function $y^\top x − \frac{1}{2}x^\top Qx$ is bounded above as a function of $x$ for all $y$. It attains its maximum at $x = Q^{−1}y$, so $f^∗(y) = \frac{1}{2}y^\top Q^{−1}y$.
*    *Log-sum-exp*: $f(x) = \log(\sum^n_{i=1} e^{x_i})$ has conjugate
$$
f^*(y) = \left\{\begin{array}{ll} \sum_{i=1}^n y_i \log y_i & \text{if }y\succcurlyeq 0 \text{ and } \mathbf{1}^\top y =1 \\ \infty & \text{otherwise.}  \end{array} \right.
$$
The proof is omitted.
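Some of the scalar conjugates listed above can be sanity-checked numerically by replacing the supremum with a maximum over a fine grid. This is only a rough sketch: the grids, the test values of $y$ and the helper name `conjugate_on_grid` are choices made for illustration, and a grid maximum can only approximate the supremum from below.

```python
import numpy as np

def conjugate_on_grid(f, xs, y):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over the grid xs."""
    return np.max(y * xs - f(xs))

# Exponential: f(x) = exp(x), with f*(y) = y log y - y for y > 0.
xs = np.linspace(-20.0, 5.0, 200001)
exp_num = conjugate_on_grid(np.exp, xs, 2.0)
exp_ref = 2.0 * np.log(2.0) - 2.0

# Grid on the positive half-line for functions with domain R_{++}.
xs_pos = np.linspace(1e-6, 50.0, 500001)

# Negative logarithm: f(x) = -log x, with f*(y) = -log(-y) - 1 for y < 0.
neglog_num = conjugate_on_grid(lambda x: -np.log(x), xs_pos, -2.0)
neglog_ref = -np.log(2.0) - 1.0

# Negative entropy: f(x) = x log x, with f*(y) = exp(y - 1).
negent_num = conjugate_on_grid(lambda x: x * np.log(x), xs_pos, 1.0)
negent_ref = np.exp(1.0 - 1.0)

for name, num, ref in [("exp", exp_num, exp_ref),
                       ("-log", neglog_num, neglog_ref),
                       ("x log x", negent_num, negent_ref)]:
    print(f"{name:8s} grid: {num:.6f}   formula: {ref:.6f}")
```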






---

**Exercise 6.2:** Show that the conjugate of the *norm* $f(x)=||x||$, where $||\cdot||$ is a norm on $\mathbb{R}^n$ with dual norm $||\cdot||_*$, is given by

$$
f^*(y) = \left\{\begin{array}{ll} 0 & ||y||_*\le 1 \\ \infty & \text{otherwise.}  \end{array} \right.
$$

***EDIT THE FILE TO ADD YOUR PROOF HERE***

**Exercise 6.3:** Show that the conjugate of the *squared norm* $f(x) = \frac{1}{2}||x||^2$ is $f^*(y) = \frac{1}{2}||y||_*^2$.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

---

Going back to duality, consider an optimisation problem with linear inequality and equality constraints

$$
\begin{array}{ll}
\min & f_0(x) \\
s.t. & Ax \preccurlyeq b\\
& Cx = d. 
\end{array} 
$$

The dual function is given by

$$
\begin{array}{rl}
g(\lambda,\nu) &= \displaystyle\inf_x (f_0(x) + \lambda^\top (Ax-b) + \nu^\top (Cx-d)) \\
&= -b^\top \lambda - d^\top \nu + \displaystyle\inf_x (f_0(x) + (A^\top \lambda +C^\top\nu)^\top x) \\
&= -b^\top \lambda - d^\top \nu - f_0^*(-A^\top \lambda -C^\top\nu),
\end{array}
$$

with domain $\textbf{dom }g = \{(\lambda,\nu) : -A^\top \lambda -C^\top\nu \in \textbf{dom }f_0^*\}$. This is a useful result that can be applied to several classes of problems, as shown below.



### Equality constrained norm minimization

Consider the problem

$$
\begin{array}{ll}
\min & ||x|| \\
s.t. & Ax = b 
\end{array} 
$$
where $||\cdot||$ is any norm. Then from the results above we have
$$
g(\nu) = -b^\top \nu - f_0^*(-A^\top \nu) = \left\{\begin{array}{ll} -b^\top \nu & ||A^\top \nu||_*\le 1 \\ -\infty & \text{otherwise.}  \end{array} \right.
$$
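A quick numerical check for the Euclidean norm, which is its own dual norm. This is a sketch under assumptions: the data are random, and the candidate $\nu^* = -w/\|A^\top w\|_2$ with $w = (AA^\top)^{-1}b$ is a choice made here for illustration. The primal optimal value is the norm of the least-norm solution of $Ax=b$; every $\nu$ with $\|A^\top \nu\|_2 \le 1$ gives the lower bound $-b^\top\nu$, while $g(\nu) = -\infty$ outside this set.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 6
A = rng.standard_normal((p, n))   # random data (assumption); full row rank with probability one
b = rng.standard_normal(p)

# Primal optimum for the Euclidean norm: norm of the least-norm solution.
w = np.linalg.solve(A @ A.T, b)
x_ls = A.T @ w
p_star = np.linalg.norm(x_ls)

def g(nu):
    """Dual function: -b^T nu if ||A^T nu||_2 <= 1, otherwise -inf."""
    if np.linalg.norm(A.T @ nu) <= 1.0 + 1e-12:   # small tolerance for rounding
        return -b @ nu
    return -np.inf

# Random dual feasible points, obtained by rescaling whenever ||A^T nu||_2 > 1.
samples = rng.standard_normal((1000, p))
feasible = [nu / max(1.0, np.linalg.norm(A.T @ nu)) for nu in samples]
bounds = np.array([g(nu) for nu in feasible])

# Candidate dual optimum on the boundary of the feasible set.
nu_star = -w / np.linalg.norm(A.T @ w)

print("p*         =", p_star)
print("best bound =", bounds.max())
print("g(nu*)     =", g(nu_star))
```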



### Entropy maximization

Consider the entropy maximization problem

$$
\begin{array}{ll}
\min & \sum_{i=1}^n x_i \log(x_i)\\
s.t. & Ax\preccurlyeq b\\
& \mathbf{1}^\top x = 1.   
\end{array} \tag{4}
$$

From the results above we have

$$
g(\lambda,\nu) = -b^\top \lambda - \nu - \sum_{i=1}^n e^{-a_i^\top \lambda - \nu -1} = -b^\top \lambda - \nu - e^{-\nu-1} \sum_{i=1}^n e^{-a_i^\top \lambda}
$$

where $a_i$ is the $i$-th column of $A$.


# 6.2 Lagrange Dual Problem

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/k40FB9sLtYw"></iframe>')

Since for any $(\lambda,\nu)$ with $\lambda\succcurlyeq 0$, the Lagrange dual function provides us with a lower bound for $p^*$, it is natural to seek the best (maximum) lower bound. This leads to the optimisation problem

$$
\begin{array}{ll}
\max & g(\lambda, \nu) \\
s.t. & \lambda \succcurlyeq 0 
\end{array} \tag{5}
$$

called the **Lagrange dual problem** associated with the **primal problem** $(1)$. The term **dual feasible**, which we have already used to describe the pairs $(\lambda,\nu)$ with $\lambda \succcurlyeq 0$ and $g(\lambda,\nu) > -\infty$, corresponds to feasibility for the dual problem $(5)$. We call $(\lambda^*,\nu^*)$ **dual optimal** or **optimal Lagrange multipliers** if they are optimal for the dual problem $(5)$.

Note that the dual problem $(5)$ is always convex (whether the original problem is convex or not) because the objective to be maximised is concave and the constraint is convex.




### Lagrange dual of the standard form LP

We have seen above that the domain of the dual function, namely $\textbf{dom }g = \{(\lambda,\nu):g(\lambda,\nu) > - \infty\}$, often has dimension smaller than $m+p$. It is often the case that we can describe $\textbf{dom }g$ with a set of linear equality constraints. In this case it is convenient to include the domain as an explicit constraint in the problem.

We have already seen that the Lagrange dual of the standard form LP $(2)$ is given by $(3)$. So the Lagrange dual problem of the standard form LP is

$$
\begin{array}{ll}
\max & g(\lambda,\nu) = \left\{\begin{array}{ll}-b^\top\nu & A^\top \nu - \lambda +c =0 \\ -\infty & \text{otherwise,}  \end{array}\right. \\
s.t. & \lambda \succcurlyeq 0.  
\end{array}\tag{6}
$$

We can form an equivalent problem by making the constraints explicit, namely

$$
\begin{array}{ll}
\max & -b^\top\nu\\
s.t. & A^\top \nu - \lambda +c =0\\
 & \lambda \succcurlyeq 0  
\end{array}\tag{7}
$$

or equivalently

$$
\begin{array}{ll}
\max & -b^\top\nu\\
s.t. & A^\top \nu  +c \succcurlyeq 0, 
\end{array}\tag{8}
$$

which is an LP in inequality form (note the symmetry). According to our definition of an optimisation problem, $(6)$, $(7)$ and $(8)$ are three equivalent but different problems. Nonetheless, with a slight abuse of terminology, we will refer to either $(7)$ or $(8)$ as the Lagrange dual of $(2)$.

---

**Exercise 6.4:** Find the Lagrange dual problem of the inequality form LP

$$
\begin{array}{ll}
\min & c^\top x \\
s.t. & Ax \preccurlyeq b.  
\end{array} \tag{9}
$$

***EDIT THE FILE TO ADD YOUR PROOF HERE***

---

## 6.2.1 Weak and Strong duality

The optimal value of the Lagrange dual problem, which we indicate as $d^*$ is the best lower bound (**that can be found by Lagrange duality**). Thus, it is always true that 

$$
d^* \le p^*.
$$

This property is called **weak duality** and the difference $p^* - d^*$ is known as **optimal duality gap**. Note that weak duality holds even when the two optimal values are infinity. If the primal is unbounded below, then $p^*=-\infty$ and we must have $d^*=-\infty$ which implies that the dual is infeasible. If the dual is unbounded above, then $d^*=\infty$ and we must have $p^*=\infty$ which implies that the primal is infeasible. 

Weak duality is very important because it provides certificates of how good an approximate solution is. Since the dual problem is always convex, it can often be solved efficiently even when the primal is not convex and cannot be solved. If we then find a primal feasible point using a heuristic and its objective value is very close to $d^*$, we have a certificate that our primal solution is a good one. 

If the equality

$$
d^* = p^*
$$

holds, then we have that **strong duality** holds and, of course, the duality gap is zero. Strong duality does not hold in general, not even when the primal is convex. However, if the problem is convex there are results that establish additional conditions on the problem that, if satisfied, guarantee strong duality. These conditions are called **constraint qualifications**.

One simple constraint qualification for convex problems is **Slater's condition**: *If there exists a feasible $x \in \textbf{int }\mathcal{D}$ (interior of $\mathcal{D}$), such that the inequality constraints hold strictly ($<$) (and the problem is convex), then strong duality holds*.

Such a point, i.e. a feasible point $x \in \textbf{int }\mathcal{D}$ that satisfies the inequality constraints strictly, is called **strictly feasible**. 

Slater's condition can be refined. For instance, the result holds if $x \in \textbf{relint } \mathcal{D}$ (the interior relative to the affine hull) instead of $x \in \textbf{int }\mathcal{D}$. Or, most importantly, the inequality does not have to hold strictly for linear inequality constraints. This means that for a convex problem with linear inequalities and $\textbf{dom }f_0$ open, strong duality holds if the problem is feasible.

Note that Slater's condition does not simply imply strong duality for convex problems. It also implies that the dual optimal is attained when $d^*>-\infty$, i.e. there exists a dual feasible $(\lambda^*, \nu^*)$ such that $g(\lambda^*, \nu^*) = d^* = p^*$.

Going back to the idea of duality as a certificate of how good our suboptimal solution is, we note that this certification can also be used as a stopping condition in iterative algorithms. For instance, we can iteratively compute a primal feasible point $x^{(k)}$ and a dual feasible pair $(\lambda^{(k)},\nu^{(k)})$ and check how close their values are, i.e. stop when $f_0(x^{(k)}) - g(\lambda^{(k)},\nu^{(k)}) < \varepsilon$. By weak duality this gap is an upper bound on $f_0(x^{(k)}) - p^*$. If strong duality holds, then we can make the tolerance $\varepsilon$ as small as we want.

### Examples

**LP:** For the inequality form LP $(9)$, strong duality holds if the primal is feasible. Of course the same holds for standard form LPs $(2)$. Since the dual of an inequality form LP is a standard form LP, and vice versa, strong duality also holds for LPs if the dual is feasible. The only case in which strong duality can fail is when both the primal and the dual are infeasible, so that $p^*=\infty$ and $d^*=-\infty$. This case can occur. 

**Entropy maximisation:** For problem $(4)$, strong duality holds if the primal is feasible.

**Quadratic programme:** Consider the optimisation problem

$$
\begin{array}{ll}
\min & x^\top P x \\
s.t. & Ax = b.  
\end{array}
$$

Assume that $P \in \mathbb{S}^n_{++}$, so that $P^{-1}$ exists. The dual function is $g(\nu) = -\frac{1}{4} \nu^{\top} A P^{-1} A^\top \nu - b^\top \nu$ and the dual problem is

$$
\begin{array}{ll}
\max & -\frac{1}{4} \nu^{\top} A P^{-1} A^\top \nu - b^\top \nu.
\end{array}
$$

By Slater's condition (in its refined form for linear constraints), strong duality holds if the primal is feasible, i.e. $p^*=d^*$ as long as $b\in \mathcal{R}(A)$, which means $p^* < \infty$. In fact, for this problem one can show that strong duality always holds, even when $p^*=\infty$. This is the case when $b\not \in \mathcal{R}(A)$, which implies that there exists a $z$ such that $A^\top z = 0$, $b^\top z \not = 0$. It follows that the dual function is unbounded above along the line $\{tz : t\in\mathbb{R} \}$ and so $d^* = \infty$ as well. 
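The use of the duality gap as a stopping criterion, discussed earlier in this section, can be tried on this quadratic programme. The sketch below is an illustration under several assumptions that are not part of the notes: random data with $P$ positive definite, plain gradient ascent on the dual with step $2/\lambda_{\max}(AP^{-1}A^\top)$, and a primal feasible candidate obtained by projecting the Lagrangian minimiser $x(\nu) = -\frac{1}{2}P^{-1}A^\top\nu$ onto $\{x : Ax = b\}$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 6
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
B = rng.standard_normal((n, n))
P = B @ B.T + np.eye(n)                 # positive definite by construction
P_inv = np.linalg.inv(P)
M = A @ P_inv @ A.T
A_pinv = np.linalg.pinv(A)

def g(nu):
    """Dual function g(nu) = -(1/4) nu^T A P^{-1} A^T nu - b^T nu."""
    return -0.25 * nu @ M @ nu - b @ nu

def primal_candidate(nu):
    """Minimiser of L(x, nu), projected onto the affine set Ax = b."""
    x = -0.5 * P_inv @ A.T @ nu
    return x + A_pinv @ (b - A @ x)

eps = 1e-8
step = 2.0 / np.linalg.eigvalsh(M)[-1]  # 1/L, where L = lambda_max(M)/2 bounds the curvature of g
nu = np.zeros(p)
gaps = []
for k in range(100000):
    x_hat = primal_candidate(nu)
    gap = x_hat @ P @ x_hat - g(nu)     # f0(x_hat) - g(nu) >= 0 by weak duality
    gaps.append(gap)
    if gap < eps:                       # certified: f0(x_hat) - p* <= gap < eps
        break
    nu = nu + step * (-0.5 * M @ nu - b)   # gradient ascent step on g

print("iterations:", len(gaps), " final gap:", gaps[-1])
```

The loop never needs to know $p^*$: the gap between a primal feasible value and a dual value is enough to certify the accuracy of the current iterate.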


**Nonconvex quadratic problem with strong duality:** In some cases strong duality holds for nonconvex problems. This is the case, for instance, of 

$$
\begin{array}{ll}
\min & x^\top A x + 2b^\top x \\
s.t. & x^\top x \le 1.  
\end{array}
$$

where $A\in\mathbb{S}^n$ but $A \not \succcurlyeq 0$, which makes it nonconvex. The dual problem is given by (prove it as an exercise)

$$
\begin{array}{ll}
\max & -b^\top (A+\lambda I)^{\dagger} b - \lambda \\
s.t. & A + \lambda I \succcurlyeq 0\\ 
& b \in \mathcal{R}(A+\lambda I) 
\end{array}
$$

It can be shown that the problem is equivalent to the SDP

$$
\begin{array}{ll}
\max & -t - \lambda \\
s.t. & \left[\begin{array}{cc}A + \lambda I & b\\ b^\top & t \end{array}\right] \succcurlyeq 0 
\end{array}
$$

Although the primal is not convex, it is possible to show that strong duality holds. This is due to a more general, but difficult to prove, result which establishes that, for any optimisation problem with quadratic objective and exactly one quadratic inequality constraint, strong duality holds as long as Slater's condition holds. Thus, even though the primal is not convex, we can find the optimal value by solving the convex dual problem.
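A two-dimensional instance makes this concrete. In the sketch below the data $A = \textbf{diag}(1,-2)$ and $b = (0.5, 0.5)$ are arbitrary choices for illustration. Because $A$ has a negative eigenvalue, the objective has no local minimum in the interior of the unit disc, so the primal optimum lies on the unit circle and can be approximated on a grid of angles. For this diagonal $A$ the dual objective is available in closed form for $\lambda > 2$, where $A + \lambda I \succ 0$.

```python
import numpy as np

A = np.diag([1.0, -2.0])     # indefinite, so the primal is nonconvex (illustrative data)
b = np.array([0.5, 0.5])

# Primal: evaluate x^T A x + 2 b^T x on a fine grid of the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 400001)
X = np.stack([np.cos(theta), np.sin(theta)])            # shape (2, N)
primal_values = np.einsum("in,ij,jn->n", X, A, X) + 2.0 * b @ X
p_star = primal_values.min()

# Dual: g(lam) = -b^T (A + lam I)^{-1} b - lam, finite here only for lam > 2
# (at lam = 2 the vector b is not in the range of A + lam I).
lams = np.linspace(2.0 + 1e-6, 10.0, 400001)
dual_values = -(b[0] ** 2 / (1.0 + lams) + b[1] ** 2 / (lams - 2.0)) - lams
d_star = dual_values.max()

print("p* (grid) =", p_star)
print("d* (grid) =", d_star)
```

Up to the grid resolution the two values agree, as the strong duality result predicts.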





# 6.3 Geometric Interpretation

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/5NWVLIEaI0U"></iframe>')

Duality can be interpreted and understood in several different ways: multicriterion interpretation, max-min characterisation, saddle-point interpretation, game interpretation and price/tax interpretation. The interested student can read more about this in Chapter $5.4$ of $[1]$, but that material is not part of this course.

Instead, here we focus on the geometric interpretation. For simplicity, we consider a convex problem with objective function $f_0(x)$ and only one inequality constraint $f_1(x) \le 0$ (but this can be extended to any number of inequality and equality constraints). Let $\mathcal{G}=\{(f_1(x),f_0(x)): x\in\mathcal{D}\}$. The optimal value can be expressed in terms of $\mathcal{G}$ as

$$
p^* = \inf \{t : (u,t) \in \mathcal{G},\, u \le 0 \}.
$$

Then the dual function can be written as

$$
\displaystyle g(\lambda) = \inf_{(u,t)\in\mathcal{G}} (t + \lambda u).
$$

Hence, given $\lambda$, we minimize $(\lambda, 1)^\top (u, t)$ over $\mathcal{G}$. This yields a supporting
hyperplane with slope $-\lambda$. The intersection of this hyperplane with the $t$-axis gives $g(\lambda)$. This is shown in the figure below.


<div>
<img src="https://drive.google.com/uc?export=view&id=1ZjaX9oRDQvm-Igfh2vC3LrI_r_GTisnd" width="500"/>
</div>

Figure 6.2. *Geometric interpretation of $g(\lambda)$ as lower bound of $p^*$. Source: page $233$ of $[1]$.*

Solving the dual problem is equivalent to finding the value of $\lambda$ for which the duality gap is minimised. This is shown in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1ohBvgvmMy2sqS04lzXSaJbE4eyR3l9VG" width="500"/>
</div>

Figure 6.3. *Supporting hyperplanes corresponding to three dual feasible values of $\lambda$, including the optimum $\lambda^*$. Strong duality does not hold; the optimal duality gap $p^* − d^*$ is positive. Source: page $233$ of $[1]$.*

We can also explain strong duality using a geometric interpretation. Consider the set 

$$
\mathcal{A}=\{(u,t) : f_1(x) \le u,\,f_0(x)\le t \text{ for some } x\in\mathcal{D}\}.
$$

Since $\mathcal{A}$ includes all the points in $\mathcal{G}$ as well as the points which are worse, the set $\mathcal{A}$ can be interpreted as a sort of epigraph form of $\mathcal{G}$. Similarly to above, given $\lambda$, we minimize $(\lambda, 1)^\top (u, t)$ over $\mathcal{A}$. This yields a supporting hyperplane with slope $-\lambda$. The intersection of this hyperplane with the $t$-axis gives $g(\lambda)$. The figure below shows the set $\mathcal{A}$.

<div>
<img src="https://drive.google.com/uc?export=view&id=14qns6GEzP3U5NexiH6-1IIQNE67cwnfr" width="500"/>
</div>

Figure 6.4. *Source: page $234$ of $[1]$.*

From the figure one can see that strong duality holds if and only if there exists a nonvertical supporting hyperplane to $\mathcal{A}$ at its boundary point $(0, p^*)$. For convex problems $\mathcal{A}$ is convex, hence it has a (possibly vertical) supporting hyperplane at $(0, p^*)$. Then Slater's condition simply requires the existence of a point $(\tilde u, \tilde t)\in\mathcal{A}$ such that $\tilde u < 0$, i.e. a piece of $\mathcal{A}$ is on the left of the $t$-axis.  If so, the supporting hyperplane at $(0, p^*)$ cannot be vertical (or it would cut the left part of $\mathcal{A}$ off).


# 6.4 Optimality Conditions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/5v198qjK-iI"></iframe>')

Consider problem $(1)$. We do not assume that the problem is convex.

Suppose that strong duality holds and that the primal and dual optimal values are attained at $x^*$ and $(\lambda^*,\nu^*)$, respectively.  Then

$$
\begin{array}{rl}
f_0(x^*) &= g(\lambda^*,\nu^*) = \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) +\sum_{i=1}^p \nu_i^* h_i(x) \right)
\\&\le f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) +\sum_{i=1}^p \nu_i^* h_i(x^*) \le f_0(x^*),
\end{array}
$$

where the first equality is strong duality, the second is the definition of the dual function, and the first inequality holds because the infimum over $x$ is no larger than the value at $x = x^*$. The last inequality follows because $x^*$, being optimal, is feasible, so the equality-constraint terms sum to zero and, since $\lambda^* \succcurlyeq 0$, the inequality-constraint terms sum to a nonpositive number. Since the first and last terms of the chain coincide, all the inequalities must hold with equality. This has two important consequences. First, $x^*$ minimizes $L(x, \lambda^*, \nu^*)$ over $x$. Second, it follows that

$$
\sum_{i=1}^m \lambda_i^* f_i(x^*) =0.
$$

Since each term of the sum is nonpositive, we conclude that 

$$
\lambda_i^* f_i(x^*) =0, \qquad i=1,\dots,m.
$$

This condition is called **complementary slackness** and implies that the $i$-th optimal Lagrange multiplier is zero unless
the $i$-th constraint is active ($f_i(x^*)=0$) at the optimum.


Assume now that all the functions are differentiable (without assuming convexity). Since $x^*$ minimises $L(x, \lambda^*, \nu^*)$ over $x$, it follows that its gradient must be zero at $x^*$. Thus we have the following necessary conditions for the optimal $x^*$ and $(\lambda^*,\nu^*)$ with zero duality gap:

*   $f_i(x^*)\le 0,\qquad i=1,\dots,m$
*   $h_i(x^*)= 0,\qquad i=1,\dots,p$
*   $\lambda_i^* \ge 0,\qquad i=1,\dots,m$
*   $\lambda_i^* f_i(x^*)=0,\qquad i=1,\dots,m$
*   $\nabla f_0(x^*) +  \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p  \nu_i^* \nabla h_i(x^*) =0$

which are called **Karush-Kuhn-Tucker (KKT)** conditions.

To summarize, for any optimisation problem with differentiable objective and constraint functions for which strong duality holds, any pair of primal and dual optimal points must satisfy the KKT conditions. 

**When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal.**

To see this, let $\tilde x$, $\tilde\lambda$ and $\tilde \nu$ satisfy the KKT conditions. Note that the first two conditions state that $\tilde x$ is primal feasible.
Since $\tilde \lambda_i \ge 0$, $L(x,\tilde\lambda,\tilde \nu)$ is convex in $x$; the last KKT condition states that $\tilde x$ minimizes $L(x,\tilde\lambda,\tilde \nu)$
over $x$. From this we conclude that

$$
g(\tilde\lambda,\tilde \nu) = L(\tilde x,\tilde\lambda,\tilde \nu) 
= f_0(\tilde x)
$$

because of complementary slackness and the fact that the equality constraints hold. This shows that $\tilde x$, $\tilde\lambda$ and $\tilde \nu$ have zero duality gap and are primal and dual optimal. In summary, for any convex optimisation problem with differentiable objective and constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap.

The KKT conditions generalize the optimality condition $\nabla f_0(x)=0$ for unconstrained problems and play an important role in optimisation. Many algorithms for convex optimisation are conceived as methods for solving the KKT conditions.
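As a simple instance of an algorithm that solves the KKT conditions directly, consider an equality constrained convex quadratic problem, $\min \frac{1}{2}x^\top Q x + c^\top x$ subject to $Ax = b$ with $Q \succ 0$. This problem and its random data are assumptions chosen for illustration. With no inequality constraints, the KKT conditions reduce to the linear system $Qx + c + A^\top \nu = 0$, $Ax = b$, which the sketch below solves and then checks against other feasible points.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 5, 2
B = rng.standard_normal((n, n))
Q = B @ B.T + np.eye(n)            # positive definite, so the problem is convex
c = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

def f0(x):
    return 0.5 * x @ Q @ x + c @ x

# KKT system: stationarity Qx + c + A^T nu = 0 and primal feasibility Ax = b.
K = np.block([[Q, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, nu_star = sol[:n], sol[n:]

stationarity = Q @ x_star + c + A.T @ nu_star
feasibility = A @ x_star - b

# Other feasible points x* + Z z, where the columns of Z span the null space of A.
_, _, Vt = np.linalg.svd(A)
Z = Vt[p:].T
others = np.array([f0(x_star + Z @ z) for z in rng.standard_normal((500, n - p))])

print("KKT residuals:", np.linalg.norm(stationarity), np.linalg.norm(feasibility))
print("f0(x*) =", f0(x_star), " smallest sampled value =", others.min())
```

Since the problem is convex, the point satisfying the KKT conditions is optimal, and no sampled feasible point achieves a smaller objective value.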

## 6.4.1 Solving the Primal Problem via the Dual

We mentioned that if strong duality holds then $x^*$ minimises $L(x, \lambda^*, \nu^*)$ over $x$, where $(\lambda^*, \nu^*)$ is a dual optimal solution. This fact sometimes allows us to compute a primal optimal solution from a dual optimal solution. Suppose we have strong duality and an optimal $(\lambda^*, \nu^*)$ is known.
Suppose that the minimizer of $L(x, \lambda^*, \nu^*)$, i.e. the solution of

$$
\min_x \,\,\, f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) +\sum_{i=1}^p \nu_i^* h_i(x) \tag{10}
$$

is unique (this occurs, for instance, when $L(x, \lambda^*, \nu^*)$ is strictly convex in $x$). Then if the solution of $(10)$ is primal feasible, it must be
primal optimal; if it is not primal feasible, then no primal optimal point can exist, i.e., we can conclude that the primal optimum is not attained.

This method is useful when solving the dual problem and then solving $\nabla L(x, \lambda^*, \nu^*) = 0$ is easier than solving the primal problem (because the primal, for instance, is not convex).

---

**Example 6.1:** (*Entropy maximisation*) Consider problem $(4)$ and its dual 

$$
\begin{array}{ll}
\max & -b^\top \lambda - \nu - e^{-\nu-1} \sum_{i=1}^n e^{-a_i^\top \lambda} \\
s.t. & \lambda \succcurlyeq 0 
\end{array} 
$$

Suppose that Slater's condition holds and that we have solved the dual problem and found  $(\lambda^*, \nu^*)$. Then the Lagrangian at $(\lambda^*, \nu^*)$ is

$$
L(x, \lambda^*, \nu^*) = \sum_{i=1}^n x_i \log x_i +\lambda^{*\top}(Ax-b) + \nu^*(\textbf{1}^\top x-1)
$$

which is strictly convex and bounded below. So we can compute the minimum by solving $\nabla L(x, \lambda^*, \nu^*) = 0$, obtaining

$$
x_i^* = e^{-(a_i^\top \lambda^* + \nu^* + 1)}, \qquad i=1,\dots, n
$$

If $x^*$ is primal feasible, it must be the optimal solution of the primal problem. If $x^*$ is not primal feasible, then we can conclude that the primal optimum is not attained.
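The procedure can be sketched on a tiny instance with a single inequality constraint $a^\top x \le b$ (so $A$ has one row and $a_i$ is a scalar). The data $a = (1,2,3,4)$, $b = 2$ and the use of bisection are assumptions for illustration. For fixed $\lambda$, maximising the dual function over $\nu$ gives $\nu = \log\left(\sum_i e^{-a_i\lambda}\right) - 1$, which leaves the concave scalar function $-b\lambda - \log\sum_i e^{-a_i\lambda}$, whose derivative is $a^\top x(\lambda) - b$ with $x_i(\lambda) \propto e^{-a_i\lambda}$.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # the single row of A (illustrative data)
b = 2.0

def x_of(lam):
    """Lagrangian minimiser after eliminating nu: x_i proportional to exp(-a_i * lam)."""
    w = np.exp(-a * lam)
    return w / w.sum()

# Maximise the reduced dual over lam >= 0; its derivative a^T x(lam) - b is decreasing.
if a @ x_of(0.0) - b <= 0.0:
    lam_star = 0.0                    # constraint inactive at the optimum
else:
    lo, hi = 0.0, 50.0
    for _ in range(200):              # bisection on the sign of the derivative
        mid = 0.5 * (lo + hi)
        if a @ x_of(mid) - b > 0.0:
            lo = mid
        else:
            hi = mid
    lam_star = 0.5 * (lo + hi)

S = np.exp(-a * lam_star).sum()
nu_star = np.log(S) - 1.0

# Recover the primal point from the dual optimum.
x_star = np.exp(-(a * lam_star + nu_star + 1.0))

p_val = np.sum(x_star * np.log(x_star))                     # primal objective
d_val = -b * lam_star - nu_star - np.exp(-nu_star - 1.0) * S  # dual objective

print("x* =", x_star)
print("primal value =", p_val, " dual value =", d_val)
```

Here the recovered $x^*$ is primal feasible, the constraint is active with $\lambda^* > 0$ (complementary slackness), and the primal and dual values agree.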

---



# 6.5 Perturbation and Sensitivity Analysis

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/5l9aOdeL7Wc"></iframe>')

Duality provides useful information about the sensitivity of the optimal value with respect to perturbations of certain constraints. This is very useful because in practical scenarios the values of the parameters in the problems are often approximations.

Consider the perturbed version of problem $(1)$, namely

$$
\begin{array}{lll}
\min & f_0(x) &\\
s.t. & f_i(x) \le u_i, & i = 1,\dots,m\\
& h_i(x) = v_i,  & i = 1,\dots,p 
\end{array} 
$$

This problem coincides with $(1)$ when $u=v=0$. The perturbed inequality constraints have an intuitive meaning: when $u_i$ is positive, we have relaxed the $i$-th inequality constraint; when $u_i$ is negative, we have tightened it. Let us define $p^*(u,v)$ as the optimal value of the perturbed problem. Clearly $p^*(0,0)=p^*$. If $p^*(u,v)=\infty$, then the perturbed problem is infeasible. Obviously, if the original problem is convex, so is the perturbed problem. 


Assume now that strong duality holds for the unperturbed problem and that the dual optimal is attained (for instance this is the case if the unperturbed problem is convex and Slater's condition holds). Let $(\lambda^*,\nu^*)$ be optimal for the dual of the unperturbed problem. Note that if $g(\lambda,\nu)$ is the dual function of the original problem, then $g(\lambda,\nu)-\lambda^\top u - \nu^\top v$ is the dual function of the perturbed problem. By weak duality of the perturbed problem

$$
p^*(u,v) \ge g(\lambda^*,\nu^*) -\lambda^{*\top} u - \nu^{*\top} v
$$

But by strong duality of the original problem we have $g(\lambda^*,\nu^*)= p^*(0,0)$.

Thus

$$
p^*(u,v) \ge p^*(0,0)-\lambda^{*\top}u -\nu^{*\top}v. \tag{11}
$$



From this we have the following sensitivity conclusions:

*   if $\lambda_i^*$ is large: $p^*$ increases greatly if we tighten constraint $i$ ($u_i< 0$)
*   if $\lambda_i^*$ is small: $p^*$ does not decrease much if we loosen constraint $i$ ($u_i> 0$)
*   if $\nu_i^*$ is large and positive: $p^*$ increases greatly if we take $v_i< 0$
*   if $\nu_i^*$ is large and negative: $p^*$ increases greatly if we take $v_i> 0$
*   if $\nu_i^*$ is small and positive: $p^*$ does not decrease much if we take $v_i> 0$
*   if $\nu_i^*$ is small and negative: $p^*$ does not decrease much if we take $v_i< 0$


The inequality $(11)$, and the conclusions listed above, give a lower bound on the perturbed optimal value, but no upper bound. For this reason the results are not symmetric with respect to loosening or tightening a constraint. For example, suppose that $\lambda_i^*$ is large, and we loosen the $i$-th constraint a bit (i.e., take $u_i$ small and positive). In this case the inequality $(11)$ is not useful; it does not, for
example, imply that the optimal value will decrease considerably.
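
The lower bound $(11)$ can be checked numerically on a small example. The sketch below uses a hypothetical toy problem that is not part of the original notes, namely $\min x^2$ subject to $1 - x \le u$, for which $p^*(u)$ is available in closed form and $\lambda^* = 2$ follows from the KKT conditions at $u=0$. It is only meant to illustrate the inequality and its asymmetry.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not taken from the notes):
#   minimise x^2  subject to  1 - x <= u
# For u = 0 the solution is x* = 1, so p*(0) = 1, and the stationarity
# condition 2 x* - lambda* = 0 gives lambda* = 2.

def p_star(u):
    """Optimal value of the perturbed toy problem, in closed form."""
    # The constraint reads x >= 1 - u; the unconstrained minimiser is x = 0.
    x_opt = np.maximum(1.0 - u, 0.0)
    return x_opt ** 2

lambda_star = 2.0
u_grid = np.linspace(-1.0, 2.0, 13)
lower_bound = p_star(0.0) - lambda_star * u_grid  # right-hand side of (11)
gap = p_star(u_grid) - lower_bound                # nonnegative if (11) holds

for u, p, lb in zip(u_grid, p_star(u_grid), lower_bound):
    print(f"u = {u:+.2f}   p*(u) = {p:.4f}   bound = {lb:+.4f}")
```

In this example the bound is tight to first order around $u=0$, while for large positive $u$ (strong loosening) it becomes very negative and says little about the true optimal value.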








Suppose now that $p^*(u,v)$ is differentiable at $u = 0$, $v = 0$. Then, provided strong
duality holds, we have that

$$
\lambda_i^* = -\frac{\partial p^*(0,0)}{\partial u_i} \qquad \nu_i^* = -\frac{\partial p^*(0,0)}{\partial v_i}.
$$

These expressions follow from $(11)$. Indeed, let $u = t e_i$, with $e_i$ the $i$-th unit vector, and $v=0$. Then, for $t>0$, $(11)$ gives

$$
\frac{p^*(t e_i,0) - p^*(0,0)}{t} \ge -\lambda_i^{*}
$$

and taking the limit as $t\to 0^+$ we have

$$
\frac{\partial p^*(0,0)}{\partial u_i} \ge -\lambda_i^{*}.
$$

Doing the same for $t < 0$ we obtain the opposite inequality, hence equality. Repeating the argument for $v$ gives the other derivative.

So when the differentiability assumption holds, the optimal Lagrange multipliers are exactly the local sensitivities of the optimal value with respect to constraint perturbations. In contrast to the nondifferentiable case, this interpretation is symmetric: tightening the $i$-th inequality constraint by a small amount (i.e., taking $u_i$ small and negative) yields an increase in $p^*$ of approximately $-\lambda^*_i u_i$; loosening the $i$-th constraint by a small amount (i.e., taking $u_i$ small and positive) yields a decrease in $p^*$ of approximately $\lambda^*_i u_i$.
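
The derivative formula can be illustrated with a finite-difference check. The following is a minimal sketch on a hypothetical equality-constrained problem (an assumption made for illustration), $\min \tfrac{1}{2}\|x\|^2$ subject to $a^\top x - 1 = v$, whose optimal value and multiplier are known in closed form.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not taken from the notes):
#   minimise 0.5 * ||x||^2  subject to  a^T x - 1 = v
a = np.array([1.0, 2.0, 2.0])

def p_star(v):
    """Optimal value of the perturbed problem (minimum-norm solution)."""
    x_opt = (1.0 + v) * a / (a @ a)
    return 0.5 * x_opt @ x_opt

# For v = 0: stationarity x + nu * a = 0 together with a^T x = 1
# gives nu* = -1 / ||a||^2.
nu_star = -1.0 / (a @ a)

# Central finite difference of p* with respect to v at v = 0.
t = 1e-6
dp_dv = (p_star(t) - p_star(-t)) / (2.0 * t)

print("finite difference dp*/dv:", dp_dv)
print("-nu*                    :", -nu_star)
```

Both printed numbers should agree up to rounding, consistent with $\nu^* = -\partial p^*(0,0)/\partial v$.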

The $i$-th optimal Lagrange multiplier tells us how active the constraint is: if $\lambda_i^*$ is small, the constraint can be loosened or tightened a bit without much effect on the optimal value; if $\lambda_i^*$ is large, loosening or tightening the constraint a bit will have a great effect on the optimal value.

For instance, consider the problem of designing a circuit subject to a power constraint and a surface constraint, with $\lambda_1^*=0.1$ associated with the power constraint and $\lambda_2^*=100$ associated with the surface constraint. Then a little more or less power will hardly improve or worsen the optimal design. However, if we are given less surface, the design will become much worse.

# 6.6 Duality and Problem Reformulations

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/1MHTinR_aXg"></iframe>')

We already know that an optimisation problem can be formulated in several equivalent ways. From each way we can obtain a dual. It is important to stress that these duals are in general completely different. In other words, duals of the same primal written in different ways are not necessarily or obviously equivalent.

This motivates us to explore primal problem reformulations for the sake of obtaining a better dual. One can consider the following types of reformulations:
*    Introducing new variables and associated equality constraints.
*    Replacing the objective function with an increasing function of the original objective.
*    Making explicit constraints implicit, i.e., incorporating them into the domain.

We now see a few examples.

## 6.6.1 Introducing New Variables

Consider the unconstrained problem

$$
\begin{array}{ll}
\min & f_0(Ax+b).  
\end{array}
$$

Its Lagrange dual function is $g = \inf_x f_0(Ax+b) = p^*$, i.e. the dual function is just a constant equal to the optimal value. This is neither useful nor informative. Let us now consider the equivalent problem

$$
\begin{array}{ll}
\min & f_0(y) \\
s.t. & Ax +b = y.  
\end{array} \tag{12}
$$

The dual problem is

$$
\begin{array}{ll}
\max & b^\top \nu - f_0^*(\nu) \\
s.t. & A^\top \nu = 0.  
\end{array} \tag{13}
$$

This dual is no longer trivial and can be used for finding lower bounds or for sensitivity analysis.
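
As an illustration, the dual $(13)$ can be evaluated numerically for one particular choice of $f_0$. The sketch below assumes $f_0(y) = \tfrac{1}{2}\|y\|_2^2$, whose conjugate is $f_0^*(\nu) = \tfrac{1}{2}\|\nu\|_2^2$, so that the primal is a least-squares problem. The random data and the helper name `dual_objective` are hypothetical. It does not replace a proof; it only checks weak and strong duality on one instance.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Primal with the assumed choice f_0(y) = 0.5 * ||y||^2:
#   minimise 0.5 * ||A x + b||^2   (a least-squares problem)
x_opt, *_ = np.linalg.lstsq(A, -b, rcond=None)
r = A @ x_opt + b
p_star = 0.5 * r @ r

def dual_objective(nu):
    """Objective of (13) when f_0^*(nu) = 0.5 * ||nu||^2."""
    return b @ nu - 0.5 * nu @ nu

# Orthogonal projector onto the null space of A^T, so A^T (P z) = 0 for any z.
P = np.eye(6) - A @ np.linalg.pinv(A)

nu_feasible = P @ rng.standard_normal(6)  # an arbitrary dual feasible point
nu_opt = P @ b                            # maximiser of the dual objective

print("p*                 :", p_star)
print("dual at feasible nu:", dual_objective(nu_feasible))
print("dual at optimal nu :", dual_objective(nu_opt))
```

Any dual feasible $\nu$ gives a lower bound on $p^*$, and for this convex instance the bound is attained at $\nu = P b$, which coincides with the least-squares residual.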


---

**Exercise 6.5:** Show that $(13)$ is the dual of $(12)$.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

**Exercise 6.6:** Consider the problem

$$
\begin{array}{ll}
\min & \displaystyle \log \left(\sum_{i=1}^m e^{a_i^\top x+b_i}\right)  
\end{array}
$$

By introducing  the variable $y=Ax+b$ we obtain the equivalent problem

$$
\begin{array}{ll}
\min & \displaystyle \log \left(\sum_{i=1}^m e^{y_i}\right) \\
s.t. & Ax +b = y.  
\end{array}
$$

Show that the dual is given by

$$
\begin{array}{ll}
\max & \displaystyle b^\top \nu - \sum_{i=1}^m \nu_i \log \nu_i \\
s.t. & \mathbf{1}^\top \nu = 1\\ 
& A^\top \nu =0 \\ 
& \nu \succcurlyeq 0.
\end{array}
$$

***EDIT THE FILE TO ADD YOUR PROOF HERE***

**Exercise 6.7:** Consider the problem

$$
\begin{array}{ll}
\min & ||Ax -b||.
\end{array}
$$

The dual here is the trivial $p^*$. Determine an equivalent primal and a nontrivial dual.


***EDIT THE FILE TO ADD YOUR PROOF HERE***

---

## 6.6.2 Implicit Constraints

The next simple reformulation that we study is the inclusion of some of the constraints in the objective function, by modifying the objective function to be infinite when the constraint is violated.

Consider the LP with box constraints $l \preccurlyeq x \preccurlyeq u$

$$
\begin{array}{ll}
\min & c^\top x \\
s.t. & Ax = b\\ 
& l \preccurlyeq x \preccurlyeq u.
\end{array}
$$

The dual of this problem is

$$
\begin{array}{ll}
\max & -b^\top \nu - \lambda_1^\top u + \lambda_2^\top l \\
s.t. & A^\top \nu +\lambda_1 -\lambda_2 +c = 0\\ 
& \lambda_1 \succcurlyeq 0, \quad   \lambda_2 \succcurlyeq 0.
\end{array}
$$



An alternative dual can be obtained as follows. We first define the equivalent primal as

$$
\begin{array}{ll}
\min & f_0(x) \\
s.t. & Ax = b
\end{array}
$$

where 

$$
f_0(x) = \left\{\begin{array}{ll} c^\top x & l \preccurlyeq x \preccurlyeq u \\ \infty & \text{otherwise}\end{array}\right.
$$

The dual function is given by

$$
g(\nu) = \inf_{l \preccurlyeq x \preccurlyeq u} (c^\top x +\nu^\top (Ax-b)) = -b^\top \nu - u^\top (A^\top \nu + c)^- +l^\top (A^\top \nu + c)^+
$$

where $y_i^+ = \max\{y_i,0\}$ and $y_i^- = \max\{-y_i,0\}$. Thus the dual is

$$
\max \quad -b^\top \nu - u^\top (A^\top \nu + c)^- +l^\top (A^\top \nu + c)^+
$$

which is an unconstrained (piecewise linear) problem.
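
The closed-form expression for $g(\nu)$ can be sanity-checked numerically. The following is a minimal sketch on randomly generated data (all numerical values and the helper names `g` and `g_brute_force` are hypothetical). It compares the formula with a brute-force minimisation of the Lagrangian over the vertices of the box and checks that $g(\nu)$ is a lower bound on the objective at a feasible point.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
m, n = 2, 4
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
l = -np.ones(n)
u = 2.0 * np.ones(n)
x_feas = rng.uniform(l, u)  # a point inside the box
b = A @ x_feas              # chosen so that x_feas also satisfies A x = b

def g(nu):
    """Dual function from the closed-form expression in the text."""
    w = A.T @ nu + c
    w_plus = np.maximum(w, 0.0)
    w_minus = np.maximum(-w, 0.0)
    return -b @ nu - u @ w_minus + l @ w_plus

def g_brute_force(nu):
    """Minimise the Lagrangian over the box by enumerating its vertices.

    The Lagrangian is linear in x, so its minimum over a box is at a vertex.
    """
    w = A.T @ nu + c
    vertices = product(*zip(l, u))
    return min(w @ np.array(v) for v in vertices) - b @ nu

nu = rng.standard_normal(m)
print("closed form :", g(nu))
print("brute force :", g_brute_force(nu))
print("c^T x_feas  :", c @ x_feas)
```

The enumeration has $2^n$ vertices, so it is only practical for very small $n$; the closed form costs a single matrix-vector product.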

# 6.7 Generalised Inequalities

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/Mfq3fbqZ2vc"></iframe>')

Duality extends straightforwardly to problems with generalised inequalities, i.e.

$$
\begin{array}{lll}
\min & f_0(x) &\\
s.t. & f_i(x) \preccurlyeq_{K_i} 0, & i = 1,\dots,m\\
& h_i(x) = 0,  & i = 1,\dots,p 
\end{array} 
$$

where $K_i \subseteq \mathbb{R}^{k_i}$ are proper cones.

The Lagrangian and dual function are identical to the standard case, with the exception that the $\lambda_i$'s are now vectors in $\mathbb{R}^{k_i}$ instead of scalars. As in a problem with scalar inequalities, the dual function gives lower bounds on $p^*$. For a problem with scalar inequalities, we require $\lambda_i \ge 0$. Here the nonnegativity requirement on the dual
variables is replaced by the condition

$$
\lambda_i \succcurlyeq_{K_i^*} 0, \qquad i= 1,\dots,m,
$$

where $K_i^*$ denotes the dual cone of $K_i$. In other words, the Lagrange multipliers associated with inequalities must be dual nonnegative. In particular, note that $f_i(x) \preccurlyeq_{K_i} 0$ and $\lambda_i \succcurlyeq_{K_i^*} 0$ imply $\lambda_i^\top f_i(x) \le 0$. From this property all the rest of the duality theory follows: weak duality, strong duality with constraint qualification, complementary slackness, KKT conditions, perturbation and sensitivity analysis (with the only change that any inequality involving only $f_i(x)$ is $\preccurlyeq_{K_i}$ instead of $\le$ and any inequality involving only $\lambda_i$ is $\succcurlyeq_{K_i^*}$ instead of $\ge$).
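
The sign property of the inner product between a constraint value and a dual-nonnegative multiplier can be checked by random sampling. The sketch below assumes that $K$ is the second-order cone $\{(z,t) : \|z\|_2 \le t\}$, which is self-dual, so $K^* = K$. The sampler `sample_soc` is a hypothetical helper introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_soc(k):
    """Random point of the second-order cone {(z, t) : ||z||_2 <= t} in R^k."""
    z = rng.standard_normal(k - 1)
    t = np.linalg.norm(z) + rng.uniform(0.0, 1.0)
    return np.append(z, t)

k = 4
products = []
for _ in range(1000):
    f = -sample_soc(k)   # f <=_K 0, i.e. -f belongs to K
    lam = sample_soc(k)  # lam >=_{K*} 0, with K* = K for this cone
    products.append(lam @ f)

print("largest inner product found:", max(products))
```

Every sampled inner product is nonpositive, which is exactly the inequality that makes the Lagrangian a lower bound on $f_0(x)$ at feasible points.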

---

**Example 6.2:** (*Semidefinite programme in inequality form*) Consider the SDP

$$
\begin{array}{ll}
\min & c^\top x \\
s.t. & x_1 F_1 + \cdots + x_n F_n \preccurlyeq G 
\end{array} 
$$

with $F_1$, ..., $F_n$, $G\in\mathbb{S}^k$. Define the Lagrange multiplier as the matrix $Z \in\mathbb{S}^k$. The Lagrangian is

$$
L(x,Z) = c^\top x + \textbf{tr}(Z(x_1 F_1 + \cdots + x_n F_n - G))
$$

and the dual function is given by

$$
g(Z) = \inf_x L(x,Z)  = \left\{\begin{array}{lll}-\textbf{tr}(GZ) & \textbf{tr}(F_iZ)+c_i = 0& i=1,\dots,n\\-\infty & \text{otherwise.}&\end{array}\right.
$$

So the dual problem is 

$$
\begin{array}{llll}
\max & -\textbf{tr}(GZ) &&\\
s.t. & Z \succcurlyeq 0, & \textbf{tr}(F_iZ)+c_i = 0& i=1,\dots,n.
\end{array} 
$$

If the primal SDP problem is strictly feasible (i.e. there exists an $x$ such that $x_1 F_1 + \cdots + x_n F_n - G \prec 0 $), then $p^* = d^*$.
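
Weak duality for this primal-dual pair can be verified numerically without a solver. The following is a sketch on synthetic data (all matrices are random and hypothetical), built so that a given $x$ is primal feasible and a given $Z$ is dual feasible. The two points are feasible but not optimal, so the gap is in general strictly positive.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 3, 4

def random_symmetric(k):
    M = rng.standard_normal((k, k))
    return (M + M.T) / 2.0

def random_psd(k):
    M = rng.standard_normal((k, k))
    return M @ M.T

F = [random_symmetric(k) for _ in range(n)]

# Dual feasible point: Z >= 0, and c defined so that tr(F_i Z) + c_i = 0.
Z = random_psd(k)
c = np.array([-np.trace(Fi @ Z) for Fi in F])

# Primal feasible point: G defined so that sum_i x_i F_i <= G with slack S >= 0.
x = rng.standard_normal(n)
S = random_psd(k)
G = sum(xi * Fi for xi, Fi in zip(x, F)) + S

primal_value = c @ x
dual_value = -np.trace(G @ Z)
gap = primal_value - dual_value

print("primal value:", primal_value)
print("dual value  :", dual_value)
print("gap         :", gap, " tr(SZ):", np.trace(S @ Z))
```

With this construction the gap equals $\textbf{tr}(SZ)$, which is nonnegative because both $S$ and $Z$ are positive semidefinite.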

---

# End of CHAPTER 6