# Transforming convex optimization problems$ 
% Text formatting
\newcommand {\th}{^{th}} % for ith, jth, etc. 
% General Latex Definitions
\newcommand {\vec}{\mathbf} % Vector variable
\newcommand {\mat}{} % Matrix variable
\newcommand {\set}{\mathcal} % Set name
\newcommand{\invs}{^{-1}}
\newcommand{\trans}{^ \top} % Vector or matrix transpose notation (\intercal is an alternative)
\newcommand{\suchthat}{\mid} %Alternative: \mathrel{} \middle| \mathrel{} 
\newcommand {\definedas}{:=}
\newcommand {\arctan}{\text{tan}^{-1}}
\newcommand {\abs}[1]{|#1|}
\newcommand {\for}{\text{for}\;}
\newcommand {\and}{\quad \text{and}\quad}
\newcommand {\norm}[1]{\left\lVert#1\right\rVert}
\newcommand {\onenorm}[1]{\norm{#1}_{1}}
\newcommand {\twonorm}[1]{\norm{#1}_{2}}
\newcommand {\pnorm}[1]{\norm{#1}_{p}}
\newcommand {\inftynorm}[1]{\norm{#1}_{\infty}}
% Sets
\newcommand {\reals}{\mathbb{R}}
\newcommand {\realsn}{\reals^{n}}
\newcommand {\positivereals}{\reals_{>0}}
\newcommand {\integers}{\mathbb{Z}}
\newcommand \squarematrices[1][n]{\reals^{#1 \times #1}}
\newcommand \symmetricmatrices[1][n]{\mathbb{S}^{#1}}
\newcommand \pdmatrices[1][n]{\symmetricmatrices[#1]_{++}}
\newcommand \psdmatrices[1][n]{\symmetricmatrices[#1]_{+}}
% Common vectors. If a value is a variable don't use the "\"
\newcommand {\x}{\vec x}
\newcommand {\y}{\vec y} 
\newcommand {\u}{\vec u}
\newcommand {\v}{\vec v}
\newcommand {\onevec}{\mathbb{1}}
% Common Matrices
\newcommand {\A}{\mat A} 
\newcommand {\B}{\mat B}
\newcommand {\X}{\mat X} 
\newcommand {\Y}{\mat Y}
\newcommand {\I}{\mat I} % Identity
% Matrix shortcut
\newcommand {\beginmatrix}{\begin{bmatrix}}
\newcommand {\endmatrix}{\end{bmatrix}}
\newcommand {\beginalign}{\begin{align}}
\newcommand {\endalign}{\end{align}}
% Calculus
\newcommand {\derive}[2]{\frac{d#1}{d#2}}
\newcommand {\ddx}{\derive{}{x}}
\newcommand {\ddt}{\derive{}{t}}
\newcommand {\dxdt}{\derive{x}{t}}
\newcommand {\dydt}{\derive{y}{t}}
\newcommand {\dfdx}{\derive{f}{x}}
\newcommand {\del}{\nabla}
\newcommand {\hessian}{\del^2}
% Convex Optimizations
\newcommand \convexcombo[2]{\theta #1 + (1 - \theta)#2}
\newcommand \minimize[1]{\underset{#1}{\text{minimize}}\quad} % Usage: \minimize{\x \in \reals}
\newcommand {\subjectto}{\text{subject to}\quad}
% Exponent Taylor Series Definition
\newcommand \exponentsum[1]{\sum^\infty_{k=1} \frac{#1^k}{k!}}
% Matrix shortcuts
\newcommand \diagmatrix[2]{
\begin{bmatrix} 
#1 & & \\
& \ddots & \\
& & #2 \\
\end{bmatrix}}$

## (a) [10 points] LP as special case of SDP

Given $A\in\reals^{m\times n}$, $b\in\reals^{m}$, $c\in\realsn$, rewrite the LP

$$\minimize{x\in\realsn} c^{\top}x\\
\text{subject to}\quad Ax \preceq b,$$

as the SDP

$$\underset{x\in\mathbb{R}^{n}}{\text{minimize}}\quad c^{\top}x\\
\text{subject to}\quad F(x) \succeq 0,$$

that is, write the matrix $F(x)$ appearing in the LMI constraint of the SDP in terms of the data of the LP.

### Answer

Let $\vec a_i\trans$ be the $i\th$ row of $A$, so that $\vec a_i \in \realsn$, and let $(\vec a_i)_j$ be the $j\th$ component of $\vec a_i$. 

The inequality $A\x \preceq \vec b$ holds componentwise, that is, $\vec a_i \trans \x - b_i \leq 0$, or equivalently $b_i - \vec a_i \trans \x \geq 0$ for all $i = 1, \cdots, m$. 

A diagonal matrix is positive semidefinite exactly when its diagonal entries, which are its eigenvalues, are all nonnegative. The $m$ scalar inequalities are therefore equivalent to the single matrix inequality

$$\diagmatrix{b_1-\vec a_1\trans \x}{b_m-\vec a_m\trans \x} \succeq 0$$

Separating the constant part from the part that is linear in $\x$, we can rewrite the inequality constraint as

$$F(\x) = \diagmatrix{b_1}{b_m} + \sum_{i=1}^{n} x_i \diagmatrix{-(\vec a_1)_i}{-(\vec a_m)_i} \succeq 0$$ 
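As a quick sanity check of the construction above, the following sketch builds the diagonal of $F(\x)$ for a small made-up LP and tests positive semidefiniteness by checking the sign of each diagonal entry. The data `A`, `b` and the helper names are illustrative assumptions, not part of the problem statement.

```python
def lp_to_lmi_diagonal(A, b, x):
    """Return the diagonal of F(x) = diag(b - A x) for the LP constraint A x <= b."""
    return [b_i - sum(a_ij * x_j for a_ij, x_j in zip(row, x))
            for row, b_i in zip(A, b)]


def is_psd_diagonal(diag, tol=1e-12):
    """A diagonal matrix is PSD exactly when every diagonal entry is nonnegative."""
    return all(d >= -tol for d in diag)


# Hypothetical data: m = 3 constraints, n = 2 variables.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 1.5]

feasible = [0.5, 0.5]    # A x = (0.5, 0.5, 1.0), every component is below b
infeasible = [1.0, 1.0]  # the third row gives 2.0 > 1.5

print(is_psd_diagonal(lp_to_lmi_diagonal(A, b, feasible)))
print(is_psd_diagonal(lp_to_lmi_diagonal(A, b, infeasible)))
```

The first point satisfies $A\x \preceq \vec b$ and yields a nonnegative diagonal, while the second violates the third constraint and produces a negative diagonal entry.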

## (b) [10 points] Second-order Cone Programming as special case of Semi-Definite Programming

Given $\vec f\in\realsn$, $A_i\in\reals^{(n_i-1)\times n}$, $\vec b_i\in\reals^{n_i-1}$, $\vec c_{i}\in\reals^{n}$, $d_i\in\reals$, rewrite the SOCP 

$$\minimize{x\in\realsn} \vec f^{\top} \x \\
\subjectto \twonorm{A_i \x + \vec b_i} \;\leq\; \vec c_i\trans \x + d_i, \quad i=1,...,m,$$

as the SDP

$$\minimize{x\in\realsn} \vec f\trans \x \\
\subjectto F(x) \succeq 0,$$

that is, write the matrix $F(x)$ appearing in the LMI constraint of the SDP in terms of the data of the SOCP.

(Hint: Use the Schur Complement Lemma in Lecture 8, p. 10)

### Answer

#### First, show that $\twonorm {\x} \leq t$ can be written as an LMI. 

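_Proof:_ Assume $t > 0$; the case $t = 0$ forces $\x = \vec 0$ and is trivial. Since both sides are nonnegative, each step below is an equivalence.
$$\beginalign 
& \twonorm {\x} \leq t \\
\iff &\twonorm{\x}^2 \leq t^2 \\
\iff &\frac{1}{t} \twonorm{\x}^2 \leq t \\
\iff &t - \frac{1}{t} \twonorm{\x}^2 \geq 0
\endalign$$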

We now note that 
$$\beginalign  
t - \frac{1}{t} \twonorm \x ^2 
&= t - \x\trans \frac{1}{t} I \x \\
&= t - \x\trans (tI)\invs \x \\
&= C - B\trans A\invs B \\
&= S
\endalign $$

Here $S = t - \x\trans (tI)\invs \x $ is the Schur complement of the block $A = tI$ in the matrix

$$ X = \beginmatrix 
A & B \\ B\trans & C
\endmatrix
=
\beginmatrix 
tI & \x \\ \x\trans & t
\endmatrix $$ 

Furthermore, we know that $S = t - \frac{1}{t} \twonorm{\x}^2 \geq 0$, and also that $A = tI \succ 0$ is positive definite for all $t > 0$. By the Schur Complement Lemma, $X \succeq 0$ (i.e. $X$ is positive semidefinite) if and only if $S \geq 0$, so $\twonorm{\x} \leq t$ is equivalent to $X \succeq 0$. 

Finally, we can decompose $X$ into a linear combination of matrices with coefficients $t, x_1, \dots, x_n$: 

$$ X = t \beginmatrix I & \vec 0 \\ \vec 0 \trans & 1 \endmatrix 
+ x_1 \beginmatrix \huge {0} & \beginmatrix 1 \\ 0 \\ \vdots \\ 0 \endmatrix \\
\beginmatrix 1 & 0 & \cdots & 0 \endmatrix & 
0
\endmatrix
+ x_2 \beginmatrix \huge {0} & \beginmatrix 0 \\ 1 \\ \vdots \\ 0 \endmatrix \\
\beginmatrix 0 & 1 & \cdots & 0 \endmatrix & 
0
\endmatrix
+ \dots
+ x_n \beginmatrix \huge {0} & \beginmatrix 0 \\ 0 \\ \vdots \\ 1 \endmatrix \\
\beginmatrix 0 & 0 & \cdots & 1 \endmatrix & 
0
\endmatrix
\succeq 0
$$

This, then, is clearly a Linear Matrix Inequality of the form 

$$t F_0 + x_1 F_1 + \dots + x_n F_n \succeq 0$$

Therefore we have converted $\twonorm {\x} \leq t $ to a linear matrix inequality.
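The equivalence can also be checked numerically. The sketch below relies on a fact that is not derived in the text above but is easy to verify by direct multiplication: the vector $(\x, -\twonorm{\x})$ is an eigenvector of $X$ with eigenvalue $t - \twonorm{\x}$, which is the smallest eigenvalue of $X$. The function names and the sample values are assumptions chosen for illustration.

```python
import math


def arrow_matvec(t, x, w):
    """Multiply X = [[t I, x], [x^T, t]] by w = (w_1, ..., w_n, w_{n+1})."""
    n = len(x)
    top = [t * w[i] + x[i] * w[n] for i in range(n)]
    bottom = sum(x[i] * w[i] for i in range(n)) + t * w[n]
    return top + [bottom]


def min_eigenvalue(t, x):
    """Smallest eigenvalue of X, using the closed form t - ||x||_2."""
    return t - math.sqrt(sum(xi * xi for xi in x))


# Hypothetical sample point with ||x||_2 = 5 and t = 6, so ||x||_2 <= t holds.
x = [3.0, 4.0]
t = 6.0
norm_x = math.sqrt(sum(xi * xi for xi in x))

# Check that w = (x, -||x||) satisfies X w = (t - ||x||) w.
w = x + [-norm_x]
Xw = arrow_matvec(t, x, w)
lam = t - norm_x
residual = max(abs(Xw_i - lam * w_i) for Xw_i, w_i in zip(Xw, w))

print(residual < 1e-9)            # eigenpair check
print(min_eigenvalue(t, x) >= 0)  # X is PSD because ||x||_2 <= t
print(min_eigenvalue(4.0, x) >= 0)  # t = 4 < ||x||_2, so X is not PSD
```

The sign of $t - \twonorm{\x}$ therefore decides both the norm constraint and the semidefiniteness of $X$, in line with the Schur complement argument.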

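#### Show that $\twonorm{A_i \x + \vec b_i} \;\leq\; \vec c_i\trans \x + d_i$ is an SDP constraint. 

We have shown that $\twonorm\y \leq t$ is equivalent to $t F_0 + y_1 F_1 + \dots + y_p F_p \succeq 0$, where $p$ is the length of $\y$. Dropping the constraint index $i$, we can plug in $\y = A\x + \vec b$ (so $p = n_i - 1$) and $t = \vec c \trans \x + d$. Writing $A_{jk}$ for the $(j,k)$ entry of $A$, so that $y_j = \sum_{k=1}^{n} A_{jk} x_k + b_j$, we get
 
$$ 
\beginalign
(\vec c \trans \x + d) F_0 + \sum_{j=1}^{p} \left((A \x)_j + b_j\right) F_j &\succeq 0 \\
\sum_{k=1}^{n} c_k x_k F_0 + d F_0 
+ \sum_{j=1}^{p}\sum_{k=1}^{n} A_{jk} x_k F_j + \sum_{j=1}^{p} b_j F_j &\succeq 0 \\
\left(d F_0 + \sum_{j=1}^{p} b_j F_j \right) + \sum_{k=1}^{n} x_k \left(c_k F_0 + \sum_{j=1}^{p} A_{jk} F_j\right) & \succeq 0 \\
G_0 + x_1 G_1 + \dots + x_n G_n & \succeq 0
\endalign $$

where $G_0 = d F_0 + \sum_{j=1}^{p} b_j F_j$ and $G_k = c_k F_0 + \sum_{j=1}^{p} A_{jk} F_j$ for $k = 1, \dots, n$.

Therefore $\twonorm{A_i \x + \vec b_i} \;\leq\; \vec c_i\trans \x + d_i$ is an SDP constraint. In block form, the $i\th$ constraint reads

$$ F^{(i)}(\x) = \beginmatrix 
(\vec c_i\trans \x + d_i) I & A_i \x + \vec b_i \\ (A_i \x + \vec b_i)\trans & \vec c_i\trans \x + d_i
\endmatrix \succeq 0 $$

A block-diagonal matrix is positive semidefinite exactly when each of its blocks is, so the $m$ constraints combine into the single LMI $F(\x) \succeq 0$, where $F(\x)$ is the block-diagonal matrix with blocks $F^{(1)}(\x), \dots, F^{(m)}(\x)$. 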
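To illustrate the regrouping into a constant matrix plus one coefficient matrix per variable, the sketch below builds the matrices $G_0, G_1, \dots, G_n$ for a single made-up constraint and confirms that $G_0 + \sum_k x_k G_k$ matches the block matrix with $t = \vec c\trans \x + d$ on the diagonal and $\y = A\x + \vec b$ in the last row and column. The data and helper names are hypothetical.

```python
def arrow(t, y):
    """Return the (p+1) x (p+1) matrix [[t I, y], [y^T, t]] as nested lists."""
    p = len(y)
    M = [[0.0] * (p + 1) for _ in range(p + 1)]
    for j in range(p):
        M[j][j] = t
        M[j][p] = y[j]
        M[p][j] = y[j]
    M[p][p] = t
    return M


def socp_lmi_coefficients(A, b, c, d):
    """Return [G_0, G_1, ..., G_n]; arrow() is linear in (t, y), so each G is an arrow matrix."""
    n = len(c)
    G0 = arrow(d, b)  # constant part: t = d, y = b
    Gk = [arrow(c[k], [row[k] for row in A]) for k in range(n)]  # t = c_k, y = k-th column of A
    return [G0] + Gk


def evaluate_lmi(G, x):
    """Evaluate G_0 + sum_k x_k G_k entry by entry."""
    size = len(G[0])
    F = [row[:] for row in G[0]]
    for k, xk in enumerate(x):
        for r in range(size):
            for s in range(size):
                F[r][s] += xk * G[k + 1][r][s]
    return F


# Hypothetical data: n = 2 variables and n_i - 1 = 2 rows in A.
A = [[1.0, 2.0], [0.0, 1.0]]
b = [0.5, -1.0]
c = [1.0, 1.0]
d = 3.0
x = [0.25, -0.5]

t = sum(ck * xk for ck, xk in zip(c, x)) + d
y = [sum(a * xk for a, xk in zip(row, x)) + bj for row, bj in zip(A, b)]

direct = arrow(t, y)
assembled = evaluate_lmi(socp_lmi_coefficients(A, b, c, d), x)
size = len(direct)
max_gap = max(abs(direct[r][s] - assembled[r][s])
              for r in range(size) for s in range(size))
print(max_gap < 1e-12)
```

Because the block matrix depends linearly on $(t, \y)$, and $(t, \y)$ depends affinely on $\x$, each coefficient matrix can be read off by evaluating the same block pattern on the corresponding column of the data.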

## (c) [30 points] GP as convex optimization problem 

The standard form of GP, given by

$$
\renewcommand \exp[1]{\text{exp}(#1)}
\minimize{\x \in\positivereals^n}f_{0}(x)\\
\text{subject to}\quad f_{i}(x) \leq 1, \quad i=1,...,m, \\
\qquad\qquad h_{j}(x) = 1, \quad j=1,...,p,$$

where $f_{0}, f_{1}, ..., f_{m}$ are posynomials, and $h_{1}, ..., h_{p}$ are monomials, does not look like a convex optimization problem. But we can transform it to a convex optimization problem by the logarithmic change of variables $y_{i} = \log x_{i}$ (so $x_{i} = \exp{y_i}$), followed by applying $\log(\cdot)$ to the objective function and to both sides of the constraints:

$$ \minimize{y\in\realsn} \log(f_{0}(\exp{y}))\\
\subjectto \log(f_{i}(\exp{y})) \leq 0, \quad i=1,...,m, \\
\qquad\qquad \log(h_{j}(\exp{\y})) = 0, \quad j=1,...,p,$$

where $\exp{y}$ denotes elementwise exponential of the vector $y\in\mathbb{R}^{n}$. Clearly, the above and the original problems are equivalent. 

### (c.1) (10 points) 

To understand why the transformed problem is convex, consider the simple case: $m=p=1$, and that

$$f_{0}(\x) =  \sum_{k=1}^{K_{0}}\alpha_{k}x_{1}^{\beta_{1,k}}...x_{n}^{\beta_{n,k}}, \qquad
f_{1}(x) =
 \sum_{k=1}^{K_{1}}a_{k}x_{1}^{b_{1,k}}...x_{n}^{b_{n,k}}, \qquad 
h_{1}(x) = c x_{1}^{d_{1}} x_{2}^{d_{2}} ... x_{n}^{d_{n}}, \qquad 
\alpha_{k}, a_{k}, c> 0.$$

Prove that the transformed problem has the form 

$$\minimize{y\in\realsn} \log\left(\displaystyle\sum_{k=1}^{K_{0}}\exp{\vec p_{k}\trans \y + q_{k}}\right)\\
\subjectto
\log\left(\displaystyle\sum_{k=1}^{K_{1}}\exp{\vec r_{k}\trans \y + s_{k}}\right) \leq 0,\\
\quad \u^{\top}\y = t.$$

In other words, derive the transformed problem data $p_{k}, q_{k}, r_{k}, s_{k}, u, t$ as function of the original problem data: $\alpha_{k}, \beta_{1,k}, ..., \beta_{n,k}, a_{k}, b_{1,k}, ..., b_{n,k}, c, d_{1}, ..., d_{n}$.

#### Answer

Let's begin by examining a single term of the summation. For a fixed $k$, define 

$$
g(\x) = \alpha_{k}x_{1}^{\beta_{1,k}}...x_{n}^{\beta_{n,k}}
$$

This gives us

$$
\begin{align}
g(\exp{\y}) 
&= \alpha_{k}\exp{y_{1}}^{\beta_{1,k}}...\exp{y_{n}}^{\beta_{n,k}} \\
&= \exp{q_k}\exp{\beta_{1,k} y_1}...\exp{\beta_{n,k} y_{n}} &&( \text{let } \alpha_k = e^{q_k}, \text{ i.e. } q_k = \log \alpha_k) \\
&= \exp{q_k + \beta_{1,k} y_1 + \cdots + \beta_{n,k} y_n}
\end{align}
$$

We can now take $\vec p_k\trans = \beginmatrix \beta_{1, k} & \cdots & \beta_{n, k} \endmatrix$, so the above expression reduces to 

$$g(\exp{\y}) = \exp{q_k + \vec p_k \trans \y}$$

Now, we plug into the summation, with one such term for each $k$.

$$ f_{0}(\exp{\y}) = \sum_{k=1}^{K_{0}}g(\exp{\y}) = \sum_{k=1}^{K_{0}} \exp{q_k + \vec p_k \trans \y}$$

Therefore 

$$\log(f_0(\exp{\y})) = \log\left(\sum_{k=1}^{K_{0}} \exp{q_k + \vec p_k \trans \y}\right) $$

By the same process, we get

$$\log(f_{1}(\exp{\y})) =
 \log\left(\sum_{k=1}^{K_{1}}\exp{\vec r_k\trans \y + s_{k}} \right)
 $$

if we let $a_k = \exp{s_k}$ (i.e. $s_k = \log a_k$) and $\vec r_k \trans = \begin{bmatrix} b_{1,k} & \cdots & b_{n,k}\end{bmatrix}$.

Therefore the inequality $f_1(\x) \leq 1$ transforms into 

$$ \beginalign 
&\log(f_1(\exp{\y})) \leq \log(1) \\
&\log\left(\sum_{k=1}^{K_{1}}\exp{\vec r_k\trans \y + s_{k}} \right) \leq 0
\endalign
$$ 

Now, for the equality conditions:

$$\beginalign
h_1(x) &= c x_1^{d_1}... x_n^{d_n} \\
h_1(\exp{\y}) &= c\, \exp{y_1}^{d_1}... \exp{y_n}^{d_n} \\
&= \exp{-t}\exp{d_1 y_1}... \exp{d_n y_n} &&( \text{let } c = e^{-t}) \\
&= \exp{d_1 y_1  + ... + d_n y_n - t} \\
&= \exp{\u\trans \y - t} \\
\endalign$$

With $\u = \beginmatrix d_1 & \cdots & d_n \endmatrix\trans$ and $t = -\log c$, this gives us

$$\log(h_1(\exp{\y})) = \u\trans \y - t $$

Therefore $h_1(\x) = 1 $ transforms into

$$
\beginalign
\log(h_1(\exp{\y}))  &= \log(1) \\ 
\u\trans \y - t &= 0 \\
\u\trans \y &= t
\endalign
$$
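The derived data can be checked numerically on a small example. The sketch below evaluates a made-up two-term posynomial directly and through the log-sum-exp form with $\vec p_k$ equal to the exponent vector of term $k$ and $q_k = \log \alpha_k$, then does the same for a monomial with $\u = (d_1, \dots, d_n)$ and $t = -\log c$. All numeric values and helper names are assumptions for illustration; note that `exponents[k][i]` stores the exponent of $x_i$ in term $k$.

```python
import math


def posynomial(coeffs, exponents, x):
    """Evaluate sum_k coeffs[k] * prod_i x[i] ** exponents[k][i] for x > 0."""
    return sum(a * math.prod(xi ** e for xi, e in zip(x, row))
               for a, row in zip(coeffs, exponents))


def log_sum_exp_form(coeffs, exponents, y):
    """Evaluate log(sum_k exp(p_k^T y + q_k)) with p_k = exponents[k], q_k = log(coeffs[k])."""
    terms = [sum(e * yi for e, yi in zip(row, y)) + math.log(a)
             for a, row in zip(coeffs, exponents)]
    return math.log(sum(math.exp(v) for v in terms))


# Hypothetical posynomial f_0(x) = 2 x_1^2 x_2^(-1) + 0.5 x_1^(0.5) x_2^3.
alpha = [2.0, 0.5]
beta = [[2.0, -1.0], [0.5, 3.0]]
y = [0.3, -0.7]
x = [math.exp(yi) for yi in y]  # change of variables x_i = exp(y_i)

lhs = math.log(posynomial(alpha, beta, x))
rhs = log_sum_exp_form(alpha, beta, y)
print(abs(lhs - rhs) < 1e-9)

# Hypothetical monomial h_1(x) = 3 x_1 x_2^(-2), so u = (1, -2) and t = -log(3).
c, d_vec = 3.0, [1.0, -2.0]
t = -math.log(c)
log_h = math.log(c * math.prod(xi ** di for xi, di in zip(x, d_vec)))
affine = sum(di * yi for di, yi in zip(d_vec, y)) - t
print(abs(log_h - affine) < 1e-9)
```

Both comparisons agree to rounding error, which matches the claim that the posynomial becomes a log-sum-exp of affine functions and the monomial becomes an affine function of $\y$.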

### (c.2) (20 points)
Prove that the transformed problem derived in part (c.1) is indeed a convex optimization problem.

(Hint: First show that log-sum-exp is a convex function using the second-order condition. Then think about operations that preserve function convexity.)

$\newcommand {\z}{\vec z}$
Let $g(\x)$ be the log-sum-exp function

$$g(\x) = \log(e^{x_1} + \cdots + e^{x_n})$$

To show that $g$ is convex, we find the Hessian $\hessian g(\x)$. To simplify, we will use $z_i = e^{x_i}$. First, we find the first partial derivative with respect to an arbitrary element of $\x$:

$\newcommand {\pp}[2]{\frac{\partial #1}{\partial #2}}
\newcommand {\secondpp}[2]{\frac{\partial #1}{\partial #2}}$
$$ \pp{g}{x_j} = \frac{z_j}{\onevec\trans\z}$$

Then we take a second partial derivative with another arbitrary element of $\x$:

$$\beginalign
\pp{^2g}{x_i\partial x_j} 
&= \pp{}{x_i}\left(\frac{z_j}{\onevec\trans \z}\right) \\
&= \pp{}{x_i}\left(\frac{1}{\onevec\trans\z}\right)z_j + \frac{1}{\onevec\trans\z}\pp{z_j}{x_i} \\
&= -\frac{z_i z_j}{(\onevec\trans\z)^2} + \frac{\delta_{ij} z_i}{\onevec\trans\z} \\
&= \frac{\delta_{ij} z_i}{\onevec\trans\z} - \frac{z_i z_j}{(\onevec\trans\z)^2} \\
&= \left(\frac{\text{diag}(\z)}{\onevec\trans \z} -  \frac{\z \z\trans}{(\onevec\trans\z)^2}\right)_{ij}
\endalign
$$

Therefore 

$$\hessian g(\x) = \frac{\text{diag}(\z)}{\onevec\trans \z} -  \frac{\z \z\trans}{(\onevec\trans\z)^2} $$

Now we must show that $\hessian g \succeq 0$. 

$$\beginalign 
\v\trans \hessian g(\x) \v 
&= \frac
    {\v\trans \text{diag}(\z)\v }
    {\onevec\trans \z}
- \frac
    {\v\trans \z \z\trans \v}
    {(\onevec\trans\z)^2} \\
&=
\frac {1}{(\onevec\trans \z)^2} \left(
\left(\onevec\trans \z \right) \left(\v\trans \text{diag}(\z)\v \right)
-
\v\trans \z \z\trans \v 
\right) \\
&= 
\frac{1}{(\onevec\trans \z)^2} \left(
        \left(\sum_{i=1}^{n} z_i \right)
        \left(\sum_{i=1}^{n}v_i^2 z_i \right) 
- \sum_{i=1}^{n}\sum_{j=1}^{n}v_i z_i v_j z_j
    \right) \\
&= 
\frac{1}{(\onevec\trans \z)^2} \left(
        \left(\sum_{i=1}^{n} z_i \right)
        \left(\sum_{i=1}^{n}v_i^2 z_i \right) 
- \left(\sum_{i=1}^{n}v_i z_i \right)^2
    \right) \\
&= 
\frac{1}{(\onevec\trans \z)^2} 
    \left(
          (\vec a \trans \vec a)(\vec b \trans \vec b) 
        - (\vec a \trans \vec b)^2
    \right)
\endalign$$

Where $\vec a$ and $\vec b$ are vectors with components 

$$a_i = \sqrt{z_i}, \quad b_i = v_i
\sqrt{z_i}$$

By the Cauchy-Schwarz inequality, $(\vec a \trans \vec a)(\vec b \trans \vec b) \geq (\vec a \trans \vec b)^2$, so 

$$ \v\trans \hessian g(\x)\v \geq 0, \quad \forall \v \in \realsn,$$

therefore $\hessian g(\x)$ is positive semi-definite and $g(\x)$ is convex. 
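The Hessian formula and the sign of its quadratic form can be spot-checked numerically. The sketch below is only a random test under assumed dimensions and sampling ranges, not a proof: it builds $\hessian g(\x)$ from the closed form above and evaluates $\v\trans \hessian g(\x) \v$ at random points.

```python
import math
import random


def lse_hessian(x):
    """Hessian of g(x) = log(sum_i exp(x_i)): diag(z)/s - z z^T / s^2, z_i = exp(x_i), s = sum(z)."""
    z = [math.exp(xi) for xi in x]
    s = sum(z)
    n = len(x)
    return [[(z[i] / s if i == j else 0.0) - z[i] * z[j] / s ** 2
             for j in range(n)]
            for i in range(n)]


def quadratic_form(H, v):
    """Compute v^T H v for a square matrix H given as nested lists."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))


random.seed(0)  # fixed seed so the run is repeatable
worst = float("inf")
for _ in range(1000):
    x = [random.uniform(-3.0, 3.0) for _ in range(4)]
    v = [random.uniform(-1.0, 1.0) for _ in range(4)]
    worst = min(worst, quadratic_form(lse_hessian(x), v))

# Allow a tiny negative value caused by floating-point rounding.
print(worst >= -1e-12)

# The all-ones direction gives zero, so the Hessian is PSD but not positive definite.
ones_value = quadratic_form(lse_hessian([0.1, -0.4, 2.0]), [1.0, 1.0, 1.0])
print(abs(ones_value) < 1e-12)
```

The all-ones direction corresponds to equality in the Cauchy-Schwarz step, since then $\vec b = \vec a$.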

Now we can see that the transformed objective function is in the form 

$$\log\left(\displaystyle\sum_{k=1}^{K_{0}}\exp{\vec p_{k}\trans \y + q_{k}}\right)$$

Let us take $x_k = \vec p_k\trans\y + q_k$ for $k = 1, \dots, K_0$. We can put this into matrix form, giving us an affine function of $\y$:

$$ \x = \beginmatrix \vec p_1\trans \\ \vdots \\ \vec p_{K_0}\trans \endmatrix \y+ \vec q = P \y + \vec q$$

Composing a convex function with an affine map preserves convexity, so, with $g$ taken as the log-sum-exp function of $K_0$ arguments, 

$$ g(P\y + \vec q) = \log(e^{\vec p_1\trans \y + q_1} + \cdots + e^{\vec p_{K_0}\trans \y + q_{K_0}} )$$

is convex, thus proving the convexity of the transformed objective function $\log(f_0(\exp{\y}))$. The function in the inequality constraint $\log(f_1(\exp{\y})) \leq 0 $ has the identical form, so it is also convex, and its $0$-sublevel set is a convex set. 

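The transformed equality constraint $\u\trans \y = t$ is affine, so the set of points satisfying it is convex. The transformed problem therefore minimizes a convex objective subject to a convex inequality constraint and an affine equality constraint, which makes it a convex optimization problem. 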