# 3 Optimization Problems

An optimization problem is in standard form if it is written as
$$\min_x\{f_0(x):\quad f_i(x)\leqslant 0,\ i=1,2,\dotsc,m;\quad h_j(x)=0,\ j=1,2,\dotsc ,p\}$$
where we seek an $x$ that minimizes the objective function $f_0(x)$ subject to $m$ inequality constraints and $p$ equality constraints.

## Convex Optimization Problem

When $f_0$ and all $f_i$ are convex functions and all $h_j$ are affine, we call the optimization problem a convex optimization problem.

**Theorem** Any locally optimal solution of a convex problem is also globally optimal.

Proof: Suppose $x_0$ is a locally optimal solution that is not globally optimal, so there is a feasible point $x_*$ with $f_0(x_*) < f_0(x_0)$. Consider the point $x = x_0 + \alpha (x_*-x_0)$ for $\alpha \in (0,1]$. By convexity of the $f_i$,
$$f_i(x) = f_i(x_0 + \alpha(x_* - x_0)) \leqslant \alpha f_i(x_*) + (1 - \alpha) f_i(x_0)\leqslant 0,
\quad i = 1,2,\dotsc,m$$
and, since the $h_j$ are affine,
$$h_j(x) = \alpha h_j(x_*) + (1 - \alpha) h_j(x_0) = 0,\quad j = 1,2,\dotsc,p.$$
Hence $x$ is feasible. However, $f_0(x) = f_0(x_0 + \alpha(x_* - x_0)) \leqslant \alpha f_0(x_*) + (1 - \alpha) f_0(x_0) < f_0(x_0)$. Letting $\alpha \rightarrow 0_+$, the point $x$ falls into any neighborhood of $x_0$, which contradicts the assumption that $x_0$ is locally optimal.
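
As a small numerical illustration of the segment argument in the proof, the sketch below uses a hypothetical one-dimensional convex objective $f_0(x) = (x-2)^2$ with no constraints, so feasibility is automatic. The function, the points `x0` and `x_star`, and the sampled values of $\alpha$ are assumptions chosen only for illustration.

```python
def f0(x):
    # Hypothetical convex objective, used only for illustration.
    return (x - 2.0) ** 2

x0 = 0.0       # a point that is not globally optimal
x_star = 2.0   # the global minimizer of f0

f_x0 = f0(x0)
values = []
for alpha in (1e-6, 1e-3, 0.1, 0.5, 1.0):
    x = x0 + alpha * (x_star - x0)
    # Convexity bound used in the proof.
    bound = alpha * f0(x_star) + (1 - alpha) * f0(x0)
    values.append((alpha, f0(x), bound))
    print(f"alpha={alpha:g}  f0(x)={f0(x):.6f}  bound={bound:.6f}  f0(x0)={f_x0:.6f}")
```

Every sampled point on the segment, however close to `x0`, has a strictly smaller objective value than `x0`, so `x0` cannot be a local minimizer.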

## Linear Programming (LP)

The following optimization problem is called a linear program in standard form. Every linear program is convex.

$$\min_x \{c^Tx:\quad Ax = b,\quad x\geqslant 0 \}$$

### General Linear Programming

There are many general forms of linear programming which can be converted to the standard form. The following is an example.

$$\min_x \{c^Tx:\quad Ax = b,\quad Gx\leqslant h \}$$

It can be rewritten as 
$$\min_{x^+,x^-,s} \{c^Tx^+ - c^T x^- :\quad Ax^+ - Ax^- =b,\quad Gx^+ - Gx^- + s = h,\quad x^+\geqslant 0,\quad x^-\geqslant 0,\quad s\geqslant 0\}.$$

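Proof: Map $x\mapsto (x^+,x^-,s)$ where $x_i^+ = x_i \mathbb I_{x_i>0}$ and $x_i^- = -x_i\mathbb I_{x_i<0}$ denote the positive and negative parts of $x$, and $s = h - Gx$ is the slack. Then $x^+\geqslant 0$, $x^-\geqslant 0$, $x = x^+ - x^-$, and $s\geqslant 0$ exactly when $Gx\leqslant h$, so every feasible $x$ yields a feasible $(x^+,x^-,s)$ with the same objective value. Conversely, any feasible $(x^+,x^-,s)$ gives a feasible $x = x^+ - x^-$ with the same objective value. Hence the two problems have the same optimal value and their minimizers correspond.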
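
The conversion can be checked numerically on a small instance. The following is a minimal sketch in plain Python; the data `c`, `A`, `b`, `G`, `h` and the point `x` are made-up values, and `split_parts` is a hypothetical helper name. It splits $x$ into nonnegative parts with $x = x^+ - x^-$, forms the slack $s = h - Gx$, and confirms that the constraints and the objective value carry over.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def split_parts(x):
    # Positive and negative parts, both nonnegative, with x = x_plus - x_minus.
    x_plus = [xi if xi > 0 else 0.0 for xi in x]
    x_minus = [-xi if xi < 0 else 0.0 for xi in x]
    return x_plus, x_minus

# Made-up problem data for the general form: min c^T x, Ax = b, Gx <= h.
c = [1.0, -2.0, 0.5]
A = [[1.0, 1.0, 1.0]]
b = [-0.5]
G = [[1.0, 1.0, 0.0], [0.0, -1.0, 2.0]]
h = [4.0, 3.0]
x = [1.5, -2.0, 0.0]   # a feasible point of the general form

x_plus, x_minus = split_parts(x)
slack = [hi - gi for hi, gi in zip(h, matvec(G, x))]

# Quantities of the standard form evaluated at (x_plus, x_minus, slack).
x_back = [p - m for p, m in zip(x_plus, x_minus)]
obj_general = dot(c, x)
obj_standard = dot(c, x_plus) - dot(c, x_minus)
eq_lhs = [p - m for p, m in zip(matvec(A, x_plus), matvec(A, x_minus))]
ineq_lhs = [p - m + s for p, m, s in
            zip(matvec(G, x_plus), matvec(G, x_minus), slack)]

print(x_plus, x_minus, slack)
print(obj_general, obj_standard)
```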

### Linear Minimax 

Given some weights and biases $c_i\in \mathbb R^n$ and $d_i\in \mathbb R$, the following linear minimax problem 
$$\min_x \left\{\max_i \{c_i^Tx + d_i\}:\quad Ax=b,\quad Gx\leqslant h\right\}$$
can be converted to a linear program by introducing an epigraph variable $t$ with $c_i^Tx + d_i\leqslant t$ for all $i$,
$$\min_{x\in \mathbb R^n,t\in \mathbb R} \{t:\quad c_i^Tx -t \leqslant -d_i,\quad Ax =b,\quad Gx\leqslant h\}.$$
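
To see why the extra variable $t$ reproduces the inner maximum, note that for a fixed $x$ the constraints $c_i^Tx + d_i\leqslant t$ hold exactly when $t\geqslant \max_i\{c_i^Tx + d_i\}$. The sketch below checks this on made-up data; the rows of `C`, the offsets `d`, and the helper names are assumptions for illustration only.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Made-up weights c_i (rows of C) and biases d_i.
C = [[1.0, 0.0], [-1.0, 2.0], [0.0, -1.0]]
d = [0.0, 1.0, 3.0]

def minimax_objective(x):
    # Inner maximum of the original problem.
    return max(dot(ci, x) + di for ci, di in zip(C, d))

def epigraph_feasible(x, t, tol=1e-12):
    # Constraints of the LP reformulation: c_i^T x - t <= -d_i for all i.
    return all(dot(ci, x) - t <= -di + tol for ci, di in zip(C, d))

x = [1.0, 2.0]
t_min = minimax_objective(x)   # smallest t that is feasible for this x

print(t_min, epigraph_feasible(x, t_min), epigraph_feasible(x, t_min - 0.1))
```

Minimizing $t$ therefore pushes it down onto the maximum, which is why the two problems share their minimizers in $x$.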

### Linear-Fractional Programming (FP)

Linear-fractional programming can also be converted to linear programming. It has the form 
$$\min_{e^Tx+f>0} \left\{\frac{c^Tx + d}{e^Tx + f}:\quad Ax=b,\quad Gx\leqslant h\right\}$$
which is equivalent to 
$$\min_{y\in \mathbb R^n,z\in \mathbb R}\{c^Ty+dz:\quad Ay = bz,\quad 
e^Ty+fz = 1,\quad Gy\leqslant hz,\quad -z\leqslant 0\}.$$

Proof: This follows mainly from homogeneity. Construct the mapping
$x\mapsto (y,z) = \left(\frac{1}{e^Tx+f}x,\frac{1}{e^Tx+f}\right)\in \mathbb R^n\times \mathbb R$, so that $y = xz$ and $z>0$. Then 
it is simple to verify that $(y,z)$ is feasible for the second problem, in particular $e^Ty + fz= 1$, and that
$$c^Ty + dz = \frac{c^Tx + d}{e^Tx + f}.$$

Therefore whenever $\frac{c^Tx_1 + d}{e^Tx_1 + f}\leqslant \frac{c^Tx_2 + d}{e^Tx_2 + f}$
we have $c^Ty_1 + dz_1\leqslant c^Ty_2 +dz_2$, indicating that any $(y,z)$ induced by a minimizer $x$ of the original linear-fractional problem also minimizes the second form. Conversely, given a minimizer $(y,z)$ of the second form with $z>0$, 
one obtains a minimizer of the first form by simply setting $(y,z)\mapsto x = \frac{y}{z}$.
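
The change of variables (often called the Charnes-Cooper transformation) is easy to verify numerically. The sketch below uses made-up values for `c`, `d`, `e`, `f` and a point `x` with $e^Tx+f>0$; it checks the normalization $e^Ty+fz=1$, the agreement of the two objectives, and the recovery $x = y/z$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Made-up data for the ratio (c^T x + d) / (e^T x + f).
c, d = [2.0, 1.0], 1.0
e, f = [1.0, 3.0], 2.0
x = [1.0, 1.0]

denom = dot(e, x) + f            # must be positive for the mapping to apply
assert denom > 0

# Forward map x -> (y, z).
z = 1.0 / denom
y = [xi * z for xi in x]

ratio = (dot(c, x) + d) / denom  # original objective
linear = dot(c, y) + d * z       # objective of the linear program
normalization = dot(e, y) + f * z

# Inverse map (y, z) -> x, valid when z > 0.
x_back = [yi / z for yi in y]

print(ratio, linear, normalization, x_back)
```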

## Quadratic Programming (QP)

A typical quadratic program is the following optimization problem
$$\min_x \left\{\frac12 x^TPx + q^Tx +r:\quad Gx\leqslant h,\quad Ax=b\right\}$$
where $P$ is a given positive semidefinite matrix. Since the Hessian of the objective is $P\succeq 0$ and all constraints are affine, QP is a convex optimization problem.

### Ordinary Least Squares (OLS)

A least-squares problem is a quadratic program, since $\Vert Ax - b\Vert_2^2 = x^TA^TAx - 2b^TAx + b^Tb$ with $A^TA\succeq 0$.

$$\min \{\Vert Ax - b\Vert_2^2:\quad l\leqslant Cx+d\leqslant u\}$$

If there are no linear constraints on $x$, an analytical solution is given by $x = A^\dagger b$, where $A^\dagger$ is the Moore-Penrose pseudoinverse of $A$.
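
For the unconstrained case, the sketch below fits a line to four made-up data points. It assumes that $A$ has full column rank, in which case the pseudoinverse solution coincides with the solution of the normal equations $A^TAx = A^Tb$; solving the $2\times 2$ system by Cramer's rule is a choice made only to keep the example dependency-free.

```python
# Made-up data: fit b ~ x[0] + x[1] * t.
t = [0.0, 1.0, 2.0, 3.0]
b = [1.1, 2.9, 5.2, 6.8]
A = [[1.0, ti] for ti in t]

# Normal equations (A^T A) x = A^T b, valid here because A has full column rank.
ata = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the 2x2 system with Cramer's rule.
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
x = [(ata[1][1] * atb[0] - ata[0][1] * atb[1]) / det,
     (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det]

# At the minimizer the residual is orthogonal to the columns of A.
residual = [row[0] * x[0] + row[1] * x[1] - bi for row, bi in zip(A, b)]
gradient = [sum(row[i] * ri for row, ri in zip(A, residual)) for i in range(2)]

print(x, gradient)
```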

### Support Vector Machine (SVM)

Suppose there are two types of points in $\mathbb R^n$. Type $A$ includes $x_1,x_2,\dotsc ,x_m\in \mathbb R^n$ and 
type $B$ includes $x_{m+1},\dotsc,x_{m+p}\in \mathbb R^n$. Let $b_i = 1$ for $i = 1,2,\dotsc,m$ while $b_i = -1$ for $i = m+1,\dotsc,m+p$. We wish to find a hyperplane 
$$H(w,\beta) = \{x\in \mathbb R^n:\quad w^Tx+\beta = 0\}$$
with minimum weight norm that separates the two types with a margin, i.e.
$$\min_{w,\beta} \left\{\frac12\Vert w\Vert_2^2:\quad b_i(w^Tx_i+\beta)\geqslant 1,\ i=1,\dotsc,m+p\right\}.$$
Here we assume that such a hyperplane exists, in which case the two sets of points are called **linearly separable**.

<br>

In general, if the two sets of points are NOT linearly separable, we might introduce a regularization factor $\mu>0$ and solve
$$\min \left\{\frac 12\Vert w\Vert_2^2+\mu \sum_{i=1}^{m+p}\max \{1 - b_i(w^Tx_i + \beta), 0\}\right\}.$$

It can be converted to a standard QP by introducing slack variables $\xi_i$, and the problem becomes
$$\min_{w,\beta,\xi} \left\{\frac 12\Vert w\Vert_2^2 + \mu \sum_{i=1}^{m+p}\xi_i:\quad \xi_i \geqslant 1 - b_i(w^Tx_i +\beta),
\quad \xi_i \geqslant 0,\ i=1,\dotsc,m+p \right\}.$$
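
The two formulations agree because, for fixed $(w,\beta)$, the smallest slack satisfying both constraints is $\xi_i = \max\{1 - b_i(w^Tx_i+\beta), 0\}$. The sketch below evaluates both objectives at one candidate $(w,\beta)$; the points, labels, `w`, `beta`, and `mu` are made-up values, and no optimization is performed.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Made-up data: two points of each type, labels b_i in {+1, -1}.
points = [[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [0.5, 0.0]]
labels = [1, 1, -1, -1]

# A candidate hyperplane and penalty factor (not optimized).
w, beta, mu = [0.5, 0.5], -0.5, 1.0

margins = [bi * (dot(w, xi) + beta) for xi, bi in zip(points, labels)]

# Hinge-loss form of the objective.
hinge_objective = 0.5 * dot(w, w) + mu * sum(max(1.0 - m, 0.0) for m in margins)

# Slack form: the smallest feasible slack for each point.
slack = [max(1.0 - m, 0.0) for m in margins]
slack_objective = 0.5 * dot(w, w) + mu * sum(slack)

print(margins, slack, hinge_objective, slack_objective)
```

Only the last point violates the margin here, so it is the only one with a positive slack.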

### Quadratically Constrained Quadratic Programming (QCQP)
A quadratically constrained quadratic program is a quadratic optimization problem whose inequality constraints are also quadratic, 
$$\min \{\frac12 x^TP_0x+q_0^Tx + r_0:\quad 
\frac12 x^TP_ix + q_i^Tx + r_i\leqslant 0,\ i=1,\dotsc,m,\quad Ax = b\}$$
where $P_0$ is positive semidefinite and the other $P_i$, $i = 1,2,\dotsc,m$, are positive definite.

### Second-order Cone Programming (SOCP)
$$\min\{f^Tx:\quad \Vert A_i^Tx +b_i\Vert_2\leqslant c_i^Tx+d_i,\quad Fx = g\}$$

SOCP is a generalization of QCQP. 

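Proof: Assume in addition that $P_0$ is positive definite, so that $P_0^{-1}$ exists. Introduce a slack variable $t$ and complete the square in each quadratic; the QCQP becomes 
$$\min_{x,t}\left\{t:\quad \frac12 \Vert P_0^\frac 12 x + P_0^{-\frac 12}q_0\Vert_2^2\leqslant t+\frac 12q_0^TP_0^{-1}q_0-r_0,
\quad \frac12 \Vert P_i^\frac 12 x + P_i^{-\frac 12}q_i\Vert_2^2\leqslant \frac 12q_i^TP_i^{-1}q_i-r_i, \quad Ax =b\right\}.$$
Each inequality has the form $\Vert u\Vert_2^2\leqslant 2s$ with $u$ and $s$ affine in $(x,t)$, which is equivalent to the second-order cone constraint
$$\left\Vert \begin{pmatrix} u \\ s - \frac12\end{pmatrix}\right\Vert_2\leqslant s+\frac12.$$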

### Semidefinite Programming (SDP)
Let $P_0,P_1,\dotsc,P_n$ be symmetric matrices. An SDP has the form
$$\min\left\{c^Tx:\quad  P_0+\sum_{i=1}^n x_i P_i \preceq 0,\quad Ax = b\right\}.$$

### Robust Linear Programming

For a linear program in practice,
$$\min_x \{c^Tx:\quad a_i^Tx \leqslant b_i\}$$
there are cases where the $a_i$ are noisy or are random variables. There are two methods to handle the uncertainty.

#### Deterministic Linear Programming

We require that the constraints $a_i^Tx\leqslant b_i$ hold for every point in a neighborhood $\mathcal E_i$ of $a_i$, that is 
$$\min \{c^Tx:\quad \hat a_i^Tx\leqslant b_i, \ \forall \hat a_i\in \mathcal E_i\}.$$

In particular, suppose $\mathcal E_i$ is an ellipsoid centered at $a_i$, $\ \mathcal E_i = \{\hat a_i = a_i + P_iu:\ \Vert u\Vert_2\leqslant 1\}$, where $P_i$ is positive definite. By the Cauchy-Schwarz inequality, 
$$\max _{\hat a_i \in \mathcal E_i} \hat a_i^Tx=\max _{\Vert u\Vert_2 \leqslant 1} \{a_i^Tx + u^TP_ix\}= 
a_i^Tx + \Vert P_ix\Vert_2$$
and hence the problem becomes
$$\min \{c^Tx:\quad a_i^Tx + \Vert P_ix\Vert_2\leqslant b_i\}.$$
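
The worst-case value over the ellipsoid can be checked directly in two dimensions. The sketch below uses a made-up center `a`, a made-up symmetric positive definite `P`, and a fixed `x`. It compares the closed form $a^Tx + \Vert Px\Vert_2$ with the value attained at $u = Px/\Vert Px\Vert_2$ and with a grid of unit vectors $u$.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

# Made-up data: ellipsoid center a, symmetric positive definite P, fixed x.
a = [1.0, 2.0]
P = [[2.0, 0.0], [0.0, 1.0]]
x = [1.0, -1.0]

Px = matvec(P, x)
norm_Px = math.sqrt(dot(Px, Px))
worst_case = dot(a, x) + norm_Px           # closed form a^T x + ||P x||_2

# The maximizing direction u = P x / ||P x||_2 attains the closed form.
u_star = [v / norm_Px for v in Px]
a_star = [ai + pi for ai, pi in zip(a, matvec(P, u_star))]
attained = dot(a_star, x)

# No unit vector on a one-degree grid does better.
def value_at(theta):
    u = [math.cos(theta), math.sin(theta)]
    a_hat = [ai + pi for ai, pi in zip(a, matvec(P, u))]
    return dot(a_hat, x)

sampled_max = max(value_at(2 * math.pi * k / 360) for k in range(360))

print(worst_case, attained, sampled_max)
```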

#### Stochastic Linear Programming

We require that each constraint holds with probability at least $\eta_i$,
$$\mathbb P(\hat a_i^Tx\leqslant b_i)\geqslant \eta_i.$$

In particular, we assume $\hat a_i \sim N(a_i, \Sigma_i)$. Then for a fixed $x$ with $x^T\Sigma_ix>0$
 we have $\hat a_i^Tx - b_i \sim N(a_i^Tx - b_i,x^T\Sigma_i x)$ and 
$$\mathbb P\left(\hat a_i^Tx - b_i\leqslant 0\right) = \Phi\left(\frac{b_i - a_i^Tx}{\sqrt{x^T\Sigma_i x}}\right).$$

As we require $\mathbb P\left(\hat a_i^Tx - b_i\leqslant 0\right)\geqslant \eta_i = \Phi(\Phi^{-1}(\eta_i))$ and $\Phi$ is increasing, the problem becomes
$$\min_x \left\{c^Tx:\quad a_i^Tx + \Phi^{-1}(\eta_i)\sqrt{x^T\Sigma_ix}\leqslant b_i\right\}.$$
When $\eta_i\geqslant \frac12$ we have $\Phi^{-1}(\eta_i)\geqslant 0$, and each constraint is a second-order cone constraint.
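
The equivalence between the probability requirement and its deterministic counterpart can be checked with the standard library's `statistics.NormalDist`. The sketch below uses the Gaussian identity $\mathbb P(\hat a^Tx\leqslant b) = \Phi\big((b - a^Tx)/\sqrt{x^T\Sigma x}\big)$ and compares it with the condition $a^Tx + \Phi^{-1}(\eta)\sqrt{x^T\Sigma x}\leqslant b$. The mean `a`, covariance `Sigma`, bound `b`, level `eta`, and the two test points are made-up values.

```python
import math
from statistics import NormalDist

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

# Made-up data for a single constraint a_hat^T x <= b with a_hat ~ N(a, Sigma).
a = [1.0, 1.0]
Sigma = [[0.04, 0.0], [0.0, 0.09]]
b = 3.0
eta = 0.95

std_normal = NormalDist()   # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def prob_satisfied(x):
    mean = dot(a, x)
    std = math.sqrt(dot(x, matvec(Sigma, x)))
    return std_normal.cdf((b - mean) / std)

def deterministic_lhs(x):
    std = math.sqrt(dot(x, matvec(Sigma, x)))
    return dot(a, x) + std_normal.inv_cdf(eta) * std

x_ok = [1.0, 1.0]    # comfortably inside the constraint
x_bad = [1.4, 1.4]   # mean satisfies the constraint, but not with 95% probability

for x in (x_ok, x_bad):
    print(x, prob_satisfied(x) >= eta, deterministic_lhs(x) <= b)
```

The second point shows why the nominal constraint $a^Tx\leqslant b$ alone is not enough: its mean is feasible, yet the probability of satisfaction falls below $\eta$.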