# Convex Optimization
```{math}

\newcommand\realnumber{\mathbb{R}}
\newcommand\vbx{\vb{x}}
```

> The great watershed in optimization isn't between linearity and non-linearity, but convexity and non-convexity. - R. Tyrrell Rockafellar

## Convex Definitions 

````{prf:definition} Convex
:label: convex_combination
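A `convex combination` of two points $x_1, x_2$ is any point of the form

$$a x_1 + (1-a) x_2, \quad 0\le a\le 1$$

A set $C$ is `convex` if it contains every convex combination of its points.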
````

## Convex functions

````{prf:definition} Convex function
:label: convex_function
A function $f:R^n\to R$ is `convex` if its domain $D = dom f$ is a convex set and
$f(\theta x+(1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ for all $x, y\in D$ and $0\le \theta \le 1$.
$f$ is `strictly convex` if the inequality is strict whenever $x\neq y$ and $0<\theta<1$.
````
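The defining inequality can be spot-checked numerically. The sketch below uses a hypothetical helper, `violates_convexity`, that samples random pairs on an interval and reports whether any sample breaks the inequality. Passing the test is only evidence of convexity, not a proof.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return True if some sampled (x, y, theta) breaks the convexity inequality.

    Hypothetical helper: a random search on [lo, hi], so a False result
    is only evidence of convexity, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False

print(violates_convexity(math.exp, -3.0, 3.0))     # exp is convex
print(violates_convexity(math.sin, 0.0, math.pi))  # sin is concave on [0, pi]
```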
````{prf:example} Convex functions
:class: dropdown
convex:
- affine: $ax+b$ on $R$
- exponential: $e^{ax}$ on $R$, for any $a$
- powers: $x^\alpha$ on $R_{++}$, for $\alpha\ge 1$ or $\alpha\le 0$
- powers of absolute value: $|x|^p$ on $R$, for $p\ge 1$
- negative entropy: $x\log x$ on $R_{++}$

concave:
- affine: $ax+b$ on $R$
- powers: $x^\alpha$ on $R_{++}$, for $0\le \alpha \le 1$
- logarithm: $\log x$ on $R_{++}$
````
`all norms are convex`
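
This follows in one line from the triangle inequality and absolute homogeneity, which every norm satisfies. For $0\le\theta\le 1$:

$$
\|\theta x + (1-\theta) y\| \;\le\; \|\theta x\| + \|(1-\theta) y\| \;=\; \theta \|x\| + (1-\theta)\|y\|
$$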

<!-- ### Epigraph of a function -->

````{prf:definition} Epigraph of a function
:label: epigraph_function
The $\alpha$-`sublevel set` of $f:R^n\to R$ is
$C_{\alpha} = \{x\in dom f \mid f(x) \le \alpha \}$.
Sublevel sets of convex functions are convex.

The `epigraph` of $f: R^n\to R$ is

$$epi(f)=\{(x,t)\in R^{n+1} \mid x\in D, f(x)\le t\}$$
````
`if all sublevel sets of a function are convex, is the function necessarily convex?` No. For example, $f(x)=\sqrt{|x|}$ (a butterfly-like graph) has intervals as sublevel sets but is not convex.

````{prf:theorem} Epigraph convex
:label: epigraph_convex
$f$ is convex iff $epi(f)$ is a convex set.
````
````{prf:proof} 
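($\Rightarrow$) Let $(x,t), (y,s)\in epi(f)$ and $0\le\theta\le 1$. By convexity, $f(\theta x+(1-\theta)y)\le \theta f(x)+(1-\theta)f(y)\le \theta t+(1-\theta)s$, so $\theta(x,t)+(1-\theta)(y,s)\in epi(f)$.

($\Leftarrow$) For $x,y\in D$, the points $(x,f(x))$ and $(y,f(y))$ lie in $epi(f)$, hence so does their convex combination, which gives $\theta x+(1-\theta)y\in D$ and $f(\theta x+(1-\theta)y)\le\theta f(x)+(1-\theta)f(y)$.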
````


````{prf:definition} Differentiable functions
:label: differentiable_function
$f$ is `differentiable` if $D$ is `open` and the gradient $\nabla f(x)$ exists at each $x\in D$
````

- If $f$ is convex on an open domain, then $f$ is continuous.
- If $f$ is convex, then $f$ is differentiable: false, e.g., $f(x)=|x|$ is convex but not differentiable at $0$.

````{prf:definition} 1st-order condition
:label: 1st_order_condition
A differentiable $f$ with convex domain is convex iff

$$ f(y) \ge f(x) + \nabla f(x)^T (y-x) \; \forall x,y \in D $$

That is, the first-order Taylor approximation of $f$ is always a global underestimator of $f$.
````
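The underestimator property is easy to probe on a grid. The sketch below (the helper name `tangent_gap` and the grid are illustrative choices) evaluates $f(y) - f(x) - f'(x)(y-x)$ for a convex and a non-convex scalar function.

```python
import math

def tangent_gap(f, grad, x, y):
    """f(y) minus the first-order Taylor approximation of f around x."""
    return f(y) - (f(x) + grad(x) * (y - x))

points = [i / 4 for i in range(-12, 13)]  # grid on [-3, 3]

# Convex example: f(x) = exp(x), with f'(x) = exp(x).
min_gap_exp = min(tangent_gap(math.exp, math.exp, x, y) for x in points for y in points)
print(min_gap_exp >= -1e-12)  # tangents stay below the graph (up to rounding)

# Non-convex example: f(x) = sin(x), with f'(x) = cos(x); some tangents overshoot.
min_gap_sin = min(tangent_gap(math.sin, math.cos, x, y) for x in points for y in points)
print(min_gap_sin < 0)
```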
````{prf:definition} second-order condition
$f$ is `twice differentiable` if $D$ is `open` and the `Hessian` $\nabla^2 f(x)$ exists at each $x\in D$
````
````{prf:theorem} 
- For twice differentiable $f$ with convex domain,
$f$ is convex iff $\nabla^2 f(x) \succeq 0$ for all $x \in D$.
- If $\nabla^2 f(x) \succ 0$ for all $x\in D$, then $f$ is strictly convex.
The converse is not true: $f(x) = x^4$ is strictly convex but $f''(0)=0$.
````
<!-- ````{prf:proof}
```` -->
````{prf:example} Quadratic function
$f(x) = (1/2)x^T Px + q^T x + r$ with $P\in S^n$ (for $n=1$, $P=2$, $q=r=0$ this is $f(x)=x^2$).
Then $\nabla f(x) = Px+q$ and $\nabla^2 f(x) = P$,
so $f$ is convex iff $P \succeq 0$.
````
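For $n=2$ the condition $P\succeq 0$ can be checked by hand: a symmetric $2\times 2$ matrix is positive semidefinite iff its diagonal entries and determinant are nonnegative. The sketch below uses that test (the helper names and the matrices are illustrative) and shows how an indefinite $P$ breaks convexity along one direction.

```python
def is_psd_2x2(P, tol=1e-12):
    """Symmetric 2x2 test: PSD iff both diagonal entries and the determinant are >= 0."""
    (a, b), (c, d) = P
    assert abs(b - c) <= tol, "P must be symmetric"
    return a >= -tol and d >= -tol and a * d - b * c >= -tol

def quadratic(P, q, r, x):
    """f(x) = (1/2) x^T P x + q^T x + r for x in R^2."""
    Px = [P[0][0] * x[0] + P[0][1] * x[1], P[1][0] * x[0] + P[1][1] * x[1]]
    return 0.5 * (x[0] * Px[0] + x[1] * Px[1]) + q[0] * x[0] + q[1] * x[1] + r

P_convex = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues 1 and 3
P_indefinite = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues -1 and 3
q, r = [0.0, 0.0], 0.0

print(is_psd_2x2(P_convex), is_psd_2x2(P_indefinite))

# Along v = (1, -1) the indefinite quadratic bends downward: the midpoint of
# v and -v is the origin, whose value exceeds the average of the endpoint values.
v, minus_v = [1.0, -1.0], [-1.0, 1.0]
endpoint_average = 0.5 * (quadratic(P_indefinite, q, r, v)
                          + quadratic(P_indefinite, q, r, minus_v))
midpoint_value = quadratic(P_indefinite, q, r, [0.0, 0.0])
print(midpoint_value > endpoint_average)
```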
````{prf:example} Least-square objective
$f(x)=||Ax-b||_2^2$, with $\nabla f(x)=2A^T(Ax-b)$ and $\nabla^2f(x) = 2A^T A$. Since $2A^TA\succeq 0$ for any $A$, $f$ is convex for any $A$.
````
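The gradient formula can be sanity-checked against finite differences. This is a minimal pure-Python sketch with made-up data `A`, `b`, `x`; the function names are illustrative and not from any library.

```python
def residual(A, x, b):
    """Return Ax - b for A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]

def objective(A, x, b):
    """f(x) = ||Ax - b||_2^2."""
    return sum(r_i * r_i for r_i in residual(A, x, b))

def gradient(A, x, b):
    """Closed form from the example: 2 A^T (Ax - b)."""
    r = residual(A, x, b)
    return [2 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def numeric_gradient(A, x, b, h=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((objective(A, xp, b) - objective(A, xm, b)) / (2 * h))
    return grad

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]
print(gradient(A, x, b))          # closed form
print(numeric_gradient(A, x, b))  # should agree up to rounding
```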


````{prf:definition} Global minimum
- if $f$ is convex, differentiable, and $\nabla f(x^*)=0$,
then $f(y)\ge f(x^*)$ for all $y$, i.e., $x^*$ is a global minimum of $f$
- if $f$ is twice differentiable with $\nabla^2 f(x)\succeq 0$ for all $x$ (so $f$ is convex) and $\nabla f(x^*)=0$, then $f(y)\ge f(x^*)$ for all $y$, i.e., $x^*$ is a global minimum of $f$
````
- a function may not have a global minimum even if it is convex, e.g., $f(x)=x$, $f(x)=e^x$
- these properties only apply to unconstrained minimization. For example, take $f(x,y)=x^2+y^2$ on
$D=\{(x,y) \mid xy\ge 1, x\ge 0, y\ge 0\}$: the only point with $\nabla f=0$ is $(0,0)\notin D$, so the constrained minimum cannot be found by solving $\nabla f=0$
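
For the constrained example, the minimum over $D$ can be found by hand. This is a sketch using $(x-y)^2\ge 0$, which is not part of the original notes:

$$
x^2+y^2 \;\ge\; 2xy \;\ge\; 2 \quad \text{for all } (x,y)\in D,
$$

with equality iff $x=y$ and $xy=1$, i.e., at $(x,y)=(1,1)$. There $\nabla f(1,1)=(2,2)\neq 0$, so the constrained minimizer is not a stationary point of $f$.
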
````{prf:definition} Local minimum
$x^*$ is a `local minimum` of an unconstrained function $f$ if it is no worse than its neighbors, i.e.,
$$
\exists \epsilon >0 \text{ s.t. } f(x^*)\le f(x), \; \forall x \text{ with } ||x-x^*||_2\le \epsilon
$$
Properties (assuming $f$ is twice differentiable):
- necessary: if $x^*$ is a local minimum, then $\nabla f(x^*)=0$ and $\nabla^2 f(x^*)\succeq 0$. The converse is not true, e.g., $f(x)=x^3$ at $x^*=0$.
- sufficient: if $\nabla f(x^*)=0$ and $\nabla^2 f(x^*)\succ 0$, then $x^*$ is a local minimum.
````


### Methods for establishing convexity of a set

1. apply the definition
2. show that $C$ is obtained from simple convex sets by operations that preserve convexity:
- intersection
- affine functions
- perspective function & linear-fractional function

#### Intersection
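If $S_i$ is (affine, convex, a convex cone) for every $i\in A$, then $\cap_{i\in A} S_i$ is (affine, convex, a convex cone).

The intersection need not be `finite`: e.g., a closed convex set is the intersection of the (possibly infinitely many) `halfspaces` that contain it. The same does not hold for unions: the union of convex sets is in general not convex.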
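
A small numerical illustration: a polyhedron is an intersection of halfspaces, so convex combinations of its points should stay inside it. In the sketch below the halfspace data and helper names are illustrative assumptions.

```python
import random

# Each halfspace is (a, b), representing {x : a^T x <= b}; the values are illustrative.
HALFSPACES = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 1.0),
              ((0.0, 1.0), 1.0), ((0.0, -1.0), 1.0),
              ((1.0, 1.0), 1.5)]

def in_intersection(x, halfspaces=HALFSPACES, tol=1e-12):
    """True if x lies in every halfspace, i.e. in their intersection."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= b + tol
               for a, b in halfspaces)

def convex_combination(x, y, theta):
    return tuple(theta * xi + (1 - theta) * yi for xi, yi in zip(x, y))

rng = random.Random(0)
members = []
while len(members) < 200:
    p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if in_intersection(p, tol=0.0):  # strict membership when sampling
        members.append(p)

# Every sampled convex combination of two members is again a member.
all_inside = all(
    in_intersection(convex_combination(x, y, rng.random()))
    for x in members for y in members
)
print(all_inside)
```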

#### Affine function
An affine function generalizes a linear function by adding a constant term.

````{prf:definition} Affine function
:label: affine_function
`Affine function`
$f(x)=Ax + b, \; A\in R^{m\times n}, b\in R^{m}$
````
If $S$ is convex, then the image $f(S)$ is also convex.
If $C$ is convex, then the inverse image $f^{-1}(C)$ is also convex.

````{prf:example}  Affine
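The ellipsoid $\{x \mid (x-x_c)^T P^{-1} (x-x_c)\le 1\}$ with $P\succ 0$ is the image of the unit ball under the affine mapping $u\mapsto P^{1/2}u+x_c$.
It is also the inverse image of the unit ball under the affine mapping $x\mapsto P^{-1/2}(x-x_c)$.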
````

#### Perspective function & linear-fractional function
`perspective function` $P: R^{n+1} \to R^n$:
$P(x,t) = x/t,\; dom(P)=\{(x,t) \mid t>0\}$

`linear-fractional function` $f: R^n\to R^m$:
$f(x) = \frac{Ax+b}{c^T x+d}, \; dom(f)=\{x \mid c^T x+d>0\}$

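`perspective functions map line segments to line segments, and therefore preserve convexity`: for $(x,t), (y,s)\in dom(P)$ and $0\le\theta\le 1$,
$P(\theta (x,t) +(1-\theta)(y,s))=\frac{\theta x+(1-\theta)y}{\theta t+(1-\theta)s}=\mu P(x,t) + (1-\mu)P(y,s)$, where $\mu = \frac{\theta t}{\theta t+(1-\theta)s}\in[0,1]$.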
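
The segment-to-segment property can be verified numerically. The sketch below assumes the weight $\mu = \theta t/(\theta t+(1-\theta)s)$ for points $(x,t)$ and $(y,s)$, which is what the omitted algebra yields; the sample points are arbitrary.

```python
def perspective(x, t):
    """P(x, t) = x / t for t > 0, with x given as a tuple."""
    assert t > 0
    return tuple(xi / t for xi in x)

def mix(u, v, w):
    """Return w*u + (1-w)*v componentwise."""
    return tuple(w * ui + (1 - w) * vi for ui, vi in zip(u, v))

# Two points (x, t) and (y, s) in the domain; the numbers are illustrative.
x, t = (1.0, 2.0), 2.0
y, s = (-3.0, 0.5), 0.5
theta = 0.3

# Image of the convex combination of the two points.
lhs = perspective(mix(x, y, theta), theta * t + (1 - theta) * s)

# The same point written as a convex combination of the two images.
mu = theta * t / (theta * t + (1 - theta) * s)
rhs = mix(perspective(x, t), perspective(y, s), mu)

print(lhs, rhs, 0 <= mu <= 1)
```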

````{prf:theorem}  Separating hyperplane theorem
:label: separating_hyperplane_theorem

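If $C$ and $D$ are two nonempty disjoint convex sets, there exist $a\neq 0$ and $b$ such that $a^T x\le b$ for all $x\in C$ and $a^T x\ge b$ for all $x\in D$; the hyperplane $\{x \mid a^T x = b\}$ separates $C$ and $D$.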
````

````{prf:theorem}  Supporting hyperplane theorem
:label: supporting_hyperplane_theorem
:class: dropdown

A `supporting hyperplane` to a set $C$ at a boundary point $x_0$ is
$\{x \mid a^T x = a^T x_0\}$, where $a\neq 0$ and $a^T x \le a^T x_0$ for all $x\in C$.
If $C$ is convex, then there exists a supporting hyperplane at every boundary point of $C$.
````

### Operations that preserve convexity
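- nonnegative multiple: $\alpha f$ is convex if $f$ is convex and $\alpha\ge 0$
- sum: $f_1+f_2$ is convex if $f_1, f_2$ are convex
- composition with affine function: $f(Ax+b)$ is convex if $f$ is convex

examples:
- log barrier for linear inequalities: $f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$, $dom(f)=\{x \mid a_i^T x<b_i, i=1,\dots,m\}$
- piecewise-linear function: $f(x)=\max_{i=1,\dots,m}(a_i^T x+b_i)$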

pointwise supremum: if $f(x,y)$ is convex in $x$ for each $y\in A$, then $g(x)=\sup_{y\in A} f(x,y)$ is convex

````{prf:remark}
The infimum of a subset S of a partially ordered set P, assuming it exists, does not necessarily belong to S. If it does, it is a minimum or least element of S.
Similarly, if the supremum of S belongs to S, it is a maximum or greatest element of S.
````

examples:
- distance to the farthest point in a set $C$: $f(x)=\sup_{y\in C} ||x-y||$
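
To see the pointwise-supremum rule at work, the sketch below takes a small finite set $C$ (an illustrative choice, so the supremum is a maximum) and spot-checks the convexity inequality for the farthest-point distance on random samples.

```python
import math
import random

C = [(0.0, 0.0), (2.0, 1.0), (-1.0, 3.0)]  # illustrative finite set

def farthest_distance(x, points=C):
    """g(x) = max over y in C of ||x - y||_2, a pointwise maximum of convex functions."""
    return max(math.dist(x, y) for y in points)

rng = random.Random(1)
worst = 0.0  # largest observed violation of the convexity inequality
for _ in range(5000):
    x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    z = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    theta = rng.random()
    mid = tuple(theta * a + (1 - theta) * b for a, b in zip(x, z))
    gap = farthest_distance(mid) - (theta * farthest_distance(x)
                                    + (1 - theta) * farthest_distance(z))
    worst = max(worst, gap)

print(worst <= 1e-9)  # no violation beyond rounding error
```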

minimization: if $f(x,y)$ is convex in $(x,y)$ and $C$ is a convex set, then $g(x) = \inf_{y\in C} f(x,y)$ is convex.


#### composition of scalar functions
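Composition of $g: R^n\to R$ and $h: R\to R$: $f(x)=h(g(x))$.
$f$ is convex if $g$ is convex, $h$ is convex, and $\tilde{h}$ is nondecreasing (or if $g$ is concave, $h$ is convex, and $\tilde{h}$ is nonincreasing), where $\tilde{h}$ is the extended-value extension of $h$: $\tilde{h}(z)=h(z)$ for $z\in dom(h)$ and $\tilde{h}(z)=\infty$ otherwise.

The rule can be rediscovered for $n=1$ and twice differentiable $g, h$ from $f''(x) = h''(g(x))g'(x)^2 + h'(g(x))g''(x)$.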

Composition of $g: R^n\to R^k$ and $h: R^k\to R$: $f(x)=h(g(x))=h(g_1(x),\dots,g_k(x))$.
$f$ is convex if each $g_i$ is convex, $h$ is convex, and $\tilde{h}$ is nondecreasing in each argument.

For $n=1$ and twice differentiable $g, h$, the rule follows from $f''(x) =g'(x)^T\nabla^2 h(g(x))g'(x) + \nabla h(g(x))^T g''(x)$.
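
Two worked instances of the scalar case ($n=k=1$), where the formula reads $f''(x) = h''(g(x))g'(x)^2 + h'(g(x))g''(x)$. The choices of $h$ are standard illustrations, not taken from the notes above.

$$
h(z)=e^z:\quad f''(x) = e^{g(x)}\left(g'(x)^2 + g''(x)\right) \ge 0 \quad \text{whenever } g''(x)\ge 0
$$

$$
h(z) = 1/z,\ z>0:\quad f''(x) = \frac{2 g'(x)^2}{g(x)^3} - \frac{g''(x)}{g(x)^2} \ge 0 \quad \text{whenever } g(x)>0,\ g''(x)\le 0
$$

So $\exp g(x)$ is convex if $g$ is convex, and $1/g(x)$ is convex if $g$ is concave and positive.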

the `perspective` of a function $f:R^n\to R$ is the function $g:R^n\times R\to R$,
$$
g(x,t) = t f(x/t), \; dom(g) = \{ (x,t) | x/t \in dom(f), t>0 \}
$$
$g$ is convex if $f$ is convex.

````{prf:proof}
For $t>0$ we have $(x,t,s)\in epi(g) \Leftrightarrow tf(x/t)\le s \Leftrightarrow f(x/t) \le s/t \Leftrightarrow (x/t,s/t)\in epi(f)$.
Hence $epi(g)$ is the inverse image of $epi(f)$ under the perspective mapping.

$f$ convex implies $epi(f)$ convex, which implies $epi(g)$ convex, which implies $g$ convex.
````
````{prf:example}
- $f(x) = x^T x$ is convex; hence $g(x,t)=x^T x/t$ is convex for $t>0$
- negative logarithm $f(x) = -\log x$ is convex: hence relative entropy $g(x,t) = t \log t - t \log x$ is convex on $R_{++}^2$.
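- if $f$ is convex, then
$$g(x) = (c^T x + d)\, f\!\left(\frac{Ax+b}{c^T x+d}\right)$$
is convex on $\{x \mid c^T x+d>0, \; (Ax+b)/(c^T x+d)\in dom(f)\}$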
````
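A quick numerical spot check of the first example, $g(x,t)=x^2/t$, for scalar $x$. The helper `perspective_of` is a hypothetical name, and the sampling ranges are arbitrary.

```python
import random

def perspective_of(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0 (scalar x for simplicity)."""
    def g(x, t):
        assert t > 0
        return t * f(x / t)
    return g

g = perspective_of(lambda x: x * x)  # g(x, t) = x^2 / t

rng = random.Random(2)
worst = 0.0  # largest observed violation of the convexity inequality
for _ in range(5000):
    p = (rng.uniform(-4, 4), rng.uniform(0.1, 4))
    q = (rng.uniform(-4, 4), rng.uniform(0.1, 4))
    theta = rng.random()
    m = tuple(theta * a + (1 - theta) * b for a, b in zip(p, q))
    gap = g(*m) - (theta * g(*p) + (1 - theta) * g(*q))
    worst = max(worst, gap)

print(worst <= 1e-9)  # joint convexity in (x, t) holds on all samples
```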

The `conjugate` of a function $f$ is
$f^*(y) = \sup_{x\in dom(f)} (y^T x - f(x))$,
i.e., $f^*(y)$ is the maximum gap between the linear function $y^T x$ and $f(x)$.

$f^*$ is convex (even if $f$ is not), because it is the pointwise supremum of a family of affine functions of $y$.

````{prf:example}
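- Affine function: $f(x) = ax + b$. $f^*(y)=\sup_x(yx-ax-b)$, which equals $-b$ if $y=a$ and $\infty$ otherwise.
- Negative logarithm: $f(x)=-\log x$. $f^*(y)=\sup_{x>0} (xy+\log x)$, which equals $-1-\log(-y)$ if $y<0$ and $\infty$ otherwise.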
````
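The conjugate of the negative logarithm can be approximated by brute force and compared with the closed form $-1-\log(-y)$ for $y<0$, whose maximizer is $x=-1/y$. This is a rough sketch: the grid bounds and the function name are assumptions, and the grid search only makes sense for $y<0$.

```python
import math

def conjugate_neg_log(y, grid_size=20_000, x_max=50.0):
    """Approximate f*(y) = sup_{x>0} (x*y + log x) for f(x) = -log x by grid search.

    Only meaningful for y < 0; for y >= 0 the true supremum is +infinity.
    """
    best = -math.inf
    for k in range(1, grid_size + 1):
        x = x_max * k / grid_size
        best = max(best, x * y + math.log(x))
    return best

for y in (-0.5, -1.0, -2.0):
    closed_form = -1.0 - math.log(-y)  # attained at x = -1/y
    print(y, conjugate_neg_log(y), closed_form)
```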


#### quasiconvex
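$f:R^n\to R$ is `quasiconvex` if $dom(f)$ is convex and the sublevel sets
$C_\alpha = \{ x\in dom(f) \mid f(x) \le \alpha \}$
are convex for all $\alpha$.

$f$ is `quasiconcave` if $-f$ is quasiconvex, and `quasilinear` if it is both quasiconvex and quasiconcave.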

````{prf:example}
- $\sqrt{|x|}$ is quasiconvex on $R$
- $ceil(x) = \inf \{ z\in Z \mid z\ge x \}$ is quasilinear
- $\log x$ is quasilinear on $R_{++}$
- $f(x_1, x_2) = x_1 x_2$ is quasiconcave on $R_{++}^2$
- linear-fractional function is quasilinear
````
properties
- for quasiconvex $f$ (modified Jensen inequality): $0\le \theta \le 1 \implies f(\theta x + (1-\theta)y)\le \max \{ f(x), f(y) \}$
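
The difference between the two inequalities shows up clearly for $f(x)=\sqrt{|x|}$. In this sampling sketch (ranges and tolerance are arbitrary), the ordinary convexity inequality fails on some samples while the max-based inequality never does.

```python
import math
import random

def f(x):
    return math.sqrt(abs(x))

rng = random.Random(3)
jensen_violated = False    # f(mid) > theta*f(x) + (1-theta)*f(y) on some sample
max_rule_violated = False  # f(mid) > max(f(x), f(y)) on some sample
for _ in range(5000):
    x, y = rng.uniform(-4, 4), rng.uniform(-4, 4)
    theta = rng.random()
    mid = f(theta * x + (1 - theta) * y)
    if mid > theta * f(x) + (1 - theta) * f(y) + 1e-9:
        jensen_violated = True
    if mid > max(f(x), f(y)) + 1e-9:
        max_rule_violated = True

print(jensen_violated, max_rule_violated)
```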

`first-order condition`: differentiable $f$ with convex domain is quasiconvex iff
$f(y)\le f(x) \implies \nabla f(x)^T (y-x)\le 0$.

a positive function f is `log-concave` if $\log f$ is concave:
$f(\theta x + (1-\theta)y)\ge f(x)^\theta f(y)^{1-\theta}$ for $0\le \theta \le 1$.

log-convex: f is log-convex if log f is convex.

`example`: 
powers: $x^a$ on $R_{++}$ is log-convex for $a\le 0$, log-concave for $a\ge 0$.

## Convex optimization

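A convex optimization problem in standard form is

$$
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x)\le 0, \; i=1,\dots,m \\
& Ax = b
\end{aligned}
$$

where $f_0, f_1, \dots, f_m$ are convex and the equality constraints are affine.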

`important property: the feasible set of a convex optimization problem is convex`

generalized inequalities are also ok

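minimize $c^T x$ subject to $A_0+A_1x_1 + A_2x_2 +\dots + A_n x_n\preceq 0$, where the $A_i$ are symmetric matrices and $\preceq$ is the matrix inequality with respect to the positive semidefinite cone.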