# CHAPTER 4 - Convex Functions

---
---

**Author:** Dr Giordano Scarciotti (g.scarciotti@imperial.ac.uk) - Imperial College London 

**Module:** ELEC70066 - Advanced Optimisation

**Version:** 1.1.3 - 29/01/2023

---
---

The material of this chapter is adapted from $[1]$.

In this chapter we define and study elementary convex functions. Exploiting composition theorems we develop a calculus of convex functions which allows us to establish whether a complex function is convex by decomposing it into elementary functions. We also look at generalisations of convexity that still lead to problems which can be solved easily by reliable numerical algorithms. Contents:

*   Section 4.1 Basic Properties and Examples
*   Section 4.2 Calculus for Convex Functions
*   Section 4.3 Generalisations of Convexity

# 4.1 Basic Properties and Examples

## 4.1.1 Definitions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/Swz-cFNqthw"></iframe>')

A function $f : \mathbb{R}^n \to  \mathbb{R}$ is **convex** if $\textbf{dom } f$ is a convex set and if for all $x$, $y \in \textbf{dom } f$, and $\theta$ with $0 \le \theta \le  1$, we have

$$
f(\theta x + (1 - \theta)y) \le  \theta f(x) + (1 - \theta)f(y). \tag{1}
$$

Geometrically this means that the **chord** from $x$ to $y$, i.e. the line segment between $(x,f(x))$ and $(y, f(y))$, lies above the graph of $f$.

Intuitively a function is convex when it curves up.




<div>
<img src="https://drive.google.com/uc?export=view&id=1KzT8IjY2dSlkmjZ05FdTAj3sq6X2mrWe" width="400"/>
</div>

Figure 4.1. *Example of convex function. The chord (i.e., line segment) between any two points on the graph lies above the graph.*

The inequality $(1)$ is sometimes called **Jensen's inequality**. Jensen's inequality can be extended to an arbitrary number of points, i.e. $f(\theta_1 x_1 + \dots + \theta_k x_k) \le \theta_1 f(x_1) + \dots + \theta_k f(x_k)$ with $\theta_i \ge 0$ and $\theta_1 + \dots + \theta_k = 1$, as well as to infinite sums, integrals and expected values, i.e. if $f$ is convex then $f(\mathbf{E}x)\le \mathbf{E}f(x)$.
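Inequality $(1)$ can be probed numerically by sampling. The sketch below is only an illustration: the helper name `find_jensen_violation`, the intervals and the tolerance are assumptions made here, not part of the course material. Sampling can disprove convexity by exhibiting a counterexample, but it can never prove it.

```python
import math
import random

def find_jensen_violation(f, lo, hi, trials=10_000, tol=1e-12, seed=0):
    """Search for x, y, theta violating inequality (1) on the interval [lo, hi].

    Returns a violating triple (x, y, theta), or None if none was found.
    Finding nothing does not prove convexity; it only fails to disprove it.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return x, y, theta
    return None

no_violation = find_jensen_violation(math.exp, -3.0, 3.0)   # exp is convex
violation = find_jensen_violation(math.sin, 0.0, math.pi)   # sin is concave on [0, pi]
print(no_violation, violation)
```

For the exponential no counterexample is found, whereas for the sine on $[0,\pi]$ a violating triple is returned.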

A function is **strictly convex** if the strict inequality holds in $(1)$ whenever $x \ne y$ and $0 < \theta <  1$. A function is (strictly) **concave** if $-f$ is (strictly) convex. 

Note that affine functions satisfy $(1)$ with equality, so affine functions are both convex and concave. The converse is also true: any function that is both convex and concave is affine.

A useful property to check whether a function is convex or not is that a function is convex if and only if it is convex when restricted to any line that intersects its domain, i.e. if and only if for all $x \in \textbf{dom }f$ and all $v$, the function $g(t) = f(x+tv)$ is convex on its domain ($\{t : x+tv \in \textbf{dom }f\}$). This property is especially useful when checking convexity of high-dimensional functions, as it reduces the problem to a test on lines.

It is often convenient to extend a convex function to all of $\mathbb{R}^n$ by defining its value to be $\infty$ outside its domain. If $f$ is convex we define its **extended-value extension** $\tilde{f} : \mathbb{R}^n  \to \mathbb{R} \cup \{\infty\}$ by

$$
\tilde{f}(x) = \left\{ \begin{array}{ll}f(x) & x\in\textbf{dom }f \\ \infty & x \not \in \textbf{dom }f   \end{array} \right.
$$

The extension $\tilde{f}$ is defined on all $\mathbb{R}^n$, and takes values in $\mathbb{R} \cup \{\infty\}$.

The extended functions simplify the notation. For instance, if we replace $f$ with $\tilde{f}$, then $(1)$ needs to hold for all $x$ and $y$, instead of only for all $x$ and $y$ in the domain. From now on in this course we use the same symbol to indicate $f$ and its extended-value extension $\tilde{f}$, i.e. we assume that all functions are implicitly extended (that is, convex functions are defined as $\infty$ outside their domains). There are a few exceptions where we will distinguish $f$ and $\tilde{f}$, for instance when it is not enough for a condition to be satisfied by $f$ but it needs to hold for $\tilde{f}$ (and so we use $\tilde{f}$ to make this clear).

In a similar way we can extend a concave function by defining it to be $-\infty$ (**N.B.** minus) outside its domain.

**Example 4.1:** the extended-value function of $f(x) = x^2$ with domain $\mathbb{R}_{++}$ is 
$$
\tilde{f}(x) = \left\{ \begin{array}{ll}\infty  & x \le 0 \\ x^2 & x > 0  \end{array} \right.
$$
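As a small illustration of Example 4.1, the extended-value extension can be coded with `math.inf`. The function name `f_ext` and the test points are choices made for this sketch only.

```python
import math

def f_ext(x):
    """Extended-value extension of f(x) = x**2 with dom f = (0, inf)."""
    return x * x if x > 0 else math.inf

# Inequality (1) can now be tested for any real x and y: if one point is
# outside the domain, the right-hand side is +inf and the inequality holds.
x, y, theta = -1.0, 2.0, 0.25
lhs = f_ext(theta * x + (1 - theta) * y)
rhs = theta * f_ext(x) + (1 - theta) * f_ext(y)
print(lhs, rhs, lhs <= rhs)
```

Here $\theta$ is taken strictly between $0$ and $1$, which avoids the undefined floating-point product of zero and infinity.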

If the function $f$ is differentiable, then $f$ is convex if and only if $\textbf{dom }f$ is convex and 

$$
f(y) \ge f(x) + \nabla f(x)^\top (y-x) \tag{2}
$$

holds for all $x,y \in \textbf{dom }f$. This inequality states that for a convex function its first-order Taylor expansion is a global underestimator of the function. The converse of this result holds. If a function always lies above its first-order Taylor approximation then this function is convex.



<div>
<img src="https://drive.google.com/uc?export=view&id=1vuCpKSSqpdKAkpuIHAu0F4a_Hz-O7qs6" width="400"/>
</div>

Figure 4.2. *If $f$ is convex and differentiable, then its first-order Taylor approximation lies below the function.*

The inequality $(2)$ shows that from local information about a convex function (i.e. its value and derivative at a point) we can derive global information (i.e., a global underestimator of it). This is perhaps the most important property of convex functions, and explains some of the remarkable properties of convex functions and convex optimization problems. *As one simple but fundamental example, the inequality $(2)$ implies that if $\nabla f(x) = 0$, then $f(y) \ge f(x)$ for all $y \in \textbf{dom }f$, i.e., $x$ is a global minimizer of the function $f$*.
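The first-order condition can be checked on a grid for a concrete function. The sketch below uses the negative entropy $x\log x$ on $\mathbb{R}_{++}$; the grid and the tolerance are arbitrary choices made for illustration.

```python
import math

def f(x):
    return x * math.log(x)      # negative entropy, dom f = (0, inf)

def df(x):
    return math.log(x) + 1.0    # derivative of x*log(x)

points = [0.05 * k for k in range(1, 101)]   # grid on (0, 5]

# Inequality (2): the tangent at x lies below f at every y.
tangent_below = all(
    f(y) >= f(x) + df(x) * (y - x) - 1e-12
    for x in points for y in points
)

# df vanishes at x = 1/e, so (2) makes this point a global minimiser.
x_star = math.exp(-1.0)
is_global_min = all(f(y) >= f(x_star) - 1e-12 for y in points)
print(tangent_below, is_global_min)
```

Both checks succeed on the grid, in agreement with $(2)$ and with the observation that a point with zero gradient is a global minimiser.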

Strict convexity can be characterised similarly by requiring that $(2)$ holds strictly (i.e. with $>$) whenever $x \ne y$. (Strict) concavity can be characterised similarly by replacing $\ge$ with $\le$ ($<$).

If the function $f$ is twice differentiable, then 
$f$ is convex if and only if $\textbf{dom }f$ is convex and its Hessian is positive semidefinite, i.e. for all $x \in \textbf{dom }f$

$$
\nabla^2 f(x) \succcurlyeq 0.
$$

Geometrically, a function is convex if and only if its graph has (positive) upward curvature. Similarly $f$ is concave if and only if $\textbf{dom }f$ is *convex* and $\nabla^2 f(x) \preccurlyeq 0$ for all $x \in \textbf{dom }f$. Strict convexity (or strict concavity) is a bit more tricky to characterize using a second-order condition. A function $f$ is strictly convex if $\textbf{dom }f$ is convex and $\nabla^2 f(x) \succ 0$ for all $x \in \textbf{dom }f$. **However, the converse is not true.** For instance $f(x) = x^4$ on $\mathbb{R}$ is strictly convex but its second derivative is zero at $x=0$.

**Example 4.2:** Consider the quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ with $\textbf{dom }f=\mathbb{R}^n$ defined by $f(x) = (1/2)x^\top P x + q^\top x + r$ with $P \in \mathbb{S}^n$, $q\in\mathbb{R}^n$ and $r\in\mathbb{R}$. Since $\nabla^2 f(x) = P$ for all $x$, $f$ is convex (concave) if and only if $P \succcurlyeq 0$ ($P \preccurlyeq 0$).
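The test in Example 4.2 amounts to inspecting the eigenvalues of $P$. A minimal sketch with NumPy follows; the helper name `quadratic_curvature` and the tolerance are assumptions made here.

```python
import numpy as np

def quadratic_curvature(P, tol=1e-10):
    """Classify f(x) = 0.5 x^T P x + q^T x + r from the eigenvalues of P."""
    eigs = np.linalg.eigvalsh(P)        # P is assumed symmetric
    if np.all(eigs >= -tol) and np.all(eigs <= tol):
        return "affine (convex and concave)"
    if np.all(eigs >= -tol):
        return "convex"
    if np.all(eigs <= tol):
        return "concave"
    return "neither"

print(quadratic_curvature(np.array([[2.0, 1.0], [1.0, 2.0]])))    # eigenvalues 1, 3
print(quadratic_curvature(np.array([[-1.0, 0.0], [0.0, -4.0]])))  # eigenvalues -4, -1
print(quadratic_curvature(np.array([[1.0, 0.0], [0.0, -1.0]])))   # eigenvalues -1, 1
```

The vector $q$ and the scalar $r$ play no role, since they do not affect the Hessian.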

**Example 4.3:** Consider the function $f(x) = 1/x^2$ with domain $\textbf{dom }f = \{x\in \mathbb{R} : x \ne 0\}$. This function satisfies $f''(x)>0$ for all $x \in \textbf{dom }f$ but it is not convex because the domain is not a convex set. In summary, the separate requirement that $\textbf{dom }f$ be convex cannot be dropped from the first- or second-order characterizations of convexity and concavity.

## 4.1.2 Examples

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/3UkUHNpjbbs"></iframe>')

We now look at various examples of convex functions. We start with simple functions.

*    As already mentioned, **affine** (and so linear) functions and **quadratic** functions with $P \succcurlyeq 0$ are convex. Affine functions on $\mathbb{R}^{m \times n}$ are $f(X)= \textbf{tr}(A^\top X)+b = \sum_{i=1}^m\sum_{j=1}^n A_{ij}X_{ij}+b$. The trace operator can be understood as the inner product over $\mathbb{R}^{m \times n}$. 

*    **Exponentials** $e^{ax}$ are convex on $\mathbb{R}$ for any $a\in\mathbb{R}$.

*    **Powers** $x^a$ are convex on $\mathbb{R}_{++}$ when $a \ge 1$ or $a \le 0$ and concave for $0 \le a \le 1$.

*    **Powers of absolute value** $|x|^p$ are convex on $\mathbb{R}$ for $p\ge 1$.

*    The **logarithm** $\log x$ is concave on $\mathbb{R}_{++}$.

*    The **negative entropy** $x \log x$ on $\mathbb{R}_{++}$ or $\mathbb{R}_{+}$ defined as $0$ for $x=0$ is convex.

*    Every **norm** on $\mathbb{R}^n$ is convex.

In fact, let $f$ be a norm and $0\le \theta \le 1$. Then

$$
f(\theta x + (1-\theta)y) \le f(\theta x) + f((1-\theta)y) = \theta f(x) + (1-\theta)f(y)
$$
where the inequality follows from the triangle inequality and the equality follows from the homogeneity of the norm (note that $\theta \ge 0$ and $1-\theta \ge 0$).

*    The **maximum singular value** is convex. This is because it is a norm, namely $f(X) = (\lambda_{\max}(X^\top X))^{1/2} = \sigma_{\max} (X) = ||X||_2$.

*    The **max** function $\max \{x_1,\dots, x_n\}$ is convex on $\mathbb{R}^n$.

In fact, let $f = \max_i x_i$ and $0\le \theta \le 1$. Then

$$
f(\theta x + (1-\theta)y) = \max_i(\theta x_i + (1-\theta)y_i) \le \theta \max_i x_i + (1-\theta) \max_i y_i = \theta f(x) + (1-\theta)f(y)
$$

*    The **quadratic-over-linear** function $f(x,y) = x^2/y$ with $y>0$ is convex.

In fact, we compute the Hessian

$$
\nabla^2 f(x,y) = \frac{2}{y^3}\left[\begin{array}{rr} y^2 & -xy\\ -xy & x^2 \end{array}\right] = \frac{2}{y^3} \left[\begin{array}{r} y \\ -x\end{array}\right] \left[\begin{array}{rr} y & -x\end{array}\right] \succcurlyeq 0.
$$

*    The **log-sum-exp** $f(x) = \log(e^{x_1}+ \dots + e^{x_n})$ is convex on $\mathbb{R}^n$.

In fact, we compute the Hessian

$$
\nabla^2 f(x) = \frac{1}{(\mathbf{1}^\top z)^2} \left( (\mathbf{1}^\top z) \textbf{diag}(z) - zz^\top \right)
$$

where $z = [e^{x_1},\dots,e^{x_n}]^\top$. To check if the Hessian is positive semidefinite we compute

$$
v^\top \nabla^2 f(x) v = \frac{(\sum_k z_k v_k^2)(\sum_k z_k) - (\sum_k v_k z_k)^2}{(\sum_k z_k)^2}.
$$

By applying the Cauchy-Schwarz inequality $(a^\top a)(b^\top b) \ge (a^\top b)^2$ with $a_i = v_i \sqrt{z_i}$ and $b_i = \sqrt{z_i}$ it follows that the numerator is nonnegative, so $v^\top \nabla^2 f(x) v \ge 0$ for all $v$ and the Hessian is positive semidefinite.
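The Hessian formula for log-sum-exp can also be checked numerically. The sketch below evaluates it at random points and looks at the smallest eigenvalue; the function name, the dimension and the number of samples are arbitrary choices for illustration.

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log(sum(exp(x))) using the formula in the text."""
    # Shifting x rescales z by a constant, which leaves the Hessian unchanged
    # and avoids overflow in the exponential.
    z = np.exp(x - np.max(x))
    s = z.sum()
    return (s * np.diag(z) - np.outer(z, z)) / s**2

rng = np.random.default_rng(0)
min_eigs = [np.linalg.eigvalsh(logsumexp_hessian(rng.normal(size=5))).min()
            for _ in range(100)]
print(min(min_eigs))   # zero up to rounding error
```

The smallest eigenvalue is zero rather than positive because the vector of ones is always in the null space of this Hessian, so log-sum-exp is convex but not strictly convex.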








*    The **geometric mean** $f(x) = \left( \prod_{i=1}^n x_i \right)^{\frac{1}{n}} $ is concave on $\textbf{dom }f = \mathbb{R}_{++}^n$. The proof is similar to the one of the log-sum-exp and it is omitted.

*    The **log-determinant** $f(X)=\log \det X$ is concave on $\textbf{dom }f = \mathbb{S}_{++}^n$.

We can prove this by exploiting the property mentioned in Section 4.1.1 that a function is convex if and only if it is convex when restricted to any line that intersects its domain. Consider an arbitrary line $X = Z + tV$ where $Z,V \in \mathbb{S}^n$. Consider $g(t)=f(Z + tV)$ and restrict $g$ to the interval of values of $t$ such that $Z + tV \succ 0$. We can assume without loss of generality that for $t=0$ we are inside this interval (i.e. $Z \succ 0$). Then we have

$$
g(t) = \log\det(Z+tV) = \log\det(Z^{1/2}(I+tZ^{-1/2}VZ^{-1/2})Z^{1/2}) = \sum_{i=1}^n \log (1+t\lambda_i) + \log\det Z 
$$

where the $\lambda_i$'s are the eigenvalues of $Z^{-1/2}VZ^{-1/2}$. Computing the second derivative we obtain

$$
g''(t) = - \sum_{i=1}^n \frac{\lambda_i^2}{(1+t\lambda_i)^2} \le 0.
$$

Since this holds for every $t$ in the interval and for every line, $f$ is concave.
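The formula for $g''(t)$ can be compared with a finite-difference estimate. The sketch below builds a random positive definite $Z$ and a random symmetric $V$; the sizes, the seed and the step $h$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.normal(size=(n, n))
Z = M @ M.T + n * np.eye(n)      # symmetric positive definite base point
V = rng.normal(size=(n, n))
V = (V + V.T) / 2                # symmetric direction

def g(t):
    """g(t) = log det(Z + tV); a nonpositive determinant means we left the domain."""
    sign, logabsdet = np.linalg.slogdet(Z + t * V)
    if sign <= 0:
        raise ValueError("Z + tV is not positive definite for this t")
    return logabsdet

# With Z = L L^T, the matrix L^{-1} V L^{-T} is similar to Z^{-1} V and so has
# the same eigenvalues as Z^{-1/2} V Z^{-1/2}.
L = np.linalg.cholesky(Z)
Linv = np.linalg.inv(L)
lam = np.linalg.eigvalsh(Linv @ V @ Linv.T)

t, h = 0.0, 1e-3
second_diff = (g(t + h) - 2 * g(t) + g(t - h)) / h**2   # central difference
closed_form = -np.sum(lam**2 / (1 + t * lam) ** 2)      # formula from the text
print(second_diff, closed_form)
```

The two numbers agree up to discretisation and rounding error, and both are nonpositive.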


## 4.1.3 Important Examples

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/-EVxwE1hOp8"></iframe>')

The **$\alpha$-sublevel sets** of a convex function $f:\mathbb{R}^n \to \mathbb{R}$, which are defined as $C_\alpha = \{x \in \textbf{dom }f : f(x) \le \alpha\}$, are convex for any value of $\alpha$.

In fact, if $x,y\in C_{\alpha}$, then $f(x) \le \alpha$ and $f(y) \le \alpha$ and so $f(\theta x + (1-\theta)y) \le \alpha$ for $0 \le \theta \le 1$, which implies that $\theta x + (1-\theta)y \in C_\alpha$.

The converse is not true, for instance $f(x)=-e^x$ is strictly concave on $\mathbb{R}$ but all its sublevel sets are convex.

Another version of this example is that if $f$ is concave, then its **$\alpha$-superlevel sets** given by $\{x \in \textbf{dom }f : f(x) \ge \alpha\}$ are convex.

These results are often useful to establish convexity of a set, by rewriting the set under study as a sublevel (superlevel) set of a convex (concave) function.



Finally, a quite important example is the epigraph. The graph of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined as $\{(x,f(x)): x \in \textbf{dom }f\} \subseteq \mathbb{R}^{n+1}$. The **epigraph** of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined as 

$$
\textbf{epi }f = \{(x,t): x \in \textbf{dom }f, f(x) \le t\}\subseteq \mathbb{R}^{n+1}
$$

The epigraph is the set above the function and it is illustrated in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1tCDpuwYVlW-rfyTKp6l912B98bwX23Ud" width="300"/>
</div>

Figure 4.3. *Epigraph of a function $f$, shown shaded. The lower boundary, shown darker, is the graph of $f$.*

A function is convex if and only if its epigraph is a convex set. Thus, the epigraph provides a link between the notions of convex sets and convex functions.

Similarly, a function is concave if and only if its **hypograph** 

$$
\textbf{hypo }f = \{(x,t): x \in \textbf{dom }f, f(x) \ge t\}\subseteq \mathbb{R}^{n+1}
$$ 

is a convex set.

# 4.2 Calculus for Convex Functions

## 4.2.1 Simple Operations

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/q2b58E7EicU"></iframe>')

A **nonnegative weighted sum** of convex functions, $f = w_1 f_1 + \dots + w_m f_m$ with $w_j \ge 0$, is convex. Similarly, a nonnegative weighted sum of concave functions is concave. A nonnegative, nonzero weighted sum of strictly convex (concave) functions is strictly convex (concave).

These properties extend to infinite sums and integrals, e.g. if $f(x,y)$ is convex in $x$ for each $y\in\mathcal{A}$ and $w(y) \ge 0$ for each $y\in\mathcal{A}$, then $g(x) = \int_{\mathcal{A}} w(y) f(x,y) dy$ is convex in $x$.

The proof of these properties can be obtained easily using Jensen's inequality. Alternatively, one can use the epigraph. For instance, let $w \ge 0$ and $f$ convex, then by using the [epigraph identity](https://colab.research.google.com/drive/1WwNsPbW7-2PyVM_fXsOCUX7O7q8lQM3W#scrollTo=5WaFfb2M-ZZB&line=10&uniqifier=1) 

$$
\textbf{epi }(w f) = \left[\begin{array}{ll} I & 0 \\ 0 & w\end{array}\right] \textbf{epi }(f)
$$
we have that $w f$ is convex. In fact, the right-hand side is convex because it is the image of a convex set (the epigraph of a convex function) under a linear mapping (multiplication by a matrix), and linear mappings preserve convexity of sets.

Moreover, the composition of a convex (concave) function $f$ with an affine mapping is convex (concave), i.e. $g(x) = f(Ax + b)$ with $\textbf{dom }g=\{x: Ax + b \in \textbf{dom }f\}$ is convex (concave) if $f$ is convex (concave).

If $f_1$ and $f_2$ are convex functions then their **pointwise maximum** $f$, defined by $f(x) = \max\{f_1(x), f_2(x)\}$, with $\textbf{dom }f = \textbf{dom }f_1 \cap \textbf{dom }f_2$, is also convex.

**Exercise 4.1:** Use Jensen's inequality to prove it.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

Of course, this holds for an arbitrary number of convex functions.

**Example 4.4:** For $x\in \mathbb{R}^n$ we denote by $x_{[i]}$ the $i$-th largest component of $x$. Then the function $f(x) = \sum_{i=1}^r x_{[i]}$, i.e. the sum of the $r$ largest elements of $x$, is a convex function. This follows from 

$$
f(x) = \sum_{i=1}^r x_{[i]} = \max\{x_{i_1}+ \dots + x_{i_r} : 1\le i_1 < \dots < i_r \le n  \}
$$

i.e. the maximum of all possible sums of $r$ different components of $x$. Despite the complexity of such a function, convexity follows by pointwise maximum among the $n!/(r!(n-r)!)$ sums.
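The identity in Example 4.4 can be verified directly for a small vector. The function names and the data below are illustrative choices only.

```python
from itertools import combinations

def sum_r_largest(x, r):
    """Sum of the r largest components of x, computed by sorting."""
    return sum(sorted(x, reverse=True)[:r])

def sum_r_largest_as_max(x, r):
    """Same value, written as a pointwise maximum of linear functions of x."""
    return max(sum(x[i] for i in idx) for idx in combinations(range(len(x)), r))

x = [3.0, -1.0, 4.0, 1.0, 5.0]
print(sum_r_largest(x, 2), sum_r_largest_as_max(x, 2))
```

The second form is far more expensive to evaluate, but it is the one that makes convexity evident: each inner sum is linear in $x$ and the outer operation is a pointwise maximum.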

The pointwise maximum property extends to the **pointwise supremum** over an
infinite set of convex functions. If for each $y\in\mathcal{A}$, $f(x, y)$ is convex in $x$, then the function $g$, defined as 
$$
\displaystyle g(x) = \sup_{y\in\mathcal{A}} f(x,y)
$$ 
is convex in $x$. The domain of $g$ is $\textbf{dom }g=\{x : (x,y) \in \textbf{dom }f \text{ for all }y\in\mathcal{A}, g(x) < \infty\}$. Similarly, the pointwise infimum of a set of concave functions is a concave function.

In terms of epigraphs, the pointwise supremum of functions corresponds to the intersection of the epigraphs: we have

$$
\textbf{epi }g = \bigcap_{y\in\mathcal{A}} \textbf{epi }f(\cdot,y).
$$

Thus, the result follows from the fact that the intersection of a family of convex sets is convex.

**Example 4.5:** The support function $S_C$ associated to a non-empty set $C \subseteq \mathbb{R}^n$ is defined as $S_C(x) = \sup\{x^\top y : y\in C\}$. For each $y\in C$, $x^\top y$ is a linear function of $x$, so it is convex. $S_C$ is convex as the supremum of a family of convex functions.

**Example 4.6:** The distance to the farthest point of a set $C \subseteq \mathbb{R}^n$ defined as $f(x) = \sup_{y\in C} ||x-y||$ is convex. In fact, for any $y$, $||x-y||$ is convex in $x$ because it is the composition of a norm with an affine function of $x$. Then $f$ is the supremum of a family of convex functions.

**Example 4.7:** The maximum eigenvalue of a symmetric matrix, i.e. $f(X)=\lambda_{\text{max}}(X)$ with $\textbf{dom }f=\mathbb{S}^m$, is convex. In fact, $f(X) = \sup\{y^\top X y : || y ||_2 =1 \}$. Note that $y^\top X y$ is *linear* in $X$. Hence, the maximum eigenvalue of a symmetric matrix can be expressed as the supremum of a family of linear functions.
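The supremum representation in Example 4.7 can be illustrated numerically: every unit vector $y$ gives a value $y^\top X y$ that does not exceed $\lambda_{\max}(X)$, and the leading eigenvector attains it. The matrix size, the seed and the number of sampled vectors are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def lam_max(X):
    return np.linalg.eigvalsh(X)[-1]   # eigvalsh returns eigenvalues in ascending order

A = rng.normal(size=(4, 4))
X = (A + A.T) / 2                      # random symmetric matrix

# Random unit vectors y and the corresponding values y^T X y.
ys = rng.normal(size=(1000, 4))
ys /= np.linalg.norm(ys, axis=1, keepdims=True)
rayleigh = np.einsum("ij,jk,ik->i", ys, X, ys)
bounded = rayleigh.max() <= lam_max(X) + 1e-12

# The supremum is attained at an eigenvector of the largest eigenvalue.
w, U = np.linalg.eigh(X)
attained = np.isclose(U[:, -1] @ X @ U[:, -1], lam_max(X))
print(bounded, attained)
```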

The examples above illustrate a good method for establishing convexity of a function: by expressing it as the pointwise supremum of a family of affine functions. Except for a technical condition, a converse holds: almost every convex function can be expressed as the pointwise supremum of a family of affine functions.

## 4.2.2 Composition of Convex Functions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/ZAipB-E--tY"></iframe>')

**Errata:** At 10:39 the video says "nondecreasing". It should be "not nondecreasing".

We now want to understand what conditions two functions have to satisfy so that their **composition** is convex. We start by looking at a very special case. Let $h : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be twice differentiable and with domain $\mathbb{R}$. In this case the convexity of the composed function $f(x) = h(g(x))$ can be checked by looking at its second derivative

$$
f''(x) = h''(g(x))g'(x)^2 + h'(g(x))g''(x).
$$

From this we see, for instance, that if $g$ and $h$ are convex (so $g''\ge0$ and $h''\ge0$) and $h$ is nondecreasing (so $h'\ge 0$), it follows that $f''\ge 0$ and so $f$ is convex.

It turns out that requirements of this kind hold in general, for higher dimensional functions and **without assuming the existence of any derivative**.

Consider two functions $h : \mathbb{R}^k \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^k$. We now give conditions that guarantee convexity or concavity of their composition $f = h \circ g : \mathbb{R}^n  \to \mathbb{R}$, defined by $f(x) = h(g(x))=h(g_1(x),\dots,g_k(x))$, $\textbf{dom }f = \{x \in \textbf{dom }g : g(x) \in \textbf{dom }h\}$:

*   $f$ is convex if $h$ is convex, $\tilde{h}$ is nondecreasing in each argument and $g_i$ are convex;
*   $f$ is convex if $h$ is convex, $\tilde{h}$ is nonincreasing in each argument and $g_i$ are concave;
*   $f$ is concave if $h$ is concave, $\tilde{h}$ is nondecreasing in each argument and $g_i$ are concave;
*   $f$ is concave if $h$ is concave, $\tilde{h}$ is nonincreasing in each argument and $g_i$ are convex.

It is important to note that the rules above do not hold in general if the extended-value function $\tilde{h}$ is replaced by $h$. 

We can give a simple geometric interpretation to the requirement that $\tilde{h}$ is nondecreasing. First consider the case $k=1$. Suppose $h$ is convex, so $\tilde{h}$ takes on the value $\infty$ outside $\textbf{dom }h$. To say that $\tilde{h}$ is nondecreasing means that for any $x$, $y \in \mathbb{R}$, with $x < y$, we have $\tilde{h}(x) \le \tilde{h}(y)$. In particular, this implies that if $y \in \textbf{dom }h$, then $x \in \textbf{dom }h$ (otherwise the inequality would fail because $\tilde{h}(x)=\infty$). In other words, the domain of $h$ extends infinitely in the negative direction; it is either $\mathbb{R}$, or an interval of the form $(-\infty, a)$ or $(-\infty, a]$. In a similar way, to say that $h$ is convex and $\tilde{h}$ is nonincreasing means that $h$ is nonincreasing and $\textbf{dom }h$ extends infinitely in the positive direction.


**Example 4.8:** $h(x)=\log x$ with $\textbf{dom }h = \mathbb{R}_{++}$ is concave and satisfies $\tilde{h}$ nondecreasing (remember, for a concave function the extension has value $-\infty$).

**Example 4.9:** $h(x)=x^{1/2}$ with $\textbf{dom }h = \mathbb{R}_{+}$ is concave and satisfies $\tilde{h}$ nondecreasing.

**Example 4.10:** $h(x)=x^{3/2}$ with $\textbf{dom }h = \mathbb{R}_{+}$ is convex but does not satisfy the condition $\tilde{h}$ nondecreasing. In fact, $\tilde{h}(-1)=\infty$ but $\tilde{h}(1)=1$.

**Example 4.11:** $h(x)=x^{3/2}$ for $x \ge 0$ and $h(x)=0$ for $x< 0$ with $\textbf{dom }h = \mathbb{R}$ is convex and satisfies $\tilde{h}$ nondecreasing.
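The difference between Examples 4.10 and 4.11 shows up as soon as the inner function takes negative values. The sketch below composes both versions of $h$ with the convex function $g(x) = x^2 - 1$, which is an illustrative choice made here and not part of the original examples.

```python
import math
import random

def h_restricted(u):
    """Example 4.10: u**1.5 on dom h = [0, inf), extended by +inf elsewhere."""
    return u ** 1.5 if u >= 0 else math.inf

def h_full(u):
    """Example 4.11: u**1.5 for u >= 0 and 0 for u < 0, dom h = R."""
    return u ** 1.5 if u >= 0 else 0.0

def g(x):
    return x * x - 1.0          # convex inner function (illustrative choice)

# Composition with Example 4.10: infinite on (-1, 1), finite at x = -2 and x = 2.
mid_bad = h_restricted(g(0.0))
chord_bad = 0.5 * h_restricted(g(-2.0)) + 0.5 * h_restricted(g(2.0))
print(mid_bad <= chord_bad)     # False: inequality (1) fails

# Composition with Example 4.11: no violation of (1) on random samples.
rng = random.Random(0)
ok = True
for _ in range(10_000):
    x, y, th = rng.uniform(-3, 3), rng.uniform(-3, 3), rng.random()
    lhs = h_full(g(th * x + (1 - th) * y))
    rhs = th * h_full(g(x)) + (1 - th) * h_full(g(y))
    ok = ok and lhs <= rhs + 1e-9
print(ok)
```

In the first case the composition has the nonconvex domain $\{x : |x| \ge 1\}$, so it cannot be convex even though $h$ is convex and nondecreasing on its own domain.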


Similarly, for $k >1$, if $h$ is convex and $\tilde{h}$ is nondecreasing, whenever $u \preccurlyeq v$ we have that $\tilde{h}(u) \le \tilde{h}(v)$. This implies that if $v \in \textbf{dom }h$ then so is $u$: consequently the domain of $h$ must extend infinitely in the $-\mathbb{R}^k_{+}$ directions.

**Example 4.12:**

*   Let $h(z) = \sum_{i=1}^r z_{[i]}$, i.e. $h$ is the sum of the $r$ largest components of $z\in\mathbb{R}^k$. Note that $h$ is convex (as established in an example above) and nondecreasing in each argument. Suppose $g_1, \dots , g_k$ are convex functions of $\mathbb{R}^n$. Then $f = h \circ g$, i.e. the pointwise sum of the $r$ largest $g_i$’s, is convex.
*   Let $h(z) = \log(\sum_{i=1}^k e^{z_i})$. As shown earlier, log-sum-exp is convex. It is also nondecreasing in each argument, so $\log(\sum_{i=1}^k e^{g_i})$ is convex whenever $g_i$ are.
*   For $0 < p \le 1$, the function $h(z) = (\sum_{i=1}^k z_i^p)^{1/p}$ on $\mathbb{R}_+^k$ is concave, and its extension is nondecreasing in each component. So if $g_i$ are concave and nonnegative, we conclude that $f(x) = (\sum_{i=1}^k g_i(x)^p)^{1/p}$ is concave.
*   Suppose $p \ge 1$, and $g_1,\dots,g_k$ are convex and nonnegative. Then the function $(\sum_{i=1}^k g_i(x)^p)^{1/p}$ is convex.



## 4.2.3 Partial Minimisation

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/M-K4i8rvMQE"></iframe>')

We have seen that the maximum and the supremum of convex functions are convex. We now see that this holds also for a special kind of minimisation: **partial minimisation**. If $f$ is convex in $(x,y)$ and $C$ is a convex nonempty set, then the function

$$
g(x) = \inf_{y\in C} f(x,y)
$$
is convex in $x$. The domain of $g$ is $\textbf{dom }g=\{x : (x,y) \in \textbf{dom }f \text{ for some }y\in C,\,g(x) > - \infty\}$. (For comparison, we restate here the property for the supremum: if for each $y\in\mathcal{A}$, $f(x, y)$ is convex in $x$, then $g(x) = \sup_{y\in\mathcal{A}} f(x,y)$ is convex in $x$).

**Example 4.13:** Consider the quadratic function $f(x,y) = x^\top A x + 2x^\top B y + y^\top C y$ and assume that $f$ is convex in $(x,y)$ which means

$$
\left[\begin{array}{cc}A & B\\ B^\top & C\end{array}\right] \succcurlyeq 0.
$$

Minimising over $y$ gives $g(x) = \inf_y f(x,y) = x^\top (A-BC^{-1}B^\top) x$ (assuming $C$ is invertible). By partial minimisation $g(x)$ is convex, which implies that $A-BC^{-1}B^\top$ (called the Schur complement) is positive semidefinite.
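Example 4.13 can be reproduced numerically. The sketch below assumes a randomly generated positive definite block matrix, so that $C$ is invertible; the sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
G = rng.normal(size=(n + m, n + m))
K = G @ G.T + 0.1 * np.eye(n + m)     # positive definite block matrix [[A, B], [B^T, C]]
A, B, C = K[:n, :n], K[:n, n:], K[n:, n:]

def f(x, y):
    return x @ A @ x + 2 * x @ B @ y + y @ C @ y

S = A - B @ np.linalg.solve(C, B.T)   # Schur complement of C

x = rng.normal(size=n)
y_star = -np.linalg.solve(C, B.T @ x) # solves grad_y f = 2 B^T x + 2 C y = 0
matches = np.isclose(f(x, y_star), x @ S @ x)
min_eig = np.linalg.eigvalsh(S).min()
print(matches, min_eig)
```

The minimum over $y$ equals $x^\top S x$, and the smallest eigenvalue of $S$ is positive, as expected for this positive definite instance.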

**Example 4.14:** We have seen that the distance to the farthest point of a set is convex. Now we show that the distance to a set (i.e. to the closest point) is convex. In fact, this is defined as

$$
\textbf{dist }(x,S) = \inf_{y\in S} ||x-y||.
$$
The function $||x-y||$ is convex in $(x,y)$, so if the set $S$ is convex, then $\textbf{dist }(x,S)$ is a convex function of $x$.



## 4.2.4 Perspective of a Function

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/_6ldQPouxgA"></iframe>')

The perspective function $g: \mathbb{R}^{n+1} \to \mathbb{R}$ of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined by

$$
g(x,t) = t f(x/t)
$$

with domain $\textbf{dom }g = \{(x,t) : x/t \in \textbf{dom }f, t>0\}$. The **perspective function** preserves convexity and concavity. This can be proved in many ways, but it is interesting to look at a proof based on the epigraph. For $t>0$ we have

$$
(x,t,s) \in \textbf{epi } g \iff t f(x/t) \le s \iff f(x/t) \le s/t \iff (x/t,s/t) \in \textbf{epi } f.
$$
Thus, $\textbf{epi } g$ is the inverse image of $\textbf{epi } f$ under the perspective mapping $(u,v,w) \mapsto (u/v,w/v)$. Since the inverse image of a convex set under the perspective mapping is convex, $\textbf{epi } g$ is convex whenever $\textbf{epi } f$ is.


**Example 4.15:** $g(x,t)= \frac{x^\top x}{t}$ is convex for all $t>0$ because $f(x) =x^\top x$ is convex. This, of course, could have been shown in different ways, for instance using the quadratic-over-linear example.

**Example 4.16:** The relative entropy $g(x,t) = t \log t - t \log x$ is convex on $\mathbb{R}^2_{++}$ because it can be written as the perspective function of the negative logarithm $f(x) = -\log x$.
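The perspective operation is easy to express as a higher-order function. The sketch below builds the relative entropy of Example 4.16 from $-\log x$ and probes inequality $(1)$ by sampling; the helper name `perspective`, the sampling box and the tolerance are assumptions made here.

```python
import math
import random

def perspective(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("the perspective is only defined for t > 0")
        return t * f(x / t)
    return g

def neg_log(x):
    return -math.log(x)

rel_entropy = perspective(neg_log)

# Agreement with the closed form t*log(t) - t*log(x) at one point.
print(rel_entropy(2.0, 3.0), 3.0 * math.log(3.0) - 3.0 * math.log(2.0))

# Sample-based check of inequality (1) for g on the positive quadrant.
rng = random.Random(0)
ok = True
for _ in range(10_000):
    x1, t1 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
    x2, t2 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
    th = rng.random()
    lhs = rel_entropy(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * rel_entropy(x1, t1) + (1 - th) * rel_entropy(x2, t2)
    ok = ok and lhs <= rhs + 1e-9
print(ok)
```

As with any sampling test, the absence of violations is consistent with joint convexity in $(x,t)$ but does not prove it.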

**Example 4.17:** Let $t = c^\top x + d$ with $c \in \mathbb{R}^n$ and $d \in \mathbb{R}$. The function 

$$
g(x) = (c^\top x + d) f \left(\frac{Ax + b}{c^\top x + d}\right)
$$
with $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ is convex on $\textbf{dom }g= \{x: c^\top x + d >0, \, (Ax + b)/(c^\top x + d) \in \textbf{dom }f\}$ if $f$ is convex.


# 4.3 Generalisations of Convexity

Among the many possible ways in which convexity can be generalised, we now see a few concepts that are useful in practice.

## 4.3.1 Quasiconvex functions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/n3ltnHU4_eU"></iframe>')

The first concept is that of **quasiconvexity**. The utility of this notion is that as we will see in later lectures, a quasiconvex problem can be solved as a sequence of convex problems. Thus, we can use exactly the same algorithms of convex optimisation to solve quasiconvex problems. The drawback, of course, is that the computational cost increases.

A function $f: \mathbb{R}^n \to \mathbb{R}$ is called **quasiconvex** (aka unimodal) if its domain and all its sublevel sets $S_{\alpha} = \{x\in\textbf{dom }f : f(x) \le \alpha \}$ for $\alpha \in \mathbb{R}$ are convex.

A function is **quasiconcave** if $-f$ is quasiconvex, i.e. its domain and all its superlevel sets $S_{\alpha} = \{x\in\textbf{dom }f : f(x) \ge \alpha \}$ are convex.

A function which is both quasiconvex and quasiconcave is called **quasilinear**, i.e. its domain and all its level sets $S_{\alpha} = \{x\in\textbf{dom }f : f(x) = \alpha \}$ are convex.

For functions on $\mathbb{R}$ quasiconvexity simply requires that each sublevel set is an interval (possibly unbounded). In other words, the function cannot have more than one dip (otherwise some sublevel set would be the union of two disjoint intervals). See below for an example. 

<div>
<img src="https://drive.google.com/uc?export=view&id=1wOysYxbCBRYhq_ZC1eIBu4k89xPemUkQ" width="300"/>
</div>

Figure 4.4. *A quasiconvex function on $\mathbb{R}$. The sublevel set $S_{\alpha}$ is the interval $[a,b]$. The sublevel set $S_\beta$ is the interval $(-\infty, c]$.*

It is also possible to characterise quasiconvexity with a variation of Jensen's inequality. A function $f$ is quasiconvex if and only if $\textbf{dom} f$ is convex and for any $x,y \in \textbf{dom} f$ and $0\le \theta \le 1$ we have

$$
f(\theta x + (1-\theta)y) \le \max\{f(x), f(y)\}
$$

i.e. the value of the function on a segment does not exceed the maximum of its values at the endpoints. This is illustrated below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1axsUdUvjAof_lj5rdvs11mHK3qh7XPPx" width="300"/>
</div>

Figure 4.5. *A quasiconvex function on $\mathbb{R}$. The value of $f$ between $x$ and $y$ is no more than $\max\{f(x), f(y)\}$.*
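The modified Jensen inequality can be contrasted with inequality $(1)$ on a function that is quasiconvex but not convex. The sketch below uses $f(x) = \sqrt{|x|}$, an illustrative choice made here; the interval and the tolerance are arbitrary.

```python
import math
import random

def f(x):
    return math.sqrt(abs(x))     # quasiconvex on R, but not convex

rng = random.Random(0)
quasi_ok, convex_ok = True, True
for _ in range(10_000):
    x, y, th = rng.uniform(-4, 4), rng.uniform(-4, 4), rng.random()
    mid = f(th * x + (1 - th) * y)
    # Quasiconvex version: compare with the larger endpoint value.
    quasi_ok = quasi_ok and mid <= max(f(x), f(y)) + 1e-12
    # Convex version, inequality (1): compare with the chord.
    convex_ok = convex_ok and mid <= th * f(x) + (1 - th) * f(y) + 1e-12
print(quasi_ok, convex_ok)
```

The first flag stays true while the second becomes false, since $\sqrt{|x|}$ is concave on each half-line and therefore rises above its chords there.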

Like convexity, quasiconvexity is characterised by the behavior of a function $f$ on lines: $f$ is quasiconvex if and only if its restriction to any line intersecting its domain is quasiconvex.

Like convexity, quasiconvexity is characterised by first-order (and second-order) conditions whenever the function is (twice) differentiable. Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is differentiable. Then $f$ is quasiconvex if and only if $\textbf{dom }f$ is convex and for all $x,y\in\textbf{dom }f$

$$
f(y) \le f(x) \implies \nabla f(x)^\top (y-x) \le 0
$$

This condition has a simple geometric interpretation when $\nabla f(x) \ne 0$: $\nabla f(x)$ defines a supporting hyperplane to the sublevel set $\{y:f(y) \le f(x)\}$ at the point $x$. This is illustrated in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1ARQDubZnYSDDRCS1h8d6ilLL5j_UN6sq" width="300" height="250"/>
</div>

Figure 4.6. *Geometric interpretation of the first-order quasiconvexity condition.*

While the first-order conditions for convexity and quasiconvexity are similar, there are some important differences. For example, if $f$ is convex and $\nabla f(x) = 0$, then $x$ is a global minimizer of $f$. But this statement is false for quasiconvex functions: it is possible that $\nabla f(x) = 0$, and $x$ is not a global minimizer of $f$.

For completeness we state the second-order characterisation of quasiconvexity, without any detail. Suppose that $f$ is twice differentiable. If $\textbf{dom }f$ is convex and $f$ satisfies

$$
y^\top \nabla f(x) = 0 \implies y^\top \nabla^2 f(x) y >0
$$

for all $x\in \textbf{dom }f$ and all $y\in \mathbb{R}^n \setminus \{0\}$, then $f$ is quasiconvex.

Without going through the details, quasiconvexity is preserved under maximum, supremum, composition with a nondecreasing function and partial minimisation. Do not assume any other property without proving it. For instance, the seemingly innocent sum of quasiconvex functions is in general not quasiconvex. See the figure below for a counterexample.

<div>
<img src="https://drive.google.com/uc?export=view&id=1fWwCbtLg4ME2glL5LnagXzXeNb71-Jb9" width="400" height="250"/>
</div>

Figure 4.7. *The sum of two quasiconvex functions may not be quasiconvex.*


**Example 4.18:** $\log x$ on $\mathbb{R}_{++}$ is quasiconvex, quasiconcave and so quasilinear.

**Example 4.19:** $\text{ceil}(x) = \inf \{z \in \mathbb{Z} : z \ge x\}$ is quasiconvex, quasiconcave and so quasilinear.

**Example 4.20:** $f(x_1,x_2) = x_1 x_2$ on $\textbf{dom }f = \mathbb{R}_+^2$ is neither convex nor concave because the Hessian is indefinite, i.e. it has one positive and one negative eigenvalue. The function is quasiconcave because its superlevel sets $\{x \in \mathbb{R}^2_+ : x_1x_2 \ge \alpha\}$ are convex sets for all $\alpha$. Note that $f$ is not quasiconcave on all $\mathbb{R}^2$.

**Example 4.21:** The function 

$$
f(x) = \frac{a^\top x + b}{c^\top x + d}
$$

with $\textbf{dom }f=\{x : c^\top x + d > 0\}$ is quasiconvex, quasiconcave and so quasilinear. In fact its $\alpha$-sublevel sets are
$$
S_\alpha = \{x : c^\top x + d >0,\, a^\top x + b \le \alpha(c^\top x + d)\}
$$

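which is convex as the intersection of two halfspaces. The $\alpha$-superlevel sets are obtained by replacing $\le$ with $\ge$ and are convex for the same reason.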


### Representation via a family of convex functions

As mentioned earlier, the reason why we consider quasiconvexity is that we can represent the sublevel sets of any quasiconvex function through a family of convex functions. In particular, we want to find a parametrised representation of the sublevel sets of a quasiconvex function $f$ (which are convex) using inequalities of convex functions, i.e. we seek a family of convex functions $\phi_t : \mathbb{R}^n \to \mathbb{R}$, parametrised in $t\in \mathbb{R}$, with the property that

$$
f(x) \le t \iff \phi_t(x) \le 0.
$$

Evidently $\phi_t$ must satisfy the property that, for all $x \in \mathbb{R}^n$, $\phi_t(x) \le 0 \implies \phi_s(x) \le 0$ for $s \ge t$. This is satisfied for instance if, for each $x$, $\phi_t(x)$ is a nonincreasing function of $t$.

Thus, a family with such property is, for instance

$$
\phi_t(x) = \left\{\begin{array}{ll}0 & f(x) \le t \\ \infty & \text{otherwise.}\end{array}\right.
$$

This representation is not unique, and if possible we should find one that is differentiable.


**Example  4.22:** Consider the function $f(x) = \frac{p(x)}{q(x)}$ on a convex set $C$, with $p(x) \ge 0$ convex and $q(x)>0$ concave. Then $f(x)$ is quasiconvex and

$$
f(x) \le t \iff p(x) - t q(x) \le 0
$$

Thus we take $\phi_t(x) = p(x) - t q(x)$ for $t\ge 0$. For each $t \ge 0$, $\phi_t$ is convex (because $p(x)$ and $-tq(x)$ are convex) and for each $x$, $\phi_t(x)$ is decreasing in $t$ (because $q(x) > 0$).
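The representation in Example 4.22 can be illustrated on a concrete instance. In the sketch below, $p(x) = x^2 + 1$, $q(x) = 2 - x^2$ and $C = [-1, 1]$ are hypothetical choices satisfying the assumptions of the example ($p \ge 0$ convex, $q > 0$ concave on $C$). The code checks on a grid that $f(x) \le t \iff \phi_t(x) \le 0$ and that $\phi_t(x)$ is nonincreasing in $t$.

```python
def p(x):
    return x * x + 1.0      # convex and nonnegative

def q(x):
    return 2.0 - x * x      # concave, and positive on C = [-1, 1]

def f(x):
    return p(x) / q(x)      # quasiconvex ratio on C

def phi(t, x):
    return p(x) - t * q(x)  # convex in x for every t >= 0

# Dyadic grids, so the arithmetic below is exact in floating point.
xs = [i / 8 for i in range(-8, 9)]   # points of C
ts = [j / 8 for j in range(0, 25)]   # levels t in [0, 3]

# Sublevel sets of f coincide with the 0-sublevel sets of phi_t.
equivalent = all((f(x) <= t) == (phi(t, x) <= 0) for x in xs for t in ts)

# For fixed x, phi_t(x) does not increase when t increases.
nonincreasing = all(
    phi(s, x) <= phi(t, x) for x in xs for t in ts for s in ts if s >= t
)

print(equivalent, nonincreasing)
```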

## 4.3.2 Log-concave Functions

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/OOGCPJJOIeI"></iframe>')

**Errata:** There is a substantial mishap in the video starting at 6:40, regarding the preservation of log-convexity under integration. What is stated in the video is correct. However, what is really important is that **log-concavity** is also preserved under a special type of integration, which implies that log-concavity is preserved under convolution and probability. The text below has been changed with respect to the video so that the integration property refers to log-concavity, as intended. (Again, what is stated in the video is not wrong, simply not relevant in applications.)

The second generalisation is that of **log-concavity** (and log-convexity). A function $f : \mathbb{R}^n \to \mathbb{R}$ is log-concave (log-convex) if $f(x)>0$ for all $x \in \textbf{dom }f$ and $\log f$ is concave (convex). It follows that $f$ is log-convex if and only if $1/f$ is log-concave.

Note that here the main definition is that of log-concavity, rather than log-convexity. This is because log-concave problems are usually maximisation problems.

So log-concavity is really nothing special. The reason it deserves its own definition is just because some convex optimisation problems are naturally maximisations of logarithmic functions, which are concave.

Log-concavity can be characterised with a Jensen-like inequality. A function $f : \mathbb{R}^n \to \mathbb{R}$ with convex domain and $f(x)>0$ for all $x \in \textbf{dom }f$ is log-concave if and only if for all $x,y \in \textbf{dom }f$ and $0 \le \theta \le 1$ we have

$$
f(\theta x + (1-\theta)y) \ge f(x)^\theta f(y)^{1-\theta},
$$

which means that the value of a log-concave function at the average of two points is at least the geometric mean of the values at the two points.
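The geometric-mean inequality lends itself to a simple numerical test. The following sketch searches a grid for a violation of $f(\theta x + (1-\theta)y) \ge f(x)^\theta f(y)^{1-\theta}$. The helper `log_concavity_violation` and the two test functions (a standard Gaussian density, which is log-concave, and a Cauchy density, which is not) are choices made for this illustration.

```python
import math

def log_concavity_violation(f, xs, thetas, rtol=1e-9):
    """Search a grid for (x, y, theta) with f(theta*x + (1-theta)*y) < f(x)^theta * f(y)^(1-theta).

    Returns the first violating triple found, or None if the grid contains none.
    """
    for x in xs:
        for y in xs:
            for th in thetas:
                z = th * x + (1 - th) * y
                geo_mean = f(x) ** th * f(y) ** (1 - th)
                if f(z) < geo_mean * (1 - rtol):
                    return (x, y, th)
    return None

def gaussian(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cauchy(x):
    return 1.0 / (math.pi * (1.0 + x * x))

xs = [i / 2 for i in range(-12, 13)]   # grid on [-6, 6]
thetas = [i / 8 for i in range(9)]

print("Gaussian violation:", log_concavity_violation(gaussian, xs, thetas))
print("Cauchy violation:  ", log_concavity_violation(cauchy, xs, thetas))
```

For the Cauchy density, take $x = 2$, $y = 6$ and $\theta = 1/2$: the value at the midpoint is proportional to $1/17$, while the geometric mean is proportional to $1/\sqrt{5 \cdot 37} > 1/17$.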



**Example 4.23:** The affine function $f(x) = a^\top x + b$ is log-concave on $\{x : a^\top x + b > 0\}$, since $\log$ is a concave function and concavity is preserved under composition with an affine map.

**Example 4.24:** For the same reason, the function $f(x)=x^\alpha$ on $\mathbb{R}_{++}$ is log-convex for $\alpha \le 0$ and log-concave for $\alpha\ge 0$.

**Example 4.25:** Exponentials $f(x)=e^{ax}$ are log-convex and log-concave.

**Example 4.26:** Many common probability densities are log-concave, such as the normal distribution

$$
f(x) =\frac{1}{\sqrt{(2\pi)^n \det \Sigma}}  e^{-\frac{1}{2}(x-\bar x)^\top\Sigma^{-1}(x-\bar x)}
$$

the cumulative Gaussian distribution function

$$
f(x)=\frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-u^2/2}du
$$

and the uniform distribution over a convex set (since it is constant on the set and zero outside, its logarithm is constant on the set and $-\infty$ outside).

Log-concavity of twice differentiable functions can be characterised by means of the Hessian. In fact, $f$ is log-concave if and only if $\textbf{dom }f$ is convex and for all $x\in\textbf{dom }f$

$$
f(x) \nabla^2 f(x) \preccurlyeq \nabla f(x) \nabla f(x)^\top.
$$
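In the scalar case the condition reads $f(x) f''(x) \le f'(x)^2$. As an illustration, the sketch below checks it on a grid for the cumulative Gaussian distribution function $\Phi$, for which $\Phi' $ is the standard normal density and $\Phi''(x) = -x\,\Phi'(x)$. It then shows that the condition fails at $x = 0$ for $h(x) = 1 + x^2$, a hypothetical example of a positive function that is not log-concave on $\mathbb{R}$. The grid $[-5, 5]$ is an arbitrary choice.

```python
import math

def pdf(x):
    """Standard normal density, i.e. the first derivative of cdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution, written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def d2cdf(x):
    """Second derivative of cdf."""
    return -x * pdf(x)

xs = [i / 10 for i in range(-50, 51)]   # grid on [-5, 5]

# Scalar second-order condition: f * f'' <= (f')^2.
cdf_condition_holds = all(cdf(x) * d2cdf(x) <= pdf(x) ** 2 for x in xs)
print(cdf_condition_holds)

# Contrast: h(x) = 1 + x^2 has h' = 2x and h'' = 2, so at x = 0
# the condition would require 2 <= 0.
h, dh, d2h = 1.0, 0.0, 2.0
h_condition_at_zero = h * d2h <= dh ** 2
print(h_condition_at_zero)
```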

Log-concavity is preserved under product (because $\log (f(x)g(x)) = \log f(x) + \log g(x)$) and a special type of integration, but not under addition. The integration property is particularly important. If $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is log-concave jointly in $(x, y)$, with $x \in \mathbb{R}^n$ and $y\in \mathbb{R}^m$, then

$$
g(x) = \int_{\mathbb{R}^m} f(x,y) dy
$$

is log-concave in $x$. This property is important because it implies that log-concavity is preserved under convolution and [probability](https://colab.research.google.com/drive/1WwNsPbW7-2PyVM_fXsOCUX7O7q8lQM3W#scrollTo=ZpyjDG4HTVEl).

**Example 4.27:** Suppose $D \subseteq \mathbb{R}^n$ is a convex set and $w$ is a random vector in $\mathbb{R}^n$ with log-concave probability density $p$. Then $f(x) = \textbf{prob}(x+w \in D)$ is log-concave in $x$. In fact,
$$
\textbf{prob}(x+w \in D) = \int g(x+w)p(w) dw
$$
where $g$ is such that 
$$
g(u)=\left\{\begin{array}{ll}1 & u\in D\\ 0 & u \not\in D\end{array}\right.
$$
The function $g$ is log-concave (because $D$ is convex), $p$ is log-concave by assumption, their product $g(x+w)p(w)$ is log-concave in $(x,w)$ by the product rule, and its integral over $w$ is log-concave in $x$ by the integration rule.
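A one-dimensional instance of Example 4.27 can be checked directly. In the sketch below the choices $w \sim \mathcal{N}(0,1)$ and $D = [-1, 1]$ are assumptions made for illustration. In this case $\textbf{prob}(x + w \in D) = \Phi(1 - x) - \Phi(-1 - x)$, where $\Phi$ is the cumulative Gaussian distribution function, and the code tests the geometric-mean inequality for log-concavity on a grid.

```python
import math

def cdf(x):
    """Standard normal cumulative distribution (erfc form, accurate in the tails)."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def prob_in_D(x, lower=-1.0, upper=1.0):
    """prob(x + w in [lower, upper]) for w ~ N(0, 1)."""
    return cdf(upper - x) - cdf(lower - x)

xs = [i / 4 for i in range(-16, 17)]   # grid on [-4, 4]
thetas = [i / 8 for i in range(9)]

# Log-concavity: value at a convex combination >= weighted geometric mean.
# The factor (1 - 1e-9) absorbs floating-point rounding.
log_concave_on_grid = all(
    prob_in_D(th * x + (1 - th) * y)
    >= prob_in_D(x) ** th * prob_in_D(y) ** (1 - th) * (1 - 1e-9)
    for x in xs
    for y in xs
    for th in thetas
)
print(log_concave_on_grid)
```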

## 4.3.3 Convexity with Respect to Generalised Inequalities

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/ktXLWM_RIUw"></iframe>')

Convexity with respect to generalised inequalities is just what the name suggests.

Suppose $K \subseteq \mathbb{R}^m$ is a proper cone with associated generalised inequality $\preccurlyeq_K$. We say that $f : \mathbb{R}^n  \to \mathbb{R}^m$ is $K$-convex if for all $x$, $y$, and $0 \le \theta \le 1$

$$
f(\theta x + (1 − \theta)y) \preccurlyeq_K \theta f(x) + (1 − \theta)f(y).
$$

The function is strictly $K$-convex if

$$
f(\theta x + (1 − \theta)y) \prec_K \theta f(x) + (1 − \theta)f(y).
$$

for all $x \ne y$ and $0 < \theta < 1$. These definitions reduce to ordinary convexity and strict convexity when $m = 1$ and $K = \mathbb{R}_+$.

Many of the results for convex functions have extensions to $K$-convex functions. As a simple example, a function is $K$-convex if and only if its restriction to any line in its domain is $K$-convex. $K$-convexity can be characterised by means of first-order and second-order conditions. The composition rules hold as well. 

**Example 4.28:** Let $f : \mathbb{S}^m \to \mathbb{S}^m$ be defined as $f(X) = X^2$. $f$ is $\mathbb{S}^m_+$-convex (also called "**matrix convex**"). In fact, for fixed $z \in \mathbb{R}^m$, $z^\top X^2 z = || Xz ||_2^2$ is convex in $X$ because it is a convex quadratic function of the components of $X$ (the squared Euclidean norm composed with the linear map $X \mapsto Xz$), i.e.

$$
z^\top(\theta X + (1-\theta)Y)^2 z \le \theta z^\top X^2 z + (1- \theta)z^\top Y^2 z
$$

for $X,Y\in \mathbb{S}^m$, $0 \le \theta \le 1$. Since this holds for every $z \in \mathbb{R}^m$, we conclude that $(\theta X + (1-\theta)Y)^2 \preccurlyeq  \theta X^2  + (1- \theta) Y^2$.
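The matrix inequality of Example 4.28 can be verified numerically. The sketch below, which assumes NumPy is available, draws random symmetric matrices and checks that the gap $\theta X^2 + (1-\theta)Y^2 - (\theta X + (1-\theta)Y)^2$ is positive semidefinite by inspecting its smallest eigenvalue. The dimension, seed and number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric(m):
    """Random element of S^m (symmetrised Gaussian matrix)."""
    A = rng.standard_normal((m, m))
    return (A + A.T) / 2

m = 4
min_eigs = []
identity_holds = True
for _ in range(200):
    X, Y = random_symmetric(m), random_symmetric(m)
    theta = rng.uniform()
    Z = theta * X + (1 - theta) * Y
    # Gap between the chord and the function value, a symmetric matrix.
    gap = theta * (X @ X) + (1 - theta) * (Y @ Y) - Z @ Z
    min_eigs.append(np.linalg.eigvalsh(gap).min())
    # Expanding the squares gives gap = theta * (1 - theta) * (X - Y)^2.
    identity_holds &= np.allclose(gap, theta * (1 - theta) * (X - Y) @ (X - Y))

# All smallest eigenvalues should be nonnegative up to rounding error.
print(min(min_eigs) >= -1e-10, identity_holds)
```

The identity checked in the loop gives an alternative argument: $\theta(1-\theta)(X-Y)^2$ is positive semidefinite because it is a nonnegative multiple of the square of a symmetric matrix.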

# End of CHAPTER 4