# CHAPTER 3 - Convex Sets

---
---

**Author:** Dr Giordano Scarciotti (g.scarciotti@imperial.ac.uk) - Imperial College London 

**Module:** ELEC70066 - Advanced Optimisation

**Version:** 1.1.0 - 05/01/2023

---
---

The material of this chapter is adapted from $[1]$.

In this chapter we cover the definitions, theory and properties of convex sets. The material in this chapter is mathematically dense, but it is essential to understand convex optimisation. Contents:

*   Section 3.1 Affine Sets
*   Section 3.2 Convex Sets
*   Section 3.3 Examples of Convex Sets
*   Section 3.4 Operations that Preserve Convexity
*   Section 3.5 Generalized Inequalities
*   Section 3.6 Separating and Supporting Hyperplanes
*   Section 3.7 Dual Cones and Generalised Inequalities



# 3.1 Affine Sets

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/OcAV_kGRhfc"></iframe>')

Suppose $x_1 \ne x_2$ are two points in $\mathbb{R}^n$. The **line** passing through $x_1$ and $x_2$ is formed by the points of the form

$$
y = \theta x_1 + (1-\theta) x_2
$$

where $\theta\in\mathbb{R}$. The parameter value $\theta = 0$ corresponds to $y = x_2$, and the parameter value $\theta=1$ corresponds to $y = x_1$. For $\theta>1$ the point $y$ lies beyond $x_1$ and for $\theta< 0$ the point $y$ lies beyond $x_2$. Values of $\theta$ such that $0\le \theta \le 1$ correspond to the (closed) **line segment** between $x_1$ and $x_2$.

<div>
<img src="https://drive.google.com/uc?export=view&id=1SAcPWwZLnQJm9CQ1fniKDeywaDsAjJav" width="600"/>
</div>

Figure 3.1. *The line passing through $x_1$ and $x_2$ described parametrically as $\theta x_1 + (1-\theta) x_2$, with $\theta \in \mathbb{R}$. The line segment between $x_1$ and $x_2$ is shown in red.*

A set $C\subseteq \mathbb{R}^n$ is an **affine set** if the line through any two distinct points in $C$ lies in $C$. Equivalently, $C$ is an affine set if every **affine combination** $\theta_1 x_1 + \cdots+ \theta_k x_k$, with $\theta_1 + \cdots+\theta_k = 1$, of its points $x_1, \dots, x_k \in C$ belongs to $C$. If $C$ is an affine set and $x_0 \in C$, then the set $V = C - x_0 = \{x - x_0 : x \in C\}$ is a **subspace**, i.e., it is closed under sums and scalar multiplication. The **dimension** of an affine set $C$ is defined as the dimension of the subspace $V$.



**Exercise 3.1:** Prove that the solution set of a system of linear equations, $C = \{x : Ax = b\}$, where $A\in \mathbb{R}^{m \times n}$ and $b \in\mathbb{R}^m$, is an affine set. 

***EDIT THE FILE TO ADD YOUR PROOF HERE***

Note that the converse is also true: every affine set can be expressed as the solution set of a system of linear equations.
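The following is a numerical illustration (not a proof) of the statement in Exercise 3.1, assuming NumPy is available. The matrix `A`, the vector `b` and the random seed are hypothetical data chosen only for this sketch. It builds two solutions of $Ax=b$ and checks that their affine combinations, including those with $\theta$ outside $[0,1]$, still solve the system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a 2x4 system, so the solution set is an affine set in R^4.
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

# One particular solution plus a basis N of the null space of A.
x_part = np.linalg.lstsq(A, b, rcond=None)[0]
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T  # columns span {v : A v = 0} when A has full row rank

# Two distinct solutions of A x = b.
x1 = x_part + N @ rng.standard_normal(2)
x2 = x_part + N @ rng.standard_normal(2)

# Affine combinations: theta may lie outside [0, 1].
residuals = []
for theta in (-3.0, 0.25, 1.0, 7.5):
    y = theta * x1 + (1 - theta) * x2
    residuals.append(np.linalg.norm(A @ y - b))

max_residual = max(residuals)
dimension = N.shape[1]  # dimension of the subspace V = C - x_0

print(max_residual, dimension)
```

The residual stays at the level of rounding error for every $\theta$, and the number of null-space directions gives the dimension of the affine set in this example.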

The set of all affine combinations of points in some set $C\subseteq \mathbb{R}^n$ is called the **affine hull** of $C$ and is denoted by $\textbf{aff }C$. The idea is that if you are given a set and you want to "make it" affine, you can generate the affine hull, which is a larger set that contains the original set and is affine (in fact, the affine hull is the smallest affine set containing the original set). The **affine dimension** of a set $C$ is the dimension of its affine hull.

**Example 3.1:** Consider the unit circle on the plane. It is not an affine set, because a line would either be tangent or cut through two points, with most of the line not lying on the circle. If you consider all the possible lines passing through the unit circle you will get the plane, which is in fact its affine hull. So even though the dimension of the circle is 1 (this should be known, you need only an angle to identify a point on the unit circle) its affine hull has dimension 2.
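As a rough numerical companion to Example 3.1, the sketch below estimates the affine dimension of a finite sample of points. It assumes NumPy; the helper name `affine_dimension` and the sample points are hypothetical. The idea is the one used in the definition of dimension: subtract one point to obtain the subspace $V$ and compute its dimension as a matrix rank.

```python
import numpy as np

def affine_dimension(points, tol=1e-9):
    """Dimension of the affine hull of the rows of `points` (at least two rows)."""
    points = np.asarray(points, dtype=float)
    # Subtracting one point turns the affine hull into the subspace V = aff C - x_0.
    differences = points[1:] - points[0]
    return int(np.linalg.matrix_rank(differences, tol=tol))

# Twelve points sampled on the unit circle, and four points on a line.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
line = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [-1.0, -2.0]])

circle_dim = affine_dimension(circle)
line_dim = affine_dimension(line)
print(circle_dim, line_dim)
```

The sampled circle has affine dimension 2 even though the circle itself is a one-dimensional curve, while the collinear points have affine dimension 1.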



# 3.2 Convex Sets

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/Imnr2Iln_MA"></iframe>')

Consider a set $C$. A **convex combination** of the points $x_1, \dots, x_k \in C$ is a point of the form $\theta_1 x_1 + \cdots+ \theta_k x_k$, with $\theta_1 + \cdots+\theta_k = 1$ and $\theta_i\ge 0$, for all $i = 1,\dots, k$. A set $C$ is a **convex set** if and only if it contains every convex combination of its points. Roughly speaking, a set is convex if every point in the set can be seen by every other point, along an unobstructed straight path between them, where unobstructed means lying in the set. Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefore also the line segment between the points.

<div>
<img src="https://drive.google.com/uc?export=view&id=1Yp4luLpNFjxMvVbh10xyw320E3JBF6nH" width="800"/>
</div>

Figure 3.2. *Some simple convex and nonconvex sets. From left we have a convex set (parallelogram), and three nonconvex sets: kidney shaped set, triangle without all the boundary points, and a square with a hole.*

The **convex hull** of a set $C$, denoted by $\textbf{conv } C$, is the set of all convex combinations of points in $C$. Again, the idea is that if you have a set that is not convex and you want to "make it" convex, you can consider a larger set that contains the original set and is convex (in fact, the convex hull is the smallest convex set containing the original set).

<div>
<img src="https://drive.google.com/uc?export=view&id=1zM9--rs9galTzJno0NpMkhSggKqRoSNi" width="600"/>
</div>

Figure 3.3. *The convex hulls of two sets. Left: the convex hull of 17 points is a pentagon (in this case). Right: the convex hull of the kidney shaped set.*
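For a finite set of points in the plane, the convex hull is a polygon whose vertices can be computed directly. The sketch below uses Andrew's monotone chain algorithm, which is one standard choice and not something prescribed by this chapter. The function names and the point set are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_vertices(points):
    """Vertices of conv(points) in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

# Hypothetical point set: four corners of a square plus three interior points.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5), (0.5, 1.5)]
hull = convex_hull_vertices(points)
print(hull)
```

Only the four corners survive: the interior points are convex combinations of the corners and therefore add nothing to the hull.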

A set $C$ is called a **cone** if for every $x \in C$ and $\theta \ge 0$ we have $\theta x \in C$. A set $C$ is a **convex cone** if it is convex and a cone.

A **conic combination** of the points $x_1, \dots, x_k \in C$ is a point of the form $\theta_1 x_1 + \cdots+ \theta_k x_k$, with $\theta_i\ge 0$, for all $i = 1,\dots, k$. A set $C$ is a **convex cone** if and only if it contains every conic combination of its points.

<div>
<img src="https://drive.google.com/uc?export=view&id=1astHDI_NUERHPDhQF1GhDKNCIM1tKpfi" width="300"/>
</div>

Figure 3.4. *Two-dimensional convex cone, i.e. pie slice with apex $0$ ($\theta_1=0$, $\theta_2=0$) and edges passing through $x_1$ ($\theta_1=1$, $\theta_2=0$) and $x_2$ ($\theta_1=0$, $\theta_2=1$).*

The **conic hull** of a set $C$ is the set of all conic combinations of points in $C$.

<div>
<img src="https://drive.google.com/uc?export=view&id=1S1JVGitfm_NzA7p23ZXlMON-KfCZixmv" width="600"/>
</div>

Figure 3.5. *The conic hulls of the two sets of Figure 3.3.*

# 3.3 Examples of Convex Sets

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/v2EDaCWBSX8"></iframe>')

Some trivial examples of affine and convex sets follow.

* The empty set $\emptyset$, any set with a single point (called singleton) $\{x_0\}$, and the whole space $\mathbb{R}^n$ are affine and (hence) convex.
* Any line is affine. If it passes through zero, it is a subspace and a convex cone.
* A line segment is convex. Obviously it is not affine (unless you consider the degenerate line segment constituted by a single point).
* A ray, i.e. a semi-line which has a terminal/origin point on one side and goes on forever on the other side, is convex, but not affine. It is a convex cone if the terminal/origin point is 0.
* Any subspace is affine and a convex cone.


**Exercise 3.2:** Prove each of the trivial statements above by using the corresponding definition.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

A **hyperplane** is a set of the form $\{x : a^{\top}x = b\}$, where $a\in \mathbb{R}^n$, $a\ne 0$ and $b \in \mathbb{R}$. Analytically it is the solution set of a nontrivial linear equation among the components of $x$. Geometrically, the hyperplane is the set of points with zero inner product between the normal vector $a$ and the vector $x-x_0$, where $x_0$ is any point such that $a^{\top}x_0=b$ (just rewrite the equation as $a^{\top}x - b =a^{\top}(x-x_0) =0$). A zero inner product indicates that the vectors $a$ and $x-x_0$ are perpendicular. Hyperplanes are affine and convex (trivially, because they are the solution set of linear equations).



<div>
<img src="https://drive.google.com/uc?export=view&id=1MlMj0ycyq6ZIn_xzuZf6cpZtSDiS2OH9" width="400"/>
</div>

Figure 3.6. *Hyperplane in $\mathbb{R}^2$, with normal vector $a$ and a point $x_0$ in the hyperplane. For any point $x$ in the hyperplane, $x-x_0$ (shown as the darker arrow) is orthogonal to $a$.*

A (closed) **halfspace** is a set of the form $\{x : a^{\top}x \le b\}$, where $a\in \mathbb{R}^n$, $a \ne 0$ and $b \in \mathbb{R}$. A geometric interpretation is that a halfspace is the set of points $x$ for which the inner product between the normal vector $a$ and the vector $x-x_0$ is nonpositive, where $x_0$ is any point such that $a^{\top}x_0=b$. Thus $x-x_0$ makes an obtuse (or right) angle with the outward normal $a$. A hyperplane divides $\mathbb{R}^n$ into two halfspaces. The boundary of the halfspace $\{x : a^{\top}x \le b\}$ is the hyperplane $\{x : a^{\top}x = b\}$. The set $\{x : a^{\top}x < b\}$ is its interior and it is called an **open halfspace**. Halfspaces are convex, but not affine (just trace the line through any two points of the halfspace such that the line is not parallel to the boundary hyperplane: the line will eventually exit the halfspace).


<div>
<img src="https://drive.google.com/uc?export=view&id=12ECPR_OHQ8YNXoi5WdM4DC8DGlwkaDyR" width="400"/>
</div>

Figure 3.7. *A hyperplane defined by $a^\top x = b$ in $\mathbb{R}^2$ determines two halfspaces. The halfspace determined by $a^\top x \ge  b$ (not shaded) is the halfspace
extending in the direction $a$.*

<div>
<img src="https://drive.google.com/uc?export=view&id=1NNUaeZ5wV3-aT8mIQkACQdpDOqLsnCnP" width="400"/>
</div>

Figure 3.8. *The shaded set is the halfspace determined by $a^\top (x − x_0) \le 0$. The vector $x_1−x_0$ makes an acute angle with $a$, so $x_1$ is not in the halfspace. The vector $x_2 − x_0$ makes an obtuse angle with $a$, and so is in the halfspace.*
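The sign test behind Figure 3.8 can be written in a few lines. This is a minimal sketch in plain Python; the function name `in_halfspace` and the numerical values of $a$, $x_0$, $x_1$, $x_2$ are hypothetical and are not read off the figure.

```python
def in_halfspace(a, x0, x, tol=1e-12):
    """True if x lies in the halfspace {x : a^T (x - x0) <= 0}."""
    inner = sum(ai * (xi - x0i) for ai, xi, x0i in zip(a, x, x0))
    return inner <= tol

a = (1.0, 1.0)    # outward normal
x0 = (1.0, 1.0)   # a point on the boundary hyperplane
x1 = (2.0, 1.5)   # a^T (x1 - x0) = 1.5 > 0: acute angle with a
x2 = (0.0, 1.5)   # a^T (x2 - x0) = -0.5 < 0: obtuse angle with a

x1_inside = in_halfspace(a, x0, x1)
x2_inside = in_halfspace(a, x0, x2)
print(x1_inside, x2_inside)
```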

Suppose $|| \cdot ||$ is any norm on $\mathbb{R}^n$. The **norm ball** of radius $r$ and center $x_c$ is given by $\{x : ||x-x_c|| \le r\}$. Any norm ball is convex. One especially important ball is the **Euclidean ball**, indicated by $\mathcal{B}(x_c, r) = \{x : ||x-x_c||_2 \le r\} = \{x : (x-x_c)^{\top}(x-x_c) \le r^2\}$ or equivalently as $\mathcal{B}(x_c, r) = \{x_c + ru : ||u||_2 \le 1\}$. A related concept is that of the **ellipsoid**, which is defined by $\{x : (x-x_c)^{\top}P^{-1}(x-x_c) \le 1\}$ where $P$ is symmetric and positive definite. Of course we recover the ball when $P=r^2 I$.

**Exercise 3.3:** Prove that any norm ball and the ellipsoid are convex. 

***EDIT THE FILE TO ADD YOUR PROOF HERE***

<div>
<img src="https://drive.google.com/uc?export=view&id=1atcdYmhh-znkrxTp6EAsWItogiiFVE1m" width="300"/>
</div>

Figure 3.9. *An ellipsoid in $\mathbb{R}^2$.*

The **norm cone** is the set $C = \{(x,t) : ||x||\le t\} \subseteq \mathbb{R}^{n+1}$. Of course, it is convex. Of special interest is the **second-order cone** (aka quadratic cone, Lorentz cone, ice-cream cone), which uses the Euclidean norm.


<div>
<img src="https://drive.google.com/uc?export=view&id=1LfFXBjX5VI393hI7lAWxbo-6egQnLLXk" width="200"/>
</div>

Figure 3.10. *Second-order cone in $\mathbb{R}^3$, i.e. $\left\{(x_1,x_2,t): (x_1^2 + x_2^2)^{\frac{1}{2}}\le t\right\}$.*

A **polyhedron** is defined as $\mathcal{P} = \{x: a_j^{\top}x \le b_j,\, j=1,\dots,m,\, c_j^{\top}x = d_j,\, j=1,\dots,p\}$ (a bounded polyhedron is sometimes called a polytope, although conventions vary). Hence, a polyhedron is the intersection of a finite number of halfspaces and hyperplanes. Affine sets (e.g., subspaces, hyperplanes, lines), rays, line segments, and halfspaces are all polyhedra. Polyhedra are convex (it will be obvious from the property of preservation under intersection which is introduced later). Polyhedra can be written with compact notation as $\mathcal{P} = \{x: Ax \preccurlyeq b,\, Cx=d\}$ where $A=[a_1, \cdots, a_m]^{\top}$, $b=[b_1, \cdots, b_m]^{\top}$, $C = [c_1, \cdots, c_p]^{\top}$, $d=[d_1, \cdots, d_p]^{\top}$, and $\preccurlyeq$ denotes componentwise inequality.




<div>
<img src="https://drive.google.com/uc?export=view&id=1nOaqDryVycTXFQRvH9qmMs-20xLAQQF-" width="500"/>
</div>

Figure 3.11. *A polyhedron $\mathcal{P}$ defined by the intersection of five halfspaces, with outward normals $a_1$, ..., $a_5$.*

An important example of a polyhedron is the **nonnegative orthant** $\mathbb{R}_+^n =\{x\in\mathbb{R}^n : x \succcurlyeq 0\}$. It is also a cone, and it is an example of a *polyhedral cone*.

Other examples of polyhedra are **simplexes** and convex hulls of finite sets of points. If you are interested, see p. 32-34 of $[1]$ for more details.

The final important example of convex sets is the **positive semidefinite cone** $\mathbb{S}^n_+ $. Consider the set of symmetric $n \times n$ matrices $\mathbb{S}^n = \{X \in \mathbb{R}^{n \times n} : X=X^{\top}\}$, which is a vector space of dimension $n(n+1)/2$. We define the set of symmetric positive semidefinite matrices as $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : X \succcurlyeq 0\}$ and the set of symmetric positive definite matrices as $\mathbb{S}^n_{++} = \{X \in \mathbb{S}^n : X \succ 0\}$.

The trick to make positive (semi)-definite algebra click is to realize that the symmetric matrix $X$ plays in $\mathbb{S}^n_+$ and $\mathbb{S}^n_{++}$ the same role that $x$ plays in $\mathbb{R}^n_+$ and $\mathbb{R}^n_{++}$. We know that $\mathbb{R}^n_+$ is a convex cone, thus we expect $\mathbb{S}^n_+$ to be a convex cone (and indeed it is).
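Before attempting a proof, it can help to see the claim numerically. The sketch below assumes NumPy; the helpers `is_psd` and `random_psd`, the matrix size and the random seed are hypothetical. It checks that a few conic combinations of two positive semidefinite matrices stay in $\mathbb{S}^n_+$, and that a negative multiple does not. This is an experiment, not a substitute for the proof.

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """True if the symmetric matrix X has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

rng = np.random.default_rng(2)

def random_psd(n):
    """A random PSD matrix of the form G G^T (hypothetical test data)."""
    G = rng.standard_normal((n, n))
    return G @ G.T

X1, X2 = random_psd(4), random_psd(4)

# Conic combinations theta1 X1 + theta2 X2 with theta1, theta2 >= 0.
conic_ok = all(
    is_psd(t1 * X1 + t2 * X2)
    for t1 in (0.0, 0.5, 3.0)
    for t2 in (0.0, 1.0, 10.0)
)

# A negative multiple is not a conic combination and leaves the cone.
negative_ok = is_psd(-X2)
print(conic_ok, negative_ok)
```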

**Exercise 3.4:** Prove that the set $\mathbb{S}^n_+$ is a convex cone.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

<div>
<img src="https://drive.google.com/uc?export=view&id=1ncczaRLulefW92wQ3kAbP8oHFqoHCCc9" width="400"/>
</div>

Figure 3.12. *Boundary of positive semidefinite cone in $\mathbb{S}^2$.*

# 3.4 Operations that Preserve Convexity



In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/GYKSCnMitSo"></iframe>')

In this section we describe some operations that preserve convexity of sets, or allow us to construct convex sets from others.


Convexity is preserved under **intersection**. This property extends to the intersection of an infinite number of sets (also subspaces, affine sets, and convex cones are closed under arbitrary intersections). As a simple example, a polyhedron is the intersection of halfspaces and hyperplanes (which are convex), and therefore is convex. 

Note that a converse holds for closed convex sets (so not for all convex sets): every closed convex set $S$ is an intersection of halfspaces. In fact, a closed convex set $S$ is the (generally uncountable) intersection of all halfspaces that contain it

$$
S = \bigcap \,\{\mathcal{H} : \mathcal{H} \text{ halfspace}, S \subseteq \mathcal{H}\}.
$$

**Example 3.2:** The positive semidefinite cone can be expressed as the uncountable intersection
$$
\mathbb{S}_+^n = \displaystyle \bigcap_{z\ne 0}\,\{X \in \mathbb{S}^n : z^{\top} X z \ge 0 \}.
$$

Note that $z^{\top} X z$ is a linear function of $X$, so the sets $\{X \in \mathbb{S}^n : z^{\top} X z \ge 0 \}$ are halfspaces in $\mathbb{S}^n $. Thus the positive semidefinite cone is convex because it is the intersection of an infinite number of halfspaces.


An **affine function** is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ that is a sum of a linear function and a constant, i.e., it has the form $f(x) = Ax + b$, where $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. Suppose $S \subseteq \mathbb{R}^n$ is convex and $f : \mathbb{R}^n \to \mathbb{R}^m$ is an affine function. Then the **image** of $S$ under $f$

$$
f(S) = \{f(x) : x \in S\}
$$

is convex. Similarly, the **pre-image** (inverse image) of a convex set $S \subseteq \mathbb{R}^m$ under an affine function $f : \mathbb{R}^n \to \mathbb{R}^m$

$$
f^{-1}(S) = \{x : f(x) \in S\}
$$

is convex. Simple examples include **scaling** ($f(x)=\alpha x$), **translation** ($f(x)= x + a$), **projection** (e.g. $f(x_1,x_2)=x_1$), the **sum of two convex sets** (apply $f(x_1,x_2)=x_1+x_2$ to the set $S = S_1 \times S_2$, which is their Cartesian product and is itself convex), and the **partial sum of two convex sets** (similar).


**Example 3.3:** The polyhedron $\{x : Ax \preccurlyeq b, Cx = d\}$ can be expressed as the inverse image of the cartesian product of the nonnegative orthant and the origin under the affine function $f(x) = (b-Ax, d-Cx)$, i.e.

$$
\{x : Ax \preccurlyeq b, Cx = d\} = \{x : f(x) \in \mathbb{R}^m_+ \times \{0\} \}.
$$
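The inverse-image description in Example 3.3 translates directly into a membership test: evaluate $f(x) = (b-Ax,\, d-Cx)$ and check that the first block is in the nonnegative orthant and the second block is zero. The sketch below assumes NumPy; the function name `in_polyhedron`, the tolerance and the example polyhedron (the probability simplex in $\mathbb{R}^3$) are hypothetical choices.

```python
import numpy as np

def in_polyhedron(x, A, b, C, d, tol=1e-9):
    """Membership test via f(x) = (b - A x, d - C x) in R^m_+ x {0}."""
    slack = b - A @ x        # must lie in the nonnegative orthant
    residual = d - C @ x     # must equal the origin
    return bool(np.all(slack >= -tol) and np.all(np.abs(residual) <= tol))

# Hypothetical polyhedron in R^3: the probability simplex {x : -x <= 0, 1^T x = 1}.
A = -np.eye(3)
b = np.zeros(3)
C = np.ones((1, 3))
d = np.array([1.0])

inside = in_polyhedron(np.array([0.2, 0.3, 0.5]), A, b, C, d)
negative_entry = in_polyhedron(np.array([1.2, -0.2, 0.0]), A, b, C, d)
off_hyperplane = in_polyhedron(np.array([0.2, 0.2, 0.2]), A, b, C, d)
print(inside, negative_entry, off_hyperplane)
```

The second point satisfies the equality but violates an inequality, and the third satisfies the inequalities but not the equality, so only the first point belongs to the polyhedron.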

**Example 3.4:** The solution set of a linear matrix inequality (LMI), $\{x : A(x) \preccurlyeq B\}$, where $A(x) = x_1 A_1 + \dots + x_n A_n$, with $A_i \in \mathbb{S}^m$ and $B\in \mathbb{S}^m $, is convex. Indeed, it is the inverse image of the positive semidefinite cone under the affine function $f : \mathbb{R}^n \to \mathbb{S}^m$ given by $f(x) = B - A(x)$.

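**Example 3.5:** The ellipsoid $\{x : (x-x_c)^{\top}P^{-1}(x-x_c) \le 1\}$ is the image of the unit Euclidean ball under the affine mapping $f(u)=P^{1/2}u+x_c$. Indeed, if $x = P^{1/2}u + x_c$ then $(x-x_c)^{\top}P^{-1}(x-x_c) = u^\top u \le 1$. (It is also the inverse image of the unit ball under the affine mapping $g(x) = P^{-1/2}(x-x_c)$.)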
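The mapping between the unit ball and the ellipsoid can be checked numerically. The sketch below assumes NumPy and uses the ellipsoid definition $\{x : (x-x_c)^{\top}P^{-1}(x-x_c) \le 1\}$ from Section 3.3; the matrix `P`, the centre `x_c` and the seed are hypothetical. It maps sampled points of the unit Euclidean ball through $u \mapsto P^{1/2}u + x_c$, with $P^{1/2}$ the symmetric square root, and evaluates the quadratic form at the mapped points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ellipsoid data: P symmetric positive definite, centre x_c.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
x_c = np.array([1.0, -1.0])

# Symmetric square root P^{1/2} from the eigendecomposition P = Q diag(w) Q^T.
w, Q = np.linalg.eigh(P)
P_half = Q @ np.diag(np.sqrt(w)) @ Q.T

# Sample points u with ||u||_2 <= 1: points outside the ball are pulled onto the sphere.
U = rng.standard_normal((500, 2))
U /= np.maximum(1.0, np.linalg.norm(U, axis=1, keepdims=True))

# Map through f(u) = P^{1/2} u + x_c (P_half is symmetric, so rows are P_half @ u).
X = U @ P_half + x_c

# Evaluate (x - x_c)^T P^{-1} (x - x_c) for every mapped point.
D = X - x_c
quad = np.einsum("ij,ij->i", D @ np.linalg.inv(P), D)
max_quad = float(quad.max())
print(max_quad)
```

Every mapped point satisfies the ellipsoid inequality up to rounding error, with points on the unit sphere landing on the boundary of the ellipsoid.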

The **perspective function** $P: \mathbb{R}^n \times \mathbb{R}_{++} \to \mathbb{R}^n$ is defined as $P(z,t)=z/t$. The perspective function normalises vectors so that their last component is one, and then drops that component. If a set $C \subseteq \textbf{dom }P$ is convex, then the image $P(C)=\{P(x) : x\in C\}$ is convex. A generalization is obtained by composing the perspective function with an affine function. This yields the **linear-fractional function** (aka projective function), i.e. let $g(x) = [A^\top\,\, c]^\top x + [b^\top\,\, d]^\top$ be an affine function, with $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$ and $d \in \mathbb{R}$, then the linear-fractional function is $f(x) = P \circ g(x) = (Ax+b)/(c^\top x +d)$ with $\textbf{dom }f=\{x: c^{\top}x + d>0\}$. Like the perspective function, linear-fractional functions preserve convexity. This follows from the results above: the image of a convex set under an affine function is convex, and in turn the image of this set under the perspective function is convex.
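One way to see why the perspective function preserves convexity is that it maps line segments to line segments. For $x=(\tilde x, x_{n+1})$ and $y=(\tilde y, y_{n+1})$ in the domain, $P(\theta x + (1-\theta)y) = \mu P(x) + (1-\mu)P(y)$ with $\mu = \theta x_{n+1} / (\theta x_{n+1} + (1-\theta) y_{n+1}) \in [0,1]$. The sketch below, assuming NumPy and two hypothetical points, verifies this identity on a grid of values of $\theta$.

```python
import numpy as np

def perspective(p):
    """P(z, t) = z / t for p = (z, t) with t > 0."""
    z, t = p[:-1], p[-1]
    if t <= 0:
        raise ValueError("the last component must be positive")
    return z / t

# Two hypothetical points in R^2 x R_++.
x = np.array([1.0, 2.0, 1.0])
y = np.array([4.0, -2.0, 4.0])

max_gap = 0.0
for theta in np.linspace(0.0, 1.0, 11):
    # Image of a point on the segment between x and y.
    image = perspective(theta * x + (1 - theta) * y)
    # The same image written as a convex combination of P(x) and P(y).
    mu = theta * x[-1] / (theta * x[-1] + (1 - theta) * y[-1])
    on_segment = mu * perspective(x) + (1 - mu) * perspective(y)
    max_gap = max(max_gap, float(np.linalg.norm(image - on_segment)))

print(max_gap)
```

Note that $\mu$ is a monotone but nonlinear function of $\theta$: the segment is preserved as a set, but points are not mapped at uniform speed along it.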


<div>
<img src="https://drive.google.com/uc?export=view&id=1IkhaKUsYi8b3efwti6ieZiAGYSbDzHwY" width="600"/>
</div>

Figure 3.13. *Left. A (nonconvex) set $C \subseteq \mathbb{R}^2$. The line shows the boundary of the domain of the linear-fractional function $f(x) = x/(x_1 + x_2 + 1)$. Right. Image of $C$ under $f$ (still nonconvex).*

# 3.5 Generalized Inequalities

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/xx2MXWp5sr4"></iframe>')

A cone $K \subseteq \mathbb{R}^n$ is called a **proper cone** if it is convex, closed, solid (i.e. non-empty interior) and pointed (i.e. it contains no line).

A proper cone can be used to define **generalised inequalities**. We associate with the proper cone $K$ the partial ordering on $\mathbb{R}^n$ defined by

$$
x \preccurlyeq_K y \iff y-x \in K.
$$

For the strict version, we define

$$
x \prec_K y \iff y-x \in \textbf{int }K.
$$
Similarly, we can define $x \succcurlyeq_K y$ and $x \succ_K y$.

Note that when $K=\mathbb{R}_+$, then the partial ordering $\preccurlyeq_K$ reduces to the usual ordering $\le$ on $\mathbb{R}$.

**Example 3.6:** The nonnegative orthant $K=\mathbb{R}_+^n$ is a proper cone. The associated generalized inequality $\preccurlyeq_K$ corresponds to componentwise inequality between vectors. This is a very common operation, so we drop the subscript, i.e. $\preccurlyeq$.

**Example 3.7:** The positive semidefinite cone $K=\mathbb{S}_+^n$ is a proper cone. The associated generalized inequality $\preccurlyeq_K$ is the usual matrix inequality: $X \preccurlyeq_K Y$ means $Y − X$ is positive semidefinite. This is a very common operation between symmetric matrices, so we drop the subscript, i.e. $\preccurlyeq$.




A generalized inequality $\preccurlyeq_K$ satisfies the properties:
*    $\preccurlyeq_K$ is preserved under addition: if $x\preccurlyeq_K y$ and $u\preccurlyeq_K v$, then $x+u\preccurlyeq_K y+v$.
*    $\preccurlyeq_K$ is transitive: if $x\preccurlyeq_K y$ and $y\preccurlyeq_K z$ then $x\preccurlyeq_K z$.
*    $\preccurlyeq_K$ is preserved under nonnegative scaling: if $x\preccurlyeq_K y$ and $\alpha \ge 0$ then $\alpha x\preccurlyeq_K \alpha y$.
*    $\preccurlyeq_K$ is reflexive: $x\preccurlyeq_K x$.
*    $\preccurlyeq_K$ is antisymmetric: if $x\preccurlyeq_K y$ and $y\preccurlyeq_K x$, then $x = y$.
*    $\preccurlyeq_K$ is preserved under limits: if $x_i\preccurlyeq_K y_i$ for $i = 1, 2, \dots$, and $x_i \to x$ and $y_i \to y$ as $i\to \infty$, then $x\preccurlyeq_K y$.

The corresponding strict generalized inequality $ \prec_K$ satisfies:
*    if $x \prec_K y$ then $x \preccurlyeq_K y$.
*    if $x \prec_K y$ and $u \preccurlyeq_K v$ then $x+u \prec_K y+v$.
*    if $x \prec_K y$ and $\alpha>0$  then $\alpha x \prec_K \alpha y$.
*    $x \not \prec_K x$ ($x$ is not strictly less than itself).
*    if $x \prec_K y$, then for $u$ and $v$ small enough, $x+u \prec_K y+v$.

The notation of generalised inequality (i.e., $\preccurlyeq_K$ and $\prec_K$) is meant to suggest an analogy with the ordinary inequality on $\mathbb{R}$ (i.e., $\le$, $<$). While many properties of ordinary inequality do hold for generalised inequalities, some important ones do not. The most obvious difference is that $\le$ on $\mathbb{R}$ is a **linear ordering**: any two points are **comparable**, meaning either $x \le y$ or $y \le x$. This property does not hold for generalised inequalities. One implication is that concepts like minimum and maximum are more complicated in the context of generalised inequalities.
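The failure of comparability is easy to exhibit for the two cones of Examples 3.6 and 3.7. The sketch below assumes NumPy; the function names `leq_orthant` and `leq_psd` and the test points are hypothetical. Each function checks $y - x \in K$ directly from the definition.

```python
import numpy as np

def leq_orthant(x, y, tol=1e-12):
    """x <=_K y for K = R^n_+, i.e. y - x has nonnegative components."""
    return bool(np.all(np.asarray(y) - np.asarray(x) >= -tol))

def leq_psd(X, Y, tol=1e-12):
    """X <=_K Y for K = S^n_+, i.e. Y - X is positive semidefinite."""
    return bool(np.linalg.eigvalsh(np.asarray(Y) - np.asarray(X)).min() >= -tol)

# Two vectors that are not comparable under componentwise inequality.
x, y = np.array([1.0, 2.0]), np.array([2.0, 1.0])
vectors_comparable = leq_orthant(x, y) or leq_orthant(y, x)

# Two PSD matrices that are not comparable under the matrix inequality.
X, Y = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
matrices_comparable = leq_psd(X, Y) or leq_psd(Y, X)

# The ordering still relates some pairs, e.g. the zero vector and y.
zero_leq_y = leq_orthant(np.zeros(2), y)
print(vectors_comparable, matrices_comparable, zero_leq_y)
```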



We say that $x\in S$ is the **minimum** (**maximum**) element of $S$ (with respect to the generalized
inequality $\preccurlyeq_K$) if for every $y\in S$ we have $x \preccurlyeq_K y$ ($x \succcurlyeq_K y$). If a set has a minimum (maximum) element, then it is unique.

We say that $x\in S$ is a **minimal** (**maximal**) element of $S$ (with respect to the generalized inequality $\preccurlyeq_K$) if $y\in S$, $y \preccurlyeq_K x$ ($y \succcurlyeq_K x$) only if $y = x$.

We can describe minimum and minimal elements using simple set notation. A point $x \in S$ is the minimum element of $S$ if and only if $S \subseteq x + K$. Here $x + K$ denotes all the points that are comparable to $x$ and greater than or equal to $x$ (according to $\preccurlyeq_K$). A point $x \in S$ is a minimal element if and only if $(x - K) \cap S = \{x\}$. Here $x-K$ denotes all the points that are comparable to $x$ and less than or equal to $x$ (according to $\preccurlyeq_K$); so $x$ is minimal if and only if the only point in common of $x-K$ with $S$ is $x$.
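For a finite set and $K=\mathbb{R}^2_+$, both set conditions can be checked by brute force. The sketch below is plain Python; the function names and the two sets `S1` and `S2` are hypothetical. The first set has a minimum element, while the second has several minimal elements and no minimum.

```python
def leq(x, y):
    """Componentwise inequality x <= y, i.e. y - x in K = R^2_+."""
    return all(xi <= yi for xi, yi in zip(x, y))

def minimum_element(S):
    """The point x with S contained in x + K, or None if no such point exists."""
    for x in S:
        if all(leq(x, y) for y in S):
            return x
    return None

def minimal_elements(S):
    """Points x such that the intersection of x - K with S is {x}."""
    return [x for x in S if all(y == x for y in S if leq(y, x))]

# Hypothetical finite sets in R^2.
S1 = [(1, 1), (2, 1), (1, 3), (3, 3)]
S2 = [(1, 3), (2, 2), (3, 1), (3, 3)]

print(minimum_element(S1), minimal_elements(S1))
print(minimum_element(S2), minimal_elements(S2))
```

In `S1` the minimum element is also the only minimal element. In `S2` the points $(1,3)$, $(2,2)$ and $(3,1)$ are pairwise incomparable, so each is minimal and none is the minimum.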

An example of minimum and minimal elements is represented in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1FKnZ0p9nqAm_RMPp7NyfwY4g571HIHSc" width="500"/>
</div>

Figure 3.14. *Left. The set $S$ has a minimum element $x_1$ with respect to the cone $K=\mathbb{R}_+^2$, i.e. with respect to componentwise inequality in $\mathbb{R}^2$. $x_1$ is the minimum element of $S$ since $S\subseteq x_1 + K$. Right. The point $x_2$ is a minimal point of $S$. The point $x_2$ is minimal because $x_2 − K$ and $S$ intersect only at $x_2$. Note that all the points on the same edge as $x_2$ are minimal.*

# 3.6 Separating and Supporting Hyperplanes

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/9XA6MmSltDY"></iframe>')

**Errata:** At 2:53 the video says "it has its boundary". It should be "it does not have its boundary".

An idea that is useful in convex optimisation is the use of hyperplanes to separate convex sets that do not intersect. The basic result is the **separating hyperplane theorem**: Suppose $C$ and $D$ are nonempty disjoint convex sets. Then there exist $a\ne 0$ and $b$ such that $a^\top x \le b$ for all $x\in C$ and $a^\top x \ge b$ for all $x\in D$. The hyperplane $\{x : a^\top x = b\}$ is called a **separating hyperplane** for the sets $C$ and $D$.





<div>
<img src="https://drive.google.com/uc?export=view&id=1z4Hix4dAUMh-fTLYCZd6QZm6O6gyJZ6M" width="500"/>
</div>

Figure 3.15. *The hyperplane $\{x : a^\top x = b\}$ separates the disjoint convex sets $C$ and $D$. The affine function $a^\top x − b$ is nonpositive on $C$ and nonnegative
on $D$.*
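For two disjoint Euclidean balls a separating hyperplane can be written down explicitly: take the normal $a$ along the line of centres and place the hyperplane halfway between the closest points of the two balls. The sketch below assumes NumPy; the centres and radii are hypothetical. It uses the closed-form values $\sup_{x\in C} a^\top x = a^\top c + r_c$ and $\inf_{x\in D} a^\top x = a^\top d - r_d$, which hold for balls when $||a||_2 = 1$.

```python
import numpy as np

# Hypothetical disjoint Euclidean balls C = B(c, r_c) and D = B(d, r_d).
c, r_c = np.array([0.0, 0.0]), 1.0
d, r_d = np.array([4.0, 3.0]), 2.0

# Unit normal along the line of centres, and the closest points of the two balls.
a = (d - c) / np.linalg.norm(d - c)
p_c = c + r_c * a
p_d = d - r_d * a

# Hyperplane a^T x = b through the midpoint of the closest points.
b = float(a @ (p_c + p_d) / 2.0)

# Extreme values of a^T x over each ball (closed form, valid because ||a||_2 = 1).
sup_C = float(a @ c + r_c)
inf_D = float(a @ d - r_d)
print(sup_C, b, inf_D)
```

Here $a^\top x - b$ is nonpositive on $C$ and nonnegative on $D$, as in Figure 3.15. In this particular example the inequalities are strict, which is not guaranteed in general.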

If the separating hyperplane constructed above satisfies the stronger condition that $a^\top x < b$ for all $x\in C$ and $a^\top x > b$ for all $x\in D$, then this is called **strict separation** of the sets $C$ and $D$. **ATTENTION:** Simple examples show that in general, disjoint convex sets need not be strictly separable by a hyperplane, i.e. the separating hyperplane theorem holds for the non-strict inequalities but not for the strict inequalities. For instance, consider an open convex set $C$ and a singleton $D$ on its boundary. Another example with closed convex sets is given by $C = \{x \in \mathbb{R}^2 : x_2 \le 0\}$ and $D=\{x \in \mathbb{R}^2_+ : x_1 x_2 \ge 1\}$: these are closed disjoint non-empty convex sets which cannot be strictly separated.

The converse of the separating hyperplane theorem (i.e., existence of a separating hyperplane implies that convex sets $C$ and $D$ do not intersect) is not true, unless one imposes additional constraints on $C$ or $D$. As a simple counterexample, consider $C = D = \{0\} \subseteq \mathbb{R}$. Here the hyperplane $x = 0$ separates (non-strictly) $C$ and $D$ because $x$ is nonpositive on $C$ and nonnegative on $D$, but their intersection is not empty. However, by adding conditions on $C$ and $D$ various converse separation theorems can be derived. For instance, any two convex sets $C$ and $D$, where at least one of these is open, are disjoint if and only if there exists a separating hyperplane.

Suppose $C \subseteq \mathbb{R}^n$, and $x_0$ is a point in its boundary $\textbf{bd } C$, i.e.,
$x_0 \in  \textbf{bd } C = \textbf{cl } C \setminus \textbf{int } C$. If $a \ne 0$ satisfies $a^\top x \le a^\top x_0$ for all $x\in C$, then the hyperplane $\{x : a^\top x = a^\top x_0\}$ is called a **supporting hyperplane** to $C$ at the point $x_0$. This is equivalent to saying that the point $x_0$ and the set $C$ are separated by the hyperplane $\{x : a^\top x = a^\top x_0\}$. The geometric interpretation is that the hyperplane $\{x : a^\top x = a^\top x_0\}$ is tangent to $C$ at $x_0$, and the halfspace $\{x : a^\top x \le a^\top x_0\}$ contains $C$.


<div>
<img src="https://drive.google.com/uc?export=view&id=1KdDj8a_YsJOODSMh4rlgZUNZpPODF3Po" width="400"/>
</div>

Figure 3.16. *The hyperplane $\{x : a^\top x = a^\top x_0\}$ supports $C$ at $x_0$.*

A basic result, called the **supporting hyperplane theorem**, states that for any nonempty convex set $C$, and any $x_0 \in \textbf{bd } C$, there exists a supporting hyperplane to $C$ at $x_0$. There is also a partial converse of the supporting hyperplane theorem: if a set
is closed, has nonempty interior, and has a supporting hyperplane at every point
in its boundary, then it is convex.

# 3.7 Dual Cones and Generalised Inequalities

In [None]:
from IPython.display import HTML
HTML('<iframe width="850" height="480" src="https://www.youtube.com/embed/of-V0cmYi44"></iframe>')

Let $K$ be a cone. The set $K^∗ = \{y : x^\top  y \ge  0 \text{ for all } x \in K\}$ is called the **dual cone** of $K$. As the name suggests, $K^∗$ is a cone. The dual cone is always convex, even when the original cone $K$ is not. Geometrically, $y\in K^∗$ if and only if $−y$ is the (outer) normal of a hyperplane that supports $K$ at the origin.

<div>
<img src="https://drive.google.com/uc?export=view&id=1Zl1nIcflCC8utjP5GNs7xXGdZWjpTJL7" width="400"/>
</div>

Figure 3.17. *Left. The halfspace with inward normal $y$ contains the cone $K$, so $y\in K^∗$. Right. The halfspace with inward normal $z$ does not contain $K$, so $z \not \in K^∗$.*

Dual cones have the following properties:


*   $K^*$ is closed and convex.
*   If $K_1 \subseteq K_2$, then $K_2^* \subseteq K_1^*$.
*   If $K$ has nonempty interior, then $K^*$ is pointed.
*   If the closure of $K$ is pointed, then $K^*$ has nonempty interior.
*   $K^{**}$ is the closure of the convex hull of $K$. Thus if $K$ is convex and closed, then $K^{**}=K$.
*   If $K$ is a proper cone, then so is $K^*$ and $K^{**}=K$.


**Example 3.8:** The cone $\mathbb{R}^n_+$ is **self-dual**, i.e. it is its own dual. In fact, $x^\top y \ge 0$ for all $x \succcurlyeq 0$ $\iff$ $y \succcurlyeq 0$.

**Example 3.9:** As usual, what holds for $\mathbb{R}^n_+$ holds also for the positive semidefinite cone $\mathbb{S}^n_+$. $\mathbb{S}^n_+$ is self-dual with respect to the standard inner product $\textbf{tr}(XY)= \sum_{i,j=1}^n X_{ij}Y_{ij}$, i.e. $\textbf{tr}(XY)\ge 0 \text{ for all } X \succcurlyeq 0 \iff Y \succcurlyeq 0$.

**Exercise 3.5:** Prove the statement above.

***EDIT THE FILE TO ADD YOUR PROOF HERE***

**Example 3.10:** The dual of the norm cone $K = \{(x, t) \in \mathbb{R}^{n+1} : ||x|| \le t\}$ is the cone defined by the dual norm, i.e., $K^∗ = \{(u, v) \in \mathbb{R}^{n+1} : ||u||_* \le v\}$, where the dual norm is given by $||u||_∗ = \sup\{u^\top x : ||x|| \le 1\}$.
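As a concrete instance of Example 3.10, take the $\ell_1$ norm, whose dual norm is the $\ell_\infty$ norm. The sketch below assumes NumPy; the function name `dual_norm_of_l1`, the vector `u` and the seed are hypothetical. The supremum in the definition of the dual norm is computed by evaluating $u^\top x$ at the extreme points $\pm e_i$ of the $\ell_1$ unit ball, where a linear function attains its maximum. The second part samples points $(x,t)$ of the norm cone and checks the pairing with a boundary point $(u,v)$ of the claimed dual cone.

```python
import numpy as np

def dual_norm_of_l1(u):
    """sup{u^T x : ||x||_1 <= 1}, attained at a signed unit vector."""
    u = np.asarray(u, dtype=float)
    n = u.size
    # Extreme points of the l1 unit ball: +e_i and -e_i.
    vertices = np.vstack([np.eye(n), -np.eye(n)])
    return float((vertices @ u).max())

u = np.array([3.0, -7.0, 2.0])
dual_value = dual_norm_of_l1(u)  # coincides with the l-infinity norm of u

# (u, v) with v = ||u||_inf sits on the boundary of the claimed dual cone.
v = dual_value
rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 3))
t = np.abs(X).sum(axis=1)  # (x, t) with t = ||x||_1 lies in the norm cone K
min_pairing = float((X @ u + t * v).min())
print(dual_value, min_pairing)
```

The pairing $x^\top u + tv$ never becomes negative on the sampled points of $K$, consistent with $(u,v) \in K^*$. This is a sanity check for one norm only and does not replace the proof requested in the exercise that follows.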

**Exercise 3.6:** (difficult) To prove the statement above you need to show that 

$$
x^\top u + tv \ge 0 \text{ whenever } ||x||\le t \iff ||u||_* \le v. \tag{1}
$$

In fact, the left-hand side of (1) is exactly the definition of the dual cone with $y=(u,v)$ and $\hat x=(x,t)$, i.e. a point $y=(u,v)$ in the dual cone is such that $\hat x^\top y =[x^\top\,\,t]\, [u^\top\,\,v]^\top =x^\top u + tv\ge0$ for all $\hat x \in K$, which are all $\hat x$ such that $||x|| \le t$.

Prove $(1)$, i.e. that the points in the dual cone of $K$ are all, and only those, that satisfy $||u||_* \le v$.



***EDIT THE FILE TO ADD YOUR PROOF HERE***

Suppose that the convex cone $K$ is proper, so it induces a generalised inequality $\preccurlyeq_K$. Then its dual cone $K^∗$ is also proper, and therefore induces a generalised
inequality $\preccurlyeq_{K^*}$. Generalised inequalities and their dual are related by the following properties:
*   $x \preccurlyeq_K y$ if and only if $\lambda^\top  x \le \lambda^\top y$ for all $\lambda \succcurlyeq_{K^*} 0$.
*   $x \prec_K y$ if and only if $\lambda^\top  x < \lambda^\top y$ for all $\lambda \succcurlyeq_{K^*} 0$, $\lambda \ne 0$.

We can use dual generalized inequalities to characterize minimum and minimal elements.

$x$ is the minimum element of $S$, with respect to the generalized inequality induced by $K$, if and only if for all $\lambda \succ_{K^*} 0$, $x$ is the unique minimizer of $\lambda^\top z$ over $z \in S$. Geometrically, this means that for any $\lambda \succ_{K^*} 0$, the hyperplanes $\{z : \lambda^\top (z − x) = 0\}$ are strict supporting hyperplanes to $S$ at $x$ (by strict supporting hyperplanes, we
mean that the hyperplanes intersect $S$ only at the point $x$). Note that convexity of the set $S$ is not required.

<div>
<img src="https://drive.google.com/uc?export=view&id=1lyjIthsECxmLovnQes3JIVFvirLXBIu1" width="300"/>
</div>

Figure 3.18. *Dual characterization of minimum element. The point $x$ is the minimum element of the (non-convex) set $S$ with respect to $\mathbb{R}^2_+$. This is equivalent to: for every $\lambda \succ 0$, the hyperplane $\{z : \lambda^\top (z − x) = 0\}$ strictly supports $S$ at $x$.*

$x$ is a minimal element of $S$, with respect to the generalized inequality induced by $K$, if there exists $\lambda \succ_{K^*} 0$ such that $x$ minimizes $\lambda^\top z$ over $z \in S$ (**N.B.** this condition is only sufficient).

<div>
<img src="https://drive.google.com/uc?export=view&id=1Ffu9MkMuzXUDkPaTkSTkfCxfkmGwkUyb" width="400"/>
</div>

Figure 3.19. *A set $S \subseteq \mathbb{R}^2$. Its set of minimal points, with respect to $\mathbb{R}^2_+$, is shown as the darker section of its (lower, left) boundary. The minimizer of $\lambda_1^\top z$ over $S$ is $x_1$, and is minimal since $\lambda_1 \succ 0$. The minimizer of $\lambda_2^\top z$ over $S$ is $x_2$, which is another minimal point of $S$, since $\lambda_2 \succ 0$.*

The converse is in general false: a point $x$ can be minimal in $S$, but not a minimizer of $\lambda^\top z$ over $z \in S$ for any $\lambda$. 

<div>
<img src="https://drive.google.com/uc?export=view&id=1diM3eYC120BGvx6pceCyRthqD_fxzQST" width="300"/>
</div>

Figure 3.20. *The point $x$ is a minimal element of $S \subseteq \mathbb{R}^2$ with respect to $\mathbb{R}^2_+$. However there exists no $\lambda$ for which $x$ minimizes $\lambda^\top z$ over $z \in S$.*

One can create a converse statement by increasing the assumptions. Provided the set $S$ is convex, we can say that for any minimal element $x$ there exists a nonzero $\lambda \succcurlyeq_{K^*} 0$ (**N.B.** weaker than $\lambda \succ_{K^*} 0$) such that $x$ minimizes $\lambda^\top z$ over $z \in S$.

This converse theorem cannot be strengthened to $\lambda \succ_{K^*} 0$. In fact, examples show that a point $x$ can be a minimal point of a convex set $S$, but not a minimizer of $\lambda^\top z$ over $z \in S$ for any $\lambda \succ_{K^*} 0$. Nor is it true that any minimizer of $\lambda^\top z$ over $z \in S$, with $\lambda \succcurlyeq_{K^*} 0$, is minimal. These two cases are illustrated in the figure below.

<div>
<img src="https://drive.google.com/uc?export=view&id=1aGeMpMatrpBvOi3k1q1IB3AbHnhS0hz_" width="400"/>
</div>

Figure 3.21. *Left. The point $x_1 \in S_1$ is minimal, but is not a minimizer of $\lambda^\top z$ over $S_1$ for any $\lambda \succ 0$. (It does, however, minimize $\lambda^\top z$ over $z \in S_1$ for $\lambda = (1, 0)$). Right. The point $x_2 \in S_2$ is not minimal, but it does minimize $\lambda^\top z$ over $z \in S_2$ for $\lambda  = (0, 1) \succcurlyeq  0$.*

**Example 3.11:** Consider a product which requires $2$ resources, such as labor and fuel, to manufacture. The product can be manufactured or produced in many ways. With each production method, we associate a resource vector $x \in\mathbb{R}^2$, where $x_i$ denotes the amount of resource $i$ consumed by the method to manufacture the product. We assume that $x_i \ge 0$ (i.e., resources are consumed) and that the resources are valuable. The production set $P\subseteq \mathbb{R}^2$ is defined as the set of all resource vectors $x$ that correspond to some production method.

Production methods with resource vectors that are minimal elements of $P$, with respect to componentwise inequality, are called **Pareto optimal** or efficient. The set of minimal elements of $P$ is called the **efficient production frontier**. The interpretation of Pareto optimality is that one production method
is better than another if it uses no more of each resource than another method, and
for at least one resource, actually uses less. Thus, a production method is Pareto optimal if there is no better production method.

Pareto optimal production methods can be found using the (sufficient, but not necessary) dual characterisation of minimal elements, i.e. we want to minimize $\lambda^\top  x = \lambda_1x_1 + \lambda_2 x_2$ over the set $P$ of production vectors, using any $\lambda \succ 0$. The component $\lambda_i$ can be interpreted as the price of resource $i$. By minimizing $\lambda^\top x$ over $P$ we are finding the overall cheapest production method (for the resource prices $\lambda_i$).

<div>
<img src="https://drive.google.com/uc?export=view&id=1QItupMPLwurWq4mizkVXYaQE3OD8EEdS" width="500"/>
</div>

Figure 3.22. *The production set $P$, for a product that requires labor and fuel to produce, is shown shaded. The two dark curves show the efficient production frontier. The points $x_1$, $x_2$ and $x_3$ are efficient. The points $x_4$ and $x_5$ are not (since in particular, $x_2$ corresponds to a production method that uses no more fuel, and less labor). The point $x_1$ is also the minimum cost production method for the price vector $\lambda$ (which is positive). The point $x_2$ is efficient, but cannot be found by minimizing the total cost $\lambda^\top x$ for any price vector $\lambda\succcurlyeq0$. Source: page 58 of $[1]$.*
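The same phenomenon can be reproduced on a small finite production set. The sketch below is plain Python; the production set `P`, the list of price vectors and the function names are hypothetical and are not the data behind Figure 3.22. It computes the Pareto optimal methods by brute force and compares them with the methods found by minimising $\lambda^\top x$ for a handful of positive price vectors.

```python
def weighted_cost(lam, x):
    """Total cost lambda^T x of a production method x at prices lam."""
    return sum(l * xi for l, xi in zip(lam, x))

def cheapest(P, lam):
    """A production method minimising lambda^T x over the finite set P."""
    return min(P, key=lambda x: weighted_cost(lam, x))

def pareto_optimal(P):
    """Minimal elements of P with respect to componentwise inequality."""
    def leq(x, y):
        return all(xi <= yi for xi, yi in zip(x, y))
    return [x for x in P if all(y == x for y in P if leq(y, x))]

# Hypothetical production set: (labour, fuel) consumed by each production method.
P = [(1.0, 6.0), (2.0, 5.5), (3.0, 2.0), (6.0, 1.0), (5.0, 5.0)]

frontier = pareto_optimal(P)

# A few strictly positive price vectors (hypothetical).
prices = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0), (1.0, 0.2), (0.1, 1.0)]
found = {cheapest(P, lam) for lam in prices}

print(frontier)
print(sorted(found))
```

Every method found by price minimisation is Pareto optimal, as the sufficient condition guarantees. The method $(2, 5.5)$ is Pareto optimal too, but it lies above the segment joining $(1, 6)$ and $(3, 2)$, so no positive price vector makes it the cheapest; the tried prices never select it.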

# End of CHAPTER 3