So far, we have become familiar with many optimization problems. Learning Machine Learning means learning optimization, and in my experience the best way to understand optimization better is to study Machine Learning algorithms.

So far, the optimization problems you have seen have all been ***unconstrained optimization problems***: we optimize the loss function directly, without any conditions (***constraints***) on the solution.

In practice, and not only in Machine Learning, optimization problems often come with many different constraints. For example:

- I want to rent a house no more than 5km from the center of Hanoi at the lowest possible price. In this problem, the rental price is the loss function (sometimes the term cost function is also used for the function to be optimized), and **the condition that the distance is not more than 5km is the constraint**.

In Optimization, a constrained problem is often written in the form:

$$ \vec{x}^{*} = \text{argmin}_{\vec{x}} f_{0}(\vec{x}) $$

subject to:

$$ f_{i}(\vec{x}) \leq 0, \quad i = 1, 2, \dots, m $$
$$ h_{i}(\vec{x}) = 0, \quad i = 1, 2, \dots, p $$

Where:
- $\vec{x} = [x_1, x_2, \cdots, x_n]$ is the optimization variable.
- $f_0(\vec{x})$ is the objective function to be optimized (often called the loss function).
- $f_i(\vec{x})$ and $h_i(\vec{x})$ are the constraint functions: the $f_i$ define the inequality constraints and the $h_i$ define the equality constraints.

The set of all $\vec{x}$ that satisfy every constraint is called the **feasible set**. Each point in the feasible set is called a **feasible point**.
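
To make the vocabulary concrete, here is a minimal sketch of the house-rental example as a tiny constrained problem over a finite list of candidates. The listings, their prices and the helper names (`constraint`, `objective`) are made up for illustration; only the 5km limit comes from the example above.

```python
# Hypothetical listings: (name, distance to the center in km, monthly rent).
listings = [
    ("A", 2.0, 450),
    ("B", 4.5, 300),
    ("C", 6.0, 200),
    ("D", 5.0, 320),
    ("E", 8.5, 150),
]

MAX_DISTANCE_KM = 5.0

def constraint(listing):
    """f_1(x) = distance - 5; the point is feasible when f_1(x) <= 0."""
    _, distance, _ = listing
    return distance - MAX_DISTANCE_KM

def objective(listing):
    """f_0(x) = rent, the quantity we want to minimize."""
    _, _, rent = listing
    return rent

# The feasible set contains every candidate that satisfies the constraint.
feasible_set = [x for x in listings if constraint(x) <= 0]

# Constrained optimum: cheapest listing inside the feasible set.
best = min(feasible_set, key=objective)

# Unconstrained optimum: cheapest listing overall, possibly infeasible.
unconstrained_best = min(listings, key=objective)

print([name for name, _, _ in feasible_set])
print(best[0], unconstrained_best[0])
```

In this toy data the cheapest listing overall is too far from the center, so the constrained and unconstrained solutions differ.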

**NOTE**:
- If the problem is to find the maximum value instead of the minimum, we just need to change the sign of $f_0(\vec{x})$, that is, minimize $-f_0(\vec{x})$.
- If a constraint has the form $f_i(\vec{x}) \geq \beta$, we can convert it to $-f_i(\vec{x}) \leq -\beta$ (where $\beta$ is a constant).
- If a constraint is an equality, $h_i(\vec{x}) = \beta$, we can write it as two inequality constraints: $h_i(\vec{x}) \leq \beta$ and $-h_i(\vec{x}) \leq -\beta$.
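
As a small worked example of these rules (the problem itself is made up for illustration), consider:

$$ \max_{\vec{x}} \; x_2 \quad \text{subject to} \quad x_1 \geq 1, \quad x_1 + 2x_2 = 4 $$

Flipping the sign of the objective, rewriting the $\geq$ constraint, and splitting the equality into two inequalities gives an equivalent minimization problem:

$$ \vec{x}^{*} = \text{argmin}_{\vec{x}} \; -x_2 $$

subject to:

$$ -x_1 + 1 \leq 0 $$
$$ x_1 + 2x_2 - 4 \leq 0 $$
$$ -x_1 - 2x_2 + 4 \leq 0 $$

Both forms have the same feasible set and the same solution, $\vec{x}^{*} = [1, 1.5]$.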

Optimization problems generally **do not have a general method of solution**, and **some problems have no solution at all**. Most solution methods cannot guarantee that the point they find is a global optimum, that is, a point where the function actually reaches its minimum or maximum value. Instead, the solutions they return are often only local optima, that is, extreme points.
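
The gap between local and global optima can be seen in a minimal sketch. The function $f(x) = x^4 - 3x^2 + x$, the step size, and the use of plain gradient descent are all choices made here for illustration, not something prescribed by the text.

```python
def f(x):
    """A one-dimensional nonconvex function with two local minima."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, learning_rate=0.01, num_steps=1000):
    """Plain gradient descent starting from x0."""
    x = x0
    for _ in range(num_steps):
        x -= learning_rate * grad_f(x)
    return x

# Same method, same function, two different starting points.
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)

print(f"start -2.0 -> x = {x_left:.4f}, f(x) = {f(x_left):.4f}")
print(f"start  2.0 -> x = {x_right:.4f}, f(x) = {f(x_right):.4f}")
```

The two runs settle in different valleys (near $x \approx -1.30$ and $x \approx 1.13$), and only the first has the lower function value. Nothing in the second run signals that a better point exists elsewhere.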

# Convex sets

### Definition

**Definition 1**: A set is called a ***convex set*** if the line segment connecting any two points in the set lies completely within the set.

Some examples of convex sets:

![image.png](attachment:image.png)

Here are a few examples of nonconvex sets, that is, sets that are not convex:

![image.png](attachment:image.png)

**Definition 2**: A set $\mathcal{C}$ is called a ***convex set*** if for any $\vec{x}, \vec{y} \in \mathcal{C}$ and $\theta \in [0, 1]$, we have $\theta \vec{x} + (1 - \theta) \vec{y} \in \mathcal{C}$.

With these definitions, the entire space is a convex set because every line segment lies in the space. The empty set can also be considered a special case of a convex set.
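
Definition 2 translates directly into a numerical experiment. The sketch below, with hypothetical helper names and two example sets chosen for illustration (a unit disc and a ring), samples pairs of points from a set and looks for a convex combination that falls outside it.

```python
import random

def in_disc(p):
    """Unit disc: all points with x^2 + y^2 <= 1 (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def in_ring(p):
    """Ring 0.5^2 <= x^2 + y^2 <= 1 (not convex: it has a hole)."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0

def convex_combination(x, y, theta):
    """Return theta * x + (1 - theta) * y for 2D points x and y."""
    return (theta * x[0] + (1 - theta) * y[0],
            theta * x[1] + (1 - theta) * y[1])

def find_violation(contains, num_pairs=2000, seed=0):
    """Search for x, y in the set and theta with the combination outside it.

    Returns such a triple, or None if no violation was found. Finding none
    is only evidence of convexity, not a proof.
    """
    rng = random.Random(seed)

    def sample_member():
        # Rejection sampling from the square [-1, 1] x [-1, 1].
        while True:
            p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            if contains(p):
                return p

    for _ in range(num_pairs):
        x, y = sample_member(), sample_member()
        for k in range(11):
            theta = k / 10
            if not contains(convex_combination(x, y, theta)):
                return x, y, theta
    return None

print(find_violation(in_disc))   # no counterexample expected for the disc
print(find_violation(in_ring))   # a counterexample triple for the ring
```

A single counterexample is enough to show that a set is not convex: for the ring, the midpoint of $(1, 0)$ and $(-1, 0)$ is the origin, which lies in the hole.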

# The intersection of convex sets is a convex set.

Take a look at the example below:

![image.png](attachment:image.png)

The intersection of two or all three convex sets is convex.

It is not difficult to prove this with Definition 2. If $\vec{x}_1, \vec{x}_2$ belong to the intersection of the convex sets, then they belong to every one of those sets. Since each set is convex, $\theta \vec{x}_1 + (1 - \theta) \vec{x}_2$ also belongs to every one of them, and therefore to their intersection.

From this we can deduce that the intersection of *halfspaces* and *hyperplanes* is also a convex set. In two-dimensional space such a set (when bounded) is a **convex polygon**; in three-dimensional space it is called a **convex polyhedron**.
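
As a minimal sketch of this idea, the code below describes a triangle as the intersection of three halfspaces and checks that the segment between two of its points stays inside. The particular halfspaces and the helper names are assumptions made for illustration.

```python
# Each entry (a, b) encodes the halfspace a[0]*x + a[1]*y <= b.
# Illustrative choice: together they describe the triangle with
# vertices (0, 0), (2, 0) and (0, 2).
halfspaces = [
    ((-1.0, 0.0), 0.0),   # x >= 0
    ((0.0, -1.0), 0.0),   # y >= 0
    ((1.0, 1.0), 2.0),    # x + y <= 2
]

def in_halfspace(point, a, b):
    """True when a^T point <= b."""
    return a[0] * point[0] + a[1] * point[1] <= b

def in_polygon(point):
    """A point is in the intersection iff it is in every halfspace."""
    return all(in_halfspace(point, a, b) for a, b in halfspaces)

p, q = (0.5, 0.5), (1.5, 0.25)
assert in_polygon(p) and in_polygon(q)

# Points theta * p + (1 - theta) * q for theta = 0, 0.1, ..., 1.
segment = [(t / 10 * p[0] + (1 - t / 10) * q[0],
            t / 10 * p[1] + (1 - t / 10) * q[1]) for t in range(11)]

print(all(in_polygon(s) for s in segment))   # True
print(in_polygon((2.0, 2.0)))                # False: violates x + y <= 2
```

Membership in the polygon is just a conjunction of halfspace tests, which mirrors the proof above: a point on the segment passes each test because each halfspace is convex on its own.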

# Convex combinations and convex hulls

A point is called a ***convex combination*** of points $x_1, x_2, \cdots, x_n$ if it can be written as:

$$ x = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n $$

Where $\theta_i \geq 0$ and $\sum_{i=1}^{n} \theta_i = 1$.

The **convex hull** of **any set** is the set of all points that are *convex combinations* of points in that set. The *convex hull* is a *convex set*. The **convex hull** of a **convex set** is **itself**. An easy way to remember this is that the convex hull of a set is the smallest convex set that contains it, where "smallest" means that it is contained in every other convex set containing the original set.
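
For a finite set of points in the plane, the convex hull is a convex polygon, and it is enough to find its vertices. The sketch below uses Andrew's monotone chain algorithm, which is one common choice and is not taken from the text above; the sample points are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of the convex hull of 2D points, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(sequence):
        hull = []
        for p in sequence:
            # Drop points that would create a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Four corners of a square plus points inside it and on its edge.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (0.5, 1.5)]
print(convex_hull(points))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Only the four corners survive: every other point is a convex combination of them, so it adds nothing to the hull.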

Two sets are called linearly separable if their convex hulls have no points in common.

**Separating hyperplane theorem**: This theorem states that *if two nonempty convex sets* $\mathcal{C}$ and $\mathcal{D}$ are disjoint, then there exist a vector $\vec{a} \neq \vec{0}$ and a scalar $b$ such that:

$$ \vec{a}^T \vec{x} \leq b, \quad \forall \vec{x} \in \mathcal{C} $$
$$ \vec{a}^T \vec{x} \geq b, \quad \forall \vec{x} \in \mathcal{D} $$

The set of points $\vec{x}$ satisfying $\vec{a}^T \vec{x} = b$ is called a hyperplane. The hyperplane divides the space into two halfspaces and is the boundary of both.
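
For two disjoint discs, a separating hyperplane can be written down explicitly. The sketch below is one possible construction under that assumption (the function names and the example discs are hypothetical): it takes $\vec{a}$ along the line joining the centers and places $b$ halfway between the two sets.

```python
import math

def separating_hyperplane(c1, r1, c2, r2):
    """Return (a, b) with a.x <= b on disc 1 and a.x >= b on disc 2.

    Assumes the discs are disjoint: the distance between the centers
    is larger than r1 + r2.
    """
    a = (c2[0] - c1[0], c2[1] - c1[1])
    norm_a = math.hypot(a[0], a[1])
    if norm_a <= r1 + r2:
        raise ValueError("the discs overlap, so they are not disjoint")
    # Largest value of a.x over disc 1 and smallest value over disc 2.
    max_on_c = a[0] * c1[0] + a[1] * c1[1] + r1 * norm_a
    min_on_d = a[0] * c2[0] + a[1] * c2[1] - r2 * norm_a
    b = 0.5 * (max_on_c + min_on_d)
    return a, b

def boundary_points(center, radius, count=360):
    """Evenly spaced points on the boundary circle of a disc."""
    return [(center[0] + radius * math.cos(2 * math.pi * k / count),
             center[1] + radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]

# C: unit disc at the origin, D: disc of radius 2 centered at (4, 3).
a, b = separating_hyperplane((0.0, 0.0), 1.0, (4.0, 3.0), 2.0)
side_c = [a[0] * x + a[1] * y for x, y in boundary_points((0.0, 0.0), 1.0)]
side_d = [a[0] * x + a[1] * y for x, y in boundary_points((4.0, 3.0), 2.0)]

print(a, b)                                # (4.0, 3.0) 10.0
print(max(side_c) <= b <= min(side_d))     # True
```

Checking boundary points is enough here because a linear function $\vec{a}^T \vec{x}$ attains its extreme values over a disc on the boundary circle.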

# Convex function

A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is called a ***convex function*** if $\text{dom} f$ is a convex set and:

$$ f(\theta \vec{x} + (1 - \theta) \vec{y}) \leq \theta f(\vec{x}) + (1 - \theta) f(\vec{y}) $$

for every $\vec{x}, \vec{y} \in \text{dom} f$ and $\theta \in [0, 1]$. This inequality is known as ***Jensen's inequality***.
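
The inequality can be probed numerically for one-dimensional functions. The following is a minimal sketch with a hypothetical helper, `jensen_violation`, that scans a grid of $x$, $y$ and $\theta$ values; the tolerance and grid sizes are arbitrary choices.

```python
import math

def jensen_violation(f, lo, hi, num_points=41, num_thetas=11, tol=1e-12):
    """Look for x, y, theta on a grid that break the convexity inequality.

    Returns a violating triple, or None if the inequality holds on the grid
    (which supports convexity on [lo, hi] but does not prove it).
    """
    xs = [lo + (hi - lo) * i / (num_points - 1) for i in range(num_points)]
    thetas = [k / (num_thetas - 1) for k in range(num_thetas)]
    for x in xs:
        for y in xs:
            for theta in thetas:
                lhs = f(theta * x + (1 - theta) * y)
                rhs = theta * f(x) + (1 - theta) * f(y)
                # tol absorbs floating-point rounding in the comparison.
                if lhs > rhs + tol:
                    return x, y, theta
    return None

print(jensen_violation(lambda x: x * x, -3.0, 3.0))   # x^2 is convex
print(jensen_violation(math.exp, -3.0, 3.0))           # exp is convex
print(jensen_violation(math.sin, 0.0, math.pi))        # sin is concave on [0, pi]
```

For the two convex functions the search comes back empty, while for $\sin$ on $[0, \pi]$ it returns a triple where the function value at the combined point lies above the chord.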

**Strictly convex function**: A function $f$ is called a ***strictly convex function*** if:

$$ f(\theta \vec{x} + (1 - \theta) \vec{y}) < \theta f(\vec{x}) + (1 - \theta) f(\vec{y}) $$

for every $\vec{x}, \vec{y} \in \text{dom} f$ with $\vec{x} \neq \vec{y}$, and every $\theta \in (0, 1)$.

**Strictly concave function**: A function $f$ is called a ***strictly concave function*** if $-f$ is a strictly convex function.

**Properties of convex functions**:
- If $f(\vec{x})$ is *convex* then $\alpha f(\vec{x})$ is also *convex* for $\alpha \geq 0$ and $\alpha f(\vec{x})$ is *concave* for $\alpha \leq 0$.
- The sum of two convex functions is also a convex function, whose domain is the intersection of the two functions' domains.
- **Pointwise maximum and supremum**: If all functions $f_i$ are convex, then their pointwise maximum is also a convex function:
$$f(\vec{x}) = \max_{i} f_i(\vec{x})$$
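
The pointwise-maximum property can be checked on a small example. In this sketch the three affine pieces and the helper `holds_on_grid` are assumptions chosen for illustration; the pointwise minimum of the same pieces is included as a contrast.

```python
# Three affine (hence convex) functions, chosen only for illustration.
pieces = [
    lambda x: -x,
    lambda x: 0.5 * x + 1.0,
    lambda x: 2.0 * x - 3.0,
]

def f_max(x):
    """Pointwise maximum f(x) = max_i f_i(x)."""
    return max(piece(x) for piece in pieces)

def f_min(x):
    """Pointwise minimum of the same pieces, for contrast."""
    return min(piece(x) for piece in pieces)

def holds_on_grid(f, lo=-5.0, hi=5.0, num_points=41, tol=1e-12):
    """Check f(t x + (1 - t) y) <= t f(x) + (1 - t) f(y) on a grid."""
    xs = [lo + (hi - lo) * i / (num_points - 1) for i in range(num_points)]
    thetas = [k / 10 for k in range(11)]
    return all(
        f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + tol
        for x in xs for y in xs for t in thetas
    )

print(holds_on_grid(f_max))   # True
print(holds_on_grid(f_min))   # False
```

The maximum passes the convexity check on the grid, while the minimum fails it: taking $x = -5$, $y = 5$ and $\theta = 0.5$ gives $f_{\min}(0) = -3$, which is above the chord value $-9$.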

For further information, you can refer to the book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe.