# 4. Non-linear Classification

 * Idea: apply a nonlinear transformation to the data, then apply a linear classifier to the transformed data
 * We will use this *kernel trick* with our SVMs
 * In order to be able to do this, we will have to reformulate one of our primal SVM formulations into a dual formulation

## Background

Gathered from multiple sources, such as Wikipedia, the ETH DM lecture slides, mmds.org, and Stanford lectures on convex optimization (EE 364A, available on YouTube).

### Positive semidefiniteness
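 * Only defined for square matrices (the definition below requires it); $X$ is usually also assumed symmetric
 * $X \in \mathbb{R}^{n \times n}$ positive semidefinite $\iff z^TXz \ge 0 \> \forall z \in \mathbb{R}^n$
 * semi = non-strict inequality; positive definite requires $z^TXz > 0 \> \forall z \ne 0$
 * Analogous definitions for the negative (semi)definite variants, with the inequalities reversed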
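
A quick numerical check of the definition, as a minimal sketch. It assumes the matrix is symmetric, in which case positive semidefiniteness is equivalent to all eigenvalues being nonnegative. The helper name `is_positive_semidefinite` and the tolerance `tol` are illustrative choices, not from the lecture.

```python
import numpy as np

def is_positive_semidefinite(X, tol=1e-10):
    """Return True if the symmetric matrix X satisfies z^T X z >= 0 for all z."""
    X = np.asarray(X, dtype=float)
    # Only square matrices can be positive semidefinite.
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        return False
    # Assumption: we restrict ourselves to symmetric matrices.
    if not np.allclose(X, X.T):
        return False
    # A symmetric matrix is PSD iff all of its eigenvalues are >= 0.
    eigenvalues = np.linalg.eigvalsh(X)
    return bool(np.all(eigenvalues >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3

print(is_positive_semidefinite(A))  # True
print(is_positive_semidefinite(B))  # False
```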

### Affine and convex sets
 * An **affine set** contains the entire line through any two points in itself.
     - Any two points $x_1, x_2$; $x = \theta x_1 + (1 - \theta)x_2, \theta \in \mathbb{R}$
     - example: solution set of linear equations $\{x \> | \> Ax = b \}$.
 * A **convex set** always contains the line segment between any two points in itself.
     - Any two points $x_1, x_2$; $x = \theta x_1 + (1 - \theta)x_2, \theta \in [0, 1]$
     - example: from geometry, a square-shape is convex, a donut isn't
     - Intuition: affine is the stricter requirement (the whole line, not just the segment), so convex sets are the more general notion
     - *Any affine set is also a convex set*, but not vice versa.
     
### Convex combination and convex hull
 * A **convex combination** of points $x_1,\dots,x_k$: any point $x$ of the form:
     - $ x = \theta_1x_1 + \theta_2x_2 + \dots + \theta_kx_k $
     - with $\theta_1 + \dots + \theta_k = 1, \> \theta_i \ge 0$
     - omitting the nonnegativity constraint $\theta_i \ge 0$ $\implies$ affine combination; omitting both constraints $\implies$ linear combination
 * The **convex hull of a set S** ($\operatorname{conv} S$) is the set of all convex combinations of points in S.
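
A small sketch of a convex combination in code. The function name `convex_combination` and the triangle example are illustrative assumptions; the function simply enforces the two constraints on $\theta$ and forms the weighted sum.

```python
import numpy as np

def convex_combination(points, theta):
    """Return sum_i theta_i * x_i, checking theta_i >= 0 and sum(theta) == 1."""
    points = np.asarray(points, dtype=float)  # shape (k, n): one point per row
    theta = np.asarray(theta, dtype=float)    # shape (k,)
    if np.any(theta < 0) or not np.isclose(theta.sum(), 1.0):
        raise ValueError("theta must be nonnegative and sum to 1")
    return theta @ points

# Vertices of a triangle; its convex hull is the filled triangle.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Equal weights give the centroid, which lies inside the hull.
centroid = convex_combination(triangle, [1 / 3, 1 / 3, 1 / 3])
```

Weights such as `[0.5, 0.7, -0.2]` still sum to 1 but are rejected, since they describe an affine combination that can leave the hull.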
 
### Hyperplanes and halfspaces
 * A **hyperplane** is a set of the form $\{x \> | \> a^Tx = b\}, (a \ne 0)$
     - $a$ is the normal, $b$ is the intercept (or bias)
     - all hyperplanes are affine (and convex)
 * A **halfspace** is a set of the form $\{x \> | \> a^Tx \le b\}, (a \ne 0)$
     - same terminology as for hyperplanes
     - halfspaces are not vector spaces (not closed under scalar multiplication: scaling a point can push it across the boundary)
     - all halfspaces are convex
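
To make the two halfspace remarks concrete, here is a minimal sketch. The names `in_halfspace`, `a`, `b`, `x` and `y` are illustrative; the halfspace is $\{x \> | \> x_1 \le 1\}$ in $\mathbb{R}^2$.

```python
import numpy as np

def in_halfspace(x, a, b):
    """Membership test for the halfspace {x | a^T x <= b}."""
    return bool(np.dot(a, x) <= b)

a = np.array([1.0, 0.0])  # normal vector
b = 1.0                   # intercept

x = np.array([0.5, 0.0])
y = np.array([-4.0, 7.0])

# Both points are in the halfspace, and so is their midpoint (convexity).
midpoint = 0.5 * x + 0.5 * y

# Scaling x by 3 leaves the halfspace, so it is not a vector space.
scaled = 3.0 * x
```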
     
### Euclidean balls and ellipsoids
 * Euclidean = using the $\ell_2$ (Euclidean) norm in their description
 * A **(Euclidean) ball** with center $x_c$ and radius $r$ is
     - $B(x_c, r) = \left\{ x \> | \> \| x - x_c \|_2 \le r \right\} = \left\{x_c + ru \> | \> \|u\|_2 \le 1 \right\}$
 * An **ellipsoid** is a set of the form:
     - $\left\{x\>|\> (x - x_c)^{T} P^{-1} (x - x_c) \le 1 \right\}$, with $P \in \mathbf{S}^n_{++}$ ($P$ symmetric, positive definite matrix)
     - generalization of the Euclidean ball: $P = r^2 I$ gives $B(x_c, r)$
 * Both are convex sets
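
The two definitions translate directly into membership tests. This is a sketch with illustrative names (`in_ball`, `in_ellipsoid`); it also checks on two sample points that an ellipsoid with $P = r^2 I$ behaves like the ball of radius $r$.

```python
import numpy as np

def in_ball(x, x_c, r):
    """Membership test for the Euclidean ball B(x_c, r)."""
    return bool(np.linalg.norm(x - x_c) <= r)

def in_ellipsoid(x, x_c, P):
    """Membership test for {x | (x - x_c)^T P^{-1} (x - x_c) <= 1}."""
    d = x - x_c
    # Solve P y = d instead of forming P^{-1} explicitly.
    return bool(d @ np.linalg.solve(P, d) <= 1.0)

x_c = np.array([1.0, 1.0])
r = 2.0
P = r**2 * np.eye(2)            # this ellipsoid is the ball B(x_c, r)

inside = np.array([2.0, 2.0])   # distance sqrt(2) from the center
outside = np.array([4.0, 1.0])  # distance 3 from the center
```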
 
### Norms
 * A norm is any function $\| \cdot \|$ which satisfies
     - nonnegativity: $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$
     - homogeneity: $\|tx\| = \lvert t \rvert \|x\| \> \forall t \in \mathbb{R}$
     - triangle inequality: $\|x + y \| \le \|x\| + \|y\|$
 * Can define a *norm ball* with any norm and radius. The Euclidean ball from the previous section was a particular example.
     - in $\mathbb{R}^2$, the unit ball of $\| \cdot \|_1$ is a diamond centered at the origin, that of $\| \cdot \|_2$ is a disc, and that of $\| \cdot \|_\infty$ is a square.
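
The different shapes of norm balls show up when testing one point against several unit balls. A minimal sketch, with `in_unit_ball` and the test point `p` chosen for illustration:

```python
import numpy as np

def in_unit_ball(x, order):
    """Membership test for the unit ball {x | ||x||_order <= 1}."""
    return bool(np.linalg.norm(x, ord=order) <= 1.0)

p = np.array([0.6, 0.6])

# ||p||_1 = 1.2, ||p||_2 is about 0.85, ||p||_inf = 0.6
membership = {
    "l1": in_unit_ball(p, 1),         # outside the diamond
    "l2": in_unit_ball(p, 2),         # inside the disc
    "linf": in_unit_ball(p, np.inf),  # inside the square
}
```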

### Polyhedra
 * Solution sets of finitely many linear equalities and inequalities
     - $Ax \preccurlyeq b$ and $Cx = d$, where $\preccurlyeq$ is component-wise $\le$: each row of $A$ together with the matching entry of $b$ encodes one inequality
 * Convex
 * Polyhedron is intersection of finite number of halfspaces and hyperplanes
 * Halfspaces and hyperplanes are special cases of polyhedra
 * Each equality constraint confines the polyhedron to a hyperplane, i.e. a lower-dimensional "slice" of the space
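
As a sketch of the matrix form, the unit square in $\mathbb{R}^2$ can be written as four inequalities, and one extra equality slices it down to a diagonal segment. The function `in_polyhedron` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def in_polyhedron(x, A, b, C=None, d=None, tol=1e-9):
    """Membership test for {x | A x <= b (component-wise), C x = d}."""
    ok = bool(np.all(A @ x <= b + tol))
    if C is not None:
        ok = ok and bool(np.allclose(C @ x, d, atol=tol))
    return ok

# Unit square 0 <= x1 <= 1, 0 <= x2 <= 1 as four halfspaces.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

# One equality x1 - x2 = 0 keeps only the diagonal of the square.
C = np.array([[1.0, -1.0]])
d = np.array([0.0])

on_diagonal = np.array([0.5, 0.5])
off_diagonal = np.array([0.5, 0.2])
```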

### Preserving convexity
Practical methods for establishing convexity of a set $C$

 * Apply definition
     $x_1, x_2 \in C, \> 0 \le \theta \le 1 \implies \theta x_1 + (1 - \theta)x_2 \in C$
     
 * Show that $C$ is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, etc.) by operations that preserve convexity
    * intersection: the intersection of any number (even infinitely many) of convex sets is convex
    * affine functions
    * perspective functions
    * linear-fractional functions
     
 * "Matlab" approach
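    * Pick random pairs of points from the set
    * Test by brute force whether their convex combinations stay in the set (or maybe just keep trying the midpoint, $\theta = 0.5$)
    * This can only disprove convexity by finding a counterexample; passing the test proves nothing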
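
The brute-force approach can be sketched as a randomized search for a counterexample. Everything here is an illustrative assumption: the function name, the sampling box $[-1, 1]^2$, and the disc and donut test sets (echoing the geometric example from the convex sets section). Finding no counterexample does not prove convexity.

```python
import numpy as np

def find_convexity_counterexample(contains, sample, n_trials=1000, seed=0):
    """Search for x1, x2 in the set whose convex combination leaves the set.

    Returns (x1, x2, theta) if a counterexample is found, otherwise None.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x1, x2 = sample(rng), sample(rng)
        # Only pairs that both lie in the set are relevant.
        if not (contains(x1) and contains(x2)):
            continue
        theta = rng.uniform(0.0, 1.0)
        x = theta * x1 + (1 - theta) * x2
        if not contains(x):
            return x1, x2, theta
    return None

def sample_square(rng):
    """Draw a point uniformly from the box [-1, 1]^2."""
    return rng.uniform(-1.0, 1.0, size=2)

def in_disc(x):
    return bool(np.linalg.norm(x) <= 1.0)

def in_donut(x):
    return bool(0.5 <= np.linalg.norm(x) <= 1.0)

disc_result = find_convexity_counterexample(in_disc, sample_square)
donut_result = find_convexity_counterexample(in_donut, sample_square)
```

For the disc no counterexample can exist, so `disc_result` is `None`; for the donut the search returns a pair of points whose connecting segment crosses the hole.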

### Affine functions
 * Suppose $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is *affine* ($\> f(x) = Ax + b, A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m$)
 * $f$ turns convex sets into convex sets
 * The inverse image $f^{-1}(C) = \{x \> | \> f(x) \in C\}$ of a convex set $C$ is also convex ($f$ need not be invertible)
 * Examples: scaling, translation, projection
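
The reason affine maps preserve convexity is that they commute with convex combinations: $f(\theta x_1 + (1 - \theta)x_2) = \theta f(x_1) + (1 - \theta) f(x_2)$. A numerical sanity check of this identity, with an arbitrary randomly drawn $A$ and $b$ (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))  # arbitrary matrix, maps R^2 to R^3
b = rng.normal(size=3)       # arbitrary offset

def f(x):
    """Affine function f(x) = A x + b."""
    return A @ x + b

x1 = np.array([1.0, 2.0])
x2 = np.array([-3.0, 0.5])
theta = 0.3

# Image of the convex combination ...
lhs = f(theta * x1 + (1 - theta) * x2)
# ... equals the same convex combination of the images.
rhs = theta * f(x1) + (1 - theta) * f(x2)
```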
 
### Perspective and linear-fractional functions
 * A **perspective function** $P : \mathbb{R}^{n+1} \rightarrow \mathbb{R}^{n}$:
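     * $P(x, t) = x / t$ with domain $t > 0$: divide by the last element and discard it.
     * Images and inverse images of convex sets under a perspective function are convex
 * A **linear-fractional function**
     * A generalization of the perspective function: the composition of the perspective function with an affine function, $f(x) = (Ax + b) / (c^Tx + d)$ with domain $c^Tx + d > 0$.
     * Preserves convexity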
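
A sketch of the perspective function and of why it maps line segments to line segments: the image of a convex combination of $z_1 = (x_1, t_1)$ and $z_2 = (x_2, t_2)$ is a convex combination of the images, with the reweighted coefficient $\mu = \theta t_1 / (\theta t_1 + (1 - \theta) t_2)$. The example points are arbitrary illustrative values.

```python
import numpy as np

def perspective(z):
    """P(x, t) = x / t for z = (x, t) with t > 0."""
    z = np.asarray(z, dtype=float)
    x, t = z[:-1], z[-1]
    if t <= 0:
        raise ValueError("the perspective function requires t > 0")
    return x / t

z1 = np.array([2.0, 4.0, 2.0])  # P(z1) = (1, 2)
z2 = np.array([3.0, 0.0, 1.0])  # P(z2) = (3, 0)
theta = 0.5

# Image of a point on the segment between z1 and z2.
image = perspective(theta * z1 + (1 - theta) * z2)

# The same point written as a convex combination of P(z1) and P(z2).
mu = theta * z1[-1] / (theta * z1[-1] + (1 - theta) * z2[-1])
on_segment = mu * perspective(z1) + (1 - mu) * perspective(z2)
```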
     
### Generalized inequalities and cones
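 * A **proper cone** is a convex cone that is closed, has nonempty interior, and contains no line.
 * A simple ray from the origin is not a proper cone in $\mathbb{R}^n$, $n \ge 2$ (its interior is empty).
 * The nonnegative orthant $\mathbb{R}_{+}^{n}$ is a proper cone.
 * A proper cone $K$ defines a generalized inequality: $x \preccurlyeq_K y \iff y - x \in K$. This is only a partial ordering, not a total one: some pairs of points are not comparable.
 * Minimum vs. minimal: a minimum element is $\preccurlyeq_K$ every other point of the set, while a minimal element only has no other point of the set below it. The two differ because of non-comparable points (which exist when the ordering is not total).
 * A dual cone of a cone $K$:
     - $K^* = \left\{y \> \lvert \> y^Tx \ge 0 \> \forall \> x \in K \right\}$
     - Any vector in the dual cone makes an angle of at most 90 degrees with every vector in the original cone
     - Some cones are self-dual (e.g. the nonnegative orthant)
     - $(K^*)^* = K$ if $K$ is proper
 * Optimal production frontier problem
     - the optimal frontier vectors are the vectors that are minimal w.r.t. $\mathbb{R}_{+}^{n}$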
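
The difference between minimum and minimal is easy to see on a small finite set, using the component-wise ordering induced by the nonnegative orthant. This is a sketch; the helper names and the five sample points are illustrative assumptions.

```python
import numpy as np

def leq_orthant(x, y):
    """x <=_K y for K the nonnegative orthant, i.e. y - x lies in K."""
    return bool(np.all(y - x >= 0))

def minimal_elements(points):
    """Points with no other, different point of the set below them."""
    pts = [np.asarray(p, dtype=float) for p in points]
    result = []
    for i, x in enumerate(pts):
        dominated = any(
            j != i and leq_orthant(y, x) and not np.array_equal(x, y)
            for j, y in enumerate(pts)
        )
        if not dominated:
            result.append(tuple(x.tolist()))
    return result

def minimum_element(points):
    """The point that is below every point of the set, or None."""
    pts = [np.asarray(p, dtype=float) for p in points]
    for x in pts:
        if all(leq_orthant(x, y) for y in pts):
            return tuple(x.tolist())
    return None

pts = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0), (2.0, 4.0)]

frontier = minimal_elements(pts)  # three mutually non-comparable points
minimum = minimum_element(pts)    # None: no point is below all others
```

The three minimal points form the frontier, yet none of them is a minimum because they cannot be compared with each other. Adding the point `(0.0, 0.0)` to the set would give it a minimum.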
 
### Supporting hyperplane theorem
 * A **supporting hyperplane** to set $C$ at boundary point $x_0$: $\left\{x \> | \> a^T x = a^T x_0 \right\}$, where $a \ne 0$ and $a^Tx \le a^Tx_0 \> \forall x \in C$
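 * It passes through $x_0$, and all of $C$ lies in one of its two closed halfspaces
 * **Theorem**: if $C$ is convex, a supporting hyperplane exists at every boundary point of $C$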
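
For a Euclidean ball the supporting hyperplane at a boundary point $x_0$ can be written down explicitly with normal $a = x_0 - x_c$. The sketch below checks the defining inequality $a^Tx \le a^Tx_0$ on random points of the ball; the center, radius, sample size, and tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

x_c = np.array([1.0, -2.0])            # center of the ball
r = 2.0                                # radius
x0 = x_c + r * np.array([0.6, 0.8])    # boundary point ((0.6, 0.8) has unit length)
a = x0 - x_c                           # normal of the supporting hyperplane

# Random points of the ball: unit directions scaled by radii in [0, r].
directions = rng.normal(size=(500, 2))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = r * rng.uniform(0.0, 1.0, size=(500, 1))
samples = x_c + radii * directions

# Every sample satisfies a^T x <= a^T x0 (small tolerance for rounding).
all_on_one_side = bool(np.all(samples @ a <= a @ x0 + 1e-9))
```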

## Duality