# Mathematical Formulation of SVMs

## Linear SVM for Separable Data


**Intro** \
Given a training dataset of $n$ points of the form $(x_i, y_i)$ \
where 
- $x_i \in \mathbb{R}^d$ is the feature vector
- $y_i \in \{-1, 1\}$ is the class label

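We want to **find the separating hyperplane that maximizes the margin between the two classes**, i.e. the width of the empty band between the hyperplane's closest points on either side.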


The hyperplane is defined as $w \cdot x - b = 0$
where 
- $w$ is the normal vector to the hyperplane
- $b$ is the offset: the hyperplane lies at distance $|b| / \|w\|$ from the origin
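A minimal NumPy sketch of how such a hyperplane classifies points, assuming a point is assigned to the class given by the sign of $w \cdot x - b$ (ties broken towards $+1$). The helper names `decision_values`, `signed_distance` and `predict` are hypothetical; the signed distance $(w \cdot x - b)/\|w\|$ is the standard point-to-hyperplane formula.

```python
import numpy as np

def decision_values(X, w, b):
    """Return w·x - b for each row x of X (positive side vs. negative side)."""
    return X @ w - b

def signed_distance(X, w, b):
    """Signed Euclidean distance of each row of X to the hyperplane w·x - b = 0."""
    return decision_values(X, w, b) / np.linalg.norm(w)

def predict(X, w, b):
    """Assign +1 or -1 by side of the hyperplane; points exactly on it get +1."""
    return np.where(decision_values(X, w, b) >= 0, 1, -1)

# Toy example: the hyperplane x1 + x2 - 1 = 0, i.e. w = (1, 1), b = 1
w = np.array([1.0, 1.0])
b = 1.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 0.0]])
print(predict(X, w, b))          # classes +1, -1, +1 (the last point lies on the hyperplane)
print(signed_distance(X, w, b))  # 3/sqrt(2), -1/sqrt(2), 0
```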

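Rescaling $w$ and $b$ by a common factor does not move the hyperplane, so we can choose the scale at which the points closest to it satisfy $|w \cdot x - b| = 1$. The margin is then the distance between the two supporting hyperplanes $w \cdot x - b = 1$ and $w \cdot x - b = -1$, which is $2 / \|w\|$. We want to maximize this margin subject to every point lying on the correct side of its supporting hyperplane:
$y_i(w \cdot x_i - b) \ge 1$ for $i = 1, \dots, n$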
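The value $2/\|w\|$ follows from the point-to-hyperplane distance formula. A short derivation, assuming the canonical scaling in which the closest points satisfy $w \cdot x - b = \pm 1$; the two supporting hyperplanes are parallel and lie on opposite sides of $w \cdot x - b = 0$, so their distances add:

$$
\begin{aligned}
\operatorname{dist}(x_0) &= \frac{|w \cdot x_0 - b|}{\|w\|} && \text{distance from } x_0 \text{ to } w \cdot x - b = 0 \\
\operatorname{dist}(x_+) &= \frac{|{+1}|}{\|w\|} = \frac{1}{\|w\|} && x_+ \text{ on } w \cdot x - b = +1 \\
\operatorname{dist}(x_-) &= \frac{|{-1}|}{\|w\|} = \frac{1}{\|w\|} && x_- \text{ on } w \cdot x - b = -1 \\
\text{margin} &= \operatorname{dist}(x_+) + \operatorname{dist}(x_-) = \frac{2}{\|w\|}
\end{aligned}
$$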

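Maximizing $2 / \|w\|$ is equivalent to minimizing $\|w\|$, and hence to minimizing $\frac{1}{2}\|w\|^2$, which is smooth and convex. This gives a quadratic optimization problem:
$$\min_{w,\, b} \; \frac{1}{2}\|w\|^2$$
subject to $y_i(w \cdot x_i - b) \ge 1$ for $i = 1, \dots, n$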
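To make the objective and constraints concrete, here is a small NumPy sketch that scores a candidate $(w, b)$ against this problem; `hard_margin_report` is a hypothetical helper, and the two-point data set is made up for illustration. For two mirrored points the optimum is their perpendicular bisector, $w = (0.5, 0.5)$, $b = 0$, whose margin $2/\|w\| = 2\sqrt{2}$ equals the distance between the points.

```python
import numpy as np

def hard_margin_report(X, y, w, b):
    """Evaluate a candidate (w, b) against the hard-margin primal problem."""
    functional_margins = y * (X @ w - b)            # y_i (w·x_i - b); all must be >= 1
    feasible = bool(np.all(functional_margins >= 1 - 1e-12))
    objective = 0.5 * float(w @ w)                  # (1/2)||w||^2
    margin = 2.0 / np.linalg.norm(w)                # geometric margin 2/||w||
    return feasible, objective, margin

# Two-point toy set: one point per class, mirrored through the origin
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# Optimal: both points lie exactly on their supporting hyperplanes
print(hard_margin_report(X, y, np.array([0.5, 0.5]), 0.0))
# A longer w is still feasible, but has a larger objective and a smaller margin
print(hard_margin_report(X, y, np.array([1.0, 1.0]), 0.0))
# A shorter w would give a wider margin, but violates the constraints
print(hard_margin_report(X, y, np.array([0.25, 0.25]), 0.0))
```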

---

## Soft Margin SVM (for Non-Separable Data)
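We introduce slack variables $\xi_i \ge 0$ that let a point fall inside the margin ($0 < \xi_i \le 1$) or on the wrong side of the hyperplane ($\xi_i > 1$):
$$\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i$$
subject to $y_i(w \cdot x_i - b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, \dots, n$.
Here, $C > 0$ is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error: a large $C$ penalizes margin violations heavily (typically a narrower margin and a closer fit to the training data), while a small $C$ tolerates more violations in exchange for a wider margin.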
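At the optimum each slack equals the hinge loss $\xi_i = \max(0,\, 1 - y_i(w \cdot x_i - b))$, so the problem can be rewritten without constraints. The sketch below minimizes that unconstrained form by full-batch subgradient descent with a $1/\sqrt{t}$ step size. This solver, the name `train_soft_margin` and the `lr`/`epochs` settings are illustrative choices of ours, not the QP or SMO solvers that SVM libraries normally use.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum_i xi_i, with each xi_i replaced by the hinge loss."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w - b))
    return 0.5 * float(w @ w) + C * float(slack.sum())

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on the hinge-loss form of the soft-margin SVM."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(1, epochs + 1):
        active = y * (X @ w - b) < 1                 # points with positive slack xi_i
        # Subgradient of C * sum(1 - y_i (w·x_i - b)) over the active points
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = C * y[active].sum()
        step = lr / np.sqrt(t)                       # diminishing step size
        w = w - step * grad_w
        b = b - step * grad_b
    return w, b

# Symmetric, linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = train_soft_margin(X, y, C=1.0)
slack = np.maximum(0.0, 1.0 - y * (X @ w - b))
print("w =", w, "b =", b)
print("slack:", slack)
print("objective:", soft_margin_objective(w, b, X, y, C=1.0))
```

On this symmetric set the iterates settle near $w \approx (0.25, 0.25)$, $b = 0$, which is also the hard-margin solution: the closest pair $(2, 2)$ and $(-2, -2)$ ends up on the supporting hyperplanes and every slack is (close to) zero.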

---

## Dual Form

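Using Lagrange multipliers $\alpha_i$, one per margin constraint, we can derive the dual form of the soft-margin problem:
$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$ (for the hard-margin problem the upper bound is dropped: $\alpha_i \ge 0$).
The optimal $w$ is given by $w = \sum_i \alpha_i y_i x_i$, so only the support vectors (points with $\alpha_i > 0$) contribute. Any support vector with $0 < \alpha_i < C$ lies exactly on its supporting hyperplane, which gives the offset $b = w \cdot x_i - y_i$. Note that the training points enter the dual only through the dot products $x_i \cdot x_j$.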
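For a two-point toy set the dual can be solved by hand: the constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = a$, the objective reduces to $2a - 4a^2$, and its maximum is at $a = 1/4$ with value $1/4$, matching the primal optimum $\frac{1}{2}\|w\|^2 = \frac{1}{4}$ for $w = (0.5, 0.5)$. The NumPy sketch below evaluates the dual objective and recovers $w$ and $b$ from that $\alpha$; it does not solve the dual itself. The helper names are hypothetical, and `recover_primal` assumes at least one support vector with $0 < \alpha_i < C$, for which $y_i(w \cdot x_i - b) = 1$ and hence $b = w \cdot x_i - y_i$.

```python
import numpy as np

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j (x_i · x_j)."""
    G = X @ X.T                                   # Gram matrix of dot products
    Q = np.outer(y, y) * G
    return float(alpha.sum() - 0.5 * alpha @ Q @ alpha)

def recover_primal(alpha, X, y, C, tol=1e-8):
    """Recover w and b from a dual solution alpha."""
    w = (alpha * y) @ X                           # w = sum_i alpha_i y_i x_i
    free = (alpha > tol) & (alpha < C - tol)      # support vectors strictly inside (0, C)
    # Each free support vector gives b = w·x_i - y_i; averaging reduces round-off.
    b = float(np.mean(X[free] @ w - y[free]))
    return w, b

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])

for a in (0.2, 0.25, 0.3):
    print(a, dual_objective(np.array([a, a]), X, y))   # largest at a = 0.25
w, b = recover_primal(alpha, X, y, C=1.0)
print("w =", w, "b =", b)                               # w = (0.5, 0.5), b = 0
```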

---

## Kernel Trick

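For data that are not linearly separable, we can use the kernel trick: replace every dot product $x_i \cdot x_j$ in the dual with a kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$, i.e. a dot product in a (possibly much higher-dimensional) feature space given by a map $\phi$ that never has to be computed explicitly. The linear separator is then found in that feature space, and the decision function becomes $f(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x_i, x) - b\right)$.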
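To see why a kernel can stand in for an explicit feature map, take the homogeneous degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on $\mathbb{R}^2$. Expanding the square shows it equals an ordinary dot product after the map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. A small NumPy check (the function names are illustrative):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    """K(x, z) = (x · z)^2, computed without building phi."""
    return float(x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly2_kernel(x, z))        # (1*3 + 2*(-1))^2 = 1.0
print(float(phi(x) @ phi(z)))    # same value via the 3-D feature space, up to round-off
```

The kernel costs one 2-D dot product, while the explicit route builds 3-D vectors; for higher degrees and dimensions the gap grows quickly, and for the RBF kernel the corresponding feature space is infinite-dimensional.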
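Common kernels include:

- Linear: $K(x_i, x_j) = x_i \cdot x_j$
- Polynomial: $K(x_i, x_j) = (\gamma\, x_i \cdot x_j + r)^p$, with degree $p$, scale $\gamma > 0$ and offset $r$
- RBF (Gaussian): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, with $\gamma > 0$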
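The kernels above can be sketched in NumPy as functions that return the full matrix $K[i, j] = K(a_i, b_j)$ between two sets of points, together with the kernelized decision value $\sum_i \alpha_i y_i K(x_i, x) - b$. The function names and the default values of `gamma`, `r` and `p` are arbitrary choices for illustration. As a sanity check, the linear kernel reproduces $w \cdot x - b$ for the two-point example $x_1 = (1, 1)$, $x_2 = (-1, -1)$ with $\alpha = (0.25, 0.25)$, $b = 0$, where $w = (0.5, 0.5)$.

```python
import numpy as np

def linear_kernel(A, B):
    """K[i, j] = a_i · b_j."""
    return A @ B.T

def polynomial_kernel(A, B, gamma=1.0, r=1.0, p=3):
    """K[i, j] = (gamma * a_i · b_j + r)^p."""
    return (gamma * (A @ B.T) + r) ** p

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))   # clip tiny negative round-off

def kernel_decision(X_new, X_train, y_train, alpha, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) - b for each row x of X_new."""
    return kernel(X_new, X_train) @ (alpha * y_train) - b

X_train = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_train = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
X_new = np.array([[2.0, 0.0], [0.0, -3.0]])

print(kernel_decision(X_new, X_train, y_train, alpha, 0.0, linear_kernel))  # [1.0, -1.5]
# Other kernels plug in the same way, e.g. an RBF with a chosen gamma:
print(kernel_decision(X_new, X_train, y_train, alpha, 0.0,
                      lambda A, B: rbf_kernel(A, B, gamma=0.5)))
```

Note that the $\alpha$ and $b$ used here come from the linear problem; with a different kernel they must be obtained by re-solving the dual with $K(x_i, x_j)$ in place of $x_i \cdot x_j$.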

