### Kernelizing SVMs

- Recall the soft and hard margin SVMs. 
- Recall that the SVM can also be defined in terms of the hinge loss. 
- Goal is to kernelize SVMs: rewrite the optimization problem so that it no longer depends on $\phi(x_n)$ but only on inner products $\phi(x_n)^{T}\phi(x_m)$. If we can do this, then we can replace the inner products with a kernel function $k(x_n, x_m)$. 
- First, we need to learn about constrained optimization problems in order to kernelize SVMs. 
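
As a small illustration of why inner products are enough, the sketch below checks numerically that an explicit feature map and a kernel function give the same value. The quadratic kernel $k(x, z) = (x^{T}z)^2$, the feature map `phi`, and the sample vectors are assumptions chosen for this example; they are not taken from the notes.

```python
def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map (assumed for illustration): all pairwise products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

def k(x, z):
    """Quadratic kernel k(x, z) = (x^T z)^2, computed without ever forming phi."""
    return dot(x, z) ** 2

x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(z))  # inner product in the 9-dimensional feature space
implicit = k(x, z)              # same quantity using only the 3-dimensional inputs

print(explicit, implicit)
```

The two values agree, so any algorithm that touches the data only through $\phi(x_n)^{T}\phi(x_m)$ can call `k` instead and never build the feature vectors.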

### Solving Constrained Optimization problems

#### Primal Problem

- Done using Lagrange multipliers
- General form: 
- $\min_{x} f_0(x)$ s.t. $f_i(x) \leq 0, \; i \in [1...m]$ and $h_{j}(x) = 0, \; j \in [1...p]$
- This is called the **primal** problem where we have the variable $x$ and $m$ inequality constraints and $p$ equality constraints. 
- We can define the Lagrangian for the primal problem as follows: 
- $L(x, \alpha, \beta) = f_0(x) + \sum_{i=1}^{m} \alpha_i f_i(x) + \sum_{j=1}^{p} \beta_j h_j(x)$
- The Lagrangian is composed of the function that we want to minimize plus one term per constraint, each weighted by a Lagrange multiplier: $\alpha_i$ for the inequality constraints and $\beta_j$ for the equality constraints. 
- Consider the function $\max_{\alpha, \beta, \alpha_i \geq 0} L(x,\alpha,\beta)$, viewed as a function of $x$. If $x$ violates a primal constraint (some $f_i(x) > 0$ or some $h_j(x) \neq 0$), then we can pick values of $\alpha_i$ and $\beta_j$ that make this function approach infinity. Otherwise, if $x$ does not violate any primal constraints, then the third term vanishes because every $h_j(x) = 0$. And since $f_i(x) \leq 0$ and we must have $\alpha_i \geq 0$, every term $\alpha_i f_i(x)$ is at most $0$, so the maximum is reached when each of these terms is $0$ (for example, by setting every $\alpha_i = 0$), and the function equals $f_0(x)$. 
- We can show that $\min_{x} \max_{\alpha, \beta, \alpha_i \geq 0} L(x,\alpha,\beta)$ has the same solution as the primal problem. Its optimal value is denoted $p^*$.
- So if we are given a primal problem in constrained optimization form, we can define the Lagrangian for it and solve $\min_{x} \max_{\alpha, \beta, \alpha_i \geq 0} L(x,\alpha,\beta)$.
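
A worked instance of the min-max argument may make it concrete. The toy problem used here (minimize $x^2$ subject to $x \geq 1$) is an assumption chosen for illustration, with a single inequality constraint and no equality constraints:

$$
\begin{aligned}
&\min_{x} \; x^2 \quad \text{s.t.} \quad f_1(x) = 1 - x \leq 0 \\
&L(x, \alpha) = x^2 + \alpha (1 - x) \\
&\max_{\alpha \geq 0} L(x, \alpha) =
\begin{cases}
x^2 & \text{if } x \geq 1 \quad (\text{attained at } \alpha = 0) \\
+\infty & \text{if } x < 1 \quad (\text{let } \alpha \to \infty)
\end{cases} \\
&p^* = \min_{x} \max_{\alpha \geq 0} L(x, \alpha) = \min_{x \geq 1} x^2 = 1 \quad \text{at } x = 1
\end{aligned}
$$

The inner maximization acts as a barrier: it returns the original objective on feasible points and $+\infty$ elsewhere, so the outer minimization can only pick a feasible $x$.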

#### Dual Problem
- Let $g(\alpha, \beta) = \min_{x} L(x, \alpha, \beta)$. 
- Then the dual problem is given by $d^* = \max_{\alpha, \beta, \alpha_i \geq 0} g(\alpha, \beta) = \max_{\alpha, \beta, \alpha_i \geq 0} \min_{x} L(x, \alpha, \beta)$
- The primal and dual problems have the same form, except that the order of the min and max is switched. 
- In the primal problem, we took the Lagrangian, found the $\alpha, \beta$ that maximized it, and then found the $x$ that minimized that result. 
- In the dual problem, we took the Lagrangian, found the $x$ that minimized it, and then found the $\alpha, \beta$ that maximized that result.

#### Relationship
- In general, we only have weak duality: $d^* \leq p^*$. For SVMs, however, strong duality holds: $d^* = p^*$. This is the key equality that lets us kernelize SVMs. Instead of the primal problem, we can now solve the dual problem, which may turn out to be easier to solve. 
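
The relationship between the two optimal values can be checked numerically on a small problem. The sketch below uses a toy problem assumed for illustration (minimize $x^2$ subject to $1 - x \leq 0$) and a brute-force grid search in place of a real solver; the grid ranges and step size are arbitrary choices.

```python
def f0(x):
    """Objective to minimize."""
    return x * x

def f1(x):
    """Inequality constraint, feasible when f1(x) <= 0 (i.e. x >= 1)."""
    return 1.0 - x

def lagrangian(x, alpha):
    """L(x, alpha) = f0(x) + alpha * f1(x), with no equality constraints."""
    return f0(x) + alpha * f1(x)

# Coarse grids standing in for the continuous domains (assumed ranges).
xs = [i / 100.0 for i in range(-300, 301)]   # x in [-3, 3]
alphas = [i / 100.0 for i in range(0, 501)]  # alpha in [0, 5], so alpha >= 0

# Primal value: minimize the objective over the feasible grid points.
p_star = min(f0(x) for x in xs if f1(x) <= 0)

def g(alpha):
    """Dual function: minimize the Lagrangian over x for a fixed alpha."""
    return min(lagrangian(x, alpha) for x in xs)

# Dual value: maximize the dual function over alpha >= 0.
dual_values = [g(a) for a in alphas]
d_star = max(dual_values)

print("p* =", p_star, "d* =", d_star)
```

On this convex toy problem every value of `g` stays at or below `p_star` (weak duality), and the largest one matches it (strong duality), with the maximum reached at $\alpha = 2$.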




### Dual Formulation for SVM
- Rewrite the primal form of the SVM as its dual.
- Primal: $\min_{w, b, \zeta} \frac{1}{2} \|w\|_2^{2} + C \sum_{n} \zeta_n$ s.t. $y_n(w^{T}\phi(x_n) + b) \geq 1 - \zeta_n$ and $\zeta_n \geq 0$, for $n \in [1...N]$
- Dual: 
- $ max_{\alpha} 
