# Training an SVM with gradient descent

In this notebook I show how to derive an algorithm to train an SVM using projected gradient descent. In practice, SVMs are trained using more specialized algorithms like [SMO](https://en.wikipedia.org/wiki/Sequential_minimal_optimization). However, as suggested in [CIML](http://ciml.info/), SVMs can also be trained using projected gradient descent.

In [1]:
import sys
import numpy as np
import matplotlib.pyplot as plt

sys.path.append("..")
from models.svm import SVM
from utils.datasets import blobs_classification_dataset
from utils.visualization import plot_decision_boundary

In [2]:
%matplotlib inline

# Turn interactive plotting off
plt.ioff()

# Reproducibility
np.random.seed(1)

## Maximizing the margin

The simplest version of the SVM is the problem of finding the optimal separating hyperplane between two classes that are linearly separable. The optimal hyperplane is the one that maximizes the margin $M$, defined as the distance from the hyperplane to its closest point in the training set. Let $f(x) = x^T\beta + \beta_0$, so that the separating hyperplane is the set of points where $f(x) = 0$. Then we can solve the optimization problem:
$$
\max_{\beta, \beta_0, \|\beta\|=1} M \\
\text{subject to: } y_i(x_i^T\beta + \beta_0) \geq M, \quad i=1,2,\dots,N
$$
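where $y_i \in \{-1, 1\}$ is the label of the $i$-th training example $x_i$. The $\|\beta\|=1$ constraint can be dropped by replacing the condition with $\frac{1}{\|\beta\|}y_i(x_i^T\beta + \beta_0) \geq M, \; i=1,2,\dots,N$. Dividing by $\|\beta\|$ makes the left-hand side the signed distance from $x_i$ to the hyperplane, whatever the scale of $\beta$.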
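As a quick numerical check of the normalized constraint, the following sketch computes the margin $M$ of a given hyperplane as the smallest value of $\frac{1}{\|\beta\|}y_i(x_i^T\beta + \beta_0)$ over the training set. The function name `geometric_margin` and the toy data are assumptions made for illustration; they are not part of the `models.svm` module imported above.

```python
import numpy as np


def geometric_margin(X, y, beta, beta0):
    """Smallest signed distance from the training points to the hyperplane.

    A positive value means every point lies on the correct side.
    """
    signed_distances = y * (X @ beta + beta0) / np.linalg.norm(beta)
    return signed_distances.min()


# Hypothetical toy data: two linearly separable points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate hyperplane (chosen by hand, not optimized).
beta = np.array([1.0, 1.0])
beta0 = 0.0

M = geometric_margin(X, y, beta, beta0)

# Rescaling (beta, beta0) describes the same hyperplane, so M is unchanged.
M_scaled = geometric_margin(X, y, 10 * beta, 10 * beta0)

print(M, M_scaled)
```

Because the margin is invariant to rescaling $(\beta, \beta_0)$, the optimization is free to fix the scale in whatever way is most convenient.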


## Dealing with noise: Hinge Loss

## Optimizing the dual form