# Support Vector Machine Solvers

### Description

Given $m$ data points $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1, 1\}$, we aim to solve 
the classification problem 

\begin{align}
\mathrm{minimize} \enspace & \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z
    \hspace{2cm}\\
\mathrm{subject\thinspace to} \enspace & y_i(w^\top x_i) \ge 1-z_i, \quad i = 1, \ldots, m
    \hspace{2cm}\\
& z \ge 0
\end{align}

in the variables $w \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$, and its dual. Solving this problem trains a classifier vector $w$ such that, up to some misclassified points,

\begin{align}
w^\top x_i > 0 &\enspace \mathrm{when}\enspace y_i = 1 \\
w^\top x_i < 0 &\enspace \mathrm{when}\enspace y_i = -1.
\end{align}

This classifier can then be used to classify new points $x$ as positives or negatives by simply computing the scalar product $w^\top x$.
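
As a minimal sketch of this decision rule (the helper name `classify` and the example vector are made up for illustration; the code assumes Julia 1.x):

```julia
using LinearAlgebra

# Hypothetical helper: predict the label of x from the sign of w'x.
# Points exactly on the boundary (w'x == 0) are assigned to the positive class.
classify(w, x) = dot(w, x) >= 0 ? 1 : -1

w = [1.0, -2.0]                 # made-up classifier vector
labelA = classify(w, [3.0, 1.0])   # w'x = 3 - 2 = 1, so the label is 1
labelB = classify(w, [0.0, 1.0])   # w'x = -2, so the label is -1
```

The formulation has no explicit offset term. The experiments in this notebook append a constant 1 to every data point, so the last component of $w$ plays that role.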

### Dual

We can also form the dual problem

\begin{align}
\mathrm{maximize} \enspace & 
    -\frac{1}{2} \lVert \sum_{i=1}^m \alpha_i y_i x_i\rVert_2^2 + \mathbf{1}^\top \alpha
    \hspace{2cm}\\
\mathrm{subject\thinspace to} \enspace & 0 \le \alpha \le C
    \hspace{2cm}
\end{align}

in the variable $\alpha\in\mathbb{R}^m$.
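
For reference, here is a sketch of the standard Lagrangian argument behind this dual. The multiplier $\beta \in \mathbb{R}^m$ for the constraint $z \ge 0$ is notation introduced here. With $\alpha \ge 0$ and $\beta \ge 0$, the Lagrangian is

\begin{align}
L(w, z, \alpha, \beta) = \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z
    - \sum_{i=1}^m \alpha_i \left(y_i(w^\top x_i) - 1 + z_i\right) - \beta^\top z.
\end{align}

Setting its gradient with respect to $w$ and $z$ to zero gives

\begin{align}
w &= \sum_{i=1}^m \alpha_i y_i x_i, \\
C\mathbf{1} - \alpha - \beta &= 0.
\end{align}

Substituting back, all terms in $z$ cancel and $\frac{1}{2}\lVert w\rVert_2^2 - \sum_{i=1}^m \alpha_i y_i (w^\top x_i) = -\frac{1}{2}\lVert \sum_{i=1}^m \alpha_i y_i x_i\rVert_2^2$, which leaves the dual objective above. The condition $\beta = C\mathbf{1} - \alpha \ge 0$ together with $\alpha \ge 0$ yields the box constraint $0 \le \alpha \le C$. The first stationarity condition also shows how to recover $w$ from a dual solution $\alpha$.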

### Barrier Method

At each centering step we want to minimize the function $tf + \phi$ where

\begin{align}
f &= \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z, \\
\phi &= -\sum_{i=1}^m(\log(y_i(w^\top x_i)+z_i-1) + \log(z_i)).
\end{align}

We thus need to compute the gradient and Hessian of $tf + \phi$ for each Newton step. The detailed implementation can be found in the file `barrier.jl`. To test the algorithm, I'll sample data from two bivariate Gaussian distributions with different moments. Some useful functions are defined in `common.jl`.
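
The gradient and Hessian can be written compactly by stacking the variables as $v = (w, z)$. The sketch below is not the code from `barrier.jl`; the function names `centeringobjective` and `centeringderivatives` are hypothetical, the data matrix `X` is assumed to hold one point per row, and the code assumes Julia 1.x.

```julia
using LinearAlgebra

# Value of t*f + phi at v = [w; z]. X holds one data point per row.
function centeringobjective(v, X, y, C, t)
    n = size(X, 2)
    w, z = v[1:n], v[n+1:end]
    s = y .* (X * w) .+ z .- 1            # s_i = y_i (w'x_i) + z_i - 1
    return t * (dot(w, w) / 2 + C * sum(z)) - sum(log.(s)) - sum(log.(z))
end

# Gradient and Hessian of the same function with respect to v = [w; z].
function centeringderivatives(v, X, y, C, t)
    m, n = size(X)
    w, z = v[1:n], v[n+1:end]
    A = hcat(y .* X, Matrix{Float64}(I, m, m))   # chosen so that s = A*v .- 1
    s = A * v .- 1
    grad = t .* [w; fill(C, m)] .- A' * (1 ./ s) .- [zeros(n); 1 ./ z]
    hess = A' * (A ./ s .^ 2)                    # from the -log(s_i) terms
    hess[1:n, 1:n] += t * I                      # from t * ||w||^2 / 2
    hess[n+1:end, n+1:end] += Diagonal(1 ./ z .^ 2)   # from the -log(z_i) terms
    return grad, hess
end

# Tiny made-up problem with a strictly feasible starting point.
Xdemo = [1.0 2.0; -1.0 0.5; 0.3 -1.2]
ydemo = [1.0, -1.0, 1.0]
vdemo = [0.1, -0.2, 3.0, 3.0, 3.0]
grad, hess = centeringderivatives(vdemo, Xdemo, ydemo, 10, 2.0)
newtonstep = -(hess \ grad)                      # undamped Newton direction
```

The point `vdemo` must be strictly feasible (all $s_i > 0$ and $z_i > 0$), otherwise the logarithms are undefined. In practice the Newton direction would be combined with a line search that keeps the iterates feasible.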

In [379]:
using Distributions
using Plots
pyplot()

include("barrier.jl")
include("common.jl")

plotdualitygap (generic function with 2 methods)

We start with an easy example in which the two generated classes are linearly separable with high probability.

In [398]:
numdatapoints = 300
gaussianA = MvNormal([5.; 5.], [2. 2.; 2. 3.])
gaussianB = MvNormal([-2.; -8.], [5. 1.; 1. 3.])
cloudA = rand(gaussianA, numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [400]:
xs = range(-7.5, stop=9, length=100)   # linspace was removed in Julia 1.0
ys = range(-15, stop=10, length=100)
contour(xs, ys, (x, y) -> pdf(gaussianA, [x; y]))
contour!(xs, ys, (x, y) -> pdf(gaussianB, [x; y]))

In [227]:
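# One data point per row, with a constant 1 appended so the last entry of w acts as an offset.
X = [[cloudA cloudB]' ones(2*numdatapoints)]
# cloudA gets label -1, cloudB gets label +1.
Y = [-ones(numdatapoints); ones(numdatapoints)]
w, alpha, numstepsarray = svmbarrier(X, Y, 10, 1e-4)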

([-0.228401, -0.432727, -0.167144], [3.29815e-9, 2.07791e-9, 2.94043e-9, 4.83262e-9, 3.27998e-9, 2.05385e-9, 3.48354e-9, 2.6491e-9, 3.90993e-9, 2.73843e-9  …  3.03539e-9, 2.88547e-9, 3.27735e-9, 2.30873e-9, 3.0797e-9, 2.39e-9, 4.20634e-9, 2.69231e-9, 2.8367e-9, 1.73028e-9], [41, 24, 80, 31, 21])

In [228]:
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [230]:
plotdualitygap(2*numdatapoints, 100., numstepsarray)
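
The implementation of `plotdualitygap` is in `common.jl` and is not shown here. In the standard barrier method, the duality gap after centering at parameter $t$ is bounded by $M/t$, where $M$ is the number of inequality constraints, and $t$ is multiplied by a factor $\mu$ after every centering step. The sketch below computes the data for a staircase plot of this bound against the cumulative number of Newton steps. The helper name `gapstaircase`, the starting value `t0 = 1`, and the reading of the three arguments as $M$, $\mu$ and the Newton step counts are all assumptions.

```julia
# Hypothetical helper: x and y values for a staircase plot of the bound M/t.
# numstepsarray[k] is the number of Newton steps in the k-th centering step.
function gapstaircase(M, mu, numstepsarray; t0 = 1.0)
    cumsteps = cumsum(numstepsarray)                  # Newton steps used so far
    gaps = [M / (t0 * mu^(k - 1)) for k in 1:length(numstepsarray)]
    return cumsteps, gaps
end

# Same arguments as the call above, with the step counts printed earlier.
cumsteps, gaps = gapstaircase(600, 100.0, [41, 24, 80, 31, 21])
```

Plotting `gaps` against `cumsteps` on a logarithmic vertical axis gives the usual staircase picture: each plateau is one centering step, and each drop is a factor of $\mu$.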

In [232]:
for C in [1e-5, 1e-3, 1, 10, 100, 1000]
    w, alpha, numstepsarray = svmbarrier(X, Y, C, 1e-4)
    println("When C = $C, the error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
end

When C = 1.0e-5, the error rate is 0.000736
When C = 0.001, the error rate is 0.000657
When C = 1.0, the error rate is 0.000492
When C = 10.0, the error rate is 0.000469
When C = 100.0, the error rate is 0.000481
When C = 1000.0, the error rate is 0.000486
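
The helper `errorrate` is defined in `common.jl` and is not shown here. One plausible reading of the call above is a Monte Carlo estimate: draw fresh samples from both Gaussians, classify them with $w$, and count the mistakes. The sketch below follows that reading. The name `montecarloerror`, the convention that the first distribution carries label $-1$, and the appended constant 1 are assumptions taken from how `X` and `Y` are built in this notebook.

```julia
using Distributions

# Hypothetical re-implementation; the notebook's errorrate may differ.
# distneg is assumed to generate the class labeled -1, distpos the class labeled +1.
function montecarloerror(distneg, distpos, w, n)
    neg = [rand(distneg, n); ones(1, n)]    # 3 x n samples with constant 1 appended
    pos = [rand(distpos, n); ones(1, n)]
    # A negative sample is wrong when w'x >= 0, a positive one when w'x <= 0.
    errors = count(s -> s >= 0, w' * neg) + count(s -> s <= 0, w' * pos)
    return errors / (2n)
end

gaussianA = MvNormal([5.0, 5.0], [2.0 2.0; 2.0 3.0])
gaussianB = MvNormal([-2.0, -8.0], [5.0 1.0; 1.0 3.0])
w = [-0.228401, -0.432727, -0.167144]       # classifier printed earlier for C = 10
estimate = montecarloerror(gaussianA, gaussianB, w, 10^5)
```

The estimate varies from run to run because the samples are random; a larger `n` reduces that variation.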
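
Next, consider a harder example in which the two classes overlap substantially, so no linear classifier can separate them well.
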
In [353]:
numdatapoints = 200
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA, numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [397]:
xs = range(-7.5, stop=9, length=100)   # linspace was removed in Julia 1.0
ys = range(-5, stop=7, length=100)
contour(xs, ys, (x, y) -> pdf(gaussianA, [x; y]))
contour!(xs, ys, (x, y) -> pdf(gaussianB, [x; y]))

In [343]:
# Same construction as before: one point per row, constant 1 appended.
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-ones(numdatapoints); ones(numdatapoints)]
w, alpha, numstepsarray = svmbarrier(X, Y, 100, 1e-4)

([-0.898604, 0.0495682, -1.59737], [8.9274e-8, -8.53468e-7, 4.18613e-7, -1.32076e-6, 1.0e-6, 2.00247e-7, 9.66419e-6, 3.3974e-6, 3.19182e-7, 3.39758e-7, 1.0e-6, 1.40731e-6, 9.72417e-6, 1.0e-6, 3.91432e-7, 5.42379e-6, -1.99136e-6, 5.28729e-7, 3.69762e-7, 6.5461e-7], [46, 22, 21, 21])

In [344]:
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [345]:
for C in [1e-5, 1e-3, 1, 10, 100, 1000]
    w, alpha, numstepsarray = svmbarrier(X, Y, C, 1e-4)
    println("When C = $C, the error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
end

When C = 1.0e-5, the error rate is 0.23736
When C = 0.001, the error rate is 0.237171
When C = 1.0, the error rate is 0.208094
When C = 10.0, the error rate is 0.264376
When C = 100.0, the error rate is 0.264955
When C = 1000.0, the error rate is 0.265797


In [297]:
errorrate(classA, classB, w, Int(1e6))

0.179395