# Support Vector Machine Solvers

### Description

Given $m$ data points $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1, 1\}$, we aim to solve
the classification problem

\begin{align}
\mathrm{minimize} \enspace & \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z
    \hspace{2cm}\\
\mathrm{subject\thinspace to} \enspace & y_i(w^\top x_i) \ge 1-z_i, \quad i = 1, \dots, m
    \hspace{2cm}\\
& z \ge 0
\end{align}

in the variables $w \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$, and its dual. Solving this problem trains a classifier vector $w$ such that, up to some errors,

\begin{align}
w^\top x_i > 0 &\enspace \mathrm{when}\enspace y_i = 1 \\
w^\top x_i < 0 &\enspace \mathrm{when}\enspace y_i = -1.
\end{align}

This classifier can then be used to classify new points $x$ as positives or negatives by simply computing the scalar product $w^\top x$.
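As a minimal sketch of this decision rule (written for Julia 1.x; the helper name `classify` and the numbers are illustrative assumptions, not part of `common.jl`), a new point is labelled by the sign of $w^\top x$. As in the examples further down, a constant $1$ is appended to the point so that the last entry of $w$ acts as an offset.

```julia
using LinearAlgebra

# Hypothetical helper: label a point by the sign of the scalar product w'x.
classify(w, x) = dot(w, x) >= 0 ? 1 : -1

w = [1.0, -2.0, 0.5]        # illustrative weights; the last entry plays the role of an offset
xnew = [3.0, 1.0, 1.0]      # new point (3, 1) with a constant 1 appended
label = classify(w, xnew)   # w'x = 3 - 2 + 0.5 = 1.5, so the point is labelled +1
```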

### Dual

We can also form the dual problem

\begin{align}
\mathrm{maximize} \enspace & 
    -\frac{1}{2} \lVert \sum_{i=1}^m \alpha_i y_i x_i\rVert_2^2 + \mathbf{1}^\top \alpha
    \hspace{2cm}\\
\mathrm{subject\thinspace to} \enspace & 0 \le \alpha \le C
    \hspace{2cm}
\end{align}

in the variable $\alpha\in\mathbb{R}^m$.

Solving the dual problem solves the primal problem at the same time since we know that at the optimum, we must have

$$ w = \sum_{i=1}^m \alpha_i y_i x_i.$$
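A sketch of where this identity comes from, using multipliers $\alpha \ge 0$ for the margin constraints and $\beta \ge 0$ for the constraint $z \ge 0$ (the symbol $\beta$ is introduced here for the derivation only). The Lagrangian and its stationarity conditions are

\begin{align}
L(w, z, \alpha, \beta) &= \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z - \sum_{i=1}^m \alpha_i\bigl(y_i w^\top x_i - 1 + z_i\bigr) - \beta^\top z, \\
\nabla_w L = 0 &\iff w = \sum_{i=1}^m \alpha_i y_i x_i, \\
\nabla_z L = 0 &\iff C\mathbf{1} - \alpha - \beta = 0.
\end{align}

Substituting the first condition back into $L$ makes the terms in $z$ vanish and leaves $-\frac{1}{2} \lVert \sum_{i=1}^m \alpha_i y_i x_i\rVert_2^2 + \mathbf{1}^\top \alpha$, while eliminating $\beta \ge 0$ from the second condition gives the box constraint $0 \le \alpha \le C$.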

If we denote by $X$ the data matrix with rows $x_i$, the dual can then be written as

\begin{align}
\mathrm{maximize} \enspace & 
    -\frac{1}{2} \alpha^{\top}\mathrm{diag}(y)XX^{\top}\mathrm{diag}(y)\alpha
    + \mathbf{1}^\top \alpha
    \hspace{2cm}\\
\mathrm{subject\thinspace to} \enspace & 0 \le \alpha \le C.
    \hspace{2cm}
\end{align}

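The data only appear through the matrix $XX^\top$, whose entries are the inner products $x_i^\top x_j$. We can therefore solve the dual with the kernel trick, replacing each inner product by a kernel evaluation. This is particularly useful when the dimension $n$ of the feature space is very high.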
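To make the kernel trick concrete, here is a small sketch (Julia 1.x) that builds a Gram matrix from a kernel function and evaluates the dual objective with it. The names `grammatrix`, `linearkernel`, `rbfkernel` and `dualobjective`, as well as the toy data, are assumptions made for illustration; none of them come from the files used in this notebook.

```julia
using LinearAlgebra

# Gram matrix K[i, j] = k(x_i, x_j) for data stored as the rows of X
grammatrix(k, X) = [k(X[i, :], X[j, :]) for i in 1:size(X, 1), j in 1:size(X, 1)]

linearkernel(x, y) = dot(x, y)                        # recovers XX'
rbfkernel(x, y; γ=0.5) = exp(-γ * sum(abs2, x - y))   # Gaussian kernel, γ is illustrative

# Dual objective with XX' replaced by an arbitrary Gram matrix K
dualobjective(α, y, K) = -dot(α .* y, K * (α .* y)) / 2 + sum(α)

X = [1.0 2.0; 0.0 1.0; -1.0 -1.0]
y = [1.0, 1.0, -1.0]
α = [0.5, 0.0, 0.5]

Klin = grammatrix(linearkernel, X)
Krbf = grammatrix(rbfkernel, X)
dualobjective(α, y, Klin), dualobjective(α, y, Krbf)
```

With the linear kernel the objective coincides with the formula above; with any other positive semidefinite kernel only the matrix changes, and the feature vectors are never formed explicitly.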

### Barrier Method

At each centering step we want to minimize the function $tf + \phi$ where

\begin{align}
f &= \frac{1}{2} \lVert w\rVert_2^2 + C\mathbf{1}^\top z, \\
\phi &= -\sum_{i=1}^m(\log(y_i(w^\top x_i)+z_i-1) + \log(z_i)).
\end{align}

We thus need to compute its gradient and Hessian for each Newton step. The detailed implementation can be found in the file `barrier.jl`. To test the algorithm, I'll sample data from two bivariate Gaussian distributions with different moments. Some useful functions are defined in `common.jl`.
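For reference, the derivatives needed by each Newton step can be written out explicitly. This is a derivation from the definitions of $f$ and $\phi$ above, not a transcription of `barrier.jl`; the shorthand $s_i = y_i(w^\top x_i) + z_i - 1$ is introduced here.

\begin{align}
\nabla_w (tf + \phi) &= tw - \sum_{i=1}^m \frac{y_i x_i}{s_i}, &
\frac{\partial (tf + \phi)}{\partial z_i} &= tC - \frac{1}{s_i} - \frac{1}{z_i}, \\
\nabla^2_{ww} (tf + \phi) &= tI + \sum_{i=1}^m \frac{x_i x_i^\top}{s_i^2}, &
\frac{\partial^2 (tf + \phi)}{\partial w\,\partial z_i} &= \frac{y_i x_i}{s_i^2}, \\
\frac{\partial^2 (tf + \phi)}{\partial z_i^2} &= \frac{1}{s_i^2} + \frac{1}{z_i^2}, &
\frac{\partial^2 (tf + \phi)}{\partial z_i\,\partial z_j} &= 0 \quad (i \ne j).
\end{align}

The $x_i x_i^\top$ term uses $y_i^2 = 1$.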

In [2]:
using Distributions
using Plots
pyplot()

include("barrier.jl")
include("common.jl")

plotdualitygap (generic function with 1 method)

#### A first example

Start with an easy example where the generated data for the two classes are linearly separable with high probability.

In [2]:
numdatapoints = 100
gaussianA = MvNormal([5.; 5.], [2. 2.; 2. 3.])
gaussianB = MvNormal([-2.; -8.], [5. 1.; 1. 3.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [3]:
xs = linspace(-7.5, 9, 100)
ys = linspace(-15, 10, 100)
contour(xs, ys, (x, y) -> pdf(gaussianA, [x; y]))
contour!(xs, ys, (x, y) -> pdf(gaussianB, [x; y]))

Compute the line that separates the two classes of data and plot the duality gap versus the iteration number.

In [4]:
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
w, α, numstepsarray = svmbarrier(X, Y, 10, 1e-4)

([-0.336833, -0.369152, -0.290642], [3.03515e-9, 3.14314e-9, 3.09477e-9, 3.64491e-9, 5.05257e-9, 4.80063e-9, 7.99688e-9, 6.2593e-9, 2.20604e-9, 4.07674e-9  …  4.93043e-9, 4.64329e-9, 5.19705e-9, 4.82052e-9, 4.44585e-9, 3.09003e-9, 3.63057e-9, 7.87702e-8, 5.71562e-9, 2.58971e-9], [39, 43, 92, 30, 21])

In [5]:
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [6]:
plotdualitygap(2*numdatapoints, 100., numstepsarray)

This is the ideal curve that we would get if, for each value of $t$, the centering step returned the exact minimizer of the function. This is of course not exactly the case, and as $t$ gets larger it often becomes more and more difficult to make progress. However, below we compute the actual duality gap from the $w$ and $\alpha$ that we obtained and see that the estimate is fairly accurate: the gap is indeed below the requested tolerance of $10^{-4}$.
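The ideal curve rests on the standard barrier-method bound: if a centering step is solved exactly for a given $t$, the duality gap is the number of inequality constraints divided by $t$. A small sketch (Julia 1.x) of that sequence follows. The function name `idealgaps`, the initial value `t0 = 1` and the growth factor `μ = 100` are illustrative assumptions, and this is not the code of `plotdualitygap`; the count of $400$ constraints is $2m$ for the $m = 200$ points of the first example (one margin constraint and one constraint $z_i \ge 0$ per point).

```julia
# Ideal duality gap after each centering step: (number of inequality constraints) / t,
# where t starts at t0 and is multiplied by μ after every centering step.
function idealgaps(numconstraints, t0, μ, numcentering)
    return [numconstraints / (t0 * μ^(k - 1)) for k in 1:numcentering]
end

gaps = idealgaps(400, 1.0, 100.0, 5)
```

Each centering step divides the bound by $\mu$, which is why the ideal curve is a staircase on a logarithmic scale.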

In [7]:
C = 10
Q = Diagonal(Y)*X*X'*Diagonal(Y)
primal(w) = sum(w.^2)/2 + C*sum(max.(0, ones(Y)-Y.*(X*w)))
dual(α) = -α⋅(Q*α)/2 + sum(α)

primal(w) - dual(α)

1.9996556768209217e-6
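The `primal` function above does not take $z$ as an argument. For a fixed $w$, the smallest feasible slack is $z_i = \max(0, 1 - y_i w^\top x_i)$, so the primal objective reduces to a hinge-loss form, and the quantity computed in the cell is

\begin{align}
p(w) &= \frac{1}{2} \lVert w\rVert_2^2 + C\sum_{i=1}^m \max\bigl(0, 1 - y_i w^\top x_i\bigr), \\
d(\alpha) &= -\frac{1}{2} \alpha^{\top}Q\alpha + \mathbf{1}^\top \alpha,
\qquad Q = \mathrm{diag}(y)XX^{\top}\mathrm{diag}(y), \\
p(w) - d(\alpha) &\ge 0 \quad \text{for every } w \text{ and every } 0 \le \alpha \le C.
\end{align}

The last line is weak duality, so the computed value is an upper bound on the suboptimality of both $w$ and $\alpha$, provided the returned $\alpha$ is feasible.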

Run the algorithm with different values of $C$ and measure the out-of-sample performance. Probably because the problem is too easy here, we don't see a great difference between the different values of $C$: every error rate is below $0.15\%$, with the two smallest values of $C$ doing slightly better.

In [8]:
for C in [1e-5, 1e-3, 1, 10, 100, 1000]
    w, α, numstepsarray = svmbarrier(X, Y, C, 1e-4)
    println("When C = $C, the error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
end

When C = 1.0e-5, the error rate is 0.000752
When C = 0.001, the error rate is 0.000671
When C = 1.0, the error rate is 0.001355
When C = 10.0, the error rate is 0.001446
When C = 100.0, the error rate is 0.001447
When C = 1000.0, the error rate is 0.001464
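The out-of-sample error can be estimated by Monte Carlo: draw fresh points from both Gaussians and count how many fall on the wrong side of the line. The sketch below (Julia 1.x, standard library only) is a hypothetical stand-in and not the `errorrate` function of `common.jl`, whose exact definition and normalization are not shown here. It follows the labelling used in the cells above, where the first cloud has label $-1$ and the second has label $+1$.

```julia
using LinearAlgebra, Random

# Draw n samples (as columns) from N(μ, Σ) using a Cholesky factor of Σ
samplegaussian(rng, μ, Σ, n) = μ .+ cholesky(Symmetric(Σ)).L * randn(rng, length(μ), n)

# Hypothetical stand-in for `errorrate`: fraction of fresh points that are misclassified
function errorrate_sketch(rng, μA, ΣA, μB, ΣB, w, n)
    scoreA = [samplegaussian(rng, μA, ΣA, n); ones(1, n)]' * w   # class A is labelled -1
    scoreB = [samplegaussian(rng, μB, ΣB, n); ones(1, n)]' * w   # class B is labelled +1
    return (count(>=(0), scoreA) + count(<=(0), scoreB)) / (2n)
end

rng = MersenneTwister(0)
w = [-0.336833, -0.369152, -0.290642]   # weights printed in the first example above
rate = errorrate_sketch(rng, [5.0, 5.0], [2.0 2.0; 2.0 3.0],
                        [-2.0, -8.0], [5.0 1.0; 1.0 3.0], w, 10_000)
```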


#### A second example

Now try another example where the generated data are, most of the time, not linearly separable.

In [9]:
numdatapoints = 200
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [10]:
xs = linspace(-7.5, 9, 100)
ys = linspace(-5, 7, 100)
contour(xs, ys, (x, y) -> pdf(gaussianA, [x; y]))
contour!(xs, ys, (x, y) -> pdf(gaussianB, [x; y]))

In [11]:
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
w, α, numstepsarray = svmbarrier(X, Y, 100, 1e-4)

plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [12]:
plotdualitygap(2*numdatapoints, 100., numstepsarray)

In [13]:
for C in [1e-5, 1e-3, 1, 10, 100, 1000]
    w, α, numstepsarray = svmbarrier(X, Y, C, 1e-4)
    println("When C = $C, the error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
end

When C = 1.0e-5, the error rate is 0.226528
When C = 0.001, the error rate is 0.228764
When C = 1.0, the error rate is 0.208533
When C = 10.0, the error rate is 0.209726
When C = 100.0, the error rate is 0.209497
When C = 1000.0, the error rate is 0.209632


Again, we don't see great differences in performance across the values of $C$, but the results seem to suggest that a larger $C$ gives a better separating line: the error rate is about $0.21$ for $C \ge 1$ against about $0.23$ for the two smallest values. Indeed, $C$ controls the penalty for misclassifying a point, and choosing $C$ sufficiently large yields the hard-margin classifier when the data are linearly separable.

Nonetheless, it's also well known that a large $C$ often leads to overfitting of the training data. This can be seen in some extreme cases in this setting, for instance with very few training points.

In [45]:
numdatapoints = 10
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [46]:
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
w, α, numstepsarray = svmbarrier(X, Y, 100, 1e-4)

plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [47]:
for C in [1e-5, 1e-3, 1, 10, 100, 1000]
    w, α, numstepsarray = svmbarrier(X, Y, C, 1e-4)
    println("When C = $C, the error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
end

When C = 1.0e-5, the error rate is 0.234937
When C = 0.001, the error rate is 0.234493
When C = 1.0, the error rate is 0.205512
When C = 10.0, the error rate is 0.276011
When C = 100.0, the error rate is 0.286903
When C = 1000.0, the error rate is 0.287675


As predicted, the training data are few here. When they fail to properly describe the underlying model, a large value of $C$ can cause overfitting and thus deteriorate the out-of-sample performance of the classifier. In contrast, choosing a small value of $C$ (in this case $C \le 1$) sometimes allows us to get around this problem.

### Compare with Different Solvers

In this section I'll use some off-the-shelf solvers to solve the classification problem and compare their performance with that of the previous implementation. I'll continue to use the previous example for the tests.

In [48]:
numdatapoints = 100
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)

X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
plotclouds(cloudA, cloudB)

#### IPOPT

In Julia, the `JuMP` package allows us to interact with many optimization backends through a uniform syntax. Among the open-source solvers that support quadratic programming, there is notably **IPOPT (Interior Point OPTimizer)**, which I choose to use here. To compare performance, I'll fix $C=1$ and $\epsilon=10^{-3}$ (the tolerance of the termination criterion).

In [49]:
using JuMP
using Ipopt

model = Model(solver=IpoptSolver(tol=0.001))
C = 1

@variables model begin
    w[1:3]
    z[1:2*numdatapoints], (lowerbound=0)
end

@objective(model, Min, sum(w.^2)/2+C*sum(z))
@constraint(model, con, Y.*X*w .≥ 1 - z)

status = solve(model)

This is Ipopt version 3.12.1, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:      800
Number of nonzeros in Lagrangian Hessian.............:        3

Total number of variables............................:      203
                     variables with only lower bounds:      200
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:      200
        inequality constraints with only lower bounds:      200
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  

:Optimal

In [50]:
Q = Diagonal(Y)*X*X'*Diagonal(Y)
primal(w) = sum(w.^2)/2 + C*sum(max.(0, ones(Y)-Y.*(X*w)))
dual(α) = -α⋅(Q*α)/2 + sum(α)

println("The error rate is $(errorrate(gaussianA, gaussianB, getvalue(w), Int(1e6)))")
println("The duality gap is $(primal(getvalue(w)) - dual(getdual(con))).")
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅getvalue(w))

The error rate is 0.204707
The duality gap is 0.0018408788423585065.


#### Benchmarking, LIBSVM

I'll now suppress the output of the solver in order to run some benchmarks.

In [51]:
using BenchmarkTools

model = Model(solver=IpoptSolver(tol=0.001, print_level=0))
C = 1

@variables model begin
    w[1:3]
    z[1:2*numdatapoints], (lowerbound=0)
end

@objective(model, Min, sum(w.^2)/2+C*sum(z))
@constraint(model, con, Y.*X*w .≥ 1 - z)

@benchmark solve($model)

BenchmarkTools.Trial: 
  memory estimate:  168.63 KiB
  allocs estimate:  1719
  --------------
  minimum time:     9.890 ms (0.00% GC)
  median time:      10.129 ms (0.00% GC)
  mean time:        10.425 ms (0.28% GC)
  maximum time:     35.710 ms (12.99% GC)
  --------------
  samples:          480
  evals/sample:     1

A well-known library for solving SVM problems is **libsvm**. The `SVR` package serves as its Julia interface (the default $C$ and $\epsilon$ match the values that I chose).

In [52]:
import SVR

X_ = X[:, 1:2]'
@benchmark SVR.train($Y, $X_, svm_type=Int32(0), kernel_type=Int32(0))

BenchmarkTools.Trial: 
  memory estimate:  12.39 KiB
  allocs estimate:  30
  --------------
  minimum time:     600.922 μs (0.00% GC)
  median time:      616.687 μs (0.00% GC)
  mean time:        632.310 μs (0.22% GC)
  maximum time:     3.497 ms (77.57% GC)
  --------------
  samples:          7885
  evals/sample:     1

Since it's quite painful to get the values of the support vectors through the Julia interface of **libsvm** (dealing with C pointers etc.), I just show here that the classification performance is comparable with what we have seen before.

In [53]:
numtestpoints = 10000
testA = rand(gaussianA, numtestpoints)
testB = rand(gaussianB, numtestpoints)
Xtest = [testA testB]
Ytest = [-1.*ones(numtestpoints); ones(numtestpoints)]

model = SVR.train(Y, X_, svm_type=Int32(0), kernel_type=Int32(0))
predict(x) = SVR.predict(model, x)
sum(mapslices(predict, Xtest, 1)'.≠Ytest)/numtestpoints

0.2039

Finally, I test my own implementation, which is of course not optimized at all.

In [54]:
@benchmark svmbarrier(X, Y, 1, 1e-3)

BenchmarkTools.Trial: 
  memory estimate:  725.59 MiB
  allocs estimate:  21872
  --------------
  minimum time:     395.083 ms (17.31% GC)
  median time:      408.043 ms (16.79% GC)
  mean time:        428.109 ms (15.88% GC)
  maximum time:     559.816 ms (12.26% GC)
  --------------
  samples:          12
  evals/sample:     1

#### Analysis

We see that **libsvm** is roughly $16$ times faster than **IPOPT** (median $0.62$ ms against $10.1$ ms), while **IPOPT** is still about $40$ times faster than my own implementation (median $408$ ms).

What I implemented is a naive barrier method in which, at every Newton step of each centering step, we need to compute the Hessian (a matrix of size $(n+m) \times (n+m)$) and its inverse. The algorithm becomes extremely slow and even intractable as $m$ grows.

Without knowing the details, it seems that **IPOPT** implements a primal-dual interior-point method and exploits both first- and second-derivative information. It is nevertheless quite efficient despite the use of the Hessian, so I think there are certainly improvements that can be made to the barrier method to make it run faster.

Finally, **libsvm** of course uses an algorithm tailored to the problem (an SMO-type decomposition method on the dual, which is close in spirit to dual coordinate descent) and has the best performance. Now let's try with more data points.
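One possible improvement of this kind, given here as a sketch and not as something `barrier.jl` or **IPOPT** is known to do, is block elimination. With $s_i = y_i(w^\top x_i) + z_i - 1$ (a shorthand introduced here), the Hessian of $tf + \phi$ has a diagonal block in $z$, so the Newton system can be reduced to an $n \times n$ system:

\begin{align}
\begin{bmatrix} A & B \\ B^\top & D \end{bmatrix}
\begin{bmatrix} \Delta w \\ \Delta z \end{bmatrix}
&= -\begin{bmatrix} g_w \\ g_z \end{bmatrix},
\qquad D = \mathrm{diag}\Bigl(\frac{1}{s_i^2} + \frac{1}{z_i^2}\Bigr), \\
\bigl(A - B D^{-1} B^\top\bigr)\Delta w &= -g_w + B D^{-1} g_z, \\
\Delta z &= -D^{-1}\bigl(g_z + B^\top \Delta w\bigr).
\end{align}

Here $A = tI + \sum_{i} x_i x_i^\top / s_i^2$, the $i$-th column of $B$ is $y_i x_i / s_i^2$, and $g_w$, $g_z$ are the two blocks of the gradient. Forming the reduced matrix costs $O(mn^2)$ and solving with it costs $O(n^3)$, instead of $O((n+m)^3)$ for inverting the full Hessian.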

In [24]:
numdatapoints = 500
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)

X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
plotclouds(cloudA, cloudB)

In [25]:
model = Model(solver=IpoptSolver(tol=0.001, print_level=0))
C = 1

@variables model begin
    w[1:3]
    z[1:2*numdatapoints], (lowerbound=0)
end

@objective(model, Min, sum(w.^2)/2+C*sum(z))
@constraint(model, con, Y.*X*w .≥ 1 - z)

@benchmark solve($model)

BenchmarkTools.Trial: 
  memory estimate:  848.52 KiB
  allocs estimate:  13828
  --------------
  minimum time:     47.517 ms (0.00% GC)
  median time:      48.335 ms (0.00% GC)
  mean time:        49.331 ms (0.21% GC)
  maximum time:     57.781 ms (5.88% GC)
  --------------
  samples:          102
  evals/sample:     1

In [26]:
X_ = X[:, 1:2]'
@benchmark SVR.train($Y, $X_, svm_type=Int32(0), kernel_type=Int32(0))

BenchmarkTools.Trial: 
  memory estimate:  56.02 KiB
  allocs estimate:  31
  --------------
  minimum time:     11.449 ms (0.00% GC)
  median time:      11.918 ms (0.00% GC)
  mean time:        11.959 ms (0.05% GC)
  maximum time:     14.411 ms (17.23% GC)
  --------------
  samples:          418
  evals/sample:     1

In [27]:
@time svmbarrier(X, Y, 1, 1e-3)

 37.553907 seconds (24.86 k allocations: 18.694 GiB, 55.02% gc time)


([-0.781917, -0.0699528, -0.541532], [8.66628e-9, 1.0, 5.21391e-9, 6.47601e-8, 1.0, 2.16875e-9, 1.0, 8.2447e-9, 1.0, 1.0  …  1.0, 1.0, 1.0, 2.11287e-8, 1.0, 4.53497e-9, 6.74479e-9, 1.36051e-8, 1.0, 8.22219e-9], [14, 150, 63, 22, 21])

### Coordinate Descent

As shown above, a direct implementation of the barrier method is not very efficient. We can instead approach the problem from its dual, which has only box constraints and can be solved efficiently by coordinate descent. The primal solution is then recovered from the relation $w = \sum_{i=1}^m \alpha_i y_i x_i$ (representer theorem). The implementation of this method can be found in the file `dualcoordinatedescent.jl`.

In [28]:
include("dualcoordinatedescent.jl")

svm_dualcoordinatedescentopt (generic function with 1 method)

#### A first try

The coordinate descent process is composed of a series of outer iterations, each of which contains $m$ inner iterations. In each inner iteration, we minimize the function along a particular coordinate. In my first implementation, I compute the duality gap after every outer iteration to see if the stopping criterion is met.
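For the box-constrained dual, each inner iteration has a closed form. The following is a sketch derived from the dual stated earlier, not a transcription of `dualcoordinatedescent.jl`. Writing the dual as the minimization of $g(\alpha) = \frac{1}{2}\alpha^\top Q\alpha - \mathbf{1}^\top\alpha$ with $Q = \mathrm{diag}(y)XX^{\top}\mathrm{diag}(y)$, the update of coordinate $i$ is

\begin{align}
\frac{\partial g}{\partial \alpha_i} &= (Q\alpha)_i - 1, \\
\alpha_i &\leftarrow \min\Bigl(\max\Bigl(\alpha_i - \frac{(Q\alpha)_i - 1}{Q_{ii}},\, 0\Bigr),\, C\Bigr),
\qquad Q_{ii} = \lVert x_i\rVert_2^2.
\end{align}

Since $g$ is quadratic in $\alpha_i$, the unconstrained one-dimensional minimizer is a single Newton step, and clipping it to $[0, C]$ gives the exact minimizer along that coordinate. Evaluating $(Q\alpha)_i$ directly from a row of $Q$ costs $O(m)$ per inner iteration.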

In [29]:
numdatapoints = 50
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [30]:
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
@time w, α, dualitygaps = svm_dualcoordinatedescent(X, Y, 10, 1; maxiter=5000)

  0.384481 seconds (1.15 M allocations: 222.957 MiB, 8.40% gc time)


([-0.88246, -0.0293925, -1.04544], [0.0, 0.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0  …  0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 10.0], [2.17773e6, 6895.77, 1544.13, 969.569, 723.54, 628.212, 508.01, 447.554, 395.353, 355.712  …  1.22085, 1.21962, 1.21838, 1.21714, 1.2159, 1.21467, 1.21343, 1.21219, 1.211, 0.667693])

In [31]:
println("The error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

The error rate is 0.215605


In [32]:
plot(dualitygaps, yscale=:log10, xlabel="outer iterations", ylabel="duality gap")

The convergence is slow compared with the barrier method, in the sense that many more iterations are needed. In theory, the convergence should be linear, but in practice, when we add more data points, as in the case below, the algorithm may fail to converge (or may take a very long time to do so), which is why I need to add `maxiter` as an argument.

In [33]:
numdatapoints = 500
gaussianA = MvNormal([2.; 2.], [7. 2.5; 2.5 3.])
gaussianB = MvNormal([-3.; 1.], [2. -0.3; -0.3 5.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)

X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
@time w, α, dualitygaps = svm_dualcoordinatedescent(X, Y, 10, 1; maxiter=10000)

 50.619687 seconds (69.85 M allocations: 79.816 GiB, 20.25% gc time)


([-0.843011, -0.0557934, -0.762457], [0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 10.0, 0.0], [1.74602e8, 7.42345e5, 263215.0, 1.4634e5, 86824.2, 64822.5, 49046.8, 35471.0, 25267.1, 17434.0  …  8.54574, 8.54232, 8.5389, 8.53548, 8.53206, 8.52864, 8.52522, 8.5218, 8.51838, 8.51496])

In [34]:
plot(dualitygaps, yscale=:log10, xlabel="outer iterations", ylabel="duality gap")

In [35]:
@time wb, αb, numstepsarray = svmbarrier(X, Y, 10, 1)

 24.480221 seconds (48.14 k allocations: 12.456 GiB, 54.83% gc time)


([-0.835879, -0.0481739, -0.765987], [2.59338e-5, 9.99998, 0.000130178, 5.31712e-5, 0.000117654, 2.35368e-5, 7.45481e-5, 9.99885, 9.99997, 1.54957e-5  …  0.000302875, 3.52793e-5, 0.000110119, 0.000423534, 8.63209e-5, 9.99968, 9.99993, 0.000127717, 9.99981, 4.03743e-5], [51, 86, 43])

In [36]:
Q = Diagonal(Y)*X*X'*Diagonal(Y)
primal(w) = sum(w.^2)/2 + 10*sum(max.(0, ones(Y)-Y.*(X*w)))
dual(α) = -α⋅(Q*α)/2 + sum(α)
primal(wb) - dual(αb)

0.09979675860449788

Surprisingly, the barrier method is faster here: it takes about $24$ seconds and reaches a duality gap of about $0.1$, whereas coordinate descent stops after about $51$ seconds with a gap still around $8.5$. Nonetheless, dual coordinate descent can in fact be made more efficient by slightly modifying the implementation.

#### Some improvements

The function `svm_dualcoordinatedescentopt` is another implementation of the dual coordinate descent method. We apply a random permutation to the order of the coordinates in each outer iteration. The partial derivative is computed in time $O(n)$ rather than $O(m)$, which is a great improvement when $m \gg n$. Finally, computing the duality gap after each outer iteration is expensive, and avoiding it would require another stopping criterion. I will not address this issue here, and therefore I simply pass the number of outer iterations to run as an argument.
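The two ideas above can be sketched as follows (Julia 1.x). This is an illustrative sketch under stated assumptions, not the code of `dualcoordinatedescent.jl`: the name `dcd_sketch`, its signature and the toy data are hypothetical. The $O(n)$ cost comes from keeping $w = \sum_j \alpha_j y_j x_j$ up to date, so that the partial derivative is $y_i x_i^\top w - 1$ and never requires a row of $XX^\top$.

```julia
using LinearAlgebra, Random

# Sketch of dual coordinate descent for the box-constrained SVM dual
function dcd_sketch(X, Y, C, numouter; rng=Random.default_rng())
    m, n = size(X)
    α = zeros(m)
    w = zeros(n)                          # invariant: w == X' * (α .* Y)
    Qdiag = vec(sum(abs2, X, dims=2))     # Q_ii = ||x_i||^2 since y_i^2 = 1
    for _ in 1:numouter
        for i in randperm(rng, m)         # fresh random order in each outer iteration
            Qdiag[i] == 0 && continue     # a zero row cannot be updated by this rule
            xi = view(X, i, :)
            g = Y[i] * dot(w, xi) - 1                 # partial derivative in O(n)
            αnew = clamp(α[i] - g / Qdiag[i], 0, C)   # 1-D minimizer clipped to [0, C]
            w .+= (αnew - α[i]) * Y[i] .* xi          # keep the invariant in O(n)
            α[i] = αnew
        end
    end
    return w, α
end

# Toy separable data with a constant 1 appended to every point
X = [2.0 2.0 1.0; 3.0 1.0 1.0; -2.0 -1.0 1.0; -1.0 -3.0 1.0]
Y = [1.0, 1.0, -1.0, -1.0]
w, α = dcd_sketch(X, Y, 10.0, 1000; rng=MersenneTwister(0))
```

Maintaining `w` incrementally is the design choice that matters here: each inner iteration touches only one row of `X`, so an outer iteration costs $O(mn)$ instead of $O(m^2)$.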

In [37]:
@time wopt, αopt = svm_dualcoordinatedescentopt(X, Y, 10, 10000)

  1.718417 seconds (35.90 M allocations: 2.828 GiB, 20.79% gc time)


([-0.835787, -0.0481665, -0.766012], [0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 10.0, 0.0])

In [38]:
primal(wopt) - dual(αopt)

0.009655013771407539

In [39]:
println("The error rate is $(errorrate(gaussianA, gaussianB, w, Int(1e6)))")
plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

The error rate is 0.206972


We see that the algorithm is now much faster (about $1.7$ seconds) and more precise (duality gap below $10^{-2}$). Therefore, if we don't need to solve the classification problem to very high precision, say if something like $10^{-2}$ is sufficient, it may be worth considering the coordinate descent method.

### Analytic Center Cutting-Plane Method (ACCPM)

In [28]:
include("analyticcenter.jl")
include("accpm.jl")

nextstartpoint (generic function with 1 method)

In [29]:
numdatapoints = 50
gaussianA = MvNormal([5.; 5.], [2. 2.; 2. 3.])
gaussianB = MvNormal([-2.; -8.], [5. 1.; 1. 3.])
cloudA = rand(gaussianA , numdatapoints)
cloudB = rand(gaussianB, numdatapoints)
plotclouds(cloudA, cloudB)

In [30]:
X = [[cloudA cloudB]' ones(2*numdatapoints)]
Y = [-1.*ones(numdatapoints); ones(numdatapoints)]
w, optdistances = svmaccpm(X, Y, 10, 1e-4)

plotclouds(cloudA, cloudB)
drawborder((x, y) -> [x; y; 1]⋅w)

In [32]:
plot(optdistances, yscale=:log10)