<center><h1>Least-Squares Support Vector Machine</h1></center>

# Summary:

1. [Introduction](#introduction)

2. [LSSVM CPU Sample](#lssvm_cpu)
    
3. [LSSVM GPU Sample](#lssvm_gpu)

(adapted from https://github.com/RomuloDrumond/LSSVM)

# 1. Introduction <a class="anchor" id="introduction"></a>

The Least-Squares Support Vector Machine (LSSVM) is a variation of the original Support Vector Machine (SVM) in which a slight change to the objective function and the constraints greatly simplifies the optimization problem.

First, consider the optimization problem of a soft-margin SVM:

$$ 
\begin{align}
    \text{minimize} && f_o(\vec{w},\vec{\xi})=\frac{1}{2} \vec{w}^T\vec{w} + C \sum_{i=1}^{n} \xi_i &&\\
    \text{s.t.} && d_i(\vec{w}^T\vec{x}_i+b)\geq 1 - \xi_i, && i = 1,\dots, n \\
         && \xi_i \geq 0,                            && i = 1,\dots, n
\end{align}
$$

In this case, we have a set of inequality constraints. Solving the problem through its dual and applying the kernel trick yields a discriminant function of the form:

$$ f(\vec{x}) = \operatorname{sign} \Big( \sum_{i=1}^{n} \alpha_i^o d_i K(\vec{x}_i,\vec{x}) + b_o \Big) $$

Here $\alpha_i^o$ and $b_o$ denote the optimal values, and $d_i \in \{-1, +1\}$ is the class label of $\vec{x}_i$. With enough regularization (smaller values of $C$), many of the $\alpha_i^o$ are zero. The result is a sparse model in which we only need to store the pairs $(\vec{x}_i,d_i)$ whose optimal dual variable is nonzero. The vectors $\vec{x}_i$ with nonzero $\alpha_i^o$ are known as support vectors (SV).
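As a minimal sketch of how this discriminant function can be evaluated, the snippet below uses NumPy and a Gaussian (RBF) kernel. The kernel form $K(\vec{x},\vec{z}) = \exp(-\|\vec{x}-\vec{z}\|^2 / 2\sigma^2)$, the helper names `rbf_kernel` and `decision_function`, and the toy values are assumptions made for illustration. They are not taken from the `lssvm_classifier` module used later.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Assumed kernel: K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for all row pairs."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def decision_function(X_sv, d_sv, alpha, b, X_new, sigma=1.0):
    """Evaluate sign(sum_i alpha_i d_i K(x_i, x) + b) for every row x of X_new."""
    K = rbf_kernel(X_sv, X_new, sigma)   # shape (n_sv, n_new)
    scores = (alpha * d_sv) @ K + b      # one score per new point
    return np.where(scores >= 0, 1, -1)

# Hand-picked toy values, for illustration only
X_sv = np.array([[0.0, 0.0], [2.0, 2.0]])   # stored support vectors
d_sv = np.array([-1.0, 1.0])                # their labels
alpha = np.array([1.0, 1.0])                # dual variables (not optimized here)
b = 0.0
X_new = np.array([[0.1, 0.0], [1.9, 2.1]])

labels = decision_function(X_sv, d_sv, alpha, b, X_new)
print(labels)
```

Each new point is assigned the label of the support vector it lies closest to, because the Gaussian kernel decays quickly with distance.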



In the LSSVM case, we replace the inequality constraints with equality constraints. Because the $\xi_i$ may now be negative, we penalize their squares in the objective function:

$$ 
\begin{align}
    \text{minimize} && f_o(\vec{w},\vec{\xi})=\frac{1}{2} \vec{w}^T\vec{w} + \gamma \frac{1}{2}\sum_{i=1}^{n} \xi_i^2 &&\\
    \text{s.t.} && d_i(\vec{w}^T\vec{x}_i+b) = 1 - \xi_i, && i = 1,\dots, n
\end{align}
$$
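As a sketch of the intermediate step (a standard Lagrangian argument, not spelled out in the source), introduce one multiplier $\alpha_i$ per equality constraint and set the partial derivatives to zero:

$$
\begin{align}
    \mathcal{L}(\vec{w}, b, \vec{\xi}; \vec{\alpha}) &= \frac{1}{2} \vec{w}^T\vec{w} + \gamma \frac{1}{2}\sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \big[ d_i(\vec{w}^T\vec{x}_i+b) - 1 + \xi_i \big] \\
    \frac{\partial \mathcal{L}}{\partial \vec{w}} = 0 &\ \Rightarrow\ \vec{w} = \sum_{i=1}^{n} \alpha_i d_i \vec{x}_i \\
    \frac{\partial \mathcal{L}}{\partial b} = 0 &\ \Rightarrow\ \sum_{i=1}^{n} \alpha_i d_i = 0 \\
    \frac{\partial \mathcal{L}}{\partial \xi_i} = 0 &\ \Rightarrow\ \alpha_i = \gamma \xi_i \\
    \frac{\partial \mathcal{L}}{\partial \alpha_i} = 0 &\ \Rightarrow\ d_i(\vec{w}^T\vec{x}_i+b) = 1 - \xi_i
\end{align}
$$

Substituting $\vec{w}$ and $\xi_i = \alpha_i / \gamma$ into the last condition gives $\sum_{j=1}^{n} d_i d_j \vec{x}_j^T\vec{x}_i \, \alpha_j + d_i b + \gamma^{-1} \alpha_i = 1$ for each $i$. Only inner products $\vec{x}_j^T\vec{x}_i$ remain, so they can be replaced by a kernel $K(\vec{x}_i,\vec{x}_j)$.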


The dual of this optimization problem reduces to a system of linear equations, obtained from the Karush-Kuhn-Tucker (KKT) conditions:

$$
\begin{bmatrix} 
    0 & \vec{d}^T \\
    \vec{d} & \Omega + \gamma^{-1} I 
\end{bmatrix}
\
\begin{bmatrix} 
    b  \\
    \vec{\alpha}
\end{bmatrix}
=
\begin{bmatrix} 
    0 \\
    \vec{1}
\end{bmatrix}
$$

Here, with the kernel trick, &nbsp; $\Omega_{i,j} = d_i d_j K(\vec{x}_i,\vec{x}_j)$,  &nbsp;  $\vec{d} = [d_1 \ d_2 \ ... \ d_n]^T$, &nbsp; $\vec{\alpha} = [\alpha_1 \ \alpha_2 \ ... \ \alpha_n]^T$ &nbsp;  and &nbsp; $\vec{1} = [1 \ 1 \ ... \ 1]^T$.
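The system above can be assembled and solved in a few lines of NumPy. The following is a minimal sketch under stated assumptions, not the implementation in `lssvm_classifier`: the function names `lssvm_fit` and `lssvm_predict`, the Gaussian kernel form, and the toy data are all hypothetical, and the system is solved directly with `np.linalg.solve`.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Assumed kernel: K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for all row pairs."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, d, gamma=1.0, sigma=1.0):
    """Build the (n+1) x (n+1) KKT system and solve it for b and alpha."""
    n = X.shape[0]
    Omega = np.outer(d, d) * rbf_kernel(X, X, sigma)  # Omega_ij = d_i d_j K(x_i, x_j)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = d                                      # first row:    [0, d^T]
    A[1:, 0] = d                                      # first column: [0, d]^T
    A[1:, 1:] = Omega + np.eye(n) / gamma             # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], np.ones(n)))         # right-hand side [0, 1, ..., 1]^T
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]                  # b, alpha

def lssvm_predict(X_train, d, alpha, b, X_new, sigma=1.0):
    """sign(sum_i alpha_i d_i K(x_i, x) + b) for every row x of X_new."""
    K = rbf_kernel(X_train, X_new, sigma)
    return np.where((alpha * d) @ K + b >= 0, 1, -1)

# Toy two-class problem with bipolar labels, for illustration only
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
d = np.array([-1.0, -1.0, 1.0, 1.0])

b, alpha = lssvm_fit(X, d, gamma=1.0, sigma=1.0)
pred = lssvm_predict(X, d, alpha, b, X)
print(b, alpha, pred)
```

The first row of the system enforces $\vec{d}^T\vec{\alpha} = 0$, which can be checked on the returned `alpha`. In general every entry of `alpha` is nonzero, so all training points have to be kept for prediction.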

The discriminant function of the LSSVM has the same form as that of the SVM, but the $\alpha_i^o$ are usually nonzero, which results in a larger model. The big advantage of the LSSVM lies in finding its parameters, which reduces to solving a linear system of the type:

$$ A\vec{x} = \vec{b} $$

A well-known way to solve such a system is to minimize the squared norm of the residual, which can be written as the optimization problem:

$$
\begin{align}
    \text{minimize} && f_o(\vec{x})=\frac{1}{2}||A\vec{x} - \vec{b}||^2
\end{align}
$$

This problem has the analytical solution:

$$ \vec{x} = A^{\dagger} \vec{b} $$

Here $A^{\dagger}$ is the pseudo-inverse, which for a matrix $A$ with full column rank is given by:

$$ A^{\dagger} = (A^T A)^{-1} A^T$$
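To make the formula concrete, the sketch below computes the least-squares solution of a small overdetermined system in three ways: with the explicit formula above, with `np.linalg.pinv`, and with `np.linalg.lstsq`. The matrix and right-hand side are arbitrary illustrative values.

```python
import numpy as np

# Small overdetermined system (3 equations, 2 unknowns), illustrative values only
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Explicit formula (A^T A)^{-1} A^T b, valid because A has full column rank
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# Pseudo-inverse computed by NumPy through the SVD
x_pinv = np.linalg.pinv(A) @ b

# Dedicated least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_pinv, x_lstsq)
```

All three agree on this example. In practice, forming $(A^T A)^{-1}$ explicitly is usually avoided because it squares the condition number of $A$, so `pinv` or `lstsq` tend to be the safer choice.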

# 2. LSSVM CPU Sample <a class="anchor" id="lssvm_cpu"></a>

```python
# Data loading and pre-processing

from data_object import *

data = DATA()
data.class_loading(problem = 'iris')
data.label_encode(label_type = 'bipolar')
data.hold_out(hold_method = 'aleatory', train_size = 0.8)
data.normalize(norm_type = 'zscore')

# Classifier Training and Test

from lssvm_classifier import *

lssvm = LSSVM(gamma=1, kernel='rbf', sigma=1)
lssvm.fit(data.X_tr, data.y_tr)
y_h = lssvm.predict(data.X_ts)

# Classifier Statistics

from statistics_object import *

stats = STATSCLASS()
stats.calculate(data.y_ts, y_h)

display(stats.confusion_matrix)
display(stats.accuracy)
```

```
array([[10.,  0.,  0.],
       [ 0.,  8.,  2.],
       [ 0.,  1.,  9.]])

0.9
```

# 3. LSSVM GPU Sample <a class="anchor" id="lssvm_gpu"></a>

```python
# Data loading and pre-processing

from data_object import *

data = DATA()
data.class_loading(problem = 'iris')
data.label_encode(label_type = 'bipolar')
data.hold_out(hold_method = 'aleatory', train_size = 0.8)
data.normalize(norm_type = 'zscore')

# Classifier Training and Test

from lssvm_classifier import *

lssvm = LSSVM_GPU(gamma=1, kernel='rbf', sigma=1)
lssvm.fit(data.X_tr, data.y_tr)
y_h = lssvm.predict(data.X_ts)

# Classifier Statistics

from statistics_object import *

stats = STATSCLASS()
stats.calculate(data.y_ts, y_h)

display(stats.confusion_matrix)
display(stats.accuracy)
```

```
array([[11.,  0.,  0.],
       [ 0., 10.,  2.],
       [ 0.,  0.,  7.]])

0.9333333333333333
```
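The accuracies reported in the two samples can be checked directly from their confusion matrices, since accuracy is the sum of the diagonal divided by the total number of test samples. The helper `accuracy_from_confusion` below is a hypothetical name used for this sketch and is not part of `statistics_object`.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Fraction of correctly classified samples: trace(cm) / sum(cm)."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Confusion matrices copied from the CPU and GPU outputs above
cm_cpu = np.array([[10., 0., 0.],
                   [ 0., 8., 2.],
                   [ 0., 1., 9.]])
cm_gpu = np.array([[11.,  0., 0.],
                   [ 0., 10., 2.],
                   [ 0.,  0., 7.]])

acc_cpu = accuracy_from_confusion(cm_cpu)   # 27 correct out of 30
acc_gpu = accuracy_from_confusion(cm_gpu)   # 28 correct out of 30
print(acc_cpu, acc_gpu)
```

The per-class totals differ between the two matrices, which suggests that each run drew a different random hold-out split. The gap in accuracy therefore should not be read as a difference between the CPU and GPU classifiers.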