# SVC #

Given training:

Input vector $x_{i} \in \mathcal{R}^{p}, i=1,\dots ,n$

class = $2$

Output vector $y \in \{1, -1\}^{n}$

$$\min_{w,b,\zeta}\frac{1}{2}w^{T}w+C\sum_{i=1}^{n}\zeta_{i}$$

subject to

$y_{i}(w^{T}\phi(x_{i})+b)\ge1-\zeta_{i}$

$\zeta_{i}\ge0,i=1,\dots,n$
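
The primal objective above can be evaluated for a fitted model. This is a minimal sketch, assuming a linear kernel (so that $\phi(x) = x$ and $w$ is available as `coef_`); the toy data and variable names are illustrative and not part of the notes.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with labels in {-1, +1} (illustrative values only).
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.5],
              [2.0, 2.0], [3.0, 2.5], [2.5, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
C = 1.0

clf = SVC(kernel="linear", C=C).fit(X, y)

# f(x_i) = w^T x_i + b for the fitted model
f = clf.decision_function(X)

# Slack each sample needs to satisfy its constraint:
# zeta_i = max(0, 1 - y_i * f(x_i))
zeta = np.maximum(0.0, 1.0 - y * f)

w = clf.coef_.ravel()
primal_objective = 0.5 * w @ w + C * zeta.sum()

print("slacks:", zeta)
print("primal objective:", primal_objective)
```

Samples that lie outside the margin on the correct side get a slack of zero, so only margin violations contribute to the penalty term.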


**Dual problem**

$$\min_{\alpha}\frac{1}{2}\alpha^{T}\mathcal{Q}\alpha-e^{T}\alpha$$

subject to:

$y^{T}\alpha = 0$

$0\le\alpha_{i}\le C, i=1,\dots,n$


$e$ is the vector of all ones

$C\gt0$ is the upper bound on each $\alpha_{i}$

$\mathcal{Q}$ is an $n\times n$ positive semidefinite matrix

$\mathcal{Q}_{ij} \equiv y_{i}y_{j}K(x_{i},x_{j})$

*Kernel*: $K(x_{i},x_{j}) = \phi(x_{i})^{T}\phi(x_{j})$
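
The matrix $\mathcal{Q}$ can be built directly from a kernel matrix. The sketch below assumes an RBF kernel with an arbitrarily chosen `gamma`; the data are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# K[i, j] = exp(-gamma * ||x_i - x_j||^2)
K = rbf_kernel(X, X, gamma=0.5)

# Q[i, j] = y_i * y_j * K(x_i, x_j)
Q = np.outer(y, y) * K

# Q is symmetric, so eigvalsh applies; all eigenvalues should be
# non-negative up to rounding error.
eigenvalues = np.linalg.eigvalsh(Q)
print("Q is positive semidefinite:", eigenvalues.min() >= -1e-10)
```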

Training vectors are implicitly mapped into a higher (possibly infinite) dimensional space by the function $\phi$. The decision function is:

$$\operatorname{sgn}\left(\sum_{i=1}^{n}y_{i}\alpha_{i}K(x_{i}, x)+\rho\right)$$


> *SVM models derived from libsvm and liblinear use $C$ as the regularization parameter; most other estimators use $\alpha$.*

<b><i>dual\_coef\_</i></b> holds $y_{i}\alpha_{i}$

<b><i>support\_vectors\_</i></b> holds the support vectors

<b><i>intercept\_</i></b> holds independent term $\rho$
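
These three attributes are enough to rebuild the decision function by hand. The following is a sketch for the two-class case, assuming an RBF kernel with an explicit `gamma` so the kernel can be recomputed outside the estimator; the data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])
gamma, C = 0.5, 1.0

clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

X_new = np.array([[1.5, 1.0], [0.2, 0.1]])

# Kernel between the new points and the support vectors: (n_new, n_SV)
K = rbf_kernel(X_new, clf.support_vectors_, gamma=gamma)

# sum_i (y_i * alpha_i) * K(x_i, x) + rho
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

print(np.allclose(manual, clf.decision_function(X_new)))

# Dual constraints: 0 <= alpha_i <= C and y^T alpha = 0
print(np.all(np.abs(clf.dual_coef_) <= C + 1e-12))
print(np.isclose(clf.dual_coef_.sum(), 0.0, atol=1e-8))
```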

# NuSVC #
$\nu$ controls the number of support vectors and training errors:

$$\nu\in(0,1]$$

$\nu$ is an upper bound on the fraction of training errors (margin errors)

$\nu$ is a lower bound on the fraction of support vectors

> *mathematically $\nu$-SVC $\equiv$ $C$-SVC* (reparameterization of $C$-SVC)
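
The two bounds can be checked empirically. This is a minimal sketch on synthetic data from `make_blobs`; the dataset, the `gamma` value and the chosen values of `nu` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import NuSVC

# Two overlapping, balanced clusters (synthetic data)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

results = []
for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf", gamma=0.1).fit(X, y)
    sv_fraction = clf.support_.size / len(X)          # should be >= nu
    error_fraction = np.mean(clf.predict(X) != y)     # expected to be <= nu
    results.append((nu, sv_fraction, error_fraction))
    print(f"nu={nu}: support vectors {sv_fraction:.2f}, "
          f"training errors {error_fraction:.2f}")
```

Raising `nu` forces more training points to become support vectors and tolerates more margin errors.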


# SVR #

Given training:

Input vector $x_{i} \in \mathcal{R}^{p}, i=1,\dots ,n$

Output vector $y \in \mathcal{R}^{n}$

$\epsilon$-SVR solves:

$$\min_{w,b,\zeta,\zeta^{*}}\frac{1}{2}w^{T}w + C \sum_{i=1}^{n}(\zeta_{i} + \zeta_{i}^{*})$$

subject to:

$y_{i} - w^{T}\phi(x_{i})-b \le \epsilon + \zeta_{i}$

$w^{T}\phi(x_{i}) + b -y_{i} \le \epsilon + \zeta_{i}^{*}$

$\zeta_{i}, \zeta_{i}^{*} \ge 0, i=1, \dots , n$

**Dual problem:**

$$\min_{\alpha, \alpha ^{*}} \frac{1}{2}(\alpha - \alpha^{*})^{T}\mathcal{Q}(\alpha - \alpha^{*}) + \epsilon e^{T}(\alpha + \alpha^{*}) - y^{T}(\alpha - \alpha^{*})$$

subject to:

$e^{T}(\alpha - \alpha^{*}) = 0$

$0\le \alpha_{i}, \alpha_{i}^{*} \le C, i=1, \dots, n$

$e$ is the vector of all ones

$C\gt0$ is the upper bound on each $\alpha_{i}$ and $\alpha_{i}^{*}$

$\mathcal{Q}$ is an $n\times n$ positive semidefinite matrix

$\mathcal{Q}_{ij}\equiv K(x_{i},x_{j})= \phi(x_{i})^{T}\phi(x_{j})$

Training vectors are implicitly mapped into a higher (possibly infinite) dimensional space by the function $\phi$. The decision function is:

$$\sum_{i=1}^{n}(\alpha_{i}-\alpha_{i}^{*})K(x_{i}, x)+ \rho$$

<b><i>dual\_coef\_</i></b> holds $\alpha_{i}-\alpha_{i}^{*}$

<b><i>support\_vectors\_</i></b> holds the support vectors

<b><i>intercept\_</i></b> holds independent term $\rho$
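
As with classification, an SVR prediction can be rebuilt from these attributes. The sketch below assumes an RBF kernel with an explicit `gamma` and a noiseless sine curve as illustrative data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

gamma, C, epsilon = 0.5, 1.0, 0.1
reg = SVR(kernel="rbf", gamma=gamma, C=C, epsilon=epsilon).fit(X, y)

X_new = np.array([[0.5], [2.5], [4.0]])

# Kernel between the new points and the support vectors: (n_new, n_SV)
K = rbf_kernel(X_new, reg.support_vectors_, gamma=gamma)

# sum_i (alpha_i - alpha_i^*) * K(x_i, x) + rho
manual = K @ reg.dual_coef_.ravel() + reg.intercept_[0]

print(np.allclose(manual, reg.predict(X_new)))

# Dual constraints: |alpha_i - alpha_i^*| <= C and e^T (alpha - alpha^*) = 0
print(np.all(np.abs(reg.dual_coef_) <= C + 1e-12))
print(abs(reg.dual_coef_.sum()) < 1e-6)
```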

<hr />
# $\langle code \rangle$ #
<hr />

Training samples: $X$ of shape $[n\_samples, n\_features]$

Class labels: $y$ of shape $[n\_samples]$

> LinearSVC does not accept the keyword ***kernel***, since the kernel is assumed to be linear, and it lacks some attributes of SVC and NuSVC, such as ***support_***

In [1]:
from sklearn import svm
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y) 

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [2]:
clf.predict([[2., 2.]])

array([1])

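The decision function of an SVM depends only on a subset of the training data, the support vectors. Their properties are available as attributes: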

In [3]:
# get support vectors
clf.support_vectors_

array([[ 0.,  0.],
       [ 1.,  1.]])

In [4]:
# get indices of support vectors
clf.support_ 

array([0, 1])

In [5]:
# get number of support vectors for each class
clf.n_support_

array([1, 1])

# Multi class Classification #
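
Number of classes = $n\_class$

## For SVC and NuSVC ##

Number of classifiers: $n\_class \times \frac{(n\_class - 1)}{ 2}$, each trained on data from two classes

decision_function_shape = 'ovr' gives a decision function of shape (n_samples, n_classes); 'ovo' gives (n_samples, n_classes * (n_classes - 1) / 2)

> approach: "one-against-one"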

In [6]:
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y) 

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovo', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [7]:
dec = clf.decision_function([[1]])
dec.shape[1] # 4 classes: 4*3/2 = 6

6

In [8]:
clf.decision_function_shape = "ovr"
dec = clf.decision_function([[1]])
dec.shape[1] # 4 classes

4

## For Linear SVC ##
> approach: "one-vs-the-rest"

In [9]:
lin_clf = svm.LinearSVC()
lin_clf.fit(X, Y) 

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

In [10]:
dec = lin_clf.decision_function([[1]])
dec.shape[1]

4

**Alternative multi-class SVM**

LinearSVC(multi_class='crammer_singer') 

> In practice, one-vs-rest classification is usually preferred, since the results are mostly similar, but the runtime is significantly less.

**"One vs Rest" LinearSVC**

<b><i>coef_</i></b> : [$n\_class$, $n\_features$]

<b><i>intercept_</i></b> : [$n\_class$]

**"One-vs-One" SVC (linear kernel)**

<b><i>coef_</i></b> : [$n\_class \times \frac{(n\_class - 1)}{ 2}$, $n\_features$]

<b><i>intercept_</i></b> : [$n\_class \times \frac{(n\_class - 1)}{ 2}$]

**Dual coefficients**

<b><i>dual_coef_</i></b> : [$n\_class - 1$, $n\_SV$]
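
The attribute shapes can be confirmed on a four-class problem, where $n\_class$ and $n\_class \times (n\_class - 1) / 2$ differ. This is a sketch on synthetic `make_blobs` data; the dataset and the raised `max_iter` are illustrative choices.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

# 4 classes, 2 features (synthetic data)
X, y = make_blobs(n_samples=120, centers=4, n_features=2, random_state=0)

ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
ovr = LinearSVC(max_iter=10000).fit(X, y)

n_sv = ovo.support_vectors_.shape[0]

# One-vs-rest: one row per class
print(ovr.coef_.shape, ovr.intercept_.shape)

# One-vs-one: one row per pair of classes, 4 * 3 / 2 = 6
print(ovo.coef_.shape, ovo.intercept_.shape)

# Dual coefficients: n_class - 1 rows, one column per support vector
print(ovo.dual_coef_.shape == (3, n_sv))
```

Note that `coef_` is only available on SVC when the kernel is linear.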

```python
from sklearn.svm import SVC  # C-Support Vector Classification.
```
> * support_
> * support_vectors_
> * n_support_ 
> * dual_coef_
> * coef_
> * intercept_

| Parameters | type | value | default |
|------------|------|-------|---------|
| C | float | optional | 1.0 |
| kernel | string | optional | 'rbf' |
| degree | int | optional | 3 |
| gamma | float | optional | 'auto' |
| coef0 | float | optional | 0.0 |
| probability | boolean | optional | False |
| shrinking | boolean | optional | True |
| tol | float | optional | 1e-3 |
| cache_size | float | optional | 200 |
| class_weight | dict or 'balanced' | optional | None |
| verbose | bool | | False |
| max_iter | int | optional | -1 |
| decision_function_shape | str | 'ovo' or 'ovr' | 'ovr' |
| random_state | int, RandomState instance or None | optional | None |


| Methods | Parameters | return |
|---------|------------|--------|
| decision_function | X | array |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| predict | X | c |
| predict_log_proba | X | T |
| predict_proba | X | T |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self | 
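
A short tour of the methods listed above. This is a minimal sketch on illustrative, clearly separable data, with a linear kernel chosen for simplicity.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0],
              [3.0, 3.0], [3.5, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.set_params(C=10.0)   # returns the estimator itself
clf.fit(X, y)            # also returns the estimator itself

X_new = [[0.2, 0.1], [3.8, 3.9]]
params = clf.get_params()                 # dict of constructor parameters
labels = clf.predict(X_new)               # predicted class labels
scores = clf.decision_function(X_new)     # signed distance-like scores
accuracy = clf.score(X, y)                # mean accuracy on (X, y)

print(params["C"], labels, scores, accuracy)
```

`predict_proba` and `predict_log_proba` are only usable when the estimator was constructed with `probability=True`.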

```python
from sklearn.svm import NuSVC  # Nu-Support Vector Classification.
```
> * support_
> * support_vectors_
> * n_support_
> * dual_coef_
> * coef_
> * intercept_

| Parameters | type | value | default |
|------------|------|-------|---------|
| nu | float | optional | 0.5 |
| kernel | string | optional | 'rbf' |
| degree | int | optional | 3 |
| gamma | float | optional | 'auto' |
| coef0 | float | optional | 0.0 |
| probability | boolean | optional | False |
| shrinking | boolean | optional | True |
| tol | float | optional | 1e-3 |
| cache_size | float | optional | 200 |
| class_weight | dict or 'balanced' | optional | None |
| verbose | bool | | False |
| max_iter | int | optional | -1 |
| decision_function_shape | str | 'ovo' or 'ovr' | 'ovr' |
| random_state | int, RandomState instance or None | optional | None |

| Methods | Parameters | return |
|---------|------------|--------|
| decision_function | X | array |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| predict | X | c |
| predict_log_proba | X | T |
| predict_proba | X | T |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self | 

```python
from sklearn.svm import LinearSVC  # Linear Support Vector Classification.
```
> * coef_
> * intercept_


| Parameters | type | value | default |
|------------|------|-------|---------|
| penalty | str | 'l1' or 'l2' | 'l2' |
| loss | str | 'hinge' or 'squared_hinge' | 'squared_hinge' |
| dual | bool | True or False | True |
| tol | float | optional | 1e-4 |
| C | float | optional | 1.0 |
| multi_class | str | 'ovr' or 'crammer_singer' | 'ovr' |
| fit_intercept | bool | | True |
| intercept_scaling | float | | 1 |
| class_weight | dict or 'balanced' | optional | None |
| verbose | int | | 0 |
| random_state | int, RandomState instance or None | optional | None |
| max_iter | int | | 1000 |


| Methods | Parameters | return |
|---------|------------|--------|
| decision_function | X | array |
| densify | | self |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| predict | X | c |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self | 
| sparsify | | self |
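
The `sparsify` and `densify` methods are most useful when an L1 penalty drives many coefficients to zero. The sketch below assumes synthetic data from `make_classification` and a small `C` chosen to encourage sparsity; both are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# 20 features, only 3 of them informative (synthetic data)
X, y = make_classification(n_samples=200, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

# The L1 penalty requires dual=False
clf = LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=10000).fit(X, y)
n_zero = int(np.sum(clf.coef_ == 0))
print("zero coefficients:", n_zero, "of", clf.coef_.size)

clf.sparsify()                              # coef_ becomes a scipy sparse matrix
is_sparse = sparse.issparse(clf.coef_)
pred_sparse = clf.predict(X)

clf.densify()                               # coef_ becomes a dense ndarray again
is_dense = isinstance(clf.coef_, np.ndarray)
pred_dense = clf.predict(X)

print(is_sparse, is_dense)
```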