# Support Vector Machine (SVM)

- Author: Guo Zhang
- Created: 2017-03-22
- Updated: 2017-03-26

## Pre-knowledge

- Classification
- Regression

## Support Vector Classification (SVC)
### Principles

#### Start point: Maximal Margin Classifier (Linear Hard Margin)

Let's begin with the simplest case, a two-class problem. Suppose we want to classify **two** kinds of fruit, apples and bananas. Suppose we have mapped the features (such as color, weight, volume, etc.) of the apples and bananas into an n-dimensional space. The most natural way to divide the two classes is to cut the space with a piece of paper:

The boundary is the "paper" cutting the space. It is a line in 2-dimensional space, a plane in 3-dimensional space, and a **hyperplane** in n-dimensional space.  <!--definition of margin-->
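To make the "paper" concrete, here is a minimal sketch of how a hyperplane assigns a point to one side: compute $\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$ and look at its sign. The coefficients and the helper name `side_of_hyperplane` are made up for illustration and do not come from any library.

```python
def side_of_hyperplane(beta0, beta, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 2 = 0 in a 2-dimensional feature space
beta0, beta = -2.0, [1.0, 1.0]

print(side_of_hyperplane(beta0, beta, [0, 0]))  # -1
print(side_of_hyperplane(beta0, beta, [2, 2]))  # 1
```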

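However, when the two classes can be separated, there are infinitely many separating hyperplanes. Of course, we want to find an **optimal separating hyperplane**. Therefore, we define the **maximal margin hyperplane** (optimal separating hyperplane) as the separating hyperplane that is farthest from the training observations. The smallest distance from the observations to the hyperplane is called the **margin**.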


![Hard Margin](SVM_hard.jpg)


The training observations that lie closest to the maximal margin hyperplane, exactly on the edge of the margin, are called **support vectors**. These "vectors" "support" the "machine", which is how the **support vector machine** got its name.
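As a rough illustration, a fitted scikit-learn `SVC` exposes the support vectors through its `support_vectors_` attribute. The sketch below uses made-up, linearly separable toy data and a very large `C` to approximate the hard margin; it is one possible setup, not the only way to obtain a maximal margin classifier.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made up): two linearly separable groups of points
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C leaves almost no room for margin violations,
# which approximates the hard margin classifier on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print(clf.support_vectors_)       # training points that define the margin
print(clf.coef_, clf.intercept_)  # coefficients of the separating hyperplane
```

Only the points nearest to the boundary appear in `support_vectors_`; moving any of the other points slightly would leave the fitted hyperplane unchanged.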


#### From Hard Margin to Soft Margin Classifier (Linear Support Vector Machine)

The maximal margin classifier described above leaves us with two problems. First, we can use it only when a separating hyperplane exists, and in many cases there is none. Second, the optimal separating hyperplane is sensitive to changes in individual observations: adding or moving a single point near the boundary can shift it substantially. In other words, the classifier is not robust. Therefore, we want to improve the classifier.

This motivates the **soft margin**, which allows some observations to fall on the wrong side of the margin, or even on the wrong side of the hyperplane.
<!--introduction the key change on the new parameters-->

![Soft Margin](SVM_soft.jpg)
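The sketch below illustrates how the softness of the margin can be tuned in scikit-learn. Note that scikit-learn's `C` is a penalty on margin violations, so a smaller `C` gives a softer margin; this is the opposite direction from a budget-style `C` that caps the total amount of violation. The data are made up, and the dictionary name `n_support` is only for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data (made up): the point at 2.5 is labelled 0 but sits among class 1,
# so no hyperplane separates the classes perfectly
X = np.array([[0.0], [0.5], [1.0], [2.5], [2.0], [3.0], [3.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1])

n_support = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = len(clf.support_vectors_)
    acc = clf.score(X, y)
    print(f"C={C}: {n_support[C]} support vectors, training accuracy {acc:.2f}")
```

A small penalty typically produces a wide margin with many observations inside it, so more points become support vectors; a large penalty narrows the margin.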


#### From Linear to Nonlinear

The soft margin classifier makes the boundary more robust. However, it still does not solve the problem when no linear hyperplane can separate the classes. Therefore, we need a way to construct nonlinear decision boundaries.

The key to constructing a nonlinear soft margin classifier is to choose a suitable **kernel** function in place of the linear kernel.
<!--no good way to introduce kernel without math-->

![Nonlinear](SVM_nonlinear.jpg)
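A small experiment can show what changing the kernel buys us. The sketch below uses XOR-style toy data, which no straight line can separate, and compares a linear kernel with an RBF kernel. The values `gamma=2.0` and `C=10.0` are arbitrary choices for this illustration, not recommended settings.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style toy data: no straight line separates the two classes
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y = np.array([0, 0, 1, 1])

linear_clf = SVC(kernel="linear", C=10.0).fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)

linear_acc = linear_clf.score(X, y)
rbf_acc = rbf_clf.score(X, y)
print("linear kernel training accuracy:", linear_acc)
print("RBF kernel training accuracy:   ", rbf_acc)
```

The linear model has to misclassify at least one of the four points, while the RBF kernel can bend the boundary around them.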


#### From Two Classes to Multiple Classes

Now let's consider multi-class problems. Suppose we now need to classify three kinds of fruit: apples, bananas and pears.
<!--not understand the different methods-->

![Multiple Class](SVM_mul.jpg)
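Two common ways to build a multi-class classifier out of two-class SVMs are one-vs-one (one classifier for every pair of classes) and one-vs-rest (one classifier per class, against all the others). With K classes these train K(K-1)/2 and K binary classifiers respectively. The sketch below uses scikit-learn's `OneVsOneClassifier` and `OneVsRestClassifier` wrappers on made-up data with four classes, so that the two counts differ (with three classes both would be three).

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Toy data (made up): four well-separated classes in 2-D, two points each
X = np.array([[0, 0], [0, 1], [5, 0], [5, 1], [0, 5], [1, 5], [5, 5], [6, 5]])
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_))  # 6 pairwise classifiers: 4 * 3 / 2
print(len(ovr.estimators_))  # 4 classifiers, one per class
```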


#### One-Class Case

### Models

#### Maximal Margin Classifier (Hard Margin Classifier)

$$\max_{\beta_0, \beta_1, \beta_2,\ldots, \beta_p} M$$

subject to 

- $$\sum_{j=1}^p \beta_j^2=1$$
- $$y_i(\beta_0+\beta_1 x_{i1}+\ldots+\beta_p x_{ip})\geq M \quad \forall\, i=1,\ldots,n$$
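An equivalent and widely used form drops the normalization $\sum_j \beta_j^2 = 1$, requires $y_i(\beta_0+\beta_1 x_{i1}+\ldots+\beta_p x_{ip}) \geq 1$ instead, and minimizes the norm of the coefficients; the margin is then $M = 1/\lVert\beta\rVert$. The sketch below assumes this rescaled form, which is the scaling of scikit-learn's `coef_`, and approximates the hard margin with a very large `C` on two made-up points.

```python
import numpy as np
from sklearn.svm import SVC

# Two training points, one per class, labelled -1/+1 as in the formulas
X = np.array([[0, 0], [2, 2]])
y = np.array([-1, 1])

# Very large C: approximately a hard margin on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # (beta_1, ..., beta_p) in the rescaled form
b = clf.intercept_[0]   # beta_0
M = 1.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin M =", M)  # about sqrt(2), half the distance between the two points
```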




#### Soft Margin Classifier

 $$\max_{\beta_0, \beta_1, \beta_2,\ldots, \beta_p, \varepsilon_1, \ldots, \varepsilon_n} M$$

subject to 

  - $$\sum_{j=1}^p \beta_j^2=1$$
  - $$y_i(\beta_0+\beta_1 x_{i1}+\ldots+\beta_p x_{ip})\geq M(1-\varepsilon_i) \quad \forall\, i=1,\ldots,n$$
  - $$\varepsilon_i \geq 0, \quad \sum_{i=1}^{n} \varepsilon_i \leq C$$
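To get a feel for the slack variables, the sketch below computes, for a fixed hyperplane and margin, the smallest $\varepsilon_i \geq 0$ that satisfies the margin constraint for each observation. A value of 0 means the point is on the correct side of the margin, a value between 0 and 1 means it is inside the margin, and a value above 1 means it is on the wrong side of the hyperplane. The helper name `slacks`, the hyperplane and the data are all made up for illustration.

```python
import math

def slacks(beta0, beta, M, X, y):
    """Smallest slack eps_i >= 0 satisfying y_i * f(x_i) >= M * (1 - eps_i)."""
    norm = math.sqrt(sum(b * b for b in beta))
    assert abs(norm - 1.0) < 1e-9, "beta must satisfy sum(beta_j^2) = 1"
    result = []
    for xi, yi in zip(X, y):
        f = beta0 + sum(b * v for b, v in zip(beta, xi))
        result.append(max(0.0, 1.0 - yi * f / M))
    return result

# Hypothetical hyperplane x1 = 2 with margin M = 1 (values chosen for illustration)
beta0, beta, M = -2.0, [1.0, 0.0], 1.0
X = [[0.0, 0.0], [1.5, 1.0], [2.5, 0.0], [4.0, 1.0]]
y = [-1, -1, -1, 1]

eps = slacks(beta0, beta, M, X, y)
print(eps)            # [0.0, 0.5, 1.5, 0.0]
print(sum(eps) <= 2)  # True: this hyperplane is feasible for a budget C = 2
```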


#### Nonlinear Classifier


#### Multiple Classes Case


#### One Class Case


### Python Implementation

#### Prepare

In [29]:
# import SVM classes
from sklearn.svm import SVC, LinearSVC, OneClassSVM

In [30]:
# prepare data
# data for two classes
x_two = [[0, 0], [2, 2]]
y_two = [1, 2]

x_two_predict = [[1, 2]]

# data for multiple classes
x_mul = [[0], [1], [2], [3]]
y_mul = [0, 1, 2, 3]

x_mul_predict = [[2.4]]

# data for one class
x_one = [[0],[2],[4],[5]]

#### Soft Margin
*sklearn* provides two implementations.
- Two-class linear model with SVC

In [31]:
clf = SVC(kernel='linear')
clf.fit(x_two, y_two)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [32]:
clf.predict(x_two_predict)

array([2])

- Two-class linear model with LinearSVC

In [33]:
clf = LinearSVC()
clf.fit(x_two, y_two)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

In [34]:
clf.predict(x_two_predict)

array([2])

#### Nonlinear SVC

In [35]:
# import SVC class
from sklearn.svm import SVC

In [36]:
clf = SVC()
clf.fit(x_two, y_two)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [37]:
clf.predict(x_two_predict)

array([2])

#### Multi-Class SVC

- Linear multi-class model with SVC

In [38]:
clf = SVC(kernel='linear')
clf.fit(x_mul, y_mul)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [39]:
clf.predict(x_mul_predict)

array([2])

- Linear multi-class model with LinearSVC

In [40]:
clf = LinearSVC()
clf.fit(x_mul, y_mul)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

In [41]:
clf.predict(x_mul_predict)

array([3])

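Notice that the two implementations do not give the same result: SVC predicts 2, while LinearSVC predicts 3. As the printed parameters show, LinearSVC uses a one-vs-rest scheme (`multi_class='ovr'`) with squared hinge loss, whereas SVC handles multiple classes with a one-vs-one scheme, so the two fit different models.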
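One way to see that the two implementations build different sets of binary classifiers is to look at the shape of their decision scores. The sketch below reuses the four-class data from the preparation step; `decision_function_shape="ovo"` asks SVC to return its raw pairwise scores, and `max_iter=10000` is an arbitrary choice to give LinearSVC room to converge.

```python
from sklearn.svm import SVC, LinearSVC

x_mul = [[0], [1], [2], [3]]
y_mul = [0, 1, 2, 3]
x_new = [[2.4]]

# SVC: one binary classifier per pair of classes, 4 * 3 / 2 = 6 in total
svc = SVC(kernel="linear", decision_function_shape="ovo").fit(x_mul, y_mul)
svc_scores = svc.decision_function(x_new)

# LinearSVC: one binary classifier per class (one-vs-rest), 4 in total
lin = LinearSVC(max_iter=10000).fit(x_mul, y_mul)
lin_scores = lin.decision_function(x_new)

print(svc_scores.shape)  # (1, 6)
print(lin_scores.shape)  # (1, 4)
```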

- Nonlinear multi-class model with SVC

In [42]:
clf = SVC()
clf.fit(x_mul, y_mul)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [43]:
clf.predict(x_mul_predict)

array([2])

#### One-Class SVM

In [44]:
from sklearn.svm import OneClassSVM

In [45]:
clf = OneClassSVM()

In [46]:
clf.fit(x_one)

OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma='auto', kernel='rbf',
      max_iter=-1, nu=0.5, random_state=None, shrinking=True, tol=0.001,
      verbose=False)

In [47]:
clf.predict([[5],[3.55],[2],[3],[0],[-1],[4.3]])

array([-1., -1., -1., -1.,  1., -1.,  1.])

Notice that `predict` returns **1** if a point is in the class and **-1** if it is not.
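How strict the model is about membership depends largely on `nu`, which scikit-learn documents as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. With the default `nu=0.5`, up to about half of the training points may themselves be flagged as outside the class. The sketch below uses made-up data and a smaller `nu`; the values of `gamma` and `nu` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Training data (made up): a single cluster of points around 0
x_train = np.array([[-0.2], [-0.1], [0.0], [0.1], [0.2], [0.05], [-0.05], [0.15]])

# nu=0.1 lets roughly at most 10% of the training points be treated as outliers
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(x_train)

# Score one point from the middle of the cluster and one far away from it
pred = clf.predict([[0.0], [100.0]])
print(pred)  # the far-away point 100.0 is flagged as -1
```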

### Economic Application

- https://www.researchgate.net/profile/Young-Chan_Lee/publication/222580945_Bankruptcy_prediction_using_support_vector_machine_with_optimal_choice_of_kernel_function_parameters/links/02e7e52bac4f202c28000000/Bankruptcy-prediction-using-support-vector-machine-with-optimal-choice-of-kernel-function-parameters.pdf
- http://svms.org/finance/HuangNakamoriWang2005.pdf

## Support Vector Regression (SVR)
### Intuition

### Models

### Python Implementation

In [2]:
# import SVR class
from sklearn.svm import SVR

In [23]:
# simulated data
x = [[0, 0], [2, 2]]
y = [1, 2]

In [24]:
# fit 
clf = SVR()
clf.fit(x,y)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [25]:
# prediction
clf.predict([[1, 2]])

array([ 1.71369217])

Compared with the earlier results, notice that the prediction of support vector **regression** differs from that of support vector **classification**: regression returns a continuous value, while classification returns a discrete class label.
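The contrast is easiest to see by fitting both models on the same data. The sketch below reuses the two training points from above and sets `gamma=0.5` explicitly, which is what `gamma='auto'` (1 / number of features) amounts to for this two-feature data; the variable names `label` and `value` are only for illustration.

```python
from sklearn.svm import SVC, SVR

x = [[0, 0], [2, 2]]
y = [1, 2]
x_new = [[1, 2]]

# Same data and kernel settings, once as a classifier and once as a regressor
label = SVC(kernel="rbf", gamma=0.5).fit(x, y).predict(x_new)[0]
value = SVR(kernel="rbf", gamma=0.5).fit(x, y).predict(x_new)[0]

print(label)  # one of the training labels, 1 or 2
print(value)  # a real number, not restricted to 1 or 2
```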

### Economic Applications

- Vapnik(1995)

## References

- Chapter 9, [An Introduction to Statistical Learning with Applications in R](http://www-bcf.usc.edu/~gareth/ISL/)
- Chapter 12, [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
- Chapter 6, [Machine Learning in Action](https://github.com/pbharrin/machinelearninginaction)
- [A Tutorial on Support Vector Regression](http://alex.smola.org/papers/2003/SmoSch03b.pdf)
- [Support Vector Regression Machines](http://papers.nips.cc/paper/1238-support-vector-regression-machines.pdf)
- [Support Vector Machines - scikit-learn](http://scikit-learn.org/stable/modules/svm.html#svm)
