# Support Vector Machines Notes

```python
import pandas as pd
import numpy as np
```

#### Scikit-learn Documentation
http://scikit-learn.org/stable/modules/svm.html

### Key Terms

Margin = distance between the separating line (hyperplane) and the nearest training point of either class
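For a linear SVM the separating line is w·x + b = 0, and a point x lies at distance |w·x + b| / ‖w‖ from it. In the standard formulation the nearest points satisfy |w·x + b| = 1, so the margin as defined above is 1/‖w‖. A minimal sketch with scikit-learn, reusing the two points from the sample code; the linear kernel and `C=1000` (to approximate a hard margin) are choices made only for this illustration:

```python
import numpy as np
from sklearn import svm

# Same two points as in the sample code; kernel and C are illustrative choices
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])

# A linear kernel exposes the hyperplane w.x + b = 0 as coef_ / intercept_;
# a large C approximates a hard margin (no violations allowed)
clf = svm.SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Margin: distance from the line to the nearest point
margin = 1 / np.linalg.norm(w)
# Direct check: distance of the point (0, 0) to the line
dist_to_point = abs(w @ X[0] + b) / np.linalg.norm(w)

print(margin, dist_to_point)
```

For these two points the margin comes out at about 0.707, half of the distance √2 between them, as expected for a line placed midway between the classes.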

### Intuition

SVMs separate classes with a line (in higher dimensions, a hyperplane). They can be used for classification, regression, and outlier detection.
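In scikit-learn the three uses map onto separate estimators: `SVC` for classification, `SVR` for regression and `OneClassSVM` for outlier detection. A minimal sketch on made-up one-dimensional data (the data, `nu=0.1` and the query points are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC, SVR, OneClassSVM

# Hypothetical toy data: six points on a line
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Classification: predict a discrete label
clf = SVC(kernel="linear").fit(X, [0, 0, 0, 1, 1, 1])
labels = clf.predict([[0.5], [4.5]])

# Regression: predict a continuous target (here y = x)
reg = SVR(kernel="linear").fit(X, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
value = reg.predict([[2.5]])

# Outlier detection: fit on unlabeled data; predict returns +1 (inlier) or -1 (outlier)
det = OneClassSVM(gamma="scale", nu=0.1).fit(X)
far_point = det.predict([[100.0]])  # far from every training point

print(labels, value, far_point)
```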

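The margin is maximized: the boundary is placed so that the distance to the nearest points of both classes is as large as possible.
A larger margin makes the classifier more robust, because small perturbations of the data are less likely to push a point across the boundary.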

SVMs put the minimization of classification error first: among the boundaries that classify the training points correctly, they pick the one with the largest margin.
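The trade-off between margin and classification error is usually written as the standard soft-margin optimization problem below. The slack variables ξ_i are not named elsewhere in these notes; C is the same parameter listed under Key Parameters, and y_i ∈ {-1, +1} are the class labels.

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\left(\mathbf{w}^\top\mathbf{x}_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n
$$

With the usual scaling, where the closest points satisfy y_i(w^T x_i + b) = 1, the margin is 1/‖w‖, so minimizing ‖w‖² widens it. Each ξ_i measures how far point i falls on the wrong side of its margin, and ξ_i > 1 means the point is misclassified. A large C makes errors expensive, which matches the "classification first" behaviour described above; a smaller C accepts some errors in exchange for a wider margin.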

#### Non-linearity

- SVMs can produce boundaries that look "non-linear" when viewed in the original 2 dimensions
- Add a third dimension, e.g. z = x^2 + y^2
- In this 3-D space, a linear hyperplane can separate data that is not obviously separable in 2-D
- Other added features, such as |x| or |y|, can serve the same purpose
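A minimal sketch of the idea, assuming two hypothetical rings of points centred on the origin: a linear SVM cannot separate them using (x, y) alone, but after appending z = x^2 + y^2 as a third feature a flat plane (z = constant) does.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: class 0 on a circle of radius 1, class 1 on a circle of radius 3
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3 * inner
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# A linear SVM on the raw (x, y) coordinates cannot separate the two rings
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Append z = x^2 + y^2 (squared distance from the origin) as a third feature
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

# In 3-D a linear hyperplane separates the rings perfectly
acc_3d = SVC(kernel="linear").fit(X3, y).score(X3, y)

print(acc_2d, acc_3d)
```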

#### Advantages

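- Memory efficient: the decision function uses only a subset of the training points (the support vectors)
- Effective in high-dimensional spaces, even when the number of features exceeds the number of samples
- Versatile: different kernel functions can be specified for the decision function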
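Besides the built-in strings (`"linear"`, `"rbf"`, `"poly"`, ...), scikit-learn's `SVC` accepts a Python callable as `kernel`; it must return the Gram matrix between two sets of samples. The sketch below re-implements the linear kernel by hand (the function name `my_linear_kernel` and the toy data are hypothetical) and checks it against the built-in one:

```python
import numpy as np
from sklearn import svm

def my_linear_kernel(A, B):
    # Hypothetical custom kernel: Gram matrix of dot products between
    # every row of A and every row of B, shape (len(A), len(B))
    return A @ B.T

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

builtin = svm.SVC(kernel="linear").fit(X, y)
custom = svm.SVC(kernel=my_linear_kernel).fit(X, y)

# Both models use the same kernel, so their predictions should agree
X_new = np.array([[0.5, 0.5], [2.5, 0.5]])
print(builtin.predict(X_new), custom.predict(X_new))
```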

#### Disadvantages

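- When the number of features is much larger than the number of samples, choosing the right kernel function and regularization term is crucial to avoid over-fitting
- They do not directly provide probability estimates; scikit-learn computes them with an expensive internal five-fold cross-validation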
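In scikit-learn, `SVC` exposes `decision_function` by default, and `predict_proba` only when it is constructed with `probability=True`. A sketch on synthetic data (the `make_blobs` settings and `random_state` values are arbitrary):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data; sizes and seeds are arbitrary choices
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Default SVC: signed scores from decision_function, no probabilities
scores = SVC().fit(X, y).decision_function(X[:3])

# probability=True enables predict_proba; the probability model is fitted
# with an internal cross-validation, which makes training slower
clf = SVC(probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:3])  # one row per sample, one column per class

print(scores)
print(proba)
```

The scikit-learn documentation notes that, because of this cross-validation, the probabilities can be slightly inconsistent with the output of `predict`.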

#### Outliers

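SVMs tolerate individual outliers. With a soft margin, a point that cannot be classified correctly is allowed to violate the margin (at a cost set by C), so the boundary is placed for the bulk of the data rather than bent around that point.

Thus, they largely ignore outliers, although a very high C makes them more sensitive to them.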
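A small check of this behaviour, on hypothetical data: two clusters with one class-1 point placed in the middle of the class-0 cluster. With a linear kernel and the default `C=1.0`, the fitted boundary stays between the clusters and the outlier is simply left misclassified.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: two well-separated clusters plus one mislabeled point
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],  # class 0
    [4, 0], [4, 1], [5, 0], [5, 1],  # class 1
    [0.5, 0.5],                      # class-1 outlier inside the class-0 cluster
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Every regular point is classified correctly; only the outlier
# ends up predicted as class 0
predictions = clf.predict(X)
print(predictions)
```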

### Sample Code

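```python
from sklearn import svm

X = [[0, 0], [1, 1]]  # two training samples with two features each
y = [0, 1]            # class label for each sample
clf = svm.SVC()       # defaults: RBF kernel, C=1.0
clf.fit(X, y)
```

Output recorded with an older scikit-learn release; current releases default to `gamma='scale'` and print only parameters that differ from their defaults (here simply `SVC()`):

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
```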

```python
# predict the class of a new sample
clf.predict([[2., 2.]])
```

```
array([1])
```

```python
# the support vectors themselves
clf.support_vectors_
```

```
array([[ 0.,  0.],
       [ 1.,  1.]])
```

```python
# get indices of support vectors
clf.support_
```

```
array([0, 1], dtype=int32)
```

```python
# get number of support vectors for each class
clf.n_support_
```

```
array([1, 1], dtype=int32)
```

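### Kernel Trick

- A kernel function computes inner products as if the inputs had been mapped into a higher-dimensional feature space, without computing that mapping explicitly
- This allows data that is not linearly separable in the input space to become linearly separable in the feature space
- A linear boundary in the feature space can correspond to a non-linear decision line in the original input space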
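The "trick" can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel k(a, b) = (a·b)^2 equals the ordinary dot product after the explicit map φ(x1, x2) = (x1^2, √2·x1·x2, x2^2) into 3-D, so an SVM using k works in the 3-D space without ever computing φ. This corresponds to scikit-learn's `kernel="poly"` with `degree=2`, `gamma=1`, `coef0=0`; the functions `phi` and `poly_kernel` below are illustrative helpers, not library functions:

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map from 2-D to 3-D:
    # (x1, x2) -> (x1^2, sqrt(2) * x1 * x2, x2^2)
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    # Homogeneous polynomial kernel of degree 2, computed in the 2-D input space
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = np.dot(phi(a), phi(b))  # inner product after mapping to 3-D
implicit = poly_kernel(a, b)       # same value without building phi
print(explicit, implicit)          # both 16, up to floating-point rounding
```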

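#### Key Parameters

- C: trades off a wide margin (smooth decision surface) against classifying all training points correctly. Higher values fit the training data more closely and can over-fit.
- gamma: defines how far the influence of a single training example reaches.
- Low gamma values give each example a far reach.
- High gamma values give each example a close reach, so higher values can result in over-fitting and a more jagged decision boundary.

- Kernel: can be linear, rbf, poly, etc.
- With the rbf kernel, tune C together with gamma; raising C lets the boundary bend further to fit the training points.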
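A sketch of how gamma changes the fit with the rbf kernel, on synthetic `make_moons` data (dataset size, noise level and the gamma values are arbitrary choices). Comparing training accuracy with 5-fold cross-validated accuracy is one way to spot over-fitting: a large gap between the two suggests the boundary is fitting noise.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic, noisy two-class data (size, noise and seed are arbitrary choices)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

results = {}
for gamma in [0.1, 1, 100]:
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma)
    # cross_val_score fits copies of clf on 4/5 of the data and scores the held-out fold
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    # Accuracy on the same data the model was trained on
    train_acc = clf.fit(X, y).score(X, y)
    results[gamma] = (train_acc, cv_acc)
    print(f"gamma={gamma}: train={train_acc:.2f}, cv={cv_acc:.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```

Typically the training accuracy climbs as gamma grows while the cross-validated accuracy stops improving or drops; the exact numbers depend on the data.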

#### How to Stop Overfitting

- Tune C, gamma, and the kernel (for example with cross-validation) instead of relying on the defaults
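A common way to do this tuning is an exhaustive grid search with cross-validation, for example scikit-learn's `GridSearchCV`, keeping a separate test set for the final check. The dataset and the candidate values of C and gamma below are illustrative assumptions, not recommended settings:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Hold out a test set that the search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate values are illustrative only
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation on the training set for every (C, gamma) pair;
# the best combination is then refitted on the whole training set
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)  # accuracy on unseen data
print(search.best_params_, test_acc)
```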