# Support Vector Machines
## Linear SVM Classification

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline

from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2,3)] # petal length, petal width
y = (iris["target"] == 2).astype(np.float64) # Iris virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge"))
])

svm_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('linear_svc', LinearSVC(C=1, loss='hinge'))])

In [2]:
svm_clf.predict([[5.5, 1.7]])

array([1.])

## Nonlinear SVM Classification

Many datasets are not linearly separable. One simple approach is to add polynomial features.
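To make "adding polynomial features" concrete, here is a minimal sketch of what `PolynomialFeatures` produces for a single instance. The variable names (`X_demo`, `poly`, `X_demo_poly`) and the input values are illustrative and are not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One instance with two features: x1 = 2, x2 = 3 (illustrative values).
X_demo = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
X_demo_poly = poly.fit_transform(X_demo)

# Columns are: 1, x1, x2, x1^2, x1*x2, x2^2
print(X_demo_poly)  # [[1. 2. 3. 4. 6. 9.]]
```

Two input features already become six at degree 2. The count grows quickly with the degree and with the number of input features, which is why this approach can get slow.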

In [3]:
from sklearn.datasets import make_moons
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=100, noise=0.15) # Generates two interwoven "crescents" of points.
polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])

polynomial_svm_clf.fit(X, y)

Pipeline(steps=[('poly_features', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge'))])

In [4]:
polynomial_svm_clf.predict([[1, 1]])

array([0])

In [5]:
polynomial_svm_clf.predict([[1, -1]])

array([1])

### Polynomial Kernel

The *kernel trick* makes it possible to get the same result as if you had added polynomial features, without the computational cost of actually adding them.
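The idea can be checked numerically in a small case. The sketch below assumes the simplest homogeneous degree-2 polynomial kernel, $K(\mathbf{a},\mathbf{b})=(\mathbf{a}^\top\mathbf{b})^2$, which is not exactly the kernel configured in this notebook (that one also uses `coef0=1` and `degree=3`). The helper `phi` and the vectors `a` and `b` are hypothetical, chosen only for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)  # dot product in the expanded 3D feature space
kernel = (a @ b) ** 2       # same value, computed in the original 2D space

print(np.isclose(explicit, kernel))  # True
```

The kernel side never builds the expanded vectors, which is where the savings come from when the degree or the number of features is large.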

In [6]:
from sklearn.svm import SVC
poly_kernel_svc_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svc_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=5, coef0=1, kernel='poly'))])

### Similarity Features

Another approach for nonlinear data is to add features computed with a *similarity function*, which measures how closely each instance resembles a particular *landmark* point.

A common similarity function is the Gaussian Radial Basis Function (Gaussian RBF), $\phi_\gamma(\mathbf{x},\mathbf{\ell})=\exp(-\gamma\|\mathbf{x}-\mathbf{\ell}\|^2)$. It equals 1 at the landmark and decays toward 0 as the instance moves away from it.

How do you select the landmarks? The simplest approach is to create a landmark at the location of every instance in the dataset. However, this means adding $m$ features for a dataset of size $m$. Once again, the kernel trick comes to the rescue.
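As a rough illustration of similarity features, the sketch below computes Gaussian RBF features by hand for a tiny one-dimensional dataset and two landmarks, then compares the result with scikit-learn's `rbf_kernel`. The data, the landmark positions, and the value of `gamma` are all made up for this example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.3  # illustrative value
X_1d = np.array([[-4.0], [-1.0], [0.0], [2.0]])  # four 1D instances
landmarks = np.array([[-2.0], [1.0]])            # two hypothetical landmarks

# Squared Euclidean distance from every instance to every landmark.
diffs = X_1d[:, np.newaxis, :] - landmarks[np.newaxis, :, :]
sq_dists = (diffs ** 2).sum(axis=2)

# One new feature per landmark: exp(-gamma * ||x - l||^2)
X_sim = np.exp(-gamma * sq_dists)

print(X_sim.shape)  # (4, 2)
print(np.allclose(X_sim, rbf_kernel(X_1d, landmarks, gamma=gamma)))  # True
```

With a landmark at every instance, `X_sim` would have shape `(m, m)`, which is the feature blow-up the paragraph above describes.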

### Gaussian RBF Kernel

In [7]:
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=0.001, gamma=5))])

In general, try `LinearSVC` first because it is much faster than `SVC`. `SVC(kernel="rbf")` is a good option if the training set is not too large. There are specialized kernels for certain data types (for example, string kernels for text), but for general classification the RBF kernel is by far the most widely used.
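One way to act on this advice is to score both models with cross-validation before committing to one. This is only a sketch: the dataset size, `random_state`, hyperparameter values, and names such as `X_cmp` and `candidates` are assumptions made for illustration, not settings from this notebook.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# A fresh moons dataset so the notebook's X and y are left untouched.
X_cmp, y_cmp = make_moons(n_samples=200, noise=0.15, random_state=42)

candidates = {
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(C=1, max_iter=10000)),
    "SVC (rbf)": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale", C=1)),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X_cmp, y_cmp, cv=5).mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

If the linear model is already good enough, the extra training cost of the kernelized `SVC` may not be worth it.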

## SVM Regression

In [8]:
from sklearn.svm import LinearSVR

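# Note: X, y are still the make_moons data from In [3], so y holds binary
# class labels; this cell only demonstrates the LinearSVR API.
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)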

LinearSVR(epsilon=1.5)

In [9]:
from sklearn.svm import SVR

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

SVR(C=100, degree=2, kernel='poly')