# Linear SVM Classification

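SVM classification is well suited to complex small- or medium-sized datasets.

* A linear SVM performs large margin classification: it fits the widest possible "street" between the classes.

* The instances located on the edge of the street are called support vectors.

* SVMs are sensitive to feature scaling.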

## Soft Margin Classification

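Hard margin classification has two problems:
* It is very sensitive to outliers.
* It only works if the data is linearly separable.

The goal is to strike a balance between keeping the street as wide as possible and limiting margin violations (instances that end up in the middle of the street or even on the wrong side).

A smaller value of the parameter C gives a wider street but more margin violations.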
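As a rough illustration of this trade-off, the sketch below fits the same linear SVM with a small and a large C on the iris petal features, then reports the street width `2 / ||w||` (measured in the scaled feature space) and the number of margin violations. The helper name `street_stats`, the two C values, and the `max_iter` setting are assumptions made for this example; they are not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]             # petal length, petal width
y = (iris["target"] == 2).astype(int)   # 1 if Iris virginica, else 0
X_scaled = StandardScaler().fit_transform(X)

def street_stats(C):
    """Hypothetical helper: fit a linear SVM and measure its street."""
    clf = LinearSVC(C=C, loss="hinge", dual=True,
                    max_iter=100_000, random_state=42)
    clf.fit(X_scaled, y)
    width = 2 / np.linalg.norm(clf.coef_)   # distance between the two margins
    t = 2 * y - 1                           # labels mapped to -1 / +1
    # An instance violates the margin when t * decision_function < 1
    violations = int(np.sum(t * clf.decision_function(X_scaled) < 1))
    return width, violations

width_small_C, violations_small_C = street_stats(0.01)
width_large_C, violations_large_C = street_stats(100)

print("C=0.01:", width_small_C, violations_small_C)
print("C=100: ", width_large_C, violations_large_C)
```

The exact numbers depend on the solver and the scikit-learn version, so only the direction of the comparison (wider street for the smaller C) should be relied on.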

In [2]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

In [3]:
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float32)  # 1.0 if Iris virginica, else 0.0

In [4]:
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LinearSVC(C=1, loss='hinge')),
])

In [5]:
svm_clf.fit(X, y)

Pipeline(memory=None,
     steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('clf', LinearSVC(C=1, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='hinge', max_iter=1000, multi_class='ovr',
     penalty='l2', random_state=None, tol=0.0001, verbose=0))])

In [6]:
svm_clf.predict([[5.5, 1.7]])  # SVM classifiers do not output class probabilities

array([ 1.], dtype=float32)

If the dataset is too large to fit in memory, you can train a linear SVM with SGDClassifier instead; see the book for details.
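A minimal sketch of the SGDClassifier alternative, assuming the hinge loss and the regularization strength `alpha = 1 / (m * C)` so that it roughly mirrors the `LinearSVC` above. The variable names, `random_state`, and stopping settings are choices made for this example.

```python
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]             # petal length, petal width
y = (iris["target"] == 2).astype(int)   # 1 if Iris virginica, else 0

m = len(X)   # number of training instances
C = 1        # same C as the LinearSVC example

sgd_clf = Pipeline([
    ("scaler", StandardScaler()),
    # hinge loss + L2 penalty (the default) gives a linear SVM objective
    ("clf", SGDClassifier(loss="hinge", alpha=1 / (m * C),
                          max_iter=1000, tol=1e-3, random_state=42)),
])
sgd_clf.fit(X, y)

train_accuracy = sgd_clf.score(X, y)
print(train_accuracy)
```

Stochastic gradient descent usually converges more slowly than `LinearSVC`, but `SGDClassifier` also provides `partial_fit`, which is what makes training on mini-batches of a dataset that does not fit in memory possible.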

# Nonlinear SVM Classification

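A dataset with a single feature may not be linearly separable, but adding the square of that feature as a second feature can make it linearly separable.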
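The toy example below is a sketch of that idea under assumed data: nine points on a line, with the positive class in the middle. No single threshold on `x1` separates the classes, but once `x1 ** 2` is added a linear SVM can. The names `x1`, `y_toy`, and `train_accuracy` are made up for this illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

x1 = np.linspace(-4, 4, 9)               # -4, -3, ..., 4
y_toy = (np.abs(x1) <= 2).astype(int)    # positive class sits in the middle

X_1d = x1.reshape(-1, 1)                 # original single feature
X_2d = np.c_[x1, x1 ** 2]                # feature plus its square

def train_accuracy(X_toy):
    """Fit a linear SVM on the toy data and return its training accuracy."""
    clf = make_pipeline(
        StandardScaler(),
        LinearSVC(C=100, loss="hinge", dual=True,
                  max_iter=100_000, random_state=42),
    )
    return clf.fit(X_toy, y_toy).score(X_toy, y_toy)

acc_1d = train_accuracy(X_1d)
acc_2d = train_accuracy(X_2d)
print(acc_1d, acc_2d)
```

With only `x1`, some instances are always misclassified; with `x1 ** 2` added, the classes are split by a horizontal line between `x1 ** 2 = 4` and `x1 ** 2 = 9`.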

In [7]:
from sklearn.datasets import make_moons
from sklearn.preprocessing import PolynomialFeatures

poly_svm_clf = Pipeline([
    ("poly_feature", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])

In [8]:
poly_svm_clf.fit(X, y)

Pipeline(memory=None,
     steps=[('poly_feature', PolynomialFeatures(degree=3, include_bias=True, interaction_only=False)), ('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svm_clf', LinearSVC(C=10, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='hinge', max_iter=1000, multi_class='ovr',
     penalty='l2', random_state=None, tol=0.0001, verbose=0))])
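The cell above imports `make_moons` but fits the pipeline on the iris data. As a sketch of how the same pipeline could be tried on a dataset that is not linearly separable, the example below uses the moons dataset; the sample size, noise level, and `random_state` are assumptions chosen for this illustration.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Two interleaving half circles: not separable by a straight line
X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=42)

moons_clf = Pipeline([
    ("poly_feature", PolynomialFeatures(degree=3)),   # add terms up to degree 3
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge", dual=True,
                          max_iter=100_000, random_state=42)),
])
moons_clf.fit(X_moons, y_moons)

moons_accuracy = moons_clf.score(X_moons, y_moons)
print(moons_accuracy)
```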

## Polynomial Kernel

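With a low polynomial degree the model cannot handle very complex datasets, while a high degree creates a huge number of features and requires a lot of computation.

SVMs can use a mathematical technique called the kernel trick.

It gives the same result as adding many polynomial features, without actually having to add them.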
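A small numeric check can make the kernel trick concrete. The sketch below assumes the standard degree-2 polynomial mapping of a 2D vector, `phi(x) = (x1², √2·x1·x2, x2²)`, and verifies that the dot product of the mapped vectors equals `(a · b)²` computed directly in the original space. The vectors `a` and `b` are arbitrary example values.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial mapping of a 2D vector into 3D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([2.0, 3.0])
b = np.array([1.0, -1.0])

explicit = phi(a) @ phi(b)   # dot product computed in the mapped 3D space
kernel = (a @ b) ** 2        # same quantity computed in the original 2D space

print(explicit, kernel)      # equal up to floating-point rounding
```

The kernel version never builds the higher-dimensional vectors, which is why a kernelized SVM can behave as if many polynomial features had been added without paying for them.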

In [9]:
from sklearn.svm import SVC
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel='poly', degree=3, coef0=1, C=5))
])

In [10]:
poly_kernel_svm_clf.fit(X, y)

Pipeline(memory=None,
     steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('svm_clf', SVC(C=5, cache_size=200, class_weight=None, coef0=1,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])

coef0 controls how much the model is influenced by high-degree polynomial terms versus low-degree ones.

## Adding Similarity Features

* The simplest way to choose landmarks is to create one at the location of every instance in the dataset.
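As a sketch of what this looks like in practice, the example below assumes the Gaussian RBF similarity `exp(-gamma * ||x - landmark||²)` and uses every instance as a landmark, so a dataset with m instances gets m similarity features. The tiny dataset `X_small` and the value of `gamma` are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X_small = np.array([[-1.0], [0.0], [2.0]])   # three 1D instances
gamma = 0.3

# Row i holds the similarity of instance i to each landmark (= each instance)
similarity_features = rbf_kernel(X_small, X_small, gamma=gamma)

print(similarity_features.shape)   # one new feature per landmark
print(similarity_features)
```

Each instance has similarity 1 to its own landmark, and the similarity decays toward 0 as the distance grows. The drawback is that a training set with m instances and n features becomes one with m instances and m features, which gets expensive for large m.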


## Gaussian RBF Kernel

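* The kernel trick works here too: the SVM obtains the same result as adding many similarity features, without actually having to add them.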

In [11]:
rbf_kernel_clf = Pipeline([
    ("scaler", StandardScaler()),
    ('clf', SVC(kernel='rbf', gamma=5, C=0.001))
])

In [12]:
rbf_kernel_clf.fit(X, y)

Pipeline(memory=None,
     steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('clf', SVC(C=0.001, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=5, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False))])

## Computational Complexity

See the book for details.

# SVM Regression

In [13]:
from sklearn.svm import LinearSVR

svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)

LinearSVR(C=1.0, dual=True, epsilon=1.5, fit_intercept=True,
     intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
     random_state=None, tol=0.0001, verbose=0)

In [14]:
from sklearn.svm import SVR

svm_poly_reg = SVR(kernel='poly', degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

SVR(C=100, cache_size=200, coef0=0.0, degree=2, epsilon=0.1, gamma='auto',
  kernel='poly', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

# Under the hood

In [16]:
# TODO