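# Linear SVM classification
* Large margin classification
* Very sensitive to feature scales (scaling the features can widen the "street" between the classes)
* If the SVM overfits, reduce the hyperparameter C to regularize it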
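To make the effect of C concrete, here is a minimal sketch that fits the same pipeline with several C values and reports the resulting street width, computed as 2 / ||w|| in the scaled feature space. The C values, `max_iter`, and `random_state` are illustrative assumptions, not settings from these notes.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                      # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris virginica

margin_widths = {}
for C in (0.01, 1, 100):  # illustrative values (assumption)
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=C, loss="hinge", max_iter=100_000, random_state=42)),
    ])
    clf.fit(X, y)
    w = clf.named_steps["linear_svc"].coef_[0]
    # Distance between the two margin lines, measured in scaled feature space
    margin_widths[C] = 2 / np.linalg.norm(w)
    print(f"C={C}: street width = {margin_widths[C]:.3f}")
```

A smaller C typically gives a wider street with more margin violations, which is why lowering C acts as regularization.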

In [11]:
import numpy as np
from sklearn import datasets
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC
from sklearn.svm import SVC        # SVM classification with the kernel trick
from sklearn.svm import LinearSVR  # linear SVM regression
from sklearn.svm import SVR        # SVM regression with the kernel trick

In [5]:
iris = datasets.load_iris()
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [6]:
x = iris['data'][:, (2, 3)]  # petal length, petal width
y = (iris['target'] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('Linear_svc', LinearSVC(C = 1, loss='hinge'))
])

svm_clf.fit(x, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('Linear_svc', LinearSVC(C=1, loss='hinge'))])

In [7]:
svm_clf.predict([[5.5,1.7]])

array([1.])

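Unlike logistic regression, SVM classifiers do not output a probability for each class (`LinearSVC` has no `predict_proba` method).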
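Instead of a probability, the model exposes a signed score through `decision_function`, and the predicted class follows from the sign of that score. The sketch below rebuilds the same pipeline so that it runs on its own; `max_iter` and `random_state` are added assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, 2:4]                      # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=100_000, random_state=42)),
])
svm_clf.fit(X, y)

sample = [[5.5, 1.7]]
score = svm_clf.decision_function(sample)  # signed score, not a probability
label = svm_clf.predict(sample)            # class chosen from the sign of the score
print("score:", score, "label:", label)
print("has predict_proba:", hasattr(svm_clf, "predict_proba"))
```

The magnitude of the score indicates how far the sample lies from the decision boundary, but it is not calibrated as a probability.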

# Nonlinear SVM classification

* When the dataset is not linearly separable, you can add more features.

In [7]:
x, y = make_moons(n_samples=100, noise=0.15)
# Moons dataset: a small toy dataset for binary classification whose points form two interleaving half circles.
poly_svm_clf = Pipeline([
    ('Poly_fea', PolynomialFeatures(degree=3)),
    ('scaler', StandardScaler()),
    ('Linear_svc', LinearSVC(C = 1, loss='hinge'))
])
poly_svm_clf.fit(x, y)

Pipeline(steps=[('Poly_fea', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('Linear_svc', LinearSVC(C=1, loss='hinge'))])

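### Polynomial kernel

SVMs support the kernel trick: it produces the same result as if you had added many polynomial features, without actually adding them.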
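A small numeric check illustrates the idea for the simplest case, a homogeneous 2nd-degree polynomial kernel on 2D vectors. The mapping `phi` and the vectors `a` and `b` are hypothetical, chosen only for this sketch. Note that scikit-learn's `poly` kernel also includes `gamma` and `coef0` terms, which this sketch leaves out.

```python
import numpy as np

def phi(x):
    """Explicit 2nd-degree polynomial mapping of a 2D vector into 3D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)  # dot product after transforming both vectors
kernel = (a @ b) ** 2       # same quantity computed from the original vectors
print(explicit, kernel)
```

Both expressions agree, so the classifier can work with the kernel value directly and never needs to build the transformed features.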

In [11]:
poly_kernel_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('Linear_svc', SVC(kernel = 'poly', degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(x, y)
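# Train an SVM classifier with a 3rd-degree polynomial kernel. If the model overfits, reduce the polynomial degree; if it underfits, try increasing it.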

Pipeline(steps=[('scaler', StandardScaler()),
                ('Linear_svc', SVC(C=5, coef0=1, kernel='poly'))])

A common way to find good hyperparameter values is grid search.
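A grid search over the polynomial-kernel pipeline could look like the sketch below. The grid values, the step name `svc`, `cv=5`, and `random_state` are illustrative assumptions rather than settings from these notes.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="poly", coef0=1)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "svc__degree": [2, 3, 4],
    "svc__C": [0.1, 1, 5, 10],
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X, y)
print(grid_search.best_params_, grid_search.best_score_)
```

After fitting, `grid_search.best_estimator_` holds the pipeline refit on the full data with the best parameter combination.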

* Adding similarity features
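As a sketch of what a similarity feature is, the code below computes the Gaussian RBF similarity exp(-gamma * ||x - landmark||^2) between each sample and two landmarks. The 1D data, the landmark positions, and `gamma = 0.3` are hypothetical values picked for illustration.

```python
import numpy as np

def rbf_similarity(X, landmark, gamma):
    """Gaussian RBF similarity of every row of X to a single landmark."""
    return np.exp(-gamma * np.sum((X - landmark) ** 2, axis=1))

# Hypothetical 1D dataset and landmarks
X1 = np.array([[-4.0], [-1.0], [0.0], [1.0], [4.0]])
landmarks = [np.array([-2.0]), np.array([1.0])]
gamma = 0.3

# One new feature per landmark: 1.0 at the landmark, approaching 0 far from it
features = np.column_stack([rbf_similarity(X1, lm, gamma) for lm in landmarks])
print(features.round(3))
```

Each sample is now described by its similarity to the landmarks, and a linear classifier can be trained on these new features.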

### Gaussian RBF kernel

In [8]:
rbf_kernel_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('Linear_svc', SVC(kernel = 'rbf', gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(x, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('Linear_svc', SVC(C=0.001, gamma=5))])

# SVM regression

In [10]:
svm_reg = LinearSVR(epsilon = 1.5)
svm_reg.fit(x , y)

LinearSVR(epsilon=1.5)

In [12]:
svm_poly_reg = SVR(kernel='poly', degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(x, y)

SVR(C=100, degree=2, kernel='poly')
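The cells above reuse the moons data, whose targets are class labels. For a regression-style illustration, the sketch below fits the same `SVR` configuration to synthetic noisy quadratic data. The data-generating formula and the random seed are assumptions made for this example. `epsilon` sets the width of the tube around the prediction inside which errors are ignored.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic noisy quadratic data (hypothetical, for illustration only)
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1)) - 1
y_quad = 0.2 + 0.1 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y_quad)

pred = svm_poly_reg.predict([[0.5]])
# Support vectors are the training points on or outside the epsilon tube
n_support = len(svm_poly_reg.support_)
print("prediction at x=0.5:", pred, "support vectors:", n_support)
```

Increasing `epsilon` usually leaves fewer support vectors, because more training points fall inside the tube.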