# SVM
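The support vector machine (SVM) was among the best-performing models before deep learning became popular.

At its core an SVM is a linear binary classifier, but it can also handle multiclass problems, nonlinear problems, and regression.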
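
As a small illustration of the linear binary case, the sketch below fits an `SVC` with a linear kernel and checks that its predictions are just the sign of `w . x + b`. The two-cluster dataset from `make_blobs` is an assumption made for this example and is not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data, chosen only for illustration).
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With a linear kernel the decision function is w . x + b.
w = clf.coef_[0]
b = clf.intercept_[0]
manual_scores = X @ w + b

# A positive score maps to the second class, a negative score to the first.
manual_pred = np.where(manual_scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(manual_scores, clf.decision_function(X)))
print(np.array_equal(manual_pred, clf.predict(X)))
```

Nonlinear kernels such as `rbf` keep the same idea but compute the score in an implicit feature space, so `coef_` is only available for the linear kernel.
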
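```
class sklearn.svm.SVC(
            C=1.0,         # regularization parameter: penalty on misclassified samples
            kernel='rbf',    # kernel type: 'linear', 'poly' (polynomial), 'rbf' (Gaussian), 'sigmoid', or 'precomputed' (user-supplied kernel matrix)
            degree=3,     # degree of the polynomial kernel
            gamma='auto',   # kernel coefficient for 'rbf', 'poly' and 'sigmoid'; 'auto' means 1/n_features (since scikit-learn 0.22 the default is 'scale')
            coef0=0.0,     # constant term in the kernel function
            shrinking=True,
            probability=False,  # whether to estimate class probabilities; increases computation time
            tol=0.001,      # tolerance for the stopping criterion; defaults to 0.001
            cache_size=200,
            class_weight=None,
            verbose=False,
            max_iter=-1,
            decision_function_shape='ovr',
            random_state=None)
```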
* support_vectors_: the support vectors
* support_: indices of the support vectors
* n_support_: number of support vectors for each class
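
The attributes listed above can be inspected on any fitted `SVC`. The following is a minimal sketch that assumes scikit-learn's bundled copy of the Iris data (`load_iris`) rather than the CSV file used later in these notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

clf = SVC(kernel="rbf").fit(X_std, y)

print(clf.support_vectors_.shape)  # (number of support vectors, n_features)
print(clf.support_[:5])            # row indices of the first few support vectors
print(clf.n_support_)              # number of support vectors per class

# The support vectors are rows of the training data, selected by support_.
print(np.allclose(clf.support_vectors_, X_std[clf.support_]))
```
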
## References
* https://blog.csdn.net/xiaodongxiexie/article/details/70667101
* https://www.bilibili.com/video/BV12P411774P

## SVM classification

In [4]:
import numpy as np
import pandas as pd
data = pd.read_csv('/data/Iris.csv')
X = data.iloc[:,1:-1]
y = data.iloc[:,-1]
y1 = y.map({'Iris-setosa' : 0, 'Iris-versicolor' : 1, 'Iris-virginica' : 2})
y1.drop_duplicates()
X.head()

| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [13]:
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler  # standardize the features before training
from sklearn.model_selection import train_test_split

std = StandardScaler()
X_std = std.fit_transform(X)

train_X, test_X, train_y,test_y = train_test_split(X_std, y1, test_size = 0.3, random_state=0)

In [14]:
svm_classification = SVC()
svm_classification.fit(train_X, train_y)

In [15]:
svm_classification.score(test_X, test_y)

0.9777777777777777

In [19]:
svm_classification.predict(X_std[:10])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)

In [17]:
np.asarray(y1[:10])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
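
Comparing the first ten predictions only covers one class. A fuller check, sketched below, looks at the confusion matrix on the test split. The sketch also wraps the scaler and the classifier in a pipeline so that the scaling statistics come from the training rows only, which is a swapped-in variation on the cells above (they fit the scaler on the full dataset before splitting). It assumes scikit-learn's bundled `load_iris` data instead of `/data/Iris.csv`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The pipeline fits StandardScaler on the training split only,
# then applies the same transform when predicting on the test split.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(acc)
print(cm)
```

Off-diagonal entries of the matrix show which classes get confused with each other, something a single accuracy number hides.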

## SVM regression
Without standardization, the model performs very poorly.

In [56]:
boston = pd.read_csv('/data/boston_housing.csv')
boston.head()

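| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |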


In [57]:
X_bos = boston.iloc[:,:-1]
y_bos = boston.iloc[:,-1]

In [58]:
from sklearn.svm import SVR
std_bos = StandardScaler()
X_bos_std = std_bos.fit_transform(X_bos)

In [59]:
train_X1, test_X1, train_y1, test_y1 = train_test_split(X_bos_std, y_bos, test_size = 0.3, random_state =0)

In [60]:
train_X2, test_X2, train_y2, test_y2 = train_test_split(X_bos, y_bos, test_size = 0.3, random_state =0)
# no standardization (kept for comparison)

In [61]:
svm_regression_1 = SVR()
svm_regression_1.fit(train_X2, train_y2)
svm_regression_1.score(test_X2, test_y2)
# without standardization the score (R^2) is very poor

0.1811277097860169

In [62]:
svm_regression = SVR()
svm_regression.fit(train_X1, train_y1)
svm_regression.score(test_X1, test_y1)
# that said, the score (R^2) is still not great even with standardization; likely a data issue

0.5556920462148689
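
The effect of standardization on `SVR` can be reproduced on a small synthetic dataset. The sketch below is an illustration under stated assumptions: the data are randomly generated (not the Boston housing data), the three features are deliberately put on very different scales, and only the small-scale feature drives the target.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Three features on very different scales; only the first one determines y.
X = rng.normal(size=(300, 3)) * np.array([1.0, 50.0, 1000.0])
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Same default SVR, once on raw features and once behind a StandardScaler.
raw = SVR().fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVR()).fit(X_train, y_train)

r2_raw = raw.score(X_test, y_test)        # SVR.score returns R^2
r2_scaled = scaled.score(X_test, y_test)

print(f"R^2 without scaling: {r2_raw:.3f}")
print(f"R^2 with scaling:    {r2_scaled:.3f}")
```

The RBF kernel depends on distances between samples, so on raw features the large-scale columns dominate the distance and the informative small-scale column is effectively ignored.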

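## Hyperparameter tuning: grid search
SVMs are sensitive to standardization and also to their hyperparameters. GridSearchCV combines grid search with cross-validation.
* Grid search: evaluate every combination of the candidate parameter values and keep the best one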
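
To make the idea concrete, here is a minimal sketch of the loop that grid search performs, written by hand with `ParameterGrid` and `cross_val_score` and then compared with `GridSearchCV`. The small grid, the 5-fold setting, and the use of the bundled `load_iris` data are assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}

# Manual version: score every combination with 5-fold cross-validation
# and remember the one with the highest mean accuracy.
best_score, best_params = -np.inf, None
for params in ParameterGrid(grid):
    mean_score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    if mean_score > best_score:
        best_score, best_params = mean_score, params

# GridSearchCV automates the same loop.
search = GridSearchCV(SVC(), param_grid=grid, cv=5).fit(X, y)

print(best_params, best_score)
print(search.best_params_, search.best_score_)
```

Both approaches evaluate 2 x 3 = 6 candidates on the same folds, so they should report the same best score.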

In [63]:
from sklearn.model_selection import GridSearchCV
params = {
    'kernel':('linear','rbf','poly'),
    'C':[0.01,0.1,0.5,1,2,10,100]
}
model = GridSearchCV(svm_classification, param_grid = params, cv=10)
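# cv = cross-validation; 10 means 10-fold
# note: this fits on the raw X, not the standardized X_std
model.fit(X,y1)
print('Best parameter combination:', model.best_params_)
print('Best score:', model.best_score_)

Best parameter combination: {'C': 0.1, 'kernel': 'poly'}
Best score: 0.9866666666666667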


In [64]:
params_1 = {
    'kernel':('linear','rbf','poly'),
    'C':[0.01,0.1,0.5,1,2,10,100]
}
model_1 = GridSearchCV(svm_regression, param_grid = params_1, cv=10)
model_1.fit(X_bos_std, y_bos)
print('Best parameter combination:', model_1.best_params_)
print('Best score:', model_1.best_score_)

Best parameter combination: {'C': 10, 'kernel': 'rbf'}
Best score: 0.5241088776988827
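
One possible refinement, sketched here as an assumption rather than as the approach used in the cells above, is to run the grid search over a pipeline. The scaler is then re-fit inside every cross-validation fold, and a held-out test split gives a score that the search never saw. The step names `scale` and `svc` are arbitrary labels chosen for this example, and the data again come from the bundled `load_iris`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)                 # mean cross-validated accuracy on the training split
test_acc = search.score(X_test, y_test)   # accuracy of the refit best model on held-out data
print(test_acc)
```

By default `GridSearchCV` refits the best candidate on all the data passed to `fit`, so `search` can be used directly for prediction afterwards.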
