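# Breast Cancer Dataset Prediction: SVM Classification
>Uses the SVC algorithm from the scikit-learn machine learning package

* (1) Import libraries and the built-in breast cancer dataset<br>
The following libraries are imported:<br>
sklearn.datasets: loads the built-in breast cancer dataset via `datasets.load_breast_cancer()`<br>
sklearn.svm: provides `SVC`, the support vector machine classifier used here<br>
matplotlib.pyplot: used for plotting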

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Step1. Load the data

In [15]:
breast_cancer=datasets.load_breast_cancer()
X=breast_cancer.data
y=breast_cancer.target
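
Before modelling, it can help to look at what was loaded. The following is a minimal sketch, not part of the original notebook, that prints the shape of the data, the class names, and the number of samples per class. It also prints the overall value range of the features, which spans several orders of magnitude; that is a plausible reason why very small `gamma` values are needed later with unscaled features.

```python
import numpy as np
from sklearn import datasets

breast_cancer = datasets.load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Number of samples and number of features
print(X.shape)

# Class names; the position in this array is the label value (0 or 1)
print(breast_cancer.target_names)

# Count how many samples carry each label
counts = np.bincount(y)
for name, n in zip(breast_cancer.target_names, counts):
    print(name, n)

# Smallest and largest raw feature values across the whole matrix
print(X.min(), X.max())
```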

## Step2. Split into training and test sets

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=0)
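
With `test_size=0.3` on 569 samples, the split yields 398 training rows and 171 test rows. The sketch below checks this and, as an optional variant that the original does not use, shows the `stratify=y` argument, which keeps the class ratio of the test set close to that of the full dataset. The `*_s` variable names are illustrative only.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)

# Same split as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not the notebook's setting):
# stratify on the labels so both parts keep a similar class ratio
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fraction of label 1 in: full data, plain test set, stratified test set
print(y.mean(), y_test.mean(), y_test_s.mean())
```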

## Step3. Build the model

In [26]:
# Hyperparameter tuning: tune gamma first, with C fixed at 25
C= [25]
gamma = [0.000005,0.000006,0.000007,0.000008,0.000009,0.00001,0.00002,0.00003,0.00004,0.00005]
for i in range(len(C)):
    for j in range(len(gamma)):
        print(f'C={C[i]},gamma={gamma[j]}')
        clf=svm.SVC(kernel='poly',gamma=gamma[j],C=C[i])
        clf.fit(X_train,y_train)
        clf.predict(X_test)
        print(clf.score(X_train,y_train))
        print(clf.score(X_test, y_test))

C=25,gamma=5e-06
0.9597989949748744
0.9532163742690059
C=25,gamma=6e-06
0.9623115577889447
0.9532163742690059
C=25,gamma=7e-06
0.964824120603015
0.9590643274853801
C=25,gamma=8e-06
0.964824120603015
0.9590643274853801
C=25,gamma=9e-06
0.964824120603015
0.9649122807017544
C=25,gamma=1e-05
0.9673366834170855
0.9766081871345029
C=25,gamma=2e-05
0.9773869346733668
0.9707602339181286
C=25,gamma=3e-05
0.9798994974874372
0.9532163742690059
C=25,gamma=4e-05
0.9798994974874372
0.9532163742690059
C=25,gamma=5e-05
0.9824120603015075
0.9532163742690059


In [27]:
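# gamma=2e-05 is selected from the results above (gamma=1e-05 scores slightly higher on the test set, 0.9766 vs 0.9708)
# Hyperparameter tuning: tune C, with gamma fixed at 2e-05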
C= [20,21,22,23,24,25,26,27,28,29,30]
gamma = [0.00002]
for i in range(len(C)):
    for j in range(len(gamma)):
        print(f'C={C[i]},gamma={gamma[j]}')
        clf=svm.SVC(kernel='poly',gamma=gamma[j],C=C[i])
        clf.fit(X_train,y_train)
        clf.predict(X_test)
        print(clf.score(X_train,y_train))
        print(clf.score(X_test, y_test))

C=20,gamma=2e-05
0.9698492462311558
0.9707602339181286
C=21,gamma=2e-05
0.9698492462311558
0.9707602339181286
C=22,gamma=2e-05
0.9723618090452262
0.9707602339181286
C=23,gamma=2e-05
0.9723618090452262
0.9707602339181286
C=24,gamma=2e-05
0.9773869346733668
0.9707602339181286
C=25,gamma=2e-05
0.9773869346733668
0.9707602339181286
C=26,gamma=2e-05
0.9773869346733668
0.9707602339181286
C=27,gamma=2e-05
0.9773869346733668
0.9649122807017544
C=28,gamma=2e-05
0.9773869346733668
0.9590643274853801
C=29,gamma=2e-05
0.9773869346733668
0.9590643274853801
C=30,gamma=2e-05
0.9773869346733668
0.9590643274853801
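
The two loops above choose `gamma` and `C` by comparing scores on the test set, so the final test score is no longer an independent estimate. One common alternative, shown here only as a sketch and not as the notebook's method, is to cross-validate on the training set with `GridSearchCV` and evaluate on the test set once at the end. The reduced grid and `cv=5` are assumptions chosen to keep the run short.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Reduced grid around the values explored in the notebook (an assumption)
param_grid = {"C": [20, 25, 30], "gamma": [1e-5, 2e-5]}

# 5-fold cross-validation on the training set only
search = GridSearchCV(svm.SVC(kernel="poly"), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Best combination and its mean cross-validated accuracy
print(search.best_params_)
print(search.best_score_)

# Accuracy of the refitted best model on the held-out test set
test_score = search.score(X_test, y_test)
print(test_score)
```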


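## Step4. Predict

```
From the results above, C=25 with gamma fixed at 2e-05 is taken as the final setting (C=24 through C=26 give identical scores)
```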


In [28]:
C= [25]
gamma = [0.00002]
for i in range(len(C)):
    for j in range(len(gamma)):
        print(f'C={C[i]},gamma={gamma[j]}')
        clf=svm.SVC(kernel='poly',gamma=gamma[j],C=C[i])
        clf.fit(X_train,y_train)
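        y_pred=clf.predict(X_test)  # predicted labels for the test set
        print(clf.score(X_train,y_train))
        print(clf.score(X_test, y_test))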

C=25,gamma=2e-05
0.9773869346733668
0.9707602339181286
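
The prediction step returns an array of labels (0 or 1). A minimal sketch of how those labels could be kept and mapped back to class names follows; the variable names `y_pred` and `pred_names` are illustrative and not from the original notebook.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0)

# Final model with the settings chosen in the notebook
clf = svm.SVC(kernel="poly", gamma=2e-5, C=25)
clf.fit(X_train, y_train)

# Keep the predicted labels instead of discarding them
y_pred = clf.predict(X_test)

# Map numeric labels to class names by indexing target_names
pred_names = data.target_names[y_pred]
print(y_pred[:10])
print(pred_names[:10])

# Number of test samples predicted as label 0 and as label 1
pred_counts = np.bincount(y_pred, minlength=2)
print(pred_counts)
```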


## Step5. Accuracy analysis

In [30]:
print('training score:',clf.score(X_train,y_train))
print('testing score:',clf.score(X_test, y_test))

training score: 0.9773869346733668
testing score: 0.9707602339181286
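
For a classifier, `clf.score` is the mean accuracy, so it should agree with `accuracy_score`, which is imported in the first cell but never called. The sketch below checks that and adds a confusion matrix, which breaks the test accuracy down by class; `confusion_matrix` is an addition that the original notebook does not use.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = svm.SVC(kernel="poly", gamma=2e-5, C=25)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy computed from the predictions; matches clf.score on the same data
acc = accuracy_score(y_test, y_pred)
print("accuracy_score:", acc)
print("clf.score:     ", clf.score(X_test, y_test))

# Rows are true labels, columns are predicted labels (0 = malignant, 1 = benign)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the matrix counts correct predictions, so its sum divided by the total equals the accuracy.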
