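# Breast Cancer Dataset Prediction with SVM Classification
>Uses the linear support vector classifier (`svm.LinearSVC`) from the scikit-learn machine learning package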

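* (1) Import the libraries and the built-in breast cancer dataset<br>
The following libraries are imported:<br>
sklearn.datasets: loads the built-in breast cancer dataset via `datasets.load_breast_cancer()`<br>
sklearn.svm: support vector machine algorithms; this notebook uses the `svm.LinearSVC` classifier<br>
sklearn.model_selection: `train_test_split`, used to split the data into training and test sets<br>
sklearn.metrics: `accuracy_score`, used to measure classification accuracy<br>
pandas, numpy: data frame and array handling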

In [441]:
from sklearn import svm
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

## Step 1. Load the data

In [443]:
breast_cancer=datasets.load_breast_cancer()
dir(breast_cancer)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [445]:
data_X=pd.DataFrame(breast_cancer.data,columns=breast_cancer.feature_names)
data_y=pd.DataFrame(breast_cancer.target,columns=["target"])
#drop the standard-error columns, keeping only the "mean" and "worst" features
data_X=data_X.drop(data_X.columns[data_X.columns.str.contains("error")],axis=1)
#check missing values by feature
def checknull(i):
    var0=sum(pd.isnull(data_X[data_X.columns[i]]))
    return data_X.columns[i] + " " +str(var0)
pd.Series (range(0,len(data_X.columns))).apply(checknull)


0                 mean radius 0
1                mean texture 0
2              mean perimeter 0
3                   mean area 0
4             mean smoothness 0
5            mean compactness 0
6              mean concavity 0
7         mean concave points 0
8               mean symmetry 0
9      mean fractal dimension 0
10               worst radius 0
11              worst texture 0
12            worst perimeter 0
13                 worst area 0
14           worst smoothness 0
15          worst compactness 0
16            worst concavity 0
17       worst concave points 0
18             worst symmetry 0
19    worst fractal dimension 0
dtype: object
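
The per-column check above can also be written without a helper function. The following is a minimal sketch, assuming the same 20 "mean" and "worst" columns; the names `null_counts` and `total_missing` are illustrative and do not appear in the original notebook.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the feature frame and drop the standard-error columns
breast_cancer = datasets.load_breast_cancer()
data_X = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
data_X = data_X.loc[:, ~data_X.columns.str.contains("error")]

# Count missing values per column in a single vectorized call
null_counts = data_X.isnull().sum()      # hypothetical name
total_missing = int(null_counts.sum())   # hypothetical name

print(null_counts)
print("total missing values:", total_missing)
```

`isnull().sum()` returns one integer count per column, so it carries the same information as the string output above in a form that can be filtered or summed directly.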

In [447]:

#collinearity check
data_X.corr()

feature,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
mean radius,1.0,0.323782,0.997855,0.987357,0.170581,0.506124,0.676764,0.822529,0.147741,-0.311631,0.969539,0.297008,0.965137,0.941082,0.119616,0.413463,0.526911,0.744214,0.163953,0.007066
mean texture,0.323782,1.0,0.329533,0.321086,-0.023389,0.236702,0.302418,0.293464,0.071401,-0.076437,0.352573,0.912045,0.35804,0.343546,0.077503,0.27783,0.301025,0.295316,0.105008,0.119205
mean perimeter,0.997855,0.329533,1.0,0.986507,0.207278,0.556936,0.716136,0.850977,0.183027,-0.261477,0.969476,0.303038,0.970387,0.94155,0.150549,0.455774,0.563879,0.771241,0.189115,0.051019
mean area,0.987357,0.321086,0.986507,1.0,0.177028,0.498502,0.685983,0.823269,0.151293,-0.28311,0.962746,0.287489,0.95912,0.959213,0.123523,0.39041,0.512606,0.722017,0.14357,0.003738
mean smoothness,0.170581,-0.023389,0.207278,0.177028,1.0,0.659123,0.521984,0.553695,0.557775,0.584792,0.21312,0.036072,0.238853,0.206718,0.805324,0.472468,0.434926,0.503053,0.394309,0.499316
mean compactness,0.506124,0.236702,0.556936,0.498502,0.659123,1.0,0.883121,0.831135,0.602641,0.565369,0.535315,0.248133,0.59021,0.509604,0.565541,0.865809,0.816275,0.815573,0.510223,0.687382
mean concavity,0.676764,0.302418,0.716136,0.685983,0.521984,0.883121,1.0,0.921391,0.500667,0.336783,0.688236,0.299879,0.729565,0.675987,0.448822,0.754968,0.884103,0.861323,0.409464,0.51493
mean concave points,0.822529,0.293464,0.850977,0.823269,0.553695,0.831135,0.921391,1.0,0.462497,0.166917,0.830318,0.292752,0.855923,0.80963,0.452753,0.667454,0.752399,0.910155,0.375744,0.368661
mean symmetry,0.147741,0.071401,0.183027,0.151293,0.557775,0.602641,0.500667,0.462497,1.0,0.479921,0.185728,0.090651,0.219169,0.177193,0.426675,0.4732,0.433721,0.430297,0.699826,0.438413
mean fractal dimension,-0.311631,-0.076437,-0.261477,-0.28311,0.584792,0.565369,0.336783,0.166917,0.479921,1.0,-0.253691,-0.051269,-0.205151,-0.231854,0.504942,0.458798,0.346234,0.175325,0.334019,0.767297
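
A full correlation matrix is hard to scan by eye, so one option is to list only the strongly correlated feature pairs. The sketch below is not part of the original notebook: the `threshold` of 0.95 is an arbitrary, hypothetical cutoff, and `upper` and `high_pairs` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

# Rebuild the 20-column feature frame used in this notebook
bc = datasets.load_breast_cancer()
data_X = pd.DataFrame(bc.data, columns=bc.feature_names)
data_X = data_X.loc[:, ~data_X.columns.str.contains("error")]

# Absolute correlations, keeping only the strict upper triangle
# so that each feature pair appears exactly once
corr = data_X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.95  # hypothetical cutoff, chosen only for illustration
pairs = upper.stack()
high_pairs = pairs[pairs > threshold].sort_values(ascending=False)
print(high_pairs)
```

Pairs such as `mean radius` and `mean perimeter` (0.997855 in the matrix above) show up here, which suggests that several size-related features carry nearly the same information.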


## Step 2. Split into training and test sets

In [449]:
X_train, X_test, y_train, y_test = train_test_split(data_X,data_y,test_size=0.3,random_state=101)
y_train=np.array(y_train).ravel()
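
The split above is a plain random 70/30 split. As a variant that the original notebook does not use, passing `stratify` to `train_test_split` keeps the class proportions roughly equal in both subsets. This is a minimal sketch under that assumption, using the same `test_size` and `random_state`.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load as pandas objects and keep the same 20 non-error features
bc = datasets.load_breast_cancer(as_frame=True)
X = bc.data.loc[:, ~bc.data.columns.str.contains("error")]
y = bc.target

# stratify=y is an addition for illustration; the notebook's split omits it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y
)

print(X_train.shape, X_test.shape)
# Share of class 1 (benign) in each subset
print(f"{y_train.mean():.3f} {y_test.mean():.3f}")
```

With 569 samples and `test_size=0.3`, the test set holds 171 rows and the training set 398.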

## Step 3. Build the model

In [455]:
clf=svm.LinearSVC()
clf.fit(X_train,y_train)
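
The features in this dataset have very different scales (for example, `mean area` versus `mean smoothness`), and linear SVMs are generally sensitive to that. The following sketch, which is not part of the original notebook, wraps `LinearSVC` in a pipeline with `StandardScaler`; the `max_iter=10000` setting and the name `scaled_clf` are assumptions made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Same 20 features and the same split parameters as the notebook
bc = datasets.load_breast_cancer(as_frame=True)
X = bc.data.loc[:, ~bc.data.columns.str.contains("error")]
y = bc.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# The scaler is fitted on the training data only, then applied before the SVM
scaled_clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
scaled_clf.fit(X_train, y_train)

test_accuracy = scaled_clf.score(X_test, y_test)
print(f"{test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means the test set never influences the scaling statistics, so the reported score stays an honest held-out estimate.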



## Step 4. Predict


In [457]:
Z=clf.predict(X_test)

## Step 5. Accuracy analysis

In [459]:

#accuracy on the training set, then on the held-out test set
print(f"{clf.score(X_train,y_train):.3f}")
print(f"{clf.score(X_test,y_test):.3f}")

0.935
0.912
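
The notebook imports `accuracy_score` but reports accuracy through `clf.score`. As a sketch of how the stored predictions `Z` could be evaluated directly, the example below computes the same accuracy with `accuracy_score` and adds a confusion matrix; `confusion_matrix` is an addition that the original does not import.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Reproduce the notebook's data preparation, split, and model
bc = datasets.load_breast_cancer(as_frame=True)
X = bc.data.loc[:, ~bc.data.columns.str.contains("error")]
y = bc.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

clf = svm.LinearSVC()
clf.fit(X_train, y_train)
Z = clf.predict(X_test)

# accuracy_score compares true labels against the predictions
acc = accuracy_score(y_test, Z)
# Rows are true classes, columns are predicted classes,
# ordered as target_names: 0 = malignant, 1 = benign
cm = confusion_matrix(y_test, Z)

print(f"accuracy: {acc:.3f}")
print(cm)
```

The confusion matrix separates the two kinds of error, which matters here because a malignant case predicted as benign is usually a costlier mistake than the reverse.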
