## Load the features and labels of the flower samples

In [39]:
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
print(iris)    # Print the raw dataset object to inspect its contents.

In [41]:
print(len(iris['data']))
print(iris['feature_names'])
print(iris['target_names'])

150
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']


#### Analysis

* The dataset object holds several fields, including 'data', 'target', 'target_names', 'feature_names' and 'DESCR'.

* Each element of 'data' is a vector of four features. (They are listed in the 'Attribute Information' part of the printed output.)

* There are 150 samples in this set, drawn from three different kinds of flowers. (Their names are stored in 'target_names'.)
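
The points above can be checked directly instead of reading the full printout. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the exact list of fields may vary between scikit-learn versions, so it is printed rather than hard-coded.

```python
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()

# The returned object is dictionary-like, so its fields can be listed.
print(sorted(iris.keys()))

# Shape of the feature matrix: (number of samples, number of features).
print(iris['data'].shape)

# Count how many samples carry each class label (0, 1, 2).
counts = np.bincount(iris['target'])
per_class = {str(name): int(n) for name, n in zip(iris['target_names'], counts)}
print(per_class)
```

The shape confirms 150 samples with four features each, and the counts show that the three kinds of flowers are equally represented.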

In [54]:
# Build a DataFrame from the feature matrix in 'data', using the feature names as column labels.
dataOfFW = pd.DataFrame(iris['data'], columns = iris['feature_names'])
print(dataOfFW.head(10))    # Show the first 10 rows of dataOfFW.


# Build a DataFrame from the class labels in 'target'.
resultOfFW = pd.DataFrame(iris['target'], columns = ['target result'])
print(resultOfFW.head(10))    # Show the first 10 rows of resultOfFW.

# Combine the feature columns and the label column side by side.
newDF = pd.concat([dataOfFW, resultOfFW], axis = 1)
newDF.head(10)

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   
5                5.4               3.9                1.7               0.4   
6                4.6               3.4                1.4               0.3   
7                5.0               3.4                1.5               0.2   
8                4.4               2.9                1.4               0.2   
9                4.9               3.1                1.5               0.1   

   target result  
0              0  
1              0  
2              0  
3              0  
4              0  
5              0

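| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target result |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | 0 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | 0 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | 0 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |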
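
Combining the two frames with pd.concat and axis = 1 works here because both share the same default row index. The sketch below uses small hypothetical frames (the names `features`, `labels` and `shifted` are illustrative only) to show that rows are matched by index label, not by position.

```python
import pandas as pd

# Two small hypothetical frames sharing the default index 0, 1, 2.
features = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})
labels = pd.DataFrame({'label': [0, 1, 0]})

# axis=1 places the frames side by side, matching rows by index label.
combined = pd.concat([features, labels], axis=1)
print(combined)

# With a different index, rows no longer line up and the
# unmatched cells are filled with NaN.
shifted = pd.DataFrame({'label': [0, 1, 0]}, index=[1, 2, 3])
misaligned = pd.concat([features, shifted], axis=1)
print(misaligned)
```

If the frames had been filtered or shuffled beforehand, resetting the index first would be one way to avoid the misaligned case.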


## Create the train set and test set

In [72]:
from sklearn.model_selection import train_test_split

# train_test_split randomly splits the samples into a training set (70%) and a test set (30%).
# random_state fixes the shuffle so that the split is reproducible.
featureColumns = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
X_train, X_test, y_train, y_test = train_test_split(
    newDF[featureColumns], newDF[['target result']], test_size = 0.3, random_state = 0)



from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# StandardScaler is a class, so it must be instantiated before fit and transform can be called.

sc.fit(X_train)                        # Learn the mean and standard deviation of each feature from the training set.
X_train_std = sc.transform(X_train)    # Standardize the training set with those statistics.
X_test_std = sc.transform(X_test)      # Standardize the test set with the same training-set statistics.
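
The idea behind a random split like the one in the cell above can be sketched with NumPy alone. The helper `simple_split` is hypothetical: it only illustrates shuffling the row indices and cutting them at the requested fraction. It is not scikit-learn's implementation and will not select the same rows as train_test_split.

```python
import numpy as np

def simple_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for a shuffled split (illustrative helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)              # random ordering of the row indices
    n_test = int(np.ceil(test_size * n_samples))    # number of rows held out for testing
    return order[n_test:], order[:n_test]

train_idx, test_idx = simple_split(150, 0.3, seed=0)
print(len(train_idx), len(test_idx))

# The same seed gives the same split, which is the role random_state plays above.
again_train, again_test = simple_split(150, 0.3, seed=0)
print(np.array_equal(test_idx, again_test))
```

With 150 samples and a test fraction of 0.3, this leaves 105 training samples and 45 test samples.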

* Before we use the support vector machine (SVM), we need to split the original sample set into a training set and a test set. Here we use the function "train_test_split" to do it. More information about this function can be found at: https://blog.csdn.net/CherDW/article/details/54881167

* SVMs are sensitive to the scale of the features, so we standardize them: each feature is shifted and scaled to zero mean and unit variance, using the mean and standard deviation learned from the training set only. The test set is transformed with the same statistics. Here we use the class "StandardScaler" to do it. More information about this class can be found at: https://www.cnblogs.com/chaosimple/p/4153167.html
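
What StandardScaler does to the features can be sketched by hand with NumPy. The arrays below are hypothetical toy data, and the sketch assumes the population standard deviation (ddof=0), which is what the scikit-learn documentation describes for StandardScaler.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_test = np.array([[2.0, 30.0]])

# The statistics are estimated on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)    # population standard deviation (ddof=0)

# The same shift and scale are then applied to both sets.
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

print(X_train_std.mean(axis=0))    # close to 0 for every feature
print(X_train_std.std(axis=0))     # close to 1 for every feature
print(X_test_std)
```

Reusing the training statistics for the test set keeps information about the test samples out of the preprocessing step.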

## Use SVM to classify the data and calculate the accuracy

In [112]:
from sklearn.svm import SVC

kernelMatrix = ['linear', 'poly', 'rbf', 'sigmoid']
accuracyMatrix = []

for kernel in kernelMatrix:
    svm = SVC(kernel = kernel, probability = True)    # Instantiate an SVM classifier with this kernel.
    svm.fit(X_train_std, y_train['target result'].values)    # Fit the model on the standardized training set.

    predictionResult = svm.predict(X_test_std)    # Predict the classes of the test samples.
    realResult = y_test['target result'].values

    # Accuracy is the fraction of test samples whose predicted class matches the real class.
    theSameNum = int((predictionResult == realResult).sum())
    accuracy = theSameNum / len(realResult)
    accuracyMatrix.append(accuracy)

for kernel, accuracy in zip(kernelMatrix, accuracyMatrix):
    print("The accuracy of SVM with " + kernel + " kernel is: " + str(accuracy))

The accuracy of SVM with linear kernel is: 0.9777777777777777
The accuracy of SVM with poly kernel is: 0.8888888888888888
The accuracy of SVM with rbf kernel is: 0.9777777777777777
The accuracy of SVM with sigmoid kernel is: 0.8666666666666667
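
The accuracy reported above is the fraction of test samples whose predicted class equals the real class. A minimal sketch of that computation with NumPy, using hypothetical label arrays (`predicted` and `actual` are illustrative names, not values from the run above):

```python
import numpy as np

# Hypothetical predicted and real labels for five test samples.
predicted = np.array([0, 2, 1, 1, 0])
actual = np.array([0, 2, 1, 2, 0])

# Element-wise comparison gives booleans; their mean is the fraction that match.
accuracy = float(np.mean(predicted == actual))
print(accuracy)    # 0.8
```

scikit-learn's `accuracy_score` in `sklearn.metrics` computes the same quantity. With 45 test samples, an accuracy of 0.9777... corresponds to 44 correct predictions.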


* SVC is one of scikit-learn's SVM classifiers.
* The kernel parameter of SVC accepts 'linear', 'poly', 'rbf', 'sigmoid' and 'precomputed'. More information about these options can be found at: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
* More information about SVM and machine learning can be found at the websites below:
    https://machine-learning-python.kspax.io/Classification/ex1_Recognizing_hand-written_digits.html
    https://ithelp.ithome.com.tw/articles/10186905
* The tutorial I referred to in this file is: https://medium.com/@yehjames/資料分析-機器學習-第3-4講-支援向量機-support-vector-machine-介紹-9c6c6925856b
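
The kernel names compared in this file correspond to different similarity functions between two samples. The sketch below writes them out with NumPy, following the formulas given in the scikit-learn documentation. The function names and the parameter values (gamma = 1.0, coef0 = 0.0, degree = 3) are chosen for illustration only and are not necessarily the values SVC used in the run above.

```python
import numpy as np

def linear_kernel(x, y):
    # Plain dot product.
    return float(np.dot(x, y))

def poly_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <x, y> + coef0) ** degree
    return float((gamma * np.dot(x, y) + coef0) ** degree)

def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2); equals 1 when x and y coincide.
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return float(np.tanh(gamma * np.dot(x, y) + coef0))

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

for name, k in [('linear', linear_kernel), ('poly', poly_kernel),
                ('rbf', rbf_kernel), ('sigmoid', sigmoid_kernel)]:
    print(name, k(x, y))
```

The 'precomputed' option differs from the others: instead of naming a function, it tells SVC that the matrix of kernel values is supplied directly.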