# Support Vector Machine (SVM) Tutorial Using Iris Dataset 

## Introduction

We will use an [SVM](https://en.wikipedia.org/wiki/Support_vector_machine) to classify the flowers in the [Iris Dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) into 3 classes, one per species.

>**Note:** Code and Markdown cells can be executed using the **Shift + Enter** keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.


## Importing All Libraries
I prefer to put all imports first, but of course you can import anywhere (this is Python, not as restricted as Java :D).

In [1]:
# Import libraries necessary for this project

import numpy as np

from sklearn                 import datasets
from sklearn                 import preprocessing
from sklearn.preprocessing   import StandardScaler
from sklearn.preprocessing   import Normalizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm             import LinearSVC
from sklearn.svm             import SVC

print("\nImporting ✓\n")


Importing ✓



## Loading the Data
Load the dataset and print its size, class names, and feature names.

In [2]:
# Load the Iris dataset

iris     = datasets.load_iris()
features = iris.data
classes  = iris.target

print("\nLoading ✓\n")
print("Data size   : ", classes.size)
print("Types     : ", (iris.target_names ))
print("Features : ", (iris.feature_names))



Loading ✓

Data size   :  150
Types     :  ['setosa' 'versicolor' 'virginica']
Features :  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
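
Beyond the names printed above, it can help to check the array shapes and the class balance before modeling. This is a minimal sketch, assuming the same `load_iris()` call as above. The variable names `X`, `y`, and `counts` are illustrative and not part of the notebook:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# One row per flower, one column per measurement
print("Feature matrix shape:", X.shape)

# Number of samples per class, in the order of iris.target_names
counts = np.bincount(y)
for name, count in zip(iris.target_names, counts):
    print(f"{name:<12}{count}")
```

The three classes are balanced (50 samples each), so plain accuracy is a reasonable score for this dataset.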


## Normalizing and Splitting the Data

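Standardize the features to zero mean and unit variance so that no feature dominates the classifier just because of its scale, then hold out 20% of the samples for testing.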

In [3]:
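# Standardize the features to zero mean and unit variance.
# Despite its name, `normalizer` holds the scaled feature matrix; the scaler is fit on all 150 samples.
scaler     = StandardScaler()
normalizer = scaler.fit_transform(features)

print("\nNormalizing ✓\n")

# train_test_split returns: train features, test features, train labels, test labels
train_set_1, test_set_1, train_set_2, test_set_2 = train_test_split(normalizer, classes, test_size=0.2, random_state=0)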

print("\nSplitting ✓\n")


Normalizing ✓


Splitting ✓
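
The cell above fits the scaler on all 150 samples before splitting, so the test rows influence the mean and variance used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below assumes the same `test_size` and `random_state` are kept. The names `X_train`, `X_test`, `y_train`, and `y_test` are illustrative:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Split the raw features first
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Learn mean and standard deviation from the training rows only,
# then apply the same transformation to the test rows
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

On a dataset this small the scores may barely change, but the test set then stays fully unseen until evaluation.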



## Selecting a Model
Using this [cheat sheet](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) from [scikit-learn](http://scikit-learn.org/stable/index.html), one can directly choose [Linear SVC](http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html), and so did I.

### Linear SVC
The dataset has only 150 samples, which is small (in the ML world), so we start with a linear classifier and run a grid search over its regularization parameter `C`.

In [4]:
LSVC   = LinearSVC(class_weight=None, loss='hinge', multi_class='ovr', random_state=0)

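# TODO: Change the range to see the effect on the output. Best result I got:
#   The best C is  :  490.7265473233038
#   Training Score :  0.9666666666666667
#   Testing Score  :  1.0

# 50 log-spaced candidates for C, from 2**-13 to 2**30.
# The exponents are written out explicitly because `^` is bitwise XOR in Python
# (10^-7 == -13 and 10^20 == 30), not a power operator.
CRange = {'C': np.logspace(-13, 30, base=2)}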
GSVC   = GridSearchCV(LSVC, CRange, cv=5)

GSVC.fit(train_set_1, train_set_2)

best_E = GSVC.best_estimator_

print("The best C is  : ", best_E.C)
print("Training Score : ", best_E.score(train_set_1, train_set_2))
print("Testing Score  : ", best_E.score(test_set_1, test_set_2))
print("\nLinear SVC ✓\n")

The best C is  :  490.7265473233038
Training Score :  0.9666666666666667
Testing Score  :  1.0

Linear SVC ✓
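
Python's `^` operator is bitwise XOR, not exponentiation, so bounds written as `10^-7` and `10^20` evaluate to `-13` and `30`. The grid searched above therefore runs from 2**-13 to 2**30. The short check below illustrates this and confirms that the reported best `C` is one of the grid points. The variable name `grid` is illustrative:

```python
import numpy as np

# `^` is bitwise XOR in Python; exponentiation is `**`
print(10 ^ -7)   # -13
print(10 ^ 20)   # 30
print(10 ** -7)  # 1e-07

# The grid therefore holds 50 values (the default `num`) from 2**-13 to 2**30
grid = np.logspace(-13, 30, base=2)
print(grid.size, grid[0], grid[-1])

# The best C reported above is one of the grid points
print(np.isclose(grid, 490.7265473233038).any())
```

If powers were intended, they would be written with `**`, or the exponents passed directly to `np.logspace`.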



### Polynomial Kernel


In [5]:
# TODO: Change the parameters to see the effect on the output.
# Note: np.arange(2, 3) contains only the value 2, so degree and coef0 are fixed at 2 here.
parameters = {'C': np.logspace(-10, 11, base=2), 'degree': np.arange(2, 3), 'coef0': np.arange(2, 3)}

GSVC1 = GridSearchCV(SVC(kernel='poly', gamma='auto', random_state=0), parameters, cv=5)

GSVC1.fit(train_set_1, train_set_2)

best_E = GSVC1.best_estimator_

print("The best C is  : ", best_E.C)
print("Training Score : ", best_E.score(train_set_1, train_set_2))
print("Testing Score  : ", best_E.score(test_set_1, test_set_2))
print("\nPolynomial SVC ✓\n")

The best C is  :  17.665432218780992
Training Score :  0.9833333333333333
Testing Score  :  1.0

Polynomial SVC ✓
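
`GridSearchCV` also records the full parameter combination and the mean cross-validated score of the best candidate, which can be used to compare models without touching the test set. This is a minimal sketch, assuming a `Pipeline` so that scaling is refit inside each fold. The step names `scale` and `svc`, the smaller `C` grid, and the extra degree value are illustrative choices, not the notebook's settings:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Scaling and the classifier are tuned together as one estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(kernel="poly", gamma="auto", coef0=2)),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {
    "svc__C": np.logspace(-4, 6, num=11, base=2),
    "svc__degree": [2, 3],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters :", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
print("Testing Score   :", search.score(X_test, y_test))
```

With only 30 test samples, a single misclassification changes the test score by more than 3 percentage points, so the cross-validated score is usually the steadier number to compare.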

