# Iris Data Set Classifier 

This tutorial develops a classification model for the Iris dataset, which contains three types of iris plants, using an SVM margin classifier.

Let us start by loading the dataset.

In [94]:
# Use the scikit-learn library to import the data set
from sklearn.datasets import load_iris
import numpy as np
IrisDataSet = load_iris()
IrisDataSetClasses = IrisDataSet.target
print("Classes", np.unique(IrisDataSetClasses))
IrisDataSetCharacteristics = IrisDataSet.data
# Total number of stored values divided by the number of samples = features per sample
print("Characteristics", IrisDataSetCharacteristics.size / IrisDataSet.target.size)

Classes [0 1 2]
Characteristics 4.0


The output above shows that our data consists of 3 classes, indicated by the numbers 0, 1 and 2.

The number of characteristics printed above shows that each instance of the dataset (a flower) is described by four features:
* Sepal Length
* Sepal Width
* Petal Length 
* Petal Width
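
The feature and class names can also be read from the dataset object itself. The snippet below is a small sketch that assumes scikit-learn's bundled copy of the dataset; the variable names are only illustrative and are not used elsewhere in this tutorial.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four measured features and of the three classes
feature_names = [str(name) for name in iris.feature_names]
class_names = [str(name) for name in iris.target_names]

# Rows are flowers, columns are features
n_samples, n_features = iris.data.shape

print(feature_names)
print(class_names)
print(n_samples, n_features)
```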

# Normalizing Data

After successfully loading and examining the data, the next step is to normalize it.

* The scikit-learn preprocessing module, which mainly helps in preparing datasets before any further processing, provides the Standard Scaler.
* Standard Scaler:
  - Standardizes features by removing the mean and scaling to unit variance.
  - Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set.
  - The mean and standard deviation are then stored so they can be applied to later data using the transform method.
  - Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
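
To make the description above concrete, the following sketch standardizes the features by hand and compares the result with `StandardScaler`. It is only an illustration; the names `X_manual` and `X_scaled` are introduced here and do not appear in the tutorial's own cells.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Per-feature statistics (one value per column)
mean = X.mean(axis=0)
std = X.std(axis=0)  # population standard deviation (ddof=0)

# Remove the mean and scale to unit variance, feature by feature
X_manual = (X - mean) / std

# The same operation performed by scikit-learn
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
```

After scaling, every column has a mean of approximately 0 and a standard deviation of approximately 1.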


In [95]:
from sklearn.preprocessing import StandardScaler
# Alternative to using the standard scaler
# from sklearn.preprocessing import MinMaxScaler
# Scaler = MinMaxScaler(feature_range=(0, 1), copy=True)
# Use a separate name for the instance so the StandardScaler class is not shadowed
Scaler = StandardScaler()
# fit_transform learns each feature's mean and standard deviation, then applies the scaling
NormalizedDataSet = Scaler.fit_transform(IrisDataSetCharacteristics)
print("Data was successfully Normalized ")

Data was successfully Normalized 


The next step is to shuffle the data, to avoid any preference towards one class over the others, and divide it into 2 subsets:

Training Set: the largest chunk of the data (80%), on which our model is trained.

Test Set: the smallest part of the data (20%), used to examine our model. The model does not see this portion until it is fully developed, so it shows whether the model works with high accuracy and reacts as required to foreign data.

In [96]:
# Import the train_test_split function from the model_selection module
from sklearn.model_selection import train_test_split
# The main objective of this function (train_test_split): split arrays or matrices into random train and test subsets
# first argument : feature array to be shuffled and split
# second argument : class label of every sample, split in the same way
# test_size : portion of the dataset assigned for testing (0.2 => 20% test set)
# shuffle : enable shuffling the data before splitting
# random_state : seed used by the random number generator
X_train, X_test, y_train, y_test = train_test_split(NormalizedDataSet, IrisDataSetClasses, test_size=0.2, shuffle=True, random_state=0)
print("Data was successfully split & shuffled ")

Data was successfully split & shuffled 
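
The cells above scale the full dataset before splitting it, so the test samples contribute to the scaler's mean and variance. A commonly used alternative, sketched below, is to split first and fit the scaler on the training portion only. The variable names with the `_alt` and `_raw` suffixes are hypothetical, and the accuracies reported later in this tutorial were not produced with this variant.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Split the raw (unscaled) data first
X_train_raw, X_test_raw, y_train_alt, y_test_alt = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

# Learn the mean and standard deviation from the training set only
scaler = StandardScaler().fit(X_train_raw)

# Apply the stored statistics to both subsets
X_train_alt = scaler.transform(X_train_raw)
X_test_alt = scaler.transform(X_test_raw)

print(X_train_alt.shape, X_test_alt.shape)
```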


# How to choose the most suitable machine learning algorithm?

Following the scikit-learn algorithm cheat sheet, we first need to know whether we have enough data.


In [97]:
print("Number of instances", IrisDataSet.target.size)

Number of instances 150


* We have confirmed more than 50 samples. Since our aim is to develop an SVM margin classifier, the data is labeled, and we are predicting a category, we can safely ignore regression, clustering and dimensionality reduction and move on to classification.

* Going deeper into classification, our dataset has fewer than 100,000 samples, so our starting point is Linear SVC.
  

# Linear SVC classifier

Stands for: Linear Support Vector Classification


Parameters:
* loss => Specifies the loss function. 'hinge' is the standard SVM loss.
* multi_class => Determines the multi-class strategy when y contains more than two classes. The value "ovr" trains n_classes one-vs-rest classifiers.
* random_state => The seed of the pseudo random number generator to use when shuffling the data.
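
As an illustration of the 'hinge' loss mentioned above, the sketch below computes it by hand for a binary problem with labels in {-1, +1}. The helper `hinge_loss` and the example numbers are hypothetical and only serve to show the formula max(0, 1 - y * f(x)).

```python
import numpy as np

def hinge_loss(y_true, decision_values):
    """Mean hinge loss for labels in {-1, +1} and real-valued decision scores."""
    y_true = np.asarray(y_true, dtype=float)
    decision_values = np.asarray(decision_values, dtype=float)
    # A sample contributes zero loss once it is on the correct side
    # of the margin, i.e. when y * f(x) >= 1
    return np.mean(np.maximum(0.0, 1.0 - y_true * decision_values))

labels = [1, 1, -1, -1]
scores = [2.0, 0.5, -3.0, 0.2]

# Per-sample losses are 0, 0.5, 0 and 1.2
print(hinge_loss(labels, scores))
```

The second sample is classified correctly but lies inside the margin, so it still contributes to the loss; the fourth sample is misclassified and contributes the most.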

In [98]:
from sklearn.svm import LinearSVC
LinearSvcClassifier = LinearSVC(loss='hinge', multi_class='ovr', random_state=0)
print("Linear SVC classifier was successfully developed")

Linear SVC classifier was successfully developed


# Optimizing our Hyperparameters

What are hyperparameters?
- Parameters that cannot be learned directly from the regular training process, because they tend
  to express "higher-level" properties of the model such as its complexity or how fast it should learn.
  Hyperparameters are usually fixed before the actual training process even begins.
 

Regularization Parameter 'C'

* When a high-degree polynomial is used to fit a set of points in a linear regression setup, we prevent overfitting by using regularization: a lambda parameter is included in the cost function and then affects how the theta parameters are updated by the gradient descent algorithm.
* C plays the analogous role for an SVM, with the regularization strength being inversely proportional to C. In other words, C expresses how much we care about violations of the margin: a small C tolerates more violations, while a large C penalizes them heavily.
* In our case the hyperparameter we want to optimize is the regularization parameter 'C'.
* To obtain the most suitable 'C' we will use GridSearchCV, a hyperparameter optimization algorithm that takes
  our model and a range of values for the regularization parameter as input and searches for the most suitable C.
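
The effect of C can be observed on the learned weights. The sketch below fits a linear SVC with a small and a large C and compares the size of the weight vectors; the two C values and the `max_iter` setting are arbitrary choices made for this illustration, not values used elsewhere in the tutorial.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

weight_norms = {}
for C in (0.01, 100.0):
    # max_iter is raised because large C values can need more iterations
    clf = LinearSVC(loss='hinge', C=C, random_state=0, max_iter=100000)
    clf.fit(X, y)
    # Norm of all one-vs-rest weight vectors taken together
    weight_norms[C] = np.linalg.norm(clf.coef_)
    print(C, weight_norms[C], clf.score(X, y))
```

A small C regularizes strongly and keeps the weights small, while a large C lets the weights grow in order to reduce margin violations on the training data.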
  

# How to calculate C?

* Unfortunately, no method has yet been developed that calculates this parameter directly and precisely.
* We will apply GridSearchCV: an exhaustive search over specified parameter values for an estimator, which is a way to select the best of a family of models parametrized by a grid of parameters. It is used together with cross-validation.
* Cross-validation: a method to robustly estimate the test-set performance (generalization) of a model.
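
A minimal sketch of 5-fold cross-validation for a single, fixed value of C is shown below. The value `C=1.0` and the use of a pipeline are assumptions made for this example; the pipeline refits the scaler inside every fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Scaling and classification are chained so that each fold is scaled
# using statistics from its own training part only
model = make_pipeline(
    StandardScaler(),
    LinearSVC(loss='hinge', C=1.0, random_state=0, max_iter=10000),
)

# One accuracy value per fold
fold_scores = cross_val_score(model, X, y, cv=5)
print(fold_scores, fold_scores.mean())
```

GridSearchCV repeats this procedure for every candidate value in the grid and keeps the value with the best mean score.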



In [99]:
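# Use the numpy logspace function: return numbers spaced evenly on a log scale.
# Argument #1: exponent of the first value (the first value is base ** start)
# Argument #2: exponent of the last value (the last value is base ** stop)
# base: base of the log scale; the number of values defaults to 50
# The exponents run from 9 down to 0. These are the values that 10^3 and 10^10
# evaluate to in Python, where ^ is bitwise XOR rather than exponentiation.
# The grid therefore holds 50 values of C from 3**9 = 19683 down to 3**0 = 1.
RegularizationParameter = {'C': np.logspace(9, 0, base=3)}
#print ("Calculated C :", RegularizationParameter)
print ("Range of C was successfully calculated")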

Range of C was successfully calculated
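
One detail is worth checking when writing bounds for `np.logspace`: in Python, `^` is bitwise XOR rather than exponentiation, so an expression such as `10^3` evaluates to 9 and `10^10` evaluates to 0. The short sketch below shows this and prints the endpoints of the resulting grid; the name `c_grid` is only used here.

```python
import numpy as np

# Bitwise XOR versus exponentiation
print(10 ^ 3, 10 ^ 10)
print(10 ** 3)

# Exponents from 9 down to 0 in base 3, with the default of 50 values
c_grid = np.logspace(9, 0, base=3)
print(len(c_grid), c_grid[0], c_grid[-1])
```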


In [100]:
from sklearn.model_selection import GridSearchCV
# Argument #1: our previously developed linear SVC classifier
# Argument #2: range of parameters 
# Argument #3: cross-validation generator or an iterable that determines the splitting strategy;
#              an integer specifies the number of folds in a (Stratified)KFold
# Note: this only configures the search; it runs when fit is called
GridSearch = GridSearchCV(LinearSvcClassifier, RegularizationParameter, cv=5)
print ("Grid search was successfully configured" )

Grid search was successfully configured


Finally, apply the grid search to our training set (the previously split 80%) in order to choose the best estimator.

In [105]:
GridSearch.fit(X_train , y_train)
#Obtaining the best classifier
ChosenClassifier = GridSearch.best_estimator_
print("Classifier was successfully chosen")

print ("The most suitable value for C is " ,ChosenClassifier.C)

Classifier was successfully chosen
The most suitable value for C is  347.85717768381505
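
Besides `best_estimator_`, a fitted grid search also exposes the mean cross-validated score of every candidate. The sketch below shows how these can be inspected; it is self-contained and uses a smaller, hypothetical grid of five C values so that it runs quickly, so its numbers are not the ones reported above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Hypothetical grid: 0.01, 0.1, 1, 10, 100
param_grid = {'C': np.logspace(-2, 2, num=5)}
search = GridSearchCV(
    LinearSVC(loss='hinge', random_state=0, max_iter=10000),
    param_grid,
    cv=5,
)
search.fit(X, y)

# Best candidate and its mean cross-validated accuracy
print(search.best_params_, search.best_score_)

# Mean cross-validated accuracy of every candidate
results = search.cv_results_
for C, mean_score in zip(results['param_C'], results['mean_test_score']):
    print(C, mean_score)
```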


# Concluding the results 

Accuracy of our model on the training set (80%):

In [106]:
print('Accuracy of model with the training set is :', ChosenClassifier.score(X_train, y_train))

Accuracy of model with the training set is : 0.9666666666666667


The developed model reaches 96.67% accuracy on the training set, which is relatively high.

Accuracy of our model on the test set (20%):

In [142]:
print('Accuracy of model with the test set is :', ChosenClassifier.score(X_test, y_test))
print ("When applying the model to foreign data it reacts with ", ChosenClassifier.score(X_test, y_test) *100,"% accuracy remarkably high accuracy when the model is generalized"  )

Accuracy of model with the test set is : 1.0
When applying the model to foreign data it reacts with  100.0 % accuracy remarkably high accuracy when the model is generalized
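
A single accuracy figure hides which classes are confused with each other, and with 150 instances a 20% test set contains only 30 flowers. A confusion matrix gives a more detailed view. The sketch below is self-contained and assumes a fixed `C=1.0` rather than the value found by the grid search, so it illustrates the technique and does not reproduce the result above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

clf = LinearSVC(loss='hinge', C=1.0, random_state=0, max_iter=10000)
clf.fit(X_tr, y_tr)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1, 2])
print(cm)
```

Entries on the diagonal count correctly classified test flowers; any off-diagonal entry shows which pair of classes was confused.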



# Now let's try it with kernel = linear

In other words, we are training an SVM with a linear kernel function.

In [143]:
from sklearn.svm import SVC
# Search over C only; the kernel is fixed to 'linear'
LinearKernel = {'kernel': ['linear'], 
                     'C': [1, 10, 50,100, 1000]
                    }
gridSearch2 = GridSearchCV(SVC(), LinearKernel, cv=5)
gridSearch2.fit(X_train ,y_train)
ChosenClassifier2 = gridSearch2.best_estimator_

print ("The chosen value of C is " ,ChosenClassifier2.C)
print ("Training accuracy :",ChosenClassifier2.score(X_train, y_train) )

print ("Testing accuracy :",ChosenClassifier2.score(X_test, y_test) )

The chosen value of C is  10
Training accuracy : 0.9666666666666667
Testing accuracy : 1.0
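
A fitted `SVC` keeps the training points that define the margin, the support vectors. The sketch below shows how to inspect them. It is self-contained, fits on the whole scaled dataset and fixes `C=10`, so the counts it prints are specific to this example.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

clf = SVC(kernel='linear', C=10).fit(X, y)

# Number of support vectors found for each of the three classes
print(clf.n_support_)

# The support vectors themselves: one row per vector, one column per feature
print(clf.support_vectors_.shape)
```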



# kernel = rbf

When training an SVM with the Radial Basis Function (RBF) kernel, two parameters must be considered:
* C: common to all SVM kernels, this parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly.
* Gamma: defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected.
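
The role of gamma can be seen directly in the RBF kernel formula, exp(-gamma * ||x - z||^2). The sketch below evaluates it by hand for one pair of points and compares it with scikit-learn's `rbf_kernel`; the helper `rbf`, the two points and the gamma value of 10.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    """RBF kernel value for two 1-D points."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])  # squared distance to a is 2

for gamma in (1e-5, 1e-1, 10.0):
    manual = rbf(a[0], b[0], gamma)
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(gamma, manual, library)
```

For the same pair of points the kernel value shrinks as gamma grows, which is why a large gamma limits each example's influence to its close neighbours.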

In [144]:
rbfKernel = {'kernel': ['rbf'], 'gamma': [1e-1, 1e-5],
                     'C': [1, 10, 50,100, 1000]
                    }
gridSearch3 = GridSearchCV(SVC(), rbfKernel, cv=5)
gridSearch3.fit(X_train ,y_train)
ChosenClassifier3 = gridSearch3.best_estimator_

print ("The chosen value of C is " ,ChosenClassifier3.C)
print ("Training accuracy :",ChosenClassifier3.score(X_train, y_train) )

print ("Testing accuracy :",ChosenClassifier3.score(X_test, y_test) )

The chosen value of C is  1
Training accuracy : 0.9666666666666667
Testing accuracy : 1.0


# kernel = poly

Training the SVM with a polynomial kernel function
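
The polynomial kernel has the form (gamma * <x, z> + coef0) ^ degree. The sketch below evaluates it by hand and compares it with scikit-learn's `polynomial_kernel`; the helper `poly`, the two points and the chosen `gamma` and `coef0` values are assumptions made for this illustration and are not the settings used in the grid search.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def poly(x, z, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel value for two 1-D points."""
    return (gamma * np.dot(x, z) + coef0) ** degree

a = np.array([[1.0, 2.0]])
b = np.array([[3.0, 0.5]])  # dot product with a is 4

# (0.5 * 4 + 1) ** 3
manual = poly(a[0], b[0], degree=3, gamma=0.5, coef0=1.0)
library = polynomial_kernel(a, b, degree=3, gamma=0.5, coef0=1.0)[0, 0]
print(manual, library)
```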

In [158]:
polynomialKernel = {'kernel': ['poly'], 'degree' : [3],
                     'C': [1, 10, 50,100, 1000]
                    }
gridSearch4 = GridSearchCV(SVC(), polynomialKernel, cv=5)
gridSearch4.fit(X_train ,y_train)
ChosenClassifier4 = gridSearch4.best_estimator_

print ("The chosen value of C is " ,ChosenClassifier4.C)
print ("Training accuracy :",ChosenClassifier4.score(X_train, y_train) )

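# Score the polynomial-kernel model, ChosenClassifier4. The test accuracy recorded below
# came from a run that scored ChosenClassifier (the LinearSVC model) on this line.
print ("Testing accuracy :",ChosenClassifier4.score(X_test, y_test) )

The chosen value of C is  10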
Training accuracy : 0.9666666666666667
Testing accuracy : 1.0
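
The three kernel searches can also be run as a single grid search, because GridSearchCV accepts a list of parameter grids. The sketch below is self-contained and reuses the C and gamma values from the cells above; the names `param_grid` and `search_all` are introduced here, and its output is not recorded in this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

C_values = [1, 10, 50, 100, 1000]
# One dictionary per kernel, each with the parameters that kernel uses
param_grid = [
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['rbf'], 'gamma': [1e-1, 1e-5], 'C': C_values},
    {'kernel': ['poly'], 'degree': [3], 'C': C_values},
]

search_all = GridSearchCV(SVC(), param_grid, cv=5).fit(X_tr, y_tr)

# Best kernel and parameters by mean cross-validated accuracy
print(search_all.best_params_)
print(search_all.score(X_te, y_te))
```

This compares all kernels on the same cross-validation folds, so the choice between them is made on the training data rather than on the test set.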
