# Machine Learning with Support Vector Machines and Parameter Tuning
In this short micro-project, we'll classify flowers from the famous Iris data set into their three species.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

## Data
Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Fisher in 1936 as an example of discriminant analysis.

The iris dataset contains measurements for 150 iris flowers from three different species.

The three classes in the Iris dataset:

    Iris-setosa (n=50)
    Iris-versicolor (n=50)
    Iris-virginica (n=50)

The four features of the Iris dataset:

    sepal length in cm
    sepal width in cm
    petal length in cm
    petal width in cm

The dataset is built into seaborn, so we can use the library to import the data.

In [None]:
iris = sns.load_dataset('iris')
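
The class and feature counts listed above can be checked directly. The sketch below is only an illustration: it assumes scikit-learn's bundled copy of the data (`load_iris`) as a stand-in for the seaborn version, because `sns.load_dataset` needs a network connection on first use. The names `bunch`, `frame`, and `class_counts` are made up for this example.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for sns.load_dataset('iris').
bunch = load_iris(as_frame=True)
frame = bunch.frame  # four feature columns plus an integer 'target' column

# Map the integer targets back to species names for readability.
id_to_name = {i: str(name) for i, name in enumerate(bunch.target_names)}
species = frame['target'].map(id_to_name)

# Count how many flowers belong to each species.
class_counts = species.value_counts().to_dict()

print(frame.shape)
print(class_counts)
```

With the seaborn DataFrame used in this notebook, the equivalent check would be `iris['species'].value_counts()`.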

## Exploratory Analysis
Let's check out the dataset.

In [None]:
iris.head()

In [None]:
sns.pairplot(iris,hue='species')

A quick look at the pairplot shows that the setosa species is the most clearly separable of the three.

## Model Building
We'll begin by splitting the data into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split

X = iris.drop('species',axis=1)

y = iris['species']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
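
The split above is random, so the exact numbers in the reports further down will change from run to run. A minimal sketch of a reproducible, class-balanced variant follows. It is not what this notebook ran: the seed value `42` is arbitrary, `stratify` is an optional extra, and scikit-learn's bundled `load_iris` is assumed as a stand-in for the seaborn DataFrame.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris arrays stand in for the X and y built from the DataFrame.
X_demo, y_demo = load_iris(return_X_y=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.3,     # same 70/30 split as in the notebook
    random_state=42,   # arbitrary seed so the split is repeatable
    stratify=y_demo,   # keep the three species equally represented in both parts
)

# Count the test samples per class to confirm the split is balanced.
test_class_counts = Counter(y_te.tolist())

print(len(X_tr), len(X_te))
print(test_class_counts)
```

Fixing the seed makes the confusion matrices comparable between runs, which is useful when judging whether tuning actually helped.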

Now it's time to train a Support Vector Machine Classifier.

In [None]:
from sklearn.svm import SVC

sv = SVC()

sv.fit(X_train,y_train)

## Predictions and Evaluations

In [None]:
preds = sv.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,preds))

In [None]:
print(classification_report(y_test,preds))

And it seems like our model did pretty well!

We can try to improve the results by tuning the classifier's parameters. scikit-learn's built-in `GridSearchCV` class automates this to an extent: it fits the model for every combination of values in a parameter grid and uses cross-validation to pick the best one. Let's try it.

## Parameter Tuning using GridSearchCV

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
# Define the initial parameter grid to search
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
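
To get a feel for how much work this grid implies, the combinations can be enumerated by hand. The sketch below uses only the standard library and is an illustration of the idea, not scikit-learn's internal code. The fold count of 5 is an assumption based on the default `cv` in recent scikit-learn releases.

```python
from itertools import product

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}

# Build one dict per combination of C and gamma (the Cartesian product of the lists).
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Assumption: 5-fold cross-validation, the default in recent scikit-learn releases.
n_folds = 5
n_fits = len(candidates) * n_folds  # cross-validation fits, excluding the final refit

print(len(candidates), n_fits)
print(candidates[0])
```

Each extra value in either list adds a whole row or column of candidates, so the cost grows multiplicatively with the size of the grid.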

In [None]:
grid = GridSearchCV(SVC(),param_grid,refit=True)
grid.fit(X_train,y_train)
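
Once the search has been fitted, the winning combination and its cross-validated score can be read from the fitted object. The sketch below is self-contained and therefore rebuilds the data and the search under assumptions: bundled `load_iris` data, an arbitrary seed of `0`, and the hypothetical name `search` in place of the notebook's `grid`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: bundled Iris arrays and an arbitrary seed, for a repeatable illustration.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

search = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]},
    refit=True,  # refit the best combination on the whole training set
)
search.fit(X_tr, y_tr)

print(search.best_params_)     # the C and gamma values that scored best
print(search.best_score_)      # mean cross-validated accuracy of that combination
print(search.best_estimator_)  # the refit SVC that predict() will use
```

In the notebook itself, the same attributes are available as `grid.best_params_`, `grid.best_score_`, and `grid.best_estimator_`.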

### New Predictions and Results

In [None]:
grid_predictions = grid.predict(X_test)

print(confusion_matrix(y_test,grid_predictions))

In [None]:
print(classification_report(y_test,grid_predictions))

A little better this time, with only one test point misclassified. A single remaining error is not necessarily a problem: in real-world applications, we don't want a model that fits the training set so closely that it fails to generalize to new data.
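
One rough way to check for overfitting is to compare accuracy on the training set with accuracy on the test set: a large gap suggests the model has memorized the training data. The sketch below shows that comparison under assumptions (bundled `load_iris` data, an arbitrary seed of `0`, hypothetical names `tuned`, `train_acc`, `test_acc`), so its numbers will not match the reports above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: bundled Iris arrays and an arbitrary seed, for a repeatable illustration.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

tuned = GridSearchCV(SVC(), {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]})
tuned.fit(X_tr, y_tr)

# With no custom scoring, score() reports the mean accuracy of the refit best SVC.
train_acc = tuned.score(X_tr, y_tr)
test_acc = tuned.score(X_te, y_te)
gap = train_acc - test_acc  # a large positive gap hints at overfitting

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {gap:.3f}")
```

On a data set this small, a single test point shifts the test accuracy by more than two percentage points, so small gaps should be read with caution.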

This concludes our project!