# Assignment 4 - Support Vector Machines

Just follow along with the notebook and instructions below. You will analyze the famous iris data set!

## The Data
Source: http://en.wikipedia.org/wiki/Iris_flower_data_set

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Fisher in 1936 as an example of discriminant analysis.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

The three classes in the Iris dataset:

    Iris-setosa (n=50)
    Iris-versicolor (n=50)
    Iris-virginica (n=50)

The four features of the Iris dataset:

    sepal length in cm
    sepal width in cm
    petal length in cm
    petal width in cm

## Get the data


In [1]:
import numpy as np
import pandas as pd

In [2]:
from sklearn import datasets

iris = datasets.load_iris()


In [3]:
X = pd.DataFrame(iris.data, columns=iris.feature_names)
X.shape

(150, 4)

In [4]:
y = pd.DataFrame(iris.target, columns=['type'])
y.shape

(150, 1)

## Train Test Split

**Split your data into a training set and a testing set.**

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
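
As a quick sanity check on the split, the sketch below reloads the data as plain NumPy arrays and prints the resulting shapes and per-class counts. The `stratify=y` variant is an optional addition that the notebook itself does not use; it is shown only to illustrate how class proportions could be kept equal across the two sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same arguments as the notebook cell above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

print(X_train.shape, X_test.shape)   # (100, 4) (50, 4)
print(np.bincount(y_train))          # samples per class in the training set

# Assumption: stratified variant, not part of the original notebook
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)
strat_counts = np.bincount(y_tr_s)
print(strat_counts)                  # near-equal counts per class
```

With `test_size=0.33` and 150 samples, scikit-learn rounds the test set up to 50 samples, which is why the reports below show a total support of 100 for the training set.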

## Train a Model

Now it's time to train a Support Vector Machine Classifier.

**Call the SVC() model from sklearn and fit the model to the training data.**

In [6]:
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [7]:
svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel='poly', degree=3, coef0=1, C=5))
])

In [8]:
svm_clf.fit(X_train,y_train.values.ravel())

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('svm_clf',
                 SVC(C=5, break_ties=False, cache_size=200, class_weight=None,
                     coef0=1, decision_function_shape='ovr', degree=3,
                     gamma='scale', kernel='poly', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)
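
To make clear what the pipeline does, here is a minimal sketch that fits the same two steps by hand and checks that the predictions match. The variable names (`pipe`, `scaler`, `svc`, `manual_pred`) are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

# Pipeline version, same hyperparameters as the notebook
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel='poly', degree=3, coef0=1, C=5)),
])
pipe.fit(X_train, y_train)

# Manual version: the scaler is fitted on the training data only,
# then the same transform is applied to any data passed to predict
scaler = StandardScaler().fit(X_train)
svc = SVC(kernel='poly', degree=3, coef0=1, C=5)
svc.fit(scaler.transform(X_train), y_train)
manual_pred = svc.predict(scaler.transform(X_test))

same = np.array_equal(pipe.predict(X_test), manual_pred)
print(same)
```

Wrapping both steps in a `Pipeline` means the scaling statistics always come from the data passed to `fit`, which avoids accidentally scaling with information from the test set.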

## Model Evaluation

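**Now get predictions from the model and create a confusion matrix and a classification report.**

Note that the cells below predict on the training set (`X_train`), so the scores measure how well the model fits the training data rather than how well it generalizes.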

In [9]:
y_pred = svm_clf.predict(X_train)

In [10]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_train, y_pred)

array([[31,  0,  0],
       [ 0, 34,  1],
       [ 0,  0, 34]], dtype=int64)

In [11]:
from sklearn.metrics import classification_report
print(classification_report(y_train, y_pred, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        31
  versicolor       1.00      0.97      0.99        35
   virginica       0.97      1.00      0.99        34

    accuracy                           0.99       100
   macro avg       0.99      0.99      0.99       100
weighted avg       0.99      0.99      0.99       100



Wow! You should have noticed that your model was pretty good! Let's see if we can tune the parameters to get even better results. This is unlikely, and in real life you would probably be satisfied with these results because the data set is quite small, but I just want you to practice using GridSearch.
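
The scores reported so far come from the training split. A minimal sketch of the same evaluation on the held-out split follows, assuming the same split and pipeline settings as above; `y_test_pred` and `cm` are illustrative names, and no output is shown because it was not part of the original run.

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel='poly', degree=3, coef0=1, C=5)),
])
svm_clf.fit(X_train, y_train)

# Predict on data the model has not seen during fitting
y_test_pred = svm_clf.predict(X_test)

cm = confusion_matrix(y_test, y_test_pred)
print(cm)
print(classification_report(y_test, y_test_pred,
                            target_names=iris.target_names))
```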

## Gridsearch Practice

**Import GridSearchCV from scikit-learn.**

In [12]:
from sklearn.model_selection import GridSearchCV

param_grid = [
    {'C': [1, 10], 'kernel': ('linear', 'rbf', 'poly'), 'degree': [3, 15], 'coef0': [1, 50]}
  ]

In [13]:
grid_search = GridSearchCV(SVC(), param_grid, cv=2,
                           return_train_score=True)
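
To see how much work this grid implies, the candidates can be counted with `ParameterGrid`, which expands a grid the same way `GridSearchCV` does. This is a small sketch; `n_candidates` and `n_fits` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'C': [1, 10], 'kernel': ('linear', 'rbf', 'poly'),
     'degree': [3, 15], 'coef0': [1, 50]}
]

# 2 (C) x 3 (kernel) x 2 (degree) x 2 (coef0) combinations
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)   # 24

# Each candidate is fitted once per fold (cv=2 in the cell above),
# not counting the final refit on the whole training set
n_fits = n_candidates * 2
print(n_fits)         # 48
```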


**Create a GridSearchCV object and fit it to the training data.**

In [14]:
grid_search.fit(X_train, y_train.values.ravel())

GridSearchCV(cv=2, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [1, 10], 'coef0': [1, 50], 'degree': [3, 15],
                          'kernel': ('linear', 'rbf', 'poly')}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring=None, verbose=0)

In [15]:
grid_search.best_estimator_

SVC(C=1, break_ties=False, cache_size=200, class_weight=None, coef0=1,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [16]:
grid_search.best_params_

{'C': 1, 'coef0': 1, 'degree': 3, 'kernel': 'linear'}
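
The reported `coef0` and `degree` values have no effect here: `SVC` uses `degree` only with the `poly` kernel and `coef0` only with `poly` and `sigmoid`. The sketch below checks this for the linear kernel and shows one possible way, not used in the notebook, to write the grid as a list of kernel-specific dictionaries so that redundant combinations are skipped. The names `kernel_grid`, `a` and `b` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

# Two linear SVCs that differ only in degree and coef0
a = SVC(kernel='linear', C=1, degree=3, coef0=1).fit(X_train, y_train)
b = SVC(kernel='linear', C=1, degree=15, coef0=50).fit(X_train, y_train)
identical = np.array_equal(a.predict(X_test), b.predict(X_test))
print(identical)

# Assumption: an alternative grid that only varies the
# parameters each kernel actually uses
kernel_grid = [
    {'kernel': ['linear'], 'C': [1, 10]},
    {'kernel': ['rbf'], 'C': [1, 10]},
    {'kernel': ['poly'], 'C': [1, 10], 'degree': [3, 15], 'coef0': [1, 50]},
]
n_kernel_candidates = len(ParameterGrid(kernel_grid))
print(n_kernel_candidates)   # 12 instead of 24
```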

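**Now take the best grid model and create some predictions using the test set and create classification reports and confusion matrices for them. Were you able to improve?**

Note that the cells below predict on `X_train` rather than `X_test`, so the scores are training-set scores and are directly comparable with the report for the first model above.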

In [17]:
y_pred_gs = grid_search.best_estimator_.predict(X_train)

In [18]:
# from sklearn.metrics import confusion_matrix
confusion_matrix(y_train, y_pred_gs)

array([[31,  0,  0],
       [ 0, 33,  2],
       [ 0,  1, 33]], dtype=int64)

In [19]:
# from sklearn.metrics import classification_report
print(classification_report(y_train, y_pred_gs, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        31
  versicolor       0.97      0.94      0.96        35
   virginica       0.94      0.97      0.96        34

    accuracy                           0.97       100
   macro avg       0.97      0.97      0.97       100
weighted avg       0.97      0.97      0.97       100



## Decision Tree Classifier

**Build a decision tree classifier. Repeat the same steps. Which model (SVM or decision tree) is better?**

In [20]:
from sklearn.tree import DecisionTreeClassifier

In [21]:
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_train, y_train)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=2, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=42, splitter='best')

In [22]:
y_pred_dt = tree_clf.predict(X_train)

In [23]:
confusion_matrix(y_train, y_pred_dt)

array([[31,  0,  0],
       [ 0, 34,  1],
       [ 0,  4, 30]], dtype=int64)

In [24]:
print(classification_report(y_train, y_pred_dt, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        31
  versicolor       0.89      0.97      0.93        35
   virginica       0.97      0.88      0.92        34

    accuracy                           0.95       100
   macro avg       0.95      0.95      0.95       100
weighted avg       0.95      0.95      0.95       100



In [25]:
param_grid = [
    # try 12 (3×4) combinations of hyperparameters
    {'max_features': [1,2,3,4],'max_depth': [2,3,4],}
  ]

# train across 3 folds, that's a total of 12×3=36 rounds of training (plus the final refit)
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=3)


In [26]:
grid_search.fit(X_train, y_train.values.ravel())

GridSearchCV(cv=3, error_score=nan,
             estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None,
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort='deprecated',
                                              random_state=42,
                                              splitter='best'),
             iid='deprecated', n_jobs=None,
             param_grid=[{'max_depth': [2, 3, 4],
                          'max

In [27]:
grid_search.best_estimator_

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=3, max_features=3, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=42, splitter='best')

In [28]:
grid_search.best_params_

{'max_depth': 3, 'max_features': 3}
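
Beyond the single best combination, the fitted search object keeps the score of every candidate in `cv_results_`. A minimal sketch of inspecting it as a DataFrame, assuming the same split and grid as above (`results` and `cols` are illustrative names):

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

param_grid = [{'max_features': [1, 2, 3, 4], 'max_depth': [2, 3, 4]}]
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                           param_grid, cv=3)
grid_search.fit(X_train, y_train)

# One row per candidate, with the mean validation score across folds
results = pd.DataFrame(grid_search.cv_results_)
cols = ['param_max_depth', 'param_max_features',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))

# Candidates times folds gives the number of cross-validation fits
n_fits = len(results) * grid_search.n_splits_
print(n_fits)   # 36
```

Looking at the full table shows how close the runners-up are to the best candidate, which matters on a data set this small.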

In [29]:
y_pred_gs_dt = grid_search.best_estimator_.predict(X_train)

In [30]:
confusion_matrix(y_train, y_pred_gs_dt)

array([[31,  0,  0],
       [ 0, 34,  1],
       [ 0,  2, 32]], dtype=int64)

In [31]:
print(classification_report(y_train, y_pred_gs_dt, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        31
  versicolor       0.94      0.97      0.96        35
   virginica       0.97      0.94      0.96        34

    accuracy                           0.97       100
   macro avg       0.97      0.97      0.97       100
weighted avg       0.97      0.97      0.97       100



In [32]:
print(classification_report(y_train, y_pred, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        31
  versicolor       1.00      0.97      0.99        35
   virginica       0.97      1.00      0.99        34

    accuracy                           0.99       100
   macro avg       0.99      0.99      0.99       100
weighted avg       0.99      0.99      0.99       100



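The SVM performs better: on the training set, the polynomial-kernel SVM pipeline reaches 0.99 accuracy, compared with 0.97 for the tuned decision tree.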
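
Because the comparison above rests on training-set scores, a cross-validated comparison could serve as a cross-check. The sketch below is an assumption on my part and not part of the original notebook: it scores both models with 5-fold cross-validation on the full data set, using the SVM settings from the pipeline and the tree parameters found by the grid search. No output is shown since these numbers were not produced in the original run.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel='poly', degree=3, coef0=1, C=5)),
])
tree_clf = DecisionTreeClassifier(max_depth=3, max_features=3,
                                  random_state=42)

# Each model is fitted and scored on 5 different train/validation splits
svm_scores = cross_val_score(svm_clf, X, y, cv=5)
tree_scores = cross_val_score(tree_clf, X, y, cv=5)

print('SVM  mean accuracy:', svm_scores.mean())
print('Tree mean accuracy:', tree_scores.mean())
```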