<h1>Iris Dataset</h1>

<h3>Loading the dataset</h3>

<p>Load the iris dataset along with its feature data and target labels.</p>

In [219]:
from sklearn.datasets import load_iris
iris = load_iris()

In [220]:
# Load the feature matrix into the variable X
X = iris.data

# Load the target labels into the variable y
y = iris.target
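As a quick sanity check on what was just loaded, this minimal sketch prints the shape of `X`, the feature and class names that ship with the dataset, and how many samples belong to each class label.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# rows are samples, columns are the numeric features
print("X shape:", X.shape)
# the feature and class names bundled with the dataset
print("features:", iris.feature_names)
print("classes:", list(iris.target_names))
# number of samples carrying each class label 0, 1, 2
print("samples per class:", np.bincount(y))
```

The dataset is balanced, so plain accuracy (the scoring used below) is a reasonable metric here.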

Next, we split the data into a training set (80%) and a test set (20%) for later use.

In [221]:
from sklearn.model_selection import train_test_split

# hold out 20% of the samples as a test set; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4, test_size=0.2)
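With `test_size=0.2`, 30 of the 150 samples end up in the test set. A plain random split does not guarantee that each class keeps its one-third share. As a variation on the cell above (not what it does), `train_test_split` also accepts `stratify=y`, which preserves the class proportions in both splits; the `_s` variable names below are only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# same split parameters as above, but stratified on the class labels
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, random_state=4, test_size=0.2, stratify=y
)

# training and test sizes
print(len(X_train_s), len(X_test_s))
# samples per class in the training and test splits
print(np.bincount(y_train_s), np.bincount(y_test_s))
```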

<h3>Training a LinearSVC Model</h3>

<h5>Choosing hyperparameters using grid search</h5>

<p>We are going to brute-force the hyperparameters of the LinearSVC model using the GridSearchCV class.<br>First, we choose the set of candidate hyperparameter values that the grid will search over.<br>Then, we pass these values along with the model to GridSearchCV, which evaluates every combination with cross-validation.</p>
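To make the "brute force" concrete, here is a minimal sketch of roughly what the search does for a dictionary grid: expand it into every combination with `ParameterGrid`, score each combination with cross-validated accuracy via `cross_val_score`, and keep the best mean. The reduced grid is an assumption chosen only to keep the example small; GridSearchCV additionally records train scores and refits the winner.

```python
import warnings
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# a reduced grid, for illustration only
param_grid = {'C': [1, 10, 100], 'tol': [1e-4, 1e-5]}

results = []
with warnings.catch_warnings():
    # large C values may hit max_iter; silence that warning for this demo
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    # ParameterGrid expands the dict into every combination (3 x 2 = 6 here)
    for params in ParameterGrid(param_grid):
        model = LinearSVC(random_state=1, **params)
        # 5-fold cross-validated accuracy for this single combination
        scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
        results.append((params, scores.mean()))

# keep the combination with the highest mean accuracy (the first one wins ties)
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```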

In [222]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
import numpy as np


# choose candidate hyperparameter values
param_grid = {
                'C':[1,10,100,300,500,700,1000],
                'tol':[1e-4,1e-5,1e-6]
             }

# instantiate the grid
# the cv param sets the number of folds and the scoring param sets the scoring strategy
grid = GridSearchCV(LinearSVC(random_state=1), param_grid, cv=5, scoring='accuracy', return_train_score=True)


# fit the grid with data
# note that we don't need to split the data ourselves, since GridSearchCV does that for us
grid.fit(X, y)

GridSearchCV(cv=5, error_score='raise',
       estimator=LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=1, tol=0.0001,
     verbose=0),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'C': [1, 10, 100, 300, 500, 700, 1000], 'tol': [0.0001, 1e-05, 1e-06]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='accuracy', verbose=0)
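The splitting GridSearchCV does for us is determined by `cv=5`. For a classifier, an integer `cv` is turned into a `StratifiedKFold` without shuffling. This sketch uses `check_cv` to resolve it the same way and prints the size and class balance of each validation fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, check_cv

X, y = load_iris(return_X_y=True)

# resolve an integer cv the way scikit-learn does for classifiers
cv = check_cv(5, y, classifier=True)
print(cv)

for i, (train_idx, val_idx) in enumerate(cv.split(X, y)):
    # size of each split and the class counts in the validation fold
    print(f"fold {i}: train={len(train_idx)}, validation={len(val_idx)}, "
          f"per-class={np.bincount(y[val_idx]).tolist()}")
```

Because the folds are stratified, each validation fold keeps the same class proportions as the full dataset.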

<p>Next, we print out the results</p>

In [223]:
# print the mean and standard deviation of the scores for the 21 hyperparameter combinations (7 values of C x 3 values of tol) over 5 folds

print("mean_test_score:",grid.cv_results_['mean_test_score'],'\n')
print("std_test_score:",grid.cv_results_['std_test_score'],'\n')
print("mean_train_score:",grid.cv_results_['mean_train_score'],'\n')
print("std_train_score:",grid.cv_results_['std_train_score'])

mean_test_score: [ 0.96666667  0.96666667  0.96666667  0.94666667  0.94        0.96        0.94
  0.92666667  0.96        0.93333333  0.94        0.88666667  0.89333333
  0.87333333  0.93333333  0.9         0.88666667  0.92666667  0.90666667
  0.87333333  0.88666667] 

std_test_score: [ 0.0421637   0.0421637   0.0421637   0.06182412  0.04898979  0.03887301
  0.05734884  0.08793937  0.03265986  0.03651484  0.05734884  0.07483315
  0.09285592  0.06798693  0.04714045  0.0843274   0.05811865  0.04898979
  0.09043107  0.06798693  0.10873004] 

mean_train_score: [ 0.96666667  0.96666667  0.96833333  0.955       0.94166667  0.95666667
  0.93333333  0.94833333  0.94666667  0.935       0.94666667  0.89333333
  0.91        0.91833333  0.93333333  0.90166667  0.92333333  0.92833333
  0.90666667  0.925       0.885     ] 

std_train_score: [ 0.01178511  0.01178511  0.01105542  0.01545603  0.00912871  0.01779513
  0.04013865  0.03265986  0.04459696  0.03045944  0.02718251  0.07644897
  0.05661763  0
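The raw arrays above are hard to match to their hyperparameters. A minimal sketch, assuming pandas is available: load `cv_results_` into a DataFrame so that each row pairs one combination of `C` and `tol` with its scores and rank.

```python
import warnings
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': [1, 10, 100, 300, 500, 700, 1000], 'tol': [1e-4, 1e-5, 1e-6]}

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    grid = GridSearchCV(LinearSVC(random_state=1), param_grid, cv=5,
                        scoring='accuracy', return_train_score=True)
    grid.fit(X, y)

# one row per hyperparameter combination (7 x 3 = 21 rows)
results = pd.DataFrame(grid.cv_results_)
cols = ['param_C', 'param_tol', 'mean_test_score', 'std_test_score',
        'mean_train_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())

# best_index_ points at the row ranked first on mean test score
print(results.loc[grid.best_index_, ['param_C', 'param_tol']].tolist())
```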

<p>Then we show the best hyperparameters for our model along with the best score.</p>

In [225]:
# print the best hyperparameters
print("best parameters : " ,grid.best_params_)
print("best score : ",grid.best_score_)

best parameters :  {'C': 1, 'tol': 0.0001}
best score :  0.966666666667


In [226]:
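# refit the best configuration on the training split and evaluate it on the held-out test split
# (note: the grid search above was fit on all of X, so the test samples also influenced the hyperparameter choice)
model = grid.best_estimator_
model.fit(X_train,y_train)
model.score(X_test,y_test)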

0.93333333333333335
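Because `grid.fit(X, y)` saw every sample, including the 30 in `X_test`, the test score above is not fully independent of the hyperparameter choice. A minimal sketch of a stricter variant, assuming the same split and grid: run the search on the training split only, then score once on the held-out split. For a classifier, the score here is mean accuracy, so it equals `accuracy_score` on the predictions.

```python
import warnings
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4, test_size=0.2)

param_grid = {'C': [1, 10, 100, 300, 500, 700, 1000], 'tol': [1e-4, 1e-5, 1e-6]}

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    # search on the training split only; refit=True (the default) retrains the winner on all of X_train
    grid = GridSearchCV(LinearSVC(random_state=1), param_grid, cv=5, scoring='accuracy')
    grid.fit(X_train, y_train)

# the test split is used exactly once, after model selection is finished
test_acc = grid.score(X_test, y_test)
print(grid.best_params_, test_acc)
print(accuracy_score(y_test, grid.predict(X_test)))
```

The selected hyperparameters may differ from the ones found on the full dataset, since the search now sees only 120 samples.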

<h3>Training an SVC Model</h3>

<p>We repeat the whole process; the differences are the hyperparameter grid given to GridSearchCV and the use of 10 folds instead of 5.</p>

In [238]:
from sklearn.svm import SVC

# choose candidate hyperparameter values: one sub-grid for the rbf kernel and one for the linear kernel
param_grid = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},
              {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# instantiate the grid
# the cv param sets the number of folds and the scoring param sets the scoring strategy
grid = GridSearchCV(SVC(random_state=1), param_grid, cv=10, scoring='accuracy', return_train_score=True)


# fit the grid with data
grid.fit(X, y)

GridSearchCV(cv=10, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=1, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid=[{'kernel': ['rbf'], 'gamma': [0.001, 0.0001], 'C': [1, 10, 100, 1000]}, {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='accuracy', verbose=0)
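Passing a list of dictionaries makes GridSearchCV search each dictionary as its own sub-grid, so `gamma` is only combined with the rbf kernel. This sketch expands the same grid with `ParameterGrid` to list the candidates; with `cv=10`, each candidate is fit 10 times.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},
              {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# each dict is expanded separately and the results are concatenated
candidates = list(ParameterGrid(param_grid))
# 2 gamma x 4 C rbf candidates plus 4 linear candidates
print(len(candidates))
for params in candidates:
    print(params)
```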

In [239]:
# print the best hyperparameters
print("best parameters : " ,grid.best_params_)
print("best score : ",grid.best_score_)

best parameters :  {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
best score :  0.98


In [240]:
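# refit the best configuration on the training split and evaluate it on the held-out test split
# (as before, the grid search was fit on all of X, so the test samples also influenced the hyperparameter choice)
model = grid.best_estimator_
model.fit(X_train,y_train)
model.score(X_test,y_test)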

0.96666666666666667