**<center><h1>Iris Data Set</h1></center>**

We will use the scikit-learn library to build the decision tree model. We will be using the iris dataset to build a decision tree classifier. The data set contains information of 3 classes of the iris plant with the following attributes: 
    - sepal length 
    - sepal width 
    - petal length 
    - petal width 
- class: 

        Iris Setosa 
        Iris Versicolour 
        Iris Virginica

The task is to predict the class of the iris plant based on the attributes.

In [3]:
# Importing the required Package
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

from sklearn import datasets

from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

from sklearn.metrics import accuracy_score,confusion_matrix
from sklearn.metrics import classification_report

#Loading the iris data
data = datasets.load_iris()
print('Classes to predict: ', data.target_names)

Classes to predict:  ['setosa' 'versicolor' 'virginica']


There are three classes of iris plants: 'setosa', 'versicolor' and 'virginica'. Now, we have imported the iris data in the variable 'data'. We will now extract the attribute data and the corresponding labels. We can extract the attributes and labels by calling .data and .target as shown below:

In [4]:
#Extracting data attributes
X = data.data
### Extracting target/ class labels
y = data.target

print('Number of examples in the data:', X.shape[0])

Number of examples in the data: 150


There are 150 examples/ samples in the data. The variable 'X' contains the attributes to the iris plant. The cell below shows the 4 attributes of the first four iris plants.

Now that we have extracted the data attributes and corresponding labels, we will split them to form train and test datasets. For this purpose, we will use the scikit-learn's 'train_test_split' function, which takes in the attributes and labels as inputs and produces the train and test sets.

In [5]:
#Using the train_test_split to create train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 47, test_size = 0.25)

<h1><center>Parameter Tunning</center></h1>

<h3>Grid Search</h3>

<h3>Parameters:</h3>

- n_neighbors: int, default=5:
    - Number of neighbors to use by default for kneighbors queries.

- leaf_size: int, default=30:
    - Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

- p : int, default=2
    - Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

In [6]:

#List Hyperparameters that we want to tune.
leaf_size = list(range(1,50))
n_neighbors = list(range(1,30))
p=[1,2]

#Convert to dictionary
hyperparameters = dict(leaf_size=leaf_size, n_neighbors=n_neighbors, p=p)

#Create new KNN object
GridSearchKNNClassifier = KNeighborsClassifier()

#Use GridSearch
clf = GridSearchCV(GridSearchKNNClassifier, hyperparameters, cv=10)
#Fit the model
best_model = clf.fit(X_train,y_train)

#Print The value of best Hyperparameters
print('Best leaf_size:', 
      best_model.best_estimator_.get_params()['leaf_size'])
print('Best p:', 
      best_model.best_estimator_.get_params()['p'])
print('Best n_neighbors:', 
      best_model.best_estimator_.get_params()['n_neighbors'])

Best leaf_size: 1
Best p: 2
Best n_neighbors: 10


In [12]:
# Train the model from best parameter obtained from grid
# search optimization technique.
model = KNeighborsClassifier(n_neighbors=10,leaf_size = 1, p = 2)

model.fit(X_train,y_train)

y_predict=model.predict(X_test)
print("Accuracy of KNN model for classifying iris species",
      accuracy_score(y_test,y_predict))

Accuracy of KNN model for classifying iris species 0.9736842105263158


In [13]:
print(classification_report(y_test,y_predict))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.89      1.00      0.94         8
           2       1.00      0.93      0.97        15

    accuracy                           0.97        38
   macro avg       0.96      0.98      0.97        38
weighted avg       0.98      0.97      0.97        38



<h3>Random Search</h3>

In [14]:
# Applying Random Search
random_search_knn = RandomizedSearchCV(KNeighborsClassifier(), 
                                      hyperparameters, cv=10)

random_search_knn.fit(X_train,y_train)
            
    
#Print The value of best Hyperparameters
print('Best leaf_size:', 
      best_model.best_estimator_.get_params()['leaf_size'])
print('Best p:', 
      best_model.best_estimator_.get_params()['p'])
print('Best n_neighbors:', 
      best_model.best_estimator_.get_params()['n_neighbors'])

Best leaf_size: 1
Best p: 2
Best n_neighbors: 10


In [15]:
# Train the model from best parameter obtained from grid search 
# optimization technique.
model=KNeighborsClassifier(n_neighbors=10,leaf_size = 1, p = 2)

model.fit(X_train,y_train)

y_predict=model.predict(X_test)
print("Accuracy of KNN model for classifying iris species",
      accuracy_score(y_test,y_predict))

Accuracy of KNN model for classifying iris species 0.9736842105263158


In [16]:
print(classification_report(y_test,y_predict))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.89      1.00      0.94         8
           2       1.00      0.93      0.97        15

    accuracy                           0.97        38
   macro avg       0.96      0.98      0.97        38
weighted avg       0.98      0.97      0.97        38

