### Load Data

We load the Iris dataset from scikit-learn. The feature matrix has four columns: sepal length, sepal width, petal length (third column), and petal width (fourth column), all in centimeters. The classes are encoded as integer labels, where 0 = Iris-Setosa, 1 = Iris-Versicolor, and 2 = Iris-Virginica.

In [1]:
# Ignore warnings to keep the output clean
import warnings
warnings.filterwarnings('ignore')

# Import libraries
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn import metrics

# Command to display plots inline in Jupyter notebooks
%matplotlib inline

# Load the Iris dataset from scikit-learn
iris = datasets.load_iris()

# Extract features (X) and target variable (y) from the dataset
X = iris.data     # feature matrix (predictors / input columns)
y = iris.target   # class labels, used as the target

# Print the unique class labels in the target variable y
print('Class labels:', np.unique(y))

Class labels: [0 1 2]


In [2]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [3]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
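
For a quick look at how the features and labels line up, the arrays can be wrapped in a pandas DataFrame. This is a minimal sketch; the names df and species are illustrative and do not appear in the original notebook.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# One row per flower, one column per measurement
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer labels (0, 1, 2) to their species names
df["species"] = iris.target_names[iris.target]

print(df.head())
print(df["species"].value_counts())
```

Indexing iris.target_names with iris.target is plain NumPy fancy indexing, so no extra lookup table is needed.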

Standardize the data: the features may be measured on different scales, and KNN relies on distances between samples, so we rescale each feature to zero mean and unit variance before building the model.

In [4]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

sc.fit(X)
X = sc.transform(X)
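
As a sanity check on what StandardScaler computes, the sketch below reproduces the transformation by hand: each column has its mean subtracted and is then divided by its population standard deviation. The names X_raw, X_scaled, and X_manual are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X_raw = datasets.load_iris().data

# Scaled by scikit-learn
X_scaled = StandardScaler().fit_transform(X_raw)

# Scaled by hand: z = (x - column mean) / column standard deviation (ddof=0)
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print("Matches StandardScaler:", np.allclose(X_scaled, X_manual))
print("Column means:", X_scaled.mean(axis=0).round(6))
print("Column std devs:", X_scaled.std(axis=0).round(6))
```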

Split the data into training and test sets. Because the split is random, it's advised to set a seed (here random_state=0) so that the results are reproducible.

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
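
To see the effect of the seed, the sketch below splits the data twice with the same random_state and checks that the two splits are identical. It also illustrates an alternative ordering, which is an assumption and not what the cells above do: splitting the raw data first and fitting the scaler on the training rows only, so that no test-set statistics feed into the preprocessing. All names with the _raw, _a, _b, and _std suffixes are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_raw, y = iris.data, iris.target

# Same seed twice: the two splits should match exactly
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_raw, y, test_size=0.3, random_state=0)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_raw, y, test_size=0.3, random_state=0)
same_split = bool(np.array_equal(X_test_a, X_test_b)
                  and np.array_equal(y_test_a, y_test_b))
print("Identical splits with the same seed:", same_split)

# Alternative ordering (assumption): fit the scaler on the training rows only,
# then apply the same transformation to the test rows
sc = StandardScaler().fit(X_train_a)
X_train_std = sc.transform(X_train_a)
X_test_std = sc.transform(X_test_a)

print("Train shape:", X_train_std.shape, "Test shape:", X_test_std.shape)
```

With this ordering the training columns have exactly zero mean, while the test columns generally do not, because they are scaled with the training statistics.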

Build the KNN classifier and generate evaluation metrics on both the training data and the test data.
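
With p=2, the Minkowski metric used in the cell below is the Euclidean distance, so a prediction amounts to a majority vote among the five closest training points. The following sketch spells that out with NumPy and compares it with scikit-learn on the same split. The helper knn_predict is hypothetical, written for illustration, and is not how scikit-learn implements the neighbor search; tie handling in particular may differ.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def knn_predict(X_train, y_train, x, k=5):
    """Hypothetical helper: classify one sample by majority vote of its k nearest neighbors."""
    # Euclidean distance (Minkowski with p=2) from x to every training point
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those neighbors (the lowest label wins a tie)
    return np.bincount(y_train[nearest]).argmax()

manual_pred = np.array([knn_predict(X_train, y_train, x) for x in X_test])

clf = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
clf.fit(X_train, y_train)
sk_pred = clf.predict(X_test)

print("Agreement with scikit-learn:", np.mean(manual_pred == sk_pred))
```

The brute-force distance computation is fine for 105 training points; for larger datasets scikit-learn can switch to tree-based neighbor searches.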

In [6]:
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')
clf.fit(X_train, y_train)

# generate evaluation metrics (predict once per split and reuse the result)
y_train_pred = clf.predict(X_train)
y_test_pred = clf.predict(X_test)

print("Train - Accuracy :", metrics.accuracy_score(y_train, y_train_pred))
print("Train - Confusion matrix :", metrics.confusion_matrix(y_train, y_train_pred))
print("Train - classification report :", metrics.classification_report(y_train, y_train_pred))

print("Test - Accuracy :", metrics.accuracy_score(y_test, y_test_pred))
print("Test - Confusion matrix :", metrics.confusion_matrix(y_test, y_test_pred))
print("Test - classification report :", metrics.classification_report(y_test, y_test_pred))


Train - Accuracy : 0.9714285714285714
Train - Confusion matrix : [[34  0  0]
 [ 0 31  1]
 [ 0  2 37]]
Train - classification report :               precision    recall  f1-score   support

           0       1.00      1.00      1.00        34
           1       0.94      0.97      0.95        32
           2       0.97      0.95      0.96        39

    accuracy                           0.97       105
   macro avg       0.97      0.97      0.97       105
weighted avg       0.97      0.97      0.97       105

Test - Accuracy : 0.9777777777777777
Test - Confusion matrix : [[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Test - classification report :               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.94      0.97        18
           2       0.92      1.00      0.96        11

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98     