# **K-Nearest Neighbour Algorithm for Classification**

In this lab, we use scikit-learn's K-Nearest Neighbour (KNN) classifier to assign the samples of a dataset to classes. KNN is a popular machine learning algorithm used for both classification and regression. It is non-parametric: it makes no assumptions about the underlying distribution of the data. Instead, it measures the distance between data points to judge their similarity, and it assigns a new point the class that is most common among its k nearest training points.
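To make the idea concrete, here is a minimal from-scratch sketch of KNN classification, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy data are made up for illustration; the rest of the lab uses scikit-learn's `KNeighborsClassifier` instead.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k training points with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))  # query near the second cluster
```

Note that there is no training step beyond storing the data: all the work happens at prediction time, which is why KNN is often called a lazy learner.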

***About the Dataset***

We use the **Iris dataset** in this lab. It contains measurements of three species of iris flowers: Iris Setosa, Iris Versicolour, and Iris Virginica. The dataset has four features, all measured in centimetres: sepal length, sepal width, petal length, and petal width. Each of the 150 data points represents one iris flower and is labeled with its species.

***Import the necessary libraries***

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

***Load the iris dataset***

In [None]:
iris = load_iris()

***Preview the data***

In [None]:
data = pd.DataFrame(iris["data"], columns=iris["feature_names"])
data

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


In [None]:
data.shape

(150, 4)

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB
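The preview above shows only the four features. As a small sketch, the species labels can be attached to the same table to check how the classes are distributed. The names `labelled` and `species` are chosen here for illustration and are not used elsewhere in the lab.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Feature table, built the same way as in the preview cell
labelled = pd.DataFrame(iris["data"], columns=iris["feature_names"])
# Map the integer targets (0, 1, 2) to their species names
labelled["species"] = iris["target_names"][iris["target"]]

# Number of flowers per species
print(labelled["species"].value_counts())
```

Each species accounts for 50 of the 150 rows, so the dataset is balanced.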


***Split data into training and testing set***

In [None]:
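# Hold out 20% of the samples for testing. No random_state is set,
# so the split (and every score below) changes from run to run.
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)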
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(120, 4)
(120,)
(30, 4)
(30,)
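The split above is random, so rerunning the notebook gives a different test set each time. A possible variant, sketched below, fixes the seed and stratifies by class. The seed value `42` and the shortened variable names are arbitrary choices for this example, and the outputs shown elsewhere in this lab were not produced with this split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle; stratify keeps class proportions equal
# in the training and test sets (both are assumptions added for this sketch)
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

print(X_tr.shape, X_te.shape)
# Number of test samples per class
print(np.bincount(y_te))
```

With stratification, the 30 test samples contain 10 flowers of each species, so no class is under-represented when the model is evaluated.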


***Train K-Nearest Neighbour model***

In [None]:
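# Classify each point by majority vote among its 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3)
# Fitting a KNN model stores the training data for later distance lookups
knn.fit(X_train, y_train)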
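The value `n_neighbors=3` is fixed by hand here. One common way to choose it, sketched below, is to compare several candidates with cross-validation on the training data only. The candidate list, the 5-fold setting, and the seed are assumptions made for this example, not part of the original lab.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# random_state=0 is an arbitrary seed chosen for this sketch
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

candidate_ks = (1, 3, 5, 7, 9)  # hypothetical candidates
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation on the training portion; the test set stays untouched
    mean_scores[k] = cross_val_score(model, X_tr, y_tr, cv=5).mean()
    print(f"k={k}: mean CV accuracy = {mean_scores[k]:.3f}")

# Candidate with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("Best k:", best_k)
```

Keeping the test set out of this search means the final accuracy is still measured on data that played no part in choosing k.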

***Make predictions on the test data***

In [None]:
predictions = knn.predict(X_test)

***Evaluate Model Performance***

In [None]:
# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, predictions)

# Print the accuracy of the classifier
print("Accuracy:", accuracy)

Accuracy: 0.9666666666666667
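Accuracy is simply the fraction of test samples whose predicted label equals the actual label; the value above corresponds to 29 correct predictions out of 30. The sketch below reproduces the calculation by hand on small made-up label arrays (`y_true` and `y_pred` are illustrative, not the lab's data) and compares it with `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels, used only to illustrate the calculation
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 1, 2, 1, 0])

# Element-wise comparison gives True/False; the mean of that is the accuracy
manual_accuracy = np.mean(y_true == y_pred)

print(manual_accuracy)
print(accuracy_score(y_true, y_pred))
```

Both lines print the same value, 5 correct out of 6.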


In [None]:
# Print the confusion matrix of the classifier
print("Confusion Matrix:")
print(confusion_matrix(y_test, predictions))


Confusion Matrix:
[[ 7  0  0]
 [ 0 14  0]
 [ 0  1  8]]


In [None]:
# Per-class precision, recall, F1-score and support
report = classification_report(y_test, predictions)
print(report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.93      1.00      0.97        14
           2       1.00      0.89      0.94         9

    accuracy                           0.97        30
   macro avg       0.98      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
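The per-class numbers in the report follow directly from the confusion matrix printed earlier. As a sketch, the code below copies that matrix by hand and recomputes precision, recall, and F1, assuming scikit-learn's convention that rows are actual labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = actual class, columns = predicted class)
cm = np.array([[7, 0, 0],
               [0, 14, 0],
               [0, 1, 8]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums = times each class was predicted
recall = true_positives / cm.sum(axis=1)     # row sums = actual count per class (support)
f1 = 2 * precision * recall / (precision + recall)

for label in range(3):
    print(f"class {label}: precision={precision[label]:.2f}, "
          f"recall={recall[label]:.2f}, f1={f1[label]:.2f}")
```

The single class-2 flower predicted as class 1 lowers the precision of class 1 to 14/15 and the recall of class 2 to 8/9, which matches the 0.93 and 0.89 in the report.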



In [None]:
# Print each prediction next to its actual label, marked as correct or wrong
for actual, predicted in zip(y_test, predictions):
    status = "Correct" if actual == predicted else "Wrong"
    print(f"{status} prediction: Actual label = {actual}, Predicted label = {predicted}")


Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 1, Predicted label = 1
Wrong prediction: Actual label = 2, Predicted label = 1
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 2, Predicted label = 2
Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 1, Predicted label = 1
Correct prediction: Actual label = 2, Predicted label = 2
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 0, Predicted label = 0
Correct prediction: Actual label = 2, Predicted label = 2
Correct predicti