# kNN Implementation and Practice

This notebook is practice for implementing a k-Nearest-Neighbors (kNN) classifier. I use the Iris dataset that ships with `scikit-learn`, do the EDA in `pandas`, and visualize the data with `plotly.express`.

In [1]:
# importing the libraries

import pandas as pd
import plotly.express as px
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris

In [2]:
# loading the Iris dataset (returned as a Bunch object, not yet a dataframe)

iris = load_iris()

In [18]:
# converting the data to a dataframe

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

In [19]:
df.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
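
The integer codes in `target` map to species names through `iris.target_names`. Below is a small sketch of attaching those names for readability, assuming a separate copy called `labeled` and a new `species` column (both names are mine, not part of the notebook above):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Rebuild the same frame as above so this sketch runs on its own.
labeled = pd.DataFrame(data=iris.data, columns=iris.feature_names)
labeled["target"] = iris.target

# Hypothetical extra column: translate the codes 0/1/2 into species names.
labeled["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Count rows per species to confirm the classes are balanced.
species_counts = labeled["species"].value_counts().to_dict()
print(species_counts)
```

If a column like this were added to `df` itself, it would have to be dropped together with `target` before building the feature matrix.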


In [5]:
# Making a correlation heatmap

corr_matrix = df.corr()

fig = px.imshow(corr_matrix,
                labels = dict(color="correlation"),
                x = list(corr_matrix.columns),
                y = list(corr_matrix.index),
                color_continuous_scale=px.colors.sequential.Magenta,
                zmin = -1, zmax = 1,
                text_auto = True,
                aspect='auto',
                origin = "lower")
fig.update_layout(title="Iris Correlation")
fig.show()

### kNN Implementation

I will now split the data and then create the model.

In [20]:
# feature/ label selection

X = df.drop(columns='target') # Features
y = df['target'] # Label

In [21]:
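# Ensuring that the features are a DataFrame and the label is a Series

if isinstance(X, pd.Series):
    X = X.to_frame()  # a single feature column becomes a one-column DataFrame

if isinstance(y, pd.DataFrame):
    y = y.squeeze(axis=1)  # a one-column DataFrame becomes a Series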

In [22]:
# Splitting the data (80% train, 20% test); no random_state is set, so the split changes on every run

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
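
Because the split above is unseeded, the test set (and therefore every score further down) changes between runs. Here is a minimal sketch of a reproducible, stratified variant, assuming `random_state=42` (an arbitrary seed of my choosing) and stratification on the label. The `_s` suffixes are only there to keep these names apart from the split actually used in this notebook:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_all, y_all = iris.data, iris.target

# random_state fixes the shuffle; stratify keeps the class proportions
# of y_all in both the training and the test portion.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# Number of test samples per class (index = class label).
test_class_counts = np.bincount(y_test_s)
print(X_train_s.shape, X_test_s.shape, test_class_counts)
```

With stratification, each of the three classes contributes the same number of test samples, so the per-class support in the report no longer depends on the luck of the shuffle.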

In [23]:
# Creating the model

kNN = KNeighborsClassifier(n_neighbors = 3)
kNN.fit(X_train, y_train)
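
To make explicit what the classifier does at prediction time, here is a from-scratch sketch of the idea: measure the Euclidean distance from a query point to every training point, take the `k` closest, and let them vote. This is an illustration, not scikit-learn's implementation. The function name `knn_predict` and the toy data are my own, and ties are broken differently than in `KNeighborsClassifier`.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query row by majority vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    predictions = []
    for query in X_query:
        # Euclidean distance from this query point to every training point.
        distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
        # Indices of the k smallest distances.
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote among the labels of those neighbors.
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Toy data: two well-separated clusters with labels 0 and 1.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_pred = knn_predict(X_toy, y_toy, [[0.2, 0.1], [5.5, 5.5]], k=3)
print(toy_pred)
```

The loop is written for clarity rather than speed. A library implementation can use tree-based or vectorized neighbor searches instead of scanning every training point.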

In [24]:
y_pred = kNN.predict(X_test)
y_pred

array([1, 0, 0, 1, 2, 0, 2, 1, 0, 2, 2, 2, 0, 2, 0, 1, 0, 0, 2, 0, 2, 2,
       2, 2, 1, 1, 2, 2, 2, 1])

In [25]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         9
           1       1.00      0.88      0.93         8
           2       0.93      1.00      0.96        13

    accuracy                           0.97        30
   macro avg       0.98      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30



### Interpretation of Classification Report

The classification report provides detailed metrics for evaluating the performance of a classification model. Here’s a breakdown of the results for each class (0, 1, and 2) as well as the overall performance metrics.

#### Class-wise Performance

Class 0:

- Precision: 1.00. Every instance predicted as class 0 really was class 0 (no false positives).
- Recall: 1.00. Every actual instance of class 0 was found (no false negatives).
- F1-Score: 1.00. The harmonic mean of precision and recall, indicating perfect performance for class 0.
- Support: 9. The number of actual instances of class 0 in the test set.

Class 1:

- Precision: 1.00. Every instance predicted as class 1 really was class 1 (no false positives).
- Recall: 0.88. The model found 7 of the 8 actual instances of class 1 and missed one.
- F1-Score: 0.93. The harmonic mean of precision and recall, indicating very good performance for class 1.
- Support: 8. The number of actual instances of class 1 in the test set.

Class 2:

- Precision: 0.93. Of the 14 instances predicted as class 2, 13 really were class 2.
- Recall: 1.00. Every actual instance of class 2 was found (no false negatives).
- F1-Score: 0.96. The harmonic mean of precision and recall, indicating very good performance for class 2.
- Support: 13. The number of actual instances of class 2 in the test set.

#### Overall Performance

- Accuracy: 0.97. The model correctly classified 29 of the 30 instances in the test set.
- Macro Average: precision 0.98, recall 0.96, F1-score 0.97. These are the unweighted means of the per-class metrics, giving equal weight to each class.
- Weighted Average: precision 0.97, recall 0.97, F1-score 0.97. These weight each class by its support (the number of true instances), so they reflect overall performance better when the class distribution is imbalanced.

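
The numbers in the report can be reproduced by hand. The sketch below starts from the confusion matrix implied by the report above (my reconstruction: one class 1 sample predicted as class 2, everything else correct) and derives precision, recall, F1, accuracy, and both averages with plain NumPy. The variable names are illustrative.

```python
import numpy as np

# Rows = true class, columns = predicted class.
# Reconstructed from the report: supports 9 / 8 / 13, one class-1 sample
# predicted as class 2, all other samples classified correctly.
cm = np.array([
    [9, 0, 0],
    [0, 7, 1],
    [0, 0, 13],
])

true_positives = np.diag(cm)
support = cm.sum(axis=1)          # actual instances per class
predicted_count = cm.sum(axis=0)  # predictions made per class

precision = true_positives / predicted_count
recall = true_positives / support
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

accuracy = true_positives.sum() / cm.sum()
macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by support

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2))
print(round(accuracy, 2), round(macro_f1, 2), round(weighted_f1, 2))
```

Rounded to two decimals, these values agree with the per-class rows and the summary rows of the report.
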
#### Conclusion

- The model is perfect on class 0, with precision, recall, and F1-score all at 1.00.
- Classes 1 and 2 are also classified very well. The only error on this test set is one class 1 instance predicted as class 2, which lowers recall for class 1 and precision for class 2.
- Overall accuracy is 0.97, and the macro and weighted averages are close to each other, so performance is balanced across the classes.
- On this split, the k-Nearest Neighbors (kNN) classifier with `n_neighbors = 3` is effective for the Iris dataset. The test set holds only 30 samples and the split is not seeded, so the exact scores will vary from run to run.