# 🚀 Objective: KNN Classification Implementation
# 💡 Own Code: Python Implementation for Grasping the Concept
# 🎯 Purpose: Mastering the Essentials of KNN Classification!

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

# 1. Load Python Modules

In [89]:
import pandas as pd
import numpy as np
from tabulate import tabulate

# 2. Read the Dataset from a CSV File Using Pandas

In [90]:
# Load the Iris dataset
data = pd.read_csv("Iris.csv")
data.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


# 3. Basic Inspection of the Given Dataset

In [91]:
def basic_table_inspection(table):
    """Basic data inspection of the given dataset."""
    print("Top 5 Sample of dataset")
    print(table.head())
    
    print()
    print("Bottom 5 Sample of dataset")
    print(table.tail())
    
    print()
    print("Column - Names of Given dataset")
    print(table.columns)
    
    print()
    print("Shape(rows x columns) - of Given dataset")
    print(table.shape)
    
    print()
    print("Data types - Column Names")
    print(table.dtypes)
    
    print()
    print("Summary of dataset")
    table.info()  # info() prints its report directly and returns None

    print()
    print("Count of null/NaN values in each column of dataset")
    print(table.isnull().sum())

    print()
    print("Descriptive statistics of dataset")
    print(table.describe())
    print()
basic_table_inspection(data)

Top 5 Sample of dataset
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Bottom 5 Sample of dataset
     sepal_length  sepal_width  petal_length  petal_width    species
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

Column - Names of Given dataset
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

Shape(

# 4. KNN classification with Euclidean distance - Own Code

![image.png](attachment:image.png)
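
For reference, the quantity that the `euclidean_distance` function in this section computes can be written as follows, for two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ with the same number of features (the symbols $p$, $q$ and $n$ are notation chosen here for illustration):

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$

As a worked example, take the first two rows of the dataset, $p = (5.1, 3.5, 1.4, 0.2)$ and $q = (4.9, 3.0, 1.4, 0.2)$:

$$
d(p, q) = \sqrt{0.2^2 + 0.5^2 + 0^2 + 0^2} = \sqrt{0.29} \approx 0.539
$$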

In [92]:
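from statistics import mode

def euclidean_distance(point1, point2):
    """Return the Euclidean distance between two points of equal dimension."""
    p1 = np.asarray(point1, dtype=float)
    p2 = np.asarray(point2, dtype=float)
    return np.sqrt(np.sum(np.square(p1 - p2)))

def knn_classification(df, test_point, k=3):
    """Predict the class of test_point by majority vote of its k nearest rows.

    Assumes the last column of df holds the class label and every other
    column is a numeric feature.
    """
    train = df.copy()
    # Distance from each training row (features only) to the test point
    distance = []
    for i in range(len(train)):
        x = train.iloc[i, :-1]
        distance.append(euclidean_distance(x, test_point))

    train['distance'] = distance
    train = train.sort_values(by=['distance'])
    # The label is now the second-to-last column, since 'distance' was appended
    k_nearest_neighbors = list(train.iloc[:, -2].head(k))

    # Majority vote among the k nearest labels
    predicted_class = mode(k_nearest_neighbors)
    return predicted_class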
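
The loop in `knn_classification` computes one distance per row in Python. As a possible alternative, the same idea can be sketched with NumPy broadcasting, which computes all distances in a single array operation. The function name `knn_classification_vectorized` and the toy DataFrame are made up for illustration, and the sketch assumes, like the code above, that the last column holds the class label.

```python
import numpy as np
import pandas as pd
from collections import Counter

def knn_classification_vectorized(df, test_point, k=3):
    """Hypothetical vectorized variant; assumes the last column is the label."""
    features = df.iloc[:, :-1].to_numpy(dtype=float)
    labels = df.iloc[:, -1].to_numpy()
    point = np.asarray(test_point, dtype=float)
    # Broadcasting subtracts the test point from every row at once
    distances = np.sqrt(((features - point) ** 2).sum(axis=1))
    # Stable sort keeps the original row order when distances are equal
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote among the k nearest labels
    return Counter(labels[nearest]).most_common(1)[0][0]

# Tiny made-up dataset for illustration only (not the Iris data)
toy = pd.DataFrame({
    "x": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "y": [1.0, 0.8, 1.1, 5.0, 4.9, 5.2],
    "label": ["a", "a", "a", "b", "b", "b"],
})
print(knn_classification_vectorized(toy, [1.0, 1.0], k=3))  # a
print(knn_classification_vectorized(toy, [5.0, 5.0], k=3))  # b
```

On a dataset the size of Iris the speed difference is negligible; the vectorized form mainly matters when the training set grows to many thousands of rows.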

# 5. Test the Code / Model

In [93]:
predicted_class = knn_classification(data, [5.1, 3.5, 1.4, 0.2], k=5)
print(predicted_class)

setosa
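
Note that the test point `[5.1, 3.5, 1.4, 0.2]` is identical to row 0 of the dataset, so its nearest neighbor is itself at distance 0, and a single prediction like this says little about accuracy. One common way to get a fairer estimate is leave-one-out evaluation: predict each row from all the other rows and count the hits. The sketch below is an illustration under assumptions, not part of the notebook: `predict` and `leave_one_out_accuracy` are hypothetical helpers, the toy data is made up, and the last column is assumed to be the label.

```python
import numpy as np
import pandas as pd
from collections import Counter

def predict(train, test_point, k):
    """Compact KNN vote; assumes the last column of train is the label."""
    X = train.iloc[:, :-1].to_numpy(dtype=float)
    y = train.iloc[:, -1].to_numpy()
    d = np.sqrt(((X - np.asarray(test_point, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(d, kind="stable")[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

def leave_one_out_accuracy(df, k=3):
    """Hypothetical helper: predict each row from all the remaining rows."""
    hits = 0
    for i in range(len(df)):
        train = df.drop(df.index[i])   # every row except row i
        row = df.iloc[i]               # the held-out row
        features = row.iloc[:-1].to_numpy(dtype=float)
        if predict(train, features, k) == row.iloc[-1]:
            hits += 1
    return hits / len(df)

# Tiny made-up dataset for illustration only (not the Iris data)
toy = pd.DataFrame({
    "x": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "y": [1.0, 0.8, 1.1, 5.0, 4.9, 5.2],
    "label": ["a", "a", "a", "b", "b", "b"],
})
print(leave_one_out_accuracy(toy, k=3))  # 1.0 on this well-separated toy data
```

With the notebook's own objects, the analogous call would be `leave_one_out_accuracy(data, k=5)`; no result is quoted here because it depends on the contents of `Iris.csv`.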


# 6. Conclusion
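1. Numerical features should be scaled before applying KNN; otherwise the features with the largest ranges dominate the Euclidean distance.
2. KNN keeps the entire training set in memory (RAM), because every prediction computes the distance to each stored observation.
3. K, the number of nearest neighbors, is a critical hyperparameter: small values are sensitive to noise, while large values blur the boundaries between classes.
4. For binary classification, an odd K avoids tied votes. With three or more classes (as in Iris), ties remain possible even when K is odd.
5. A common rule of thumb sets K near the square root of the number of observations (about 12 for the 150 Iris rows); treat it as a starting point and tune it by validation.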
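
Two of the points above, feature scaling and the square-root rule for K, can be sketched in a few lines. This is a minimal illustration under assumptions: `standardize_features` and `suggest_k` are hypothetical helper names, the toy data is made up, the last column is assumed to be the label, and z-score standardization is only one of several possible scaling choices.

```python
import math
import pandas as pd

def standardize_features(df):
    """Return (scaled copy, mean, std); every column but the last is z-scored."""
    scaled = df.copy()
    features = scaled.columns[:-1]
    mean = scaled[features].mean()
    std = scaled[features].std(ddof=0)  # assumes no feature is constant
    scaled[features] = (scaled[features] - mean) / std
    return scaled, mean, std

def suggest_k(n_observations):
    """Square-root rule of thumb, bumped to the next odd number if even."""
    k = max(1, round(math.sqrt(n_observations)))
    return k if k % 2 == 1 else k + 1

# Tiny made-up dataset for illustration only (not the Iris data)
toy = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "label": ["s", "s", "t", "t"],
})
scaled, mean, std = standardize_features(toy)
print(scaled)
print(suggest_k(150))  # sqrt(150) is about 12.25, rounds to 12, bumped to 13
```

Any test point has to be transformed with the same `mean` and `std` before its distances are computed, for example `(pd.Series(point, index=mean.index) - mean) / std`; otherwise it lives on a different scale than the training rows.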