# **AWFERA: Machine Learning Course**

# **Topic: K-Nearest Neighbors (K-NN)**

# **Understanding K-Nearest Neighbors (KNN)**

**K-Nearest Neighbors (KNN)** is a supervised machine learning algorithm used for both classification and regression tasks. It operates on the concept of "neighbors": the prediction for a new data point is the majority class (for classification) or the average value (for regression) of its closest data points.

# **Key Concepts in KNN**

# Distance Metrics

**Euclidean Distance:** The shortest straight-line distance between two points.

**Manhattan Distance:** The sum of the absolute differences of the coordinates.
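
To make the two metrics concrete, here is a minimal sketch that computes both with NumPy. The points `a` and `b` are made-up values chosen only for illustration.

```python
import numpy as np

# Two hypothetical points in a 2-D feature space
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: square root of the summed squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of the absolute coordinate differences
manhattan = np.sum(np.abs(a - b))

print(euclidean)  # 5.0
print(manhattan)  # 7.0
```

The coordinate differences are 3 and 4, so the straight-line distance is 5 while the "city block" distance is 3 + 4 = 7.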

**Majority Voting Method:**

For classification, the class with the majority votes is chosen.
For regression, the predicted value is the average of the closest data points.
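
The prediction rule can be sketched from scratch in a few lines. This is an illustrative implementation, not how scikit-learn does it internally; the function name `knn_predict` and the toy arrays are assumptions made up for this example.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Predict for one point x_new from its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Majority vote: the most common label among the neighbors
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    # Regression: mean of the neighbors' target values
    return float(np.mean(neighbor_targets))

# Tiny made-up dataset for illustration
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.0]])
labels = np.array([0, 0, 0, 1, 1])
values = np.array([1.0, 2.0, 3.0, 10.0, 12.0])

query = np.array([1.2, 1.1])
print(knn_predict(X_toy, labels, query, k=3))                     # 0
print(knn_predict(X_toy, values, query, k=3, task="regression"))  # 2.0
```

The three closest points to `query` are the first three rows. All of them carry label 0, and their target values 1.0, 2.0, and 3.0 average to 2.0.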

**Choosing the Value of K:**

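K is often chosen to be an odd number to reduce the chance of tied votes. An odd K rules out ties only in two-class problems; with three or more classes, ties can still occur.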
The choice of K affects the model’s performance:

**Small K values**: Lower bias but higher variance, causing potential overfitting.
**Large K values**: Higher bias but lower variance, leading to **underfitting.**
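
One common way to pick K is to compare cross-validated accuracy across several candidates. The sketch below does this on the Iris dataset; the candidate range (odd values from 1 to 21) and the choice of 5 folds are arbitrary assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Candidate odd values of K (an arbitrary illustrative range)
k_values = list(range(1, 22, 2))
cv_scores = {}
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    cv_scores[k] = cross_val_score(knn, X_iris, y_iris, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print(f"Best K by 5-fold CV: {best_k} (accuracy {cv_scores[best_k]:.3f})")
```

On a small, clean dataset such as Iris, several values of K may score almost identically, so the "best" K here should be read as a rough guide rather than a definitive answer.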

## **Bias and Variance:**

**Bias**: The error introduced by approximating the data with an overly simple model.

**Variance**: The model's sensitivity to fluctuations in the training data.

A balance between bias and variance is crucial for a model that performs well on both seen and unseen data.
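
One way to see the trade-off is to compare training and test accuracy at different values of K. The sketch below assumes the Iris dataset and three arbitrary values of K. With K=1 each training point is its own nearest neighbor, so training accuracy is essentially perfect, which is a sign of low bias and high variance. A very large K smooths predictions toward the overall majority.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_bv, y_bv = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bv, y_bv, test_size=0.2, random_state=42
)

results = {}
for k in (1, 15, 100):  # illustrative values; 100 is close to the 120 training samples
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # score() returns mean accuracy on the given data
    results[k] = (knn.score(X_tr, y_tr), knn.score(X_te, y_te))
    print(f"K={k:>3}: train accuracy={results[k][0]:.3f}, "
          f"test accuracy={results[k][1]:.3f}")
```

Because Iris is an easy dataset, the gap between settings may be modest. The pattern to look for is near-perfect training accuracy at K=1 and degraded accuracy on both sets when K is very large.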

# **Implementing KNN for Classification**

**Dataset Example:**
The Iris Dataset is often used for KNN classification.

**Steps:**

1. Load the dataset using sklearn.

2. Split the data into training and testing sets.

3. Fit the KNN model with a chosen number of neighbors.

4. Evaluate model performance using accuracy, a classification report, and a confusion matrix.

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [1]:
from sklearn import neighbors, datasets

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# **About Dataset**
The Iris Dataset contains four features (the length and width of the sepals and petals) for 150 samples: 50 from each of three Iris species (Iris setosa, Iris virginica, and Iris versicolor).
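
These figures can be checked directly on the dataset object that scikit-learn returns. A quick sketch (the variable name `iris_data` is chosen here only to avoid clashing with the notebook's own `iris`):

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

print(iris_data.data.shape)             # (150, 4)
print(iris_data.target_names.tolist())  # ['setosa', 'versicolor', 'virginica']
print(np.bincount(iris_data.target))    # [50 50 50]
```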

In [3]:
# Load Iris dataset

iris = datasets.load_iris()

X = iris.data

y = iris.target

In [4]:

# Split data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
type(iris)

In [13]:
df = pd.DataFrame(data = iris.data, columns=iris.feature_names)

# Add the Target column
df['target'] = iris.target

# Display the first few rows of the DataFrame
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  


In [14]:
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)
y = df['target']

# Re-split the data using the DataFrame-based X and y defined above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
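
A plain random split does not guarantee equal class counts in the test set. In this notebook's classification report the supports come out as 10, 9, and 11. If balanced classes matter, `train_test_split` accepts a `stratify` argument. The sketch below is an optional variation, not the split used in this notebook, and its variable names carry an `_s` suffix to keep them separate.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_s, y_s = load_iris(return_X_y=True)

# stratify=y_s keeps the class proportions equal in the train and test sets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_s, y_s, test_size=0.2, random_state=42, stratify=y_s
)

print(np.bincount(y_test_s))  # [10 10 10]
```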

In [15]:
# Initialize KNN model

model = neighbors.KNeighborsClassifier(n_neighbors=3)

# Train the model

model.fit(X_train, y_train)
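
By default `KNeighborsClassifier` uses the Minkowski metric with `p=2`, which is Euclidean distance. Setting `p=1` switches it to Manhattan distance. The sketch below compares the two on the same split; it reloads the data under separate variable names so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_m, y_m = load_iris(return_X_y=True)
X_tr_m, X_te_m, y_tr_m, y_te_m = train_test_split(
    X_m, y_m, test_size=0.2, random_state=42
)

metric_scores = {}
# p=2 (the default) gives Euclidean distance; p=1 gives Manhattan distance
for p, name in ((2, "Euclidean"), (1, "Manhattan")):
    knn = KNeighborsClassifier(n_neighbors=3, p=p).fit(X_tr_m, y_tr_m)
    metric_scores[name] = knn.score(X_te_m, y_te_m)
    print(f"{name}: test accuracy = {metric_scores[name]:.3f}")
```

Which metric works better depends on the data, so it is reasonable to treat `p` as another hyperparameter to tune alongside K.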

In [16]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Predictions on the test data
y_pred = model.predict(X_test)


# Calculate the evaluation metric: accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f" Model Accuracy: {accuracy:.6f}")

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Confusion Matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))



 Model Accuracy: 1.000000

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
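
To connect the confusion matrix to the other metrics, the sketch below recomputes accuracy, recall, and precision from the matrix printed above. In scikit-learn's convention, rows are true classes and columns are predicted classes. The array is copied by hand from the output, so the block runs on its own.

```python
import numpy as np

# The confusion matrix from the output (rows: true class, columns: predicted class)
cm = np.array([[10, 0, 0],
               [0, 9, 0],
               [0, 0, 11]])

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_from_cm = np.trace(cm) / cm.sum()

# Per-class recall = diagonal / row sums; precision = diagonal / column sums
recall = np.diag(cm) / cm.sum(axis=1)
precision = np.diag(cm) / cm.sum(axis=0)

print(accuracy_from_cm)  # 1.0
print(recall)            # [1. 1. 1.]
print(precision)         # [1. 1. 1.]
```

Every off-diagonal entry is zero, so all 30 test samples were classified correctly. With a test set this small, a perfect score says little about how the model would do on new data.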
