# K-Nearest Neighbors (KNN)

In this notebook, we will explore the **K-Nearest Neighbors (KNN)** algorithm, a simple yet powerful supervised learning method used for both classification and regression.

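KNN predicts a label for a new input by finding the *k* training points closest to it under a distance metric (Euclidean by default in scikit-learn). For classification it assigns the majority label among those neighbors; for regression it averages their target values. Fitting amounts to storing the training data (plus, optionally, a search index), so most of the work happens at prediction time.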
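
To make the idea concrete, here is a minimal brute-force sketch in NumPy. It assumes Euclidean distance and unweighted voting, and `knn_predict` and the toy arrays are hypothetical names for illustration, not part of scikit-learn. Ties in the vote go to whichever label `Counter` encountered first, a simplification.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Hypothetical helper: brute-force KNN prediction for a single query point."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Majority vote among the k neighbors
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    # Regression: mean of the neighbors' target values
    return float(np.mean(neighbor_targets))

# Toy data: two well-separated clusters in 2-D
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_labels = np.array([0, 0, 0, 1, 1, 1])
y_values = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])

print(knn_predict(X_toy, y_labels, np.array([0.05, 0.05]), k=3))                  # 0
print(knn_predict(X_toy, y_values, np.array([5.0, 5.0]), k=3, task="regression"))  # 20.0
```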

## 1. Importing Libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

## 2. Load Dataset
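We’ll use the classic **Iris dataset**: 150 flower samples, each described by four measurements in centimetres (sepal length, sepal width, petal length, petal width) and labelled as one of three species.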

In [None]:
iris = load_iris()
X = iris.data
y = iris.target

print("Feature names:", iris.feature_names)
print("Target names:", iris.target_names)
print("Shape of dataset:", X.shape)
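
pandas is imported above but not otherwise used. One way to put it to work is a quick look at the data before modelling. This sketch wraps the arrays in a DataFrame (the `species` column name is our own choice) and checks the class balance and feature ranges, since ranges matter for a distance-based model.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
# Wrap the feature matrix in a DataFrame for quick inspection
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())
# Class balance: Iris has 50 samples per species
print(df["species"].value_counts())
# Per-feature min/max; distance-based models are affected by these ranges
print(df.describe().loc[["min", "max"]])
```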

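## 3. Train-Test Split
We hold out 30% of the data (45 of the 150 samples) for testing; `random_state=42` makes the split reproducible.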

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Training set size:", X_train.shape)
print("Test set size:", X_test.shape)
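
The split above is random, so the class proportions in the test set can drift from Iris's 1:1:1 balance. A common variant, shown here as an optional alternative rather than a change to the notebook's split, passes `stratify=y`. The `_s` suffixes are hypothetical names chosen so the original variables are not overwritten.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Samples per class in each split
print("Train class counts:", np.bincount(y_train_s))
print("Test class counts:", np.bincount(y_test_s))
```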

## 4. Train KNN Classifier
We’ll start with **k = 5** neighbors, which is also scikit-learn’s default for `n_neighbors`.

In [None]:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
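
Because KNN has no learned parameters, it can help to look at the neighbors behind a single prediction. This sketch uses `KNeighborsClassifier.kneighbors`, which returns the distances to, and training-set indices of, the nearest points. It recreates the same split so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)
# fit stores the training data (and may build a KD-tree or ball-tree index)
knn.fit(X_train, y_train)

# The 5 nearest training points for the first test sample, sorted by distance
distances, indices = knn.kneighbors(X_test[:1])
neighbor_labels = y_train[indices[0]]
print("Distances:", distances[0].round(3))
print("Neighbor labels:", neighbor_labels)
print("Majority vote:", np.bincount(neighbor_labels, minlength=3).argmax())
print("Prediction:", knn.predict(X_test[:1])[0])
```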

## 5. Make Predictions

In [None]:
y_pred = knn.predict(X_test)
print("Predicted:", y_pred[:10])
print("Actual:", y_test[:10])
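
`predict` returns only the winning label. `predict_proba` exposes the vote shares behind it: with the default `weights='uniform'`, each entry is the fraction of the k neighbors that belong to that class, so with k = 5 every value is a multiple of 0.2. The sketch below rebuilds the model so it is self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# With uniform weights: probability = (neighbors of that class) / k
proba = knn.predict_proba(X_test[:5])
print(np.round(proba, 2))
print("Votes out of 5:\n", (proba * 5).round().astype(int))
```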

## 6. Model Evaluation

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted class
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
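
To connect the heatmap to the accuracy figure: correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the total. The labels below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels, purely to illustrate the arithmetic
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_hat = np.array([0, 0, 1, 2, 1, 2, 2, 1])

cm = confusion_matrix(y_true, y_hat)
# Diagonal = correct predictions (row = actual, column = predicted)
accuracy_from_cm = np.trace(cm) / cm.sum()
print(cm)
print(accuracy_from_cm, accuracy_score(y_true, y_hat))  # 0.75 0.75
```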

## 7. Choosing the Right K
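The choice of **k** controls the bias-variance tradeoff. A small k (e.g. k = 1) follows individual training points closely, so predictions are sensitive to noise (low bias, high variance). A large k averages over many neighbors, producing smoother decision boundaries that can miss real structure (higher bias, lower variance). A simple way to explore this is to plot accuracy over a range of k values. Note that choosing k by test-set accuracy lets the test set influence model selection; cross-validation on the training data is the more rigorous approach.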

In [None]:
accuracies = []
k_values = range(1, 21)
for k in k_values:
    # Fit a fresh model for each k (named `model` so the k=5 `knn` above is not overwritten)
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    accuracies.append(acc)

plt.plot(k_values, accuracies, marker='o')
plt.xlabel('k (number of neighbors)')
plt.ylabel('Test accuracy')
plt.title('Test Accuracy vs. k')
plt.show()
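
Selecting k from test-set accuracy makes the final test score optimistic, because the test set has been used for tuning. A common alternative, sketched here with 5 folds as our own assumption, is to pick k by cross-validation on the training data and evaluate on the test set only once at the end. scikit-learn's `GridSearchCV` performs the same search more compactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

k_values = range(1, 21)
cv_scores = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation on the training set only; the test set stays untouched
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores.append(scores.mean())

# argmax returns the first maximum, so ties favour the smallest k
best_k = k_values[int(np.argmax(cv_scores))]
print("Best k by 5-fold CV:", best_k)

# Refit on the full training set with the chosen k, then evaluate once on the test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```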

## 8. Key Notes
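- KNN is simple, needs no training beyond storing the data, and works well on small datasets.
- Prediction is expensive on large datasets: a brute-force query computes a distance to every training point, roughly O(n·d) for n samples and d features. Tree indexes (KD-tree, ball tree) can speed this up in low dimensions.
- Sensitive to feature scaling → features with larger numeric ranges dominate the distance, so standardization or normalization is usually applied first.
- Choice of k sets the bias-variance tradeoff: small k → low bias, high variance; large k → higher bias, lower variance.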
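
The scaling note can be put into practice with a pipeline, sketched below. Wrapping `StandardScaler` and the classifier in `make_pipeline` means the scaler is fit only on each training fold during cross-validation, which avoids leaking test-fold statistics. On Iris all four features are in centimetres with broadly similar ranges, so the difference may be small; the pattern matters more when features sit on very different scales.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize each feature to zero mean and unit variance before computing distances
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
raw_knn = KNeighborsClassifier(n_neighbors=5)

raw_score = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_knn, X, y, cv=5).mean()
print("Raw features:   ", raw_score)
print("Scaled features:", scaled_score)
```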