# k-Nearest Neighbors (k-NN)

The k-Nearest Neighbors (k-NN) algorithm is a simple, non-parametric, lazy learning algorithm. It handles both classification and regression tasks, though it is most often used for classification.

In this notebook, we will cover the intuition behind k-NN, its advantages, disadvantages, and demonstrate its implementation in Python using the `scikit-learn` library.

---

## Table of Contents

1. [What is k-Nearest Neighbors?](#1-what-is-k-nearest-neighbors)
2. [How k-NN Works](#2-how-k-nn-works)
3. [Advantages and Disadvantages of k-NN](#3-advantages-and-disadvantages-of-k-nn)
4. [Use Cases of k-NN](#4-use-cases-of-k-nn)
5. [Implementing k-NN in Python](#5-implementing-k-nn-in-python)
6. [Evaluating the k-NN Model](#6-evaluating-the-k-nn-model)

---

## 1. What is k-Nearest Neighbors?

The k-Nearest Neighbors (k-NN) algorithm is one of the simplest and most intuitive algorithms in machine learning. It works by finding the "k" closest data points (neighbors) to a given input and assigning the majority class label (for classification) or averaging the values (for regression).

Key features:
- **Lazy Learning**: k-NN does not learn an explicit model; instead, it memorizes the training data.
- **Non-parametric**: It makes no assumptions about the underlying data distribution.
  
---

## 2. How k-NN Works

The basic steps of the k-NN algorithm are as follows:

1. Choose the number of neighbors (k).
2. Calculate the distance between the new data point and every point in the training set.
3. Identify the k closest points (neighbors) to the new data point.
4. For classification, assign the label that is most common among these neighbors.
5. For regression, compute the average of the labels (values) of the k closest points.
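
The steps above can be sketched in plain Python. This is a minimal illustration, not how `scikit-learn` implements the algorithm; the function name `knn_predict`, its parameters, and the toy data are all assumptions made for this example, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter
from statistics import mean


def knn_predict(train_points, train_targets, query, k=3, task="classification"):
    """Predict a label (or value) for one query point with brute-force k-NN."""
    # Step 2: Euclidean distance from the query to every training point.
    distances = [math.dist(point, query) for point in train_points]
    # Step 3: indices of the k training points with the smallest distances.
    nearest = sorted(range(len(train_points)), key=distances.__getitem__)[:k]
    neighbor_targets = [train_targets[i] for i in nearest]
    if task == "classification":
        # Step 4: majority vote among the k neighbors.
        return Counter(neighbor_targets).most_common(1)[0][0]
    # Step 5: average of the k neighbors' values.
    return mean(neighbor_targets)


# Hypothetical toy data: two well-separated clusters.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_values = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

print(knn_predict(toy_points, toy_labels, (1.1, 1.0), k=3))                     # a
print(knn_predict(toy_points, toy_values, (5.0, 5.0), k=3, task="regression"))  # 51.0
```

Step 1 (choosing k) appears here only as the `k` argument. Every prediction scans the full training set, which is why brute-force k-NN slows down as the data grows.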

Distance metrics used:
- **Euclidean Distance** (most common)
- Manhattan Distance
- Minkowski Distance
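
The three metrics are closely related: Minkowski distance with exponent `p` reduces to Manhattan distance at `p = 1` and to Euclidean distance at `p = 2`. A small sketch, where the helper name `minkowski_distance` and the sample points are assumptions chosen for illustration:

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

manhattan = minkowski_distance(point_a, point_b, p=1)  # |3| + |4| = 7.0
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(9 + 16) = 5.0
cubic = minkowski_distance(point_a, point_b, p=3)      # lies between max(|3|, |4|) and the Euclidean value

print(manhattan, euclidean, cubic)
```

In `scikit-learn`, `KNeighborsClassifier` uses the Minkowski metric with `p=2` by default, so it computes Euclidean distance unless told otherwise.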

Choosing the right value for **k** is critical. A small **k** makes the model sensitive to noise (overfitting), while a large **k** over-smooths the decision boundary and can blur real differences between classes (underfitting).
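
One common way to pick **k** is to compare cross-validated accuracy across several candidates. The sketch below assumes the Iris dataset, 5-fold cross-validation, and odd values of k from 1 to 21; the names `candidate_ks` and `mean_scores` are illustrative choices, not part of any library.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

candidate_ks = range(1, 22, 2)  # odd values reduce the chance of tied votes
mean_scores = {}
for k in candidate_ks:
    # Scaling sits inside the pipeline so each fold is standardized
    # using statistics from its own training portion only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(f"Best k: {best_k} (mean CV accuracy {mean_scores[best_k]:.3f})")
```

On a dataset this small, several values of k often score almost identically, so the single "best" k should be read as a rough guide rather than a precise optimum.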

---


## 3. Advantages and Disadvantages of k-NN

### Advantages:

- **Simplicity**: k-NN is easy to understand and implement.
- **No Training Phase**: As a lazy learner, it only stores the training data, so fitting is fast and new examples can be added without retraining. The computational cost moves to prediction time instead.
- **Flexibility**: It can be used for both classification and regression problems.
- **Adaptability**: It can handle multi-class classification problems.
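
To illustrate the flexibility point, the same neighbor search can drive a regressor. This is a minimal sketch using `KNeighborsRegressor` from `scikit-learn` on a made-up one-dimensional dataset (the values of `X_reg` and `y_reg` are assumptions for the example):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data: y = x^2 sampled at five points.
X_reg = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_reg = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_reg, y_reg)

# The three nearest neighbors of 3.1 are x = 3, 4, and 2,
# so the prediction is the mean of 9, 16, and 4.
prediction = reg.predict([[3.1]])[0]
print(round(prediction, 3))  # 9.667
```

The prediction is a plain average of neighbor targets (uniform weights are the default), so it does not follow the curve exactly; passing `weights="distance"` would give closer neighbors more influence.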

### Disadvantages:

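- **Computational Complexity**: A brute-force prediction compares the query against every stored training point, so prediction time and memory use grow with the size of the dataset.
- **Sensitive to Outliers**: Noisy or mislabeled points can sway predictions, especially when k is small.
- **Feature Scaling Required**: Since k-NN relies on distance metrics, feature scaling (standardization or normalization) is usually necessary; otherwise, features with larger numeric ranges dominate the distance.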
- **Choice of k**: Finding the optimal k can be challenging and may require cross-validation.
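
The feature scaling point can be seen on a tiny example. The sketch below uses made-up (income, age) pairs; the data and the helper names `nearest_index` and `standardize` are assumptions for illustration. It shows the nearest neighbor of a query changing once both features are standardized.

```python
import math
from statistics import mean, pstdev

# Hypothetical (income in dollars, age in years) records.
points = [(30000.0, 25.0), (32000.0, 60.0), (80000.0, 30.0), (82000.0, 58.0)]
query = (31500.0, 27.0)


def nearest_index(train_points, q):
    """Index of the training point closest to q (Euclidean distance)."""
    return min(range(len(train_points)), key=lambda i: math.dist(train_points[i], q))


# Per-feature mean and population standard deviation of the training data.
columns = list(zip(*points))
means = [mean(col) for col in columns]
stds = [pstdev(col) for col in columns]


def standardize(point):
    """Rescale each feature to zero mean and unit variance."""
    return tuple((v - m) / s for v, m, s in zip(point, means, stds))


raw_nearest = nearest_index(points, query)
scaled_nearest = nearest_index([standardize(p) for p in points], standardize(query))

print(raw_nearest)     # 1: income differences swamp the 33-year age gap
print(scaled_nearest)  # 0: after scaling, the 25-year-old is the closest match
```

Without scaling, a 500-dollar income gap outweighs a 33-year age gap simply because income is measured in larger numbers.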

---


## 4. Use Cases of k-NN

k-NN can be applied in the following scenarios:

- **Image Recognition**: k-NN can be used to classify images based on similarities in pixel intensity values.
- **Recommendation Systems**: It helps recommend items to users by finding similar users based on their preferences.
- **Anomaly Detection**: k-NN can identify outliers in a dataset by analyzing the distance to the nearest neighbors.
- **Healthcare**: It can be used to predict disease outcomes based on patient history and data from other patients with similar symptoms.
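
As one concrete case, the anomaly detection idea can be sketched by scoring each point with the distance to its k-th nearest neighbor. The synthetic data, the choice of `k = 5`, and the variable names below are assumptions for illustration; `NearestNeighbors` is the unsupervised neighbor search in `scikit-learn`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic data: 100 points around the origin plus one distant point.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
outlier = np.array([[8.0, 8.0]])
X_all = np.vstack([inliers, outlier])

k = 5
# Ask for k + 1 neighbors: when querying the training points themselves,
# each point is returned as its own nearest neighbor at distance 0.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
distances, _ = nn.kneighbors(X_all)

# Anomaly score: distance to the k-th genuine neighbor (last column).
scores = distances[:, -1]
most_anomalous = int(np.argmax(scores))
print(most_anomalous)  # 100, the index of the appended outlier
```

Points in dense regions have small scores, while isolated points stand out. A threshold on `scores` would still need to be chosen for the application at hand.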

---

## 5. Implementing k-NN in Python

Below is a Python implementation of the k-NN algorithm using `scikit-learn`. We will use the famous **Iris** dataset for this example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
```

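```python
# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```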

```python
# Fit a k-NN classifier with k = 5, then predict labels for the test set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
```
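
To see what drives an individual prediction, the fitted classifier can report which training points were the neighbors and how they voted. The sketch below rebuilds the same setup so that it runs on its own; the names `X_train_scaled`, `X_test_scaled`, and `sample` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_scaled, y_train)

# Inspect the first test sample (kept 2-D, as the estimator expects).
sample = X_test_scaled[:1]
distances, indices = knn.kneighbors(sample)   # 5 nearest training points
neighbor_labels = y_train[indices[0]]         # their class labels
probabilities = knn.predict_proba(sample)[0]  # fraction of neighbors per class

print("Neighbor distances:", np.round(distances[0], 3))
print("Neighbor labels:   ", neighbor_labels)
print("Class probabilities:", probabilities)
```

With the default uniform weights, each entry in `probabilities` is simply the share of the five neighbors that belong to that class.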

---

## 6. Evaluating the k-NN Model

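```python
# Overall fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of k-NN model: {accuracy:.2f}')

# Rows are true classes, columns are predicted classes
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Per-class precision, recall, F1-score, and support
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)
```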

```
Accuracy of k-NN model: 1.00
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```
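
The figures in the classification report can be derived directly from the confusion matrix. The sketch below hard-codes the matrix printed above and recomputes support, recall, precision, and accuracy by hand; the variable names are illustrative.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
conf_matrix = [
    [10, 0, 0],
    [0, 9, 0],
    [0, 0, 11],
]
n_classes = len(conf_matrix)

# Support: number of true samples per class (row sums).
support = [sum(row) for row in conf_matrix]
# Number of samples predicted as each class (column sums).
predicted_totals = [sum(conf_matrix[r][c] for r in range(n_classes)) for c in range(n_classes)]

# Recall: correct predictions for a class / true samples of that class.
recall = [conf_matrix[i][i] / support[i] for i in range(n_classes)]
# Precision: correct predictions for a class / samples predicted as that class.
precision = [conf_matrix[i][i] / predicted_totals[i] for i in range(n_classes)]
# Accuracy: diagonal total / all samples.
accuracy = sum(conf_matrix[i][i] for i in range(n_classes)) / sum(support)

print(support)    # [10, 9, 11]
print(recall)     # [1.0, 1.0, 1.0]
print(precision)  # [1.0, 1.0, 1.0]
print(accuracy)   # 1.0
```

Every off-diagonal entry is zero, so all metrics equal 1.0. Because the test split holds only 30 samples, a single misclassification would lower accuracy by about 3 percentage points.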

