# 🤝 K-Nearest Neighbors (K-NN)

## 📘 Definition

**K-Nearest Neighbors (K-NN)** is a **supervised learning algorithm** used for both **classification** and **regression**.  
It predicts the label (or, for regression, the numeric value) of a new data point from the labels of the **'K' closest training samples** in the feature space.

The key idea is:
> "Similar data points are likely to have similar outcomes."

---

## 🔑 How It Works
1. Choose the number of neighbors **K**.
2. Compute the distance (usually **Euclidean**) from the new data point to all training points.
3. Select the **K nearest neighbors**.
4. For classification: take the **majority class** of the neighbors.  
   For regression: take the **average value** of the neighbors.
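
The four steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are made up for this example, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Step 2: Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 3: keep the k smallest distances.
    distances.sort(key=lambda pair: pair[0])
    return [y for _, y in distances[:k]]

def knn_classify(train_X, train_y, query, k=3):
    # Step 4 (classification): majority class among the neighbors.
    neighbors = k_nearest(train_X, train_y, query, k)
    return Counter(neighbors).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    # Step 4 (regression): mean target value of the neighbors.
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Toy data (made up for illustration): two well-separated groups.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

print(knn_classify(X, labels, (1.8, 1.8), k=3))  # a
print(knn_regress(X, values, (6.4, 6.4), k=3))   # 51.0
```

Step 1 (choosing K) shows up here only as the `k` argument; in practice K is usually picked by trying several values on held-out data.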

---

## ⚙️ Key Parameters
- **K**: Number of neighbors (odd values are common in binary classification because they avoid tied votes).
- **Distance metric**: Euclidean, Manhattan, etc.
- **Weighting**: Uniform or distance-based weighting of neighbors.
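
To see how the distance metric and the weighting scheme change a prediction, here is a small sketch. The `knn_vote` function, its `metric` and `weighting` arguments, and the data are hypothetical names chosen for this example. The distance weighting shown (inverse distance) is one common choice, assumed here.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_vote(train_X, train_y, query, k=3, metric=euclidean, weighting="uniform"):
    """Classify `query`; `weighting` is "uniform" or "distance"."""
    ranked = sorted(
        ((metric(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for dist, label in ranked[:k]:
        # Uniform: every neighbor counts 1. Distance: closer neighbors count more.
        # The small constant guards against division by zero for exact matches.
        votes[label] += 1.0 if weighting == "uniform" else 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get)

# Made-up data: one "a" point very close to the query, two "b" points farther away.
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
y = ["a", "b", "b"]
query = (0.5, 0.5)

print(knn_vote(X, y, query, k=3, weighting="uniform"))   # b (two votes against one)
print(knn_vote(X, y, query, k=3, weighting="distance"))  # a (the close neighbor dominates)
print(knn_vote(X, y, query, k=3, metric=manhattan, weighting="distance"))  # a
```

With uniform weights the two distant "b" points win the vote; with distance weighting the single nearby "a" point outweighs them.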

---

## ✅ Pros
- Simple and intuitive
- No explicit training phase (lazy learner): the model simply stores the training data
- Works well with small, clean datasets

## ❌ Cons
- Computationally expensive at prediction time for large datasets, since each query is compared against every stored training point
- Sensitive to irrelevant features and data scaling
- Struggles in high-dimensional spaces (curse of dimensionality)

---

## 📌 Use Cases
- Handwriting recognition
- Recommendation systems
- Medical diagnosis


## ⚙️ How K-NN Works (with Formula)

**K-Nearest Neighbors (K-NN)** works by measuring the **distance** between the query point and every point in the training dataset.

---

### 📏 Step 1: Compute Distance

The most commonly used distance metric is the Euclidean distance.
### 🧮 Euclidean Distance Formula

Given two points:

- Query point: **Q = (q₁, q₂, ..., qₙ)**
- Data point: **P = (p₁, p₂, ..., pₙ)**

$$
\text{Distance}(Q, P) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \dots + (q_n - p_n)^2}
$$

You calculate this distance for **all** training points.
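
As a worked example of the Euclidean distance, the snippet below computes it from a query point to three training points. The helper name `euclidean_distance` and the coordinates are made up for illustration.

```python
import math

def euclidean_distance(q, p):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((qi - pi) ** 2 for qi, pi in zip(q, p)))

Q = (1.0, 2.0)
training_points = [(4.0, 6.0), (1.0, 3.0), (2.0, 2.0)]

# Distance from the query to every training point.
distances = [euclidean_distance(Q, P) for P in training_points]
print(distances)  # [5.0, 1.0, 1.0]
```

For the first point, the differences are 3 and 4, so the distance is √(9 + 16) = 5. In Python 3.8 and later, `math.dist(Q, P)` computes the same quantity.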

---

### 📚 Step 2: Select K Nearest Neighbors

- Sort all training samples by distance to the query point.
- Select the **K points** with the smallest distance.
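
Sorting every distance is the simplest way to pick neighbors, but only the K smallest are needed. As a hedged sketch, the standard-library `heapq.nsmallest` can select them without a full sort. The function name `k_nearest_indices` and the data are illustrative assumptions.

```python
import heapq
import math

def k_nearest_indices(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean)."""
    # nsmallest returns the k items with the smallest key, in ascending order.
    return heapq.nsmallest(
        k, range(len(train_X)), key=lambda i: math.dist(train_X[i], query)
    )

X = [(5.0, 5.0), (1.0, 1.0), (9.0, 9.0), (2.0, 2.0), (0.0, 0.0)]
print(k_nearest_indices(X, (1.0, 1.5), k=3))  # [1, 3, 4]
```

Returning indices rather than points makes it easy to look up the matching labels or target values in the next step.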

---

### 🏷️ Step 3: Predict the Output

- **For Classification:**  
  Predict the **majority class** among the K neighbors.

- **For Regression:**  
  Predict the **average (mean)** value of the K neighbors:

$$
\hat{y} = \frac{1}{K} \sum_{i=1}^{K} y_i
$$

Where $y_i$ is the target value of the i-th nearest neighbor.

---

### 🎯 Summary

- **Distance** is key to determining neighbors.
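- **K** controls the bias-variance trade-off: a small K follows the training data closely (low bias, high variance), while a large K smooths predictions (higher bias, lower variance).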
- Output depends on the type of task (classification or regression).
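
To illustrate the role of K from the summary above, the sketch below classifies one-dimensional points with different values of K. The data set is invented for this example and includes one deliberately noisy "b" point inside the "a" region. `knn_classify_1d` is a hypothetical helper.

```python
from collections import Counter

def knn_classify_1d(xs, labels, query, k):
    """Majority vote among the k training values closest to `query`."""
    ranked = sorted(zip(xs, labels), key=lambda pair: abs(pair[0] - query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Five "a" points on the left, three "b" points on the right,
# plus one noisy "b" point at 1.2 in the middle of the "a" region.
xs     = [0.0, 1.0, 2.0, 3.0, 4.0, 1.2, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "a", "a", "b", "b", "b", "b"]

print(knn_classify_1d(xs, labels, 1.25, k=1))  # b: K=1 follows the noisy point
print(knn_classify_1d(xs, labels, 1.25, k=5))  # a: K=5 outvotes the noise
print(knn_classify_1d(xs, labels, 9.0, k=9))   # a: K=all ignores where the query is
```

With K=1 the prediction copies a single noisy neighbor (high variance). With K equal to the whole data set, every query gets the overall majority class, even one sitting in the middle of the "b" points (high bias).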

---


![image.png](attachment:image.png)

## 🕵️‍♂️ When to Choose K-Nearest Neighbors (K-NN)

You should consider using **K-NN** when:

### ✅ Suitable Conditions
- Your dataset is **small to medium-sized** (because K-NN is slow on large data).
- You **do not need an explicit model** with learned parameters: K-NN yields no coefficients or rules to inspect, although each prediction can be traced back to its neighbors.
- You want a **baseline model** that's easy to implement.
- Your data has **clear clusters or local structure**.
- You have **labeled data** for supervised learning tasks.

---

### 📈 K-NN is a good choice for:
- **Classification tasks** (e.g., image or text categorization)
- **Regression tasks** (predicting house prices, etc.)
- **Anomaly detection**
- Situations where the **decision boundary is irregular or non-linear**
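
For the anomaly detection case, these notes do not specify a method. One common approach, assumed here, is to score each point by the distance to its k-th nearest neighbor, so that isolated points get high scores. The helper name and the data below are made up for illustration.

```python
import math

def kth_neighbor_distance(points, index, k):
    """Distance from points[index] to its k-th nearest other point."""
    dists = sorted(
        math.dist(points[index], p) for j, p in enumerate(points) if j != index
    )
    return dists[k - 1]

# Four points in a tight cluster and one far away from it.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]

scores = [kth_neighbor_distance(points, i, k=2) for i in range(len(points))]
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
print(most_anomalous)  # 4
```

Unlike classification and regression, this use of nearest neighbors does not need labels. Choosing a score threshold for "anomalous" is left to the application.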

---

### ❌ Avoid K-NN if:
- The dataset is **large** or **high-dimensional** (prediction becomes slow, and distances become less informative as the number of dimensions grows).
- You need **real-time predictions** (K-NN is lazy and slow at inference).
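- Your features cannot be **scaled or normalized** to comparable ranges (unscaled features distort the distances).
- The dataset is **very noisy**: K-NN is sensitive to outliers and mislabeled points, especially with a small K.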

---

### 💡 Tip:
Always **scale** your features (e.g., using `StandardScaler`) before using K-NN so that no single feature dominates the distance calculation.
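
The effect of scaling can be shown without any library. The sketch below applies z-score standardization by hand (subtract the column mean, divide by the population standard deviation, which is the same transformation scikit-learn's `StandardScaler` applies by default) and compares the nearest neighbor before and after. The helper names and the age/income rows are invented for this example.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]  # assumes no column is constant
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]

def nearest_index(rows, query_index):
    """Index of the row closest to rows[query_index], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_index]))

# Made-up rows: (age in years, income in dollars). Row 0 is the query.
rows = [(25, 50_000), (26, 58_000), (60, 50_500), (45, 90_000)]

print(nearest_index(rows, 0))               # 2: income differences swamp age
print(nearest_index(standardize(rows), 0))  # 1: both features now contribute
```

On the raw data, the 60-year-old in row 2 looks closest to the 25-year-old query because a 500-dollar income gap is tiny next to an 8,000-dollar one, and age barely registers. After standardization, row 1 (age 26) becomes the nearest neighbor.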
