## 📘 K-Nearest Neighbors (KNN) - Step-by-Step Guide

---

### 🔍 What is KNN?

**K-Nearest Neighbors (KNN)** is a simple yet powerful **supervised machine learning algorithm** used for **classification** and **regression** problems.

It predicts the class (or value) of a new data point from the classes (or values) of the **training points closest to it**.

---

### 🧠 How KNN Works - Step-by-Step

1. **Choose the number of neighbors (K)**

   * K is the number of closest data points used to make the prediction.

2. **Calculate the distance** between the test point and every training point.

3. **Sort the distances** and identify the **K nearest neighbors**.

4. **Take a majority vote** over the neighbors' classes (for classification) or **average their values** (for regression).

5. **Assign the resulting class or value** to the test point.
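
The five steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function name `knn_predict` and the toy points are made up for the example, Euclidean distance is assumed, and ties in the vote are simply broken in favor of the label seen first.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: sort by distance and keep the k closest
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count the class votes of those k neighbors
    votes = Counter(label for _, label in nearest)
    # Step 5: the most common class is the prediction
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Step 1: choose K
prediction = knn_predict(train_points, train_labels, query=(2.0, 2.0), k=3)
print(prediction)  # A
```

For regression, step 4 would average the neighbors' target values instead of counting votes.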

---

## 📏 Distance Metrics in KNN

### 1️⃣ **Euclidean Distance** (default and most common)

$$
D = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}
$$

* Measures the straight-line distance.
* Best for continuous variables.

### 2️⃣ **Manhattan Distance**

$$
D = |x_1 - y_1| + |x_2 - y_2| + \dots + |x_n - y_n|
$$

* Measures distance in grid-like paths (L1 norm).
* Less dominated by one large difference in a single feature than Euclidean distance, because differences are not squared.

### 3️⃣ **Minkowski Distance**

$$
D = \left(\sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
$$

* Generalization of Euclidean and Manhattan.
* When **p = 1** → Manhattan, **p = 2** → Euclidean
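
To make the relationship concrete, here is a small sketch that implements the Minkowski formula directly and checks the two special cases. The helper name `minkowski_distance` and the sample vectors are chosen only for illustration.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance between two equal-length numeric sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


x = (1.0, 2.0, 3.0)
y = (4.0, 6.0, 3.0)

manhattan = minkowski_distance(x, y, p=1)  # |3| + |4| + |0| = 7
euclidean = minkowski_distance(x, y, p=2)  # sqrt(9 + 16 + 0) = 5

print(manhattan, euclidean)
```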

### 4️⃣ **Hamming Distance**

* Used for categorical data, such as equal-length strings or binary vectors.
* Counts the number of positions with differing values.
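
As a quick illustration, Hamming distance can be computed by comparing two sequences position by position. The function name `hamming_distance` and the inputs below are illustrative only; the sketch assumes both sequences have the same length and raises an error otherwise.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


string_distance = hamming_distance("karolin", "kathrin")      # differs at r/t, o/h, l/r
binary_distance = hamming_distance([1, 0, 1, 1], [1, 1, 0, 1])  # differs at positions 1 and 2

print(string_distance, binary_distance)  # 3 2
```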

---

## 🧪 Choosing the Right K

* **Low K (e.g., K=1)**:

  * Highly sensitive to noise, so the model tends to overfit (low bias, high variance).

* **High K**:

  * Smoother, more general predictions, but the model may underfit (high bias) and blur the boundaries between classes.

### ⚖️ Tip:

* Use **cross-validation** to choose the best value of K.
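
One way to apply this tip is to score several candidate values of K with 5-fold cross-validation and keep the best one. The sketch below uses the Iris dataset; the candidate range (odd values from 1 to 21) and the number of folds are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values of K (an assumption for this example)
candidate_ks = range(1, 22, 2)

mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    mean_scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the K with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("Best K:", best_k, "CV accuracy:", round(mean_scores[best_k], 3))
```

`GridSearchCV` from `sklearn.model_selection` wraps the same idea and can search over other settings (such as the distance metric) at the same time.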

---

## 🔧 KNN in Python (Classification Example)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
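
KNN regression follows the same pattern, with the prediction being the average of the neighbors' target values. The sketch below uses `KNeighborsRegressor` on synthetic data (a noisy sine curve invented for this example) and checks that each prediction equals the mean target of the 5 nearest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (hypothetical): noisy samples of sin(x) on [0, 5]
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=80)

# KNN regressor with the default uniform weighting
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X, y)

X_new = np.array([[1.0], [2.5], [4.0]])
y_new = reg.predict(X_new)

# The prediction is the plain average of the 5 nearest targets
_, neighbor_idx = reg.kneighbors(X_new)
manual_average = y[neighbor_idx].mean(axis=1)

print(y_new)
print(np.allclose(y_new, manual_average))
```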

---

## 📈 Pros and Cons

### ✅ Pros:

* Simple and easy to implement
* No assumptions about data distribution
* Effective with small datasets

### ❌ Cons:

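* Slow at prediction time with large datasets (a brute-force search computes the distance to every training point)
* Sensitive to irrelevant features and to features on different scales (scale the features before fitting)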
* Doesn’t work well with high-dimensional data (curse of dimensionality)
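
The effect of feature scale can be seen by comparing KNN with and without standardization. This sketch uses scikit-learn's Wine dataset, chosen here only because its features span very different ranges; K=5 and 5-fold cross-validation are arbitrary settings for the comparison.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# KNN on the raw features: large-valued features dominate the distance
raw_knn = KNeighborsClassifier(n_neighbors=5)

# Same model, but each feature is standardized inside the pipeline,
# so the scaler is fitted only on the training folds
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_score = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_knn, X, y, cv=5).mean()

print("Raw features:   ", round(raw_score, 3))
print("Scaled features:", round(scaled_score, 3))
```

Putting the scaler in a pipeline keeps test-fold statistics out of the training step, which avoids leaking information during cross-validation.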

---

## 📊 When to Use KNN?

* You need a **baseline classifier**
* Data is **low-dimensional**
* The decision boundary is **non-linear** and you want a **flexible model**

---

## 🧠 Summary Table

| Feature          | Description                              |
| ---------------- | ---------------------------------------- |
| Algorithm Type   | Supervised (classification & regression) |
| Model Type       | Instance-based (lazy learner)            |
| Distance Metrics | Euclidean, Manhattan, Minkowski, Hamming |
| Best K           | Determined via cross-validation          |
| Good For         | Small datasets, non-linear relationships |

---


# Euclidean Distance