# K-Nearest Neighbors (KNN)

KNN is one of the simplest and most intuitive supervised learning algorithms. It is often called a **"Lazy Learner"** because it doesn't learn a discriminative function from the training data but memorizes the training dataset instead.

**Core Philosophy:** *"Tell me who your neighbors are, and I'll tell you who you are."*



### 1. How It Works
KNN assumes that similar things exist in close proximity.

**The Algorithm Steps:**
1.  **Choose K:** Select the number of neighbors (e.g., $K=5$).
2.  **Calculate Distance:** Find the distance between the new data point and every point in the training set.
3.  **Find Neighbors:** Sort distances and pick the $K$ nearest points.
4.  **Vote (Classification):** Assign the class that is most frequent among the neighbors (Mode).
    * *Regression:* Assign the average value of the neighbors (Mean).
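
The steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `knn_predict`, its `task` parameter, and the toy data are all made up for this example, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=5, task="classification"):
    """Predict the label (or value) of x_new from its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Step 2: Euclidean distance from x_new to every training point.
    diffs = X_train - np.asarray(x_new, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    # Step 4: majority vote (classification) or mean (regression).
    if task == "classification":
        # On a tie, Counter returns the label it encountered first.
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    return float(neighbor_targets.mean())


# Toy data (made up for illustration): two well-separated clusters.
X_toy = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]]
y_class = ["A", "A", "A", "B", "B", "B"]
y_value = [10.0, 12.0, 14.0, 100.0, 110.0, 120.0]

label = knn_predict(X_toy, y_class, [2.0, 2.0], k=3)
value = knn_predict(X_toy, y_value, [2.0, 2.0], k=3, task="regression")
print(label, value)
```

The query point `[2.0, 2.0]` sits next to the first cluster, so its three nearest neighbors all come from it: the vote is their shared class and the regression output is the mean of their three values.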

---

### 2. Distance Metrics (The Math)
How do we measure "closeness"?

* **Euclidean Distance (L2 Norm):** The straight-line distance between two points (the usual default).
    $$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

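* **Manhattan Distance (L1 Norm):** The distance traveling along grid lines (blocks). Sometimes preferred for high-dimensional data: it does not square the differences, so a single large coordinate gap dominates the total less.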
    $$d(p, q) = \sum_{i=1}^{n} |q_i - p_i|$$

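* **Minkowski Distance:** Generalized form with an order parameter $p$. It is written here for points $x$ and $y$, because $p$ now denotes the exponent rather than a point.
    $$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$
    * $p=1 \rightarrow$ Manhattan.
    * $p=2 \rightarrow$ Euclidean.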
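
As a quick check of how these metrics relate, the sketch below computes all of them with a single helper. The function name `minkowski` and its `order` argument are illustrative choices (`order` plays the role of the Minkowski parameter), and the two points are made up.

```python
import numpy as np


def minkowski(p_vec, q_vec, order):
    """Minkowski distance of the given order between two points."""
    diff = np.abs(np.asarray(q_vec, dtype=float) - np.asarray(p_vec, dtype=float))
    return float((diff ** order).sum() ** (1.0 / order))


p_vec, q_vec = [0.0, 0.0], [3.0, 4.0]

manhattan = minkowski(p_vec, q_vec, order=1)  # |3| + |4|
euclidean = minkowski(p_vec, q_vec, order=2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean one, which is why a choice of metric can change which neighbors count as "nearest".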

---

### 3. Choosing the Right 'K'
The value of $K$ controls the Bias-Variance Tradeoff.

* **Small K (e.g., $K=1$):**
    * **Overfitting (High Variance):** The decision boundary is very jagged. The model memorizes noise. If the nearest point is an outlier, the prediction is wrong.
* **Large K (e.g., $K=100$):**
    * **Underfitting (High Bias):** The decision boundary becomes overly smooth. The model ignores local patterns, and as $K$ approaches the size of the training set it simply predicts the overall majority class.
* **Optimal K:** Usually found via **Cross-Validation**. A common starting rule of thumb is $K \approx \sqrt{N}$, where $N$ is the number of training samples.
    * *Tip:* Choose an **Odd Number** for $K$ to avoid ties in binary classification.
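
One way to run that search is sketched below with scikit-learn's `cross_val_score`. The Iris dataset, the candidate range of odd values, and 5-fold cross-validation are assumptions made for the example, not recommendations from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; substitute your own features and labels.
X, y = load_iris(return_X_y=True)

# Odd values only, to avoid ties in binary problems.
candidate_ks = list(range(1, 22, 2))

mean_scores = {}
for k in candidate_ks:
    # Scaling lives inside the pipeline so each fold is scaled on its own training part.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 3))
```

Plotting `mean_scores` against `k` usually shows the pattern described above: accuracy is unstable for very small values and degrades again once `k` grows large.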



---

### 4. Critical Requirement: Feature Scaling
KNN calculates distances based on absolute numbers.

* **Scenario:**
    * Feature A: Age (18–90)
    * Feature B: Salary (20,000–200,000)
* **Problem:** A difference of 50 in Salary is tiny, but a difference of 50 in Age is massive. However, the Euclidean formula treats both differences of "50" equally. Because Salary values span a far larger numeric range, the Salary feature will dominate the distance calculation.
* **Solution:** You **MUST** use **StandardScaler** or **MinMaxScaler** before running KNN.
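
The effect is easy to reproduce by hand. The sketch below uses three hypothetical people and applies min-max scaling manually, with the Age and Salary ranges from the scenario above. In practice the scaler classes named above would do this for you; the helper `distances_from_first` is made up for the example.

```python
import numpy as np

# Hypothetical people as [age, salary].
people = np.array([
    [25.0, 50_000.0],  # the query person
    [65.0, 50_500.0],  # 40 years older, almost the same salary
    [26.0, 58_000.0],  # almost the same age, 8,000 higher salary
])


def distances_from_first(X):
    """Euclidean distance from row 0 to every other row."""
    return np.sqrt(((X[1:] - X[0]) ** 2).sum(axis=1))


raw = distances_from_first(people)

# Min-max scaling with the ranges from the scenario: Age 18-90, Salary 20,000-200,000.
mins = np.array([18.0, 20_000.0])
maxs = np.array([90.0, 200_000.0])
scaled = distances_from_first((people - mins) / (maxs - mins))

print("raw distances:   ", raw)
print("scaled distances:", scaled)
```

On the raw values the 65-year-old is the "nearest" neighbor because only the salary gap matters. After scaling, the 26-year-old becomes the nearest.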

---

### 5. Pros & Cons

| Advantages | Disadvantages |
| :--- | :--- |
| Simple to understand and implement. | **Slow Prediction:** With brute-force search, it must calculate the distance to *every* training point for *every* prediction ($O(N \cdot d)$ per query for $N$ training points and $d$ features). |
| No training period (Lazy Learner). | **Memory Intensive:** Must store the entire dataset. |
| Non-parametric (makes no assumptions about data distribution). | **Curse of Dimensionality:** Performance drops drastically as features increase (space becomes sparse). |

---

### 6. FAQ (Interview Questions)

**Q: Why is KNN called a "Lazy Learner"?**
**A:** Because it does not build a model during the training phase. It essentially does nothing until you ask it to make a prediction.

**Q: How does KNN handle outliers?**
**A:** Poorly, especially with small $K$. If $K=1$, a single outlier can change the decision boundary. Increasing $K$ smooths out the effect of outliers.
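
A small sketch of this effect, using made-up points in which one sample labelled 1 sits inside a cluster of class 0:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up training set: a class-0 cluster, a class-1 cluster, and one outlier.
X_train = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [1.0, 0.8],  # class 0
    [1.05, 1.0],                                                   # outlier labelled 1
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],                            # class 1
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])

# A query that lands right next to the outlier.
query = [[1.06, 1.0]]

pred_k1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train).predict(query)[0]
pred_k5 = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(query)[0]
print(pred_k1, pred_k5)
```

With one neighbor the prediction follows the outlier. With five, the four surrounding class-0 points outvote it.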

**Q: What is the "Curse of Dimensionality" in KNN?**
**A:** In high-dimensional space (many features), data points become very sparse. The gap between the distance to the "nearest" neighbor and the distance to the "farthest" neighbor shrinks relative to the distances themselves, so being "nearest" carries little information and the distance metric loses its meaning.
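
This distance concentration can be observed with random data. The sketch below assumes points drawn uniformly from the unit hypercube. The helper name, the sample size, and the list of dimensions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_to_farthest_ratio(n_dims, n_points=1000):
    """Ratio of the smallest to the largest distance from a random query point."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.sqrt(((points - query) ** 2).sum(axis=1))
    return float(distances.min() / distances.max())


ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, ratio in ratios.items():
    print(f"{d:>5} dims: nearest/farthest = {ratio:.3f}")
```

A ratio near 0 means the nearest neighbor is much closer than the farthest one. As the number of dimensions grows the ratio moves toward 1, meaning every point is roughly equally far away.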

---



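### 7. Code Implementation

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data only (Iris is a stand-in); substitute your own X and y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Always scale data for KNN! The pipeline fits the scaler on the training data only.
knn = make_pipeline(
    StandardScaler(),
    # metric='minkowski' with p=2 is the Euclidean distance
    KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
)

knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on the held-out split
```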