##### Hyperparameters

Focusing on specific hyperparameters helps us move from understanding the theory to tuning a high-performance model.

---

##### 1. `n_neighbors` (The "Scope" Parameter)

This is the most impactful hyperparameter. It determines how many neighbors contribute to the prediction.

- **Small $k$ (e.g., $1$ or $3$):** Captures local patterns but is highly sensitive to noise and outliers. This leads to **High Variance** (Overfitting).
    
- **Large $k$ (e.g., $50$ or $100$):** Smooths out the prediction by averaging over a wider area. This leads to **High Bias** (Underfitting), as the model may ignore local nuances.
    
- **Key Tip:** For binary classification, use an **odd number** for $k$ to avoid "tie votes" between the two classes. With three or more classes, ties can still occur even when $k$ is odd.
    
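One common way to pick $k$ is to compare cross-validated accuracy across a few candidate values. The sketch below is illustrative only: the Iris dataset, the candidate list, and 5-fold cross-validation are assumptions chosen to keep the example small, not recommendations from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris is used purely as a small, built-in example dataset.
X, y = load_iris(return_X_y=True)

# Candidate values of k, from very local (1) to very smooth (51).
candidate_ks = [1, 3, 5, 11, 21, 51]

# Mean 5-fold cross-validated accuracy for each candidate k.
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

# Pick the k with the highest mean accuracy.
best_k = max(mean_scores, key=mean_scores.get)

for k, score in mean_scores.items():
    print(f"k={k:>3}  mean accuracy={score:.3f}")
print("best k:", best_k)
```

On your own data, the same loop applies; only the dataset and the candidate range need to change.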

---

##### 2. `weights` (The "Influence" Parameter)

This decides whether all neighbors are equal or if "closer is better."

- **`uniform` (Default):** Every neighbor gets one equal vote.
    
- **`distance`:** Closer neighbors have more influence. Scikit-Learn weights each neighbor by the inverse of its distance, $1/\text{distance}$.
    

Example Comparison:

Imagine you are predicting the price of a house. Your $k=3$ nearest neighbors are:

1. **House A:** 10 meters away, price $500k.
    
2. **House B:** 50 meters away, price $400k.
    
3. **House C:** 100 meters away, price $400k.
    

- **Uniform Weight:** $(500 + 400 + 400) / 3 = \mathbf{\$433k}$.
    
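- **Distance Weight:** With weights of $1/\text{distance}$, House A receives about 77% of the total weight, House B about 15%, and House C about 8%. The prediction is $(0.1 \cdot 500 + 0.02 \cdot 400 + 0.01 \cdot 400) / 0.13 \approx \mathbf{\$477k}$, noticeably closer to House A's price.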
    
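The house example can be checked numerically. This is a minimal sketch: the manual calculation uses only the distances and prices listed above, while the Scikit-Learn part assumes, purely for illustration, that the houses sit on a line at those distances from a query point at position 0.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

distances = np.array([10.0, 50.0, 100.0])  # metres to houses A, B, C
prices = np.array([500.0, 400.0, 400.0])   # prices in thousands of dollars

# Uniform weighting: a plain average of the three prices.
uniform_pred = prices.mean()

# Distance weighting: each house is weighted by 1 / distance.
inverse = 1.0 / distances
shares = inverse / inverse.sum()           # fraction of total weight per house
distance_pred = (shares * prices).sum()

# Same comparison with Scikit-Learn (assumed 1-D layout, query at 0).
X = distances.reshape(-1, 1)
query = np.array([[0.0]])
sk_uniform = KNeighborsRegressor(n_neighbors=3, weights="uniform").fit(X, prices).predict(query)[0]
sk_distance = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X, prices).predict(query)[0]

print("weight shares:", np.round(shares, 3))
print(f"uniform:  {uniform_pred:.1f}k (sklearn: {sk_uniform:.1f}k)")
print(f"distance: {distance_pred:.1f}k (sklearn: {sk_distance:.1f}k)")
```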

---

##### 3. `metric` and `p` (The "Distance" Parameters)

These define how the "closeness" between points is actually calculated. By default, Scikit-Learn uses the **Minkowski distance** with `p=2`, which is the Euclidean distance. Minkowski is a generalized formula, and the parameter `p` selects which member of the family is used.

|**p Value**|**Distance Metric**|**Best Use Case**|
|---|---|---|
|**`p=1`**|**Manhattan** ($L_1$ Norm)|When you have many features (high dimensionality) or data follows a grid-like structure (city blocks).|
|**`p=2`**|**Euclidean** ($L_2$ Norm)|The standard "as-the-crow-flies" distance. Best for physical/spatial data.|
|**`p > 2`**|**Minkowski**|Used rarely, but it places even higher "penalties" on larger differences between individual features.|

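The choice of `p` can change which neighbor counts as "nearest". The toy data below is hypothetical and built so that the two metrics disagree: from the origin, the point $(5, 0)$ is closer under Manhattan distance ($5$ vs. $6$), while $(3, 3)$ is closer under Euclidean distance (about $4.24$ vs. $5$).

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one point on the diagonal, one on the x-axis.
X = [[3.0, 3.0], [5.0, 0.0]]
y = ["diagonal", "axis"]
query = [[0.0, 0.0]]

# Fit a 1-nearest-neighbor classifier for each value of p and record its vote.
predictions = {}
for p in (1, 2):
    model = KNeighborsClassifier(n_neighbors=1, metric="minkowski", p=p)
    model.fit(X, y)
    predictions[p] = model.predict(query)[0]

print(predictions[1])  # axis      (Manhattan: 5 < 6)
print(predictions[2])  # diagonal  (Euclidean: ~4.24 < 5)
```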
---

##### Crucial "Must-Know" Info

> [!IMPORTANT]
> **Feature Scaling is Mandatory:** Because KNN relies on distance, features with larger scales (e.g., salary in the hundreds of thousands) will drown out features with smaller scales (e.g., age in the tens) unless you rescale them first, for example with `StandardScaler` or `MinMaxScaler`.

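A convenient way to apply scaling is to put the scaler and the classifier in one pipeline, so the scaler is fitted only on the training portion of each cross-validation fold. The sketch below is an illustration under assumptions: the Wine dataset (whose features have very different ranges) and `n_neighbors=5` were picked only for the demonstration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Wine is used because its features live on very different scales.
X, y = load_wine(return_X_y=True)

# KNN on raw features versus KNN on standardized features.
raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

raw_score = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_knn, X, y, cv=5).mean()

print(f"without scaling: {raw_score:.3f}")
print(f"with scaling:    {scaled_score:.3f}")
```

On this dataset the scaled pipeline should score clearly higher, though the size of the gap depends on the data.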

> [!TIP]
> **The Curse of Dimensionality:** As you add more features (dimensions), the "distance" between points becomes less meaningful: distances concentrate, so the nearest and farthest neighbors end up almost equally far away. Try to keep your feature set lean when using KNN.

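The effect can be observed with random data. This sketch assumes uniformly random points in the unit hypercube and uses the relative contrast, (farthest - nearest) / nearest, as a rough indicator; both are illustrative choices rather than a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def relative_contrast(n_dims, n_points=500):
    """(farthest - nearest) / nearest Euclidean distance from one random query."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the number of dimensions grows.
contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>4} dimensions: relative contrast = {c:.2f}")
```

A small contrast means the "nearest" neighbor is barely nearer than any other point, which is why KNN tends to degrade with many features.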
##### Types of Distances

In KNN, distance is the "similarity ruler." To understand the specific metrics, it is easiest to look at **Minkowski Distance** first, as it is the mathematical parent of the others.

##### 1. Minkowski Distance (The Umbrella)

Minkowski distance is a generalized metric. By changing a single parameter, **$p$**, you can transform it into other distance types. It is only considered a "true" metric when $p \ge 1$, because the triangle inequality fails for $p < 1$.

Formula:

$$D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

- $n$ = number of dimensions (features).
    
- $x_i, y_i$ = coordinates of the two points in the $i^{th}$ dimension.
    
- $p$ = the Minkowski parameter (determines the distance type).
    
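The formula translates almost directly into code. The function below is a minimal pure-Python sketch; the name `minkowski_distance` and the two sample points are illustrative, not part of any library.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be >= 1 for a true metric")
    # Sum |x_i - y_i|^p over all dimensions, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

a, b = (1, 2), (4, 6)
manhattan = minkowski_distance(a, b, p=1)
euclidean = minkowski_distance(a, b, p=2)

print(manhattan)  # 7.0
print(euclidean)  # 5.0
```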

---

##### 2. Manhattan Distance ($p=1$)

Also known as **Taxicab distance** or **$L_1$ Norm**, it calculates distance as if you were traveling along a grid (like the streets of Manhattan). You can only move horizontally or vertically; no diagonals allowed.

Formula:

$$D(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Example:

Point A: $(1, 2)$ | Point B: $(4, 6)$

- Difference in $x$: $|1 - 4| = 3$
    
- Difference in $y$: $|2 - 6| = 4$
    
- **Total Distance:** $3 + 4 = \mathbf{7}$
    

---

##### 3. Euclidean Distance ($p=2$)

This is the most common metric, often called the **"as-the-crow-flies"** distance. It uses the Pythagorean theorem to find the direct straight-line distance between two points.

Formula:

$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Example:

Using the same points A: $(1, 2)$ and B: $(4, 6)$

- Square of $x$ difference: $(1 - 4)^2 = 9$
    
- Square of $y$ difference: $(2 - 6)^2 = 16$
    
- Sum: $9 + 16 = 25$
    
- **Total Distance:** $\sqrt{25} = \mathbf{5}$
    

---

##### Comparison Summary

|**Metric**|**Minkowski p**|**Logic**|**Best For...**|
|---|---|---|---|
|**Manhattan**|$1$|Sum of absolute differences|High-dimensional data or discrete/grid-based features.|
|**Euclidean**|$2$|Straight line (Pythagorean)|Physical distance or when features are dense and continuous.|
|**Chebyshev**|$\infty$|Maximum difference|Scenarios where only the single largest attribute difference matters.|

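The three rows of the table can be compared on the same pair of points, A $(1, 2)$ and B $(4, 6)$. This sketch uses NumPy's `np.linalg.norm` on the difference vector as a stand-in for each distance, and also shows the Minkowski value approaching the Chebyshev distance as $p$ grows.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
diff = a - b

manhattan = np.linalg.norm(diff, ord=1)       # |3| + |4|
euclidean = np.linalg.norm(diff, ord=2)       # sqrt(9 + 16)
chebyshev = np.linalg.norm(diff, ord=np.inf)  # max(|3|, |4|)

print(manhattan, euclidean, chebyshev)  # 7.0 5.0 4.0

# As p increases, the Minkowski distance shrinks toward the Chebyshev value.
large_p = {p: np.linalg.norm(diff, ord=p) for p in (3, 10, 50)}
for p, d in large_p.items():
    print(f"p={p:>2}: {d:.4f}")
```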


> **Why it matters:** In KNN, if you use Euclidean distance on high-dimensional data, the "distance" can become distorted (all points start looking equally far away). Switching to Manhattan ($p=1$) can yield better results in those cases, so it is worth comparing both with cross-validation.

---