<center><h1 style="color:green">K-Nearest Neighbors (KNN) Algorithm</h1></center>

**Key Characteristics:**
- **Non-parametric:** No assumptions about data distribution.
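- **Flexible:** Handles both numerical and categorical data, provided the distance metric suits the feature types (for example, Hamming distance for categorical features).
- **Simple:** Easy to implement, with few hyperparameters (mainly $k$ and the distance metric).
- **Versatile:** Suitable for both classification (majority vote) and regression (average or weighted average).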

**Mathematical Basis:**

**Distance Metrics:**
KNN uses distance metrics to measure the similarity between data points. Common metrics include:

**Euclidean Distance:**
$$\small d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

**Manhattan Distance:**
$$\small d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

**Minkowski Distance:**
$$\small d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{\frac{1}{p}}$$

**Special cases:**
- When $p=2$, it is equivalent to Euclidean distance.
- When $p=1$, it is equivalent to Manhattan distance.
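The special cases above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `minkowski_distance` and the sample points are chosen here for illustration and are not part of any library.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Illustrative points: the classic 3-4-5 right triangle.
x, y = (0.0, 0.0), (3.0, 4.0)

print(minkowski_distance(x, y, p=1))  # Manhattan: |3| + |4| = 7.0
print(minkowski_distance(x, y, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```

Larger values of $p$ put more emphasis on the single largest coordinate difference.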

**Hamming Distance (for categorical data):**
$$\small d(x, y) = \sum_{i=1}^{n} \mathbf{1}(x_i \neq y_i)$$
where $\mathbf{1}(x_i \neq y_i)$ equals 1 if $x_i \neq y_i$ and 0 otherwise.
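As a small illustration, the Hamming distance simply counts the positions at which two records disagree. The sketch below uses made-up categorical records; the function name `hamming_distance` is an illustrative choice.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each comparison yields True (1) or False (0), so sum() counts mismatches.
    return sum(a != b for a, b in zip(x, y))


# Hypothetical records: (color, size, shape)
item_a = ("red", "small", "round")
item_b = ("red", "large", "round")

print(hamming_distance(item_a, item_b))  # 1 (only the size differs)
```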

**Majority Voting (Classification):**
For classification tasks, the predicted class $\hat{y}$ for a query point is determined by the majority class among its $k$-nearest neighbors:
$$\small \hat{y} = \text{mode}(y_1, y_2, \dots, y_k)$$
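A minimal sketch of the voting step, assuming the labels of the $k$ nearest neighbors have already been collected into a list. `collections.Counter` is from the Python standard library; the function name `majority_vote` is illustrative. In this sketch, a tie is resolved in favor of the label that appears first in the list, which is only one of several reasonable tie-breaking rules.

```python
from collections import Counter


def majority_vote(neighbor_labels):
    """Return the most frequent label among the neighbors."""
    # most_common(1) returns a list with one (label, count) pair.
    return Counter(neighbor_labels).most_common(1)[0][0]


# Hypothetical labels of the k = 5 nearest neighbors.
print(majority_vote(["spam", "ham", "spam", "spam", "ham"]))  # spam
```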

**Averaging (Regression):**
For regression tasks, the predicted value $\hat{y}$ for a query point is the average value of its $k$-nearest neighbors:
$$\small \hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i$$

**Weighted KNN:**
Neighbors closer to the query point can be given more weight, so that distant neighbors, which are less likely to resemble the query, have less influence on the prediction:
$$\small \hat{y} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \quad \text{where } w_i = \frac{1}{d(x, x_i) + \epsilon}$$
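Here, $d(x, x_i)$ is the distance from the query point $x$ to neighbor $x_i$, and $\epsilon$ is a small positive constant that prevents division by zero when a neighbor coincides with the query point.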
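To see how weighting changes a regression prediction, the sketch below compares the plain average with the inverse-distance weighted average from the formula above. The neighbor values, distances, and the default `eps=1e-9` are assumptions chosen for illustration.

```python
def plain_average(values):
    """Unweighted mean of the neighbors' target values."""
    return sum(values) / len(values)


def weighted_average(values, distances, eps=1e-9):
    """Inverse-distance weighted mean: w_i = 1 / (d_i + eps)."""
    weights = [1.0 / (d + eps) for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical targets and distances of the k = 3 nearest neighbors.
neighbor_values = [10.0, 20.0, 30.0]
neighbor_distances = [1.0, 2.0, 4.0]

print(plain_average(neighbor_values))                         # 20.0
print(weighted_average(neighbor_values, neighbor_distances))  # about 15.71
```

The weighted prediction is pulled toward 10.0 because that neighbor is the closest and therefore carries the largest weight.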

**How KNN Works:**

**Choosing the Value of $k$:**
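- Smaller $k$: Captures local detail, but is more sensitive to noise (risk of overfitting).
- Larger $k$: Reduces noise sensitivity, but smooths over local detail (risk of underfitting).
- Odd $k$: Prevents tied votes in binary classification; with more than two classes, ties can still occur.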
- Cross-validation can help determine the optimal $k$.
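One way to apply cross-validation here is leave-one-out: each point is held out in turn, predicted from the remaining points, and the accuracy is compared across candidate values of $k$. The sketch below is a toy illustration using only the standard library (`math.dist` requires Python 3.8+). The dataset, including one deliberately mislabeled point, and the helper names `predict_label` and `loo_accuracy` are all assumptions made for this example.

```python
import math
from collections import Counter


def predict_label(train, query, k):
    """Classify query by majority vote; train is a list of (features, label)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict_label(rest, features, k) == label
    return hits / len(data)


# Toy data: two clusters plus one noisy point labeled "b" inside cluster "a".
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b"),
    ((1.1, 1.0), "b"),  # noisy label
]

candidates = [1, 3, 5]
scores = {k: loo_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)

for k in candidates:
    print(f"k={k}: accuracy={scores[k]:.3f}")
print("best k:", best_k)  # 3 on this toy data
```

On this toy data, $k=1$ is misled by the noisy point, while $k=5$ is so large that the other class outvotes the three genuine "a" points, so the intermediate value scores best.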

**Distance Calculation:**
Compute the distance between the test data point and each training data point using one of the above metrics.

**Find Nearest Neighbors:**
Identify the $k$-closest points based on the smallest distances.

**Prediction:**
- **Classification:** Perform majority voting to assign a class.
- **Regression:** Compute the average (or weighted average) of the $k$-nearest neighbors’ values.
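The three steps above (distance calculation, neighbor search, prediction) can be sketched end to end as follows. This is a brute-force illustration using Euclidean distance via `math.dist` (Python 3.8+), not an optimized implementation; the function names and the toy data are assumptions for this example.

```python
import math
from collections import Counter


def k_nearest_targets(train_X, train_y, query, k):
    """Return the targets of the k training points closest to the query."""
    # Step 1: distance from the query to every training point.
    pairs = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 2: keep the k smallest distances.
    pairs.sort(key=lambda pair: pair[0])
    return [y for _, y in pairs[:k]]


def knn_classify(train_X, train_y, query, k):
    """Step 3 (classification): majority vote among the k neighbors."""
    neighbors = k_nearest_targets(train_X, train_y, query, k)
    return Counter(neighbors).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k):
    """Step 3 (regression): unweighted mean of the k neighbors' values."""
    neighbors = k_nearest_targets(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)


# Toy data: two well-separated groups of points.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
labels = ["low", "low", "low", "high", "high", "high"]
values = [1.0, 1.5, 2.0, 6.0, 7.0, 6.5]

query = (1.8, 1.4)
print(knn_classify(X, labels, query, k=3))  # low
print(knn_regress(X, values, query, k=3))   # 1.5
```

Sorting all distances costs $O(n \log n)$ per query, which is fine for small datasets.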

**Advantages:**
- Easy to implement, with no explicit training phase: the model simply stores the training data (a "lazy learner").
- Adapts to new data without retraining, since newly added training examples are used directly in the next prediction.

**Disadvantages:**
- Computationally expensive for large datasets, as a brute-force search computes the distance to every training point for each prediction.
- Suffers from the **curse of dimensionality**: in high-dimensional spaces, distances between points become nearly equal, so the "nearest" neighbors are less meaningful.
- Sensitive to noisy and imbalanced data without preprocessing.
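The curse of dimensionality mentioned above can be observed in a small experiment: for uniformly random points, the ratio between the farthest and the nearest distance from a query shrinks toward 1 as the number of dimensions grows. The sketch below uses only the standard library; the function name `distance_contrast`, the sample size, and the seed are arbitrary choices for illustration, and the exact numbers depend on them.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of farthest to nearest distance from a random query point
    to n_points random points in the unit hypercube of dimension dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return max(dists) / min(dists)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  farthest/nearest = {distance_contrast(dim):.2f}")
```

A ratio close to 1 means the nearest neighbor is barely closer than the farthest point, so the neighbors carry little distinguishing information.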


<img src="2.png">

<img src="4.jpg" height=300>

<img src="3.png">  <img src="5.png">