# W7: K-Nearest Neighbors (KNN) for Classification
KNN is a non-parametric, instance-based learning algorithm. Unlike parametric models such as Naive Bayes, it does not assume any distribution over the data; instead, it predicts based on the similarity between examples.

### Problem Setup
Let the training dataset be
$$
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n, \quad x_i \in \mathbb{R}^m, \ y_i \in \{1, 2, \dots, K\}
$$
Given a **new example** $x_{new}$, our aim is to predict its class label $\hat{y}$.


### Distance Metric

KNN uses a **distance metric** to measure similarity between the test sample $x_{new}$ and each training sample $x_i$.
The most common choice is **Euclidean distance**:
$$
d(x_i, x_{new}) = \sqrt{\sum_{j=1}^m (x_{ij} - x_{new,j})^2}
$$

The algorithm computes this distance between the new example and every training sample.
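
The distance step can be sketched with NumPy broadcasting. This is a minimal illustration, assuming the training data is stored as an `(n, m)` array; the function name `euclidean_distances` and the toy data are hypothetical, not taken from the course code.

```python
import numpy as np

def euclidean_distances(X_train, x_new):
    """Return d(x_i, x_new) for every row x_i of X_train."""
    diff = X_train - x_new                      # broadcasts x_new over rows, shape (n, m)
    return np.sqrt(np.sum(diff ** 2, axis=1))   # one distance per training sample, shape (n,)

# Toy data (assumed for illustration): n = 3 samples, m = 2 features
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
x_new = np.array([0.0, 0.0])
dists = euclidean_distances(X_train, x_new)
```

Here `dists` holds one entry per training sample, in the same order as the rows of `X_train`.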

### K-Nearest Neighbors (KNN)
After computing distances, we select the **k closest samples** (the nearest neighbors):

$$
\text{KNN}(x_{new}) = \{ i_1, i_2, \dots, i_k \} \quad \text{such that} \quad d(x_{i_1}, x_{new}) \le d(x_{i_2}, x_{new}) \le \dots \le d(x_{i_k}, x_{new}) \le d(x_j, x_{new}) \quad \forall j \notin \{ i_1, \dots, i_k \}
$$
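
One way to pick the $k$ closest indices is `np.argpartition`, which avoids sorting all $n$ distances. The sketch below assumes a precomputed distance vector and $k \le n$; the helper name `k_nearest_indices` and the sample distances are made up for illustration.

```python
import numpy as np

def k_nearest_indices(dists, k):
    """Indices of the k smallest distances, ordered from nearest to farthest."""
    # argpartition puts the k smallest distances in the first k slots (in no particular order)
    idx = np.argpartition(dists, k - 1)[:k]
    # sort only those k indices by their distance
    return idx[np.argsort(dists[idx])]

# Hypothetical distances from x_new to five training samples
dists = np.array([2.5, 0.3, 1.7, 0.9, 4.0])
neighbors = k_nearest_indices(dists, k=3)
```

The final `argsort` is optional for majority voting, since the vote does not depend on the order of the neighbors.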

### Majority Voting (Classification)

For **classification**, the predicted label is the **most frequent** class among the $k$ neighbors:

$$
\hat{y} = \underset{c \in \mathcal{C}}{\arg\max} \sum_{i \in \text{KNN}(x_{new})} \mathbb{I}(y_i = c)
$$

where $\mathbb{I}$ is the indicator function and $\mathcal{C} = \{1, 2, \dots, K\}$ is the set of class labels.
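
The voting rule amounts to counting labels among the neighbors and taking the largest count. The following sketch uses `np.unique` for the counts; the name `majority_vote` is hypothetical, and the tie-breaking behavior (the smallest class label wins) is an assumption of this sketch, since the notes do not specify how ties are resolved.

```python
import numpy as np

def majority_vote(neighbor_labels):
    """Return the most frequent label among the neighbors."""
    # classes is sorted ascending; counts[j] is how often classes[j] occurs
    classes, counts = np.unique(neighbor_labels, return_counts=True)
    # argmax returns the first maximum, so ties go to the smallest label (assumed rule)
    return classes[np.argmax(counts)]

# Labels of k = 3 hypothetical neighbors
y_hat = majority_vote(np.array([2, 1, 2]))
```

Choosing an odd $k$ avoids ties in the two-class case, but ties can still occur with more than two classes.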


### Evaluation Metric

* **Classification:**
  Accuracy, the fraction of correctly classified test points (the misclassification rate is its complement, $1 - \text{Accuracy}$):

$$
\text{Accuracy} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \mathbb{I}(\hat{y}_i = y_i)
$$
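
As a small worked example of the formula, the mean of the element-wise comparison gives the accuracy directly. The label vectors below are invented for illustration.

```python
import numpy as np

# Hypothetical true and predicted labels for n_test = 5 points
y_test = np.array([1, 2, 2, 3, 1])
y_pred = np.array([1, 2, 3, 3, 2])

# The boolean comparison plays the role of the indicator function
accuracy = np.mean(y_test == y_pred)
# Misclassification rate is the complement of accuracy
error_rate = 1.0 - accuracy
```

Three of the five predictions match, so `accuracy` is 0.6 and `error_rate` is 0.4.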

### Summary Table

| Concept                 | Formula                                                                 | Code                        |
| ----------------------- | ----------------------------------------------------------------------- | --------------------------- |
| Distance                | $d(x_i, x_{new}) = \sqrt{\sum_j (x_{ij} - x_{new,j})^2}$                | `distance_metric`           |
| K Nearest Neighbors     | $\text{KNN}(x_{new}) = \{ i_1, \dots, i_k \}$                           | `np.argpartition()`         |
| Classification Rule     | $\hat{y} = \arg\max_c \sum_{i \in KNN} \mathbb{I}(y_i = c)$             | `stats.mode(knn)[0]`        |
| Regression Rule         | $\hat{y} = \frac{1}{k} \sum_{i \in KNN} y_i$                            | `knn.mean()`                |
| Classification Accuracy | $\text{Accuracy} = \frac{1}{n_{test}} \sum \mathbb{I}(\hat{y}_i = y_i)$ | `np.mean(y_test == y_pred)` |
| RMSE (Regression)       | $\sqrt{\frac{1}{n_{test}} \sum (\hat{y}_i - y_i)^2}$                   | `error_vector`              |
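
The two regression rows of the table can be sketched in the same style. This is an illustrative sketch only: the function names `knn_regress` and `rmse` and the toy data are assumptions, and `error_vector` is used here simply as a local variable for the prediction errors.

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k):
    """Predict by averaging the targets of the k nearest training samples."""
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    knn = np.argpartition(dists, k - 1)[:k]   # indices of the k nearest samples
    return y_train[knn].mean()

def rmse(y_true, y_pred):
    """Root mean squared error over the test points."""
    error_vector = y_pred - y_true
    return np.sqrt(np.mean(error_vector ** 2))

# Toy one-dimensional data (assumed for illustration)
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_regress(X_train, y_train, np.array([0.9]), k=2)
```

With $k = 2$ the nearest samples to $0.9$ are those at $1.0$ and $0.0$, so the prediction is the mean of their targets.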


* **Choice of $k$:**
  * Small $k$: model becomes sensitive to noise (high variance).
  * Large $k$: model becomes too smooth (high bias).

* **Normalization:** Always scale features before applying KNN, as distance is sensitive to feature scales.

* **Complexity:**
  Brute-force prediction costs $O(n \times m)$ per test point (can be reduced with KD-trees or Ball trees).
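
To illustrate the normalization point, the sketch below standardizes each feature using training-set statistics and shows that the nearest neighbor can change. The helper `standardize` and the data (a height in meters and an income in dollars) are hypothetical; z-score scaling is one common choice, not necessarily the one used in the course.

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale both arrays using the mean and std of the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0          # guard against constant features
    return (X_train - mean) / std, (X_other - mean) / std

def nearest_index(X, x):
    """Index of the row of X closest to x in Euclidean distance."""
    return int(np.argmin(np.sqrt(np.sum((X - x) ** 2, axis=1))))

# Features on very different scales: [height in m, income in $] (made-up values)
X_train = np.array([[1.50, 30000.0], [1.90, 30100.0], [1.52, 32000.0]])
x_new = np.array([1.51, 30100.0])

raw_nearest = nearest_index(X_train, x_new)                # income dominates the distance
X_scaled, x_scaled = standardize(X_train, x_new)
scaled_nearest = nearest_index(X_scaled, x_scaled)         # both features contribute
```

On the raw data the income column dominates, so the sample with identical income but a very different height is nearest; after scaling, the sample that is close in both features is chosen instead.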