# W7: K-Nearest Neighbours (KNN) for Regression
KNN is a non-parametric, instance-based learning algorithm. Unlike parametric models such as Naive Bayes, it does not assume any distribution on the data; instead, it predicts based on the similarity between examples.

It estimates the target value for a new input $x_{\text{new}}$ as the **average of the target values of its nearest neighbors** in the feature space.

### Problem Setup
$$
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n, \quad x_i \in \mathbb{R}^m, \; y_i \in \mathbb{R}
$$

For a new input $x_{\text{new}}$, we wish to estimate:

$$
\hat{y}(x_{\text{new}}) \approx f(x_{\text{new}}) = \mathbb{E}[Y \mid X = x_{\text{new}}]
$$

However, since we do not assume any parametric form for $f$, we approximate it **empirically** using local averaging.


### Distance Metric

To find neighbors, we compute the Euclidean distance between the new example and every training example:

$$
d(x_i, x_{\text{new}}) = \sqrt{\sum_{j=1}^m (x_{ij} - x_{\text{new},j})^2}
$$

Then, we sort all points by increasing distance and select the indices of the $k$ nearest ones:

$$
\text{KNN}(x_{\text{new}}) = \{i_1, i_2, \dots, i_k\} \text{ where } d(x_{i_1}, x_{\text{new}}) \le \dots \le d(x_{i_k}, x_{\text{new}})
$$
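
The two steps above (compute all distances, then keep the $k$ smallest) can be sketched in NumPy as follows. This is a minimal illustration, not the course implementation: the function names `euclidean_distances` and `k_nearest_indices` and the toy data are assumptions made for this example, and a full sort is used for clarity.

```python
import numpy as np

def euclidean_distances(X, x_new):
    """Distance from every row of X, shape (n, m), to x_new, shape (m,)."""
    return np.sqrt(np.sum((X - x_new) ** 2, axis=1))

def k_nearest_indices(X, x_new, k):
    """Indices of the k training points closest to x_new, nearest first."""
    distances = euclidean_distances(X, x_new)
    # A stable sort keeps tied points in their original order.
    return np.argsort(distances, kind="stable")[:k]

# Toy data (hypothetical): four points in R^2.
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
x_new = np.array([0.0, 0.0])

print(euclidean_distances(X_train, x_new))   # [0. 5. 1. 2.]
print(k_nearest_indices(X_train, x_new, 2))  # [0 2]
```

When only the set of neighbors matters and not their order, `np.argpartition` can select the $k$ smallest distances without sorting the whole array.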

### Prediction Rule (Local Averaging)

The predicted value is the **mean of the target values** of these $k$ nearest neighbors:

$$
\hat{y}(x_{\text{new}}) = \frac{1}{k} \sum_{i \in \text{KNN}(x_{\text{new}})} y_i
$$

This can also be written as a **weighted average** over all training points, where each point gets an indicator weight:

$$
\hat{y}(x_{\text{new}}) = \frac{\sum_{i=1}^n w_i(x_{\text{new}}) y_i}{\sum_{i=1}^n w_i(x_{\text{new}})}, \quad
w_i(x_{\text{new}}) =
\begin{cases}
1, & \text{if } i \in \text{KNN}(x_{\text{new}}) \\
0, & \text{otherwise}
\end{cases}
$$

> Weighted KNN can use $w_i = \frac{1}{d(x_i, x_{\text{new}}) + \epsilon}$ to give closer points more importance.
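
The uniform and inverse-distance versions of the prediction rule can be compared in a short sketch. The function name `knn_predict`, its `weighted` flag, the default `eps`, and the one-dimensional toy data are all assumptions chosen for illustration.

```python
import numpy as np

def knn_predict(X, y, x_new, k, weighted=False, eps=1e-8):
    """Predict the target at x_new from its k nearest training points."""
    distances = np.sqrt(np.sum((X - x_new) ** 2, axis=1))
    nearest = np.argsort(distances, kind="stable")[:k]
    if not weighted:
        # Indicator weights: plain mean of the neighbors' targets.
        return y[nearest].mean()
    # Inverse-distance weights; eps avoids division by zero
    # when x_new coincides with a training point.
    weights = 1.0 / (distances[nearest] + eps)
    return np.sum(weights * y[nearest]) / np.sum(weights)

# Toy data (hypothetical): distances to x_new = 3 are 2, 1, 1, 5.
X_train = np.array([[1.0], [2.0], [4.0], [8.0]])
y_train = np.array([10.0, 20.0, 40.0, 80.0])
x_new = np.array([3.0])

uniform = knn_predict(X_train, y_train, x_new, k=3)                  # (20 + 40 + 10) / 3
weighted = knn_predict(X_train, y_train, x_new, k=3, weighted=True)  # about 26
print(uniform, weighted)
```

With inverse-distance weights the farthest of the three neighbors (target 10, distance 2) counts half as much as the two at distance 1, so the prediction moves from about 23.3 toward 26.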


### Bias–Variance Tradeoff
* Small $k$: low bias, high variance → overfits to local noise
* Large $k$: high bias, low variance → overly smooths the curve

Thus, $k$ controls the **locality** of the regression.
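
The two extremes can be checked directly. The sketch below assumes a synthetic noisy sine dataset and a hypothetical helper `knn_predict_many`; neither comes from the original notes. With $k=1$ the fit reproduces every noisy training target, and with $k=n$ it collapses to the global mean.

```python
import numpy as np

def knn_predict_many(X, y, X_query, k):
    """Uniform KNN predictions for every row of X_query."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X[None, :, :]
    distances = np.sqrt(np.sum(diffs ** 2, axis=2))
    nearest = np.argsort(distances, axis=1, kind="stable")[:, :k]
    return y[nearest].mean(axis=1)

# Synthetic data (assumption): noisy samples of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
n = 40
X_train = np.linspace(0.0, 2.0 * np.pi, n).reshape(-1, 1)
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.3, size=n)

fit_k1 = knn_predict_many(X_train, y_train, X_train, k=1)  # each point is its own neighbor
fit_kn = knn_predict_many(X_train, y_train, X_train, k=n)  # every point sees all targets

print(np.allclose(fit_k1, y_train))         # True: the noise is memorized
print(np.allclose(fit_kn, y_train.mean()))  # True: a flat line at the mean
```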

### Evaluation Metric — Mean Squared Error (MSE)

The standard **loss function** for evaluating regression predictions on a test set is the Mean Squared Error (MSE):

$$
\text{MSE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} (y_i - \hat{y}_i)^2
$$

or its square root, the **Root Mean Squared Error (RMSE)**, which is expressed in the same units as the target:

$$
\text{RMSE} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} (y_i - \hat{y}_i)^2}
$$
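
Both metrics take only a few lines of NumPy. In the sketch below the function names `mse` and `rmse` and the three-point example are assumptions for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(errors ** 2)

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return np.sqrt(mse(y_true, y_pred))

# Toy values (hypothetical): the errors are 1, 0 and -2.
y_test = np.array([3.0, 5.0, 7.0])
y_test_predicted = np.array([2.0, 5.0, 9.0])

print(mse(y_test, y_test_predicted))   # (1 + 0 + 4) / 3
print(rmse(y_test, y_test_predicted))  # square root of the value above
```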

### Summary
| Concept               | Mathematical Expression                                                  | Python Code Equivalent                                                    |
| --------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------------------------- |
| Distance              | $d(x_i, x_{\text{new}}) = \sqrt{\sum_j (x_{ij} - x_{\text{new},j})^2}$ | `distance_vector = self._distance_metric(self._X, new_example)`           |
| Nearest Neighbors     | $\text{KNN}(x_{\text{new}})$ = k smallest distances                      | `knn_indices = np.argpartition(distance_vector, self._k)[:self._k]`       |
| Regression Prediction | $\hat{y} = \frac{1}{k}\sum_{i \in \text{KNN}(x_{\text{new}})} y_i$       | `label = knn.mean()`                                                      |
| Sum of Squared Errors | $\sum_i (y_i - \hat{y}_i)^2$ (MSE before dividing by $n$)                | `np.sum(np.power(y_test - y_test_predicted, 2))`                          |
| RMSE                  | $\sqrt{\frac{1}{n}\sum (y - \hat{y})^2}$                                 | `np.sqrt((error_vector.T @ error_vector)/ error_vector.ravel().shape[0])` |


How increasing $k$ affects model smoothness:

* $k=1$: exact local fit (overfitting)
* $k=16$: smooth average (underfitting)

The goal is therefore to identify the $k$ that minimizes prediction error, giving the **best bias-variance tradeoff**. To do so, compare the error $E(k) = \sum_i (y_i - \hat{y}_i(k))^2$ across values of $k$.
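
One way to carry out this search is to evaluate $E(k)$ on held-out data for a range of $k$ and keep the minimizer. The sketch below assumes synthetic data, an 80/40 train/validation split, the range $k = 1, \dots, 16$, and the hypothetical helpers `knn_predict_many` and `squared_error_by_k`; none of these choices come from the original notes.

```python
import numpy as np

def knn_predict_many(X, y, X_query, k):
    """Uniform KNN predictions for every row of X_query."""
    diffs = X_query[:, None, :] - X[None, :, :]
    distances = np.sqrt(np.sum(diffs ** 2, axis=2))
    nearest = np.argsort(distances, axis=1, kind="stable")[:, :k]
    return y[nearest].mean(axis=1)

def squared_error_by_k(X_train, y_train, X_val, y_val, k_values):
    """E(k): sum of squared validation errors for each candidate k."""
    errors = {}
    for k in k_values:
        predictions = knn_predict_many(X_train, y_train, X_val, k)
        errors[k] = float(np.sum((y_val - predictions) ** 2))
    return errors

# Synthetic data (assumption): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(120, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=120)
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

errors = squared_error_by_k(X_train, y_train, X_val, y_val, range(1, 17))
best_k = min(errors, key=errors.get)
print(best_k, errors[best_k])
```

The error is measured on points that were not used as neighbors. Measured on the training set itself, $E(1)$ would be zero and $k=1$ would always win.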

| Aspect          | Description                    |
| --------------- | ------------------------------ |
| Model Type      | Non-parametric, Instance-based |
| Assumption      | Locally smooth function $f(x)$ |
| Prediction      | Mean of k nearest neighbors    |
| Hyperparameter  | Number of neighbors $k$        |
| Distance Metric | Euclidean (commonly)           |
| Loss            | MSE or RMSE                    |
| Bias-Variance   | Controlled by $k$              |