# **KNN Regressor Model Theory**


## Theory
K-Nearest Neighbors (KNN) Regressor is a simple, non-parametric algorithm that makes predictions based on the average of the target values of the nearest neighbors in the feature space. For a given input point, the algorithm identifies the `K` closest data points in the training set and predicts the target variable as the mean (or weighted mean) of their corresponding target values.

The model function for KNN Regressor is:

$$ f(x) = \frac{1}{K} \sum_{i=1}^{K} y_i $$

Where:
- $f(x)$ is the predicted output.
- $K$ is the number of nearest neighbors considered for the prediction.
- $y_i$ is the target value of the $i$-th nearest neighbor.

The main idea behind KNN is that similar data points (in terms of feature similarity) are likely to have similar target values.

## Model Training

### Forward Pass

The forward pass in KNN is simple: given a new input point $x$, the algorithm calculates the distances between $x$ and all points in the training set. The K closest training points are selected, and their target values are averaged (or weighted, if using weighted KNN) to produce the prediction.
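The forward pass for a single query point can be sketched in NumPy as follows. This is a minimal illustration that assumes Euclidean distance and uniform weights. `knn_predict_one` is a hypothetical helper name, and the toy data is made up.

```python
import numpy as np

def knn_predict_one(X_train, y_train, x, k=3):
    """Hypothetical helper: predict the target of one query point x
    as the mean target of its k nearest training points."""
    # Euclidean distance from x to every training point (one value per row)
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances (argsort sorts ascending)
    nearest = np.argsort(distances)[:k]
    # Uniform KNN: plain average of the neighbors' targets
    return float(y_train[nearest].mean())

# Toy data (illustrative only)
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([1.0, 2.0, 3.0, 50.0])
print(knn_predict_one(X_train, y_train, np.array([1.2]), k=3))  # 2.0
```

The point at $x = 10$ (target 50) is not among the three nearest neighbors of $1.2$, so it does not affect the prediction.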

### Distance Metric

The performance of KNN heavily depends on the choice of distance metric used to measure similarity between points. Common distance metrics include:
- **Euclidean Distance**: $d(x_1, x_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$
- **Manhattan Distance**: $d(x_1, x_2) = \sum_{i=1}^{n} |x_{1i} - x_{2i}|$
- **Minkowski Distance**: $d(x_1, x_2) = \left(\sum_{i=1}^{n} |x_{1i} - x_{2i}|^p\right)^{1/p}$, a generalization of both: $p = 1$ gives the Manhattan distance and $p = 2$ gives the Euclidean distance.
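The three metrics can be written directly from their formulas. The sketch below (plain NumPy; function names are illustrative) also checks that Minkowski with $p = 1$ and $p = 2$ reproduces Manhattan and Euclidean distance.

```python
import numpy as np

def euclidean(x1, x2):
    # sqrt(sum_i (x1_i - x2_i)^2)
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))

def manhattan(x1, x2):
    # sum_i |x1_i - x2_i|
    return float(np.sum(np.abs(x1 - x2)))

def minkowski(x1, x2, p):
    # (sum_i |x1_i - x2_i|^p)^(1/p)
    return float(np.sum(np.abs(x1 - x2) ** p) ** (1.0 / p))

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(euclidean(a, b))                          # 5.0
print(manhattan(a, b))                          # 7.0
print(minkowski(a, b, 1), minkowski(a, b, 2))   # 7.0 5.0
```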

### Cost Function

Unlike regression algorithms such as linear regression or SVM, KNN has no explicit cost function that is minimized during training and no learned model parameters. Instead, it makes predictions directly from the similarity between data points, and its error is measured after prediction with a performance metric such as Mean Squared Error (MSE).

A common choice is the MSE over the test set:

$$ \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (f(x^{(i)}) - y^{(i)})^2 $$

Where:
- $m$ is the number of test examples.
- $f(x^{(i)})$ is the predicted output for the $i$-th test example.
- $y^{(i)}$ is the actual target value for the $i$-th test example.

## Training Process

Training KNN is trivial because there is no explicit optimization or model fitting: KNN is a *lazy learner* that simply stores the training set and defers all computation to prediction time, when predictions are made based on the proximity of data points. The steps involved are:

1. **Store the training set**: No actual model training occurs. All training data is stored and used at prediction time.
2. **Determine K and the distance metric**: Select the number of nearest neighbors $K$ and the distance metric to be used (e.g., Euclidean or Manhattan distance).
3. **For each test point**:
   - Compute the distances from the test point to all points in the training set.
   - Sort the distances and select the $K$ nearest neighbors.
   - Calculate the average of the target values of the $K$ nearest neighbors to make the prediction.
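The steps above can be sketched as a small class with `fit` and `predict` methods. This is a minimal illustration, not scikit-learn's implementation: `SimpleKNNRegressor` is a hypothetical name, it assumes Euclidean distance and uniform weights, and it computes all pairwise distances at once (brute force).

```python
import numpy as np

class SimpleKNNRegressor:
    """Hypothetical from-scratch KNN regressor (uniform weights, Euclidean distance)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Step 1: "training" only stores the data
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Step 3a: pairwise Euclidean distances, shape (n_test, n_train)
        diffs = X[:, None, :] - self.X_train[None, :, :]
        distances = np.sqrt((diffs ** 2).sum(axis=2))
        # Step 3b: indices of the k nearest training points per test point
        nearest = np.argsort(distances, axis=1)[:, : self.k]
        # Step 3c: average the neighbors' targets
        return self.y_train[nearest].mean(axis=1)

# Toy data (illustrative only): y = x^2
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]
model = SimpleKNNRegressor(k=2).fit(X_train, y_train)
print(model.predict([[0.4], [3.6]]))  # averages of (0, 1) and (16, 9)
```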

KNN is sensitive to the choice of $K$, as well as to the scale of the features. Feature scaling (e.g., normalization or standardization) is often necessary to prevent features with larger scales from dominating the distance calculation.
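The effect of feature scale can be seen on a tiny made-up dataset. In the sketch below, the second feature spans a much larger range than the first, so it dominates the raw Euclidean distance; after standardizing with the training mean and standard deviation, a different training point becomes the nearest neighbor. The data and the helper `nearest_index` are hypothetical.

```python
import numpy as np

# Toy training set: feature 1 in [0, 1], feature 2 in [0, 100]
X_train = np.array([[0.0, 50.0],
                    [1.0, 60.0],
                    [0.0, 100.0],
                    [1.0, 0.0]])
x_query = np.array([1.0, 52.0])

def nearest_index(X, x):
    # Index of the training row with the smallest Euclidean distance to x
    return int(np.argmin(np.sqrt(((X - x) ** 2).sum(axis=1))))

# Unscaled: the large-range feature 2 dominates the distance
print(nearest_index(X_train, x_query))  # 0

# Standardize using statistics from the training set only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
print(nearest_index((X_train - mean) / std, (x_query - mean) / std))  # 1
```

In scikit-learn, the same idea is usually expressed as a `Pipeline` that applies `StandardScaler` before `KNeighborsRegressor`, so the scaling statistics are learned from the training data only.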

## Hyperparameters

The key hyperparameters of KNN Regressor are:
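- **K (number of neighbors)**: Controls how many nearest neighbors to consider when making a prediction. A small $K$ can lead to overfitting (predictions follow noise in individual training points), while a large $K$ can lead to underfitting (predictions are averaged over too many, possibly dissimilar, points).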
- **Distance Metric**: The method used to compute distances between points. Common options include Euclidean, Manhattan, or Minkowski.
- **Weights**: Determines how the neighbors contribute to the prediction. Options include:
  - **Uniform**: All neighbors contribute equally.
  - **Distance**: Each neighbor is weighted by the inverse of its distance to the query point, so closer neighbors have more influence on the prediction.
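With distance weighting, the prediction becomes a weighted mean, $f(x) = \sum_{i=1}^{K} w_i y_i / \sum_{i=1}^{K} w_i$ with $w_i = 1 / d(x, x_i)$. The sketch below is a hypothetical NumPy helper (`knn_predict_weighted`). It assumes Euclidean distance, and when the query coincides with one or more training points (distance 0, where the inverse is undefined) it returns the mean target of those exact matches, which is a design choice of this sketch.

```python
import numpy as np

def knn_predict_weighted(X_train, y_train, x, k=3):
    """Hypothetical helper: inverse-distance-weighted KNN prediction for one point."""
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    d = distances[nearest]
    y = y_train[nearest]
    # Exact match(es): 1/0 is undefined, so use the mean target of the matches
    if np.any(d == 0):
        return float(y[d == 0].mean())
    w = 1.0 / d  # closer neighbors get larger weights
    return float(np.sum(w * y) / np.sum(w))

# Toy data (illustrative only)
X_train = np.array([[0.0], [1.0], [4.0]])
y_train = np.array([0.0, 10.0, 40.0])
print(knn_predict_weighted(X_train, y_train, np.array([2.0]), k=3))  # 15.0
```

Here the distances are 2, 1 and 2, giving weights 0.5, 1 and 0.5, so the prediction is $(0 + 10 + 20) / 2 = 15$, whereas the uniform mean of the same three targets would be about 16.7.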

By selecting the right hyperparameters, KNN can provide accurate regression predictions. However, KNN can be computationally expensive on large datasets: with brute-force search, every prediction requires computing the distance from the test point to every point in the training set. Tree-based indexes such as KD-trees and ball trees can reduce this cost, mainly for low-dimensional data.
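One common way to select $K$, the weighting scheme, and the Minkowski power $p$ is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on a `Pipeline` of `StandardScaler` and `KNeighborsRegressor`, scored by (negated) MSE. The synthetic dataset and the grid values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: y = sin(x0) + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 2))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

pipe = Pipeline([
    ("scaler", StandardScaler()),      # scale features before computing distances
    ("knn", KNeighborsRegressor()),
])
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 10, 20],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],                  # 1 = Manhattan, 2 = Euclidean
}
# scikit-learn maximizes scores, so MSE is passed as its negative
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MSE of the best setting
```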


## **Model Evaluation**

### 1. Mean Squared Error (MSE)

**Formula:**
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2
$$

**Description:**
- **Mean Squared Error (MSE)** is a widely used metric for evaluating the accuracy of regression models.
- It measures the average squared difference between the predicted values ($y_{\text{pred}}$) and the actual target values ($y_{\text{true}}$).
- The squared differences are averaged across all data points in the dataset.

**Interpretation:**
- A lower MSE indicates a better fit of the model to the data, as it means the model's predictions are closer to the actual values.
- MSE is sensitive to outliers because the squared differences magnify the impact of large errors.
- **Limitations:**
  - MSE can be hard to interpret because it is in squared units of the target variable.
  - It disproportionately penalizes larger errors due to the squaring process.
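A direct NumPy translation of the formula makes the effect of squaring visible. In this made-up example, changing a single prediction so that its error grows from 1 to 10 raises the MSE from 0.75 to 25.5.

```python
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    # Average of squared differences
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0, 9.0]
print(mse(y_true, [2.0, 6.0, 7.0, 10.0]))  # errors 1, -1, 0, -1   -> 0.75
print(mse(y_true, [2.0, 6.0, 7.0, 19.0]))  # errors 1, -1, 0, -10  -> 25.5
```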

---

### 2. Root Mean Squared Error (RMSE)

**Formula:**
$$
\text{RMSE} = \sqrt{\text{MSE}}
$$

**Description:**
- **Root Mean Squared Error (RMSE)** is a variant of MSE that provides the square root of the average squared difference between predicted and actual values.
- It is often preferred because it is in the same unit as the target variable, making it more interpretable.

**Interpretation:**
- Like MSE, a lower RMSE indicates a better fit of the model to the data.
- RMSE is also sensitive to outliers, because the errors are squared before averaging; taking the square root afterward does not remove this effect.
- **Advantages over MSE:**
  - RMSE provides a more intuitive interpretation since it is in the same scale as the target variable.
  - It can be more directly compared to the values of the actual data.
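The unit difference between MSE and RMSE is easiest to see with numbers. In this illustrative sketch every prediction is off by 4 units, so the MSE is 16 (squared units) while the RMSE is 4.0, in the same units as the target.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    # Square root of the mean squared error
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up targets; every prediction is off by exactly 4 units
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred = [204.0, 246.0, 304.0, 346.0]
print(rmse(y_true, y_pred))  # 4.0 (MSE would be 16.0)
```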

---

### 3. R-squared ($R^2$)

**Formula:**
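$$
R^2 = 1 - \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2}{\sum_{i=1}^{n} (y_{\text{true}_i} - \bar{y}_{\text{true}})^2}
$$
where SSR is the sum of squared residuals, SST is the total sum of squares, and $\bar{y}_{\text{true}}$ is the mean of the actual values.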

**Description:**
- **R-squared ($R^2$)**, also known as the coefficient of determination, measures the proportion of the variance in the target variable ($y_{\text{true}}$) that is explained by the model's predictions ($y_{\text{pred}}$).
- Its best possible value is 1 (a perfect fit). A value of 0 means the model does no better than always predicting the mean of $y_{\text{true}}$, and the value can be negative when the model does worse than that, for example on unseen test data.

**Interpretation:**
- A higher $R^2$ value suggests that the model explains a larger proportion of the variance in the target variable.
- However, $R^2$ does not provide information about the goodness of individual predictions or whether the model is overfitting or underfitting.
- **Limitations:**
  - $R^2$ can be misleading in cases of overfitting, especially with polynomial regression models. Even if $R^2$ is high, the model may not generalize well to unseen data.
  - It doesn’t penalize for adding irrelevant predictors, so adjusted $R^2$ is often preferred for models with multiple predictors.
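Computing $R^2$ from SSR and SST shows all three regimes described above. The sketch uses made-up values: a perfect fit gives 1, always predicting the mean gives 0, and predictions worse than the mean give a negative value.

```python
import numpy as np

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # SSR: sum of squared residuals
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # SST: total sum of squares
    return float(1.0 - ss_res / ss_tot)

y_true = [1.0, 2.0, 3.0, 4.0]                 # mean 2.5, SST = 5
print(r2(y_true, [1.0, 2.0, 3.0, 4.0]))       # perfect fit       -> 1.0
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))       # predicts the mean -> 0.0
print(r2(y_true, [4.0, 3.0, 2.0, 1.0]))       # SSR = 20          -> -3.0
```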

---

### 4. Adjusted R-squared

**Formula:**
$$
\text{Adjusted } R^2 = 1 - \left(1 - R^2\right) \frac{n-1}{n-p-1}
$$
where $n$ is the number of data points and $p$ is the number of predictors.

**Description:**
- **Adjusted R-squared** adjusts the R-squared value for the number of predictors in the model, penalizing additional terms that do not improve the fit and thereby helping to guard against overfitting.
- Unlike $R^2$, it can decrease if the additional predictors do not improve the model significantly.

**Interpretation:**
- A higher adjusted $R^2$ suggests that the model's explanatory power is not merely a result of adding more predictors, since the number of predictors is taken into account.
- It is especially useful when comparing models with different numbers of predictors.
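The penalty is easy to see numerically. The sketch below applies the formula to the same $R^2 = 0.80$ on $n = 50$ samples with 2 versus 20 predictors. For KNN, which has no per-feature coefficients, it assumes $p$ is taken as the number of input features; the function name is illustrative.

```python
def adjusted_r2(r2, n, p):
    """n: number of data points, p: number of predictors (assumed: input features)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R^2 on 50 samples, different numbers of predictors
print(adjusted_r2(0.80, n=50, p=2))   # ~0.791
print(adjusted_r2(0.80, n=50, p=20))  # ~0.662
```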

---

### 5. Mean Absolute Error (MAE)

**Formula:**
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{true}_i} - y_{\text{pred}_i}|
$$

**Description:**
- **Mean Absolute Error (MAE)** measures the average of the absolute errors between the predicted and actual values.
- Unlike MSE and RMSE, MAE does not square the errors, so it is less sensitive to outliers: each error contributes in proportion to its size.

**Interpretation:**
- MAE provides a straightforward understanding of the average error magnitude.
- A lower MAE suggests better model accuracy, but it may not highlight the impact of large errors as much as MSE or RMSE.
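Comparing MAE and RMSE on the same predictions illustrates the difference in outlier sensitivity. In this made-up example, one error of 10 among errors of 1, 1 and 0 gives an MAE of 3.0 but an RMSE of about 5.05.

```python
import numpy as np

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    # Average of absolute differences
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 6.0, 7.0, 19.0]  # one large error of 10
print(mae(y_true, y_pred))      # 3.0
print(rmse(y_true, y_pred))     # ~5.05 (sqrt of 25.5)
```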

## sklearn template [scikit-learn: KNeighborsRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html)

### class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)

| **Parameter**        | **Description**                                                                                          | **Default**  |
|---------------------|----------------------------------------------------------------------------------------------------------|--------------|
| `n_neighbors`       | Number of neighbors to use for queries                                                                    | `5`          |
| `weights`           | Weight function: 'uniform' (equal weights), 'distance' (inverse distance weights), or callable             | `'uniform'`  |
| `algorithm`         | Algorithm for computing nearest neighbors: 'auto', 'ball_tree', 'kd_tree', or 'brute'                     | `'auto'`     |
| `leaf_size`         | Leaf size for BallTree or KDTree. Affects construction speed and memory usage                            | `30`         |
| `p`                 | Power parameter for Minkowski metric (p=1 Manhattan, p=2 Euclidean)                                       | `2`          |
| `metric`            | Distance metric to use ('minkowski', 'manhattan', 'euclidean', etc.)                                      | `'minkowski'`|
| `metric_params`     | Additional parameters for the metric function                                                             | `None`       |
| `n_jobs`           | Number of parallel jobs for neighbor search. None=1, -1=all processors                                    | `None`       |

---

| **Attribute**           | **Description**                                                                                        |
|------------------------|--------------------------------------------------------------------------------------------------------|
| `effective_metric_`     | The distance metric used (may be a synonym of the specified metric)                                    |
| `effective_metric_params_` | Additional parameters for the effective metric                                                        |
| `n_features_in_`       | Number of features seen during fit                                                                     |
| `feature_names_in_`    | Names of features seen during fit (if X has feature names)                                             |
| `n_samples_fit_`       | Number of samples in the fitted data                                                                   |

---

| **Method**             | **Description**                                                                                        |
|------------------------|--------------------------------------------------------------------------------------------------------|
| `fit(X, y)`            | Fit the KNN regressor model                                                                           |
| `predict(X)`           | Predict regression target for input samples                                                            |
| `score(X, y)`          | Return R² score (coefficient of determination)                                                         |
| `kneighbors(X)`        | Find the K nearest neighbors of each point (returns distances and indices)                             |
| `kneighbors_graph(X)`  | Compute the (weighted) graph of k-Neighbors                                                           |
| `get_params()`         | Get parameters of the regressor                                                                        |
| `set_params(**params)` | Set parameters of the regressor                                                                        |
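A minimal usage sketch of the methods listed above, on a made-up one-feature dataset. With `n_neighbors=2` and uniform weights, each prediction is the mean target of the two closest training points, and `kneighbors` returns the neighbor distances and indices sorted by distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative only): y = x^2
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

model = KNeighborsRegressor(n_neighbors=2, weights="uniform")
model.fit(X_train, y_train)

X_test = np.array([[0.4], [3.6]])
print(model.predict(X_test))             # means of (0, 1) and (16, 9)
distances, indices = model.kneighbors(X_test)
print(indices)                           # neighbor indices, nearest first
print(model.score(X_train, y_train))     # R^2 on the training data
```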


# XXXXXXXX regression - Example

## Data loading

## Data processing

## Plotting data

## Model definition

## Model evaluation