# K-Nearest Neighbors (KNN)

The **K-Nearest Neighbors (KNN)** algorithm is a simple yet effective supervised learning algorithm used for both **classification** and **regression** tasks. It is based on proximity: the prediction for a new input is computed from the outputs of the *k* training points closest to it.

## How KNN Works

Given a test sample, the algorithm follows these steps:

1. **Compute the distance** between the test sample and all training samples.
2. **Select the *k* nearest neighbors**, i.e., the *k* training samples with the smallest distances.
3. **Aggregate the output** values from the nearest neighbors:
   - For **classification**, predict the **most frequent class** (majority vote).
   - For **regression**, predict the **mean** of the neighbor outputs.
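The three steps above can be sketched for a single query point with plain NumPy. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function name `knn_predict_one` is hypothetical, the toy data is invented, and ties in the vote are broken toward the smallest label (a choice made here, not something KNN prescribes).

```python
import numpy as np


def knn_predict_one(X_train, y_train, x, k=3, kind="classification"):
    """Hypothetical helper: predict the output for one query point x."""
    # Step 1: Euclidean distance from x to every training sample.
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Step 2: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    neighbor_outputs = y_train[nearest]
    # Step 3: aggregate the neighbor outputs.
    if kind == "classification":
        # Majority vote; np.unique returns sorted labels, so argmax
        # breaks ties in favor of the smallest label.
        labels, counts = np.unique(neighbor_outputs, return_counts=True)
        return labels[np.argmax(counts)]
    # Regression: mean of the neighbor outputs.
    return neighbor_outputs.mean()


# Toy data (invented for illustration): two clusters in 2-D.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_class = np.array([0, 0, 1, 1, 1])
y_reg = np.array([1.0, 2.0, 3.0, 10.0, 12.0])

query = np.array([0.2, 0.1])
print(knn_predict_one(X_train, y_class, query, k=3))                   # -> 0 (labels 0, 0, 1)
print(knn_predict_one(X_train, y_reg, query, k=3, kind="regression"))  # -> 2.0 (mean of 1.0, 2.0, 3.0)
```

Sorting all distances with `np.argsort` keeps the sketch readable; it costs $O(N \log N)$ per query for $N$ training samples, which is why partial selection with `np.argpartition` is often preferred in vectorized code.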

## Distance Metric

KNN typically uses **Euclidean distance**:

$$
d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_{i=1}^{n}(x_i - x'_i)^2}
$$

where:
- $\mathbf{x}$ is the new (test) point,
- $\mathbf{x}'$ is a point from the training data,
- $n$ is the number of features.
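As a worked check of the formula, take two points with $n = 2$ features, $\mathbf{x} = (1, 2)$ and $\mathbf{x}' = (4, 6)$ (values chosen purely for illustration). The differences are $-3$ and $-4$, the squares sum to $9 + 16 = 25$, and the distance is $\sqrt{25} = 5$. The snippet reproduces this term by term and compares it with `np.linalg.norm`.

```python
import numpy as np

x = np.array([1.0, 2.0])        # test point
x_prime = np.array([4.0, 6.0])  # training point

diffs = x - x_prime               # [-3., -4.]
squared_sum = np.sum(diffs ** 2)  # 9 + 16 = 25
distance = np.sqrt(squared_sum)   # 5.0

# The Euclidean norm of the difference vector is the same quantity.
assert np.isclose(distance, np.linalg.norm(x - x_prime))
print(distance)  # 5.0
```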

Other distance metrics, such as Manhattan or Minkowski distance, can also be used.
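Manhattan and Euclidean distance are both special cases of the Minkowski distance of order $p \ge 1$,
$d_p(\mathbf{x}, \mathbf{x}') = \left(\sum_{i=1}^{n} |x_i - x'_i|^p\right)^{1/p}$,
with $p = 1$ and $p = 2$ respectively. A minimal NumPy sketch follows; the function name `minkowski_distance` is our own, chosen for illustration.

```python
import numpy as np


def minkowski_distance(x, x_prime, p=2):
    """Minkowski distance of order p >= 1 between two feature vectors."""
    return np.sum(np.abs(x - x_prime) ** p) ** (1.0 / p)


x = np.array([1.0, 2.0])
x_prime = np.array([4.0, 6.0])

manhattan = minkowski_distance(x, x_prime, p=1)  # |-3| + |-4| = 7
euclidean = minkowski_distance(x, x_prime, p=2)  # sqrt(9 + 16) = 5
print(manhattan, euclidean)  # 7.0 5.0
```

Changing the metric only affects step 1 of the algorithm. For whole matrices, `scipy.spatial.distance.cdist(X_test, X_train, metric='minkowski', p=1)` computes all test-to-train Manhattan distances at once.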

## Advantages
- Simple and intuitive.
- No explicit training phase (lazy learning): fitting only stores the training data.
- Naturally handles multi-class classification.

## Disadvantages
- Slow inference on large datasets, since each prediction requires computing the distance to every training sample.
- Performance is sensitive to the choice of $k$ and the distance metric.
- Performs poorly on high-dimensional data, where distances become less informative (curse of dimensionality).
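Two of these drawbacks are easy to see numerically. The sketch first uses a tiny 1-D training set (values invented for illustration) on which the predicted class flips between $k = 1$ and $k = 3$. It then draws random points in the unit hypercube to show that, as the number of features grows, the nearest and farthest points end up at similar distances from a query, which weakens the notion of a "nearest" neighbor. The helper names and the relative-contrast measure $(d_{\max} - d_{\min}) / d_{\min}$ are our own choices for this demonstration.

```python
import numpy as np


def majority_vote_1d(x_train, y_train, query, k):
    """Hypothetical helper: k-NN majority vote for 1-D features."""
    nearest = np.argsort(np.abs(x_train - query))[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Sensitivity to k: the single nearest point disagrees with the next two.
x_train = np.array([0.0, 1.0, 1.1, 1.2])
y_train = np.array([0, 1, 1, 1])
print(majority_vote_1d(x_train, y_train, query=0.4, k=1))  # class 0
print(majority_vote_1d(x_train, y_train, query=0.4, k=3))  # class 1


def relative_contrast(n_features, n_points=500, seed=0):
    """(max - min) / min distance from one random query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_features))
    query = rng.random(n_features)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()


# Curse of dimensionality: the contrast tends to shrink as the dimension grows.
for n_features in (2, 10, 100, 1000):
    print(n_features, round(relative_contrast(n_features), 3))
```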

## Import libraries

```python
import numpy as np
from scipy.stats import mode
```

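```python
class KNN:
    def __init__(self, n_neighbors=3, kind='classification'):
        """
        Initializes the KNN model.

        Parameters:
        - n_neighbors (int): Number of nearest neighbors to use.
        - kind (str): Type of prediction task, either 'classification' or 'regression'.
        """
        if kind not in ('classification', 'regression'):
            raise ValueError("kind must be 'classification' or 'regression'")
        self.data = None
        self.y = None
        self.labels = None
        self.n_neighbors = n_neighbors
        self.kind = kind

    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Stores the training data and target values (KNN has no training step).

        Parameters:
        - X (np.ndarray): Training feature matrix of shape (n_samples, n_features).
        - y (np.ndarray): Target values or numeric class labels of shape (n_samples,).
        """
        self.data = np.asarray(X)
        self.y = np.asarray(y)
        return self

    def predict(self, X: np.ndarray):
        """
        Predicts target values for the given input samples.

        Parameters:
        - X (np.ndarray): Test data of shape (n_samples_test, n_features).

        Returns:
        - np.ndarray: Predicted class labels or regression values, shape (n_samples_test,).
        """
        X = np.asarray(X)
        # Pairwise Euclidean distances via broadcasting:
        # (n_test, 1, n_features) - (n_train, n_features) -> (n_test, n_train, n_features),
        # then the norm over the feature axis gives shape (n_test, n_train).
        # The intermediate array needs O(n_test * n_train * n_features) memory.
        distances = np.linalg.norm(X[:, np.newaxis] - self.data, axis=2)
        # Indices of the k smallest distances per row (in no particular order).
        # kth = k - 1 stays valid even when k equals the number of training samples.
        k_nearest = np.argpartition(distances, self.n_neighbors - 1, axis=1)[:, :self.n_neighbors]
        # Outputs of the k nearest neighbors, shape (n_test, k).
        self.labels = self.y[k_nearest]
        if self.kind == 'classification':
            # Majority vote; keepdims=False (SciPy >= 1.9) returns shape (n_test,).
            return mode(self.labels, axis=1, keepdims=False).mode
        # Regression: average the neighbor outputs.
        return np.mean(self.labels, axis=1)
```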
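The two vectorized operations that `predict` relies on can be sanity-checked in isolation. The sketch below, on random data generated for illustration, confirms that broadcasting `X_test[:, np.newaxis] - X_train` gives the same distance matrix as an explicit double loop, and that `np.argpartition` with `kth = k - 1` selects the same neighbor set as a full `np.argsort`. Using `kth = k - 1` matters at the edge case where `k` equals the number of training samples, because `kth` must be smaller than the length of the partitioned axis.

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(size=(20, 3))  # 20 training samples, 3 features
X_test = rng.normal(size=(5, 3))    # 5 test samples

# Broadcasting: (5, 1, 3) - (20, 3) -> (5, 20, 3); norm over features -> (5, 20).
distances = np.linalg.norm(X_test[:, np.newaxis] - X_train, axis=2)

# Reference: the same matrix built with explicit loops.
loop_distances = np.array(
    [[np.sqrt(np.sum((a - b) ** 2)) for b in X_train] for a in X_test]
)
assert distances.shape == (5, 20)
assert np.allclose(distances, loop_distances)

for k in (1, 3, 20):  # k == 20 is the largest valid value here
    # argpartition only guarantees which indices land in the first k slots,
    # not their order, so compare the neighbor sets after sorting the indices.
    part = np.sort(np.argpartition(distances, k - 1, axis=1)[:, :k], axis=1)
    full = np.sort(np.argsort(distances, axis=1)[:, :k], axis=1)
    assert np.array_equal(part, full)
print("broadcasting and argpartition checks passed")
```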
