# How K-Nearest Neighbors (KNN) Works

KNN is one of the simplest yet surprisingly powerful algorithms in Machine Learning.  
Instead of learning an explicit function, it makes predictions by **looking at the closest data points in the training set**.

---

## Intuition
Imagine moving into a new neighborhood.  
If you want to know which sports you might like, you could simply ask your *k* nearest neighbors.  
- If most of them play football, chances are you will too.  
- If most of them play basketball, you’ll likely pick that.  

KNN works in the same way: **the majority class (for classification) or the average value (for regression) of the closest neighbors determines the prediction**.

---

## Steps of KNN
1. **Store all the training data** (KNN is a "lazy learner" – it doesn’t build a model in advance).  
2. **Choose a value of `k`** (number of neighbors to consider).  
3. **For a new data point:**
   - Compute its distance from all training points.  
   - Pick the *k* nearest neighbors.  
4. **Prediction:**
   - For **classification** → take the majority vote of the neighbors' labels.  
   - For **regression** → take the average of the neighbors' target values.  
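
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the full implementation: the helper name `knn_predict_one` and the toy data are made up for this example, and it assumes Euclidean distance with uniform voting.

```python
import numpy as np
from collections import Counter

def knn_predict_one(X_train, y_train, query, k=3, task="classification"):
    """Hypothetical helper: predict for a single query point with plain KNN."""
    # Step 1: the "model" is just the stored training data
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    query = np.asarray(query, dtype=float)

    # Step 3a: Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))

    # Step 3b: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]

    # Step 4: majority vote (classification) or mean (regression)
    if task == "classification":
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    return float(np.mean(y_train[nearest]))

# Toy data (made up for illustration): two well-separated clusters of 2-D points
X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]
labels = ["football", "football", "football",
          "basketball", "basketball", "basketball"]
scores = [10.0, 12.0, 14.0, 50.0, 52.0, 54.0]

print(knn_predict_one(X, labels, [1.2, 1.4], k=3))                     # football
print(knn_predict_one(X, scores, [8.4, 8.6], k=3, task="regression"))  # 52.0
```

Step 2 (choosing `k`) appears here only as the `k` argument; in practice it is usually tuned on held-out data.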

---
## Distance Metric

The most common is **Euclidean distance**:

$$
d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}
$$

Other choices include **Manhattan distance**, **Minkowski distance** (which generalizes both Euclidean and Manhattan), or **cosine distance** (one minus cosine similarity), depending on the problem.
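
As a rough sketch of how these metrics relate, the snippet below computes them with NumPy. The function names `minkowski` and `cosine_distance` are illustrative and not part of the notebook's class. Minkowski distance with `p=1` is Manhattan and with `p=2` is Euclidean. Cosine similarity measures closeness rather than distance, so it is turned into a distance here as one minus the similarity.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def cosine_distance(x, y):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a = [1.0, 0.0]
b = [4.0, 4.0]

print(minkowski(a, b, p=1))   # Manhattan: |1-4| + |0-4| = 7.0
print(minkowski(a, b, p=2))   # Euclidean: sqrt(3^2 + 4^2) = 5.0
print(cosine_distance(a, b))  # depends only on the angle between a and b
```

Unlike the other two, cosine distance ignores vector length: scaling `b` by any positive constant leaves it unchanged.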

---

## Key Points to Remember
- Small `k` → low bias but high variance: predictions are sensitive to noise (risk of overfitting).  
- Large `k` → smoother decision boundaries, but risk of underfitting.  
- Feature scaling (e.g., StandardScaler/MinMaxScaler) is essential: without it, features with larger numeric ranges dominate the distance.  
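
To make the scaling point concrete, here is a small sketch with made-up numbers (age in years, income in dollars). It standardizes by hand with the training mean and standard deviation, which is the same transformation scikit-learn's `StandardScaler` applies. The helper `nearest_index` is hypothetical.

```python
import numpy as np

# Made-up data: column 0 is age (years), column 1 is income (dollars)
X = np.array([[25.0, 50_000.0],
              [55.0, 50_500.0],
              [26.0, 58_000.0]])
query = np.array([27.0, 51_000.0])

def nearest_index(X, query):
    """Index of the training row closest to the query (Euclidean)."""
    return int(np.argmin(np.linalg.norm(X - query, axis=1)))

# Raw features: income differences (hundreds of dollars) swamp age differences
print(nearest_index(X, query))                # 1 (the 55-year-old)

# Standardize both features using statistics of the training data only
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std
query_scaled = (query - mean) / std

print(nearest_index(X_scaled, query_scaled))  # 0 (the 25-year-old)
```

Before scaling, the nearest neighbor is picked almost entirely by income. After scaling, both features contribute on a comparable footing and the neighbor with a similar age and income wins.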

---

Now that we understand how KNN works, let’s **implement it from scratch** step by step!


In [None]:
import numpy as np
from collections import Counter

class CustomKNN:
    """
    A simple K-Nearest Neighbors (KNN) implementation from scratch.
    Supports both classification and regression.

    Example:
        knn = CustomKNN(k=3, task="classification", weights="uniform")
        knn.fit(X_train, y_train)
        preds = knn.predict(X_test)
    """

    def __init__(self, k=5, task="classification", weights="uniform", metric="euclidean"):
        """
        Parameters:
        - k: number of neighbors to consider
        - task: "classification" or "regression"
        - weights: "uniform" (all neighbors equal) or "distance" (closer neighbors count more)
        - metric: "euclidean" or "manhattan"
        """
        self.k = k
        self.task = task
        self.weights = weights
        self.metric = metric
        self.X_train = None
        self.y_train = None

    def fit(self, X_train, y_train):
        """
        Just store the training data.
        KNN is a lazy learner, so no training actually happens here.
        """
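        # Convert to arrays so lists and pandas objects work with the
        # element-wise arithmetic and positional indexing used in predict().
        self.X_train = np.asarray(X_train, dtype=float)
        self.y_train = np.asarray(y_train)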

    def predict(self, X_test):
        """
        Predict labels/values for test points.
        """
        predictions = []

        # np.asarray lets X_test be a list of lists as well as an array
        for test_point in np.asarray(X_test, dtype=float):
            # Step 1: calculate distances from this test point to all training points
            distances = [self.calculate_distance(test_point, x) for x in self.X_train]

            # Step 2: get indices of k nearest neighbors
            k_indices = np.argsort(distances)[:self.k]

            # Step 3: find labels and distances of these neighbors
            k_neighbor_labels = [self.y_train[i] for i in k_indices]
            k_neighbor_distances = [distances[i] for i in k_indices]

            # Step 4: predict based on task
            if self.task == "classification":
                label = self.majority_vote(k_neighbor_labels, k_neighbor_distances)
                predictions.append(label)
            elif self.task == "regression":
                value = self.regression_average(k_neighbor_labels, k_neighbor_distances)
                predictions.append(value)
            else:
                raise ValueError("Task must be either 'classification' or 'regression'.")

        return np.array(predictions)

    def calculate_distance(self, point_A, point_B):
        """
        Distance between two points.
        """
        if self.metric == "euclidean":
            return np.linalg.norm(point_A - point_B)
        elif self.metric == "manhattan":
            return np.sum(np.abs(point_A - point_B))
        else:
            raise ValueError("Unsupported distance metric. Use 'euclidean' or 'manhattan'.")

    def majority_vote(self, neighbor_labels, neighbor_distances):
        """
        Return the most common label among neighbors.
        Supports uniform voting and distance-weighted voting.
        """
        if self.weights == "uniform":
            votes = Counter(neighbor_labels)
        elif self.weights == "distance":
            votes = {}
            for label, dist in zip(neighbor_labels, neighbor_distances):
                votes[label] = votes.get(label, 0) + 1 / (dist + 1e-5)  # avoid division by zero
        else:
            raise ValueError("Weights must be 'uniform' or 'distance'.")

        # Ties go to the label seen first, i.e. the one with the closest neighbor
        return max(votes, key=votes.get)

    def regression_average(self, neighbor_values, neighbor_distances):
        """
        Average of neighbors for regression.
        Supports uniform and distance-weighted averaging.
        """
        neighbor_values = np.array(neighbor_values)
        neighbor_distances = np.array(neighbor_distances)

        if self.weights == "uniform":
            return np.mean(neighbor_values)
        elif self.weights == "distance":
            weights = 1 / (neighbor_distances + 1e-5)
            return np.sum(weights * neighbor_values) / np.sum(weights)
        else:
            raise ValueError("Weights must be 'uniform' or 'distance'.")
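
The `predict` method above computes one distance at a time in a Python loop, which is easy to read but slow on larger datasets. As a possible alternative, the sketch below computes the whole test-by-train distance matrix at once with NumPy broadcasting. The helper `pairwise_distances` is hypothetical and not part of `CustomKNN`, and the small arrays are made up for illustration.

```python
import numpy as np

def pairwise_distances(X_test, X_train, metric="euclidean"):
    """Hypothetical helper: (n_test, n_train) distance matrix via broadcasting."""
    X_test = np.asarray(X_test, dtype=float)
    X_train = np.asarray(X_train, dtype=float)

    # Shape (n_test, 1, d) minus shape (1, n_train, d) -> (n_test, n_train, d)
    diff = X_test[:, None, :] - X_train[None, :, :]

    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError("Unsupported distance metric. Use 'euclidean' or 'manhattan'.")

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
X_test = np.array([[0.0, 0.0], [3.0, 0.0]])

D = pairwise_distances(X_test, X_train)
print(D.shape)  # (2, 3)

# Indices of the 2 nearest training points for every test point
k_indices = np.argsort(D, axis=1)[:, :2]
print(k_indices.tolist())  # [[0, 1], [0, 1]]
```

The trade-off is memory: the intermediate `diff` array holds `n_test * n_train * d` values, so for large inputs it may be necessary to process the test set in batches.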


---

🙏 **Thank you for taking the time to read through this notebook!**  
I hope you found it both **useful and enjoyable**.  

If you have any questions or suggestions, feel free to **connect with me**:  
[Email](mailto:amanak52141@gmail.com)  
[LinkedIn](https://www.linkedin.com/in/aman-kumar-a65133246/)  
[X (Twitter)](https://x.com/Amanncode)  

You can also check out my other projects here:  
[My GitHub](https://github.com/Aman-sys-ui/Machine_Learning)

---