
# How K-Nearest Neighbors (KNN) Works

KNN is one of the simplest yet surprisingly powerful algorithms in Machine Learning.  
Instead of learning an explicit function, it makes predictions by **looking at the closest data points in the training set**.

---

## Intuition
Imagine moving into a new neighborhood.  
If you want to guess which sport you might take up, you could simply ask your *k* nearest neighbors.  
- If most of them play football, chances are you will too.  
- If most of them play basketball, you’ll likely pick that.  

KNN works in the same way: **the majority class (for classification) or the average value (for regression) of the closest neighbors determines the prediction**.

---

## Steps of KNN
1. **Store all the training data** (KNN is a "lazy learner" – it doesn’t build a model in advance).  
2. **Choose a value of `k`** (number of neighbors to consider).  
3. **For a new data point:**
   - Compute its distance from all training points.  
   - Pick the *k* nearest neighbors.  
4. **Prediction:**
   - For **classification** → take the majority vote of the neighbors’ labels.  
   - For **regression** → take the average of the neighbors’ target values.  
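
The steps above can be sketched as a single function. This is only a minimal illustration, not the class built later in this notebook: the name `knn_predict_one`, the `task` parameter, and the toy data are assumptions made for the example.

```python
import numpy as np
from collections import Counter

def knn_predict_one(X_train, y_train, query, k=3, task="classification"):
    """Predict the target of a single query point (hypothetical helper)."""
    # Step 1: "training" is just keeping the data around as arrays
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    query = np.asarray(query, dtype=float)

    # Step 3a: Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)

    # Step 3b: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]

    # Step 4: majority vote for classification, mean for regression
    if task == "classification":
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    return float(np.mean(neighbor_targets))

# Toy data: two clusters of points
X = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
labels = ["football", "football", "basketball", "basketball"]
values = [10.0, 12.0, 30.0, 34.0]

label = knn_predict_one(X, labels, [1.2, 1.4], k=3)
value = knn_predict_one(X, values, [1.2, 1.4], k=3, task="regression")
print(label, value)
```

With `k=3`, the query's neighbors are the two points in the first cluster plus one point from the second, so the vote returns `football` and the regression result is the mean of 10, 12, and 30.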

---

## Distance Metric
The most common is **Euclidean distance**:

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

Other choices include Manhattan distance, Minkowski distance (which generalizes both Euclidean and Manhattan), or cosine distance (one minus cosine similarity), depending on the problem.
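
To make these metrics concrete, here is a small NumPy sketch. The function names `minkowski_distance` and `cosine_distance` are illustrative, not taken from any library, and the cosine version assumes neither vector is all zeros.

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def cosine_distance(x, y):
    """One minus cosine similarity (assumes both vectors are non-zero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a = [0.0, 3.0]
b = [4.0, 0.0]

euclidean = minkowski_distance(a, b, p=2)  # sqrt(4^2 + 3^2) = 5.0
manhattan = minkowski_distance(a, b, p=1)  # 4 + 3 = 7.0
cosine = cosine_distance(a, b)             # orthogonal vectors: 1.0
print(euclidean, manhattan, cosine)
```

Cosine distance depends only on the angle between the vectors, not on their lengths, which is why it tends to be used for data such as text vectors where magnitude is less informative.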

---

## Key Points to Remember
- Small `k` → high variance, more sensitive to noise (risk of overfitting).  
- Large `k` → smoother boundaries, but risk of underfitting.  
- Feature scaling (e.g., StandardScaler/MinMaxScaler) is essential: without it, features with larger numeric ranges dominate the distance computation.  
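
The effect of scaling can be seen on a tiny made-up dataset. The sketch below standardizes features by hand with NumPy, in the same spirit as StandardScaler; the data, column meanings, and the helper `nearest_index` are assumptions for illustration.

```python
import numpy as np

# Toy data (hypothetical): columns are height in metres and income in dollars
X_train = np.array([
    [1.60, 40_000.0],
    [1.65, 60_000.0],
    [1.85, 42_000.0],
    [1.90, 58_000.0],
])
query = np.array([1.61, 45_000.0])

def nearest_index(X, q):
    """Index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Without scaling, income differences (thousands) swamp height differences (fractions)
raw_nearest = nearest_index(X_train, query)

# Standardize using statistics from the training data only,
# then apply the same mean and std to the query
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
scaled_nearest = nearest_index((X_train - mean) / std, (query - mean) / std)

print(raw_nearest, scaled_nearest)
```

On the raw data the nearest neighbor is row 2, which is close in income but far in height. After standardization it is row 0, which is close in both features.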

---

Now that we understand how KNN works, let’s **implement it from scratch** step by step!


In [None]:
import numpy as np
from collections import Counter

class CustomKNN:
    """
    A simple K-Nearest Neighbors (KNN) implementation from scratch.
    Written for learning purposes.
    """

    def __init__(self, k=5):
        # number of neighbors to look at
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X_train, y_train):
        """
        Just store the training data.
        KNN doesn’t actually "learn" a model,
        it just remembers the data.
        """
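        # Convert to arrays so lists and pandas objects index and subtract correctly
        self.X_train = np.asarray(X_train)
        self.y_train = np.asarray(y_train)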

    def predict(self, X_test):
        """
        Predict labels for test points.
        For each test sample:
          1. calculate distances to all training points
          2. pick the k closest neighbors
          3. return the majority label
        """
        predictions = []

        for test_point in np.asarray(X_test):
            # Step 1: calculate distances from this test point to all training points
            distances = [self.calculate_distance(test_point, x) for x in self.X_train]

            # Step 2: sort distances and get indices of k nearest neighbors
            k_indices = np.argsort(distances)[:self.k]

            # Step 3: find the labels of these k neighbors
            k_neighbor_labels = [self.y_train[i] for i in k_indices]

            # Step 4: majority vote
            label = self.majority_vote(k_neighbor_labels)
            predictions.append(label)

        return np.array(predictions)

    def calculate_distance(self, point_A, point_B):
        """
        Euclidean distance between two points.
        """
        return np.linalg.norm(point_A - point_B)

    def majority_vote(self, neighbor_labels):
        """
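        Return the most common label among the k neighbors.
        Ties go to the tied label seen first; since predict() passes
        labels ordered by distance, that is the one with the closest neighbor.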
        """
        votes = Counter(neighbor_labels)
        return votes.most_common(1)[0][0]

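
The `predict` method above computes distances one pair at a time in Python loops, which is easy to follow but slow for larger datasets. As a possible alternative (not part of the class above), all test-to-train distances can be computed at once with NumPy broadcasting. The function name `pairwise_euclidean` is an assumption for this sketch.

```python
import numpy as np

def pairwise_euclidean(X_test, X_train):
    """Return an (n_test, n_train) matrix of Euclidean distances."""
    X_test = np.asarray(X_test, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    # (n_test, 1, d) - (1, n_train, d) broadcasts to (n_test, n_train, d)
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sqrt(np.sum(diffs ** 2, axis=2))

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
X_test = np.array([[0.0, 0.0], [3.0, 0.0]])

D = pairwise_euclidean(X_test, X_train)

# Indices of the 2 nearest training points for each test point
k_nearest = np.argsort(D, axis=1)[:, :2]
print(D.shape, k_nearest.tolist())
```

The trade-off is memory: the intermediate array holds `n_test * n_train * d` values, so for large datasets it may be necessary to process the test points in batches.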