🧠 Explanation (for Notebook)
🔹 What is KNN?

K-Nearest Neighbors (KNN) is a simple supervised learning algorithm used for classification (and also for regression, where the prediction is typically the average of the neighbors' values).
It predicts the label of a new data point by looking at the K closest samples in the training data.

⚙️ How it Works (Step-by-Step)

1️⃣ Store the data
Unlike most other algorithms, KNN doesn’t “learn” any parameters during training.
It just stores the dataset in memory and defers all the work to prediction time (this is why it’s called a lazy learner).

2️⃣ Measure distances
When we predict for a new sample, the algorithm calculates the Euclidean distance between the test point and all training points:

distance = √((x₁ - x₂)² + (y₁ - y₂)² + ... )


3️⃣ Find nearest neighbors
It then picks the K smallest distances — i.e., the closest points in the dataset.

4️⃣ Majority voting
Among those K neighbors, the algorithm checks which label appears most frequently and assigns that label to the new sample.
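
Steps 2 to 4 can be traced by hand on a tiny dataset. The sketch below is only an illustration: the four training points, the labels "A"/"B", the query point `x_new`, and `k = 3` are all made-up values chosen so the intermediate results are easy to check.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: four 2-D training points with string labels
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 8.0], [9.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])
x_new = np.array([1.5, 2.0])
k = 3

# Step 2: Euclidean distance from x_new to every training point
distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))

# Step 3: indices of the k smallest distances
nearest = np.argsort(distances)[:k]

# Step 4: majority vote among the labels of those neighbors
votes = Counter(y_train[nearest].tolist())
predicted = votes.most_common(1)[0][0]

print("distances:", distances.round(2))
print("nearest labels:", y_train[nearest])
print("predicted:", predicted)
```

Two of the three nearest neighbors carry the label "A", so the vote goes to "A" even though one "B" point makes it into the neighborhood.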

💡 Key Intuition

KNN assumes that similar data points exist close to each other in space.
So, if most of your nearby samples belong to class “A”, your new sample is probably “A” too.

🚀 Example Use Case

If you have labeled data (e.g., images of cats 🐱 and dogs 🐶),
KNN can classify a new image by checking which category its closest neighbors belong to.

✅ In short:
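KNN doesn’t build an explicit model; it relies purely on distances to the stored training samples, which makes it one of the most intuitive ML algorithms to understand and implement. The trade-off is that a brute-force implementation has to scan the whole training set for every prediction.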

In [1]:
import numpy as np
from collections import Counter

class KNN:
    def __init__(self, k=3):
        # Number of neighbors to consider
        self.k = k

    def fit(self, X, y):
        # Just store the training data; KNN is a lazy learner.
        # np.asarray lets plain Python lists work as input too.
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)

    def predict(self, X):
        # Predict labels for all test samples
        predictions = [self._predict(x) for x in X]
        return np.array(predictions)

    def _predict(self, x):
        # Compute Euclidean distances between x and all training samples
        # in one vectorized call (assumes X_train is 2-D: samples x features)
        distances = np.linalg.norm(np.asarray(self.X_train) - x, axis=1)
        # Get the indices of the k nearest neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Get the labels of those neighbors
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Return the most common label (majority vote)
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]
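
One detail of the vote is worth knowing: with an even k, two labels can tie. The standalone sketch below (the label values are hypothetical, and it is not part of the class) shows how `Counter.most_common` resolves that. Elements with equal counts are returned in the order they were first encountered, and since the labels are listed nearest-first, the tie goes to whichever tied label appears earliest in that list.

```python
from collections import Counter

# Labels of the k=4 nearest neighbors, ordered nearest-first (hypothetical values)
k_nearest_labels = [1, 0, 0, 1]

counts = Counter(k_nearest_labels)
# Both labels have two votes; most_common keeps first-encountered order for ties
winner = counts.most_common(1)[0][0]
print("votes for 0:", counts[0], "votes for 1:", counts[1], "winner:", winner)
```

Choosing an odd k avoids this situation in two-class problems, which is one reason k=3 is a common starting point.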


In [2]:
# --- Simple Test for KNN ---

# Sample dataset (X: features, y: labels)
X_train = np.array([
    [1, 2],
    [2, 3],
    [3, 4],
    [6, 7],
    [7, 8],
    [8, 9]
])
y_train = np.array([0, 0, 0, 1, 1, 1])  # 0 and 1 are the two classes

# Initialize and fit the model
model = KNN(k=3)
model.fit(X_train, y_train)

# Test points
X_test = np.array([
    [2, 3],  # closer to class 0
    [7, 7]   # closer to class 1
])

# Predict
predictions = model.predict(X_test)
print("Predictions:", predictions)


Predictions: [0 1]
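
The introduction mentions that KNN can also be used for regression. A minimal sketch of that variant follows, assuming the usual approach of averaging the target values of the k nearest neighbors instead of voting. The function name `knn_regress` and the toy data are hypothetical and not part of the class above.

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """Predict a numeric target for x as the mean target of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Average their targets instead of taking a majority vote
    return y_train[nearest].mean()

# Hypothetical 1-feature dataset: the last point is far from the query
X_reg = [[1.0], [2.0], [3.0], [10.0]]
y_reg = [10.0, 20.0, 30.0, 100.0]

prediction = knn_regress(X_reg, y_reg, [2.0], k=3)
print("Prediction:", prediction)
```

Only the three closest points (targets 10, 20 and 30) contribute to the average; the distant point with target 100 is ignored.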
