# k-Nearest Neighbors (k-NN)

- k-NN is a simple, non-parametric learning algorithm used for classification and regression tasks.

- It has no training stage in the traditional sense: there is no model fitting or weight learning, and it does not assume a specific mathematical form for the data. Instead, it stores all the training examples and makes predictions by looking at nearby points.

- To classify or predict a new data point:
    - Find the k nearest (most similar) points in the training set.
    - Use those k points to make a prediction (either by *majority voting* for **classification** or by *averaging* for **regression**).
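
The two prediction rules above can be written compactly. This is only a notational sketch: $N_k(x)$ is a symbol introduced here (not in the original notes) for the indices of the $k$ training points closest to the query $x$, and $\mathbf{1}[\cdot]$ is the indicator function.

$$
\hat{y}_{\text{clf}}(x) = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}[y_i = c],
\qquad
\hat{y}_{\text{reg}}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i
$$

The first rule counts votes per class among the neighbors; the second takes the plain (unweighted) mean of their target values.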

In [None]:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

In [None]:
def knn(X_train, y_train, X_test, k=3, task='classification'):
    """Manual k-NN: majority vote for task='classification', mean of neighbors otherwise (regression)."""
    y_pred = []
    for test_point in X_test:
        # Euclidean distance from the test point to every training point
        distances = [np.linalg.norm(test_point - x) for x in X_train]
        # Indices of the k smallest distances
        k_nearest_indices = np.argsort(distances)[:k]
        k_nearest_labels = [y_train[i] for i in k_nearest_indices]

        if task == 'classification':
            # Majority voting; a tie between classes is broken arbitrarily
            predicted_label = max(set(k_nearest_labels), key=k_nearest_labels.count)
        else:
            # Averaging the neighbors' targets for regression
            predicted_label = np.mean(k_nearest_labels)
        
        y_pred.append(predicted_label)
    
    return np.array(y_pred)

### 1) Classification

**Dataset:** Students

| Study Hours (*x*) | Pass (1) / Fail (0) (*y*) |
|----------------|-----------------------|
| 1              | 0 (Fail)              |
| 2              | 0 (Fail)              |
| 3              | 1 (Pass)              |
| 4              | 1 (Pass)              |
| 5              | 1 (Pass)              |

- We want to predict whether a student who studied for **3.2 hours** will pass.

In [109]:
#-- Sample Data for Classification
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([0, 0, 1, 1, 1])
X_test = np.array([[3.2]])

In [110]:
# Manual k-NN Classification
manual_pred = knn(X_train, y_train, X_test, k=3, task='classification')
print("Manual k-NN Classification Prediction:", manual_pred)

Manual k-NN Classification Prediction: [1]


In [111]:
# Sklearn k-NN Classification
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)
sklearn_pred = knn_clf.predict(X_test)
print("Sklearn k-NN Classification Prediction:", sklearn_pred)

Sklearn k-NN Classification Prediction: [1]
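
To see why the prediction is 1, it can help to look at which training points were actually used. The sketch below rebuilds the same toy data and calls `kneighbors` on the fitted classifier; the variable names `neighbor_dist`, `neighbor_idx`, `neighbor_hours`, and `neighbor_labels` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same toy data as the classification example above
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([0, 0, 1, 1, 1])
X_test = np.array([[3.2]])

knn_clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# kneighbors returns (distances, indices), each of shape (n_queries, k),
# ordered from nearest to farthest
neighbor_dist, neighbor_idx = knn_clf.kneighbors(X_test)

neighbor_hours = X_train[neighbor_idx[0]].ravel()
neighbor_labels = y_train[neighbor_idx[0]]

print("Neighbor study hours:", neighbor_hours)
print("Neighbor distances:", np.round(neighbor_dist[0], 2))
print("Neighbor labels:", neighbor_labels)
```

For 3.2 hours, the three nearest students studied 3, 4, and 2 hours. Two of them passed and one failed, so the majority vote is Pass (1).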


### 2) Regression

**Dataset:** Houses

| Square Feet (*x*) | House Price ($K) (*y*) |
|------------------|---------------------|
| 1000            | 200                 |
| 1200            | 220                 |
| 1300            | 250                 |
| 1500            | 275                 |
| 1600            | 300                 |

- We want to predict the price of a **1400 sq. ft. house**.

In [113]:
#-- Sample Data for Regression
X_train_reg = np.array([[1000], [1200], [1300], [1500], [1600]])
y_train_reg = np.array([200, 220, 250, 275, 300])
X_test_reg = np.array([[1400]])

In [114]:
# Manual k-NN Regression
manual_pred_reg = knn(X_train_reg, y_train_reg, X_test_reg, k=3, task='regression')
print("Manual k-NN Regression Prediction:", manual_pred_reg)

Manual k-NN Regression Prediction: [248.33333333]


In [115]:
# Sklearn k-NN Regression
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X_train_reg, y_train_reg)
sklearn_pred_reg = knn_reg.predict(X_test_reg)
print("Sklearn k-NN Regression Prediction:", sklearn_pred_reg)

Sklearn k-NN Regression Prediction: [248.33333333]
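
One detail in this example is easy to miss: the 1200 and 1600 sq. ft. houses are both exactly 200 sq. ft. away from the query, so the third neighbor is a tie. The sketch below (plain NumPy, with illustrative variable names such as `candidate_preds`) computes the prediction for each way of breaking that tie. It is meant as a hand check, not as a description of how either implementation resolves ties internally.

```python
import numpy as np

# Same toy data as the regression example above
X_train_reg = np.array([[1000], [1200], [1300], [1500], [1600]])
y_train_reg = np.array([200, 220, 250, 275, 300])
query_sqft = 1400

# Absolute distance from the query to every training house (one feature only)
dists = np.abs(X_train_reg.ravel() - query_sqft)

# 1300 and 1500 are the two closest (distance 100); 1200 and 1600 tie at 200
closest_two = np.flatnonzero(dists == 100)
tied_third = np.flatnonzero(dists == 200)

candidate_preds = []
for third in tied_third:
    chosen = np.append(closest_two, third)
    pred = y_train_reg[chosen].mean()
    candidate_preds.append(pred)
    print(f"third neighbor = {X_train_reg[third, 0]} sq. ft. -> prediction = {pred:.2f}")
```

Keeping the 1200 sq. ft. house gives (220 + 250 + 275) / 3 ≈ 248.33, which matches both outputs above; keeping the 1600 sq. ft. house would give 275 instead. With small datasets and evenly spaced features, results can therefore depend on tie-breaking.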


In [None]:
#-- Curse of Dimensionality Example
# Sample 100 uniform random points in [0, 1)^d and measure all pairwise Euclidean distances
np.random.seed(42)
dimensions = [1, 5, 10, 50, 100]  # increasing feature dimensions
for d in dimensions:
    X_rand = np.random.rand(100, d)
    # Each unordered pair (i, j) with i < j is counted once
    distances = [np.linalg.norm(X_rand[i] - X_rand[j]) for i in range(99) for j in range(i+1, 100)]
    print(f"Dim: {d}, Avg Distance: {np.mean(distances):.4f}, Std: {np.std(distances):.4f}")

Dim: 1, Avg Distance: 0.3445, Std: 0.2416
Dim: 5, Avg Distance: 0.9092, Std: 0.2543
Dim: 10, Avg Distance: 1.2774, Std: 0.2475
Dim: 50, Avg Distance: 2.8744, Std: 0.2399
Dim: 100, Avg Distance: 4.0685, Std: 0.2478
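
In the output above, the average distance grows with the dimension while the standard deviation stays near 0.25, so the spread relative to the mean shrinks: roughly 0.2416 / 0.3445 ≈ 0.70 at 1 dimension versus 0.2478 / 4.0685 ≈ 0.06 at 100 dimensions. In other words, all points become almost equally far apart, and "nearest" carries less information. The sketch below makes that ratio explicit. It assumes a fresh `np.random.default_rng` generator and a vectorized pairwise computation, so its numbers will not match the cell above digit for digit.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: separate generator from the cell above
n_points = 100

# Upper-triangle index pairs (i < j) cover each pair of points exactly once
i_idx, j_idx = np.triu_indices(n_points, k=1)

ratios = []
for d in [1, 5, 10, 50, 100]:
    X_rand = rng.random((n_points, d))
    pair_dist = np.linalg.norm(X_rand[i_idx] - X_rand[j_idx], axis=1)
    # Relative spread: how much distances vary compared to their typical size
    ratio = pair_dist.std() / pair_dist.mean()
    ratios.append(ratio)
    print(f"Dim: {d}, Std / Avg Distance: {ratio:.3f}")
```

A shrinking ratio is one common way to illustrate the curse of dimensionality for distance-based methods such as k-NN.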
