The k-Nearest Neighbors (k-NN) algorithm is a simple, intuitive machine learning algorithm for classification and regression. It predicts the label of a new data point from its k nearest neighbors in the feature space: the majority class for classification, or the average value for regression. Here's a numeric example to illustrate k-NN classification:

## Example:

Let's consider a dataset with two features, X1 and X2, and a binary classification label (Class) indicating whether a point belongs to class 0 or class 1.

```
+-----------------+
| X1 | X2 | Class |
+-----------------+
| 2  | 3  |   0   |
| 5  | 4  |   1   |
| 9  | 6  |   1   |
| 4  | 2  |   0   |
| 7  | 5  |   1   |
+-----------------+
```

## New Point:

Now, suppose we have a new data point with X1 = 6 and X2 = 4, and we want to determine its class.

## Steps:

1. Choose k:

Let's say k = 3. With two classes, an odd k rules out tied votes.

2. Calculate Distances:

Calculate the Euclidean distance between the new point and all existing points in the dataset.

    Distance to point (2, 3): sqrt((6-2)^2 + (4-3)^2) = sqrt(16 + 1) = 4.12
    Distance to point (5, 4): sqrt((6-5)^2 + (4-4)^2) = sqrt(1) = 1
    Distance to point (9, 6): sqrt((6-9)^2 + (4-6)^2) = sqrt(9 + 4) = 3.61
    Distance to point (4, 2): sqrt((6-4)^2 + (4-2)^2) = sqrt(4 + 4) = 2.83
    Distance to point (7, 5): sqrt((6-7)^2 + (4-5)^2) = sqrt(1 + 1) = 1.41
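
The distance calculations can be reproduced with a few lines of NumPy. This is a minimal sketch; the variable names `points`, `new_point`, and `distances` are chosen here for illustration and do not come from the original example.

```python
import numpy as np

# Training points from the table above, one row per point: (X1, X2)
points = np.array([[2, 3], [5, 4], [9, 6], [4, 2], [7, 5]], dtype=float)
new_point = np.array([6, 4], dtype=float)

# Euclidean distance from the new point to every training point:
# square the per-feature differences, sum them per row, take the square root
distances = np.sqrt(((points - new_point) ** 2).sum(axis=1))

for p, d in zip(points, distances):
    print(f"Distance to point ({p[0]:.0f}, {p[1]:.0f}): {d:.2f}")
```

Broadcasting subtracts `new_point` from every row of `points`, so no explicit loop is needed for the computation itself.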


3. Find Nearest Neighbors:

For k = 3, find the three nearest neighbors based on the calculated distances.

    Nearest neighbors: (5, 4), (7, 5), (4, 2)
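
One way to pick the k nearest neighbors in code is to sort the indices by distance and keep the first k. The sketch below assumes the same five training points and uses `np.argsort`; the names `nearest_idx` and `nearest_points` are illustrative.

```python
import numpy as np

points = np.array([[2, 3], [5, 4], [9, 6], [4, 2], [7, 5]], dtype=float)
new_point = np.array([6, 4], dtype=float)
k = 3

# Euclidean distance from the new point to each training point
distances = np.linalg.norm(points - new_point, axis=1)

# argsort returns indices ordered from smallest to largest distance;
# the first k of them identify the nearest neighbors
nearest_idx = np.argsort(distances)[:k]
nearest_points = points[nearest_idx]

print(nearest_points.tolist())  # [[5.0, 4.0], [7.0, 5.0], [4.0, 2.0]]
```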

4. Majority Vote:

Determine the majority class among the k-nearest neighbors.

    Classes of the neighbors: 1, 1, 0

Since two of the three neighbors belong to class 1, k-NN with k = 3 classifies the new point as class 1.
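
The vote itself is a simple count. As a small sketch, Python's `collections.Counter` can tally the neighbor classes; the list `neighbor_classes` is typed in by hand here from the step above.

```python
from collections import Counter

# Classes of the three nearest neighbors, in order of increasing distance
neighbor_classes = [1, 1, 0]

# Count the votes per class and take the most common one
votes = Counter(neighbor_classes)
predicted_class, count = votes.most_common(1)[0]

print(predicted_class, count)  # 1 2
```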

## Conclusion:

In this example, the k-NN algorithm predicts that the new point with X1 = 6 and X2 = 4 belongs to class 1, based on the majority class of its three nearest neighbors.
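
As a cross-check, the same toy dataset can be passed to scikit-learn's `KNeighborsClassifier`, which uses Euclidean distance by default. This is a sketch for verification only; `X`, `y`, and `query` are names introduced here.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset from the table above
X = [[2, 3], [5, 4], [9, 6], [4, 2], [7, 5]]
y = [0, 1, 1, 0, 1]
query = [[6, 4]]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predicted class for the new point
prediction = knn.predict(query)

# Distances to, and row indices of, the three nearest training points
dist, idx = knn.kneighbors(query)

print(prediction[0])    # 1
print(idx[0].tolist())  # [1, 4, 3]
```

The indices refer to rows of `X`, so `[1, 4, 3]` corresponds to the points (5, 4), (7, 5), and (4, 2).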

# Implementation of k-NN Algorithm in Python

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.utils import resample

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Introduce missing values (each value is set to NaN with probability 0.2)
np.random.seed(42)
mask = np.random.rand(X.shape[0], X.shape[1]) < 0.2
X[mask] = np.nan

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocess the data. Every transformer is fitted on the training set only
# and then applied to the test set, so no test information leaks into training.

# Step 1: Impute missing values with the per-feature training mean
imputer = SimpleImputer(strategy="mean")
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)

# Step 2: Standardize features (zero mean, unit variance)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_imputed)
X_test_scaled = scaler.transform(X_test_imputed)

# Step 3: Reduce the influence of outliers with a robust scaler
# (median and interquartile range), applied on top of the standardized features
robust_scaler = RobustScaler()
X_train_scaled_robust = robust_scaler.fit_transform(X_train_scaled)
X_test_scaled_robust = robust_scaler.transform(X_test_scaled)

# Step 4: Address class imbalance by resampling class 2 (with replacement)
# to match the number of class 0 samples in the training set
is_class_2 = y_train == 2
X_train_class_2, y_train_class_2 = resample(
    X_train_scaled_robust[is_class_2],
    y_train[is_class_2],
    replace=True,
    n_samples=X_train_scaled_robust[y_train == 0].shape[0],
    random_state=42,
)

# Recombine the untouched classes (0 and 1) with the resampled class 2
X_train_scaled_robust_balanced = np.vstack(
    (X_train_scaled_robust[~is_class_2], X_train_class_2)
)
y_train_balanced = np.concatenate((y_train[~is_class_2], y_train_class_2))

# Initialize k-NN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model to the scaled and balanced training data
knn.fit(X_train_scaled_robust_balanced, y_train_balanced)

# Make predictions on the scaled testing data
y_pred = knn.predict(X_test_scaled_robust)

# Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

```
Accuracy: 0.73
```
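
The imputation, standardization, and classification steps above could also be chained with scikit-learn's `make_pipeline`, which keeps the fit-on-train, transform-on-test discipline in one object. The following is a simplified sketch under stated assumptions, not a reproduction of the code above: it leaves out the robust scaling and resampling steps, and it draws the missing-value mask with `np.random.default_rng`, so its accuracy need not match the figure reported above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumption: synthetic missingness, each value set to NaN with probability 0.2
rng = np.random.default_rng(42)
X = X.copy()
X[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Impute -> standardize -> classify, fitted together on the training data
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=3),
)
model.fit(X_train, y_train)

# score() applies the fitted transformers to X_test before predicting
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
```

Calling `model.predict(new_rows)` on raw, possibly incomplete feature rows then applies the same imputation and scaling automatically.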
