# Module 10: K-Nearest Neighbors

**Difficulty**: ⭐ Beginner  
**Estimated Time**: 60 minutes  
**Prerequisites**: [Module 02 - Data Preparation](02_data_preparation_train_test_split.ipynb), [Module 06 - Model Evaluation](06_model_evaluation_metrics.ipynb)

## Learning Objectives
By the end of this notebook, you will be able to:
1. Understand how the K-Nearest Neighbors (KNN) algorithm works
2. Apply different distance metrics (Euclidean, Manhattan, Minkowski)
3. Choose the optimal K value using cross-validation
4. Understand the critical importance of feature scaling for KNN
5. Compare weighted vs uniform neighbor voting
6. Recognize the curse of dimensionality and its impact on KNN
7. Know when to use KNN vs other algorithms

## 1. Introduction: The KNN Intuition

### The Core Idea

**"You are the average of your K closest neighbors"**

K-Nearest Neighbors is one of the simplest machine learning algorithms. The idea is beautifully intuitive:

- **Real-world analogy**: If you want to predict whether someone likes a movie, look at the preferences of people similar to them. If most similar people liked it, they probably will too!

### How KNN Works

**For Classification:**
1. Choose a value for K (number of neighbors)
2. Find the K closest training examples to your new data point
3. Take a majority vote among those K neighbors
4. Assign the most common class

**For Regression:**
1. Choose a value for K
2. Find the K closest training examples
3. Take the average of their target values
4. That average is your prediction
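
The two procedures above can be sketched from scratch in a few lines of plain Python. This is a minimal illustration, not how scikit-learn implements KNN: the helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are made up for this example, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to query (Euclidean)."""
    distances = [math.dist(x, query) for x in train_X]
    return sorted(range(len(train_X)), key=lambda i: distances[i])[:k]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean target value of the k nearest neighbors."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / k

# Toy data (invented for illustration): two features per point
toy_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8)]
toy_labels = ["small", "small", "small", "large", "large"]
toy_targets = [10.0, 12.0, 11.0, 50.0, 52.0]

print(knn_classify(toy_X, toy_labels, (1.1, 1.0), k=3))
print(knn_regress(toy_X, toy_targets, (1.1, 1.0), k=3))
```

Note that nothing is "learned" here: both functions work directly from the stored training points. In this sketch, a tied vote goes to whichever tied label was counted first.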

### The "Lazy Learner" Concept

KNN is called a **lazy learner** or **instance-based learner** because:
- **No real training phase**: Fitting simply stores the training data (scikit-learn may also build a search index such as a KD-tree or ball tree)
- **All computation at prediction time**: To make a prediction, it computes distances from the new point to the stored training points
- **Memory-intensive**: It must keep the entire training set in memory

This is different from algorithms like logistic regression that learn parameters during training.

## 2. Setup and Data Loading

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.metrics import mean_squared_error, r2_score
import warnings

# Configuration
warnings.filterwarnings('ignore')
np.random.seed(42)
%matplotlib inline

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print('✓ All libraries imported successfully!')
print('✓ Random seed set to 42 for reproducibility')

In [None]:
# Load the Iris dataset
# This classic dataset contains measurements of iris flowers from 3 different species
iris_df = pd.read_csv('data/sample/iris.csv')

print("Iris Dataset Shape:", iris_df.shape)
print("\nFirst few rows:")
print(iris_df.head())
print("\nSpecies distribution:")
print(iris_df['species'].value_counts())

## 3. Visualizing the KNN Concept

Let's visualize how KNN makes predictions using 2 features so we can plot it easily.

In [None]:
# Use only 2 features for visualization
# Petal length and petal width are the most discriminative features for iris species
X_visual = iris_df[['petal length (cm)', 'petal width (cm)']].values
y_visual = iris_df['species'].values

# Create a scatter plot
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
# Plot each species with a different color
for species in iris_df['species'].unique():
    mask = iris_df['species'] == species
    plt.scatter(
        iris_df.loc[mask, 'petal length (cm)'],
        iris_df.loc[mask, 'petal width (cm)'],
        label=species,
        alpha=0.7,
        s=100
    )

# Add a new point to classify
new_point = [4.5, 1.5]
plt.scatter(new_point[0], new_point[1], 
           color='red', marker='*', s=500, 
           label='New Point to Classify',
           edgecolors='black', linewidth=2)

plt.xlabel('Petal Length (cm)', fontsize=12)
plt.ylabel('Petal Width (cm)', fontsize=12)
plt.title('KNN Classification: Finding Nearest Neighbors', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# Show decision boundaries
from matplotlib.colors import ListedColormap

# Create a mesh to plot decision boundaries
h = 0.02  # Step size in the mesh
x_min, x_max = X_visual[:, 0].min() - 0.5, X_visual[:, 0].max() + 0.5
y_min, y_max = X_visual[:, 1].min() - 0.5, X_visual[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Train KNN with K=5
knn_visual = KNeighborsClassifier(n_neighbors=5)
knn_visual.fit(X_visual, y_visual)

# Predict for all points in the mesh
Z = knn_visual.predict(np.c_[xx.ravel(), yy.ravel()])
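# sort=True assigns integer codes in sorted label order rather than order of first appearance
Z = pd.factorize(Z, sort=True)[0].reshape(xx.shape)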

# Plot decision boundaries
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')

# Plot training points
for species in iris_df['species'].unique():
    mask = iris_df['species'] == species
    plt.scatter(
        iris_df.loc[mask, 'petal length (cm)'],
        iris_df.loc[mask, 'petal width (cm)'],
        label=species,
        alpha=0.7,
        s=100,
        edgecolors='black',
        linewidth=0.5
    )

plt.xlabel('Petal Length (cm)', fontsize=12)
plt.ylabel('Petal Width (cm)', fontsize=12)
plt.title('KNN Decision Boundaries (K=5)', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')

plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("- Left plot: Shows the data points and a new point (red star) to classify")
print("- Right plot: Shows decision boundaries - regions where KNN predicts each class")
print("- KNN creates non-linear, flexible boundaries based on nearby points")

## 4. Distance Metrics: How to Measure "Closeness"

KNN needs to calculate distance between points. Different distance metrics can give different results!

### Common Distance Metrics

**1. Euclidean Distance (default)** - "As the crow flies"
- Formula: $\sqrt{(x_1-x_2)^2 + (y_1-y_2)^2 + ...}$
- Straight-line distance
- Most commonly used

**2. Manhattan Distance** - "City block distance"
- Formula: $|x_1-x_2| + |y_1-y_2| + ...$
- Distance along axis-aligned paths (like navigating city streets)
- Less dominated by one large feature difference than Euclidean distance; often tried on grid-like or high-dimensional data

**3. Minkowski Distance** - Generalization
- Formula: $\left(|x_1-x_2|^p + |y_1-y_2|^p + ...\right)^{1/p}$
- p=1: Manhattan distance
- p=2: Euclidean distance
- p>2: Increasingly emphasizes larger differences
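
To make the formulas concrete, here is a small hand-rolled Minkowski distance in plain Python. The function name `minkowski_distance` and the two example points are illustrative choices, not part of scikit-learn.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

manhattan = minkowski_distance(point_a, point_b, p=1)  # |3| + |4|
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(3^2 + 4^2)
cubic = minkowski_distance(point_a, point_b, p=3)      # (3^3 + 4^3) ** (1/3)

print(f"p=1: {manhattan:.3f} | p=2: {euclidean:.3f} | p=3: {cubic:.3f}")
```

For the same pair of points the distance shrinks as p grows, moving toward the largest single coordinate difference (4 here). That is the sense in which larger p emphasizes the biggest difference.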

In [None]:
# Prepare data for all features
X = iris_df.drop('species', axis=1).values
y = iris_df['species'].values

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# IMPORTANT: Scale features for KNN (we'll explain why soon)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train_scaled.shape}")
print(f"Test set: {X_test_scaled.shape}")
print("\n✓ Data prepared and scaled!")

In [None]:
# Compare different distance metrics
distance_metrics = {
    'Euclidean (p=2)': {'p': 2, 'metric': 'minkowski'},
    'Manhattan (p=1)': {'p': 1, 'metric': 'minkowski'},
    'Minkowski (p=3)': {'p': 3, 'metric': 'minkowski'}
}

results = {}

print("Comparing Distance Metrics (K=5):\n")
print("-" * 50)

for name, params in distance_metrics.items():
    # Train KNN with this distance metric
    knn = KNeighborsClassifier(n_neighbors=5, **params)
    knn.fit(X_train_scaled, y_train)
    
    # Evaluate
    train_score = knn.score(X_train_scaled, y_train)
    test_score = knn.score(X_test_scaled, y_test)
    
    results[name] = {'train': train_score, 'test': test_score}
    
    print(f"{name:20} | Train: {train_score:.3f} | Test: {test_score:.3f}")

print("-" * 50)
print("\nInsight: Different distance metrics can lead to different performance.")
print("Euclidean distance is usually a good default choice.")

## 5. Choosing the Optimal K Value

**K is the most important hyperparameter in KNN.** How do we choose it?

### Effect of K on Model Complexity

**Small K (e.g., K=1, K=3)**:
- ✅ Captures fine patterns in data
- ❌ Very sensitive to noise and outliers
- ❌ Can overfit - complex, wiggly decision boundaries
- Example: K=1 copies the label of the single nearest neighbor, so training accuracy is typically 100%

**Large K (e.g., K=50, K=100)**:
- ✅ More robust to noise
- ✅ Smoother decision boundaries
- ❌ May underfit - too simplistic
- ❌ Loses local patterns

**Finding Optimal K**: Use cross-validation to test different K values!
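
One way such a search might look is sketched below. It rests on several assumptions: scikit-learn's bundled copy of Iris (`load_iris`) stands in for `data/sample/iris.csv`, the candidate K values are arbitrary, and the names `X_demo`, `cv_means`, and `best_k_demo` are illustrative. The scaler sits inside a pipeline so that each fold is scaled using only its own training portion.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for the notebook's CSV file
X_demo, y_demo = load_iris(return_X_y=True)

cv_means = {}
for k in (1, 5, 15, 45):
    # The scaler is refit on the training part of every fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_means[k] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

best_k_demo = max(cv_means, key=cv_means.get)
for k, score in cv_means.items():
    print(f"K={k:2d} | mean CV accuracy: {score:.3f}")
print("Best K among those tried:", best_k_demo)
```

Comparing the mean score across folds, rather than a single split, makes the choice of K less dependent on one lucky or unlucky partition of the data.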

In [None]:
# Test K values from 1 to 30
k_values = range(1, 31)
train_scores = []
cv_scores = []

print("Testing different K values...\n")

for k in k_values:
    # Create KNN classifier with K neighbors
    knn = KNeighborsClassifier(n_neighbors=k)
    
    # Fit on training data
    knn.fit(X_train_scaled, y_train)
    
    # Training accuracy
    train_score = knn.score(X_train_scaled, y_train)
    train_scores.append(train_score)
    
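    # Cross-validation accuracy (5-fold): more reliable than a single train/test split
    # Caveat: X_train_scaled was scaled using all training rows, so the folds share
    # scaling statistics; putting the scaler in a Pipeline avoids this small leak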
    cv_score = cross_val_score(knn, X_train_scaled, y_train, cv=5).mean()
    cv_scores.append(cv_score)

# Find best K
best_k = k_values[np.argmax(cv_scores)]
best_cv_score = max(cv_scores)

print("✓ Testing complete!")
print(f"\nBest K value: {best_k}")
print(f"Best CV score: {best_cv_score:.3f}")

In [None]:
# Visualize the effect of K on performance
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(k_values, train_scores, 'o-', label='Training Accuracy', linewidth=2)
plt.plot(k_values, cv_scores, 's-', label='CV Accuracy', linewidth=2)
plt.axvline(best_k, color='red', linestyle='--', label=f'Best K={best_k}', alpha=0.7)
plt.xlabel('K (Number of Neighbors)', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.title('Finding Optimal K with Cross-Validation', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
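# Show the bias-variance tradeoff: a large train/CV gap signals high variance,
# while low accuracy on both curves signals high bias
plt.plot(k_values, train_scores, 'o-', linewidth=2, label='Training Accuracy')
plt.plot(k_values, cv_scores, 's-', linewidth=2, label='CV Accuracy')
plt.fill_between(list(k_values), cv_scores, train_scores, alpha=0.15, label='Train-CV gap')
plt.axvline(best_k, color='red', linestyle='--', label=f'Optimal Tradeoff K={best_k}', alpha=0.7)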
plt.xlabel('K (Number of Neighbors)', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.title('Bias-Variance Tradeoff in KNN', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nObservations:")
print("- As K increases, training accuracy tends to decrease (model becomes simpler)")
print("- CV accuracy usually peaks at an intermediate K value")
print("- Very small K (1-3): Risk of overfitting")
print("- Very large K (25+): Risk of underfitting")

## 6. Feature Scaling: CRITICAL for KNN!

**Why is feature scaling essential for KNN?**

KNN uses distances to find neighbors. If features are on different scales, the feature with the largest range will dominate the distance calculation!

### Example Problem

Imagine predicting house prices using:
- Number of bedrooms: 1-5 (small range)
- Square footage: 500-5000 (large range)

Without scaling:
- Distance will be dominated by square footage
- Number of bedrooms will have almost no effect
- Even if bedrooms is more predictive!

**Solution**: Scale all features to similar ranges (e.g., StandardScaler, MinMaxScaler)
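
A quick calculation shows how strongly the raw scale can distort the neighbor ranking. The three houses below are invented for illustration, and the sketch rescales with a simple min-max formula using the ranges quoted above (bedrooms 1-5, square footage 500-5000). The helper `min_max` is a hypothetical stand-in, not a scikit-learn function.

```python
import math

# Hypothetical houses as (bedrooms, square_feet); values invented for illustration
query = (2, 1500)
house_b = (5, 1550)  # 3 more bedrooms, nearly the same area
house_c = (2, 1900)  # same bedrooms, 400 sq ft larger

def min_max(house, bed_range=(1, 5), sqft_range=(500, 5000)):
    """Rescale both features to [0, 1] using the assumed feature ranges."""
    beds, sqft = house
    return (
        (beds - bed_range[0]) / (bed_range[1] - bed_range[0]),
        (sqft - sqft_range[0]) / (sqft_range[1] - sqft_range[0]),
    )

raw_b = math.dist(query, house_b)
raw_c = math.dist(query, house_c)
scaled_b = math.dist(min_max(query), min_max(house_b))
scaled_c = math.dist(min_max(query), min_max(house_c))

print(f"Raw distances:    B={raw_b:.1f}, C={raw_c:.1f}")
print(f"Scaled distances: B={scaled_b:.3f}, C={scaled_c:.3f}")
```

On the raw numbers, house B looks like the nearest neighbor even though it has more than twice as many bedrooms. After rescaling, house C is nearer, because a three-bedroom gap now counts for more than a 400 sq ft gap.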

In [None]:
# Demonstrate the impact of feature scaling
print("Feature Ranges BEFORE Scaling:")
print("=" * 50)
for i, col in enumerate(['sepal length', 'sepal width', 'petal length', 'petal width']):
    print(f"{col:15} : [{X_train[:, i].min():.2f}, {X_train[:, i].max():.2f}]")

print("\nFeature Ranges AFTER Scaling:")
print("=" * 50)
for i, col in enumerate(['sepal length', 'sepal width', 'petal length', 'petal width']):
    print(f"{col:15} : [{X_train_scaled[:, i].min():.2f}, {X_train_scaled[:, i].max():.2f}]")

print("\nNotice: After scaling, every feature has mean 0 and unit variance, so the ranges are comparable (roughly -3 to +3)")

In [None]:
# Compare performance with and without scaling
print("Performance Comparison:\n")
print("=" * 50)

# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)
unscaled_score = knn_unscaled.score(X_test, y_test)

# With scaling
knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
scaled_score = knn_scaled.score(X_test_scaled, y_test)

print(f"WITHOUT scaling: {unscaled_score:.3f}")
print(f"WITH scaling:    {scaled_score:.3f}")
print(f"\nDifference (scaled - unscaled): {(scaled_score - unscaled_score):+.3f}")
print("=" * 50)

print("\n⚠️  ALWAYS scale features when using KNN!")
print("   - Use StandardScaler for normally distributed features")
print("   - Use MinMaxScaler for bounded features")
print("   - Use RobustScaler for features with outliers")

## 7. Weighted vs Uniform Neighbors

When making predictions, should all K neighbors have equal influence?

**Uniform Weighting** (default):
- All K neighbors vote equally
- Majority class wins (classification)
- Simple average (regression)

**Distance Weighting**:
- Closer neighbors have more influence
- Weight = 1 / distance
- More intuitive: trust closer neighbors more
- Often performs better!
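
The two schemes can disagree on the same neighborhood. The sketch below uses an invented set of five neighbors and a hypothetical helper `vote`. It applies the weight = 1 / distance rule directly and assumes no neighbor lies at distance exactly 0.

```python
from collections import defaultdict

# Hypothetical K=5 neighborhood: (distance to query, class label)
neighbors = [
    (0.2, "versicolor"),
    (0.3, "versicolor"),
    (1.5, "virginica"),
    (1.6, "virginica"),
    (1.7, "virginica"),
]

def vote(neighbors, weighted):
    """Sum votes per class; each vote counts 1, or 1/distance if weighted."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        totals[label] += 1.0 / distance if weighted else 1.0
    return max(totals, key=totals.get)

print("Uniform: ", vote(neighbors, weighted=False))
print("Distance:", vote(neighbors, weighted=True))
```

With uniform votes the three distant neighbors win 3 to 2. With distance weighting the two very close neighbors outweigh them.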

In [None]:
# Compare uniform vs distance weighting
print("Comparing Voting Schemes:\n")
print("=" * 50)

# Uniform weights
knn_uniform = KNeighborsClassifier(n_neighbors=5, weights='uniform')
knn_uniform.fit(X_train_scaled, y_train)
uniform_score = knn_uniform.score(X_test_scaled, y_test)

# Distance weights
knn_distance = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn_distance.fit(X_train_scaled, y_train)
distance_score = knn_distance.score(X_test_scaled, y_test)

print(f"Uniform weighting:   {uniform_score:.3f}")
print(f"Distance weighting:  {distance_score:.3f}")
print("=" * 50)

print("\nInsight: Distance weighting often (but not always) performs better.")
print("Try both and use cross-validation to choose!")

## 8. The Curse of Dimensionality

**Critical limitation of KNN**: Performance degrades in high-dimensional spaces!

### Why?

As the number of dimensions (features) increases:
1. **Distance becomes less meaningful**: The nearest and farthest points become roughly equidistant
2. **Data becomes sparse**: Points are very far apart
3. **Need exponentially more data**: To maintain the same density
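
The "roughly equidistant" effect can be checked with a small NumPy experiment. This is a sketch under stated assumptions: points are drawn uniformly from the unit hypercube, the seed and sample size are arbitrary, and "relative contrast" here simply means (farthest - nearest) / nearest distance from one random query point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500

contrast_by_dim = {}
for dim in (2, 10, 100, 1000):
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest
    contrast_by_dim[dim] = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:4d} | relative contrast: {contrast_by_dim[dim]:.2f}")
```

When the contrast approaches zero, the "nearest" neighbor is barely nearer than any other point, so the neighbor set carries little information.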

**Rule of thumb**: KNN works best with < 20-30 features. Beyond that, consider:
- Dimensionality reduction (PCA, feature selection)
- Other algorithms (tree-based methods, neural networks)

Let's demonstrate this phenomenon:

In [None]:
# Demonstrate curse of dimensionality
from sklearn.datasets import make_classification

print("Testing KNN performance across different numbers of dimensions...\n")

dimensions = [2, 5, 10, 20, 50, 100]
scores = []

# Fixed sample size
n_samples = 500

for n_features in dimensions:
    # Create synthetic dataset with n_features dimensions
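    X_dim, y_dim = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=min(n_features, 5),  # At most 5 informative features; the rest are noise
        n_redundant=0,
        n_classes=3,
        n_clusters_per_class=1,  # Required for 3 classes when only 2 features are informative
        random_state=42
    )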
    
    # Split and scale
    X_tr, X_te, y_tr, y_te = train_test_split(X_dim, y_dim, test_size=0.3, random_state=42)
    scaler_dim = StandardScaler()
    X_tr_scaled = scaler_dim.fit_transform(X_tr)
    X_te_scaled = scaler_dim.transform(X_te)
    
    # Train KNN
    knn_dim = KNeighborsClassifier(n_neighbors=5)
    knn_dim.fit(X_tr_scaled, y_tr)
    score = knn_dim.score(X_te_scaled, y_te)
    scores.append(score)
    
    print(f"Dimensions: {n_features:3d} | Accuracy: {score:.3f}")

print("\n" + "=" * 50)

In [None]:
# Visualize the curse of dimensionality
plt.figure(figsize=(10, 6))
plt.plot(dimensions, scores, 'o-', linewidth=2, markersize=10)
plt.xlabel('Number of Dimensions (Features)', fontsize=12)
plt.ylabel('Test Accuracy', fontsize=12)
plt.title('The Curse of Dimensionality in KNN', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.axhline(y=scores[0], color='red', linestyle='--', alpha=0.5, label='Performance with 2D')
plt.legend()

# Add annotation
plt.annotate('Performance degrades\nin high dimensions', 
             xy=(50, scores[-2]), xytext=(30, 0.55),
             arrowprops=dict(arrowstyle='->', color='red', lw=2),
             fontsize=11, color='red')

plt.tight_layout()
plt.show()

print("\nKey Insight:")
print("- Accuracy is typically highest when most features are informative (few dimensions)")
print("- Accuracy tends to drop as more noise dimensions are added (50+)")
print("- This is the 'curse of dimensionality' - distances become less meaningful")

## 9. When to Use KNN

### KNN is Great When:

✅ **Small to medium-sized datasets** (< 10,000 samples)
   - Brute-force prediction costs O(n·d) per query (n training samples, d features)
   - Can be slow with millions of samples

✅ **Low to moderate dimensions** (< 20-30 features)
   - Suffers from curse of dimensionality

✅ **Non-linear decision boundaries**
   - Can capture complex patterns
   - No assumptions about data distribution

✅ **Training time must be minimal**
   - Near-zero training time (just stores data)
   - New labeled examples can be added to the stored data without retraining a model

✅ **Multi-class problems**
   - Naturally handles multiple classes
   - No need for one-vs-rest

### Avoid KNN When:

❌ **Large datasets** (millions of samples)
   - Consider tree-based methods or neural networks

❌ **High-dimensional data** (hundreds/thousands of features)
   - Use dimensionality reduction first, or choose different algorithm

❌ **Prediction speed is critical**
   - Each prediction requires scanning all training data
   - Consider logistic regression or SVM

❌ **Imbalanced datasets**
   - Majority class can dominate voting
   - Need to handle class imbalance carefully

❌ **Mixed data types** (categorical + numerical)
   - Distance metrics work best with numerical data
   - Need careful preprocessing

## 10. Final Evaluation with Best Model

In [None]:
# Train final model with the cross-validated K
final_knn = KNeighborsClassifier(
    n_neighbors=best_k,
    weights='distance',  # Distance weighting (not tuned above; best_k was selected with uniform weights)
    metric='minkowski',
    p=2  # Euclidean distance
)

# Fit on training data
final_knn.fit(X_train_scaled, y_train)

# Make predictions
y_pred = final_knn.predict(X_test_scaled)

# Evaluate
test_accuracy = accuracy_score(y_test, y_pred)

print("Final Model Configuration:")
print("=" * 50)
print(f"K (neighbors):      {best_k}")
print(f"Weights:            distance")
print(f"Distance metric:    Euclidean")
print(f"Feature scaling:    StandardScaler")
print("=" * 50)
print(f"\nTest Accuracy:      {test_accuracy:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

In [None]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=np.unique(y_test),
            yticklabels=np.unique(y_test))
plt.xlabel('Predicted', fontsize=12)
plt.ylabel('Actual', fontsize=12)
plt.title(f'Confusion Matrix - KNN (K={best_k})', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"The model correctly classifies {np.trace(cm)} of {cm.sum()} test samples.")

## Exercises

Now it's your turn to practice! Complete these exercises to reinforce your understanding.

### Exercise 1: Distance Metric Comparison on Wine Dataset

Load the wine dataset and compare the performance of Euclidean, Manhattan, and Minkowski (p=3) distance metrics.

**Tasks:**
1. Load `wine.csv` dataset
2. Split into train/test (70/30)
3. Scale the features
4. Train KNN with K=5 using each distance metric
5. Compare test accuracies
6. Which distance metric works best for this dataset?

In [None]:
# Your code here
# Hint: Use similar code structure as the iris example



### Exercise 2: Finding Optimal K with GridSearchCV

Use GridSearchCV to find the optimal K value for the wine dataset.

**Tasks:**
1. Create a parameter grid testing K from 1 to 20
2. Test both 'uniform' and 'distance' weights
3. Use 5-fold cross-validation
4. Report the best parameters and score
5. Visualize how CV score changes with K

In [None]:
# Your code here
# Hint: from sklearn.model_selection import GridSearchCV
# param_grid = {'n_neighbors': [...], 'weights': [...]}



### Exercise 3: Impact of Feature Scaling

Demonstrate the importance of feature scaling using the breast cancer dataset.

**Tasks:**
1. Load `breast_cancer.csv`
2. Train KNN (K=5) WITHOUT scaling
3. Train KNN (K=5) WITH StandardScaler
4. Train KNN (K=5) WITH MinMaxScaler
5. Compare test accuracies
6. Explain why scaling matters (look at feature ranges)

In [None]:
# Your code here
# Hint: from sklearn.preprocessing import StandardScaler, MinMaxScaler



### Exercise 4: KNN for Regression

Apply KNN to a regression problem using the California housing dataset.

**Tasks:**
1. Load `california_housing.csv` (first 1000 rows only for speed)
2. Target variable: 'MedHouseVal'
3. Split into train/test and scale features
4. Train KNeighborsRegressor with K=5
5. Calculate RMSE and R² score
6. Compare with K=10 and K=20
7. Which K works best for regression?

In [None]:
# Your code here
# Hint: from sklearn.neighbors import KNeighborsRegressor
# Use mean_squared_error and r2_score for evaluation



## Summary

### Key Concepts Learned

1. **KNN Algorithm Basics**
   - "You are the average of your K nearest neighbors"
   - Classification: majority vote | Regression: average
   - Lazy learner: no training phase, all work at prediction time

2. **Distance Metrics**
   - Euclidean: straight-line distance (default, usually a good starting point)
   - Manhattan: city-block distance
   - Minkowski: generalization (p=1 Manhattan, p=2 Euclidean)

3. **Choosing K Value**
   - Small K: complex boundaries, risk of overfitting
   - Large K: smooth boundaries, risk of underfitting
   - Use cross-validation to find optimal K

4. **Feature Scaling is CRITICAL**
   - KNN is distance-based
   - Features with larger ranges will dominate the distance
   - ALWAYS use StandardScaler or MinMaxScaler

5. **Weighted vs Uniform Neighbors**
   - Uniform: all K neighbors vote equally
   - Distance: closer neighbors have more influence
   - Distance weighting often performs better

6. **Curse of Dimensionality**
   - Performance degrades with many features (>20-30)
   - Distances become less meaningful in high dimensions
   - Use dimensionality reduction or choose different algorithm

7. **When to Use KNN**
   - ✅ Small-medium datasets, low dimensions, non-linear patterns
   - ❌ Large datasets, high dimensions, need fast predictions

### Best Practices

- **ALWAYS scale features** before using KNN
- **Use cross-validation** to choose K
- **Start with K=5** as a reasonable default
- **Try distance weighting** - often better than uniform
- **Check feature count** - if >30 features, consider dimensionality reduction
- **Monitor prediction time** - can be slow with large training sets

### Common Pitfalls to Avoid

- ❌ Forgetting to scale features
- ❌ Using KNN with hundreds of features
- ❌ Not using cross-validation to choose K
- ❌ Applying KNN to huge datasets without considering speed
- ❌ Ignoring class imbalance (majority class can dominate)

### What's Next

In **Module 11: Naive Bayes**, you'll learn:
- Probability-based classification
- Bayes' theorem and the "naive" assumption
- When Naive Bayes excels (text classification, spam detection)
- Gaussian, Multinomial, and Bernoulli variants
- Handling zero probabilities with Laplace smoothing

### Additional Resources

**Videos:**
- [StatQuest: KNN Explained](https://www.youtube.com/watch?v=HVXime0nQeI)
- [KNN Algorithm - Step by Step](https://www.youtube.com/watch?v=4HKqjENq9OU)

**Documentation:**
- [scikit-learn KNN User Guide](https://scikit-learn.org/stable/modules/neighbors.html)
- [KNeighborsClassifier API](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)

**Articles:**
- [K-Nearest Neighbors: Dangerously Simple](https://mathbabe.org/2013/04/04/k-nearest-neighbors-dangerously-simple/)
- [The Curse of Dimensionality](https://www.kdnuggets.com/2017/04/curse-dimensionality-explained.html)