# K-Nearest Neighbors (KNN) - Scikit-Learn Implementation

Multi-class classification on the **Covertype (Forest Cover Type)** dataset.

**Dataset**: 581,012 samples, 54 features, 7 forest cover types  
**Task**: Predict forest cover type from cartographic variables  
**Key Concept**: KNN is a "lazy learner" that defers nearly all computation to prediction time, so fitting is cheap and prediction is expensive

## What Makes KNN Different?
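- **Non-parametric**: Learns no weights or coefficients, unlike Logistic Regression
- **Instance-based**: Stores the entire training set and compares each query against it at prediction time
- **Distance-based**: Classification depends on the "nearest" training examples under a chosen distance metric
- **No explicit training**: `fit` only stores the training set (scikit-learn may also build a KD-tree or ball-tree index); the real work happens at prediction time, O(n·d) per query with brute-force search over n training samples and d features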
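
To make the points above concrete, here is a minimal brute-force sketch in NumPy. It is an illustration only, not scikit-learn's implementation: the function name `knn_predict` and the toy arrays are made up for this example, and it assumes integer class labels and squared Euclidean distance.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Brute-force KNN: majority vote among the k closest training points.

    Hypothetical helper for illustration; assumes non-negative integer labels.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    # This is the O(n * d) work per query mentioned above.
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]

    # Majority vote over the neighbors' labels.
    neighbor_labels = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

# Toy data (invented): two well-separated clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.05], [5.0, 5.1]])

preds = knn_predict(X_train, y_train, X_query, k=3)
print(preds)  # [0 1]
```

Note that there is no "fit" step at all here: the training arrays are passed straight into the prediction call, which is what "lazy" means in practice. Materializing the full distance matrix like this would not scale to a dataset the size of Covertype, which is one reason to rely on `KNeighborsClassifier` instead.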


In [None]:
# Standard libraries
import numpy as np
import sys

# Scikit-Learn KNN
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Add utils to path
sys.path.append('../..')
from utils.data_loader import load_processed_data
from utils.metrics import accuracy, macro_f1_score, confusion_matrix_multiclass
from utils.visualization import (
    plot_confusion_matrix_multiclass,
    plot_validation_curve,
    plot_per_class_f1
)
from utils.performance import track_performance

print("Imports complete!")

Imports complete!
