# K-Nearest Neighbors (KNN) - TensorFlow Implementation

Multi-class classification on the **Covertype (Forest Cover Type)** dataset using TensorFlow tensor operations.

**Dataset**: 581,012 samples, 54 features, 7 forest cover types  
**Task**: Predict forest cover type from cartographic variables  
**Key Concept**: KNN is a "lazy learner": there is no training phase beyond storing the training set, so the entire cost is paid at prediction time
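To make "expensive at prediction time" concrete, here is a back-of-the-envelope estimate for brute-force KNN. It uses the train/test split sizes reported when the data is loaded (464,809 training and 116,203 test samples, 54 features). The 1,024-row batch size is a hypothetical value chosen only for illustration.

```python
# Split sizes reported by the data-loading cell
n_train, n_test, n_features = 464_809, 116_203, 54
bytes_per_float32 = 4

# Brute-force KNN compares every test point against every training point
n_pairs = n_train * n_test
n_mults = n_pairs * n_features  # one multiply per feature per pair
full_matrix_gb = n_pairs * bytes_per_float32 / 1e9

# A hypothetical batch of 1,024 test rows only needs a (1024, n_train) slice
batch_size = 1024
batch_matrix_gb = batch_size * n_train * bytes_per_float32 / 1e9

print(f"Pairwise distances:       {n_pairs:,}")              # 54,012,200,227
print(f"Feature multiplications:  {n_mults:.2e}")            # 2.92e+12
print(f"Full float32 matrix:      {full_matrix_gb:,.1f} GB")  # 216.0 GB
print(f"Per-batch float32 matrix: {batch_matrix_gb:.2f} GB")  # 1.90 GB
```

A full test-by-train distance matrix would not fit in typical RAM, which is why prediction has to be processed in batches.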

## TensorFlow Approach for KNN
- **Tensor operations**: Broadcasting for pairwise distance computation
- **`tf.math.top_k`**: Efficient K-nearest selection (`top_k` returns the largest values, so distances are negated)
- **Batched processing**: Memory management for large datasets
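The three ideas above can be combined in a short sketch. This is not the notebook's own implementation, which is not shown in this section. `pairwise_sq_dists`, `knn_predict`, and the `batch_size` default are hypothetical names chosen for illustration. The sketch assumes integer labels in `0..n_classes-1`; float labels would need `tf.cast(..., tf.int32)` first. Distances use the expansion ‖q − x‖² = ‖q‖² − 2q·x + ‖x‖². Its broadcast addition avoids materializing a `(batch, n_train, 54)` difference tensor.

```python
import numpy as np
import tensorflow as tf

def pairwise_sq_dists(queries, train):
    """Squared Euclidean distances, shape (n_queries, n_train)."""
    q_sq = tf.reduce_sum(tf.square(queries), axis=1, keepdims=True)  # (q, 1)
    x_sq = tf.reduce_sum(tf.square(train), axis=1)                   # (n,)
    cross = tf.matmul(queries, train, transpose_b=True)              # (q, n)
    # Broadcasting: (q, 1) - (q, n) + (1, n) -> (q, n)
    d = q_sq - 2.0 * cross + x_sq[tf.newaxis, :]
    return tf.maximum(d, 0.0)  # clip small negatives caused by rounding

def knn_predict(train_x, train_y, test_x, k=5, n_classes=7, batch_size=1024):
    """Majority-vote KNN, processing test rows in batches to bound memory."""
    preds = []
    n_test = int(test_x.shape[0])
    for start in range(0, n_test, batch_size):
        batch = test_x[start:start + batch_size]
        d = pairwise_sq_dists(batch, train_x)       # (b, n_train)
        # top_k picks the largest values, so negate to get the k smallest distances
        _, idx = tf.math.top_k(-d, k=k)            # (b, k) indices into train
        neighbor_labels = tf.gather(train_y, idx)  # (b, k) integer labels
        votes = tf.reduce_sum(tf.one_hot(neighbor_labels, depth=n_classes), axis=1)
        preds.append(tf.argmax(votes, axis=1, output_type=tf.int32))
    return tf.concat(preds, axis=0)
```

Each batch allocates one `(batch_size, n_train)` distance matrix, so `batch_size` trades memory for loop iterations. Vote ties are resolved by `tf.argmax`, which does not guarantee which tied class it returns.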

## Important Note: CPU-Only Execution
- TensorFlow 2.11+ dropped native Windows GPU support. This implementation runs on CPU.
- For GPU acceleration on Windows, options include WSL2 or TensorFlow 2.10 with Python ≤3.10.
- GPU setup will be configured when we reach neural network models (DNNs, CNNs).


In [1]:
# Standard libraries
import numpy as np
import sys

# TensorFlow for tensor operations (GPU-accelerated when a GPU is available)
import tensorflow as tf

# Add utils to path
sys.path.append('../..')
from utils.data_loader import load_processed_data
from utils.metrics import accuracy, macro_f1_score
from utils.visualization import (
    plot_confusion_matrix_multiclass,
    plot_per_class_f1
)
from utils.performance import track_performance

# Check device
print(f"TensorFlow version: {tf.__version__}")
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f"GPU available: {gpus[0].name}")
else:
    print("Running on CPU (TF 2.11+ dropped native Windows GPU support)")

print("Imports complete!")

TensorFlow version: 2.20.0
Running on CPU (TF 2.11+ dropped native Windows GPU support)
Imports complete!


In [2]:
# Load preprocessed data
"""
Load the same Covertype dataset used by the Scikit-learn, No-Framework, and PyTorch
implementations so that all 4 frameworks are compared on identical data.
"""

X_train, X_test, y_train, y_test, metadata = load_processed_data('knn')

# Extract metadata for reference
class_names = metadata['class_names']
n_classes = metadata['n_classes']

print(f"Training set: {X_train.shape[0]:,} samples, {X_train.shape[1]} features")
print(f"Test set: {X_test.shape[0]:,} samples")
print(f"Classes ({n_classes}): {class_names}")

Training set: 464,809 samples, 54 features
Test set: 116,203 samples
Classes (7): ['Spruce/Fir', 'Lodgepole Pine', 'Ponderosa Pine', 'Cottonwood/Willow', 'Aspen', 'Douglas-fir', 'Krummholz']


In [4]:
# Convert data to TensorFlow tensors
"""
Convert NumPy arrays to TensorFlow constant tensors.
Unlike PyTorch tensors, TensorFlow tensors are immutable; in-place updates require tf.Variable.
Running on CPU since TF 2.11+ dropped native Windows GPU support.
"""

# Convert to TensorFlow tensors
# tf.constant creates immutable tensors (vs. PyTorch's mutable torch.tensor)
X_train_t = tf.constant(X_train, dtype=tf.float32)
X_test_t = tf.constant(X_test, dtype=tf.float32)
y_train_t = tf.constant(y_train, dtype=tf.float32)
y_test_t = tf.constant(y_test, dtype=tf.float32)

print(f"X_train tensor: {X_train_t.shape}, dtype={X_train_t.dtype}")
print(f"X_test tensor:  {X_test_t.shape}, dtype={X_test_t.dtype}")
print(f"Device: {X_train_t.device}")

X_train tensor: (464809, 54), dtype=<dtype: 'float32'>
X_test tensor:  (116203, 54), dtype=<dtype: 'float32'>
Device: /job:localhost/replica:0/task:0/device:CPU:0
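The immutability point above can be checked directly on a small tensor. This minimal sketch uses illustrative toy values, not the Covertype data. Item assignment on a `tf.constant` fails, `tf.tensor_scatter_nd_update` returns a new tensor instead, and `tf.Variable` is the mutable alternative.

```python
import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])

# tf.Tensor has no item assignment, so this raises TypeError
try:
    t[0] = 5.0
    assigned = True
except TypeError:
    assigned = False

# "Updating" a tensor means building a new one; t itself is unchanged
t_new = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[5.0])

# Mutable state lives in tf.Variable, which supports in-place assign()
v = tf.Variable(t)
v.assign([5.0, 2.0, 3.0])

print(assigned)                # False
print(t.numpy().tolist())      # [1.0, 2.0, 3.0]
print(t_new.numpy().tolist())  # [5.0, 2.0, 3.0]
print(v.numpy().tolist())      # [5.0, 2.0, 3.0]
```

For KNN the distinction matters little, because the training tensors are only read at prediction time.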
