# KNN Basics - Complete Guide
## Understanding K-Nearest Neighbors Algorithm

This notebook covers the foundational concepts of the KNN algorithm with detailed explanations and implementations.

## What is K-Nearest Neighbors?

KNN is a simple, non-parametric, lazy learning algorithm that:
1. **Stores training data** - builds no explicit model; it keeps the training set and defers all computation to prediction time
2. **Makes predictions** - by finding the K training points nearest to the query point
3. **Votes on result** - for classification, predicts the majority class among the K neighbors
4. **Calculates average** - for regression, predicts the mean target value of the K neighbors
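
The voting and averaging steps can be illustrated with a small sketch. The neighbor labels and target values are made-up illustration data, and `majority_vote` and `neighbor_mean` are hypothetical helper names, not part of any library.

```python
from collections import Counter
from statistics import mean

def majority_vote(neighbor_labels):
    """Return the most frequent label among the K neighbors (classification)."""
    # most_common(1) returns [(label, count)]; on a tie, the label seen first wins
    return Counter(neighbor_labels).most_common(1)[0][0]

def neighbor_mean(neighbor_values):
    """Return the mean target value of the K neighbors (regression)."""
    return mean(neighbor_values)

# Made-up labels and targets for K=3 neighbors
print(majority_vote(['setosa', 'versicolor', 'setosa']))  # setosa
print(neighbor_mean([2.0, 4.0, 6.0]))                     # 4.0
```

Choosing an odd K is a common way to avoid ties when there are only two classes.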

In [None]:
# STEP 1: Import Required Libraries
# These libraries provide tools for data handling and ML

import numpy as np              # Numerical operations
import pandas as pd             # Data manipulation
import matplotlib.pyplot as plt # Plotting
from sklearn.datasets import load_iris  # Sample dataset
from sklearn.neighbors import KNeighborsClassifier  # Scikit-learn KNN
from sklearn.model_selection import train_test_split  # Data splitting
from sklearn.metrics import accuracy_score, confusion_matrix  # Metrics

print('All libraries imported successfully!')

## Distance Metrics in KNN

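KNN predicts from the distances between a query point and the training points. The formulas below are written for two points (x1, y1) and (x2, y2) in two dimensions; with more features, the sum runs over every coordinate.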

### 1. Euclidean Distance (Most Common)
```
Distance = sqrt((x1-x2)² + (y1-y2)²)
```

### 2. Manhattan Distance
```
Distance = |x1-x2| + |y1-y2|
```

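### 3. Minkowski Distance
```
Distance = (|x1-x2|^p + |y1-y2|^p)^(1/p)
```

Minkowski distance generalizes the other two: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.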
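
As a minimal sketch, the Minkowski formula can be written for points with any number of coordinates. The function name `minkowski_distance` and the test points are illustrative choices, not taken from a library.

```python
def minkowski_distance(point1, point2, p=2):
    """Minkowski distance of order p between two equal-length points."""
    if len(point1) != len(point2):
        raise ValueError('points must have the same number of coordinates')
    # Sum |difference|^p over every coordinate, then take the p-th root
    total = sum(abs(a - b) ** p for a, b in zip(point1, point2))
    return total ** (1 / p)

a, b = [0, 0], [3, 4]
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean)
```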

In [None]:
# STEP 2: Implement Distance Calculation from Scratch
# This is to understand the mechanics behind KNN

def euclidean_distance(point1, point2):
    """
    Calculate Euclidean distance between two points
    
    Parameters:
    - point1, point2: arrays or lists of coordinates
    
    Returns:
    - float: distance between points
    """
    # Convert to numpy arrays for efficient computation
    point1 = np.array(point1)
    point2 = np.array(point2)
    
    # Calculate sum of squared differences
    sum_squared_diff = np.sum((point1 - point2) ** 2)
    
    # Return square root
    return np.sqrt(sum_squared_diff)

# Test the function
point_a = [0, 0]
point_b = [3, 4]
distance = euclidean_distance(point_a, point_b)
print(f'Distance between {point_a} and {point_b}: {distance}')

In [None]:
# STEP 3: KNN Implementation from Scratch
# A working KNN classifier without sklearn (uses euclidean_distance from STEP 2)

from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        """Initialize KNN with the number of neighbors K"""
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        """Store training data (lazy learning: no model is fitted)"""
        # Convert to arrays so positional indexing also works for lists and pandas objects
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)

    def predict(self, X):
        """Predict class labels for new data points"""
        predictions = []

        # For each test point
        for test_point in np.asarray(X):
            # Calculate distances to all training points
            distances = [euclidean_distance(test_point, train_point)
                         for train_point in self.X_train]

            # Get indices of the K nearest neighbors (smallest distances first)
            k_indices = np.argsort(distances)[:self.k]

            # Get labels of the K nearest neighbors, ordered nearest first
            k_labels = [self.y_train[i] for i in k_indices]

            # Majority voting; on a tie, the label of the nearer neighbor wins
            prediction = Counter(k_labels).most_common(1)[0][0]
            predictions.append(prediction)

        return np.array(predictions)

print('KNN Classifier class created successfully!')
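
The nested loop in `predict` computes one distance at a time. As a sketch of an alternative, NumPy broadcasting can compute all test-to-train distances in one step. The function name `pairwise_euclidean` and the toy points are assumptions made for illustration.

```python
import numpy as np

def pairwise_euclidean(X_test, X_train):
    """Return an (n_test, n_train) matrix of Euclidean distances."""
    X_test = np.asarray(X_test, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    # Broadcasting: (n_test, 1, d) - (1, n_train, d) -> (n_test, n_train, d)
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

# Toy data: 3 training points and 2 test points in 2 dimensions
train = [[0, 0], [3, 4], [6, 8]]
test = [[0, 0], [3, 0]]

distances = pairwise_euclidean(test, train)
print(distances.shape)  # (2, 3)

# Indices of the 2 nearest training points for each test point
nearest = np.argsort(distances, axis=1)[:, :2]
print(nearest)
```

The intermediate array holds n_test × n_train × d values, so this trades memory for speed and suits small datasets such as Iris.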

In [None]:
# STEP 4: Test with Iris Dataset

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data: 70% training, 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f'Training samples: {len(X_train)}')
print(f'Testing samples: {len(X_test)}')
print(f'Features per sample: {X.shape[1]}')

In [None]:
# STEP 5: Train and Evaluate Custom KNN

# Create and train custom KNN with K=3
knn_custom = KNNClassifier(k=3)
knn_custom.fit(X_train, y_train)

# Make predictions
y_pred = knn_custom.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Custom KNN Accuracy (K=3): {accuracy:.4f}')
print(f'Correct predictions: {sum(y_pred == y_test)}/{len(y_test)}')

In [None]:
# STEP 6: Using Scikit-learn KNN
# Production-ready implementation

# Create KNN classifier with K=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions
y_pred_sklearn = knn.predict(X_test)

# Evaluate
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f'Scikit-learn KNN Accuracy (K=3): {accuracy_sklearn:.4f}')

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred_sklearn)
print(f'\nConfusion Matrix:\n{cm}')
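
To show how a confusion matrix relates to accuracy, here is a small sketch on a made-up 3-class matrix (not the Iris result above). It follows the scikit-learn convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
import numpy as np

# Made-up confusion matrix (rows: true class, columns: predicted class)
cm_example = np.array([[10, 0, 0],
                       [0, 8, 2],
                       [0, 1, 9]])

# Correct predictions lie on the diagonal
accuracy_from_cm = np.trace(cm_example) / cm_example.sum()

# Recall per class: correct predictions divided by the true count of that class
recall_per_class = np.diag(cm_example) / cm_example.sum(axis=1)

print(f'Accuracy: {accuracy_from_cm:.4f}')  # Accuracy: 0.9000
print(recall_per_class)  # recall of 1.0, 0.8 and 0.9 for the three classes
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.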

## Summary

In this notebook, we learned:
1. **What KNN is** - A lazy learning algorithm that stores training data
2. **Distance metrics** - How to measure similarity between data points
3. **From-scratch implementation** - Understanding the algorithm mechanics
4. **Scikit-learn usage** - Production-ready KNN implementation
5. **Evaluation** - Using accuracy and a confusion matrix

Next: Move to 02_KNN_Classification.ipynb to learn advanced classification techniques!