### **K-Nearest Neighbors (KNN) Classification**  

K-Nearest Neighbors (KNN) is a simple, non-parametric algorithm that classifies a data point by the majority class among its **k** nearest neighbors in the training set. The same idea extends to regression by averaging the neighbors' target values. KNN is popular because it is easy to implement and easy to reason about.  

### **How It Works:**  
- **Instance-Based Learning**: KNN does not explicitly learn a model but stores the entire dataset and classifies new points based on proximity to existing data.  
- **Distance Metric**: Computes the distance between points using metrics like **Euclidean distance**, **Manhattan distance**, or **Minkowski distance**.  
- **Voting Mechanism**: Assigns a class based on the majority vote of the **k** nearest neighbors. A smaller **k** is more sensitive to noise, while a larger **k** smooths decision boundaries.  
- **Weighted Voting (Optional)**: Some versions weight neighbors by distance, giving closer points more influence in classification.  
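
The four points above can be sketched in a few lines of plain Python. This is an illustrative toy, not the scikit-learn implementation used later in this notebook; the names `minkowski` and `knn_predict` and the sample points are made up for the example.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_X, train_y, query, k=3, p=2, weighted=False):
    """Classify `query` by a vote among its k nearest training points."""
    # Instance-based: no model is fit, we just measure distance to every stored point.
    dists = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for dist, label in dists[:k]:
        # Uniform vote, or inverse-distance weight so closer neighbors count more.
        votes[label] += 1.0 / (dist + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data (hypothetical): three "a" points near (1, 1), two "b" points near (4, 4).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.0), (4.2, 3.9)]
train_y = ["a", "a", "a", "b", "b"]
query = (3.0, 3.0)

print(knn_predict(train_X, train_y, query, k=3))
print(knn_predict(train_X, train_y, query, k=5))
print(knn_predict(train_X, train_y, query, k=5, weighted=True))
```

With `k=5` every training point votes, so the larger class wins the uniform vote even though the query sits closer to the smaller one; turning on distance weighting gives the two nearby points enough influence to change the result.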

### **Advantages:**  
✅ **Simple & Intuitive**: Easy to understand and implement without making strong assumptions about data distribution.  
✅ **Non-Parametric**: Works well with complex decision boundaries since it does not assume a specific functional form.  
✅ **Handles Multi-Class Problems**: Naturally supports multiple classes without modification.  
✅ **Adaptable to Different Distance Metrics**: Can be customized using different distance functions to suit various data types.  

### **Disadvantages:**  
❌ **Computationally Expensive**: Requires storing the entire dataset and computing distances at prediction time, making it slow for large datasets.  
❌ **Sensitive to Irrelevant Features**: Performance degrades when irrelevant or redundant features, or features measured on much larger scales, dominate the distance computation.  
❌ **Imbalanced Classes Issue**: May favor majority classes unless weighting techniques are applied.  
❌ **Curse of Dimensionality**: High-dimensional data can make distance calculations less meaningful, reducing effectiveness.  
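
The curse of dimensionality can be made concrete with a small experiment. The sketch below draws uniform random points (an assumption chosen purely for illustration) and reports the ratio of the nearest to the farthest distance from a random query; `distance_contrast` is a hypothetical helper, not part of this project.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Ratio of nearest to farthest Euclidean distance from a random query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = [rng.random() for _ in range(n_dims)]
    dists = [math.dist(query, pt) for pt in points]
    return min(dists) / max(dists)


# A ratio near 0 means the nearest neighbor is much closer than the farthest point;
# a ratio near 1 means all points are roughly equally far away.
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(200, d), 3))
```

As the number of dimensions grows, the ratio tends toward 1, so the "nearest" neighbors are barely nearer than any other point and the vote carries less information.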


### KNN using Baseline Predictors (see /Data/Data_Formatting.ipynb)

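import pandas as pd
from sklearn.metrics import accuracy_score, precision_score
from sklearn.neighbors import KNeighborsClassifier


def make_yearly_predictions_knn_base(Train, Test):
    """Fit KNN on the baseline predictors and score it per calendar year of Test."""
    # parameters_base and find_best_k_base are project helpers not shown in this section
    static_predictors = parameters_base(Train, Test)
    best_k = find_best_k_base(Train)

    # Train a KNN model with the selected k
    knn_clf = KNeighborsClassifier(n_neighbors=best_k)
    knn_clf.fit(Train[static_predictors], Train["Target"])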

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = knn_clf.predict(test_year[static_predictors])

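            # Calculate weighted precision and accuracy for this year.
            # zero_division=1 scores a class with no predicted samples as 1.0,
            # which can inflate precision.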
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "K-Nearest Neighbors",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df
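
The helper `find_best_k_base` is not shown in this section, so its selection rule is unknown here. One common way to pick **k** is cross-validation; the dependency-free sketch below uses leave-one-out accuracy on a toy dataset. The names `knn_label` and `pick_k_leave_one_out` and the data are hypothetical, and this is not a reconstruction of the project's helper.

```python
import math
from collections import Counter


def knn_label(train, query, k):
    """Majority label among the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def pick_k_leave_one_out(data, candidate_ks):
    """Return (k, accuracy) for the candidate k with the best leave-one-out accuracy."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        hits = 0
        for i, (features, label) in enumerate(data):
            # Hold out row i and predict it from all remaining rows.
            rest = data[:i] + data[i + 1:]
            hits += knn_label(rest, features, k) == label
        acc = hits / len(data)
        if acc > best_acc:  # on ties, the candidate listed first is kept
            best_k, best_acc = k, acc
    return best_k, best_acc


# Toy data (hypothetical): two clean clusters plus one mislabeled point at (0.5, 0.5).
data = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
    ((0.5, 0.5), "b"),
]

print(pick_k_leave_one_out(data, [1, 3, 5]))
```

Here `k=1` lets the single mislabeled point mislead every neighbor in its cluster, while `k=3` outvotes it. For time-ordered data such as the `Date`-indexed frames in this notebook, a split that respects chronology (for example scikit-learn's `TimeSeriesSplit`) is usually a safer choice than leave-one-out.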


### KNN using Baseline Predictors + Rolling Predictors (see /Data/Data_Formatting.ipynb)

def make_yearly_predictions_knn_roll(Train, Test):
    """Fit KNN on baseline plus rolling predictors and score it per calendar year of Test."""
    # Select k and the predictor list, then apply roll() to both frames
    best_k = find_best_k_roll(Train)
    all_predictors = parameters_roll(Train, Test)
    Train = roll(Train)
    Test = roll(Test)

    # Train a KNN model with the selected k
    knn_clf = KNeighborsClassifier(n_neighbors=best_k)
    knn_clf.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = knn_clf.predict(test_year[all_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "K-Nearest Neighbors",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df


### KNN using Full Feature Set (see /Data/Data_Formatting.ipynb)

def make_yearly_predictions_knn_full(Train, Test):
    """Fit KNN on the full feature set and score it per calendar year of Test."""
    # Select k and the predictor list, then apply roll() to both frames
    best_k = find_best_k_full(Train)
    all_predictors = parameters_full(Train, Test)
    Train = roll(Train)
    Test = roll(Test)

    # Train a KNN model with the selected k
    knn_clf = KNeighborsClassifier(n_neighbors=best_k)
    knn_clf.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = knn_clf.predict(test_year[all_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "K-Nearest Neighbors",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df
