# Model Judge Challenge
Welcome to the machine learning model comparison tutorial! In this notebook, we'll compare K-Nearest Neighbors (KNN) and Decision Tree classifiers on scikit-learn's wine dataset.
We'll load the data, train both models, evaluate them on a held-out test set and with cross-validation, and decide which one performs better.

## Step 1: Load and Explore the Dataset

In [None]:
from sklearn.datasets import load_wine
import pandas as pd

# Load the wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Convert to DataFrame for easier exploration
df = pd.DataFrame(X, columns=wine.feature_names)
df['target'] = y

# Display first few rows
df.head()
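
Before splitting, it can help to check how balanced the classes are and how much the feature scales differ. The following is a minimal sketch along those lines; the names `class_counts` and `feature_ranges` are introduced here for illustration only.

```python
from sklearn.datasets import load_wine
import pandas as pd

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["target"] = wine.target

# Number of samples per wine class (0, 1, 2)
class_counts = df["target"].value_counts().sort_index()
print(class_counts)

# Min and max of each feature, to see how much the scales differ
feature_ranges = df.drop(columns="target").agg(["min", "max"]).T
print(feature_ranges)
```

Features such as `proline` span a much wider numeric range than features such as `hue`, which matters for distance-based models like KNN.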

The dataset contains 178 samples with 13 features each, labeled into 3 different wine classes.
Next, let's split the data into training and testing sets.

In [None]:
from sklearn.model_selection import train_test_split

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")
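
The split above samples at random without regard to class. As an optional variation, passing `stratify=y` keeps the class proportions roughly equal in both subsets, which can matter on a dataset this small. A sketch, using separate variable names (`X_tr`, `X_te`, `y_tr`, `y_te`, chosen here for illustration) so the notebook's own split is left untouched:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same split size and seed as above, but stratified by class label
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Class proportions in the full data, the training set, and the test set
for name, labels in [("full", y), ("train", y_tr), ("test", y_te)]:
    proportions = np.bincount(labels) / len(labels)
    print(name, np.round(proportions, 3))
```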

## Step 2: Train K-Nearest Neighbors (KNN) and Decision Tree models

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

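# Initialize models
# KNN is distance-based, so its results depend on feature scale
knn = KNeighborsClassifier(n_neighbors=5)
# random_state makes the tree's tie-breaking between equally good splits reproducible
dt = DecisionTreeClassifier(random_state=42)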

# Train models
knn.fit(X_train, y_train)
dt.fit(X_train, y_train)
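
KNN classifies by distance, so features with large numeric ranges can dominate the neighbor search when the data is unscaled. One common remedy, shown here as a sketch rather than as part of the original comparison, is to standardize features inside a pipeline. The names `knn_raw` and `knn_scaled` are introduced for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# KNN on raw features, as in the cell above
knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# KNN preceded by standardization; the scaler is fit on the training data only
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)

raw_acc = knn_raw.score(X_test, y_test)
scaled_acc = knn_scaled.score(X_test, y_test)
print(f"Raw KNN accuracy:    {raw_acc:.3f}")
print(f"Scaled KNN accuracy: {scaled_acc:.3f}")
```

Whether scaling helps is best judged from the printed numbers rather than assumed.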

## Step 3: Evaluate the models with classification metrics

In [None]:
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Predict on test data
knn_pred = knn.predict(X_test)
dt_pred = dt.predict(X_test)

# Evaluate one model: accuracy plus macro-averaged precision, recall, and F1
# (macro = unweighted mean of the per-class scores)
def evaluate_model(y_true, y_pred):
    report = classification_report(y_true, y_pred, output_dict=True)
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='macro')
    recall = recall_score(y_true, y_pred, average='macro')
    f1 = f1_score(y_true, y_pred, average='macro')
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "report": report
    }

# Evaluate KNN
knn_results = evaluate_model(y_test, knn_pred)
# Evaluate Decision Tree
dt_results = evaluate_model(y_test, dt_pred)

# Display results
print("=== KNN Performance ===")
print(f"Accuracy: {knn_results['accuracy']:.3f}")
print(f"Precision: {knn_results['precision']:.3f}")
print(f"Recall: {knn_results['recall']:.3f}")
print(f"F1 Score: {knn_results['f1']:.3f}")

print("\n=== Decision Tree Performance ===")
print(f"Accuracy: {dt_results['accuracy']:.3f}")
print(f"Precision: {dt_results['precision']:.3f}")
print(f"Recall: {dt_results['recall']:.3f}")
print(f"F1 Score: {dt_results['f1']:.3f}")
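
Reading two blocks of printed numbers side by side is error-prone. As a sketch, the same macro-averaged metrics can be collected into one table with a row per model. The helper name `metrics_table` and the toy labels are assumptions made for this example.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def metrics_table(y_true, predictions):
    """Return one row of macro-averaged metrics per model.

    `predictions` maps a model name to its predicted labels.
    """
    rows = {}
    for name, y_pred in predictions.items():
        rows[name] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
            "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
            "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy labels, made up purely to demonstrate the helper
y_true = [0, 1, 2, 2]
toy_predictions = {"model_a": [0, 1, 2, 2], "model_b": [0, 1, 2, 1]}
table = metrics_table(y_true, toy_predictions)
print(table.round(3))
```

In this notebook the call would be `metrics_table(y_test, {"KNN": knn_pred, "Decision Tree": dt_pred})`.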

## Step 4: Visualize confusion matrices for comparison

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Function to plot confusion matrix
def plot_confusion(matrix, title):
    plt.figure(figsize=(6,4))
    sns.heatmap(matrix, annot=True, fmt='d', cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(title)
    plt.show()

# Plot for KNN
cm_knn = confusion_matrix(y_test, knn_pred)
plot_confusion(cm_knn, 'Confusion Matrix for KNN')

# Plot for Decision Tree
cm_dt = confusion_matrix(y_test, dt_pred)
plot_confusion(cm_dt, 'Confusion Matrix for Decision Tree')
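
Raw counts are harder to compare when classes have different numbers of test samples. A sketch of one alternative: `confusion_matrix` accepts `normalize="true"`, which turns each row into fractions of the actual class. The toy labels below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels, made up to keep the arithmetic easy to follow
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# normalize="true" divides each row by the number of actual samples in that class
cm_norm = confusion_matrix(y_true, y_pred, normalize="true")
print(cm_norm)

# The diagonal then holds the recall of each class
per_class_recall = cm_norm.diagonal()
print(per_class_recall)
```

To draw a normalized matrix with `plot_confusion`, the heatmap's `fmt='d'` would need to change to a float format such as `'.2f'`, since the values are no longer integers.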

## Step 5: Cross-validate for a more reliable performance estimate

In [None]:
from sklearn.model_selection import cross_val_score

# Cross-validation scores for KNN
knn_cv_scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')

# Cross-validation scores for Decision Tree
dt_cv_scores = cross_val_score(dt, X, y, cv=5, scoring='accuracy')

# Display mean and std
print(f"KNN Cross-Validation Accuracy: {knn_cv_scores.mean():.3f} ± {knn_cv_scores.std():.3f}")
print(f"Decision Tree Cross-Validation Accuracy: {dt_cv_scores.mean():.3f} ± {dt_cv_scores.std():.3f}")
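
With an integer `cv` and a classifier, `cross_val_score` uses stratified folds without shuffling and reports a single metric. As a sketch of a variation, `cross_validate` can report several metrics at once over shuffled stratified folds. The names `cv`, `models`, and `summary` are introduced here for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Shuffled, stratified 5-fold splitter with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

summary = {}
for name, model in models.items():
    # One array of per-fold scores is returned for each requested metric
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
    summary[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1_macro": scores["test_f1_macro"].mean(),
    }
    print(f"{name}: accuracy={summary[name]['accuracy']:.3f}, "
          f"f1_macro={summary[name]['f1_macro']:.3f}")
```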

## Final Step: Judge's Verdict

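Compare the two models on the test-set metrics from Step 3 and the cross-validation scores from Step 5. The model that scores higher on both is the winner. If the two disagree, the cross-validation result is usually the more reliable guide, because it averages over five different splits instead of relying on a single 36-sample test set.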
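
One way to make the verdict less subjective is to compare the gap between mean cross-validation scores with the fold-to-fold spread. The helper below is a rough heuristic sketched for this notebook, not a statistical test; the function name `judge`, the `margin` parameter, and the toy scores are all assumptions.

```python
import numpy as np

def judge(scores, margin=1.0):
    """Name the model with the higher mean score, unless the gap is within the noise.

    `scores` maps a model name to its per-fold scores. The gap between the two
    best means must exceed `margin` times the larger of their standard deviations.
    """
    means = {name: float(np.mean(s)) for name, s in scores.items()}
    stds = {name: float(np.std(s)) for name, s in scores.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    gap = means[best] - means[runner_up]
    noise = margin * max(stds[best], stds[runner_up])
    return best if gap > noise else "too close to call"

# Toy per-fold scores, made up purely for demonstration
clear_case = {
    "model_a": [0.90, 0.92, 0.94, 0.91, 0.93],
    "model_b": [0.70, 0.72, 0.68, 0.71, 0.69],
}
close_case = {
    "model_a": [0.90, 0.95, 0.85, 0.92, 0.88],
    "model_b": [0.89, 0.90, 0.90, 0.88, 0.91],
}
print(judge(clear_case))
print(judge(close_case))
```

In this notebook the call would be `judge({"KNN": knn_cv_scores, "Decision Tree": dt_cv_scores})`.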

### Example verdict:
🏆 **Winner: KNN** because it achieved higher accuracy and F1 score on the test set and maintained solid performance during cross-validation. (This shows the wording only; name whichever model your own results favor.)

Great job comparing models objectively!