# Model Judge Challenge
Welcome to this beginner-friendly tutorial on comparing machine learning models!
We'll learn how to evaluate two classifiers, K-Nearest Neighbors (KNN) and Decision Tree, on scikit-learn's breast cancer dataset.

Let's dive in!

## Step 1: Load & Split the Data
First, we'll load the breast cancer dataset from sklearn and split it into training and test sets.

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


In [None]:
# Load the dataset
data = load_breast_cancer()

# Split into training and testing data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
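
Before training anything, it can help to look at what the split produced. The cell below is a small optional sketch, not part of the original exercise: it reloads the data so it runs on its own, then prints the dataset shape, the class names, and how many samples fall into each class and each split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()

# Number of samples and number of numeric features
print("Feature matrix shape:", data.data.shape)

# Label 0 is 'malignant' and label 1 is 'benign'
print("Class names:", list(data.target_names))
print("Samples per class:", np.bincount(data.target))

# Same 80/20 split as in the tutorial
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])

# Share of benign cases in each split, to check the splits look alike
print("Benign share (train):", round(float(y_train.mean()), 3))
print("Benign share (test):", round(float(y_test.mean()), 3))
```

If the two shares differ a lot, passing `stratify=data.target` to `train_test_split` is one way to keep the class proportions the same in both splits. The tutorial's split does not do this.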

## Step 2: Create the Models
We'll create two models, KNN and Decision Tree. The actual training happens in Step 3, inside the evaluation loop.

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


In [None]:
# Initialize models
models = {
    'KNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(random_state=42)
}
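
KNN classifies a point by its distance to nearby points, so features with large numeric ranges can dominate the result. The sketch below shows one common way to handle that: wrapping KNN in a pipeline with `StandardScaler`. This is an optional variation, not what the tutorial uses, and the dictionary name `models_scaled` is made up for this example.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical alternative to the `models` dictionary above
models_scaled = {
    # The scaler is fitted on whatever data the pipeline is trained on,
    # then KNN sees the standardized features
    'KNN (scaled)': make_pipeline(StandardScaler(), KNeighborsClassifier()),
    # Decision trees split on one feature at a time, so scaling is not needed
    'Decision Tree': DecisionTreeClassifier(random_state=42),
}

for name, model in models_scaled.items():
    print(f"{name}: {type(model).__name__}")

# KNeighborsClassifier() with no arguments uses 5 neighbors
print("Default n_neighbors:", KNeighborsClassifier().n_neighbors)
```

Because the scaler sits inside the pipeline, cross-validation refits it on each training fold, so no information from the held-out fold leaks into the scaling.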

## Step 3: Cross-Validation & Evaluation
We'll evaluate each model with 5-fold cross-validation on the training data, then fit it on the full training set and measure its performance on the unseen test data.

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


In [None]:
# Prepare a dictionary to store results
results = {}

for name, model in models.items():
    print(f"Training and evaluating {name}...")
    # Perform 5-fold cross-validation on training data
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_mean = cv_scores.mean()
    cv_std = cv_scores.std()

    # Train the model on full training data
    model.fit(X_train, y_train)
    # Predict on test data
    y_pred = model.predict(X_test)
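    # Calculate evaluation metrics (precision, recall, and F1 use the
    # default positive class, label 1, which is 'benign' in this dataset)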
    test_accuracy = accuracy_score(y_test, y_pred)
    test_precision = precision_score(y_test, y_pred)
    test_recall = recall_score(y_test, y_pred)
    test_f1 = f1_score(y_test, y_pred)

    # Store the results
    results[name] = {
        'cv_mean': cv_mean,
        'cv_std': cv_std,
        'test_accuracy': test_accuracy,
        'test_precision': test_precision,
        'test_recall': test_recall,
        'test_f1': test_f1
    }
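
One detail worth checking: `precision_score`, `recall_score`, and `f1_score` treat label 1 as the positive class by default, and in this dataset label 1 means 'benign'. If you care more about catching malignant cases, you can ask for the metric on label 0 instead. The cell below is a self-contained sketch of that idea using the Decision Tree only. It retrains the model so it runs on its own, and the variable names are chosen for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Default: recall for label 1 ('benign')
recall_benign = recall_score(y_test, y_pred)
# pos_label=0: recall for label 0 ('malignant')
recall_malignant = recall_score(y_test, y_pred, pos_label=0)

print(f"Recall (benign):    {recall_benign:.3f}")
print(f"Recall (malignant): {recall_malignant:.3f}")

# Per-class precision, recall, and F1 in one table
print(classification_report(y_test, y_pred, target_names=list(data.target_names)))
```

`classification_report` is a quick way to see both classes side by side without calling each metric function twice.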

## Step 4: Display Results & Decide the Winner
Let's print every metric for both models, then pick a winner by test accuracy.

In [None]:
print("MODEL COMPARISON RESULTS:")
print("="*25)
for name, res in results.items():
    print(f"\n{name}:")
    print(f"- CV Accuracy: {res['cv_mean']:.3f} ± {res['cv_std']:.3f}")
    print(f"- Test Accuracy: {res['test_accuracy']:.3f}")
    print(f"- Test Precision: {res['test_precision']:.3f}")
    print(f"- Test Recall: {res['test_recall']:.3f}")
    print(f"- Test F1: {res['test_f1']:.3f}")

# Decide the winner based on test accuracy; mention stability only
# when the cross-validation spread actually supports it
knn = results['KNN']
dt = results['Decision Tree']

if knn['test_accuracy'] > dt['test_accuracy']:
    winner = 'KNN'
    reason = 'Higher test accuracy'
    if knn['cv_std'] < dt['cv_std']:
        reason += ' and more stable cross-validation scores'
elif dt['test_accuracy'] > knn['test_accuracy']:
    winner = 'Decision Tree'
    reason = 'Higher test accuracy'
    if dt['cv_std'] < knn['cv_std']:
        reason += ' and more stable cross-validation scores'
else:
    winner = 'Tie'
    reason = 'Identical test accuracy; compare the other metrics'

print(f"\nWINNER: {winner}")
print(f"REASON: {reason}")
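
A strict `>` comparison declares a winner even when the gap is tiny. On a small test set, a difference of one or two samples may not mean much. The function below is one possible way to soften that: it treats gaps inside a tolerance as a near tie and breaks the tie with the cross-validation standard deviation. The function name `pick_winner`, the `tol` value, and the numbers in `example` are all made up for illustration. They are not results from this notebook.

```python
def pick_winner(results, metric='test_accuracy', tol=0.01):
    """Return (winner, reason) for a results dictionary with at least two models.

    A gap larger than `tol` on `metric` decides the winner outright.
    Otherwise the model with the lower 'cv_std' is preferred.
    """
    # Sort models from best to worst on the chosen metric
    ranked = sorted(results.items(), key=lambda item: item[1][metric], reverse=True)
    (best_name, best), (second_name, second) = ranked[0], ranked[1]

    gap = best[metric] - second[metric]
    if gap > tol:
        return best_name, f"Higher {metric} by {gap:.3f}"

    # Near tie: prefer the model whose CV scores vary less
    steadier = min((best_name, second_name), key=lambda n: results[n]['cv_std'])
    return steadier, f"{metric} within {tol}; lower CV standard deviation"


# Illustrative numbers only, not measured results
example = {
    'Model A': {'test_accuracy': 0.950, 'cv_std': 0.030},
    'Model B': {'test_accuracy': 0.945, 'cv_std': 0.010},
}

winner, reason = pick_winner(example)
print(f"WINNER: {winner}")
print(f"REASON: {reason}")
```

You could call `pick_winner(results)` on the dictionary built in Step 3, since it stores both `test_accuracy` and `cv_std` for each model. What counts as a meaningful gap depends on the test set size, so treat `tol` as something to choose deliberately.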

## Final Thoughts
In this exercise, you learned how to: 
- Load and split data for training and testing
- Create and evaluate multiple models
- Use cross-validation for fair comparison
- Consider various metrics to judge model performance

Remember: Good data scientists judge models fairly by looking at several metrics, not just a single number!