# Model Evaluation

In this notebook, we will evaluate the performance of the trained models. We will:
1. Load the saved models and test data.
2. Generate classification reports for each model.
3. Plot confusion matrices.
4. Plot ROC curves and compare AUC scores.
5. Create a summary of model performance.

In [None]:
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (
    classification_report, confusion_matrix, roc_curve, auc, roc_auc_score,
    accuracy_score, precision_score, recall_score, f1_score
)
from IPython.display import display  # implicit in Jupyter; explicit import also works in scripts

## 1. Load Data and Models

In [None]:
# Load test data
X_test = np.load('X_test_processed.npy', allow_pickle=True)
y_test = np.load('y_test.npy', allow_pickle=True)

# Load models
log_reg = joblib.load('logistic_regression_model.joblib')
rf = joblib.load('random_forest_model.joblib')
gb = joblib.load('gradient_boosting_model.joblib')
xgb_clf = joblib.load('xgboost_model.joblib')
best_rf = joblib.load('best_random_forest_model.joblib')

models = {
    'Logistic Regression': log_reg,
    'Random Forest': rf,
    'Gradient Boosting': gb,
    'XGBoost': xgb_clf,
    'Tuned Random Forest': best_rf
}
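
Before evaluating, it can be worth confirming that the loaded arrays line up and that the labels are encoded the way the plots below assume (0 = Not Churned, 1 = Churned; this encoding is an assumption inferred from the tick labels in the next section). A minimal NumPy-only sketch; the helper name `check_test_arrays` is hypothetical:

```python
import numpy as np

def check_test_arrays(X, y, expected_labels=(0, 1)):
    """Hypothetical sanity check: row counts agree and labels are in expected_labels.

    Returns per-class counts so any class imbalance is visible up front.
    """
    X = np.asarray(X)
    y = np.asarray(y).ravel()
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]} labels")
    labels, counts = np.unique(y, return_counts=True)
    unexpected = set(labels.tolist()) - set(expected_labels)
    if unexpected:
        raise ValueError(f"Unexpected label values: {sorted(unexpected)}")
    return dict(zip(labels.tolist(), counts.tolist()))

# Usage with the arrays loaded above (assumes 0/1 integer labels):
# class_counts = check_test_arrays(X_test, y_test)
```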

## 2. Confusion Matrices and Classification Reports

In [None]:
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f'--- {name} ---')
    print(classification_report(y_test, y_pred))
    
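    # Plot confusion matrix (rows = actual, columns = predicted;
    # tick labels assume class 0 = Not Churned and class 1 = Churned)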
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Not Churned', 'Churned'], yticklabels=['Not Churned', 'Churned'])
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(f'Confusion Matrix for {name}')
    plt.show()
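
The positive-class row of each classification report can be read straight off the confusion matrix. For binary labels, scikit-learn's `confusion_matrix` puts actual classes on rows and predicted classes on columns, so the cells are `[[TN, FP], [FN, TP]]`. A small NumPy sketch (the function name `metrics_from_confusion` is ours, for illustration) that recomputes the "Churned" precision, recall, and F1, plus overall accuracy:

```python
import numpy as np

def metrics_from_confusion(cm):
    """Recompute positive-class metrics from a 2x2 confusion matrix.

    Assumes the scikit-learn layout: rows = actual, columns = predicted,
    with class 0 (Not Churned) first and class 1 (Churned) second.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    tn, fp, fn, tp = np.asarray(cm).ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted churners, share that churned
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual churners, share that were caught
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up counts for illustration: 50 TN, 10 FP, 5 FN, 35 TP
print(metrics_from_confusion([[50, 10], [5, 35]]))
```

Applied to a `cm` from the loop above, the values should match the class-1 row of the corresponding classification report.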


## 3. ROC Curves and AUC Scores

In [None]:
plt.figure(figsize=(10, 8))

for name, model in models.items():
    # Prefer predicted probabilities for the positive class (Churned)
    if hasattr(model, 'predict_proba'):
        y_pred_proba = model.predict_proba(X_test)[:, 1]
    else:
        # Fall back to decision_function scores. ROC curves depend only on how
        # the scores rank the samples, so no rescaling to [0, 1] is needed.
        y_pred_proba = model.decision_function(X_test)
        
    fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f'{name} (AUC = {roc_auc:.3f})')

plt.plot([0, 1], [0, 1], 'k--', label='Chance') # Dashed diagonal
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curves')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()
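
ROC analysis only uses the ordering of the scores, so any strictly increasing transformation of them (such as min-max rescaling of `decision_function` output) leaves the curve and the AUC unchanged. The sketch below computes AUC directly as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. It is a brute-force illustration on toy data (the helper `pairwise_auc` and the arrays are made up), not a replacement for `roc_auc_score`:

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half. Uses O(n_pos * n_neg) memory, so it is only
    suitable for small illustrative arrays.
    """
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("Need at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # every positive compared with every negative
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

y = np.array([0, 0, 1, 1, 0, 1])
raw = np.array([-2.0, -0.5, 0.3, 1.7, 0.8, -0.1])      # e.g. decision_function output
rescaled = (raw - raw.min()) / (raw.max() - raw.min())  # min-max rescaling to [0, 1]
print(pairwise_auc(y, raw), pairwise_auc(y, rescaled))  # same value: only the ranking matters
```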

## 4. Model Performance Summary

In [None]:
performance_summary = []

for name, model in models.items():
    y_pred = model.predict(X_test)
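    # Same score fallback as in the ROC section: roc_auc_score accepts
    # either probabilities or decision_function scores
    if hasattr(model, 'predict_proba'):
        y_pred_proba = model.predict_proba(X_test)[:, 1]
    else:
        y_pred_proba = model.decision_function(X_test)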
    performance_summary.append({
        'Model': name,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred),
        'Recall': recall_score(y_test, y_pred),
        'F1-Score': f1_score(y_test, y_pred),
        'AUC': roc_auc_score(y_test, y_pred_proba)
    })

summary_df = pd.DataFrame(performance_summary).set_index('Model')
summary_df.sort_values(by='F1-Score', ascending=False, inplace=True)

print('Model Performance Summary:')
display(summary_df)

### Conclusion

The **Tuned Random Forest** model shows the best overall performance, with the highest F1-Score and a strong AUC. F1 is a sensible headline metric for a churn problem because it balances precision and recall, and recall is particularly important here: the goal is to identify as many potential churners as possible.
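
The "balance" argument comes from F1 being the harmonic mean of precision and recall, `F1 = 2PR / (P + R)`, which is pulled toward whichever of the two is lower. A quick comparison against the arithmetic mean; the precision/recall pairs are hypothetical and not taken from the summary table:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical pairs that all share an arithmetic mean of 0.80
for p, r in [(0.80, 0.80), (0.95, 0.65), (1.00, 0.60)]:
    print(f"P={p:.2f} R={r:.2f}  arithmetic mean={(p + r) / 2:.3f}  F1={f1(p, r):.3f}")
```

All three pairs average to 0.80, yet F1 drops from 0.800 to about 0.772 and then 0.750 as the gap between precision and recall widens, so a model cannot earn a high F1 by excelling at only one of them.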

The base Random Forest and XGBoost models also show very competitive performance. The Gradient Boosting model is slightly behind them, and the Logistic Regression model, while a good baseline, is clearly outperformed by the more complex ensemble methods. Based on these results, the **Tuned Random Forest model is the recommended model** for this churn prediction task.