# Model Comparison and Final Evaluation

This notebook compares all trained models and performs final evaluation on the test set.

## Objectives
- Load all trained models
- Compare model performance on validation set
- Select best model(s)
- Evaluate on test set (final evaluation)
- Generate a comprehensive performance report
- Create visualizations for the thesis

## 1. Setup and Imports

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from pathlib import Path

# Machine learning libraries
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve, classification_report
)

# Custom modules
import sys
sys.path.append('..')
from src.models.evaluation import (
    calculate_classification_metrics,
    plot_confusion_matrix,
    plot_roc_curve,
    compare_models
)

%matplotlib inline

## 2. Load Data and Models

In [None]:
# TODO: Load test data
# X_test = ...
# y_test = ...

# TODO: Load all trained models
# models = {
#     'Random Forest': joblib.load('../models/random_forest_model.pkl'),
#     'XGBoost': joblib.load('../models/xgboost_model.pkl'),
#     'Logistic Regression': joblib.load('../models/logistic_regression_model.pkl'),
#     'SVM': joblib.load('../models/svm_model.pkl'),
#     'KNN': joblib.load('../models/knn_model.pkl'),
#     'Naive Bayes': joblib.load('../models/naive_bayes_model.pkl')
# }
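Before calling `joblib.load`, it can help to confirm that each model file exists, so that one missing file does not abort the whole comparison. A minimal sketch, assuming the `<name>_model.pkl` naming shown above; `find_model_files` is a hypothetical helper and not part of `src.models.evaluation`:

```python
from pathlib import Path


def find_model_files(model_dir, model_files):
    """Split expected model files into those present and those missing.

    model_files maps a display name (e.g. 'Random Forest') to a file name.
    Returns (found, missing): found maps name -> Path, missing lists names.
    """
    model_dir = Path(model_dir)
    found, missing = {}, []
    for name, filename in model_files.items():
        path = model_dir / filename
        if path.is_file():
            found[name] = path
        else:
            missing.append(name)
    return found, missing
```

Each path in `found` could then be passed to `joblib.load`, with the names in `missing` printed as a warning.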

## 3. Model Performance Comparison

In [None]:
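# TODO: Evaluate all models and collect metrics
# NOTE: if these metrics drive model selection (Section 5), compute them on the
# validation split and reserve X_test / y_test for the final evaluation.
# results = {}
# for name, model in models.items():
#     y_pred = model.predict(X_test)
#     # Positive-class probabilities (binary classification assumed)
#     y_pred_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else None
#
#     results[name] = {
#         'Accuracy': accuracy_score(y_test, y_pred),
#         'Precision': precision_score(y_test, y_pred, average='weighted'),
#         'Recall': recall_score(y_test, y_pred, average='weighted'),
#         'F1-Score': f1_score(y_test, y_pred, average='weighted'),
#         # np.nan (not None) keeps the ROC-AUC column numeric in the DataFrame
#         'ROC-AUC': roc_auc_score(y_test, y_pred_proba) if y_pred_proba is not None else np.nan
#     }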
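The `average='weighted'` setting computes each metric per class and then averages the per-class values, weighting each class by its number of true samples (its support). A pure-Python sketch of that arithmetic, for illustration only; the function name `weighted_prf` is made up here, and in the notebook the scikit-learn functions do this work:

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        # Per-class scores; a class that is never predicted gets precision 0.
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Weight each class by its share of the true labels.
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1
```

One consequence for single-label classification: support-weighted recall is always equal to accuracy, so the Accuracy and Recall columns in the comparison table will be identical.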

In [None]:
# TODO: Create comparison DataFrame
# results_df = pd.DataFrame(results).T
# results_df = results_df.sort_values('F1-Score', ascending=False)
# print(results_df)

# TODO: Save results
# results_df.to_csv('../reports/results/model_comparison.csv')

## 4. Visualize Model Comparison

In [None]:
# TODO: Create bar plot comparing models
# fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
# for idx, metric in enumerate(metrics):
#     ax = axes[idx // 2, idx % 2]
#     results_df[metric].plot(kind='bar', ax=ax)
#     ax.set_title(f'{metric} Comparison')
#     ax.set_ylabel(metric)
# plt.tight_layout()
# plt.savefig('../reports/figures/model_comparison.png', dpi=300, bbox_inches='tight')

## 5. Detailed Evaluation of Best Model

In [None]:
# TODO: Select best model based on F1-score or domain requirements
# best_model_name = results_df['F1-Score'].idxmax()
# best_model = models[best_model_name]
# print(f"Best Model: {best_model_name}")

In [None]:
# TODO: Generate classification report
# y_pred = best_model.predict(X_test)
# print(classification_report(y_test, y_pred))

In [None]:
# TODO: Plot confusion matrix for best model
# plot_confusion_matrix(y_test, y_pred, normalize=True)
# plt.savefig('../reports/figures/best_model_confusion_matrix.png', dpi=300, bbox_inches='tight')

In [None]:
# TODO: Plot ROC curve for best model (binary classification assumed)
# if hasattr(best_model, 'predict_proba'):
#     y_pred_proba = best_model.predict_proba(X_test)[:, 1]
#     plot_roc_curve(y_test, y_pred_proba, model_name=best_model_name)
#     plt.savefig('../reports/figures/best_model_roc_curve.png', dpi=300, bbox_inches='tight')

## 6. ROC Curves Comparison

In [None]:
# TODO: Plot ROC curves for all models on the same plot
# plt.figure(figsize=(10, 8))
# for name, model in models.items():
#     if hasattr(model, 'predict_proba'):
#         y_pred_proba = model.predict_proba(X_test)[:, 1]
#         fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
#         auc = roc_auc_score(y_test, y_pred_proba)
#         plt.plot(fpr, tpr, label=f'{name} (AUC = {auc:.3f})')
# plt.plot([0, 1], [0, 1], 'k--', label='Random Classifier')
# plt.xlabel('False Positive Rate')
# plt.ylabel('True Positive Rate')
# plt.title('ROC Curves Comparison')
# plt.legend()
# plt.savefig('../reports/figures/all_models_roc_curves.png', dpi=300, bbox_inches='tight')
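The loop above skips any model without `predict_proba`. In scikit-learn, an `SVC` fitted without `probability=True` is one such case, but it still exposes `decision_function`, whose output `roc_curve` and `roc_auc_score` accept as scores for binary problems. A minimal sketch of a fallback, assuming binary classification; `positive_class_scores` is a hypothetical helper, not part of `src.models.evaluation`:

```python
def positive_class_scores(model, X):
    """Return one score per sample for the positive class, or None.

    Prefers predict_proba (column 1) and falls back to decision_function.
    Assumes a binary classifier that follows scikit-learn conventions.
    """
    if hasattr(model, "predict_proba"):
        # Row-wise access works for NumPy arrays and nested lists alike.
        return [row[1] for row in model.predict_proba(X)]
    if hasattr(model, "decision_function"):
        return list(model.decision_function(X))
    return None
```

With a helper like this, the loop could plot every model for which the result is not `None`, rather than only those with `predict_proba`.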

## 7. Key Findings and Recommendations

TODO: Summarize key findings:

1. **Best Performing Model:**
   - Model name and performance metrics
   - Why it performs best

2. **Model Insights:**
   - Important features driving predictions
   - Patterns discovered in AMR data

3. **Recommendations:**
   - Which model to deploy
   - Areas for improvement
   - Future work suggestions

## 8. Export Final Results

In [None]:
# TODO: Save all figures and results for thesis
# Ensure all plots are saved in reports/figures/
# Ensure all tables are saved in reports/results/
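`plt.savefig` and `DataFrame.to_csv` do not create missing parent directories, so the output folders need to exist before any cell writes to them. A minimal sketch, assuming the `reports/figures` and `reports/results` layout used throughout this notebook; `ensure_report_dirs` is a hypothetical helper:

```python
from pathlib import Path


def ensure_report_dirs(base="../reports"):
    """Create the figures/ and results/ folders under base if needed."""
    base = Path(base)
    figures_dir = base / "figures"
    results_dir = base / "results"
    for directory in (figures_dir, results_dir):
        # parents=True builds intermediate folders; exist_ok=True makes
        # repeated calls harmless.
        directory.mkdir(parents=True, exist_ok=True)
    return figures_dir, results_dir
```

Calling this once near the top of the notebook, before the first save, would be enough.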