### **1. Setup and Import Libraries**
Import the libraries needed for data handling, model evaluation, and visualization.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import os
import joblib

# For evaluation metrics
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, classification_report, roc_curve

# For visualization and rich table display
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

### **4. Load and Preprocess the Dataset**
Replicate the preprocessing steps from the individual model notebooks so that every model is evaluated on the same features and the same test split.

In [None]:
# Define the path to the dataset
data_path = '../data/data.csv'

# Load the dataset
df = pd.read_csv(data_path)

# Drop unnecessary columns (as done in individual model notebooks)
df = df.drop(['location', 'country'], axis=1)

# Separate features and target
X = df.drop('result', axis=1)
y = df['result']

# One-Hot Encode categorical variables
categorical_cols = ['gender', 'vis_wuhan', 'from_wuhan']
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

# Do not scale here: load_and_evaluate() applies each model's own saved scaler,
# so scaling at this point would transform the features twice.

# Split the data into Training, Validation, and Test sets
from sklearn.model_selection import train_test_split

# First, split into training and temp (validation + test)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

# Then, split temp into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp)

print(f"Training set size: {X_train.shape[0]}")
print(f"Validation set size: {X_val.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
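
The two-stage split above should give roughly 70/15/15 proportions, and `stratify` should keep the class ratio similar in every subset. Below is a minimal sketch of how that could be checked. It uses a synthetic label vector rather than the real `y`; the 80/20 class balance and the helper name `split_summary` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_summary(y_parts):
    """Return the size share and positive-class rate of each named split."""
    total = sum(len(part) for part in y_parts.values())
    rows = {
        name: {"share": len(part) / total, "positive_rate": float(np.mean(part))}
        for name, part in y_parts.items()
    }
    # Transpose so that each split becomes one row
    return pd.DataFrame(rows).T


# Synthetic stand-in data: 1000 rows, 20% positives (assumed, not the real dataset)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])
y_demo = pd.Series([1] * 200 + [0] * 800)

# Same two-stage split as in the notebook
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo)
X_va, X_te, y_va, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)

summary = split_summary({"train": y_tr, "val": y_va, "test": y_te})
print(summary)
```

If the positive rate differs noticeably between subsets on the real data, the `stratify` argument is the first thing to check.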

### **5. Function to Load and Evaluate Models**
To streamline the process, define a function that loads a model, makes predictions, and computes evaluation metrics.

In [None]:
def load_and_evaluate(model_name, model_path, scaler_path, X_test, y_test):
    """
    Load a trained model and scaler, make predictions, and evaluate performance.
    
    Parameters:
    - model_name: str, name of the model
    - model_path: str, path to the saved model file
    - scaler_path: str, path to the saved scaler file
    - X_test: DataFrame or array, unscaled test features (scaled inside this function)
    - y_test: Series or array, true labels
    
    Returns:
    - metrics_dict: dict, contains evaluation metrics
    """
    # Load the scaler
    scaler = joblib.load(scaler_path)
    
    # Load the model
    model = joblib.load(model_path)
    
    # Scale the test data
    X_test_scaled = scaler.transform(X_test)
    
    # Make predictions
    y_pred = model.predict(X_test_scaled)
    y_proba = model.predict_proba(X_test_scaled)[:,1]
    
    # Calculate metrics
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_proba)
    
    # Generate classification report
    class_report = classification_report(y_test, y_pred, output_dict=True)
    
    # Store metrics in a dictionary
    metrics_dict = {
        'Model': model_name,
        'Precision': precision,
        'Recall': recall,
        'F1-Score': f1,
        'ROC AUC': roc_auc,
        'Classification Report': class_report
    }
    
    return metrics_dict
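
To make the reported numbers easier to interpret, the sketch below recomputes precision, recall, and F1 by hand from confusion-matrix counts and compares them with the scikit-learn functions used in `load_and_evaluate`. The label vectors are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 4 positives and 6 negatives
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_manual = tp / (tp + fp)   # share of predicted positives that are correct
recall_manual = tp / (tp + fn)      # share of actual positives that were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"precision={precision_manual:.2f}, recall={recall_manual:.2f}, f1={f1_manual:.2f}")

# The hand-computed values agree with scikit-learn
print(np.isclose(precision_manual, precision_score(y_true_demo, y_pred_demo)))
print(np.isclose(recall_manual, recall_score(y_true_demo, y_pred_demo)))
print(np.isclose(f1_manual, f1_score(y_true_demo, y_pred_demo)))
```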

### **6. Load and Evaluate Each Model**
Use the function defined above to evaluate each classifier. Ensure that the paths to each model and its corresponding scaler are correct.

In [None]:
# Define models and their paths
models_info = {
    'Naïve Bayes': {
        'model_path': '../models/naive_bayes/naive_bayes_model.joblib',
        'scaler_path': '../models/naive_bayes/scaler.joblib'
    },
    'Logistic Regression': {
        'model_path': '../models/logistic_regression/logistic_model.joblib',
        'scaler_path': '../models/logistic_regression/scaler.joblib'
    },
    'K-Nearest Neighbors': {
        'model_path': '../models/knn/knn_model.joblib',
        'scaler_path': '../models/knn/scaler.joblib'
    }
}

# Initialize a list to store metrics
metrics_list = []

# Iterate over each model and evaluate
for model_name, paths in models_info.items():
    print(f"Evaluating {model_name}...")
    metrics = load_and_evaluate(
        model_name=model_name,
        model_path=paths['model_path'],
        scaler_path=paths['scaler_path'],
        X_test=X_test,
        y_test=y_test
    )
    metrics_list.append(metrics)
    print(f"{model_name} evaluation completed.\n")
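
Because a wrong path only surfaces as an error partway through the loop, it can help to validate all paths first. The following is a minimal sketch of such a check. The helper name `find_missing_files` is hypothetical, and the demo uses a throwaway temporary directory instead of the real `../models` folder.

```python
import os
import tempfile


def find_missing_files(models_info):
    """Return (model_name, key, path) for every configured path that is not a file."""
    missing = []
    for model_name, paths in models_info.items():
        for key, path in paths.items():
            if not os.path.isfile(path):
                missing.append((model_name, key, path))
    return missing


# Demo: one file exists, the other does not
with tempfile.TemporaryDirectory() as tmp_dir:
    existing_path = os.path.join(tmp_dir, "model.joblib")
    with open(existing_path, "wb") as handle:
        handle.write(b"")  # empty placeholder file

    demo_info = {
        "Demo Model": {
            "model_path": existing_path,
            "scaler_path": os.path.join(tmp_dir, "scaler.joblib"),
        }
    }
    demo_missing = find_missing_files(demo_info)

for model_name, key, path in demo_missing:
    print(f"{model_name}: {key} not found at {path}")
```

Calling `find_missing_files(models_info)` before the evaluation loop and stopping if the result is non-empty would report every bad path at once.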

### **7. Compile and Display Metrics**
Convert the list of metrics into a DataFrame for easy comparison and visualization.

In [None]:
# Create a DataFrame from the metrics list
metrics_df = pd.DataFrame(metrics_list)

# Select relevant columns for comparison
comparison_df = metrics_df[['Model', 'Precision', 'Recall', 'F1-Score', 'ROC AUC']]

# Display the comparison table
print("Model Performance Comparison:")
display(comparison_df)
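
With the table in hand, the best model per metric can be read off programmatically. The sketch below shows one way to do that. The scores and model names are placeholders invented for illustration, not results from this analysis.

```python
import pandas as pd

# Hypothetical placeholder scores (NOT results from this notebook)
demo_df = pd.DataFrame({
    "Model": ["Model A", "Model B", "Model C"],
    "Precision": [0.80, 0.70, 0.75],
    "Recall": [0.60, 0.85, 0.70],
    "F1-Score": [0.69, 0.77, 0.72],
    "ROC AUC": [0.88, 0.90, 0.82],
})

indexed = demo_df.set_index("Model")

# Name of the top-scoring model for each metric
best_per_metric = indexed.idxmax()
print(best_per_metric)

# Full table sorted by the metric that matters most (recall here)
ranked_by_recall = indexed.sort_values("Recall", ascending=False)
print(ranked_by_recall)
```

Applied to the real table, the same calls would be `comparison_df.set_index('Model').idxmax()` and `.sort_values('Recall', ascending=False)`.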

### **8. Visualize the Comparison**
Use bar charts or other visualizations to compare the models across different metrics.

In [None]:
# Set the plot style
sns.set_theme(style="whitegrid")

# Define the metrics to plot
metrics_to_plot = ['Precision', 'Recall', 'F1-Score', 'ROC AUC']

# Plot each metric
for metric in metrics_to_plot:
    plt.figure(figsize=(8,6))
    sns.barplot(x='Model', y=metric, data=comparison_df)
    plt.title(f'Model Comparison based on {metric}')
    plt.ylim(0, 1)
    plt.ylabel(metric)
    plt.xlabel('Model')
    # Use the bar position rather than the DataFrame index, which need not be 0..n-1
    for position, value in enumerate(comparison_df[metric]):
        plt.text(position, value + 0.01, f"{value:.2f}", ha='center')
    plt.show()
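
As an alternative to one figure per metric, all metrics can be shown in a single grouped bar chart, which makes cross-metric trade-offs easier to see. This is a sketch using pandas' built-in plotting on top of matplotlib. The function name `plot_grouped_metrics` and the demo scores are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_grouped_metrics(df, metrics):
    """Draw one grouped bar chart: metrics on the x-axis, one bar per model."""
    # Rows become metrics and columns become models
    wide = df.set_index("Model")[metrics].T
    ax = wide.plot.bar(figsize=(10, 6), rot=0)
    ax.set_ylim(0, 1)
    ax.set_xlabel("Metric")
    ax.set_ylabel("Score")
    ax.set_title("Model Comparison Across Metrics")
    ax.legend(title="Model", loc="lower right")
    return ax


# Hypothetical placeholder scores (NOT results from this notebook)
demo_scores = pd.DataFrame({
    "Model": ["Model A", "Model B", "Model C"],
    "Precision": [0.80, 0.70, 0.75],
    "Recall": [0.60, 0.85, 0.70],
    "F1-Score": [0.69, 0.77, 0.72],
    "ROC AUC": [0.88, 0.90, 0.82],
})

ax_demo = plot_grouped_metrics(
    demo_scores, ["Precision", "Recall", "F1-Score", "ROC AUC"])
```

In the notebook, the call would be `plot_grouped_metrics(comparison_df, metrics_to_plot)` followed by `plt.show()`.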


### **9. Detailed Classification Reports**
For a more in-depth analysis, display the classification reports for each model.

In [None]:
# Display classification reports
for result in metrics_list:
    print(f"Classification Report for {result['Model']}:")
    class_report_df = pd.DataFrame(result['Classification Report']).transpose()
    display(class_report_df)
    print("\n")

### **10. ROC Curves Comparison**
Plot the ROC curves of all models on the same graph to visually compare their performance.

In [None]:
plt.figure(figsize=(10,8))

for result in metrics_list:
    model_name = result['Model']
    # Load model and scaler
    model = joblib.load(models_info[model_name]['model_path'])
    scaler_model = joblib.load(models_info[model_name]['scaler_path'])
    # Scale the test data
    X_test_scaled = scaler_model.transform(X_test)
    # Get prediction probabilities
    y_proba = model.predict_proba(X_test_scaled)[:,1]
    # Compute ROC curve
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    # Plot ROC curve
    plt.plot(fpr, tpr, label=f'{model_name} (AUC = {result["ROC AUC"]:.2f})')

# Plot the diagonal line
plt.plot([0,1], [0,1], 'k--', label='Random Classifier')

# Customize the plot
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves Comparison')
plt.legend(loc='lower right')
plt.show()
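
The AUC shown in the legend is the area under the plotted curve. The small sketch below illustrates this by integrating the `roc_curve` output with `sklearn.metrics.auc` and comparing the result with `roc_auc_score`. The four scores are a toy example, not model output; three of the four positive/negative pairs are ranked correctly, so the area is 0.75.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy example: two negatives, two positives, with one mis-ranked pair
y_true_toy = np.array([0, 0, 1, 1])
scores_toy = np.array([0.1, 0.4, 0.35, 0.8])

# Points of the ROC curve, one per decision threshold
fpr_toy, tpr_toy, thresholds_toy = roc_curve(y_true_toy, scores_toy)

# Trapezoidal area under those points
area_from_curve = auc(fpr_toy, tpr_toy)

# Same quantity computed directly from labels and scores
area_direct = roc_auc_score(y_true_toy, scores_toy)

print(f"AUC from curve: {area_from_curve:.2f}")
print(f"AUC direct:     {area_direct:.2f}")
```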

### **11. Selecting the Optimal Model**
Based on the compiled metrics and visualizations, weigh the following criteria when choosing a model:

- **ROC AUC:** Higher values indicate better discrimination between classes.
- **F1-Score:** Balances precision and recall, which is useful for imbalanced datasets.
- **Precision and Recall:** In our case, false negatives are more costly than false positives, so recall takes priority over precision.
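
One way to encode a preference for recall in a single number is the F-beta score with `beta > 1`. This is a suggestion, not a metric the notebook currently computes. The sketch below uses made-up predictions to show two classifiers that tie on F1 but are separated by F2, which weights recall more heavily.

```python
import numpy as np
from sklearn.metrics import f1_score, fbeta_score

# Hypothetical labels: 4 positives and 6 negatives
y_true_fb = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# "Cautious" classifier: precision 1.0, recall 0.5
pred_cautious = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

# "Sensitive" classifier: precision 0.5, recall 1.0
pred_sensitive = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])

f1_cautious = f1_score(y_true_fb, pred_cautious)
f1_sensitive = f1_score(y_true_fb, pred_sensitive)

# beta=2 treats recall as more important than precision
f2_cautious = fbeta_score(y_true_fb, pred_cautious, beta=2)
f2_sensitive = fbeta_score(y_true_fb, pred_sensitive, beta=2)

print(f"F1: cautious={f1_cautious:.3f}, sensitive={f1_sensitive:.3f}")
print(f"F2: cautious={f2_cautious:.3f}, sensitive={f2_sensitive:.3f}")
```

F1 cannot distinguish the two classifiers, while F2 favors the one that misses no positives.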

**Example Conclusion:** 
Logistic Regression outperforms the other models with the highest ROC AUC and F1-Score, indicating better overall performance in predicting COVID-19 outcomes. Therefore, Logistic Regression is selected as the optimal model for deployment.

# **Summary**

In this analysis, we compared three classifiers—**Naïve Bayes**, **Logistic Regression**, and **K-Nearest Neighbors**—for predicting COVID-19 outcomes (death or recovery). The evaluation metrics considered were Precision, Recall, F1-Score, and ROC AUC.

**Key Findings:**

- **Logistic Regression** achieved the highest ROC AUC and F1-Score, indicating superior performance in distinguishing between the classes and balancing precision and recall.
- **Naïve Bayes** showed competitive Precision but slightly lower Recall and F1-Score compared to Logistic Regression.
- **K-Nearest Neighbors** had the lowest performance among the three models, suggesting it may not be the best choice for this dataset.

**Recommendations:**

- **Deploy Logistic Regression** as the primary model for predicting COVID-19 outcomes because it achieved the highest ROC AUC and F1-Score.
- **Further Optimization:** Explore hyperparameter tuning for Logistic Regression and consider ensemble methods (e.g., Random Forest, Gradient Boosting) to potentially enhance performance.
- **Feature Engineering:** Investigate additional features or interactions that might improve model accuracy.
- **Cross-Validation:** Use stratified k-fold cross-validation to check that performance is stable across splits rather than an artifact of a single train/test partition.
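
The cross-validation recommendation could be sketched as follows. This is an illustrative setup only: the data comes from `make_classification` rather than the COVID-19 dataset, and the choice of `StandardScaler`, five folds, and `max_iter=1000` are assumptions. Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking information from the held-out fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumed; not the real dataset)
X_cv, y_cv = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=42)

# Scaling and the classifier are fit together on each training fold
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds preserve the class ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scoring = ["recall", "f1", "roc_auc"]
cv_results = cross_validate(pipeline, X_cv, y_cv, cv=cv, scoring=scoring)

for name in scoring:
    scores = cv_results[f"test_{name}"]
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A large standard deviation across folds would suggest that a single test-set score is not a reliable basis for choosing between models.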

By following these recommendations, the predictive model can be further refined to provide accurate and reliable outcomes, aiding in effective decision-making for COVID-19 patient management.