# Oncology Clinical Trials Model Evaluation

This notebook demonstrates the model evaluation process for oncology clinical trial outcome prediction. We'll evaluate the models trained in the previous notebook, visualize their performance, and interpret the results to identify which factors are associated with trial completion and duration.

This is the final step of our analysis workflow, following data exploration, feature engineering, and model training.

## Setup

First, let's import the necessary libraries and modules.

In [None]:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap  # required by the SHAP analysis cells below
from pathlib import Path
from datetime import datetime

# Add project root to path to import project modules
project_root = Path().resolve().parents[0]
sys.path.append(str(project_root))

# Import model evaluation functions
from src.models.evaluate_model import (
    load_latest_model,
    load_test_data,
    evaluate_classification_model,
    evaluate_regression_model,
    plot_feature_importance,
    plot_confusion_matrix,
    plot_roc_curve,
    plot_precision_recall_curve,
    plot_residuals,
    plot_prediction_error,
    generate_evaluation_report
)

# Import visualization functions
from src.visualization.visualize import set_plotting_style

# Define project directories
PROJECT_DIR = project_root
PROCESSED_DATA_DIR = PROJECT_DIR / 'data' / 'processed'
MODEL_DIR = PROJECT_DIR / 'models'
REPORT_DIR = PROJECT_DIR / 'reports'
FIGURE_DIR = PROJECT_DIR / 'reports' / 'figures'

# Set plotting style
set_plotting_style()

# Set random seed for reproducibility
np.random.seed(42)

## Load Test Data

Let's load the modeling-ready dataset for evaluation.

In [None]:
# Load the most recent modeling-ready dataset
df_test = load_test_data()

print(f"Loaded test dataset with {df_test.shape[0]} rows and {df_test.shape[1]} columns")

# Display the first few rows
df_test.head()
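Before preparing targets, it can help to confirm that the columns used in later cells exist and to see how trial statuses are distributed, since a strong Completed/Terminated imbalance changes how accuracy should be read. A minimal sketch, assuming `df_test` carries the `nct_id`, `overall_status` and `duration_days` columns referenced below; the helper name `summarize_test_data` is hypothetical and the toy frame is synthetic, not real trial data.

```python
import pandas as pd

def summarize_test_data(df, required_cols=("nct_id", "overall_status", "duration_days")):
    """Hypothetical helper: return missing required columns and status counts."""
    # Columns the later cells rely on; missing ones would raise KeyErrors downstream
    missing = [col for col in required_cols if col not in df.columns]
    # Count statuses including NaN so unexpected or missing values stay visible
    if "overall_status" in df.columns:
        status_counts = df["overall_status"].value_counts(dropna=False)
    else:
        status_counts = pd.Series(dtype="int64")
    return missing, status_counts

# Usage on a tiny synthetic frame
toy = pd.DataFrame({
    "nct_id": ["NCT0001", "NCT0002", "NCT0003"],
    "overall_status": ["Completed", "Terminated", "Completed"],
    "duration_days": [400, 120, 730],
})
missing, status_counts = summarize_test_data(toy)
print(missing)  # []
print(status_counts)
```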

## Prepare Test Data

Let's prepare the test data for both classification and regression tasks.

In [None]:
# Define target variables
classification_target = 'is_completed'  # Binary: 1 for completed, 0 for terminated
regression_target = 'duration_days'     # Continuous: trial duration in days

# Filter data to include only completed or terminated trials for classification.
# .copy() creates an independent DataFrame, so assigning the target column below
# does not trigger pandas' SettingWithCopyWarning.
df_classification = df_test[df_test['overall_status'].isin(['Completed', 'Terminated'])].copy()

# Create binary target for classification (1 for completed, 0 for terminated)
df_classification[classification_target] = (df_classification['overall_status'] == 'Completed').astype(int)

# Filter data to include only completed trials for regression (duration prediction)
df_regression = df_test[df_test['overall_status'] == 'Completed'].copy()

# Identify feature columns (exclude target variables and metadata)
exclude_cols = ['nct_id', 'overall_status', classification_target, regression_target]
feature_cols = [col for col in df_test.columns if col not in exclude_cols]

print(f"Classification test set: {df_classification.shape[0]} rows, {len(feature_cols)} features")
print(f"Regression test set: {df_regression.shape[0]} rows, {len(feature_cols)} features")

# Prepare feature matrices and target vectors
X_classification = df_classification[feature_cols]
y_classification = df_classification[classification_target]

X_regression = df_regression[feature_cols]
y_regression = df_regression[regression_target]
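The filtering and target construction above can be checked on a small synthetic frame. This sketch mirrors the cell's logic (keep Completed/Terminated rows, encode Completed as 1) and uses `.copy()` so the new column is added to an independent DataFrame rather than a view of the input. The function name `make_classification_frame` and the `enrollment` column are illustrative assumptions.

```python
import pandas as pd

def make_classification_frame(df, target="is_completed"):
    """Keep Completed/Terminated trials and add a 0/1 completion target."""
    # .copy() makes the subset independent of df, so adding a column is safe
    subset = df[df["overall_status"].isin(["Completed", "Terminated"])].copy()
    subset[target] = (subset["overall_status"] == "Completed").astype(int)
    return subset

toy = pd.DataFrame({
    "nct_id": ["A", "B", "C", "D"],
    "overall_status": ["Completed", "Terminated", "Recruiting", "Completed"],
    "enrollment": [120, 40, 300, 80],
})
clf_frame = make_classification_frame(toy)
# The Recruiting trial (C) is dropped; the original frame is left unchanged
print(clf_frame[["nct_id", "is_completed"]])
```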

## Evaluate Classification Model

Let's load and evaluate the best classification model trained in the previous notebook.

In [None]:
# Load the most recent classification model
classification_model_dict = load_latest_model('classification')

# Extract model components
classification_model = classification_model_dict['model']
classification_preprocessor = classification_model_dict['preprocessor']

print(f"Loaded classification model: {type(classification_model).__name__}")

# Evaluate the model
classification_eval_results = evaluate_classification_model(
    classification_model_dict, X_classification, y_classification
)

# Display performance metrics
print("\nClassification Model Performance:")
for metric, value in classification_eval_results['metrics'].items():
    print(f"{metric}: {value:.4f}")

# Display classification report
print("\nClassification Report:")
print(classification_eval_results['classification_report'])
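The exact metrics returned by `evaluate_classification_model` are defined in `src/models/evaluate_model.py`, which is not shown here. For interpreting the numbers, the standard threshold-based metrics all follow from the four confusion-matrix counts. A minimal NumPy sketch, assuming Completed (1) is the positive class; `binary_metrics` is a hypothetical name.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels, positive class = 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# tp=2, tn=1, fp=1, fn=1 -> accuracy 0.6, precision = recall = f1 = 2/3
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

When one class dominates, accuracy can look high even if the minority class is poorly predicted, so precision and recall per class are worth reading alongside it.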

## Visualize Classification Results

Let's create visualizations to better understand the classification model's performance.

In [None]:
# Plot confusion matrix
plt.figure(figsize=(8, 6))
plot_confusion_matrix(
    classification_eval_results['confusion_matrix'],
    classes=['Terminated', 'Completed'],
    title='Confusion Matrix for Trial Completion Prediction'
)
plt.tight_layout()
plt.show()

# Plot ROC curve
plt.figure(figsize=(8, 6))
plot_roc_curve(
    classification_eval_results['fpr'],
    classification_eval_results['tpr'],
    classification_eval_results['metrics']['roc_auc'],
    title='ROC Curve for Trial Completion Prediction'
)
plt.tight_layout()
plt.show()

# Plot precision-recall curve
plt.figure(figsize=(8, 6))
plot_precision_recall_curve(
    classification_eval_results['precision'],
    classification_eval_results['recall'],
    classification_eval_results['metrics']['average_precision'],
    title='Precision-Recall Curve for Trial Completion Prediction'
)
plt.tight_layout()
plt.show()
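The ROC curve is traced by sweeping a decision threshold over the predicted probability of completion. ROC AUC equals the probability that a randomly chosen completed trial receives a higher score than a randomly chosen terminated trial, with ties counted as half. A NumPy sketch of that pairwise definition, useful as a sanity check on the `roc_auc` value reported above; `pairwise_roc_auc` is a hypothetical name.

```python
import numpy as np

def pairwise_roc_auc(y_true, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    # Compare every positive score with every negative score (O(n_pos * n_neg) memory)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
print(pairwise_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For large test sets, a sort-based implementation such as `sklearn.metrics.roc_auc_score` avoids building the quadratic pairwise matrix.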

## Feature Importance for Classification

Let's examine which features are most important for predicting trial completion status.

In [None]:
# Plot feature importance
if 'feature_importance' in classification_eval_results:
    plt.figure(figsize=(12, 8))
    plot_feature_importance(
        classification_eval_results['feature_importance'],
        title='Feature Importance for Trial Completion Prediction',
        top_n=20
    )
    plt.tight_layout()
    plt.show()
    
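    # Display top 10 features and their importance scores.
    # Assumes feature_importance maps feature name -> score (dict or Series);
    # dict_items cannot be sliced, so convert to a Series and sort explicitly.
    top_features = pd.Series(classification_eval_results['feature_importance']).sort_values(ascending=False).head(10)
    print("\nTop 10 Features for Trial Completion Prediction:")
    for feature, importance in top_features.items():
        print(f"{feature}: {importance:.4f}")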
else:
    print("Feature importance not available for this model")
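When the fitted model exposes no built-in importances (the `else` branch above), permutation importance is one model-agnostic alternative: shuffle one feature at a time and measure how much a score drops. This is a swapped-in technique, not necessarily what `evaluate_classification_model` computes. The sketch assumes an all-numeric feature matrix, a `predict` callable, and a higher-is-better `metric`; all names are hypothetical. scikit-learn provides a fuller version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def permutation_importance_simple(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in a higher-is-better `metric` when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y while keeping its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def predict_toy(X):
    # Toy "model": predicts completion from the sign of feature 0 only
    return (X[:, 0] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

data_rng = np.random.default_rng(42)
X_toy = data_rng.normal(size=(500, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)
importances = permutation_importance_simple(predict_toy, X_toy, y_toy, accuracy)
# Feature 0 should show a large drop; feature 1 is unused, so its drop is exactly 0
print(importances)
```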

## SHAP Analysis for Classification

Let's use SHAP (SHapley Additive exPlanations) to better understand how each feature contributes to the model's predictions.

In [None]:
# SHAP analysis (if available)
if 'shap_values' in classification_eval_results:
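    # Bar plot of mean |SHAP value| per feature.
    # show=False keeps shap from calling plt.show() itself, so tight_layout()
    # and show() below apply to the SHAP figure instead of a new, empty one.
    # Assumes the SHAP values were computed on the same columns as X_classification;
    # if the preprocessor expands features (e.g. one-hot encoding), pass the
    # transformed matrix and its feature names instead.
    plt.figure(figsize=(12, 8))
    shap.summary_plot(
        classification_eval_results['shap_values'],
        X_classification,
        feature_names=feature_cols,
        plot_type='bar',
        show=False
    )
    plt.tight_layout()
    plt.show()

    # Beeswarm plot: per-trial SHAP values for the top features
    plt.figure(figsize=(12, 8))
    shap.summary_plot(
        classification_eval_results['shap_values'],
        X_classification,
        feature_names=feature_cols,
        show=False
    )
    plt.tight_layout()
    plt.show()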
else:
    print("SHAP analysis not available for this model")
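SHAP values satisfy an additivity property: for each trial, the feature contributions sum to the model's output minus its average output over the background data. For a linear model with features treated as independent, the SHAP value of feature `i` reduces to `w_i * (x_i - mean(x_i))`, which makes the property easy to verify by hand. The weights and data below are synthetic; this illustrates the concept and is not how the `shap` package computes values for tree or kernel explainers.

```python
import numpy as np

# Synthetic linear model f(x) = w . x + b (weights are illustrative only)
w = np.array([2.0, -1.0, 0.5])
b = 3.0
X_bg = np.array([[1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0],
                 [2.0, 1.0, 4.0]])  # background data used for the expectation

def f(X):
    return X @ w + b

def linear_shap(x, X_background):
    """SHAP values for a linear model under feature independence."""
    return w * (x - X_background.mean(axis=0))

x = np.array([4.0, 1.0, 1.0])
phi = linear_shap(x, X_bg)
# Additivity: phi sums to f(x) minus the mean prediction on the background (3.5 here)
print(phi, phi.sum(), f(x) - f(X_bg).mean())
```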

## Evaluate Regression Model

Now, let's load and evaluate the best regression model trained in the previous notebook.

In [None]:
# Load the most recent regression model
regression_model_dict = load_latest_model('regression')

# Extract model components
regression_model = regression_model_dict['model']
regression_preprocessor = regression_model_dict['preprocessor']

print(f"Loaded regression model: {type(regression_model).__name__}")

# Evaluate the model
regression_eval_results = evaluate_regression_model(
    regression_model_dict, X_regression, y_regression
)

# Display performance metrics
print("\nRegression Model Performance:")
for metric, value in regression_eval_results['metrics'].items():
    print(f"{metric}: {value:.4f}")
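As with classification, the metric definitions live in `evaluate_regression_model`, which isn't shown. The three metrics named in the summary (R², MAE, RMSE) are standard; a NumPy sketch for interpreting them follows. Because the target is duration in days, MAE and RMSE are also in days. `regression_metrics` is a hypothetical name.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = float(np.mean(np.abs(residuals)))
    # RMSE penalizes large misses more heavily than MAE
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # R^2 compares the model with always predicting the mean duration;
    # it is undefined (nan/inf) when all true durations are identical
    r2 = float(1 - ss_res / ss_tot)
    return {"r2": r2, "mae": mae, "rmse": rmse}

# residuals = [-50, 50, 0] -> MAE ~33.3 days, RMSE ~40.8 days, R^2 = 0.9375
print(regression_metrics([300, 500, 700], [350, 450, 700]))
```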

## Visualize Regression Results

Let's create visualizations to better understand the regression model's performance.

In [None]:
# Plot actual vs. predicted values
plt.figure(figsize=(10, 6))
plot_prediction_error(
    regression_eval_results['y_true'],
    regression_eval_results['y_pred'],
    title='Actual vs. Predicted Trial Duration'
)
plt.tight_layout()
plt.show()

# Plot residuals
plt.figure(figsize=(10, 6))
plot_residuals(
    regression_eval_results['y_true'],
    regression_eval_results['y_pred'],
    title='Residuals for Trial Duration Prediction'
)
plt.tight_layout()
plt.show()

# Plot residual distribution
plt.figure(figsize=(10, 6))
residuals = regression_eval_results['y_true'] - regression_eval_results['y_pred']
sns.histplot(residuals, kde=True)
plt.title('Distribution of Residuals')
plt.xlabel('Residual (Actual - Predicted)')
plt.ylabel('Frequency')
plt.axvline(x=0, color='r', linestyle='--')
plt.tight_layout()
plt.show()
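The residual plots are easier to read alongside a few summary numbers: the mean residual shows systematic over- or under-prediction, and the share of trials predicted within a tolerance gives a practical accuracy figure. A sketch; the helper name and the 90-day tolerance are illustrative assumptions, not values from this project.

```python
import numpy as np

def residual_summary(y_true, y_pred, tolerance_days=90):
    # Residual convention matches the plot above: actual - predicted
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        # Positive mean: the model tends to underestimate duration
        "mean_residual": float(residuals.mean()),
        "std_residual": float(residuals.std(ddof=1)),
        # Fraction of trials whose prediction is within +/- tolerance_days
        "within_tolerance": float(np.mean(np.abs(residuals) <= tolerance_days)),
    }

# residuals = [-50, 50, 0, 300] -> mean 75, three of four within 90 days
print(residual_summary([300, 500, 700, 900], [350, 450, 700, 600]))
```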

## Feature Importance for Regression

Let's examine which features are most important for predicting trial duration.

In [None]:
# Plot feature importance
if 'feature_importance' in regression_eval_results:
    plt.figure(figsize=(12, 8))
    plot_feature_importance(
        regression_eval_results['feature_importance'],
        title='Feature Importance for Trial Duration Prediction',
        top_n=20
    )
    plt.tight_layout()
    plt.show()
    
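    # Display top 10 features and their importance scores.
    # Assumes feature_importance maps feature name -> score (dict or Series);
    # dict_items cannot be sliced, so convert to a Series and sort explicitly.
    top_features = pd.Series(regression_eval_results['feature_importance']).sort_values(ascending=False).head(10)
    print("\nTop 10 Features for Trial Duration Prediction:")
    for feature, importance in top_features.items():
        print(f"{feature}: {importance:.4f}")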
else:
    print("Feature importance not available for this model")

## SHAP Analysis for Regression

Let's use SHAP to better understand how each feature contributes to the regression model's predictions.

In [None]:
# SHAP analysis (if available)
if 'shap_values' in regression_eval_results:
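    # Bar plot of mean |SHAP value| per feature.
    # show=False keeps shap from calling plt.show() itself, so tight_layout()
    # and show() below apply to the SHAP figure instead of a new, empty one.
    # Assumes the SHAP values were computed on the same columns as X_regression;
    # if the preprocessor expands features (e.g. one-hot encoding), pass the
    # transformed matrix and its feature names instead.
    plt.figure(figsize=(12, 8))
    shap.summary_plot(
        regression_eval_results['shap_values'],
        X_regression,
        feature_names=feature_cols,
        plot_type='bar',
        show=False
    )
    plt.tight_layout()
    plt.show()

    # Beeswarm plot: per-trial SHAP values for the top features
    plt.figure(figsize=(12, 8))
    shap.summary_plot(
        regression_eval_results['shap_values'],
        X_regression,
        feature_names=feature_cols,
        show=False
    )
    plt.tight_layout()
    plt.show()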
else:
    print("SHAP analysis not available for this model")

## Generate Evaluation Report

Let's generate a comprehensive evaluation report for both models.

In [None]:
# Generate evaluation report
report = generate_evaluation_report(
    classification_model_dict,
    classification_eval_results,
    regression_model_dict,
    regression_eval_results
)

# Display report summary
print("\nEvaluation Report Summary:")
print(report['summary'])

# Save report to file (create the reports directory if it does not exist yet)
REPORT_DIR.mkdir(parents=True, exist_ok=True)
report_path = REPORT_DIR / f"model_evaluation_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.html"
with open(report_path, 'w', encoding='utf-8') as f:
    f.write(report['html'])

print(f"\nSaved evaluation report to {report_path}")
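`generate_evaluation_report` produces the HTML saved above; its contents aren't shown here. If a lightweight fallback is ever needed, a metrics dictionary can be turned into a simple HTML table using only the standard library. A sketch; `write_metrics_report` and its layout are hypothetical and do not reproduce the project's report format.

```python
import html
import tempfile
from datetime import datetime
from pathlib import Path

def write_metrics_report(metrics_by_model, out_dir):
    """Write {model_name: {metric: value}} as a timestamped HTML table; return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for model_name, metrics in metrics_by_model.items():
        for metric, value in metrics.items():
            # Escape names so arbitrary strings cannot break the HTML markup
            rows.append(
                f"<tr><td>{html.escape(model_name)}</td>"
                f"<td>{html.escape(metric)}</td><td>{value:.4f}</td></tr>"
            )
    body = (
        "<html><body><h1>Model Evaluation</h1>"
        "<table><tr><th>Model</th><th>Metric</th><th>Value</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )
    path = out_dir / f"metrics_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.html"
    path.write_text(body, encoding="utf-8")
    return path

# Demo in a temporary directory with a made-up metric value
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_metrics_report({"classification": {"roc_auc": 0.8123}}, tmp)
    demo_text = demo_path.read_text(encoding="utf-8")
print("0.8123" in demo_text)
```

In this notebook it could be called as `write_metrics_report({"classification": classification_eval_results['metrics'], "regression": regression_eval_results['metrics']}, REPORT_DIR)`, assuming both `metrics` entries are flat dictionaries of numbers.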

## Key Insights and Findings

Based on our model evaluation, we can draw several insights about oncology clinical trials:

### Factors Affecting Trial Completion

1. **[Add insights after running the notebook]**
2. **[Add insights after running the notebook]**
3. **[Add insights after running the notebook]**

### Factors Affecting Trial Duration

1. **[Add insights after running the notebook]**
2. **[Add insights after running the notebook]**
3. **[Add insights after running the notebook]**

### Recommendations for Trial Design

1. **[Add recommendations after running the notebook]**
2. **[Add recommendations after running the notebook]**
3. **[Add recommendations after running the notebook]**

## Summary

In this notebook, we've evaluated the models trained for predicting oncology clinical trial outcomes:

1. **Classification Model for Trial Completion Status**
   - Evaluated model performance using accuracy, precision, recall, F1 score, ROC AUC, and average precision
   - Visualized the confusion matrix, ROC curve, and precision-recall curve
   - Analyzed feature importance and SHAP values to understand factors affecting trial completion

2. **Regression Model for Trial Duration**
   - Evaluated model performance using R², MAE, and RMSE
   - Visualized actual vs. predicted values and residuals
   - Analyzed feature importance and SHAP values to understand factors affecting trial duration

3. **Generated a comprehensive evaluation report** that can be shared with stakeholders

This completes our analysis workflow from data exploration to feature engineering, model training, and model evaluation. The insights gained from this analysis can help inform the design and planning of future oncology clinical trials to improve their success rates and efficiency.