# Model Comparison: Fine-Gray vs Cause-Specific Cox

This notebook compares the **Fine-Gray subdistribution hazard model** with **cause-specific Cox models** for mortgage prepayment prediction.

## Comparison Framework

| Aspect | Cause-Specific Cox | Fine-Gray |
|--------|-------------------|------------|
| **Estimates** | Hazard among at-risk | Subdistribution hazard |
| **Competing events** | Censored | Stay in risk set |
| **CIF prediction** | Indirect (requires all cause-specific hazards) | Direct |
| **Interpretation** | Etiology | Prognosis/prediction |
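
For reference, the quantities in the table can be written as follows. The notation is the standard textbook one and is an assumption here, not taken from this project's code: $T$ is the event time, $D$ the event type ($k=1$ prepayment, $k=2$ default), and covariates are suppressed.

$$
h_k(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}
$$

$$
F_k(t) = P(T \le t,\ D = k) = \int_0^t S(u)\, h_k(u)\, du,
\qquad
S(u) = \exp\Big(-\int_0^u \sum_j h_j(v)\, dv\Big)
$$

$$
\lambda_k(t) = -\frac{d}{dt} \log\big(1 - F_k(t)\big)
$$

Cause-specific Cox models the hazard $h_k$, while Fine-Gray models the subdistribution hazard $\lambda_k$. Because $S(u)$ depends on every $h_j$, a CIF built from cause-specific models needs the fitted hazards for all event types, whereas $\lambda_k$ maps to $F_k$ on its own.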

## Metrics
- Coefficient comparison
- Concordance index
- Time-dependent AUC
- Calibration

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import pickle
import warnings
warnings.filterwarnings('ignore')

# Competing risks module
import sys
sys.path.insert(0, '..')
from src.competing_risks import (
    compare_model_coefficients,
    concordance_index_competing_risks,
    calibration_plot,
)
from src.competing_risks.evaluation import (
    time_dependent_auc,
    plot_discrimination_over_time,
)

from lifelines import CoxPHFitter, AalenJohansenFitter
from sksurv.metrics import concordance_index_censored
from sklearn.metrics import roc_auc_score, brier_score_loss

sns.set_style('whitegrid')
%matplotlib inline

## Load Models and Data

In [None]:
MODELS_DIR = Path('../models')
DATA_DIR = Path('../data/processed')

# Load cause-specific Cox models
try:
    with open(MODELS_DIR / 'cox_prepay.pkl', 'rb') as f:
        cox_prepay = pickle.load(f)
    with open(MODELS_DIR / 'cox_default.pkl', 'rb') as f:
        cox_default = pickle.load(f)
    print("Loaded cause-specific Cox models")
    cox_loaded = True
except FileNotFoundError:
    print("Cox models not found. Run notebook 05 first.")
    cox_loaded = False

# Load Fine-Gray model
try:
    with open(MODELS_DIR / 'fine_gray_prepay.pkl', 'rb') as f:
        fg_data = pickle.load(f)
        fg_model = fg_data['model']
        fg_scaler = fg_data['scaler']
        fg_features = fg_data['features']
    print("Loaded Fine-Gray model")
    fg_loaded = True
except FileNotFoundError:
    print("Fine-Gray model not found. Run notebook 06 first.")
    fg_loaded = False

In [None]:
# Load data
df = pd.read_parquet(DATA_DIR / 'survival_data.parquet')

# Create event code
event_map = {
    'censored': 0, 'prepay': 1, 'default': 2,
    'matured': 0, 'other': 3, 'defect': 3,
}
df['event_code'] = df['event_type'].map(event_map)

# Prepare features (same as training)
feature_cols = ['credit_score', 'orig_ltv', 'orig_dti', 'orig_interest_rate']
df_model = df[feature_cols + ['duration', 'event_code']].dropna().copy()

print(f"Data: {len(df_model):,} loans")

In [None]:
# Use same test split as training (seed=42)
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(df_model, test_size=0.2, random_state=42)
print(f"Test set: {len(test_df):,} loans")

## 1. Coefficient Comparison

Compare coefficients between Fine-Gray and cause-specific Cox models.
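
The Fine-Gray model in this notebook is applied to `fg_scaler`-transformed features, so its coefficients are expressed per scaled unit. If the Cox model was fit on raw features (an assumption, since the training notebooks are not shown here), the two sets of coefficients are only comparable after rescaling. A minimal sketch, assuming the scaler is a standardizing one such as scikit-learn's `StandardScaler` (which exposes per-feature `scale_`); the helper name and the numbers are hypothetical:

```python
def to_raw_scale(coefs_std, scales):
    """Convert per-standard-deviation coefficients to per-raw-unit coefficients."""
    return [b / s for b, s in zip(coefs_std, scales)]

# Hypothetical values: a coefficient of 0.30 per standard deviation of
# credit_score, where one standard deviation is assumed to be 50 points.
raw = to_raw_scale([0.30], [50.0])
print(raw)  # coefficient per FICO point
```

With the real objects this would look like `to_raw_scale(fg_model.coef_[0], fg_scaler.scale_)`, provided the scaler is indeed a `StandardScaler`.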

In [None]:
if cox_loaded and fg_loaded:
    # Extract Cox coefficients
    cox_coefs = cox_prepay.summary[['coef', 'exp(coef)']].copy()
    cox_coefs.columns = ['coef_cox', 'hr_cox']
    cox_coefs = cox_coefs.reset_index().rename(columns={'covariate': 'feature'})
    
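    # Extract Fine-Gray coefficients (sklearn estimator).
    # Note: fg_model is applied to fg_scaler-transformed features, so these are
    # per-scaled-unit coefficients; they are on the same scale as the Cox
    # coefficients only if the Cox model was fit on identically scaled inputs.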
    fg_coefs = pd.DataFrame({
        'feature': fg_features,
        'coef_fg': fg_model.coef_[0],
        'hr_fg': np.exp(fg_model.coef_[0]),
    })
    
    # Merge (handle potential feature name differences)
    comparison = cox_coefs.merge(fg_coefs, on='feature', how='outer')
    comparison['coef_diff'] = comparison['coef_fg'] - comparison['coef_cox']
    comparison['hr_ratio'] = comparison['hr_fg'] / comparison['hr_cox']
    
    print("=== Coefficient Comparison: Fine-Gray vs Cause-Specific Cox ===")
    print(comparison.round(4).to_string(index=False))
else:
    print("Models not loaded. Skipping coefficient comparison.")

In [None]:
if cox_loaded and fg_loaded:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Coefficient comparison
    ax = axes[0]
    x = np.arange(len(comparison))
    width = 0.35
    
    ax.bar(x - width/2, comparison['coef_cox'], width, label='Cause-Specific Cox', alpha=0.7)
    ax.bar(x + width/2, comparison['coef_fg'], width, label='Fine-Gray', alpha=0.7)
    ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
    ax.set_xticks(x)
    ax.set_xticklabels(comparison['feature'], rotation=45, ha='right')
    ax.set_ylabel('Coefficient')
    ax.set_title('Coefficient Comparison')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')
    
    # Hazard ratio comparison
    ax = axes[1]
    ax.bar(x - width/2, comparison['hr_cox'], width, label='Cause-Specific Cox', alpha=0.7)
    ax.bar(x + width/2, comparison['hr_fg'], width, label='Fine-Gray', alpha=0.7)
    ax.axhline(y=1, color='black', linestyle='--', linewidth=1)
    ax.set_xticks(x)
    ax.set_xticklabels(comparison['feature'], rotation=45, ha='right')
    ax.set_ylabel('Hazard Ratio')
    ax.set_title('Hazard Ratio Comparison')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.savefig('../reports/figures/model_coefficient_comparison.png', dpi=150)
    plt.show()

## 2. Discrimination Comparison

Compare concordance index and time-dependent AUC.
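
To make the time-dependent AUC concrete, here is a simplified sketch of a cumulative/dynamic AUC at a single horizon under competing risks. It is an illustration only: the function name is hypothetical, it ignores censoring weights, and the project's `time_dependent_auc` may be implemented differently.

```python
def naive_auc_at_horizon(times, events, risk, horizon, event_of_interest=1):
    """AUC at one horizon, without inverse-probability-of-censoring weights.

    Cases: event of interest at or before the horizon.
    Controls: still event-free after the horizon, or a competing event
    at or before the horizon.
    Loans censored (code 0) before the horizon have unknown status and are dropped.
    """
    cases, controls = [], []
    for t, e, r in zip(times, events, risk):
        if t <= horizon and e == event_of_interest:
            cases.append(r)
        elif t > horizon or (e != 0 and e != event_of_interest):
            controls.append(r)
    if not cases or not controls:
        return float('nan')
    # Fraction of case/control pairs ranked correctly (ties count as half)
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy data: durations in months, event codes (0=censored, 1=prepay, 2=default)
times = [6, 10, 30, 40, 8, 20]
events = [1, 1, 0, 1, 2, 0]
risk = [0.9, 0.4, 0.3, 0.6, 0.5, 0.2]
auc = naive_auc_at_horizon(times, events, risk, horizon=24)
print(f"AUC at 24 months: {auc:.3f}")
```

Dropping early-censored loans can bias the estimate when censoring is heavy, which is why production implementations usually reweight instead.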

In [None]:
# Get risk predictions from both models
test_features = test_df[feature_cols]

if cox_loaded:
    # Cox predictions (add log_upb if the fitted model uses it)
    test_cox = test_features.copy()
    if 'log_upb' in cox_prepay.summary.index:
        # orig_upb was not carried into df_model/test_df, so look it up in the
        # full frame by index (assumes df has an orig_upb column)
        test_cox['log_upb'] = np.log(df.loc[test_df.index, 'orig_upb'])
    risk_cox = cox_prepay.predict_partial_hazard(test_cox).values.flatten()

if fg_loaded:
    # Fine-Gray predictions
    X_test_fg = fg_scaler.transform(test_df[fg_features])
    risk_fg = fg_model.predict_proba(X_test_fg)[:, 1]

print("Predictions computed.")

In [None]:
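# Calculate C-index for both models.
# Only prepayment counts as an event here; loans ending in a competing event
# (default, other) are treated as censored at their event time.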
event_indicator = (test_df['event_code'] == 1).values
event_times = test_df['duration'].values

results = []

if cox_loaded:
    c_cox = concordance_index_censored(event_indicator, event_times, risk_cox)
    results.append({'Model': 'Cause-Specific Cox', 'C-index': c_cox[0], 
                   'Concordant': c_cox[1], 'Discordant': c_cox[2]})

if fg_loaded:
    c_fg = concordance_index_censored(event_indicator, event_times, risk_fg)
    results.append({'Model': 'Fine-Gray', 'C-index': c_fg[0],
                   'Concordant': c_fg[1], 'Discordant': c_fg[2]})

results_df = pd.DataFrame(results)
print("=== Concordance Index Comparison ===")
print(results_df.to_string(index=False))

In [None]:
# Time-dependent AUC at various horizons
time_points = [12, 24, 36, 48, 60, 72, 84, 96, 108, 120]

auc_results = []

for t in time_points:
    row = {'Time (months)': t}
    
    if cox_loaded:
        auc_cox = time_dependent_auc(
            event_times, test_df['event_code'].values, risk_cox, t, event_of_interest=1
        )
        row['Cox AUC'] = auc_cox
    
    if fg_loaded:
        auc_fg = time_dependent_auc(
            event_times, test_df['event_code'].values, risk_fg, t, event_of_interest=1
        )
        row['Fine-Gray AUC'] = auc_fg
    
    auc_results.append(row)

auc_df = pd.DataFrame(auc_results)
print("=== Time-Dependent AUC ===")
print(auc_df.round(4).to_string(index=False))

In [None]:
# Plot time-dependent AUC
fig, ax = plt.subplots(figsize=(10, 6))

if 'Cox AUC' in auc_df.columns:
    ax.plot(auc_df['Time (months)'], auc_df['Cox AUC'], 'o-', 
            label='Cause-Specific Cox', linewidth=2, markersize=8)

if 'Fine-Gray AUC' in auc_df.columns:
    ax.plot(auc_df['Time (months)'], auc_df['Fine-Gray AUC'], 's-', 
            label='Fine-Gray', linewidth=2, markersize=8)

ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='Random')
ax.set_xlabel('Time (months)')
ax.set_ylabel('Time-dependent AUC')
ax.set_title('Discrimination Over Time: Prepayment Prediction')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(0.45, 0.75)

plt.tight_layout()
plt.savefig('../reports/figures/time_dependent_auc_comparison.png', dpi=150)
plt.show()

## 3. Calibration Comparison

In [None]:
from sklearn.calibration import calibration_curve

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

y_true = (test_df['event_code'] == 1).astype(int).values

if cox_loaded:
    ax = axes[0]
    # Min-max scale Cox partial hazards to [0, 1]. This preserves the ranking but
    # does not produce probabilities, so this panel is indicative only.
    risk_cox_norm = (risk_cox - risk_cox.min()) / (risk_cox.max() - risk_cox.min())
    prob_true, prob_pred = calibration_curve(y_true, risk_cox_norm, n_bins=10)
    
    ax.plot([0, 1], [0, 1], 'k--', label='Perfect')
    ax.plot(prob_pred, prob_true, 'o-', label='Cox model')
    ax.set_xlabel('Mean predicted probability')
    ax.set_ylabel('Fraction of positives')
    ax.set_title('Calibration: Cause-Specific Cox')
    ax.legend()
    ax.grid(True, alpha=0.3)

if fg_loaded:
    ax = axes[1]
    prob_true, prob_pred = calibration_curve(y_true, risk_fg, n_bins=10)
    
    ax.plot([0, 1], [0, 1], 'k--', label='Perfect')
    ax.plot(prob_pred, prob_true, 'o-', label='Fine-Gray model')
    ax.set_xlabel('Mean predicted probability')
    ax.set_ylabel('Fraction of positives')
    ax.set_title('Calibration: Fine-Gray')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../reports/figures/calibration_comparison.png', dpi=150)
plt.show()

## 4. CIF Prediction Comparison

Estimate the observed cumulative incidence of prepayment, which serves as the benchmark for predicted cumulative incidence.
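
The observed CIF comes from the Aalen-Johansen estimator. The hand-rolled version below is a minimal sketch to show what that estimator computes; the function name is hypothetical, and lifelines' `AalenJohansenFitter` may treat tied times differently.

```python
def aalen_johansen_cif(durations, events, event_of_interest=1):
    """Return [(time, CIF)] for one event type; event code 0 means censored."""
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0  # probability of being event-free just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Data are sorted, so observations tied at time t are contiguous
        tied = [e for (u, e) in data[i:] if u == t]
        d_interest = sum(1 for e in tied if e == event_of_interest)
        d_any = sum(1 for e in tied if e != 0)
        # CIF increment: P(event-free so far) * P(event of interest now | at risk)
        cif += surv * d_interest / n_at_risk
        # Any event type removes probability mass from the event-free state
        surv *= 1.0 - d_any / n_at_risk
        curve.append((t, cif))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve

# Toy data: 0=censored, 1=prepay, 2=default
curve = aalen_johansen_cif([2, 3, 3, 5, 8], [1, 2, 0, 1, 0])
for t, value in curve:
    print(f"t={t}: CIF={value:.3f}")
```

Unlike one minus a Kaplan-Meier curve that censors defaults, this estimate cannot exceed the share of loans that actually prepay, because defaults reduce `surv` without adding to `cif`.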

In [None]:
# Get observed CIF using Aalen-Johansen
ajf = AalenJohansenFitter()
ajf.fit(test_df['duration'], test_df['event_code'], event_of_interest=1)

# Plot observed CIF
fig, ax = plt.subplots(figsize=(10, 6))

ajf.plot(ax=ax, label='Observed CIF (Aalen-Johansen)')

ax.set_xlabel('Time (months)')
ax.set_ylabel('Cumulative Incidence')
ax.set_title('Observed Cumulative Incidence: Prepayment')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nObserved CIF at key time points:")
for t in [12, 24, 36, 60, 120]:
    cif_t = ajf.cumulative_density_at_times(t).values[0]
    print(f"  {t:3d} months: {cif_t:.1%}")

## Summary

In [None]:
print("=" * 70)
print("MODEL COMPARISON SUMMARY: Fine-Gray vs Cause-Specific Cox")
print("=" * 70)

print("\n1. COEFFICIENT COMPARISON")
print("-" * 40)
if cox_loaded and fg_loaded:
    print("Coefficients generally differ between the two models (see the table above).")
    print("Differences arise because:")
    print("  - Cox: estimates hazard among at-risk subjects")
    print("  - Fine-Gray: estimates subdistribution hazard")
    print("  - Competing events affect risk sets differently")

print("\n2. DISCRIMINATION (C-INDEX)")
print("-" * 40)
print(results_df.to_string(index=False))

print("\n3. TIME-DEPENDENT AUC")
print("-" * 40)
print(auc_df.round(4).to_string(index=False))

print("\n4. KEY RECOMMENDATIONS")
print("-" * 40)
print("Use Cause-Specific Cox when:")
print("  - Understanding covariate effects on event intensity")
print("  - Etiological research questions")
print("")
print("Use Fine-Gray when:")
print("  - Predicting cumulative incidence (probability by time t)")
print("  - Prognosis and risk stratification")
print("  - Portfolio loss forecasting")

## Interpretation Guide

### When Coefficients Differ

If a covariate has:
- **Stronger effect in Cox vs Fine-Gray**: The covariate changes the event intensity among at-risk subjects more than it changes the cumulative incidence, typically because it pushes the competing-event hazard in the same direction
- **Stronger effect in Fine-Gray vs Cox**: The covariate also pushes the competing-event hazard in the opposite direction, which reinforces its effect on the cumulative incidence

### Example: FICO Score

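If FICO has a **positive coefficient** in the cause-specific Cox model for prepayment:
- High FICO borrowers prepay faster (among loans that have neither prepaid nor defaulted)
- If high FICO borrowers also default less often, fewer of them are removed by the competing event
- Fine-Gray reflects both effects: defaulted loans stay in its risk set, which lowers the subdistribution hazard most for low FICO borrowers
- The subdistribution hazard ratio can then be further from 1 than the cause-specific hazard ratio; if FICO instead raised both hazards, it would tend to be closer to 1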
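
A toy calculation with constant hazards shows how a cause-specific hazard ratio and the implied subdistribution hazard ratio can diverge. All hazard values below are made up for illustration and are not estimates from this portfolio.

```python
import math

def cif(t, h_interest, h_competing):
    """CIF of the event of interest under constant cause-specific hazards."""
    total = h_interest + h_competing
    return h_interest / total * (1.0 - math.exp(-total * t))

def subdistribution_hazard(t, h_interest, h_competing):
    """Subdistribution hazard: CIF density divided by (1 - CIF)."""
    density = h_interest * math.exp(-(h_interest + h_competing) * t)
    return density / (1.0 - cif(t, h_interest, h_competing))

# Hypothetical monthly hazards (prepay, default) for two borrower groups
low_fico = (0.010, 0.004)
high_fico = (0.020, 0.001)

cs_hr = high_fico[0] / low_fico[0]  # cause-specific hazard ratio for prepayment
for t in (12, 60, 120):
    sub_hr = subdistribution_hazard(t, *high_fico) / subdistribution_hazard(t, *low_fico)
    print(f"t={t:3d}  cause-specific HR={cs_hr:.2f}  subdistribution HR={sub_hr:.2f}")
```

Under these assumed values, where the high FICO group prepays faster and defaults less, the subdistribution ratio exceeds the cause-specific ratio of 2 and changes with the horizon. A Fine-Gray fit would summarize that time-varying ratio with a single coefficient.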

### Practical Implications for Mortgage Prepayment

1. **For pricing/hedging**: Use Fine-Gray to predict cumulative prepayment rates
2. **For risk factor analysis**: Use cause-specific Cox to understand drivers
3. **For stress testing**: May need both to capture dynamics under different scenarios