# Long-Term Simulation and Ruin Analysis

## Overview
- **What this notebook does:** Runs multi-scenario Monte Carlo simulations over long horizons to estimate ruin probabilities, analyze survival times, compare ROE distributions for surviving companies, and measure insurance efficiency.
- **Prerequisites:** [core/04_monte_carlo_simulation.ipynb](04_monte_carlo_simulation.ipynb)
- **Estimated runtime:** about 2 minutes in light mode, about 30 minutes in full mode (see Light vs Full Mode below)
- **Audience:** [Practitioner]

## Light vs Full Mode
This notebook supports two modes controlled by `SIMULATION_MODE` in the Configuration cell:
- **light** (default): 100 simulations, 100 years, 3 scenarios -- runs in ~2 minutes, uses minimal memory.
- **full**: 1,000 simulations, 1,000 years, 5 scenarios -- more robust statistics but requires ~19 GB RAM and ~30 minutes.

In [None]:
"""Google Colab setup: mount Drive and install package dependencies.

Run this cell first. If prompted to restart the runtime, do so, then re-run all cells.
This cell is a no-op when running locally.
"""
import sys, os
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')

    NOTEBOOK_DIR = '/content/drive/My Drive/Colab Notebooks/ei_notebooks/core'

    os.chdir(NOTEBOOK_DIR)
    if NOTEBOOK_DIR not in sys.path:
        sys.path.append(NOTEBOOK_DIR)

    !pip install ergodic-insurance -q 2>&1 | tail -3
    print('\nSetup complete. If you see numpy/scipy import errors below,')
    print('restart the runtime (Runtime > Restart runtime) and re-run all cells.')

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import time
import gc
import warnings
warnings.filterwarnings('ignore')

from ergodic_insurance import ManufacturerConfig
from ergodic_insurance.manufacturer import WidgetManufacturer
from ergodic_insurance.loss_distributions import ManufacturingLossGenerator

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False

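# Reproducibility: seed NumPy's global RNG. Each simulation also receives
# its own explicit seed (42 + i) in the run loop in Section 2.
np.random.seed(42)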

## Configuration

Set `SIMULATION_MODE` to `"light"` for a quick run or `"full"` for production-quality statistics.

In [None]:
SIMULATION_MODE = "light"  # "light" or "full"

if SIMULATION_MODE == "full":
    N_SIMULATIONS = 1_000
    MAX_YEARS = 1_000
    TIME_HORIZONS = [10, 20, 50, 100, 500, 1000]
    INSURANCE_SCENARIOS = [
        {'name': 'No Insurance', 'deductible': float('inf'), 'limit': 0},
        {'name': 'High Deductible', 'deductible': 5_000_000, 'limit': 20_000_000},
        {'name': 'Medium Coverage', 'deductible': 1_000_000, 'limit': 10_000_000},
        {'name': 'Low Deductible', 'deductible': 500_000, 'limit': 15_000_000},
        {'name': 'Full Coverage', 'deductible': 100_000, 'limit': 50_000_000},
    ]
else:
    N_SIMULATIONS = 100
    MAX_YEARS = 100
    TIME_HORIZONS = [10, 20, 50, 100]
    INSURANCE_SCENARIOS = [
        {'name': 'No Insurance', 'deductible': float('inf'), 'limit': 0},
        {'name': 'Medium Coverage', 'deductible': 1_000_000, 'limit': 10_000_000},
        {'name': 'Full Coverage', 'deductible': 100_000, 'limit': 50_000_000},
    ]

# Business parameters
INITIAL_ASSETS = 10_000_000
ASSET_TURNOVER = 1.0
OPERATING_MARGIN = 0.08
TAX_RATE = 0.25
RETENTION_RATIO = 1.0

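# Claim parameters. Only 'frequency' and the 'severity_*' entries are passed to
# create_simple() in Section 1; the 'cat_*' entries are not referenced elsewhere.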
CLAIM_PARAMS = {
    'frequency': 0.1,
    'severity_mean': 5_000_000,
    'severity_std': 3_000_000,
    'cat_frequency': 0.01,
    'cat_severity_mean': 25_000_000,
    'cat_severity_std': 10_000_000,
}

total_sim_years = N_SIMULATIONS * MAX_YEARS * len(INSURANCE_SCENARIOS)
print(f"Mode: {SIMULATION_MODE.upper()}")
print(f"  Simulations: {N_SIMULATIONS:,}")
print(f"  Max years: {MAX_YEARS:,}")
print(f"  Scenarios: {len(INSURANCE_SCENARIOS)}")
print(f"  Total simulation-years: {total_sim_years:,}")

## 1. Simulation Engine

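Each simulation creates a manufacturer, pre-generates claims for every year of the horizon, then steps the company one year at a time and stops at the first year of ruin. It returns the ruin year (if any), the mean annual ROE over the years simulated, and the final equity.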
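
The `deductible` and `limit` arguments passed to `process_insurance_claim` define the insured layer. As a minimal sketch of how such a layer is commonly applied, assuming the insurer pays the part of each claim above the deductible, capped at `limit` (the library's own implementation may differ), a hypothetical helper `split_claim` could look like this:

```python
def split_claim(amount, deductible, limit):
    """Split one claim into (company_share, insurer_share).

    Hypothetical helper for illustration; not part of ergodic_insurance.
    Assumes the insurer pays the excess over the deductible, capped at limit.
    """
    insurer_share = min(max(amount - deductible, 0.0), limit)
    company_share = amount - insurer_share
    return company_share, insurer_share


# Medium Coverage: 1M deductible, 10M limit
print(split_claim(3_000_000, 1_000_000, 10_000_000))   # claim inside the layer
print(split_claim(15_000_000, 1_000_000, 10_000_000))  # claim exhausts the layer
# No Insurance: infinite deductible and zero limit, so the company keeps it all
print(split_claim(3_000_000, float('inf'), 0))
```

Under this assumption, the `No Insurance` scenario (`deductible=float('inf')`, `limit=0`) leaves the full claim with the company, which matches how that scenario is configured above.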

In [None]:
def run_single_simulation(sim_id, deductible, limit, seed):
    """Run one simulation and return summary dict."""
    config = ManufacturerConfig(
        initial_assets=INITIAL_ASSETS,
        asset_turnover_ratio=ASSET_TURNOVER,
        base_operating_margin=OPERATING_MARGIN,
        tax_rate=TAX_RATE,
        retention_ratio=RETENTION_RATIO,
    )
    manufacturer = WidgetManufacturer(config)

    claim_gen = ManufacturingLossGenerator.create_simple(
        frequency=CLAIM_PARAMS['frequency'],
        severity_mean=CLAIM_PARAMS['severity_mean'],
        severity_std=CLAIM_PARAMS['severity_std'],
        seed=seed,
    )

    # Revenue for loss generation (initial assets * turnover).
    # Held fixed at its initial value for every simulated year.
    revenue = INITIAL_ASSETS * ASSET_TURNOVER

    # Generate losses year-by-year using generate_losses()
    # Each LossEvent has .amount, .time (float in years), .loss_type
    claims_by_year = {}
    for year in range(MAX_YEARS):
        losses, _ = claim_gen.generate_losses(
            duration=1.0, revenue=revenue, include_catastrophic=True, time=float(year),
        )
        for loss in losses:
            claims_by_year.setdefault(year, []).append(loss.amount)

    ruin_year = None
    annual_returns = []

    for year in range(MAX_YEARS):
        for amount in claims_by_year.get(year, []):
            manufacturer.process_insurance_claim(amount, deductible, limit)

        metrics = manufacturer.step(letter_of_credit_rate=0.015, growth_rate=0.03)
        annual_returns.append(metrics['roe'])

        if manufacturer.is_ruined and ruin_year is None:
            ruin_year = year + 1
            break

    return {
        'sim_id': sim_id,
        'ruin_year': ruin_year,
        # Arithmetic mean of yearly ROE over the years simulated (up to ruin, if any)
        'annualized_roe': np.mean(annual_returns) if annual_returns else 0,
        'final_equity': float(manufacturer.equity) if not manufacturer.is_ruined else 0,
    }

print("Simulation function defined.")

## 2. Run All Scenarios
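
The loop below passes `seed=42 + i` to simulation `i` in every scenario. Assuming the loss generator is deterministic for a given seed, each scenario then faces the same loss sequences, so differences between scenarios reflect the insurance structure rather than sampling noise (a common-random-numbers design). The sketch illustrates the idea with a stand-in NumPy generator; `draw_losses` and its lognormal parameters are hypothetical and are not the library's loss model.

```python
import numpy as np


def draw_losses(seed, n_years=5):
    """Stand-in loss generator (hypothetical): one lognormal draw per year."""
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=15.0, sigma=0.5, size=n_years)


same_a = draw_losses(seed=42)   # e.g. simulation 0 under scenario A
same_b = draw_losses(seed=42)   # simulation 0 under scenario B, same seed
other = draw_losses(seed=43)    # simulation 1 uses a different seed

print("Same seed, identical losses:", np.array_equal(same_a, same_b))
print("Different seed, identical losses:", np.array_equal(same_a, other))
```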

In [None]:
scenario_results = {}

for scenario in INSURANCE_SCENARIOS:
    print(f"\nRunning {N_SIMULATIONS} simulations for: {scenario['name']}")
    start_time = time.time()

    results = []
    for i in range(N_SIMULATIONS):
        if (i + 1) % max(1, N_SIMULATIONS // 5) == 0:
            print(f"  Progress: {i + 1}/{N_SIMULATIONS}")
        results.append(run_single_simulation(
            sim_id=i, deductible=scenario['deductible'],
            limit=scenario['limit'], seed=42 + i,
        ))

    elapsed = time.time() - start_time
    scenario_results[scenario['name']] = pd.DataFrame(results)
    print(f"  Completed in {elapsed:.1f}s ({N_SIMULATIONS / elapsed:.1f} sims/s)")
    gc.collect()

## 3. Ruin Probabilities by Time Horizon
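
Each ruin probability in this section is a proportion estimated from `N_SIMULATIONS` paths, so it carries sampling error, which is sizeable in light mode (100 paths). As a rough sketch of how to attach an uncertainty band, the hypothetical helper `wilson_interval` below computes a 95% Wilson score interval for a binomial proportion. It is not part of the notebook's pipeline.

```python
import math


def wilson_interval(n_ruined, n_sims, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p_hat = n_ruined / n_sims
    denom = 1.0 + z**2 / n_sims
    center = (p_hat + z**2 / (2 * n_sims)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n_sims + z**2 / (4 * n_sims**2)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative input: 10 ruined paths out of 100 simulations
low, high = wilson_interval(n_ruined=10, n_sims=100)
print(f"Estimated ruin probability: 10.0% (95% interval {low:.1%} to {high:.1%})")
```

With 10 ruins in 100 paths the interval spans roughly 5.5% to 17.4%, which is a reminder that light-mode differences of a few percentage points between scenarios may not be meaningful.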

In [None]:
ruin_rows = []
for name, df in scenario_results.items():
    for horizon in TIME_HORIZONS:
        ruined = df['ruin_year'].notna() & (df['ruin_year'] <= horizon)
        ruin_rows.append({'Scenario': name, 'Horizon': horizon,
                          'Ruin_Probability': ruined.mean() * 100})

ruin_df = pd.DataFrame(ruin_rows)
ruin_pivot = ruin_df.pivot(index='Horizon', columns='Scenario', values='Ruin_Probability')

print(f"Ruin Probability by Time Horizon (%) -- {SIMULATION_MODE} mode")
print("=" * 60)
print(ruin_pivot.to_string(float_format='%.1f'))

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

for name in scenario_results:
    data = ruin_df[ruin_df['Scenario'] == name]
    ax1.plot(data['Horizon'], data['Ruin_Probability'], 'o-', lw=2, label=name)

ax1.set_xlabel('Time Horizon (Years)')
ax1.set_ylabel('Ruin Probability (%)')
ax1.set_title('Probability of Ruin by Time Horizon')
ax1.set_xscale('log')
ax1.set_xticks(TIME_HORIZONS)
ax1.set_xticklabels(TIME_HORIZONS)
ax1.grid(True, alpha=0.3)
ax1.legend()

max_horizon = max(TIME_HORIZONS)
h_data = ruin_df[ruin_df['Horizon'] == max_horizon].sort_values('Ruin_Probability')
colors = ['green' if x < 10 else 'orange' if x < 25 else 'red'
          for x in h_data['Ruin_Probability']]
ax2.barh(range(len(h_data)), h_data['Ruin_Probability'], color=colors)
ax2.set_yticks(range(len(h_data)))
ax2.set_yticklabels(h_data['Scenario'])
ax2.set_xlabel('Ruin Probability (%)')
ax2.set_title(f'{max_horizon}-Year Ruin Probability')
ax2.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

## 4. ROE Distribution for Surviving Companies

In [None]:
n_scenarios = len(scenario_results)
n_cols = min(3, n_scenarios)
n_rows = (n_scenarios + n_cols - 1) // n_cols
fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 4 * n_rows), squeeze=False)
axes_flat = axes.flatten()

for idx, (name, df) in enumerate(scenario_results.items()):
    ax = axes_flat[idx]
    survivors = df[df['ruin_year'].isna()]
    if len(survivors) > 0:
        roe_vals = survivors['annualized_roe'] * 100
        n_bins = min(30, max(1, int(np.ceil(np.log2(len(roe_vals)) + 1))))
        ax.hist(roe_vals, bins=n_bins, color='skyblue', edgecolor='black', alpha=0.7)
        ax.axvline(roe_vals.mean(), color='red', ls='--', label=f'Mean: {roe_vals.mean():.1f}%')
        ax.legend(fontsize=8)
    else:
        ax.text(0.5, 0.5, 'No Survivors', ha='center', va='center', fontsize=14,
                transform=ax.transAxes)
    ax.set_title(f'{name}\n(Survivors: {len(survivors)}/{len(df)})')
    ax.set_xlabel('Annualized ROE (%)')
    ax.grid(True, alpha=0.3)

for idx in range(n_scenarios, len(axes_flat)):
    fig.delaxes(axes_flat[idx])

plt.suptitle(f'ROE Distribution for Surviving Companies ({MAX_YEARS} Years)', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

## 5. Survival Time Analysis
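
The box plot below covers only the paths that failed, so it says nothing about how many companies survived. A complementary view is the empirical survival curve, the fraction of paths still solvent after each horizon. The sketch below computes it for a small made-up `ruin_years` array in which `NaN` marks a survivor, mirroring the `ruin_year` column; the helper name `survival_curve` is hypothetical.

```python
import numpy as np


def survival_curve(ruin_years, horizons):
    """Fraction of paths not yet ruined at each horizon.

    NaN (or None) entries are survivors and are treated as ruined at infinity.
    """
    r = np.asarray(ruin_years, dtype=float)
    r = np.where(np.isnan(r), np.inf, r)
    return np.array([(r > t).mean() for t in horizons])


# Made-up data: two failures (years 5 and 20) and two survivors
ruin_years = [5, np.nan, 20, np.nan]
horizons = [1, 5, 10, 20]
curve = survival_curve(ruin_years, horizons)

for t, s in zip(horizons, curve):
    print(f"Year {t:>2}: {s:.0%} surviving")
```

Applied to the notebook's data, `survival_curve(df['ruin_year'], TIME_HORIZONS)` would be the complement of the ruin probabilities tabulated in Section 3.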

In [None]:
box_data, labels = [], []
for name, df in scenario_results.items():
    failed = df[df['ruin_year'].notna()]
    if len(failed) > 0:
        box_data.append(failed['ruin_year'].values)
        labels.append(f"{name}\n(n={len(failed)})")

if box_data:
    fig, ax = plt.subplots(figsize=(10, 5))
    bp = ax.boxplot(box_data, labels=labels, patch_artist=True)
    for patch, color in zip(bp['boxes'], plt.cm.Set3(range(len(box_data)))):
        patch.set_facecolor(color)
    ax.set_ylabel('Years Until Ruin')
    ax.set_title('Distribution of Survival Times for Failed Companies')
    ax.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()
    plt.show()
else:
    print("No companies failed in any scenario.")

## 6. Insurance Efficiency

Risk-adjusted return is defined here as the average annualized ROE across all simulations (ruined paths included) multiplied by the survival rate at the full horizon. Higher is better.
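
As a worked example of that arithmetic, the sketch below applies the same formula as the cell that follows to a made-up four-path result table. The numbers are illustrative only and are not output from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up results: three survivors (NaN ruin_year) and one path ruined in year 12
toy = pd.DataFrame({
    'ruin_year': [np.nan, np.nan, 12, np.nan],
    'annualized_roe': [0.10, 0.08, -0.02, 0.12],
})

survival_rate = toy['ruin_year'].isna().mean()      # fraction of paths that survived
avg_roe_pct = toy['annualized_roe'].mean() * 100    # mean ROE over all paths, in %
risk_adjusted = avg_roe_pct * survival_rate         # ROE weighted by survival

print(f"Survival rate:        {survival_rate:.0%}")
print(f"Average ROE:          {avg_roe_pct:.2f}%")
print(f"Risk-adjusted return: {risk_adjusted:.2f}%")
```

Here three of four paths survive (75%), the mean ROE is 7%, and the risk-adjusted return is 7% x 0.75 = 5.25%.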

In [None]:
efficiency_rows = []
for scenario in INSURANCE_SCENARIOS:
    df = scenario_results[scenario['name']]
    survival_rate = df['ruin_year'].isna().mean() * 100
    avg_roe = df['annualized_roe'].mean() * 100
    efficiency_rows.append({
        'Scenario': scenario['name'],
        'Survival Rate (%)': survival_rate,
        'Avg ROE (%)': avg_roe,
        'Risk-Adjusted Return': avg_roe * (survival_rate / 100),
    })

eff_df = pd.DataFrame(efficiency_rows)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.scatter(100 - eff_df['Survival Rate (%)'], eff_df['Avg ROE (%)'], s=200, alpha=0.7,
            c=range(len(eff_df)), cmap='viridis')
for _, row in eff_df.iterrows():
    ax1.annotate(row['Scenario'],
                 (100 - row['Survival Rate (%)'], row['Avg ROE (%)']),
                 xytext=(5, 5), textcoords='offset points', fontsize=9)
ax1.set_xlabel(f'Ruin Probability @ {MAX_YEARS} Years (%)')
ax1.set_ylabel('Average ROE (%)')
ax1.set_title('Risk-Return Tradeoff')
ax1.grid(True, alpha=0.3)

ax2.bar(range(len(eff_df)), eff_df['Risk-Adjusted Return'],
        color=plt.cm.Set3(range(len(eff_df))))
ax2.set_xticks(range(len(eff_df)))
ax2.set_xticklabels(eff_df['Scenario'], rotation=30, ha='right')
ax2.set_ylabel('Risk-Adjusted Return (%)')
ax2.set_title('Insurance Efficiency')
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("Insurance Efficiency Summary:")
print("=" * 60)
print(eff_df.to_string(index=False, float_format='%.2f'))

## Key Takeaways

- Lower deductibles and higher limits reduce long-term ruin probability; the Section 3 table shows the size of the effect at each horizon.
- Cumulative ruin probability rises non-linearly with the time horizon, so short-horizon metrics understate long-run risk.
- Insurance reduces ROE variance while slightly lowering mean returns -- a favorable trade in ergodic terms, because long-run growth follows the time-average (geometric) return, which volatility drags down.
- Risk-adjusted return (mean ROE × survival probability) condenses that trade-off into one number for comparing insurance structures.
- For production-quality estimates, switch to `SIMULATION_MODE = "full"` and make sure about 19 GB of RAM is available.
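
The ergodic point in the list above can be illustrated with a small calculation. The sketch below compares two hypothetical return profiles, each a 50/50 gamble between an "up" and a "down" year; the figures are chosen for illustration and are not results from this notebook. The ensemble average is the arithmetic mean of the two outcomes, while the time-average growth rate is the geometric mean experienced by a single company compounding through both kinds of year.

```python
import math


def growth_summary(up, down):
    """Return (ensemble_mean, time_average) annual growth for a 50/50 gamble."""
    ensemble_mean = 0.5 * (up + down)
    time_average = math.sqrt((1 + up) * (1 + down)) - 1
    return ensemble_mean, time_average


# Hypothetical profiles: volatile (uninsured) vs. smoother but costlier (insured)
uninsured = growth_summary(up=0.50, down=-0.40)
insured = growth_summary(up=0.20, down=-0.12)

for label, (mean, time_avg) in [("Uninsured", uninsured), ("Insured", insured)]:
    print(f"{label:>9}: ensemble mean {mean:+.1%}, time-average {time_avg:+.1%}")
```

In this example the insured profile has the lower ensemble mean (4% versus 5%) but the higher time-average growth (about +2.8% versus about -5.1% per year), which is the sense in which giving up some mean return for lower variance can be a favorable trade.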

## Next Steps

- **Explore growth parameter sensitivity:** [core/07_growth_dynamics.ipynb](07_growth_dynamics.ipynb)
- **Optimize retention levels:** [optimization/01_retention_optimization.ipynb](../optimization/01_retention_optimization.ipynb)
- **See the ergodic advantage:** [core/03_ergodic_advantage.ipynb](03_ergodic_advantage.ipynb)