# Loss Distribution Analysis

## Overview
- **What this notebook does:** Explores the three types of manufacturing losses (attritional, large, catastrophic), their distributions, temporal patterns, extreme-value behavior, and correlations.
- **Prerequisites:** [getting-started/03_basic_manufacturer.ipynb](../getting-started/03_basic_manufacturer.ipynb)
- **Estimated runtime:** 1-2 minutes
- **Audience:** [Practitioner]

## Why Loss Distributions Matter
Insurance optimization depends on accurately modeling the *frequency* and *severity* of losses. This notebook shows how the `ManufacturingLossGenerator` combines three loss types into a realistic aggregate loss distribution, and how to use that distribution to select insurance retention levels.
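
To make the frequency-severity idea concrete, here is a minimal sketch of a compound Poisson simulation for a single loss type. It does not reproduce the internals of `ManufacturingLossGenerator`; the Poisson frequency, the lognormal severity, and the helper names `lognormal_params` and `simulate_annual_totals` are assumptions chosen for illustration.

```python
import numpy as np


def lognormal_params(mean, cv):
    """Convert a mean and coefficient of variation to lognormal (mu, sigma)."""
    sigma2 = np.log(1.0 + cv ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)


def simulate_annual_totals(frequency, mean, cv, n_years, rng):
    """Compound Poisson sketch: Poisson claim counts, lognormal claim sizes (assumed)."""
    mu, sigma = lognormal_params(mean, cv)
    counts = rng.poisson(frequency, size=n_years)
    severities = rng.lognormal(mu, sigma, size=counts.sum())
    # Label each claim with its simulation year, then sum claims within each year.
    year_index = np.repeat(np.arange(n_years), counts)
    return np.bincount(year_index, weights=severities, minlength=n_years)


rng = np.random.default_rng(0)
totals = simulate_annual_totals(frequency=5.0, mean=50_000, cv=0.8, n_years=20_000, rng=rng)
print(f"Simulated mean annual total: ${totals.mean():,.0f}")
print(f"Frequency x mean severity:   ${5.0 * 50_000:,.0f}")
```

The simulated mean should sit close to frequency times mean severity, which is a quick sanity check for any frequency-severity model.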

In [None]:
"""Google Colab setup: mount Drive and install package dependencies.

Run this cell first. If prompted to restart the runtime, do so, then re-run all cells.
This cell is a no-op when running locally.
"""
import sys, os
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')

    NOTEBOOK_DIR = '/content/drive/My Drive/Colab Notebooks/ei_notebooks/core'

    os.chdir(NOTEBOOK_DIR)
    if NOTEBOOK_DIR not in sys.path:
        sys.path.append(NOTEBOOK_DIR)

    !pip install git+https://github.com/AlexFiliakov/Ergodic-Insurance-Limits.git -q 2>&1 | tail -3
    print('\nSetup complete. If you see numpy/scipy import errors below,')
    print('restart the runtime (Runtime > Restart runtime) and re-run all cells.')

## Setup

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio

from ergodic_insurance.loss_distributions import ManufacturingLossGenerator
from ergodic_insurance.visualization import WSJ_COLORS, format_currency

pio.templates.default = "plotly_white"

# Reproducibility
np.random.seed(42)

## Configuration

Loss parameters for a typical manufacturer with $10M revenue.
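
As a rough cross-check on the parameters configured below, the expected annual loss per type can be worked out by hand as frequency times mean severity. This sketch assumes the frequencies are used as given (no revenue scaling) and that `severity_xm` and `severity_alpha` describe a classical Pareto distribution with mean `alpha * xm / (alpha - 1)`; neither assumption is confirmed by this notebook.

```python
# Expected annual loss per type = frequency x mean severity (assumptions noted above).
attritional = 5.0 * 50_000
large = 0.5 * 2_000_000

# Assumed classical Pareto mean, valid for alpha > 1.
alpha, xm = 2.5, 10_000_000
pareto_mean = alpha * xm / (alpha - 1)
catastrophic = 0.02 * pareto_mean

total = attritional + large + catastrophic
for name, value in [("Attritional", attritional), ("Large", large), ("Catastrophic", catastrophic)]:
    print(f"{name:<13} ${value:>12,.0f}  ({100 * value / total:.1f}% of total)")
print(f"{'Total':<13} ${total:>12,.0f}")
```

Under these assumptions, large losses account for most of the expected cost, while catastrophic losses contribute a smaller share of the mean despite their size.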

In [None]:
REVENUE = 10_000_000
N_SIMULATIONS = 10_000

ATTRITIONAL_PARAMS = {
    'base_frequency': 5.0,
    'severity_mean': 50_000,
    'severity_cv': 0.8,
}
LARGE_PARAMS = {
    'base_frequency': 0.5,
    'severity_mean': 2_000_000,
    'severity_cv': 1.2,
}
CATASTROPHIC_PARAMS = {
    'base_frequency': 0.02,
    'severity_xm': 10_000_000,
    'severity_alpha': 2.5,
}

generator = ManufacturingLossGenerator(
    attritional_params=ATTRITIONAL_PARAMS,
    large_params=LARGE_PARAMS,
    catastrophic_params=CATASTROPHIC_PARAMS,
    seed=42,
)

print("Loss generator configured.")

## 1. Loss Distribution by Type

Generate many one-year loss scenarios and break them down by loss type.
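
Per-event amounts and annual totals answer different questions: the first describes a single claim, the second describes a year's aggregate. The sketch below shows one way to keep both views, using synthetic data and a hypothetical `sim_year` column that is not part of the notebook's own DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_years = 1_000

# Synthetic events: Poisson counts per year, lognormal amounts (illustrative only).
counts = rng.poisson(5.0, size=n_years)
events = pd.DataFrame({
    "sim_year": np.repeat(np.arange(n_years), counts),
    "amount": rng.lognormal(mean=10.0, sigma=1.0, size=counts.sum()),
})

# Sum events within each year; years with no events get a total of zero.
annual_totals = (
    events.groupby("sim_year")["amount"].sum()
    .reindex(range(n_years), fill_value=0.0)
)

print(f"Mean loss per event: ${events['amount'].mean():,.0f}")
print(f"Mean annual total:   ${annual_totals.mean():,.0f}")
```

Reindexing over every simulated year matters: dropping loss-free years would bias the annual mean and percentiles upward.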

In [None]:
all_losses = []
loss_types = []

for _ in range(N_SIMULATIONS):
    events, _ = generator.generate_losses(duration=1.0, revenue=REVENUE)
    for event in events:
        all_losses.append(event.amount)
        loss_types.append(event.loss_type)

df = pd.DataFrame({'amount': all_losses, 'type': loss_types})

# Summary statistics by type
stats = df.groupby('type')['amount'].agg(['count', 'mean', 'std', 'min', 'max']).round(0)
print(f"Generated {len(all_losses):,} individual loss events across {N_SIMULATIONS:,} simulation years.")
print(f"\nSummary by loss type:")
print(stats)

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Loss Distribution by Type', 'Empirical CDF',
        'Loss Frequency by Type', 'Summary Statistics',
    ),
    specs=[
        [{'type': 'histogram'}, {'type': 'scatter'}],
        [{'type': 'bar'}, {'type': 'table'}],
    ],
)

for loss_type in df['type'].unique():
    type_data = df[df['type'] == loss_type]['amount']
    fig.add_trace(go.Histogram(x=type_data, name=loss_type, opacity=0.7, nbinsx=30), row=1, col=1)

sorted_losses = np.sort(all_losses)
cdf = np.arange(1, len(sorted_losses) + 1) / len(sorted_losses)
fig.add_trace(go.Scatter(x=sorted_losses, y=cdf, mode='lines', name='ECDF',
                         line=dict(color=WSJ_COLORS['blue'])), row=1, col=2)

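freq_data = df.groupby('type').size().reset_index(name='count')
# groupby sorts types alphabetically (attritional, catastrophic, large), so map
# colors by name to keep them consistent with the charts in later sections.
type_colors = {'attritional': WSJ_COLORS['blue'], 'large': WSJ_COLORS['orange'],
               'catastrophic': WSJ_COLORS['red']}
fig.add_trace(go.Bar(x=freq_data['type'], y=freq_data['count'],
                     marker_color=[type_colors.get(t, WSJ_COLORS['blue']) for t in freq_data['type']]),
              row=2, col=1)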

fig.add_trace(go.Table(
    header=dict(values=['Type', 'Count', 'Mean', 'Std', 'Min', 'Max'], align='left'),
    cells=dict(values=[stats.index, stats['count'],
                       [f'${x:,.0f}' for x in stats['mean']],
                       [f'${x:,.0f}' for x in stats['std']],
                       [f'${x:,.0f}' for x in stats['min']],
                       [f'${x:,.0f}' for x in stats['max']]], align='left'),
), row=2, col=2)

fig.update_layout(height=800, title_text=f'Loss Distribution Analysis ({N_SIMULATIONS:,} simulations)')
fig.update_xaxes(title_text='Loss Amount', row=1, col=1, tickformat='$.2s')
fig.update_xaxes(title_text='Loss Amount', row=1, col=2, tickformat='$.2s', type='log')
fig.show()

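# all_losses holds individual event amounts, so these are per-event statistics.
print(f"\nMean loss per event: ${np.mean(all_losses):,.0f}")
print(f"95th percentile (per event): ${np.percentile(all_losses, 95):,.0f}")
print(f"99th percentile (per event): ${np.percentile(all_losses, 99):,.0f}")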

## 2. Temporal Loss Patterns

How do annual losses vary from year to year? A 20-year sample path helps reveal clustering and volatility, and shows how much each loss type contributes in a given year.

In [None]:
N_YEARS_TEMPORAL = 20

yearly_data = []
for year in range(N_YEARS_TEMPORAL):
    events, year_stats = generator.generate_losses(duration=1.0, revenue=REVENUE)
    yearly_data.append({
        'year': year + 1,
        'total_loss': year_stats['total_amount'],
        'num_events': len(events),
        'attritional': sum(e.amount for e in events if e.loss_type == 'attritional'),
        'large': sum(e.amount for e in events if e.loss_type == 'large'),
        'catastrophic': sum(e.amount for e in events if e.loss_type == 'catastrophic'),
    })

temporal_df = pd.DataFrame(yearly_data)

fig = make_subplots(rows=3, cols=1, subplot_titles=(
    'Annual Total Losses', 'Loss Composition by Type', 'Cumulative Losses',
), row_heights=[0.35, 0.35, 0.3])

mean_loss = temporal_df['total_loss'].mean()
fig.add_trace(go.Bar(x=temporal_df['year'], y=temporal_df['total_loss'],
                     name='Total Loss', marker_color=WSJ_COLORS['blue']), row=1, col=1)
fig.add_hline(y=mean_loss, line_dash='dash', line_color=WSJ_COLORS['red'],
              annotation_text=f'Mean: ${mean_loss:,.0f}', row=1, col=1)

for col_name, color in [('attritional', WSJ_COLORS['light_blue']),
                         ('large', WSJ_COLORS['orange']),
                         ('catastrophic', WSJ_COLORS['red'])]:
    fig.add_trace(go.Bar(x=temporal_df['year'], y=temporal_df[col_name],
                         name=col_name.title(), marker_color=color), row=2, col=1)

temporal_df['cumulative'] = temporal_df['total_loss'].cumsum()
fig.add_trace(go.Scatter(x=temporal_df['year'], y=temporal_df['cumulative'],
                         mode='lines+markers', name='Cumulative',
                         line=dict(color=WSJ_COLORS['blue'], width=2)), row=3, col=1)

fig.update_layout(height=900, title_text=f'Temporal Loss Pattern Analysis ({N_YEARS_TEMPORAL} Years)',
                  barmode='stack')
fig.update_yaxes(title_text='Loss Amount', tickformat='$.2s')
fig.show()

print(f"\nMean annual loss: ${temporal_df['total_loss'].mean():,.0f}")
print(f"CV of annual losses: {temporal_df['total_loss'].std() / temporal_df['total_loss'].mean():.2f}")
print(f"Years with catastrophic losses: {(temporal_df['catastrophic'] > 0).sum()}")

## 3. Extreme Value Analysis

Return period analysis answers the question: *how large a loss should we expect once every N years?* This is critical for setting insurance limits.
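
The link between a return period and a percentile is that an N-year loss is exceeded with probability 1/N in any one year, so it corresponds to the (1 - 1/N) quantile of the annual distribution. A minimal sketch follows, using a hypothetical helper `return_level` and a synthetic Pareto sample as a stand-in for simulated annual maxima.

```python
import numpy as np


def return_level(annual_values, return_period):
    """Empirical level exceeded on average once every `return_period` years."""
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    exceedance_prob = 1.0 / return_period
    return float(np.quantile(annual_values, 1.0 - exceedance_prob))


rng = np.random.default_rng(2)
# Stand-in sample: classical Pareto with scale xm and shape alpha (illustrative only).
xm, alpha = 1_000_000, 2.5
sample = xm * (1.0 + rng.pareto(alpha, size=10_000))

for rp in (10, 100, 500):
    n_exceed = len(sample) / rp  # expected number of simulated years above the level
    print(f"{rp:>4}-year level: ${return_level(sample, rp):>12,.0f}  "
          f"(~{n_exceed:.0f} exceedances in sample)")
```

The exceedance count is a useful reliability check: with 10,000 simulated years, a 500-year estimate rests on roughly 20 observations, so it is much noisier than the 10-year estimate.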

In [None]:
N_EVA_SIMS = 10_000
annual_maxima = []
annual_totals = []

for _ in range(N_EVA_SIMS):
    events, year_stats = generator.generate_losses(duration=1.0, revenue=REVENUE)
    annual_maxima.append(max((e.amount for e in events), default=0))
    annual_totals.append(year_stats['total_amount'])

return_periods = [2, 5, 10, 20, 50, 100, 200, 500]

print('Return Period Analysis')
print('=' * 70)
print(f'{"Return Period":<15} {"Max Loss":<20} {"Total Loss":<20}')
print('-' * 70)
for rp in return_periods:
    pct = 100 * (1 - 1 / rp)
    max_val = np.percentile(annual_maxima, pct)
    total_val = np.percentile(annual_totals, pct)
    print(f'{rp:>10}-year  ${max_val:<19,.0f} ${total_val:<19,.0f}')

## 4. Correlation Between Loss Types

Understanding whether attritional and large losses tend to co-occur in the same year is important for aggregate limit selection.
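
Before reading much into a sample correlation, it helps to know how large it can be by chance alone. The sketch below draws two independent annual-loss series of 100 years many times and records the Pearson correlation each time; the lognormal series are stand-ins chosen for illustration, not output from the generator.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_trials = 100, 2_000

# Two independent, skewed annual-loss series per trial (lognormal is an assumption).
x = rng.lognormal(mean=12.0, sigma=0.5, size=(n_trials, n_years))
y = rng.lognormal(mean=13.0, sigma=1.0, size=(n_trials, n_years))

# Pearson correlation for each trial, computed row by row.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
corrs = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

lo, hi = np.percentile(corrs, [2.5, 97.5])
print(f"Mean correlation under independence: {corrs.mean():.3f}")
print(f"95% of sample correlations fall in [{lo:.2f}, {hi:.2f}]")
```

With 100 years of data, a sample correlation inside this band is compatible with independent loss types, so small values in the correlation matrix should not be over-interpreted.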

In [None]:
N_CORR_YEARS = 100
corr_data = {'attritional_total': [], 'large_total': [], 'catastrophic_total': [], 'total_loss': []}

for year in range(N_CORR_YEARS):
    events, year_stats = generator.generate_losses(duration=1.0, revenue=REVENUE)
    corr_data['attritional_total'].append(sum(e.amount for e in events if e.loss_type == 'attritional'))
    corr_data['large_total'].append(sum(e.amount for e in events if e.loss_type == 'large'))
    corr_data['catastrophic_total'].append(sum(e.amount for e in events if e.loss_type == 'catastrophic'))
    corr_data['total_loss'].append(year_stats['total_amount'])

corr_df = pd.DataFrame(corr_data)
corr_matrix = corr_df.corr()

print('Correlation Matrix:')
print(corr_matrix.round(3))

print(f"\nAttritional-Large correlation: {corr_matrix.loc['attritional_total', 'large_total']:.3f}")
print(f"Years with catastrophic losses: {(corr_df['catastrophic_total'] > 0).sum()} / {N_CORR_YEARS}")
print(f"\nAverage contribution to total:")
print(f"  Attritional:  {100 * corr_df['attritional_total'].sum() / corr_df['total_loss'].sum():.1f}%")
print(f"  Large:        {100 * corr_df['large_total'].sum() / corr_df['total_loss'].sum():.1f}%")
print(f"  Catastrophic: {100 * corr_df['catastrophic_total'].sum() / corr_df['total_loss'].sum():.1f}%")

## 5. Optimal Retention Analysis

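Given the loss distribution, how does the total cost of risk (retained losses + insurance premium) change with the retention level? The cell below applies each retention to the annual aggregate loss and prices the ceded portion at 1.5 times its expected value. Under that pricing rule, expected total cost falls as retention rises while retained volatility grows, so the choice is a trade-off between the two.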
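
One practical detail when comparing retention levels is sampling noise: if every level is evaluated on a fresh set of simulated years, differences between levels partly reflect the random draw. A minimal sketch of the alternative, evaluating all levels on one shared sample (common random numbers), is shown here with synthetic lognormal annual totals standing in for generator output; `PREMIUM_MULTIPLE` is a hypothetical name for the 1.5 factor.

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in annual totals; in the notebook these would come from the loss generator.
annual_totals = rng.lognormal(mean=13.5, sigma=1.0, size=5_000)

PREMIUM_MULTIPLE = 1.5  # premium = 1.5 x expected ceded loss (assumed pricing rule)
retentions = np.logspace(5, 7, 20)  # $100K to $10M

gross_mean = annual_totals.mean()
gross_std = annual_totals.std()

costs, vol_ratio = [], []
for retention in retentions:
    # Apply every retention level to the same simulated years.
    retained = np.minimum(annual_totals, retention)
    ceded_mean = gross_mean - retained.mean()
    costs.append(retained.mean() + PREMIUM_MULTIPLE * ceded_mean)
    vol_ratio.append(retained.std() / gross_std)

costs = np.array(costs)
vol_ratio = np.array(vol_ratio)
print(f"Cost at lowest retention:  ${costs[0]:,.0f}")
print(f"Cost at highest retention: ${costs[-1]:,.0f}")
print(f"Retained volatility ratio: {vol_ratio[0]:.2f} -> {vol_ratio[-1]:.2f}")
```

Reusing one sample makes both curves smooth, so any difference between two retention levels comes from the retention itself rather than from the draw.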

In [None]:
retention_levels = np.logspace(5, 7, 20)  # $100K to $10M
N_RETENTION_SIMS = 1_000
results = []

for retention in retention_levels:
    retained_losses = []
    total_losses = []
    for _ in range(N_RETENTION_SIMS):
        events, year_stats = generator.generate_losses(duration=1.0, revenue=REVENUE)
        total = year_stats['total_amount']
        total_losses.append(total)
        retained_losses.append(min(total, retention))

    avg_retained = np.mean(retained_losses)
    expected_ceded = np.mean(total_losses) - avg_retained
    insurance_premium = expected_ceded * 1.5  # premium = 150% of expected ceded loss (a 50% loading)

    results.append({
        'retention': retention,
        'avg_retained': avg_retained,
        'est_premium': insurance_premium,
        'total_cost': avg_retained + insurance_premium,
        'volatility_reduction': np.std(retained_losses) / np.std(total_losses),  # retained/gross std ratio; lower = more volatility removed
    })

ret_df = pd.DataFrame(results)
optimal_idx = ret_df['total_cost'].idxmin()

fig = make_subplots(rows=1, cols=2, subplot_titles=('Total Cost vs Retention', 'Retained Volatility (% of Gross)'))

fig.add_trace(go.Scatter(x=ret_df['retention'], y=ret_df['total_cost'], mode='lines',
                         name='Total Cost', line=dict(color=WSJ_COLORS['blue'], width=2)), row=1, col=1)
fig.add_trace(go.Scatter(x=ret_df['retention'], y=ret_df['avg_retained'], mode='lines',
                         name='Retained Loss', line=dict(color=WSJ_COLORS['orange'], width=2, dash='dash')),
              row=1, col=1)
fig.add_trace(go.Scatter(x=ret_df['retention'], y=ret_df['est_premium'], mode='lines',
                         name='Insurance Premium', line=dict(color=WSJ_COLORS['green'], width=2, dash='dot')),
              row=1, col=1)

fig.add_trace(go.Scatter(x=ret_df['retention'], y=ret_df['volatility_reduction'] * 100,
                         mode='lines+markers', name='Volatility',
                         line=dict(color=WSJ_COLORS['red'], width=2)), row=1, col=2)

fig.update_layout(height=400, title_text='Optimal Retention Analysis')
fig.update_xaxes(title_text='Retention Level', type='log', tickformat='$.2s')
fig.update_yaxes(title_text='Cost', tickformat='$.2s', row=1, col=1)
fig.update_yaxes(title_text='Volatility (%)', row=1, col=2)
fig.show()

print(f"\nOptimal retention level: ${ret_df.loc[optimal_idx, 'retention']:,.0f}")
print(f"Total cost at optimum: ${ret_df.loc[optimal_idx, 'total_cost']:,.0f}")
print(f"Retained volatility (% of gross): {ret_df.loc[optimal_idx, 'volatility_reduction'] * 100:.1f}%")

## Key Takeaways

- Manufacturing losses come in three distinct types with very different frequency-severity profiles.
- Catastrophic losses are rare but dominate tail risk; they drive the need for high insurance limits.
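- Loss types show low correlation in this simulation, so a bad year for one type does not systematically coincide with a bad year for another; this is the source of any diversification benefit in the aggregate.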
- Return-period analysis directly informs insurance limit selection.
- Retention selection trades expected cost of risk (retained losses + premium) against retained volatility, and the loss distribution quantifies both sides.

## Next Steps

- **Design insurance structures:** [core/02_insurance_structures.ipynb](02_insurance_structures.ipynb)
- **See how losses affect long-term growth:** [core/03_ergodic_advantage.ipynb](03_ergodic_advantage.ipynb)
- **Quantify tail risk:** [core/05_risk_metrics.ipynb](05_risk_metrics.ipynb)