# Claims Reserving Analysis - Chain-Ladder Method

This notebook implements the Chain-Ladder method, a standard actuarial technique for projecting ultimate claim payments from historical development patterns and deriving IBNR (Incurred But Not Reported) reserves.

## Overview
The Chain-Ladder method projects future claim payments from historical development patterns in four stages:
1. Age-to-Age (ATA) development factors
2. Cumulative Development Factors (CDFs)
3. Ultimate claim projections
4. IBNR reserve calculations


## Step 1: Import Required Libraries


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Set display options for better formatting
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

print("Libraries imported successfully!")


## Step 2: Load and Transform Data

Load the insurance claims data from a CSV file and transform it into the long format required for Chain-Ladder analysis. The original dataset contains policy-level information, which we transform into three fields:

- **Origin Year**: A proxy for the year in which the claim occurred, derived from `subscription_length` and clipped to 2018 through 2023
- **Development Year**: The number of years since the origin year, counted from 1 (the development pattern is simulated)
- **Paid Claims**: The cumulative amount paid, simulated from a base amount scaled by vehicle and customer characteristics for policies with `claim_status == 1`

Only the claim status and policy characteristics come from the dataset. The claim amounts and their development over time are simulated, so the resulting figures are illustrative.
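
As a rough sketch of the triangle shape that development data normally has: assuming a valuation at the end of 2023 and that development year 1 coincides with the origin year, development year `d` of origin year `y` is observable only when `y + d - 1 <= 2023`. The helper name `observed_dev_years` and its defaults are hypothetical, chosen for illustration.

```python
def observed_dev_years(origin_year, valuation_year=2023, max_dev=6):
    """Return the development years already observable at the valuation date."""
    # Development year d of an origin year falls in calendar year
    # origin_year + d - 1 (assumption: development year 1 is the origin year).
    return [d for d in range(1, max_dev + 1)
            if origin_year + d - 1 <= valuation_year]

for year in range(2018, 2024):
    print(year, observed_dev_years(year))
```

Under these assumptions the oldest origin year has six observed development years and the newest has one, which is what gives the loss triangle its shape.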


In [None]:
# Load the insurance claims data from CSV file
try:
    # Load the actual insurance dataset
    insurance_data = pd.read_csv('Insurance claims data.csv')
    print("Insurance data loaded successfully!")
    print(f"Shape of data: {insurance_data.shape}")
    print("\nFirst few rows:")
    print(insurance_data.head())
    print("\nData types:")
    print(insurance_data.dtypes)
    
    # Transform the insurance data into claims development format
    print("\nTransforming data for Chain-Ladder analysis...")
    
    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Create simulated claims development data based on the insurance dataset
    # We'll use the claim_status and other features to create realistic development patterns
    
    # Simulate origin years from subscription_length (rounded to whole years)
    
    # Create origin years based on subscription length (older policies = earlier origin years)
    insurance_data['Origin Year'] = 2023 - np.round(insurance_data['subscription_length']).astype(int)
    insurance_data['Origin Year'] = insurance_data['Origin Year'].clip(lower=2018, upper=2023)
    
    # Create claims development data
    claims_data = []
    
    # For each origin year, create development patterns
    for origin_year in sorted(insurance_data['Origin Year'].unique()):
        # Get policies for this origin year
        year_policies = insurance_data[insurance_data['Origin Year'] == origin_year]
        
        # Calculate base claim amounts based on vehicle characteristics
        # Higher claim amounts for older vehicles, certain fuel types, etc.
        base_claims = []
        for _, policy in year_policies.iterrows():
            if policy['claim_status'] == 1:  # Only for policies with claims
                # Base claim amount influenced by vehicle characteristics
                base_amount = 50000  # Base amount
                
                # Adjust based on vehicle age (older = higher claims)
                vehicle_age_factor = 1 + (policy['vehicle_age'] * 0.2)
                
                # Adjust based on customer age (older customers might have higher claims)
                customer_age_factor = 1 + (policy['customer_age'] / 100)
                
                # Adjust based on fuel type
                fuel_factor = 1.2 if policy['fuel_type'] == 'Diesel' else 1.0
                
                # Adjust based on NCAP rating (lower rating = higher claims)
                ncap_factor = 1.5 - (policy['ncap_rating'] * 0.1) if policy['ncap_rating'] > 0 else 1.3
                
                # Calculate final base claim amount
                final_amount = base_amount * vehicle_age_factor * customer_age_factor * fuel_factor * ncap_factor
                base_claims.append(final_amount)
        
        # If no claims for this year, create some simulated claims
        if len(base_claims) == 0:
            num_claims = np.random.randint(5, 15)  # Random number of claims
            base_claims = [np.random.uniform(30000, 150000) for _ in range(num_claims)]
        
        # Create development pattern for each claim
        for base_claim in base_claims:
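            # Simulate development over up to 6 years, keeping only the development
            # years observable at the 2023 valuation date so the data forms a triangle
            for dev_year in range(1, min(6, 2023 - int(origin_year) + 1) + 1):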
                # Claims develop over time with some randomness
                development_factor = 1 + (dev_year - 1) * 0.25 + np.random.normal(0, 0.1)
                development_factor = max(0.8, development_factor)  # Ensure reasonable bounds
                
                paid_claims = base_claim * development_factor
                
                claims_data.append({
                    'Origin Year': origin_year,
                    'Development Year': dev_year,
                    'Paid Claims': max(0, paid_claims)
                })
    
    claims_data = pd.DataFrame(claims_data)
    print("Claims development data created successfully!")
    print(f"Shape of claims data: {claims_data.shape}")
    print("\nFirst few rows of claims data:")
    print(claims_data.head())
    
    # Show summary statistics
    print("\nSummary of transformed data:")
    print(f"Origin years: {sorted(claims_data['Origin Year'].unique())}")
    print(f"Development years: {sorted(claims_data['Development Year'].unique())}")
    print(f"Total claims amount: ${claims_data['Paid Claims'].sum():,.0f}")
    
except FileNotFoundError:
    print("Insurance claims data.csv not found. Creating sample data for demonstration...")
    # Create sample data for demonstration
    np.random.seed(42)
    
    # Generate sample claims data
    origin_years = [2018, 2019, 2020, 2021, 2022, 2023]
    development_years = list(range(1, 7))  # 1 to 6 years development
    
    data = []
    for origin in origin_years:
        # Draw one base amount per origin year so each row develops consistently
        base_claim = np.random.uniform(100000, 500000)
        for dev in development_years:
            # Keep only cells observable at the 2023 valuation date (triangle shape)
            if origin + dev - 1 > 2023:
                break
            # Simulate a development pattern with some random noise
            development_factor = 1 + (dev - 1) * 0.3 + np.random.normal(0, 0.1)
            paid_claims = base_claim * development_factor
            
            data.append({
                'Origin Year': origin,
                'Development Year': dev,
                'Paid Claims': max(0, paid_claims)  # Ensure non-negative
            })
    
    claims_data = pd.DataFrame(data)
    print("Sample data created successfully!")
    print(f"Shape of data: {claims_data.shape}")
    print("\nFirst few rows:")
    print(claims_data.head())


## Step 3: Create Loss Triangle

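Pivot the long-format data into a cumulative loss triangle with `pivot_table`, summing paid claims over all claims in each cell. This creates a matrix where:
- Rows represent origin years
- Columns represent development years
- Values represent cumulative paid claims, with `fill_value=0` marking cells that have no data (later steps treat a zero as "not yet observed")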
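
A minimal sketch of the pivot on a hypothetical three-year dataset (the `toy` values are made up for illustration). Unlike the cell below, it omits `fill_value`, so unobserved cells stay `NaN` instead of 0.

```python
import pandas as pd

# Hypothetical long-format records: one row per (origin, development) cell.
toy = pd.DataFrame({
    'Origin Year':      [2021, 2021, 2021, 2022, 2022, 2023],
    'Development Year': [1, 2, 3, 1, 2, 1],
    'Paid Claims':      [100.0, 150.0, 165.0, 110.0, 160.0, 120.0],
})

# Rows become origin years, columns become development years.
toy_triangle = toy.pivot_table(
    index='Origin Year',
    columns='Development Year',
    values='Paid Claims',
    aggfunc='sum',
)
print(toy_triangle)
```

Keeping `NaN` for unobserved cells avoids confusing "no data yet" with a genuine zero payment; using zeros is workable as long as every later step filters them out.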


In [None]:
# Create the cumulative loss triangle using pivot_table
triangle = claims_data.pivot_table(
    index='Origin Year',
    columns='Development Year',
    values='Paid Claims',
    aggfunc='sum',
    fill_value=0
)

print("Cumulative Loss Triangle:")
print("=" * 50)
print(triangle)

# Display triangle with proper formatting
print("\nFormatted Triangle (with commas):")
formatted_triangle = triangle.copy()
for col in formatted_triangle.columns:
    formatted_triangle[col] = formatted_triangle[col].apply(lambda x: f"{x:,.0f}" if x > 0 else "-")
print(formatted_triangle)


## Step 4: Calculate Age-to-Age (ATA) Development Factors

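An ATA factor is the ratio of cumulative claims at one development year to cumulative claims at the previous development year, computed separately for each origin year. A factor of 1.25, for example, means cumulative claims grew by 25% over that development year.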
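
A minimal sketch of the ratio on a hypothetical toy triangle, assuming unobserved cells are stored as `NaN`. It uses a column shift rather than the explicit loop in the cell below; both compute the same ratios.

```python
import pandas as pd

nan = float('nan')
# Hypothetical cumulative triangle: rows are origin years, columns development years.
toy_triangle = pd.DataFrame(
    {1: [100.0, 110.0, 120.0],
     2: [150.0, 160.0, nan],
     3: [165.0, nan, nan]},
    index=[2021, 2022, 2023],
)

# shift(axis=1) moves every value one column to the right, so the division
# pairs each development year with the one before it.
toy_ata = (toy_triangle / toy_triangle.shift(axis=1)).iloc[:, 1:]
print(toy_ata.round(4))
```

Origin year 2021 develops by 150 / 100 = 1.5 from year 1 to year 2; cells without both values remain `NaN`.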


In [None]:
# Calculate Age-to-Age (ATA) development factors
ata_factors = pd.DataFrame(index=triangle.index, columns=triangle.columns[1:])

# Calculate ATA factors for each development period
for dev_year in triangle.columns[1:]:
    prev_dev_year = dev_year - 1
    
    # Calculate ATA factor: current period / previous period
    # Only calculate where both periods have non-zero values
    ata_factors[dev_year] = np.where(
        (triangle[prev_dev_year] > 0) & (triangle[dev_year] > 0),
        triangle[dev_year] / triangle[prev_dev_year],
        np.nan
    )

print("Age-to-Age (ATA) Development Factors:")
print("=" * 50)
print(ata_factors.round(4))

# Display ATA factors with better formatting
print("\nFormatted ATA Factors:")
formatted_ata = ata_factors.copy()
for col in formatted_ata.columns:
    formatted_ata[col] = formatted_ata[col].apply(
        lambda x: f"{x:.4f}" if not pd.isna(x) else "-"
    )
print(formatted_ata)


## Step 5: Select Average Development Factors

Take the simple (unweighted) average of the ATA factors in each development period as the selected development factor. Then append a tail factor of 1.0, which assumes no further development beyond the last observed development year.
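
For comparison only, the sketch below contrasts the simple average used in this notebook with the volume-weighted average, a common alternative that weights each origin year by its claim volume. The two-year figures are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative claims at development years 1 and 2.
cum_prev = pd.Series([100.0, 110.0], index=[2021, 2022])  # development year 1
cum_next = pd.Series([150.0, 160.0], index=[2021, 2022])  # development year 2

# Simple average: mean of the individual age-to-age ratios.
simple_avg = (cum_next / cum_prev).mean()

# Volume-weighted average: ratio of the column totals.
volume_weighted = cum_next.sum() / cum_prev.sum()

print(f"Simple average:  {simple_avg:.4f}")
print(f"Volume-weighted: {volume_weighted:.4f}")
```

The two selections differ most when origin years vary widely in size, since the simple average gives a small year the same influence as a large one.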


In [None]:
# Calculate selected development factors (simple average of ATA factors)
selected_factors = ata_factors.mean()

# Add tail factor of 1.0 (assuming no further development beyond the last period)
max_dev_year = max(triangle.columns)
selected_factors[max_dev_year + 1] = 1.0

print("Selected Development Factors:")
print("=" * 40)
for dev_year, factor in selected_factors.items():
    if dev_year <= max_dev_year:
        print(f"Development {dev_year-1} to {dev_year}: {factor:.4f}")
    else:
        print(f"Tail Factor (Ultimate): {factor:.4f}")

# Create a summary DataFrame
factors_summary = pd.DataFrame({
    'Development Period': [f"{int(dev-1)} to {int(dev)}" for dev in selected_factors.index[:-1]] + ['Ultimate'],
    'Selected Factor': selected_factors.values
})

print("\nFactors Summary:")
print(factors_summary)


## Step 6: Calculate Cumulative Development Factors (CDFs)

A CDF projects cumulative claims from a given development year to ultimate. The CDF for a development year is the product of all selected factors that apply after it, including the tail factor, so the last development year has a CDF equal to the tail factor.
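
The product can be sketched as a reversed running product. The factor values here are hypothetical, and, as in this notebook, each factor is assumed to be indexed by the development year it leads to, with the tail stored one past the last year.

```python
import pandas as pd

# Hypothetical selected factors: index 2 is the 1-to-2 factor, index 3 the
# 2-to-3 factor, and index 4 the tail factor.
toy_factors = pd.Series({2: 1.5, 3: 1.1, 4: 1.0})

# Reverse, take the running product, and reverse back, so each entry holds
# the product of its own factor and every later one.
running = toy_factors[::-1].cumprod()[::-1]

# The CDF for development year d starts at the factor indexed d + 1.
toy_cdfs = pd.Series(running.values, index=[d - 1 for d in running.index])
print(toy_cdfs)
```

Here development year 1 gets 1.5 × 1.1 × 1.0 = 1.65, year 2 gets 1.1, and year 3 gets the tail factor of 1.0.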


In [None]:
# Calculate Cumulative Development Factors (CDFs)
cdfs = pd.Series(index=triangle.columns, dtype=float)

# Calculate CDF for each development period
for dev_year in triangle.columns:
    # CDF is the product of all selected factors from current period to ultimate
    cdf = 1.0
    
    # Multiply factors from current development year to ultimate
    for future_dev in range(dev_year + 1, max_dev_year + 2):
        if future_dev in selected_factors.index:
            cdf *= selected_factors[future_dev]
    
    cdfs[dev_year] = cdf

print("Cumulative Development Factors (CDFs):")
print("=" * 45)
for dev_year, cdf in cdfs.items():
    print(f"Development Year {dev_year}: {cdf:.4f}")

# Create CDF summary DataFrame
cdf_summary = pd.DataFrame({
    'Development Year': cdfs.index,
    'CDF': cdfs.values
})

print("\nCDF Summary:")
print(cdf_summary)


## Step 7: Project Ultimate Claims

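Extract the latest diagonal from the triangle, that is, the most recent cumulative paid amount for each origin year. Multiply each amount by the CDF for that origin year's current development year to obtain its ultimate claims.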
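
A compact sketch of the same projection on hypothetical toy values, assuming unobserved cells are `NaN` so that pandas' `last_valid_index` can locate the diagonal. The cell below instead scans for the last non-zero value.

```python
import pandas as pd

nan = float('nan')
# Hypothetical cumulative triangle and CDFs.
toy_triangle = pd.DataFrame(
    {1: [100.0, 110.0, 120.0],
     2: [150.0, 160.0, nan],
     3: [165.0, nan, nan]},
    index=[2021, 2022, 2023],
)
toy_cdfs = pd.Series({1: 1.65, 2: 1.1, 3: 1.0})

# Latest observed development year for each origin year.
latest_dev = toy_triangle.apply(lambda row: row.last_valid_index(), axis=1)

# Latest diagonal: the cumulative amount at that development year.
latest = pd.Series({oy: toy_triangle.loc[oy, dev] for oy, dev in latest_dev.items()})

# Ultimate = latest cumulative amount x CDF at its development year.
ultimate = latest * latest_dev.map(toy_cdfs)
print(ultimate)
```

The oldest origin year is already at the last development year, so its ultimate equals its latest amount; the newest year is scaled by the largest CDF.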


In [None]:
# Extract latest claims diagonal (most recent development for each origin year)
latest_claims = pd.Series(index=triangle.index, dtype=float)

for origin_year in triangle.index:
    # Find the latest non-zero development year for this origin year
    latest_dev = None
    for dev_year in reversed(triangle.columns):
        if triangle.loc[origin_year, dev_year] > 0:
            latest_dev = dev_year
            break
    
    if latest_dev is not None:
        latest_claims[origin_year] = triangle.loc[origin_year, latest_dev]
    else:
        latest_claims[origin_year] = 0

print("Latest Claims Diagonal:")
print("=" * 30)
for origin_year, claims in latest_claims.items():
    print(f"Origin Year {origin_year}: ${claims:,.0f}")

# Calculate ultimate claims by applying CDFs
ultimate_claims = pd.Series(index=triangle.index, dtype=float)

for origin_year in triangle.index:
    # Find the latest development year for this origin year
    latest_dev = None
    for dev_year in reversed(triangle.columns):
        if triangle.loc[origin_year, dev_year] > 0:
            latest_dev = dev_year
            break
    
    if latest_dev is not None:
        # Ultimate = Latest Claims × CDF for that development period
        ultimate_claims[origin_year] = latest_claims[origin_year] * cdfs[latest_dev]
    else:
        ultimate_claims[origin_year] = 0

print("\nUltimate Claims Projections:")
print("=" * 35)
for origin_year, claims in ultimate_claims.items():
    print(f"Origin Year {origin_year}: ${claims:,.0f}")


## Step 8: Calculate IBNR Reserves

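The reserve for each origin year is the difference between its projected ultimate claims and the claims already paid. Because the triangle holds paid claims, this difference is strictly the total unpaid claims reserve (case reserves plus IBNR, Incurred But Not Reported); this notebook labels it IBNR for simplicity.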
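
The arithmetic on hypothetical toy figures, as a minimal sketch:

```python
# Hypothetical latest paid and projected ultimate claims per origin year.
latest_paid = {2021: 165.0, 2022: 160.0, 2023: 120.0}
ultimate = {2021: 165.0, 2022: 176.0, 2023: 198.0}

# Reserve per origin year = ultimate - paid to date.
reserve = {oy: ultimate[oy] - latest_paid[oy] for oy in latest_paid}
total_reserve = sum(reserve.values())

for oy, amount in reserve.items():
    print(f"Origin Year {oy}: {amount:,.0f}")
print(f"Total: {total_reserve:,.0f}")
```

A fully developed origin year (2021 here) carries no reserve, while the most recent year carries the largest share relative to what has been paid.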


In [None]:
# Calculate IBNR reserves
ibnr_reserves = ultimate_claims - latest_claims

print("IBNR Reserve Calculations:")
print("=" * 30)
for origin_year in triangle.index:
    latest = latest_claims[origin_year]
    ultimate = ultimate_claims[origin_year]
    ibnr = ibnr_reserves[origin_year]
    print(f"Origin Year {origin_year}:")
    print(f"  Latest Paid: ${latest:,.0f}")
    print(f"  Ultimate:    ${ultimate:,.0f}")
    print(f"  IBNR:        ${ibnr:,.0f}")
    print()

# Calculate total IBNR reserve
total_ibnr = ibnr_reserves.sum()
print(f"Total IBNR Reserve: ${total_ibnr:,.0f}")


## Step 9: Display Comprehensive Results

Create a summary table showing all key metrics for each origin year.


In [None]:
# Create comprehensive results summary
results_summary = pd.DataFrame({
    'Origin Year': triangle.index,
    'Latest Paid Claims': latest_claims.values,
    'Ultimate Claims': ultimate_claims.values,
    'IBNR Reserve': ibnr_reserves.values
})

# Add percentage of ultimate that is IBNR
results_summary['IBNR % of Ultimate'] = (
    results_summary['IBNR Reserve'] / results_summary['Ultimate Claims'] * 100
)

# Format the summary for display
display_summary = results_summary.copy()
display_summary['Latest Paid Claims'] = display_summary['Latest Paid Claims'].apply(lambda x: f"${x:,.0f}")
display_summary['Ultimate Claims'] = display_summary['Ultimate Claims'].apply(lambda x: f"${x:,.0f}")
display_summary['IBNR Reserve'] = display_summary['IBNR Reserve'].apply(lambda x: f"${x:,.0f}")
display_summary['IBNR % of Ultimate'] = display_summary['IBNR % of Ultimate'].apply(lambda x: f"{x:.1f}%")

print("COMPREHENSIVE RESULTS SUMMARY")
print("=" * 50)
print(display_summary.to_string(index=False))

print("\n" + "=" * 50)
print(f"TOTAL IBNR RESERVE: ${total_ibnr:,.0f}")
print(f"TOTAL ULTIMATE CLAIMS: ${ultimate_claims.sum():,.0f}")
print(f"TOTAL PAID CLAIMS: ${latest_claims.sum():,.0f}")
print(f"OVERALL IBNR %: {(total_ibnr / ultimate_claims.sum() * 100):.1f}%")
print("=" * 50)


## Step 10: Visualization - IBNR Reserves by Origin Year

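Create a bar chart of the IBNR reserve for each origin year, followed by a line chart of cumulative paid claims by development year with each origin year's projected ultimate drawn as a dashed horizontal line.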


In [None]:
# Create visualization for IBNR reserves
plt.figure(figsize=(12, 8))

# Create bar chart
bars = plt.bar(results_summary['Origin Year'], results_summary['IBNR Reserve'], 
               color='steelblue', alpha=0.7, edgecolor='navy', linewidth=1.5)

# Customize the chart
plt.title('IBNR Reserves by Origin Year', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Origin Year', fontsize=12, fontweight='bold')
plt.ylabel('IBNR Reserve ($)', fontsize=12, fontweight='bold')

# Format y-axis to show values in thousands or millions
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K' if x < 1000000 else f'${x/1000000:.1f}M'))

# Add value labels on top of bars
for bar, value in zip(bars, results_summary['IBNR Reserve']):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + height*0.01,
             f'${value:,.0f}', ha='center', va='bottom', fontweight='bold')

# Add grid for better readability
plt.grid(axis='y', alpha=0.3, linestyle='--')

# Adjust layout and display
plt.tight_layout()
plt.show()

# Create additional visualization showing the development pattern
plt.figure(figsize=(14, 8))

# Plot the loss triangle development pattern
for origin_year in triangle.index:
    # Get non-zero values for this origin year
    dev_years = []
    claims_values = []
    
    for dev_year in triangle.columns:
        if triangle.loc[origin_year, dev_year] > 0:
            dev_years.append(dev_year)
            claims_values.append(triangle.loc[origin_year, dev_year])
    
    if dev_years:  # Only plot if there are values
        plt.plot(dev_years, claims_values, marker='o', linewidth=2, 
                label=f'Origin {origin_year}', markersize=6)

# Add ultimate claims as dashed horizontal lines in each origin year's own color
# (colors are looked up by label before any horizontal line is added)
line_colors = {line.get_label(): line.get_color() for line in plt.gca().lines}
for origin_year in triangle.index:
    if ultimate_claims[origin_year] > 0:
        plt.axhline(y=ultimate_claims[origin_year], xmin=0, xmax=1,
                    color=line_colors.get(f'Origin {origin_year}'),
                    linestyle='--', alpha=0.5)

plt.title('Claims Development Pattern and Ultimate Projections', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Development Year', fontsize=12, fontweight='bold')
plt.ylabel('Cumulative Paid Claims ($)', fontsize=12, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K' if x < 1000000 else f'${x/1000000:.1f}M'))

plt.tight_layout()
plt.show()


## Summary and Conclusions

This Chain-Ladder analysis has successfully:

1. **Loaded and processed** the claims data into a cumulative loss triangle
2. **Calculated ATA factors** to understand claims development patterns
3. **Selected average development factors** for projection purposes
4. **Computed CDFs** to project claims to ultimate values
5. **Projected ultimate claims** for each origin year
6. **Calculated IBNR reserves** representing future claim payments
7. **Visualized results** to aid in interpretation

### Key Insights:
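- The total IBNR reserve estimates the payments still to be made on claims that have already occurred
- The claim amounts and development pattern here are simulated, so the figures illustrate the method; applied to real paid data, the same steps give a starting point for financial planning and regulatory reporting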

### Next Steps:
- Consider additional methods (Bornhuetter-Ferguson, Cape Cod) for comparison
- Perform sensitivity analysis on development factors
- Update analysis as new data becomes available
- Consider external factors that might affect future development patterns
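
One possible shape for the sensitivity analysis is sketched below, under stated assumptions: the toy inputs are hypothetical, the helper `chain_ladder_reserve` is not part of this notebook, and the shock scales only the development portion (factor minus 1) of each selected factor, with a tail factor of 1.0.

```python
import math

def chain_ladder_reserve(latest_paid, latest_dev, factors):
    """Total reserve, where factors[d] is the selected factor from d-1 to d."""
    max_dev = max(factors)
    total = 0.0
    for origin, paid in latest_paid.items():
        # CDF: product of the factors after the latest observed development year.
        cdf = math.prod(factors[d] for d in range(latest_dev[origin] + 1, max_dev + 1))
        total += paid * (cdf - 1.0)
    return total

# Hypothetical inputs.
latest_paid = {2021: 165.0, 2022: 160.0, 2023: 120.0}
latest_dev = {2021: 3, 2022: 2, 2023: 1}
base_factors = {2: 1.5, 3: 1.1}

for shock in (-0.10, 0.0, 0.10):
    # Scale the development portion of each factor by (1 + shock).
    shocked = {d: 1.0 + (f - 1.0) * (1.0 + shock) for d, f in base_factors.items()}
    reserve = chain_ladder_reserve(latest_paid, latest_dev, shocked)
    print(f"Shock {shock:+.0%}: reserve {reserve:,.2f}")
```

Comparing the three totals shows how strongly the reserve depends on the selected factors, which matters most for the youngest origin years.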
