# Car Insurance Risk Analysis

This notebook analyzes vehicle safety using four insurance claim metrics:
- Collision: Cost index for collision claims
- Comprehensive (Comp): Cost index for comprehensive claims including theft
- DCPD (Direct Compensation Property Damage): Cost index for property damage claims
- AB (Accident Benefits): Frequency index for personal injury claims

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from typing import List

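# Set style for better visualizations
# (the 'seaborn' style name is deprecated since matplotlib 3.6 and unavailable
# in recent releases, so apply the seaborn theme directly instead)
sns.set_theme()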
sns.set_palette('husl')

In [2]:
# Load and prepare the data
df = pd.read_csv('hcmu_e_2024.csv')

# Clean column names
df.columns = df.columns.str.strip()

# Display basic information about the dataset
print("Dataset Info:")
df.info()  # info() prints its own summary and returns None

print("\nSample of the data:")
print(df.head())

In [3]:
# Weights for the safety score; collision and accident benefits are
# weighted higher because they are treated as more safety-critical
WEIGHTS = pd.Series({
    'Collision': 0.35,  # Collision severity
    'Comp': 0.15,       # Comprehensive claims
    'DCPD': 0.20,       # Property damage
    'AB': 0.30,         # Personal injury
})

def calculate_safety_score(data: pd.DataFrame) -> pd.Series:
    """Calculate overall safety score for every row (lower is better)"""
    metrics = data[WEIGHTS.index]
    # Replace missing values with the median of each column
    filled = metrics.fillna(metrics.median())
    # Weighted sum of the raw indices (no rescaling is applied)
    return filled.mul(WEIGHTS, axis=1).sum(axis=1, skipna=False)

# Calculate safety scores
df['safety_score'] = calculate_safety_score(df)

In [4]:
def get_top_safe_vehicles(df: pd.DataFrame, body_style: str, top_n: int = 5) -> pd.DataFrame:
    """Get the top N safest vehicles for a specific body style"""
    style_df = df[df['Body Style'] == body_style].copy()
    return (
        style_df
        .sort_values('safety_score')
        .head(top_n)
        [['Make', 'Model', 'Model Year', 'safety_score', 'Collision', 'Comp', 'DCPD', 'AB']]
    )

# Display top 5 safest vehicles for each body style
for style in df['Body Style'].dropna().unique():  # skip rows with no body style
    print(f"\nTop 5 Safest {style} Vehicles:")
    print(get_top_safe_vehicles(df, style))

In [5]:
# Visualization 1: Risk Metrics by Body Style
plt.figure(figsize=(12, 6))

metrics = ['Collision', 'Comp', 'DCPD', 'AB']
data_melted = df.melt(id_vars=['Body Style'], value_vars=metrics, var_name='Metric', value_name='Value')

sns.boxplot(x='Body Style', y='Value', hue='Metric', data=data_melted)
plt.title('Risk Metrics Distribution by Body Style')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [6]:
# Visualization 2: Safety Score Trends Over Years
plt.figure(figsize=(12, 6))

yearly_avg = df.groupby(['Model Year', 'Body Style'])['safety_score'].mean().unstack()

for style in yearly_avg.columns:
    plt.plot(yearly_avg.index, yearly_avg[style], marker='o', label=style)

plt.title('Average Safety Score Trends by Body Style')
plt.xlabel('Model Year')
plt.ylabel('Average Safety Score (lower is better)')
plt.legend(title='Body Style')
plt.grid(True)
plt.tight_layout()
plt.show()

In [7]:
# Analysis of Make Performance
make_analysis = df.groupby('Make').agg({
    'safety_score': 'mean',
    'Collision': 'mean',
    'Comp': 'mean',
    'DCPD': 'mean',
    'AB': 'mean',
    'Model': 'nunique'  # distinct model names, not rows per model year
}).round(2)

make_analysis = make_analysis.rename(columns={'Model': 'Number of Models'})
make_analysis = make_analysis.sort_values('safety_score')

print("Manufacturer Performance Analysis (Top 10):")
print(make_analysis.head(10))

## Key Findings

1. Safety Score Calculation:
   - Weighted sum of the four indices: Collision 0.35, AB 0.30, DCPD 0.20, Comp 0.15
   - Collision and accident benefits carry the highest weights because they are treated as the most safety-critical
   - Lower scores indicate safer vehicles

2. Body Style Analysis:
   - Different body styles show distinct risk patterns
   - SUVs and 4-door vehicles generally show more consistent safety scores

3. Temporal Trends:
   - Safety scores generally improve (that is, decrease) in newer model years
   - Some body styles show more improvement than others

4. Manufacturer Analysis:
   - Ranks manufacturers by their mean safety score, lowest (safest) first
   - Averages over every model each manufacturer has in the dataset
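
As a worked example of the score described in finding 1, the sketch below applies the notebook's weights to three made-up vehicles. The `toy` DataFrame and its values are purely illustrative and do not come from `hcmu_e_2024.csv`. It assumes the same median fill for missing values that the scoring cell uses.

```python
import numpy as np
import pandas as pd

# Weights copied from the scoring cell of this notebook
WEIGHTS = pd.Series({'Collision': 0.35, 'Comp': 0.15, 'DCPD': 0.20, 'AB': 0.30})

# Hypothetical toy data (illustrative values only, not from the dataset)
toy = pd.DataFrame({
    'Model': ['A', 'B', 'C'],
    'Collision': [80.0, 120.0, 100.0],
    'Comp': [90.0, 110.0, np.nan],   # vehicle C has no Comp value
    'DCPD': [100.0, 100.0, 100.0],
    'AB': [70.0, 130.0, 100.0],
})

metrics = toy[WEIGHTS.index]
# Fill the missing Comp value with the column median: median(90, 110) = 100
filled = metrics.fillna(metrics.median())
# Multiply each column by its weight, then add across columns
toy['safety_score'] = filled.mul(WEIGHTS, axis=1).sum(axis=1)

# A: 82.5, C: 100.0, B: 117.5 (lower is better)
print(toy.sort_values('safety_score')[['Model', 'safety_score']])
```

Because the score is a weighted sum of raw index values, a vehicle whose indices are all 100 scores exactly 100, and the weights only shift the score when the indices differ from one another.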