
# Assignment 2 – Fit & Interpret Probability Distributions for Claim Severity

**Course:** Data Analytics for Actuarial Science  
**Week:** 5  
**Dataset:** `claim_severity.csv` (column: `claim_amount`)  
**Deliverables:** Notebook + brief PDF summary + figures

**Objectives**
- Diagnose heavy-tailed behavior in claim severities
- Fit Lognormal, Gamma, Weibull, Pareto via MLE
- Compare models (AIC/BIC, QQ, KS/AD where applicable)
- Compute VaR/TVaR and interpret actuarially


## 0. Setup & Load

In [None]:

import numpy as np, pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as st


# Load dataset
df = pd.read_csv('claim_severity.csv')  # adjust path if needed
x = df['claim_amount'].astype(float).values
n = x.size
print('n =', n)
df.head()

## Part A — Exploratory Data Analysis (EDA)

In [None]:

# A1. Summary stats
s = pd.Series(x)
# Build a summary dict containing n, mean, median, std, CV, skewness, and kurtosis
summary = {
# please fill with necessary code
}
summary

> **Discuss the summary stats!**

In [None]:

# A2. Plots
# Histogram (density) on log-x
# please fill with necessary code


# Empirical CDF
# please fill with necessary code

# Mean Excess Plot
# please fill with necessary code


> **Write 2–3 sentences interpreting tail heaviness and log-scale behavior.**
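
The mean excess plot requested above rests on the empirical mean excess function e(u) = mean(X - u | X > u). The following is a minimal sketch of that computation, not a model answer: the helper name `mean_excess`, the synthetic lognormal sample, and the threshold grid are all assumptions made for illustration and are not taken from `claim_severity.csv`.

```python
import numpy as np

def mean_excess(sample, thresholds):
    """Empirical mean excess e(u) = mean(X - u | X > u) for each threshold u."""
    sample = np.asarray(sample, dtype=float)
    out = []
    for u in thresholds:
        exceed = sample[sample > u]
        # No exceedances above u: the mean excess is undefined there
        out.append(exceed.mean() - u if exceed.size else np.nan)
    return np.array(out)

# Synthetic stand-in data (assumption), NOT the course dataset
rng = np.random.default_rng(0)
demo = rng.lognormal(mean=8.0, sigma=1.2, size=5000)

# Thresholds at the 5%..95% sample quantiles, so every threshold has exceedances
u_grid = np.quantile(demo, np.linspace(0.05, 0.95, 19))
me = mean_excess(demo, u_grid)
```

Plotting `me` against `u_grid` gives the mean excess plot. An upward-sloping curve is usually read as a sign of a tail heavier than exponential, while a roughly flat curve points to an exponential-like tail.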

## Part B — Parametric Fits via MLE

In [None]:

# B1. Fit Lognormal, Gamma, Weibull, Pareto
ln_shape, ln_loc, ln_scale = st.lognorm.fit(x, floc=0)        # Lognormal
                                                              # Gamma (please fill with necessary code)
                                                              # Weibull (please fill with necessary code)
                                                              # Pareto (please fill with necessary code)

# Extract the estimated parameters
# please fill with necessary code
params = {
    "Lognormal": {},
    "Gamma": {},
    "Weibull": {},
    "Pareto": {}
}
params

> **Discuss the parametric fitting process (theory and method) for each distribution!**
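
As one input to that discussion: with the location fixed at 0, the Lognormal MLE has a closed form, namely the mean and the (population, `ddof=0`) standard deviation of the log-claims. The sketch below checks that closed form against `st.lognorm.fit(..., floc=0)` on a synthetic sample. The sample and the variable names are assumptions for illustration. Gamma and Weibull have no comparable closed form for their shape parameters, so those fits are obtained numerically.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in data (assumption), NOT the course dataset
rng = np.random.default_rng(1)
demo = rng.lognormal(mean=8.0, sigma=1.2, size=2000)

# Closed-form MLE: mu is the mean of log(x), sigma the ddof=0 std of log(x)
log_demo = np.log(demo)
mu_closed = log_demo.mean()
sigma_closed = log_demo.std(ddof=0)

# scipy parameterization: shape s = sigma, scale = exp(mu), loc fixed at 0
s_fit, loc_fit, scale_fit = st.lognorm.fit(demo, floc=0)
mu_fit, sigma_fit = np.log(scale_fit), s_fit

print({"mu_closed": mu_closed, "mu_fit": mu_fit,
       "sigma_closed": sigma_closed, "sigma_fit": sigma_fit})
```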

In [None]:

# B2. Overlay fitted densities (log-x)
# please fill with necessary code
xx = np.linspace(x.min(), x.max(), 500)
plt.figure()
plt.hist()
plt.plot( , label='Lognormal')
plt.plot( , label='Gamma')
plt.plot( , label='Weibull')
plt.plot( , label='Pareto')
plt.xscale('log')
plt.title('Fitted PDFs (log-x)')
plt.xlabel('claim_amount'); plt.ylabel('density')
plt.legend()
plt.show()

> **Discuss all figures and recommend the distribution with the best visual fit!**

## Part C — Model Adequacy & Selection

In [None]:

def aic(ll, k): return 2*k - 2*ll
def bic(ll, k, n): return np.log(n)*k - 2*ll

ll_ln = np.sum()  # log-likelihood for Lognormal dist, please fill with necessary code
ll_ga = np.sum()  # log-likelihood for Gamma dist, please fill with necessary code
ll_wb = np.sum()  # log-likelihood for Weibull dist, please fill with necessary code
ll_pa = np.sum(st.pareto.logpdf(x, pa_b, pa_loc, pa_scale))  # log-likelihood for Pareto dist

# Number of free parameters per model: 2 each, assuming loc is fixed at 0 (floc=0) in every fit
k_ln = k_ga = k_wb = k_pa = 2

# display diagnostic AIC/BIC results
info = pd.DataFrame({
    "model": ["Lognormal","Gamma","Weibull","Pareto"],
    "loglik": [ll_ln, ll_ga, ll_wb, ll_pa],
    "AIC": [aic(ll_ln,k_ln), aic(ll_ga,k_ga), aic(ll_wb,k_wb), aic(ll_pa,k_pa)],
    "BIC": [bic(ll_ln,k_ln,n), bic(ll_ga,k_ga,n), bic(ll_wb,k_wb,n), bic(ll_pa,k_pa,n)]
}).sort_values(["AIC","BIC"])
info
# Display QQ/KS/AD diagnostics (QQ plot, Kolmogorov-Smirnov, and Anderson-Darling)
# please fill with necessary code


> **Discuss the AIC/BIC results, compare them with the QQ/KS/AD diagnostics, and select a preferred model with justification.**
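
For the KS/AD part of that comparison, one possible approach is to evaluate both statistics directly from each fitted CDF. The sketch below is an illustration under stated assumptions: `ad_statistic` is a hypothetical helper that implements the standard Anderson-Darling formula by hand, and the data are synthetic. Because the parameters are estimated from the same sample, the usual KS p-values are not valid here, so only the statistics are compared (smaller means a closer fit).

```python
import numpy as np
import scipy.stats as st

def ad_statistic(sample, cdf):
    """Anderson-Darling A^2 = -n - (1/n) * sum (2i-1) [ln z_i + ln(1 - z_{n+1-i})]."""
    z = np.sort(cdf(np.asarray(sample, dtype=float)))
    z = np.clip(z, 1e-12, 1 - 1e-12)  # guard against log(0) in the extreme tails
    n = z.size
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))

# Synthetic stand-in data (assumption), NOT the course dataset
rng = np.random.default_rng(2)
demo = rng.lognormal(mean=8.0, sigma=1.2, size=2000)

# Fit two candidate models with loc fixed at 0 and freeze them
s_hat, _, ln_scale_hat = st.lognorm.fit(demo, floc=0)
ln_frozen = st.lognorm(s_hat, loc=0, scale=ln_scale_hat)
a_hat, _, ga_scale_hat = st.gamma.fit(demo, floc=0)
ga_frozen = st.gamma(a_hat, loc=0, scale=ga_scale_hat)

# KS statistic via scipy; AD statistic via the helper above
ks_ln = st.kstest(demo, ln_frozen.cdf).statistic
ks_ga = st.kstest(demo, ga_frozen.cdf).statistic
ad_ln = ad_statistic(demo, ln_frozen.cdf)
ad_ga = ad_statistic(demo, ga_frozen.cdf)

print({"KS": {"Lognormal": ks_ln, "Gamma": ks_ga},
       "AD": {"Lognormal": ad_ln, "Gamma": ad_ga}})
```

The AD statistic weights the tails more heavily than KS, which makes it the more relevant of the two when the goal is tail risk.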

## Part D — Tail Risk & Interpretation

In [None]:

from scipy.stats import norm

# Choose your preferred model (edit this string after selection)
preferred = "Lognormal"

def var_lognorm(p, mu, sigma):
    return np.exp(mu + sigma*norm.ppf(p))

def tvar_lognorm(p, mu, sigma):
    num = 1 - norm.cdf(norm.ppf(p) - sigma)
    return np.exp(mu + 0.5*sigma**2) * (num / (1-p))

if preferred == "Lognormal":
    mu_hat = float(np.log(ln_scale))
    sigma_hat = float(ln_shape)
    for p in [0.95, 0.99]:
        print(p, "VaR", var_lognorm(p, mu_hat, sigma_hat), "TVaR", tvar_lognorm(p, mu_hat, sigma_hat))
else:
    print("Implement VaR/TVaR for your chosen model.")

> **Interpret VaR/TVaR for pricing and capital (1–2 paragraphs).**
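
If the preferred model is not Lognormal, one hedged way to obtain VaR and TVaR is from a frozen `scipy.stats` distribution: VaR as the quantile `ppf(p)` and TVaR as the conditional expectation E[X | X > VaR] by numerical integration. The sketch below uses a hypothetical helper `var_tvar` and illustrative parameter values (not estimates from the course data), and checks the numerical result against the Lognormal closed form used in the cell above. Note that TVaR is finite only when the model has a finite mean, which for a Pareto fit requires a shape parameter greater than 1.

```python
import numpy as np
import scipy.stats as st

def var_tvar(frozen, p):
    """VaR_p as the p-quantile; TVaR_p as E[X | X > VaR_p] via numerical integration."""
    var_p = frozen.ppf(p)
    tvar_p = frozen.expect(lambda v: v, lb=var_p, conditional=True)
    return var_p, tvar_p

# Illustrative parameters (assumption), NOT estimates from the course data
mu_demo, sigma_demo = 8.0, 1.2
frozen_ln = st.lognorm(sigma_demo, loc=0, scale=np.exp(mu_demo))

p = 0.95
var_num, tvar_num = var_tvar(frozen_ln, p)

# Lognormal closed forms for comparison
z = st.norm.ppf(p)
var_closed = np.exp(mu_demo + sigma_demo * z)
tvar_closed = (np.exp(mu_demo + 0.5 * sigma_demo**2)
               * (1 - st.norm.cdf(z - sigma_demo)) / (1 - p))

print({"VaR": (var_num, var_closed), "TVaR": (tvar_num, tvar_closed)})
```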

## Part E — Sensitivity Check (Top 1% Trim)

In [None]:

cut = np.quantile(x, 0.99)
x_trim = x[x <= cut]

ln_shape_t, ln_loc_t, ln_scale_t = st.lognorm.fit(x_trim, floc=0)
mu_t, sigma_t = float(np.log(ln_scale_t)), float(ln_shape_t)

print({'Base_sigma': float(ln_shape), 'Trim_sigma': sigma_t})


> **Comment on robustness to extremes and implications for pricing/reserving.**
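
To make the robustness comment concrete, the sigma comparison can be extended to a tail measure. The sketch below is an illustration on synthetic data, with hypothetical helper names. It refits the Lognormal by its closed-form MLE (loc fixed at 0) before and after trimming the top 1% and compares the implied 99% VaR. Applying an untruncated likelihood to trimmed data biases the fit toward a lighter tail, which is part of what this comparison exposes.

```python
import numpy as np
from statistics import NormalDist

def lognormal_mle(sample):
    """Closed-form Lognormal MLE with loc = 0: mean and ddof=0 std of the logs."""
    logs = np.log(np.asarray(sample, dtype=float))
    return logs.mean(), logs.std(ddof=0)

def var_lognormal(p, mu, sigma):
    """Lognormal quantile: exp(mu + sigma * z_p)."""
    return float(np.exp(mu + sigma * NormalDist().inv_cdf(p)))

# Synthetic stand-in data (assumption), NOT the course dataset
rng = np.random.default_rng(3)
demo = rng.lognormal(mean=8.0, sigma=1.2, size=5000)
demo_trim = demo[demo <= np.quantile(demo, 0.99)]  # drop the top 1%

mu_base, sigma_base = lognormal_mle(demo)
mu_trim, sigma_trim = lognormal_mle(demo_trim)

report = {
    "sigma": {"base": sigma_base, "trim": sigma_trim},
    "VaR_99": {"base": var_lognormal(0.99, mu_base, sigma_base),
               "trim": var_lognormal(0.99, mu_trim, sigma_trim)},
}
print(report)
```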

## Bonus (+5 pts)
Fit a truncated Lognormal or mixture (e.g., 2-component Lognormal) and compare AIC/BIC.

## Grading Rubric (100 pts)
- EDA: 10
- Fits: 30
- Selection: 30
- Tail risk: 20
- Sensitivity: 10
- Bonus: +5