# Bayesian Modelling for Loss Curves

This notebook aims to translate the RStan case study [Modelling Loss Curves in Insurance with RStan](https://mc-stan.org/users/documentation/case-studies/losscurves_casestudy.html) by Mick Cooney to [Bambi](https://bambinos.github.io/bambi/) in Python.

## What is Bambi?

Bambi is a Python package that provides a high-level interface for specifying and fitting Bayesian models. If you're familiar with brms in R, you'll find Bambi to be a similar tool. It simplifies fitting Bayesian multilevel models: you provide a model formula (in a syntax similar to lme4's `lmer`), a likelihood family, priors, and data. Under the hood, Bambi uses [PyMC](https://www.pymc.io/) to do the heavy lifting. This makes it well suited to quick prototyping and experimentation, and a more accessible introduction to probabilistic modeling than working directly with lower-level frameworks.





## What are Loss Curves?

When you purchase insurance from a company, you typically pay an upfront amount known as a premium, with the expectation that if you need to make a claim in the future, the company will provide a payout. On an individual policy basis, your claims may exceed the premiums you paid or fall short of them, or you may never make a claim at all. The insurer, however, is concerned with the pool of policies of a given type as a whole: across all of those policies, the total amount claimed should be less than the premiums charged, with the difference sufficient to cover the insurer's costs and generate a profit.

One challenge insurers face is determining the amount of capital they need to reserve to cover potential claims. Since some claims may take years to materialize, it is crucial for insurers to understand how losses develop over time for different types of policies. For instance, a claim related to a car accident is likely to be reported, assessed, and paid quickly. On the other hand, something like medical malpractice may take time to become apparent, followed by complex investigations and legal proceedings, thereby extending the loss development period.

This is where loss curves prove to be a valuable tool. Loss curves enable insurers to estimate the time it takes for losses to develop and determine the ultimate cost of those losses. By analyzing historical data and industry trends, insurers can construct loss curves specific to each type of policy. These curves provide insights into the expected loss development pattern, helping insurers determine the amount of capital reserves necessary to fulfill their obligations for the policies they underwrite.
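To make the reserving idea concrete, here is a minimal sketch of how a loss curve can be turned into a reserve estimate. It assumes the curve value $f(t)$ is the proportion of the ultimate loss that has developed by development lag $t$, so the estimated ultimate loss is the observed cumulative loss divided by $f(t)$, and the reserve is whatever is still expected to be paid. The helper name `estimate_reserve` and the numbers are hypothetical, for illustration only.

```python
def estimate_reserve(cum_loss: float, pct_developed: float) -> tuple[float, float]:
    """Hypothetical helper: project ultimate loss and reserve from a loss curve.

    cum_loss: cumulative loss paid to date for one accident year
    pct_developed: loss-curve value f(t), the assumed share of the ultimate
        loss that has developed by the current development lag (0 < f(t) <= 1)
    """
    if not 0 < pct_developed <= 1:
        raise ValueError("pct_developed must be in (0, 1]")
    ultimate = cum_loss / pct_developed  # scale paid-to-date up to fully developed
    reserve = ultimate - cum_loss        # amount still expected to be paid
    return ultimate, reserve


# Illustrative numbers: 600 paid so far, and the curve says 75% has developed
ultimate, reserve = estimate_reserve(600.0, 0.75)
print(ultimate, reserve)  # 800.0 200.0
```

The slower a line of business develops (the smaller $f(t)$ at a given lag), the larger the multiplier $1 / f(t)$ and hence the larger the reserve relative to what has been paid so far.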

By utilizing loss curves, insurers can make informed decisions about risk management, capital allocation, and setting premiums. This proactive approach enables them to maintain financial stability, accurately assess their liabilities, and ensure they have sufficient reserves to meet their policyholders' needs.


## Translating to Bambi

This section follows the Stan case study, translating its models to Python and Bambi.

### Overview

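The case study considers two functional forms for the loss curve: a Weibull model and a log-logistic model. In both, $f(t)$ is the proportion of the ultimate loss expected to have developed by development lag $t$, $\theta$ is a scale parameter, and $\omega$ is a shape parameter; each curve is the cumulative distribution function of the corresponding distribution.

\begin{align*}
    f(t) &= 1 - \exp\left(-\left(\frac{t}{\theta}\right)^{\omega}\right) && \text{(Weibull)} \\
    f(t) &= \frac{t^{\omega}}{t^{\omega} + \theta^{\omega}} && \text{(log-logistic)}
\end{align*}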
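Both curves are easy to write directly with NumPy. This is a minimal sketch; the function names and parameter values are our own choices, not part of Bambi or the original case study. It also checks a property that helps when comparing the two forms: at $t = \theta$ the log-logistic curve has reached exactly half of the ultimate loss, while the Weibull curve has reached $1 - e^{-1} \approx 0.632$, whatever the value of $\omega$.

```python
import numpy as np


def weibull_curve(t, omega, theta):
    """Weibull loss curve: 1 - exp(-(t / theta) ** omega)."""
    t = np.asarray(t, dtype=float)
    return 1.0 - np.exp(-((t / theta) ** omega))


def loglogistic_curve(t, omega, theta):
    """Log-logistic loss curve: t**omega / (t**omega + theta**omega)."""
    t = np.asarray(t, dtype=float)
    return t**omega / (t**omega + theta**omega)


# Development lags 1..10 with illustrative (made-up) parameter values
lags = np.arange(1, 11)
omega, theta = 1.5, 3.0

print(np.round(weibull_curve(lags, omega, theta), 3))
print(np.round(loglogistic_curve(lags, omega, theta), 3))

# At t = theta each curve takes a fixed value, independent of omega
print(weibull_curve(theta, omega, theta))      # ~0.632 (= 1 - 1/e)
print(loglogistic_curve(theta, omega, theta))  # 0.5
```

Both functions start at 0 when $t = 0$ and increase towards 1, so they can be multiplied by an expected ultimate loss to give an expected cumulative loss at each lag.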


### Data

The data comes from the Casualty Actuarial Society ([casact](http://www.casact.org)) website. Unfortunately, it has since been removed from the site, but it is still accessible through the [`raw`](https://github.com/casact/raw_package/tree/master) package in R.



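```python
from pathlib import Path

import pandas as pd

# Folder containing one CSV file per line of business
folder_path = "data/"

# Read every CSV in the folder, tag each row with its line of business
# (the file name without its extension), and stack them into one DataFrame
df = pd.concat(
    [
        pd.read_csv(file_path).assign(lob=file_path.stem)
        for file_path in Path(folder_path).iterdir()
        if file_path.suffix == ".csv"
    ],
    ignore_index=True,
)
```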


```python
# Rename the columns to shorter names and compute the loss ratio
# (cumulative paid loss divided by premium, `DirectEP`)
claimdata_tbl = df.assign(
    acc_year=df["AccidentYear"].astype(str),  # accident year as a string label
    dev_year=df["DevelopmentYear"],
    dev_lag=df["Lag"],
    premium=df["DirectEP"],
    cum_loss=df["CumulativePaid"],
    loss_ratio=df["CumulativePaid"] / df["DirectEP"],
).loc[
    :,
    ["GroupCode", "Company", "lob", "acc_year", "dev_year",
     "dev_lag", "premium", "cum_loss", "loss_ratio"],
]
```
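A common way to inspect this kind of data before modeling is to pivot it into a loss development triangle, with one row per accident year and one column per development lag. The sketch below does this with `pandas.DataFrame.pivot_table` on a small made-up frame that reuses the column names of `claimdata_tbl`. With the real data you would typically filter to a single company and line of business first, since `claimdata_tbl` stacks many of them and `pivot_table` would otherwise average across them.

```python
import pandas as pd

# Made-up example rows using the same column names as `claimdata_tbl`
toy = pd.DataFrame(
    {
        "acc_year": ["1988", "1988", "1988", "1989", "1989", "1990"],
        "dev_lag": [1, 2, 3, 1, 2, 1],
        "premium": [1000.0, 1000.0, 1000.0, 1200.0, 1200.0, 1100.0],
        "cum_loss": [300.0, 550.0, 700.0, 420.0, 700.0, 330.0],
    }
)
toy["loss_ratio"] = toy["cum_loss"] / toy["premium"]

# Rows: accident year; columns: development lag; cells: cumulative loss ratio.
# Later accident years have fewer observed lags, so unobserved cells are NaN,
# which produces the familiar triangle shape.
triangle = toy.pivot_table(index="acc_year", columns="dev_lag", values="loss_ratio")
print(triangle.round(3))
```

Reading along a row shows how one accident year's losses develop over time; the empty lower-right corner is exactly what the loss-curve models are used to project.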