## Applying Stochastic Methods
### Getting Started
This tutorial focuses on using stochastic methods to estimate ultimates. 

Note that many of the examples shown here might not be applicable in a real-world scenario and are only meant to demonstrate some of the functionality included in the package. The user should always exercise their best actuarial judgement and follow any applicable laws, the Code of Professional Conduct, and applicable Actuarial Standards of Practice.

Make sure your packages are up to date. For more information on how to update your packages, visit [Keeping Packages Updated](https://chainladder-python.readthedocs.io/en/latest/install.html#keeping-packages-updated).

In [1]:
# import pandas as pd
# import numpy as np
# import chainladder as cl
# import seaborn as sns
# sns.set_style('whitegrid')
# %matplotlib inline
# print('chainladder:' + cl.__version__)
# print('pandas:' + pd.__version__)

# Black linter, optional
%load_ext lab_black

import pandas as pd
import numpy as np
import chainladder as cl
import matplotlib.pyplot as plt
import statsmodels.api as sm
import os

%matplotlib inline

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)

pandas: 1.3.2
numpy: 1.20.3
chainladder: 0.8.8


### Intro to MackChainladder

Like the basic `Chainladder` method, the `MackChainladder` is entirely specified by its selected development pattern. In fact, it is the basic `Chainladder`, but with extra features.

In [2]:
clrd = (
    cl.load_sample("clrd")
    .groupby("LOB")
    .sum()
    .loc["wkcomp", ["CumPaidLoss", "EarnedPremNet"]]
)

cl.Chainladder().fit(clrd["CumPaidLoss"]).ultimate_ == cl.MackChainladder().fit(
    clrd["CumPaidLoss"]
).ultimate_

True

Let's create a Mack Chainladder model.

In [3]:
mack = cl.MackChainladder().fit(clrd["CumPaidLoss"])

MackChainladder has the following additional fitted features that the deterministic `Chainladder` does not:

- `full_std_err_`:  The full standard error
- `total_process_risk_`: The total process error
- `total_parameter_risk_`: The total parameter error
- `mack_std_err_`: The total prediction error by origin period
- `total_mack_std_err_`: The total prediction error across all origin periods

Notice these are all measures of uncertainty, but where can they be applied? Let's start by examining the `link_ratio` values underlying the triangle between ages 12 and 24.

In [4]:
clrd_first_lags = clrd[clrd.development <= 24][clrd.origin < "1997"]["CumPaidLoss"]
clrd_first_lags

| Origin | 12 | 24 |
|---|---|---|
| 1988 | 285804 | 638532 |
| 1989 | 307720 | 684140 |
| 1990 | 320124 | 757479 |
| 1991 | 347417 | 793749 |
| 1992 | 342982 | 781402 |
| 1993 | 342385 | 743433 |
| 1994 | 351060 | 750392 |
| 1995 | 343841 | 768575 |
| 1996 | 381484 | 736040 |


A simple average link-ratio can be directly computed.

In [5]:
clrd_first_lags.link_ratio.to_frame().mean()[0]

2.2066789527531494

We can also verify that the result is the same as the `Development` object.

In [6]:
cl.Development(average="simple").fit(clrd["CumPaidLoss"]).ldf_.to_frame().values[0, 0]

2.2066789527531494

### The Linear Regression Framework

Mack noted that the estimate for the LDF is really just a linear regression fit through the origin. In the case of the `simple` average, it is a weighted regression where the weight is set to $\left(\frac{1}{X}\right)^{2}$.

Take a look at the fitted coefficient in the next cell and verify that it ties to the direct calculations above.
With the regression framework in hand, we get much more information about our LDF estimate than just the coefficient.

In [9]:
y = clrd_first_lags.to_frame().values[:, 1]
X = clrd_first_lags.to_frame().values[:, 0]

model = sm.WLS(y, X, weights=(1 / X) ** 2)
results = model.fit()
results.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | R-squared (uncentered): | 0.997 |
| Model: | WLS | Adj. R-squared (uncentered): | 0.997 |
| Method: | Least Squares | F-statistic: | 2887.0 |
| Date: | Wed, 01 Sep 2021 | Prob (F-statistic): | 1.6e-11 |
| Time: | 22:43:27 | Log-Likelihood: | -107.89 |
| No. Observations: | 9 | AIC: | 217.8 |
| Df Residuals: | 8 | BIC: | 218.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 2.2067 | 0.041 | 53.735 | 0.000 | 2.112 | 2.301 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 7.448 | Durbin-Watson: | 1.177 |
| Prob(Omnibus): | 0.024 | Jarque-Bera (JB): | 2.533 |
| Skew: | -1.187 | Prob(JB): | 0.282 |
| Kurtosis: | 4.058 | Cond. No. | 1.0 |


By toggling the weights of our regression, we can handle the most common types of averaging used in picking loss development factors.

In [None]:
print('Does this work for simple?')
print(round(cl.Development(average='simple').fit(clrd_first_lags).ldf_.to_frame().values[0, 0], 8) == \
      round(sm.WLS(y, X, weights=(1/X)**2).fit().params[0],8))
print('Does this work for volume-weighted average?')
print(round(cl.Development(average='volume').fit(clrd_first_lags).ldf_.to_frame().values[0, 0], 8) == \
      round(sm.WLS(y, X, weights=(1/X)).fit().params[0],8))
print('Does this work for regression average?')
print(round(cl.Development(average='regression').fit(clrd_first_lags).ldf_.to_frame().values[0, 0], 8) == \
      round(sm.OLS(y, X).fit().params[0],8))
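
To see why each weight reproduces each average, it helps to write out the closed-form slope of a no-intercept weighted regression, $\hat{b} = \sum w_i x_i y_i / \sum w_i x_i^2$. The following is a minimal sketch in plain Python using the 12 and 24 month values from the table above; the helper name `wls_slope` is hypothetical and not part of `chainladder` or `statsmodels`.

```python
# 12- and 24-month cumulative paid losses for origins 1988-1996 (from the table above)
x = [285804, 307720, 320124, 347417, 342982, 342385, 351060, 343841, 381484]
y = [638532, 684140, 757479, 793749, 781402, 743433, 750392, 768575, 736040]


def wls_slope(x, y, weights):
    """Slope of a no-intercept weighted least-squares fit of y on x (hypothetical helper)."""
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x))
    return num / den


# Weight 1/x^2 collapses to the mean of the individual link ratios y/x
simple = wls_slope(x, y, [1 / xi**2 for xi in x])
# Weight 1/x collapses to sum(y) / sum(x), the volume-weighted average
volume = wls_slope(x, y, [1 / xi for xi in x])
# Unit weights give the ordinary least-squares slope through the origin
regression = wls_slope(x, y, [1.0] * len(x))

print(round(simple, 6), round(volume, 6), round(regression, 6))
```

With weight $1/x^2$ the $x$ terms cancel and the slope is the plain mean of $y/x$; with weight $1/x$ one power of $x$ cancels and the slope is $\sum y / \sum x$.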

This regression framework is what the `Development` estimator uses to set development patterns.  Although we discard this information in deterministic approaches, `Development` exposes two useful statistics for estimating reserve variability, both of which come from the regression framework.  The statistics are `std_err_` and `sigma_`, and they are used by the `MackChainladder` estimator to determine the prediction error of our reserves.

In [None]:
dev = cl.Development(average='simple').fit(clrd['CumPaidLoss'])

In [None]:
dev.std_err_

In [None]:
dev.sigma_

Since the regression framework is weighted, we can easily turn on/off any observation we want using the dropping capabilities of the `Development` estimator.  Dropping link ratios not only affects the `ldf_` and `cdf_`, but also the `std_err_` and `sigma_` of the regression.

Here we eliminate the 1988 valuation from our triangle, which is identical to eliminating the first observation from our 12-24 regression fit.

In [None]:
print('Does this work for dropping observations?')
print(round(cl.Development(average='volume', drop_valuation='1988') \
              .fit(clrd['CumPaidLoss']).std_err_.to_frame().values[0, 0], 8) == \
      round(sm.WLS(y[1:], X[1:], weights=(1/X[1:])).fit().bse[0],8))
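
The standard error of the slope can also be reproduced by hand from the usual weighted least-squares formulas, which makes the effect of dropping an observation easy to inspect. The sketch below is plain Python with a hypothetical helper `wls_fit`; it treats the residual standard deviation of the weighted fit as the counterpart of `sigma_`, which is an assumption about how `chainladder` defines that statistic rather than something confirmed here.

```python
import math

# 12- and 24-month cumulative paid losses for origins 1988-1996 (from the table above)
x = [285804, 307720, 320124, 347417, 342982, 342385, 351060, 343841, 381484]
y = [638532, 684140, 757479, 793749, 781402, 743433, 750392, 768575, 736040]


def wls_fit(x, y, weights):
    """Return (slope, sigma, std_err) of a no-intercept weighted regression (hypothetical helper)."""
    sxx = sum(w * xi * xi for w, xi in zip(weights, x))
    slope = sum(w * xi * yi for w, xi, yi in zip(weights, x, y)) / sxx
    # Weighted residual sum of squares, with n - 1 degrees of freedom (one fitted parameter)
    sse = sum(w * (yi - slope * xi) ** 2 for w, xi, yi in zip(weights, x, y))
    sigma = math.sqrt(sse / (len(x) - 1))
    return slope, sigma, sigma / math.sqrt(sxx)


# Simple average: the standard error should agree with the 0.041 in the regression summary
simple_fit = wls_fit(x, y, [1 / xi**2 for xi in x])

# Volume-weighted average with and without the first (1988) observation
vol_all = wls_fit(x, y, [1 / xi for xi in x])
vol_dropped = wls_fit(x[1:], y[1:], [1 / xi for xi in x[1:]])

print("simple (ldf, sigma, std_err):", simple_fit)
print("volume, all origins:         ", vol_all)
print("volume, 1988 dropped:        ", vol_dropped)
```

Dropping a point changes the slope, the residual spread, and the degrees of freedom at once, which is why all three statistics move together.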

With `sigma_` and `std_err_` in hand, Mack goes on to develop recursive formulas to estimate `parameter_risk_` and `process_risk_`.

In [None]:
mack.parameter_risk_

### Assumption of Independence
The Mack model makes a lot of assumptions about independence (i.e. the covariance between random processes is 0).  This means many of the variance estimates in the `MackChainladder` model follow the form of $\mathrm{Var}(A+B) = \mathrm{Var}(A)+\mathrm{Var}(B)$.

Notice the square of `mack_std_err_` is simply the sum of the squares of `parameter_risk_` and `process_risk_`.

In [None]:
print('Parameter risk and process risk are independent?')
print(round(mack.mack_std_err_**2, 4) == round(mack.parameter_risk_**2 + mack.process_risk_**2, 4))
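
The additivity of variances under independence can be checked with a quick simulation. This is a minimal sketch in plain Python; the two standard deviations are hypothetical stand-ins for parameter and process risk, not values taken from the `mack` model.

```python
import random
import statistics

random.seed(0)
n = 100_000

# Hypothetical standard deviations standing in for parameter risk and process risk
parameter_sd, process_sd = 30.0, 40.0

# Two independent sources of error, drawn separately
parameter = [random.gauss(0.0, parameter_sd) for _ in range(n)]
process = [random.gauss(0.0, process_sd) for _ in range(n)]
total = [a + b for a, b in zip(parameter, process)]

# Under independence, variances add, so standard deviations combine as a root sum of squares
expected_sd = (parameter_sd**2 + process_sd**2) ** 0.5
combined_sd = statistics.pstdev(total)

print("expected:", expected_sd, "simulated:", round(combined_sd, 2))
```

If the two sources were positively correlated, the simulated standard deviation would exceed the root sum of squares, which is exactly the case the Mack formulas do not capture.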

This independence assumption also applies across origin periods: the total variance is the sum of the variances of the individual origin periods.

In [None]:
print('Total process risk across origin periods is independent?')
print(round(mack.total_process_risk_**2, 4) == round((mack.process_risk_**2).sum('origin'), 4))

Independence is also assumed to apply to the overall standard error of reserves, `total_mack_std_err_`.

In [None]:
(mack.total_process_risk_**2 + mack.total_parameter_risk_**2).to_frame().values[0, -1] == \
(mack.total_mack_std_err_**2).values[0,0]

This over-reliance on independence is one of the weaknesses of the `MackChainladder` method. Nevertheless, if the data align with this assumption, then `total_mack_std_err_` is a reasonable estimator of reserve variability.

### Mack Reserve Variability
The `mack_std_err_` at ultimate is the reserve variability for each `origin` period.

In [None]:
mack.mack_std_err_[mack.mack_std_err_.development==mack.mack_std_err_.development.max()]

These are probably easier to see in the `summary_` of the `MackChainladder` model.

In [None]:
mack.summary_

In [None]:
plot_data = mack.summary_.to_frame()
g = plot_data[['Latest', 'IBNR']] \
    .plot(kind='bar', stacked=True,
          yerr=pd.DataFrame({'latest': plot_data['Mack Std Err']*0,
                             'IBNR': plot_data['Mack Std Err']}),
          ylim=(0, None), title='Mack Chainladder Ultimate')
g.set_xlabel('Accident Year')
g.set_ylabel('Loss');

In [None]:
dist = pd.Series(np.random.normal(mack.ibnr_.sum(),
                           mack.total_mack_std_err_.values[0, 0], size=10000))
dist.plot(
    kind='hist', bins=50,
    title="Normally distributed IBNR estimate with a mean of " + '{:,}'.format(round(mack.ibnr_.sum(),0))[:-2]);
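
Because the histogram above is sampled from a normal distribution, its percentiles can also be read off analytically instead of by simulation. The following is a minimal sketch using only the Python standard library; the mean and standard error are hypothetical round numbers, not the fitted values of `mack`.

```python
from statistics import NormalDist

# Hypothetical total IBNR and total Mack standard error (illustrative values only)
ibnr_mean = 1_000_000.0
total_std_err = 100_000.0

reserve_dist = NormalDist(mu=ibnr_mean, sigma=total_std_err)

# Selected percentiles of the reserve distribution under the normal assumption
percentiles = {p: reserve_dist.inv_cdf(p) for p in (0.50, 0.75, 0.95, 0.99)}
for p, value in percentiles.items():
    print("{:.0%}: {:,.0f}".format(p, value))

# A normal distribution assigns some probability to negative reserves
prob_negative = reserve_dist.cdf(0.0)
print("P(IBNR < 0):", prob_negative)
```

The normal assumption is symmetric and allows negative reserves, so a skewed distribution may be preferable when the standard error is large relative to the mean.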

### ODP Bootstrap Model

The `MackChainladder` focused on a regression framework for determining the variability of reserve estimates.  An alternative approach is to use statistical bootstrapping or sampling from a triangle with replacement to simulate new triangles.

Bootstrapping imposes fewer model constraints than the `MackChainladder`, which allows for greater applicability in different scenarios.  Sampling new triangles can be accomplished through the `BootstrapODPSample` estimator.  This estimator will take a single triangle and simulate new ones from it.

Notice how easy it is to simulate 10,000 new triangles from an existing triangle by accessing the `resampled_triangles_` attribute.

In [None]:
samples = cl.BootstrapODPSample(n_sims=10000).fit(clrd['CumPaidLoss']).resampled_triangles_
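
For intuition about sampling with replacement, here is a deliberately simplified sketch in plain Python. It resamples the nine 12-24 link ratios from the table earlier in this tutorial and recomputes the simple-average LDF each time. This is a plain nonparametric bootstrap of link ratios, not the ODP residual resampling that `BootstrapODPSample` performs, so it is only a stand-in for the idea.

```python
import random
import statistics

random.seed(42)

# 12- and 24-month cumulative paid losses for origins 1988-1996 (from the earlier table)
x = [285804, 307720, 320124, 347417, 342982, 342385, 351060, 343841, 381484]
y = [638532, 684140, 757479, 793749, 781402, 743433, 750392, 768575, 736040]
ratios = [yi / xi for xi, yi in zip(x, y)]

n_sims = 5000
boot_ldfs = []
for _ in range(n_sims):
    # Draw nine link ratios with replacement and average them
    resample = random.choices(ratios, k=len(ratios))
    boot_ldfs.append(statistics.fmean(resample))

boot_mean = statistics.fmean(boot_ldfs)
boot_std = statistics.stdev(boot_ldfs)

print("mean of simulated LDFs:", round(boot_mean, 4))
print("std of simulated LDFs: ", round(boot_std, 4))
```

The mean of the simulated LDFs should sit close to the simple-average LDF of the original data, and their spread is of the same order as the regression standard error reported in the summary table.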

Alternatively, we could use `BootstrapODPSample` to transform our triangle into a resampled set.

In [None]:
samples = cl.BootstrapODPSample(n_sims=10000).fit_transform(clrd['CumPaidLoss'])

The notion of the ODP Bootstrap is that as the number of simulations approaches infinity, we should expect our mean simulation to converge on the basic `Chainladder` estimate of reserves.

Let's apply the basic chainladder to our original triangle and also to our simulated triangles to see whether this holds true.

In [None]:
# Round to 4 decimals; rounding to 0 decimals would always report 0
difference = round(1 - cl.Chainladder().fit(samples).ibnr_.sum('origin').mean() / \
                       cl.Chainladder().fit(clrd['CumPaidLoss']).ibnr_.sum(), 4)
print("Relative difference in estimate using original triangle and BootstrapODPSample is " +str(difference))

### Using deterministic methods with Bootstrap samples
Our `samples` is just another triangle object with all the functionality of a regular triangle.  This means we can apply any functionality we want to our `samples` including any deterministic methods we learned about previously.

In [None]:
samples

In [None]:
pipe = cl.Pipeline([
    ('dev', cl.Development(average='simple')),
    ('tail', cl.TailConstant(1.05))])
pipe.fit(samples)

Now instead of a single `cdf_` vector, we have 10,000.

In [None]:
pipe.named_steps.dev.cdf_

This allows us to look at the variability of any fitted property used in our prior tutorials.

In [None]:
orig_dev = cl.Development(average='simple').fit(clrd['CumPaidLoss'])
resampled_ldf = pipe.named_steps.dev.ldf_
print("12-24 LDF of original Triangle: " + str(round(orig_dev.ldf_.values[0,0,0,0],4)))
pd.Series(resampled_ldf.values[:, 0, 0, 0]).plot(
    kind='hist', bins=100,
    title='Age 12-24 LDF distribution using Bootstrap');

### Comparison between Bootstrap and Mack
We should even be able to approximate some of the Mack parameters calculated using the regression framework.

In [None]:
# Bootstrap std dev comes first, Mack regression std err second, so label them in that order
mack_vs_bs = pd.concat(
    [resampled_ldf.std('index').to_frame(), orig_dev.std_err_.to_frame()]).T
mack_vs_bs.columns = ['Bootstrap', 'Mack']
mack_vs_bs.plot(kind='bar', title='Mack Regression Framework LDF Std Err\nvs\nBootstrap Simulated LDF Std Err');

While the `MackChainladder` produces statistics about the mean and variance of reserve estimates, those have to be fit to a distribution using maximum likelihood (MLE), method of moments (MoM), etc. to see the range of outcomes of reserves.  With `BootstrapODPSample` based fits, we can use the empirical distribution directly if we choose to.

In [None]:
ibnr = cl.Chainladder().fit(samples).ibnr_.sum('origin')
ibnr_99 = ibnr.quantile(q=0.99)
print("99%-ile of reserve estimate is " +'{:0,}'.format(round(ibnr_99,0)))
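
By contrast, turning a Mack mean and standard error into a percentile requires choosing a distribution first. As one possible illustration, the sketch below fits a lognormal by method of moments in plain Python. The lognormal choice and the input values are assumptions made for the example, not outputs of the `mack` model.

```python
import math
from statistics import NormalDist

# Hypothetical reserve mean and standard error (illustrative values only)
mean, std_err = 1_000_000.0, 100_000.0

# Method of moments for a lognormal: match the mean and the variance
sigma2 = math.log(1.0 + (std_err / mean) ** 2)
mu = math.log(mean) - sigma2 / 2.0
sigma = math.sqrt(sigma2)

# 99th percentile of the fitted lognormal
q99 = math.exp(mu + NormalDist().inv_cdf(0.99) * sigma)

# Sanity check: the fitted parameters reproduce the target moments
fitted_mean = math.exp(mu + sigma2 / 2.0)
fitted_sd = fitted_mean * math.sqrt(math.exp(sigma2) - 1.0)

print("99th percentile: {:,.0f}".format(q99))
print("fitted mean: {:,.0f}, fitted sd: {:,.0f}".format(fitted_mean, fitted_sd))
```

A different distribution fitted to the same two moments would give a different tail, which is the extra modelling choice the empirical bootstrap distribution avoids.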

Let's see how the `MackChainladder` reserve distribution compares to the `BootstrapODPSample` reserve distribution.

In [None]:
ax = ibnr.plot(kind='hist', bins=50, alpha=0.7, color='green')
dist.plot(kind='hist', bins=50, alpha=0.4, color='blue', title='Mack vs Bootstrap Variability');

### Expected loss methods with Bootstrap

So far, we've only applied the multiplicative methods (i.e. basic chainladder) in a stochastic context.  It is possible to use an expected loss method like the `BornhuetterFerguson`. 

To do this, we will need an exposure vector.

In [None]:
clrd['EarnedPremNet'].latest_diagonal

Passing an `apriori_sigma` to the `BornhuetterFerguson` estimator tells it to treat the a priori selection itself as a random variable.  Fitting a stochastic `BornhuetterFerguson` looks very much like the deterministic version.
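
To make the idea of a random a priori concrete, here is a minimal sketch of the Bornhuetter-Ferguson formula for a single origin period in plain Python. All inputs are hypothetical, and drawing the a priori loss ratio from a normal distribution is an assumption made for illustration, not a statement about how `chainladder` samples it.

```python
import random
import statistics

random.seed(1)

# Hypothetical inputs for a single origin period (illustrative values only)
latest_paid = 400_000.0       # cumulative paid loss to date
earned_premium = 1_000_000.0  # exposure base
cdf_to_ultimate = 2.5         # cumulative development factor
apriori, apriori_sigma = 0.65, 0.10  # expected loss ratio and its standard deviation

# Share of ultimate loss that has not yet emerged
pct_unreported = 1.0 - 1.0 / cdf_to_ultimate

# Deterministic Bornhuetter-Ferguson ultimate
bf_ultimate = latest_paid + pct_unreported * earned_premium * apriori

# Stochastic version: the a priori loss ratio is drawn at random (normal draw is an assumption)
sim_ultimates = [
    latest_paid + pct_unreported * earned_premium * random.gauss(apriori, apriori_sigma)
    for _ in range(10_000)
]
sim_mean = statistics.fmean(sim_ultimates)
sim_sd = statistics.stdev(sim_ultimates)

print("deterministic ultimate: {:,.0f}".format(bf_ultimate))
print("simulated mean: {:,.0f}, simulated sd: {:,.0f}".format(sim_mean, sim_sd))
```

Only the unreported share of the ultimate is exposed to the a priori uncertainty, so mature origin periods pick up very little additional variability from `apriori_sigma`.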

In [None]:
import chainladder as cl
import numpy as np
clrd = cl.load_sample('clrd')
np.prod(clrd, axis=3)

In [None]:
import numpy as np

np.tan(cl.load_sample('raa')).T.plot()

We can use our knowledge of `Triangle` manipulation to grab most things we would want out of our model.

In [None]:
# `bf` is assumed to be a BornhuetterFerguson estimator already fitted on `samples`
# Grab completed triangle replacing simulated known data with actual known data
full_triangle = bf.full_triangle_ - bf.X_ + clrd['CumPaidLoss']
# Limiting to the current year for plotting
current_year = full_triangle[full_triangle.origin==full_triangle.origin.max()].to_frame().T

As expected, plotting the expected development of our full triangle over time from the Bootstrap `BornhuetterFerguson` model fans out to greater uncertainty the farther we get from our valuation date.

In [None]:
# Plot the data
current_year.iloc[:, :200].reset_index(drop=True).plot(
    color='green', legend=False, alpha=0.1,
    title='Current Accident Year Expected Development Distribution', grid=True);

### Recap
- The Mack method approaches stochastic reserving from a regression point of view
- Bootstrap methods approach stochastic reserving from a simulation point of view
- Where the assumptions of each model are not violated, they produce reasonably consistent estimates of reserve variability
- Mack imposes more assumptions (i.e. constraints) on the reserve estimate, making the Bootstrap approach suitable for a broader set of applications
- Both methods converge to their corresponding deterministic point estimates