# Applying Deterministic Methods
## Getting Started
This tutorial focuses on using deterministic methods to square a triangle. 

Make sure your packages are up to date. For more information on how to update your packages, visit [Keeping Packages Updated](https://chainladder-python.readthedocs.io/en/latest/install.html#keeping-packages-updated).

In [None]:
# Black linter, optional
%load_ext lab_black

import pandas as pd
import numpy as np
import chainladder as cl
import matplotlib.pyplot as plt

print("pandas: " + pd.__version__)
print("numpy: " + np.__version__)
print("chainladder: " + cl.__version__)

## Disclaimer
Many of the examples shown may not be applicable in a real-world scenario; they are only meant to demonstrate some of the functionality included in the package. The user should always follow all applicable laws, the Code of Professional Conduct, and applicable Actuarial Standards of Practice, and should exercise their best actuarial judgement.

## The Chainladder Method

The basic chainladder method is entirely specified by its development pattern selections. For this reason, the `Chainladder` estimator takes no additional assumptions, i.e. no additional arguments. Let's start by loading an example dataset and fitting `Development` patterns and a `TailCurve` to it. Recall that we can bundle these two estimators into a single `Pipeline` if we wish.

In [None]:
genins = cl.load_sample("genins")

genins_dev = cl.Pipeline(
    [("dev", cl.Development()), ("tail", cl.TailCurve())]
).fit_transform(genins)

We can now use the basic `Chainladder` estimator to estimate `ultimate_` values of our `Triangle`.

In [None]:
genins_model = cl.Chainladder().fit(genins_dev)
genins_model.ultimate_
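
To make the arithmetic behind `ultimate_` concrete, here is a minimal pure-Python sketch on a small hypothetical 3x3 triangle. The numbers and helper names are illustrative only and are not part of `chainladder`. It assumes volume-weighted age-to-age factors (the `Development()` default) and no tail (a tail factor of 1.0), whereas the notebook also fits a `TailCurve`.

```python
# Hypothetical cumulative triangle: rows are origin periods, columns are ages.
# None marks future (unobserved) cells.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [120.0, None, None],
]


def volume_weighted_ldfs(tri):
    """Age-to-age factors: sum of next-age values / sum of current-age values,
    using only origins observed at both ages."""
    ldfs = []
    for age in range(len(tri[0]) - 1):
        pairs = [(row[age], row[age + 1]) for row in tri if row[age + 1] is not None]
        ldfs.append(sum(nxt for _, nxt in pairs) / sum(cur for cur, _ in pairs))
    return ldfs


def cdfs_from_ldfs(ldfs, tail=1.0):
    """Cumulative development factors to ultimate, one per age."""
    cdfs = [tail]
    for factor in reversed(ldfs):
        cdfs.insert(0, cdfs[0] * factor)
    return cdfs


ldfs = volume_weighted_ldfs(triangle)
cdfs = cdfs_from_ldfs(ldfs)

# Chainladder ultimate: latest observed value times the CDF at that age.
ultimates = []
for row in triangle:
    age = max(i for i, value in enumerate(row) if value is not None)
    ultimates.append(row[age] * cdfs[age])

print([round(u, 2) for u in ultimates])  # [165.0, 184.8, 199.89]
```

The oldest origin is fully developed in this toy example, so its ultimate is simply its latest value.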

We can also view the `ibnr_`. Technically, the term IBNR is reserved for Incurred But Not Reported, but the `chainladder` models use it to describe the difference between the ultimate and the latest evaluation (the latest diagonal).

In [None]:
genins_model.ibnr_
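
The `ibnr_` arithmetic is the ultimate minus the latest diagonal. For the chainladder method, that is the same as latest × (CDF - 1). A small sketch with hypothetical values (not taken from `genins`):

```python
# Hypothetical latest diagonal and the CDF at each origin's latest age.
latest_diagonal = [165.0, 168.0, 120.0]
cdf_at_latest_age = [1.0, 1.1, 1.6657142857142857]

ultimates = [x * c for x, c in zip(latest_diagonal, cdf_at_latest_age)]

# IBNR in the chainladder sense: ultimate minus latest diagonal.
ibnr = [u - x for u, x in zip(ultimates, latest_diagonal)]

# Equivalent form for the chainladder method: latest * (CDF - 1).
ibnr_alt = [x * (c - 1.0) for x, c in zip(latest_diagonal, cdf_at_latest_age)]
```

A fully developed origin (CDF of 1.0) carries no IBNR.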

It is often useful to see the completed `Triangle` and this can be accomplished by inspecting the `full_triangle_`.  As with most other estimator properties, the `full_triangle_` is itself a `Triangle` and can be manipulated as such.

In [None]:
genins

In [None]:
genins_model.full_triangle_

In [None]:
genins_model.full_triangle_.dev_to_val()
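
Squaring a triangle amounts to filling each future cell by developing the prior age's value with the corresponding age-to-age factor. The sketch below shows that idea on a hypothetical 3x3 triangle. The library's `full_triangle_` may also carry extra columns for the tail and ultimate, which this sketch omits.

```python
ldfs = [318 / 210, 1.1]  # hypothetical age-to-age factors (age 1->2, age 2->3)
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [120.0, None, None],
]


def square(tri, factors):
    """Return a copy of tri with every None replaced by the prior cell times its LDF."""
    full = [list(row) for row in tri]
    for row in full:
        for age in range(1, len(row)):
            if row[age] is None:
                row[age] = row[age - 1] * factors[age - 1]
    return full


full = square(triangle, ldfs)
```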

Notice the calendar year of our ultimates. While ultimates will generally be realized well before this date, the `chainladder` package assigns its `ultimate_` valuation the highest date it allows (in 2261, close to the largest date a pandas timestamp can represent).

In [None]:
genins_model.full_triangle_.valuation_date

We can manipulate the full triangle further, for example by converting it to incremental values with `cum_to_incr()`.

In [None]:
genins_model.full_triangle_.dev_to_val().cum_to_incr()
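
`cum_to_incr()` converts cumulative values into period-by-period increments. Along a single origin row, the idea reduces to successive differences. Below is a pure-Python sketch with hypothetical helper names, together with the inverse operation (which the `Triangle` class provides as `incr_to_cum()`):

```python
from itertools import accumulate


def cum_to_incr(row):
    """Incremental values: first cell as-is, then differences between adjacent ages."""
    return [row[0]] + [later - earlier for earlier, later in zip(row, row[1:])]


def incr_to_cum(row):
    """Inverse operation: running sum of the increments."""
    return list(accumulate(row))


cumulative = [100.0, 150.0, 165.0]
incremental = cum_to_incr(cumulative)
print(incremental)  # [100.0, 50.0, 15.0]
```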

Another useful property is `full_expectation_`. Similar to `full_triangle_`, it "squares" the `Triangle`, but it also replaces the known data with the expected values implied by the model and the development pattern.

In [None]:
genins_model.full_expectation_
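
For the chainladder method, the expected value at each age can be backed out of the ultimate by dividing by the CDF at that age. The sketch below uses hypothetical CDFs and ultimates; it illustrates the idea behind `full_expectation_` rather than reproducing the library's code.

```python
# Hypothetical CDFs to ultimate at ages 1, 2 and 3, and chainladder ultimates by origin.
cdfs = [318 / 210 * 1.1, 1.1, 1.0]
ultimates = [165.0, 168.0 * 1.1, 120.0 * cdfs[0]]

# Expected cumulative value of each origin at each age: ultimate / CDF(age).
expectation = [[ult / cdf for cdf in cdfs] for ult in ultimates]
```

On each origin's latest diagonal the expectation equals the actual value (165, 168 and 120 here), while earlier cells can differ from the data (for example, about 99.06 versus an actual 100 for the oldest origin at the first age).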

With some clever arithmetic, we can use these objects to give us other useful information.  For example, we can retrospectively review the actual `Triangle` against its modeled expectation.

In [None]:
genins_model.full_triangle_ - genins_model.full_expectation_

We can also filter out the lower-right (future) part of the triangle, keeping only cells valued at or before the original triangle's valuation date, with `[genins_model.full_triangle_.valuation <= genins.valuation_date]`.

In [None]:
(
    genins_model.full_triangle_[
        genins_model.full_triangle_.valuation <= genins.valuation_date
    ]
    - genins_model.full_expectation_[
        genins_model.full_triangle_.valuation <= genins.valuation_date
    ]
)
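
Comparing actual to expected only makes sense where actual data exist, that is, where a cell's valuation is at or before the triangle's valuation date. The sketch below assumes annual origin periods and annual ages, so a cell's valuation year is its origin year plus its age index. The years and values are hypothetical.

```python
actual = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [120.0, None, None],
]
cdfs = [318 / 210 * 1.1, 1.1, 1.0]
ultimates = [165.0, 168.0 * 1.1, 120.0 * cdfs[0]]
origin_years = [2008, 2009, 2010]
valuation_year = 2010  # latest calendar year in the data

actual_vs_expected = []
for origin, row, ult in zip(origin_years, actual, ultimates):
    diffs = []
    for age_index, value in enumerate(row):
        # Annual ages: the first age is valued in the origin year itself.
        if origin + age_index <= valuation_year:
            diffs.append(value - ult / cdfs[age_index])
        else:
            diffs.append(None)  # future cell, dropped just like the valuation mask
    actual_vs_expected.append(diffs)
```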

Getting comfortable with manipulating `Triangle`s will greatly improve our ability to extract value out of the `chainladder` package. Here is another way of getting the same answer.

In [None]:
genins_AvE = genins - genins_model.full_expectation_
genins_AvE[genins_AvE.valuation <= genins.valuation_date]

As before, we can keep only the observed part of the triangle by filtering on `genins_AvE.valuation <= genins.valuation_date`, and then apply `heatmap()`.

In [None]:
genins_AvE[genins_AvE.valuation <= genins.valuation_date].heatmap()

Can you figure out how to get the expected IBNR runoff in the upcoming year?

In [None]:
cal_yr_ibnr = genins_model.full_triangle_.dev_to_val().cum_to_incr()
cal_yr_ibnr[cal_yr_ibnr.valuation.year == 2011]
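
The cell above converts the squared triangle to increments and keeps the cells valued in the next calendar year. Below is the same logic in pure Python under a hypothetical setup: annual periods, with a latest valuation of 2010, so the next calendar year is 2011.

```python
ldfs = [318 / 210, 1.1]  # hypothetical age-to-age factors
full = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, 168.0 * ldfs[1]],
    [120.0, 120.0 * ldfs[0], 120.0 * ldfs[0] * ldfs[1]],
]
origin_years = [2008, 2009, 2010]
next_year = 2011

runoff_by_origin = {}
for origin, row in zip(origin_years, full):
    increments = [row[0]] + [b - a for a, b in zip(row, row[1:])]
    for age_index, value in enumerate(increments):
        # Keep the increment whose valuation year is the next calendar year.
        if origin + age_index == next_year:
            runoff_by_origin[origin] = value

expected_runoff = sum(runoff_by_origin.values())
```

The oldest origin contributes nothing because it is already fully developed.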

## The Bornhuetter-Ferguson Method
The `BornhuetterFerguson` estimator is another deterministic method that shares many of the attributes of the `Chainladder` estimator. It takes one input assumption, the a priori (`apriori`). This is a scalar multiplier applied to an exposure vector to produce a vector of a priori ultimate estimates, which the model then uses.

Since the CAS Loss Reserve Database includes premium, we will use it as an example. Let's grab the paid loss and net earned premium for the commercial auto line of business.

Remember that `apriori` is a scalar, which needs to be applied to a vector of exposures. Let's set an a priori loss ratio of 0.75, i.e. 75%.

The `BornhuetterFerguson` method, along with the other expected loss methods such as `CapeCod` and `Benktander` (discussed later), needs an exposure vector. The exposure vector must itself be a `Triangle`; remember that a `Triangle` can hold a single vector of exposures, such as the `latest_diagonal` of earned premium.

In [None]:
comauto = cl.load_sample("clrd").groupby("LOB").sum().loc["comauto"]

bf_model = cl.BornhuetterFerguson(apriori=0.75)
bf_model.fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

In [None]:
bf_model.ultimate_
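
The BF ultimate adds the a priori expectation of the unreported portion to the reported losses: ultimate = latest + apriori × exposure × (1 - 1/CDF). Here is a minimal single-origin sketch with hypothetical numbers; it illustrates the formula and is not the library's implementation.

```python
def bornhuetter_ferguson(latest, exposure, cdf, apriori):
    """BF ultimate for one origin period."""
    percent_unreported = 1.0 - 1.0 / cdf
    expected_unreported = apriori * exposure * percent_unreported
    return latest + expected_unreported


# Hypothetical origin: 500 paid, 1,000 earned premium, CDF of 2.0, 75% a priori loss ratio.
print(bornhuetter_ferguson(500.0, 1000.0, 2.0, 0.75))  # 875.0
```

For comparison, the chainladder ultimate for the same origin would be 500 × 2.0 = 1,000. BF leans on the a priori for the unreported half instead of leveraging the reported losses.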

Having an `apriori` that is a single constant for all origins can be limiting. This shouldn't stop the practitioner, because the `apriori` can be embedded directly in the exposure vector itself, allowing full customization of the `apriori` by origin. For example, the two models below produce identical ultimates.

In [None]:
b1 = cl.BornhuetterFerguson(apriori=0.75).fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

b2 = cl.BornhuetterFerguson(apriori=1.00).fit(
    comauto["CumPaidLoss"],
    sample_weight=0.75 * comauto["EarnedPremNet"].latest_diagonal,
)

b1.ultimate_ == b2.ultimate_
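
The a priori and exposure enter the BF formula only through their product. As a result, scaling one and dividing the other leaves the ultimate unchanged, which is why the comparison above holds. The same fact lets origin-specific loss ratios ride along inside the exposure. Below is a pure-Python sketch; the loss ratios and other values are hypothetical.

```python
def bornhuetter_ferguson(latest, exposure, cdf, apriori):
    """BF ultimate for one origin period: latest + apriori * exposure * (1 - 1/CDF)."""
    return latest + apriori * exposure * (1.0 - 1.0 / cdf)


latest = [500.0, 300.0, 100.0]
premium = [1000.0, 1000.0, 1000.0]
cdfs = [1.25, 2.0, 5.0]

# A constant 75% a priori versus a 100% a priori applied to 75% of premium.
b1 = [bornhuetter_ferguson(x, p, c, 0.75) for x, p, c in zip(latest, premium, cdfs)]
b2 = [bornhuetter_ferguson(x, 0.75 * p, c, 1.0) for x, p, c in zip(latest, premium, cdfs)]

# Hypothetical origin-specific loss ratios embedded in an adjusted exposure vector.
loss_ratios = [0.70, 0.75, 0.80]
adjusted_premium = [lr * p for lr, p in zip(loss_ratios, premium)]
b3 = [bornhuetter_ferguson(x, p, c, 1.0) for x, p, c in zip(latest, adjusted_premium, cdfs)]
```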

If we need to create a new column with varying implied loss ratios, such as an `AdjEarnedPrmNet` column, it is recommended that we perform the data modification in `pandas` rather than on the `Triangle` itself.

Let's perform the estimate using `Chainladder` and compare the results.

In [None]:
cl_model = cl.Chainladder().fit(comauto["CumPaidLoss"])

plt.plot(
    bf_model.ultimate_.to_frame().index.year, bf_model.ultimate_.to_frame(), label="BF",
)
plt.plot(
    cl_model.ultimate_.to_frame().index.year, cl_model.ultimate_.to_frame(), label="CL",
)
plt.legend(loc="upper left")

## The Benktander Method

The `Benktander` method is similar to the `BornhuetterFerguson` method, but allows for the specification of one additional assumption, `n_iters`, the number of iterations to recalculate the ultimates. The Benktander method generalizes both the `BornhuetterFerguson` and the `Chainladder` estimator through this assumption.

- When `n_iters = 1`, the result is equivalent to the `BornhuetterFerguson` estimator.
- When `n_iters` is sufficiently large, the result converges to the `Chainladder` estimator.
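
One common way to write the Benktander recursion is to start from the a priori ultimate and repeatedly apply the BF step: U_k = latest + (1 - 1/CDF) × U_{k-1}. One iteration reproduces BF. Because (1 - 1/CDF) < 1, the iterates converge to latest × CDF, which is the chainladder ultimate. Below is a hypothetical single-origin sketch; the library may organize the calculation differently internally.

```python
def benktander(latest, exposure, cdf, apriori, n_iters):
    """Benktander ultimate for one origin after n_iters credibility iterations."""
    percent_unreported = 1.0 - 1.0 / cdf
    ultimate = apriori * exposure  # starting point: a priori expected ultimate
    for _ in range(n_iters):
        ultimate = latest + percent_unreported * ultimate
    return ultimate


latest, exposure, cdf, apriori = 500.0, 1000.0, 2.0, 0.75
bf_ultimate = benktander(latest, exposure, cdf, apriori, n_iters=1)  # 875.0, the BF result
bk_ultimate = benktander(latest, exposure, cdf, apriori, n_iters=2)  # 937.5
near_cl = benktander(latest, exposure, cdf, apriori, n_iters=100)
cl_ultimate = latest * cdf  # 1000.0
```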

Fitting the `Benktander` method looks identical to fitting the other methods.

In [None]:
bk_model = cl.Benktander(apriori=0.75, n_iters=2)
bk_model.fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

In [None]:
plt.plot(
    bf_model.ultimate_.to_frame().index.year, bf_model.ultimate_.to_frame(), label="BF"
)
plt.plot(
    cl_model.ultimate_.to_frame().index.year, cl_model.ultimate_.to_frame(), label="CL"
)
plt.plot(
    bk_model.ultimate_.to_frame().index.year, bk_model.ultimate_.to_frame(), label="BK"
)
plt.legend(loc="upper left")

## The Cape Cod Method
The `CapeCod` method is similar to the `BornhuetterFerguson` method, except that its `apriori` is computed from the `Triangle` itself. Instead of an `apriori`, it takes `decay` and `trend` assumptions.

 - `decay` controls how much weight the other origin periods receive when estimating each origin period's a priori. This parameter comes from the Generalized Cape Cod method, as discussed in [Using Best Practices to Determine a Best Reserve Estimate](https://www.casact.org/sites/default/files/database/forum_98fforum_struhuss.pdf) by Struzzieri and Hussian. As `decay` approaches 1 (the default value), the result approaches the traditional Cape Cod method. As `decay` approaches 0, the result approaches the `Chainladder` method.
 - `trend` is the trend rate along the origin axis to reflect systematic inflationary impacts on the a priori.
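
Below is a minimal sketch of the generalized Cape Cod a priori, ignoring `trend` for now. It assumes that origin j gets weight decay**|i - j| when estimating origin i's a priori, with both losses and used-up exposure (exposure / CDF) summed under those weights. The numbers are hypothetical, and the helper name is illustrative rather than part of `chainladder`.

```python
def cape_cod_aprioris(latest, exposure, cdfs, decay):
    """A priori loss ratio for each origin i:
    sum_j decay**|i-j| * latest_j / sum_j decay**|i-j| * (exposure_j / cdf_j)."""
    used_up = [e / c for e, c in zip(exposure, cdfs)]
    n = len(latest)
    aprioris = []
    for i in range(n):
        weights = [decay ** abs(i - j) for j in range(n)]  # 0.0 ** 0 == 1.0 in Python
        losses = sum(w * x for w, x in zip(weights, latest))
        used = sum(w * u for w, u in zip(weights, used_up))
        aprioris.append(losses / used)
    return aprioris


latest = [600.0, 400.0, 200.0]
exposure = [1000.0, 1000.0, 1000.0]
cdfs = [1.0, 1.25, 2.5]

traditional = cape_cod_aprioris(latest, exposure, cdfs, decay=1.0)  # one pooled ratio
standalone = cape_cod_aprioris(latest, exposure, cdfs, decay=0.0)  # each origin on its own
blended = cape_cod_aprioris(latest, exposure, cdfs, decay=0.5)
```

With `decay=0.0`, each origin's a priori is its own latest value divided by its used-up exposure. Plugging that into the BF formula returns latest × CDF, which is why a Cape Cod model with `decay=0` reproduces the chainladder ultimates.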

When we `fit` a `CapeCod` method, we can see the `apriori` it computes with the given `decay` and `trend` assumptions. Since it is an array of estimated parameters, this `CapeCod` attribute is called the `apriori_`, with a trailing underscore.

In [None]:
cc_model = cl.CapeCod()
cc_model.fit(
    comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

With `decay=1`, every `origin` period gets the same `apriori_` (this is the traditional Cape Cod method). The `apriori_` is the sum of the latest diagonal divided by the sum of the used-up exposure, where the used-up exposure is the exposure vector divided by the CDF. Let's validate this calculation of the a priori.

In [None]:
latest_diagonal = comauto["CumPaidLoss"].latest_diagonal

cdf_as_origin_vector = (
    cl.Chainladder().fit(comauto["CumPaidLoss"]).ultimate_
    / comauto["CumPaidLoss"].latest_diagonal
)

latest_diagonal.sum() / (
    comauto["EarnedPremNet"].latest_diagonal / cdf_as_origin_vector
).sum()

With `decay=0`, the `apriori_` for each `origin` period stands on its own.

In [None]:
cc_model = cl.CapeCod(decay=0, trend=0).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)
cc_model.apriori_

Repeating our manual calculation origin by origin, without summing across origins, yields the same result.

In [None]:
latest_diagonal / (comauto["EarnedPremNet"].latest_diagonal / cdf_as_origin_vector)

Let's verify that this Cape Cod model's ultimates match those of the `Chainladder` model.

In [None]:
cc_model.ultimate_ - cl_model.ultimate_

We can examine the `apriori_`s to see whether they exhibit any trend over time.

In [None]:
plt.plot(cc_model.apriori_.to_frame().index.year, cc_model.apriori_.to_frame())

There appears to be a small positive trend, so let's judgementally select a `trend` of 1%.

In [None]:
trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

plt.plot(
    cc_model.apriori_.to_frame().index.year,
    cc_model.apriori_.to_frame(),
    label="Untrended",
)
plt.plot(
    trended_cc_model.apriori_.to_frame().index.year,
    trended_cc_model.apriori_.to_frame(),
    label="Trended",
)
plt.legend(loc="lower right")

We can of course utilize both the `trend` and the `decay` parameters together. Adding `trend` to the `CapeCod` method is intended to adjust the `apriori_`s to a common level. Once at a common level, the `apriori_` can be estimated from multiple origin periods using the `decay` factor.

In [None]:
trended_cc_model = cl.CapeCod(decay=0, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

trended_decayed_cc_model = cl.CapeCod(decay=0.75, trend=0.01).fit(
    X=comauto["CumPaidLoss"], sample_weight=comauto["EarnedPremNet"].latest_diagonal
)

plt.plot(
    cc_model.apriori_.to_frame().index.year,
    cc_model.apriori_.to_frame(),
    label="Untrended",
)
plt.plot(
    trended_cc_model.apriori_.to_frame().index.year,
    trended_cc_model.apriori_.to_frame(),
    label="Trended",
)
plt.plot(
    trended_decayed_cc_model.apriori_.to_frame().index.year,
    trended_decayed_cc_model.apriori_.to_frame(),
    label="Trended and Decayed",
)
plt.legend(loc="lower right")

Once estimated, it is necessary to detrend our `apriori_`s back to their untrended levels and these are contained in `detrended_apriori_`. It is the `detrended_apriori_` that gets used in the calculation of `ultimate_` losses.

In [None]:
plt.plot(
    trended_cc_model.apriori_.to_frame().index.year,
    trended_cc_model.apriori_.to_frame(),
    label="Trended",
)
plt.plot(
    trended_cc_model.detrended_apriori_.to_frame().index.year,
    trended_cc_model.detrended_apriori_.to_frame(),
    label="Detrended to Original",
)
plt.legend(loc="lower right")
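
Below is a sketch of how trending and detrending can interact with the decay weights. It assumes each origin's losses are first brought to the level of the latest origin using (1 + trend) raised to the number of periods between them, and that the resulting a priori is then divided by the same factor. This exponent convention is an assumption for illustration, and the library's may differ. Under it, detrending origin i effectively trends every other origin j to origin i's level by (1 + trend)**(i - j).

```python
def trended_cape_cod(latest, exposure, cdfs, decay, trend):
    """Return (trended, detrended) a priori loss ratios for each origin."""
    n = len(latest)
    used_up = [e / c for e, c in zip(exposure, cdfs)]
    # Assumed convention: factor bringing origin j to the level of the latest origin.
    factors = [(1.0 + trend) ** (n - 1 - j) for j in range(n)]
    trended = []
    for i in range(n):
        weights = [decay ** abs(i - j) for j in range(n)]
        losses = sum(w * x * f for w, x, f in zip(weights, latest, factors))
        used = sum(w * u for w, u in zip(weights, used_up))
        trended.append(losses / used)
    # Detrend: divide each origin's a priori by its own on-level factor.
    detrended = [a / f for a, f in zip(trended, factors)]
    return trended, detrended


latest = [600.0, 400.0, 200.0]
exposure = [1000.0, 1000.0, 1000.0]
cdfs = [1.0, 1.25, 2.5]

trended, detrended = trended_cape_cod(latest, exposure, cdfs, decay=0.0, trend=0.01)
_, smoothed = trended_cape_cod(latest, exposure, cdfs, decay=0.75, trend=0.01)
```

With `decay=0`, the trend cancels after detrending, so `detrended` equals the untrended stand-alone ratios. This is consistent with the "Detrended to Original" label above. Only when `decay` is above 0 does the trend change the final a priori.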

The `detrended_apriori_` is a much smoother estimate of the initial expected `ultimate_`. With the `detrended_apriori_` in hand, the `CapeCod` estimator behaves exactly like the `BornhuetterFerguson` estimator.

In [None]:
bf_model = cl.BornhuetterFerguson().fit(
    X=comauto["CumPaidLoss"],
    sample_weight=trended_cc_model.detrended_apriori_
    * comauto["EarnedPremNet"].latest_diagonal,
)

bf_model.ultimate_.sum() - trended_cc_model.ultimate_.sum()

## Recap

All the deterministic estimators have `ultimate_`, `ibnr_`, `full_expectation_` and `full_triangle_` attributes that are themselves `Triangle`s. These can be manipulated in a variety of ways to gain additional insights from our model. The expected loss methods take an exposure vector, itself a `Triangle`, through the `sample_weight` argument of the `fit` method. The `CapeCod` method has the additional attributes `apriori_` and `detrended_apriori_` to accommodate the selection of its `trend` and `decay` assumptions.

Finally, these estimators work very well with the transformers discussed in previous tutorials. Let's demonstrate the compositional nature of these estimators.

In [None]:
wkcomp = (
    cl.load_sample("clrd")
    .groupby("LOB")
    .sum()
    .loc["wkcomp"][["CumPaidLoss", "EarnedPremNet"]]
)
wkcomp

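Let's calculate the age-to-age factors:
- Excluding the 1995 valuation period (with `drop_valuation`)
- Using volume-weighted averages for the first 5 factors and simple averages for the next 4 factors (for a total of 9 age-to-age factors)
- Using no more than the 7 most recent origin periods in each average (with `n_periods`)

We then extrapolate a tail with an inverse power curve (`TailCurve`).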

In [None]:
patterns = cl.Pipeline(
    [
        (
            "dev",
            cl.Development(
                average=["volume"] * 5 + ["simple"] * 4,
                n_periods=7,
                drop_valuation="1995",
            ),
        ),
        ("tail", cl.TailCurve(curve="inverse_power", extrap_periods=80)),
    ]
)
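
To see what the `average` and `n_periods` options change, here is a hypothetical pure-Python sketch. It computes a single age-to-age factor from (value at age k, value at age k+1) pairs, ordered from the oldest to the most recent origin. Volume weighting divides summed losses, while a simple average averages the individual link ratios. In this sketch, `n_periods` keeps only the most recent pairs. It does not model `drop_valuation`.

```python
def age_to_age(pairs, average="volume", n_periods=None):
    """pairs: (value_at_age, value_at_next_age) per origin, oldest origin first."""
    if n_periods is not None:
        pairs = pairs[-n_periods:]  # keep only the most recent origin periods
    if average == "volume":
        return sum(nxt for _, nxt in pairs) / sum(cur for cur, _ in pairs)
    if average == "simple":
        return sum(nxt / cur for cur, nxt in pairs) / len(pairs)
    raise ValueError(f"unsupported average: {average}")


# Hypothetical link-ratio data for one age (three origins).
pairs = [(100.0, 150.0), (110.0, 168.0), (200.0, 260.0)]

volume_all = age_to_age(pairs, "volume")
simple_all = age_to_age(pairs, "simple")
volume_recent = age_to_age(pairs, "volume", n_periods=2)

# The notebook's per-factor averaging list: 5 volume-weighted, then 4 simple.
averages = ["volume"] * 5 + ["simple"] * 4
```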

In [None]:
cc = cl.CapeCod(decay=0.8, trend=0.02).fit(
    X=patterns.fit_transform(wkcomp["CumPaidLoss"]),
    sample_weight=wkcomp["EarnedPremNet"].latest_diagonal,
)
cc.ultimate_

In [None]:
plt.bar(cc.ultimate_.to_frame().index.year, cc.ultimate_.to_frame()["2261"])

## Voting Chainladder

A `VotingChainladder` is an ensemble meta-estimator that fits several base chainladder methods, each on the whole triangle. Then it combines the individual predictions based on a matrix of weights to form a final prediction.

Let's begin by loading the `raa` dataset.

In [None]:
raa = cl.load_sample("raa")

Instantiate the `Chainladder` estimator.

In [None]:
cl_mod = cl.Chainladder()

Instantiate the `BornhuetterFerguson` estimator. Remember that `BornhuetterFerguson` requires one assumption, the `apriori`.

In [None]:
bf_mod = cl.BornhuetterFerguson(apriori=1)

Instantiate the `CapeCod` estimator with its `decay` and `trend` arguments.

In [None]:
cc_mod = cl.CapeCod(decay=1, trend=0)

Instantiate the `Benktander` estimator with its `apriori` and `n_iters` arguments.

In [None]:
bk_mod = cl.Benktander(apriori=1, n_iters=2)

Let's prepare the `estimators` variable. The `estimators` parameter of `VotingChainladder` must be a list of `(estimator_name, estimator)` tuples.

In [None]:
estimators = [("cl", cl_mod), ("bf", bf_mod), ("cc", cc_mod), ("bk", bk_mod)]

Recall that some estimators (in this case, `BornhuetterFerguson`, `CapeCod`, and `Benktander`) also require a `sample_weight`. For this example, let's use the average `Chainladder` ultimate across all origin periods as the exposure for every origin period.

In [None]:
sample_weight = cl_mod.fit(raa).ultimate_ * 0 + (
    float(cl_mod.fit(raa).ultimate_.sum()) / 10
)
sample_weight

In [None]:
model_weights = np.array(
    [[0.6, 0.2, 0.2, 0]] * 4 + [[0, 0.5, 0.5, 0]] * 3 + [[0, 0, 1, 0]] * 3
)

vot_mod = cl.VotingChainladder(estimators=estimators, weights=model_weights).fit(
    raa, sample_weight=sample_weight
)
vot_mod.ultimate_
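
The weights matrix above has one row per origin period (10 for `raa`) and one column per estimator, in the same order as `estimators`. Below is a hypothetical pure-Python sketch of how such a matrix can blend per-origin ultimates. Dividing by each row's total is an assumption of this sketch; every row here already sums to 1, so the division has no effect.

```python
def vote(ultimates_by_estimator, weights):
    """Blend per-origin ultimates; weights[i][k] applies to estimator k at origin i."""
    combined = []
    for i, row in enumerate(weights):
        weighted = sum(w * ults[i] for w, ults in zip(row, ultimates_by_estimator))
        combined.append(weighted / sum(row))
    return combined


# Same layout as model_weights: columns are cl, bf, cc, bk.
weights = [[0.6, 0.2, 0.2, 0]] * 4 + [[0, 0.5, 0.5, 0]] * 3 + [[0, 0, 1, 0]] * 3

# Hypothetical, deliberately simple ultimates for 10 origins from each estimator.
ultimates_by_estimator = [[100.0] * 10, [200.0] * 10, [300.0] * 10, [400.0] * 10]

selected = vote(ultimates_by_estimator, weights)
```

The first four origins lean on the chainladder, the middle three split evenly between BF and Cape Cod, and the last three use Cape Cod alone.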

In [None]:
plt.plot(
    vot_mod.ultimate_.to_frame().index.year,
    cl_mod.fit(raa).ultimate_.to_frame(),
    label="Chainladder",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    vot_mod.ultimate_.to_frame().index.year,
    bf_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(),
    label="Bornhuetter-Ferguson",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    vot_mod.ultimate_.to_frame().index.year,
    cc_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(),
    label="Cape Cod",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    vot_mod.ultimate_.to_frame().index.year,
    bk_mod.fit(raa, sample_weight=sample_weight).ultimate_.to_frame(),
    label="Benktander",
    linestyle="dashed",
    marker="o",
)
plt.plot(
    vot_mod.ultimate_.to_frame().index.year,
    vot_mod.ultimate_.to_frame(),
    label="Selected",
)
plt.legend(loc="best")

We can also call the `weights` attribute to confirm the weights being used by the `VotingChainladder` ensemble model.

In [None]:
vot_mod.weights