# Task 5 - Predictive and Statistical Modelling of claims

# Remark 
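The exercise states that, because it is difficult to estimate the Pareto shape parameter from a very limited data set, the standard value for property business of 1.5 is used. However, the second moment of a Pareto distribution is finite only when the shape parameter is greater than 2, so it cannot be computed with 1.5. This is a mistake in the exercise; please use 2.5 instead.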

### Poisson distribution for claim frequency

A Poisson distribution has been selected to 
model the claims frequency, i.e. the number of 
claims during one year. This is based on the 
assumption that claims are independent of each 
other. Due to the nature of the cover this 
assumption is considered to be reasonable.

$$
P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}
$$

e is the base of the natural logarithm (approximately 2.71828).

λ is the average rate of occurrence of events.

k is the number of events that occur.
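As a quick illustration of the formula above, the following minimal sketch evaluates the Poisson probabilities with the standard library only. The function name `poisson_pmf` and the rate of 1.4 are assumptions chosen for this example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 1.4  # illustrative rate, assumed for this sketch
for k in range(6):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")
```

The probabilities over all k sum to 1, which is a simple sanity check on the implementation.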

### Pareto distribution for claim severity

A Pareto distribution has been selected for the
claims severity. The Pareto distribution has a
long tail, which allows the model to produce
extremely large losses that have not been
observed in the past. In addition, extreme
value theory provides a mathematical basis for
the assumption of a Pareto distribution for
extreme losses.

The Pareto distribution used to model the
severity has two parameters: a threshold t and a
shape parameter a. Since it is difficult to
estimate the shape parameter based on a very
limited data set, the standard value for property
business of 2.5 is used. This produces a
relatively long-tailed distribution.
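To make the role of the shape parameter concrete, here is a minimal sketch that assumes the classical (type I) Pareto distribution with threshold `t` and shape `a`. The helper names `pareto_survival` and `pareto_raw_moment` and the threshold of 1.0 are hypothetical, chosen only for illustration.

```python
import math


def pareto_survival(x: float, t: float, a: float) -> float:
    """P(X > x) for a Pareto distribution with threshold t and shape a."""
    return 1.0 if x <= t else (t / x) ** a


def pareto_raw_moment(k: int, t: float, a: float) -> float:
    """E[X^k]; the moment is finite only when the shape a exceeds k."""
    if a <= k:
        return math.inf
    return a * t**k / (a - k)


t = 1.0  # hypothetical threshold, for illustration only
for a in (1.5, 2.5):
    mean = pareto_raw_moment(1, t, a)
    second = pareto_raw_moment(2, t, a)
    print(f"a = {a}: mean = {mean:.3f}, second moment = {second}")
```

With a shape of 1.5 the second moment is infinite, whereas a shape of 2.5 gives a finite value, which is why a variance-based calculation needs a shape greater than 2.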

### Dataset

In [5]:
import numpy as np
import pandas as pd

In [6]:
dict_data = {'WIS':[3.1,2.1,10.5,2.0,np.nan,np.nan,230.5,51.0,0.5], 'Non-WIS':[4.5,np.nan,np.nan,np.nan,125.3,np.nan,0.4,np.nan,np.nan]}

In [7]:
df = pd.DataFrame(dict_data)
df

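| | WIS | Non-WIS |
|---|---|---|
| 0 | 3.1 | 4.5 |
| 1 | 2.1 | NaN |
| 2 | 10.5 | NaN |
| 3 | 2.0 | NaN |
| 4 | NaN | 125.3 |
| 5 | NaN | NaN |
| 6 | 230.5 | 0.4 |
| 7 | 51.0 | NaN |
| 8 | 0.5 | NaN |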


In [8]:
index = ['2016 -1','2016-2','2017 -1','2017-2','2018 -1','2018-2','2019 -1','2019-2','2020 -1']
df.index=index

In [9]:
df

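| | WIS | Non-WIS |
|---|---|---|
| 2016 -1 | 3.1 | 4.5 |
| 2016-2 | 2.1 | NaN |
| 2017 -1 | 10.5 | NaN |
| 2017-2 | 2.0 | NaN |
| 2018 -1 | NaN | 125.3 |
| 2018-2 | NaN | NaN |
| 2019 -1 | 230.5 | 0.4 |
| 2019-2 | 51.0 | NaN |
| 2020 -1 | 0.5 | NaN |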


### Question e
Which parameters do you obtain when you are fitting the claims frequency and severity distributions for each site to the provided data set? 

#### Find lambda, the average rate of occurrence, for the Poisson distribution

In [10]:
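# Claim counts per year for 2016-2020, averaged over the 5 years
N_WIS = (2 + 2 + 0 + 2 + 1) / 5  # average rate of occurrence for WIS
print(f'lambda_WIS (expected nbr claims): {N_WIS}')

N_NoN_WIS = (1 + 0 + 1 + 1 + 0) / 5  # average rate of occurrence for Non-WIS
print(f'lambda_NoN_WIS (expected nbr claims): {N_NoN_WIS}')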

lambda_WIS (expected nbr claims): 1.4
lambda_NoN_WIS (expected nbr claims): 0.6


#### For the Pareto distribution, find t

#### Expected Value (Mean)

$$
E[X] = t \cdot \frac{a}{a-1}
$$
so t is:
$$
t = E[X] \cdot \frac{a-1}{a}
$$


Before the aggregate loss model can be fit to the data, the loss 
amounts need to be on-levelled to current year amounts using 
an indexation of 3% p.a. to allow for monetary inflation since the 
time of the loss.  
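The indexation described above can be sketched for a single loss as follows. The function name `index_to_target_year` and its default arguments are assumptions made for this example; the two sample losses are taken from the data set.

```python
def index_to_target_year(amount, loss_year, target_year=2021, rate=0.03):
    """On-level a loss amount to target_year with compound annual inflation."""
    years = target_year - loss_year
    return amount * (1.0 + rate) ** years


# A 2016 loss of 3.1 is inflated over 5 years, a 2018 loss of 125.3 over 3 years
print(round(index_to_target_year(3.1, 2016), 1))    # 3.6
print(round(index_to_target_year(125.3, 2018), 1))  # 136.9
```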

In [11]:
df_2021 = df.copy()

In [12]:
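# Index all loss amounts to 2021 values at 3% p.a.
def value_2021(df_2021):
    # Rows are half-years starting in 2016, so row i belongs to year 2016 + i // 2
    # and is inflated over 5 - i // 2 years. Note: modifies df_2021 in place.
    for i in range(len(df_2021)):
        years_to_2021 = 5 - i // 2
        df_2021.iloc[i, 0] = df_2021.iloc[i, 0] * 1.03**years_to_2021
        df_2021.iloc[i, 1] = df_2021.iloc[i, 1] * 1.03**years_to_2021
    return df_2021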

In [13]:
df_2021 =np.round(value_2021(df_2021),1)

In [14]:
df_2021

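| | WIS | Non-WIS |
|---|---|---|
| 2016 -1 | 3.6 | 5.2 |
| 2016-2 | 2.4 | NaN |
| 2017 -1 | 11.8 | NaN |
| 2017-2 | 2.3 | NaN |
| 2018 -1 | NaN | 136.9 |
| 2018-2 | NaN | NaN |
| 2019 -1 | 244.5 | 0.4 |
| 2019-2 | 54.1 | NaN |
| 2020 -1 | 0.5 | NaN |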


In [15]:
E_X_WIS = np.round(np.sum(df_2021.iloc[:,0].dropna())/5,1) # can not use np.average because 
print(E_X_WIS)
E_X_Non_WIS = np.round(np.sum(df_2021.iloc[:,1].dropna()) /5,1)
print(E_X_Non_WIS)

63.8
28.5


In [16]:
t_WIS = np.round(E_X_WIS*(2.5-1)/2.5,1)
print(t_WIS)
t_Non_WIS = np.round(E_X_Non_WIS*(2.5-1)/2.5,1)
print(t_Non_WIS)

38.3
17.1


### Question (f) 
What is the expected value of the claims for the current year 2021 at the individual (i.e. for each site) as well as at the aggregate level? Do you need a simulation tool or can the expected values be derived analytically? Jakob has your back, check out his hint.

Use: $$E[S_{\text{individual}}] = E[N] \cdot E[X]$$

In [17]:
E_2021_WIS = N_WIS*E_X_WIS
print(f'expected claims for WIS in 2021 : {E_2021_WIS}')
E_2021_Non_WIS= N_NoN_WIS *E_X_Non_WIS 
print(f'expected claims for NON WIS in 2021 : {E_2021_Non_WIS}')

expected claims for WIS in 2021 : 89.32
expected claims for NON WIS in 2021 : 17.099999999999998


In [18]:
# total expected claims is the sum of both of them 
E_aggregate_claim = E_2021_WIS+E_2021_Non_WIS
E_aggregate_claim

106.41999999999999
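The expected value can be derived analytically, but a simulation is a useful cross-check. The sketch below is one possible way to do this with the standard library, using the WIS parameters fitted above (lambda 1.4, t 38.3, shape 2.5). The helper names `sample_poisson` and `simulate_mean_annual_loss`, the seed, and the number of simulated years are assumptions for this example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson count with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_mean_annual_loss(lam, t, a, n_years, seed=0):
    """Average simulated annual loss with Poisson counts and Pareto severities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_years):
        n_claims = sample_poisson(lam, rng)
        # random.paretovariate has minimum 1, so scale by the threshold t
        total += sum(t * rng.paretovariate(a) for _ in range(n_claims))
    return total / n_years


lam_wis, t_wis, a = 1.4, 38.3, 2.5  # WIS parameters from the fit above
analytic = lam_wis * t_wis * a / (a - 1)  # E[N] * E[X]
simulated = simulate_mean_annual_loss(lam_wis, t_wis, a, n_years=100_000)
print(f"analytic: {analytic:.2f}, simulated: {simulated:.2f}")
```

The two numbers should agree up to sampling noise, which suggests that no simulation tool is needed for the expected value itself.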

### Question g
Approximate the Value at Risk at level 80% of the (total) aggregate loss distribution for both risk types using the standard normal distribution (or “CLT”) to obtain an estimate of the equalisation reserves to be set up by the insurer. What now? Jakob tells you in some hints.



 **Variance of Total Aggregate Loss \( S \)**

From **Hint 4**, the variance of the total aggregate loss \( S \) is given by:

$$
\text{Var}[S] = E[N] \cdot \text{Var}[X] + \text{Var}[N] \cdot (E[X])^2
$$

**Explanation of Terms:**

- E[N]: Expected number of claims.
- Var[N] : Variance of the number of claims.
- E[X]: Expected claim severity.
- Var[X]: Variance of claim severity.

But let's simplify the term:

We start with the variance of the aggregate claims \(S_i\):
$$
\text{Var}(S_i) = E[N_i]\,\text{Var}(X_{i,1}) + \text{Var}(N_i)\,(E[X_{i,1}])^2.
$$

Recall that the variance of \(X_{i,1}\) can be written as:
$$
\text{Var}(X_{i,1}) = E[X_{i,1}^2] - \big(E[X_{i,1}]\big)^2.
$$

Substitute this into the expression for Var(S_i):
$$
\text{Var}(S_i) = E[N_i]\Big(E[X_{i,1}^2] - \big(E[X_{i,1}]\big)^2\Big) + \text{Var}(N_i)\,(E[X_{i,1}])^2.
$$

If we assume that the number of claims \(N_i\) follows a Poisson distribution, then
$$
\text{Var}(N_i) = E[N_i].
$$

Thus, the formula becomes:
$$
\begin{aligned}
\text{Var}(S_i) &= E[N_i]\Big(E[X_{i,1}^2] - \big(E[X_{i,1}]\big)^2\Big) + E[N_i]\,(E[X_{i,1}])^2 \\
&= E[N_i] \left( E[X_{i,1}^2] - \big(E[X_{i,1}]\big)^2 + \big(E[X_{i,1}]\big)^2 \right) \\
&= E[N_i]\,E[X_{i,1}^2].
\end{aligned}
$$

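This is the variance for an individual site, i.e. for WIS or for Non-WIS.

To obtain the variance of the aggregate, we sum the two site variances, which assumes that the losses at the two sites are independent.

How do we find \(E[X_{i,1}^2]\)?
Since the severity follows a Pareto distribution, its second moment (finite only for \(a > 2\)) is:

$$
E[X_{i,1}^2] = t^2 \cdot \frac{a}{a-2}
$$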
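As a numerical check of the simplification, the following sketch compares the general variance formula with the simplified Poisson form for the WIS site. It assumes the fitted values lambda 1.4, t 38.3 and shape 2.5 from earlier in the notebook; the variable names are chosen for this example only.

```python
lam, t, a = 1.4, 38.3, 2.5  # WIS values from the fit above

mean_x = t * a / (a - 1)              # E[X] of the Pareto severity
second_moment_x = t**2 * a / (a - 2)  # E[X^2], finite because a > 2
var_x = second_moment_x - mean_x**2   # Var[X]

# General formula with Var[N] = E[N] = lam for a Poisson count
var_general = lam * var_x + lam * mean_x**2
# Simplified form E[N] * E[X^2]
var_simplified = lam * second_moment_x

print(round(var_general, 1), round(var_simplified, 1))
```

Both expressions give the same value, so only the second moment of the severity is needed once the claim count is Poisson.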


In [19]:

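a = 2.5  # Pareto shape parameter
var_S_WIS = np.round(N_WIS * (t_WIS**2 * (a / (a - 2))), 1)  # E[N] * E[X^2] for WIS
var_S_Non_WIS = np.round(N_NoN_WIS * (t_Non_WIS**2 * (a / (a - 2))), 1)  # same for Non-WIS
var_S = np.round(var_S_WIS + var_S_Non_WIS, 1)  # sum of the two site variances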


In [20]:
var_S

11145.4

Then compute the Value at Risk at the 80% level:

$$
\mathbb{P}[S \leq z] = \mathbb{P}\left[\frac{S - \mathbb{E}[S]}{\sqrt{\mathrm{Var}[S]}} \leq \frac{z - \mathbb{E}[S]}{\sqrt{\mathrm{Var}[S]}}\right] \approx \Phi_{0,1}\left(\frac{z - \mathbb{E}[S]}{\sqrt{\mathrm{Var}[S]}}\right) \overset{!}{=} 0.8
$$

$$
\iff z = \Phi_{0,1}^{-1}(0.8) \cdot \sqrt{\mathrm{Var}[S]} + \mathbb{E}[S]
$$

The 80% quantile of the standard normal distribution is approximately 0.84.
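The quantile does not have to be looked up in a table. A minimal sketch using `statistics.NormalDist` from the standard library is shown below; the function name `normal_approx_var` is an assumption for this example, and the mean and variance are the values computed earlier in the notebook.

```python
from statistics import NormalDist


def normal_approx_var(mean, variance, level):
    """Quantile of a normal distribution matched to the given mean and variance."""
    return mean + NormalDist().inv_cdf(level) * variance**0.5


quantile_80 = NormalDist().inv_cdf(0.8)
print(round(quantile_80, 4))  # 0.8416

value_at_risk = normal_approx_var(106.42, 11145.4, 0.8)
print(round(value_at_risk, 2))
```

Using the unrounded quantile gives a slightly higher figure than the rounded value 0.84.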

In [22]:
z = 0.84 * np.sqrt(var_S)+E_aggregate_claim
z

195.10029228639246