# Principles of Life Actuarial Statistics

## *Introduction*
By **Actuarial Science** we mean the discipline that evaluates the risks assumed by insurance and financial institutions by applying statistical and mathematical techniques. Its main scope is therefore the assessment of *financial risks*, especially in the insurance industry, using reliable quantitative tools. <br>
Of course, the subject has branched into many fields over the years, for instance casualty insurance, pensions and reinsurance. Put simply, since the dawn of **The Industrial Revolution** we have witnessed an upward trend in <font color='red'>**Life Expectancy Worldwide**</font>. Moreover, scientific progress has brought widespread improvements to the lifestyle of mankind. As a matter of fact, there has been a secular tendency to work fewer hours while increasing productivity, which has increased leisure time. It is for this reason, and thanks to detailed **Demographic Studies**, that many jurisdictions were able to adopt retirement policies.<br> While some European populations are struggling with aging and declining birth rates, other continents such as Africa have high birth rates, but because of other factors their life expectancy is still far below that of Western countries. To some extent, factors such as schooling or GDP help determine how long and how well people live or, as *Spock* would have said, <font color='green'>**"Live long and prosper"**</font>.   <br> In this brief Jupyter Notebook we are going to introduce some basic concepts of Life Actuarial Statistics, hoping to make the ideas clear through Python and its powerful libraries.

## *Biometric Model*
<a id='bi'></a>

The basic biometric model is a stochastic model built around a random variable $X$, the **age at death** of the individual. <br> Being an age, $X$ takes values in the set of **positive real numbers**, although practical constructions assume the existence of an actuarial infinity, or limit age, denoted by $\omega$. Since census studies and samples of specific populations record the age at death as the number of complete years the deceased lived, it would be more natural to treat $X$ as a discrete variable. <br> The two approaches are compatible and lead only to slight differences.<br>

**Basic Hypotheses of the Biometric Model**<br>
**1. Homogeneity**: Individuals form a homogeneous group, that is, the statistical behavior of their age at death is identical.<br>
**2. Independence**: The variables describing the ages at death of different individuals are statistically independent. <br>
**3. Stationarity**: The biometric properties of individuals do not depend on their date of birth, but only on their age. This hypothesis is accepted in practice for short periods of time.

Before going any further let's recall some useful **notation**:<br> 
* $X =$ Age at death of the individual<br> 
* $x =$ Current age of the individual<br> 
* $T(x) = X - x =$ Residual (future) lifetime at age $x$. It takes values in the interval $(0, \omega - x)$<br>
* $F(x) = P(X \le x) =$ Distribution function of the age at death<br>
* $S(x) = P(X > x) = 1 - F(x) =$ Survival function (the probability of surviving beyond age $x$)<br>
* $G_{x}(t) = P(T(x) \le t) = P(X - x \le t \mid X > x) = P(X \le x+t \mid X > x) =$ Distribution function of the residual lifetime

## Instantaneous death rate

The **instantaneous death rate** (force of mortality), denoted by $\mu_{x}$ or $\mu(x)$, measures the strength or intensity of mortality at age $x$ for individuals who have reached that age. In other words, it is the limit, as $h \to 0$, of the probability of dying between ages $x$ and $x + h$ given survival to age $x$, divided by $h$: $\mu(x) = \lim_{h \to 0} \frac{P(x < X \le x+h \mid X > x)}{h}$.<br> 
**The following equality holds, where $f$ is the density function of $X$:**

### $$\mu(x) =  \frac{f(x)}{1-F(x)}$$
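As a quick check of the formula, here is a minimal sketch assuming De Moivre's law, i.e. the age at death uniformly distributed on $[0, \omega]$. This is a textbook illustration, not a model used elsewhere in this notebook, and the choice $\omega = 100$ is arbitrary. Under this law $F(x) = x/\omega$ and $f(x) = 1/\omega$, so $\mu(x) = 1/(\omega - x)$.

```python
# Minimal sketch: force of mortality mu(x) = f(x) / (1 - F(x))
# under De Moivre's law (X uniform on [0, omega]); omega = 100 is an arbitrary assumption.
OMEGA = 100.0

def F(x, omega=OMEGA):
    """Distribution function of the age at death X."""
    return x / omega

def f(x, omega=OMEGA):
    """Density of X (constant under De Moivre's law)."""
    return 1.0 / omega

def mu(x, omega=OMEGA):
    """Instantaneous death rate mu(x) = f(x) / (1 - F(x)), valid for 0 <= x < omega."""
    return f(x, omega) / (1.0 - F(x, omega))

for age in (0, 50, 90):
    print(age, mu(age))
```

The rate grows without bound as $x$ approaches $\omega$, which reflects that surviving past the limit age is impossible.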

## Life Tables

Life tables collect the biometric functions that describe the evolution of a **cohort** from its <font color='green'>**birth**</font> to its <font color='red'>**extinction**</font> (the death of all its members), and are used in actuarial statistics to determine *the probabilities of death*. <br>The size of the **cohort** is described by the function:<br>
### $$l : [0, \omega] \to \mathbb{R}$$

Its values are denoted by $l(x)$, where $l(0)$ is the initial size of the group or cohort and $l(x)$ is the number of survivors out of $l(0)$ who reach exactly age $x$. We also have $l(\omega) = 0$ and $l(x) = l(0) \cdot S(x)$. For convenience, $l(0)$ is usually set to 100,000.
In life tables, $l(x)$ is the main function, from which all the others are derived.<br>
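To make the link between $l(x)$ and $S(x)$ concrete, here is a minimal sketch using the first five `Spain_Female` values of $l(x)$ (ages 0 to 4) from the dataset preview further down, copied by hand. It only applies $S(x) = l(x)/l(0)$ and $F(x) = 1 - S(x)$.

```python
import numpy as np

# First five Spain_Female values of l(x) from the dataset preview (ages 0 to 4)
ages = np.arange(5)
l = np.array([100000, 99713, 99682, 99669, 99657])

# Survival function S(x) = l(x) / l(0)
S = l / l[0]
# Distribution function of the age at death F(x) = 1 - S(x)
F = 1 - S

for x, s in zip(ages, S):
    print(f"S({x}) = {s:.5f}")
```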
In life tables, $_{n}d_{x}$ denotes the **number of deaths** between ages $x$ and $x + n$, that is:<br>
### $$_{n}d_{x} = l(x) - l(x+n)$$
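A short sketch of $_{n}d_{x}$ on the same five `Spain_Female` values (ages 0 to 4, copied from the dataset preview). The helper name `deaths` is hypothetical, chosen for illustration.

```python
import numpy as np

# l(x) for ages 0..4: the Spain_Female values from the dataset preview
l = np.array([100000, 99713, 99682, 99669, 99657])

def deaths(l, x, n=1):
    """Number of deaths between ages x and x + n: nd_x = l(x) - l(x + n)."""
    return l[x] - l[x + n]

print(deaths(l, 0))       # 1d_0 -> 287
print(deaths(l, 0, n=4))  # 4d_0 -> 343
# One-year deaths for every age at once: d_x = l(x) - l(x + 1)
print(-np.diff(l))
```

Note that the one-year deaths add up to the four-year figure, since $_{4}d_{0} = d_0 + d_1 + d_2 + d_3$.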

## Life Expectancy

When a person decides to take out life insurance, one of the most relevant factors in determining the **premium** of the policy is the insured's current age. <br>The characteristics of the insured's future lifetime (expected value, variance, median residual life) provide very important information in this regard. <br>Recall that the distribution function of the residual lifetime, $G_{x}(t)$, is related to that of the age at death as follows:

#### $$G_{x}(t) = P[T(x) \le t] = \frac{F(x+t)-F(x)}{1-F(x)}$$
[Notation above](#bi)
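In terms of the life table, since $F(x) = 1 - l(x)/l(0)$, the formula above simplifies to $G_{x}(t) = \frac{l(x) - l(x+t)}{l(x)}$. The sketch below computes it together with the curtate life expectancy $e_{x} = \sum_{k \ge 1} l(x+k)/l(x)$, the expected number of complete future years lived from age $x$. The table `l_toy` is a hypothetical example, not real data, and the function names are illustrative.

```python
import numpy as np

def G(l, x, t):
    """Distribution of the residual lifetime: G_x(t) = (l(x) - l(x+t)) / l(x),
    equal to (F(x+t) - F(x)) / (1 - F(x)) because F(x) = 1 - l(x) / l(0)."""
    return (l[x] - l[x + t]) / l[x]

def curtate_expectation(l, x):
    """Curtate life expectancy e_x = sum_{k>=1} l(x+k) / l(x),
    i.e. the expected number of complete future years lived from age x."""
    return l[x + 1:].sum() / l[x]

# Hypothetical toy table with omega = 4, so that l(omega) = 0
l_toy = np.array([100, 80, 50, 20, 0])

print(G(l_toy, 1, 2))                 # probability that a life aged 1 dies before age 3 -> 0.75
print(curtate_expectation(l_toy, 0))  # -> 1.5
```

Applied to a full column such as `dataset['Spain_Female'].to_numpy()`, the same function would give the curtate expectation at birth, provided the table runs year by year until $l(x)$ reaches 0. Adding $1/2$ gives the usual approximation to the complete expectation under a uniform distribution of deaths within each year.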

## Interactive diagrams with Plotly

Let's now plot some charts to make the concepts clearer. The first graph is a <font color='blue'>**Survival Plot**</font>, usually represented as a line chart.<br> As usual, the $x$ axis shows the age of a group of individuals, starting at 0 and going up to the limit age (our $\omega$), while the $y$ axis shows the number of survivors $l(x)$.

In [1]:
# First of all, we import the basic libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
file = 'lifeTable.csv'  # local copy of the Kaggle life table (also in the GitHub repository)
dataset = pd.read_csv(file)
dataset.head()  # Our dataset looks like this

| | age | Australia_Male | Austria_Male | Belgium_Male | Bulgaria_Male | Canada_Male | Czech Republic_Male | Denmark_Male | Estonia_Male | Finland_Male | ... | Netherlands_Female | Poland_Female | Portugal_Female | Russia_Female | Slovakia_Female | Slovenia_Female | Spain_Female | Sweden_Female | United Kingdom_Female | United States_Female |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | ... | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 |
| 1 | 1 | 99528 | 99600 | 99620 | 98938 | 99488 | 99698 | 99689 | 99582 | 99740 | ... | 99649 | 99493 | 99693 | 99279 | 99526 | 99747 | 99713 | 99759 | 99583 | 99415 |
| 2 | 2 | 99495 | 99572 | 99570 | 98887 | 99453 | 99674 | 99663 | 99521 | 99723 | ... | 99624 | 99461 | 99657 | 99205 | 99486 | 99727 | 99682 | 99735 | 99558 | 99376 |
| 3 | 3 | 99471 | 99545 | 99549 | 98854 | 99433 | 99646 | 99657 | 99521 | 99717 | ... | 99609 | 99443 | 99639 | 99160 | 99444 | 99717 | 99669 | 99718 | 99538 | 99351 |
| 4 | 4 | 99458 | 99528 | 99528 | 98795 | 99421 | 99637 | 99654 | 99521 | 99707 | ... | 99595 | 99424 | 99626 | 99130 | 99414 | 99706 | 99657 | 99700 | 99522 | 99332 |


The dataset was downloaded from **Kaggle**. It covers a range of countries, including several G20 members, and is split into <font color='pink'>**Females**</font> and <font color='navy'>**Males**</font>. <br>The `age` column runs from birth up to the limit age, and the remaining features *(columns)* are countries by gender. The file is also available in the GitHub repository. <br>For further information click here: [lifeTables](https://www.kaggle.com/cthierfelder/life-table-g20)

In [3]:
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objects as go  # current module name; graph_objs is a deprecated alias
init_notebook_mode(connected=True)

## *Spain*

Let's take the case of Spain. At the beginning of the $20^{th}$ century, life expectancy in this Mediterranean country averaged only 34.76 years.<br> Over the course of the century **life expectancy at birth increased** rapidly, and from the 1990s it rose by almost four more years, reaching 80.9 years by 2007.<br>
**For more detailed information about the Autonomous Communities**, [click here](https://www.ine.es/en/prensa/np584_en.pdf)

### Now let's have a look using Plotly
Plotly is a powerful interactive plotting library for Python. It is quite useful because we can hover over and zoom into the chart to inspect the smallest detail. In fact, I encourage the reader to move the cursor along the curves to read the age and number of individuals interactively.

In [4]:
# l(x) for Spanish males and females, indexed by age
spain = pd.pivot_table(dataset, values=['Spain_Male', 'Spain_Female'], index='age')

linef = go.Scatter(x=spain.index, y=spain.Spain_Female, mode='lines', name='Females')
linem = go.Scatter(x=spain.index, y=spain.Spain_Male, mode='lines', name='Males')

traces = [linef, linem]
figure = go.Figure(data=traces)
# The curves show survivors l(x), i.e. a survival curve
figure.update_layout(title='Survival Curve l(x) - Spain', xaxis_title='Age', yaxis_title='Individuals l(x)',
                    font=go.layout.Font(family='Arial'))
iplot(figure)

In [121]:
d = 'ine.csv'  # local copy of the INE table by province
df = pd.read_csv(d, sep=';')
df.head()

| | Provinces | Sexo | Edad | Funciones | Period | Total |
|---|---|---|---|---|---|---|
| 0 | 02 Albacete | Ambos sexos | 0 years old | Mortality rate | 2018 | 2.230829 |
| 1 | 02 Albacete | Ambos sexos | 0 years old | Mortality rate | 2017 | 3.026787 |
| 2 | 02 Albacete | Ambos sexos | 0 years old | Mortality rate | 2016 | 0.876656 |
| 3 | 02 Albacete | Ambos sexos | 0 years old | Mortality rate | 2015 | 3.123511 |
| 4 | 02 Albacete | Ambos sexos | 0 years old | Mortality rate | 2014 | 2.232349 |


In [122]:
life = df[df.Funciones == 'Life Expectancy'].copy()  # keep only the life-expectancy rows

In [125]:
life = life.dropna(axis=0)  # drop rows with missing values (the original call did not assign the result)
life

| | Provinces | Sexo | Edad | Funciones | Period | Total |
|---|---|---|---|---|---|---|
| 196 | 02 Albacete | Ambos sexos | 0 years old | Life Expectancy | 2018 | 83.368297 |
| 197 | 02 Albacete | Ambos sexos | 0 years old | Life Expectancy | 2017 | 83.480692 |
| 198 | 02 Albacete | Ambos sexos | 0 years old | Life Expectancy | 2016 | 83.153809 |
| 199 | 02 Albacete | Ambos sexos | 0 years old | Life Expectancy | 2015 | 82.878404 |
| 200 | 02 Albacete | Ambos sexos | 0 years old | Life Expectancy | 2014 | 82.667250 |
| ... | ... | ... | ... | ... | ... | ... |
| 768315 | 52 Melilla | Females | 90 years old and over | Life Expectancy | 1995 | 4.330786 |
| 768316 | 52 Melilla | Females | 90 years old and over | Life Expectancy | 1994 | 3.476390 |
| 768317 | 52 Melilla | Females | 90 years old and over | Life Expectancy | 1993 | 3.539922 |
| 768318 | 52 Melilla | Females | 90 years old and over | Life Expectancy | 1992 | 4.877758 |


In [137]:
madrid = life[life.Provinces == '28 Madrid'].copy()  # explicit copy avoids SettingWithCopyWarning later

In [140]:
mat = madrid.Edad.str.split(' ', expand=True)
mat.head()

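| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 428932 | 0 | years | old | | | | | | |
| 428933 | 0 | years | old | | | | | | |
| 428934 | 0 | years | old | | | | | | |
| 428935 | 0 | years | old | | | | | | |
| 428936 | 0 | years | old | | | | | | |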


In [141]:
madrid['Edad'] = mat[0]  # keep only the first token of the age label
madrid.head()




| | Provinces | Sexo | Edad | Funciones | Period | Total |
|---|---|---|---|---|---|---|
| 428932 | 28 Madrid | Ambos sexos | 0 | Life Expectancy | 2018 | 84.783872 |
| 428933 | 28 Madrid | Ambos sexos | 0 | Life Expectancy | 2017 | 84.528316 |
| 428934 | 28 Madrid | Ambos sexos | 0 | Life Expectancy | 2016 | 84.541344 |
| 428935 | 28 Madrid | Ambos sexos | 0 | Life Expectancy | 2015 | 84.001572 |
| 428936 | 28 Madrid | Ambos sexos | 0 | Life Expectancy | 2014 | 84.219367 |


In [156]:
mat[0].str.isnumeric().sum()

168

In [163]:
len(mat[0])

1764

In [162]:
sum(mat[0] == 'From')

1596
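The three counts above show that 168 Madrid age labels start with a number and 1596 start with the word 'From', and 168 + 1596 = 1764, so every label has one of those two shapes. Keeping only the first token therefore leaves 'From' instead of an age in most rows. A minimal sketch of an alternative is to pull out the first integer in each label. The function name `lower_age` is hypothetical, and the 'From ...' wording in the sample is an assumed format, since only '0 years old' and '90 years old and over' appear in the outputs above.

```python
import pandas as pd

def lower_age(labels: pd.Series) -> pd.Series:
    """Extract the first integer in each age label as the lower age bound.
    Assumes every label contains its lower age as the first number."""
    return labels.str.extract(r'(\d+)', expand=False).astype(int)

# Hypothetical sample: the 'From 1 to 4 years old' wording is an assumption
sample = pd.Series(['0 years old', 'From 1 to 4 years old', '90 years old and over'])
print(lower_age(sample).tolist())  # [0, 1, 90]
```

It would have to be applied to the original labels, for example `lower_age(life['Edad'])`, before the column is overwritten with `mat[0]`; a label with no digits would raise an error in `astype(int)`.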


In [116]:
# l(x) for Italian males and females, indexed by age
italy = pd.pivot_table(dataset, values=['Italy_Male', 'Italy_Female'], index='age')
linitam = go.Scatter(x=italy.index, y=italy.Italy_Male, mode='lines', name='Males Ita')
linitaf = go.Scatter(x=italy.index, y=italy.Italy_Female, mode='lines', name='Females Ita')

traces2 = [linitam, linitaf]
figurita = go.Figure(data=traces2)
figurita.update_layout(title='Survival Curve l(x) - Italy', xaxis_title='Age', yaxis_title='Individuals l(x)')
iplot(figurita)