# Probability a Person Will Die in the United States 
Using the [actuarial life table](https://www.ssa.gov/oact/STATS/table4c6.html) published by the Social Security Administration, we calculate the probability that a person of a given age and sex will die within a given number of years.

In [1]:
import pandas as pd

import requests

from bs4 import BeautifulSoup

## Retrieve Data from Website

In [2]:
# Should give us most recent data: 2017 as of Feb 2022
url = 'https://www.ssa.gov/oact/STATS/table4c6.html'
response = requests.get(url)
# Fail early on an HTTP error instead of parsing an error page
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find(class_='t')

In [3]:
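from io import StringIO
# `pd.read_html` returns a list of dataframes; wrapping the HTML in `StringIO`
# avoids the literal-string deprecation warning in recent pandas versions
dataframe = pd.read_html(StringIO(str(table)))[0]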
# Get the oldest age we have data for
max_age = len(dataframe) - 2
dataframe.head()

| | Exact age | Male: Death probability a | Male: Number of lives b | Male: Life expectancy | Female: Death probability a | Female: Number of lives b | Female: Life expectancy |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.006081 | 100000 | 76.23 | 0.005046 | 100000 | 81.28 |
| 1 | 1 | 0.000425 | 99392 | 75.69 | 0.000349 | 99495 | 80.69 |
| 2 | 2 | 0.00026 | 99350 | 74.73 | 0.000212 | 99461 | 79.72 |
| 3 | 3 | 0.000194 | 99324 | 73.75 | 0.000166 | 99440 | 78.74 |
| 4 | 4 | 0.000154 | 99305 | 72.76 | 0.000137 | 99423 | 77.75 |


## Calculating Probability of Death
The life table gives, for each age $n$, the probability that a person who has reached age $n$ dies within the following year. To compute the probability of dying within the next several years, we cannot simply multiply these yearly probabilities together: that product would be the probability of dying in every one of those years, which is impossible because a person who dies does not reach the next year. Instead, we multiply the probabilities of surviving each year, starting at the person's current age, and subtract the result from 1. Let $P_{n,\text{sex}}(\text{dying})$ be the probability that a person of the given sex who has reached age $n$ dies before reaching age $n+1$.

$$P(\text{dying})=1-\prod_{n=\text{age}}^{\text{age}+\text{years}-1}\big(1-P_{n,\text{sex}}(\text{dying})\big)$$
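
As a small worked instance of the formula, the sketch below hard-codes the male death probabilities for ages 0 to 3 from the table preview and computes the probability that a newborn male dies before age 4. The cross-check assumes that "Number of lives" counts survivors out of 100,000 births, which is consistent with the preview rows but is an interpretation rather than something stated in this notebook. The variable names are illustrative only.

```python
# Male death probabilities for ages 0-3, copied from the table preview
male_death_probs = [0.006081, 0.000425, 0.00026, 0.000194]

# Multiply the yearly survival probabilities together
survival = 1.0
for q in male_death_probs:
    survival *= 1.0 - q

# Subtract from 1 to get the probability of dying within 4 years
prob_dying_within_4_years = 1.0 - survival

# Cross-check (assumed interpretation of "Number of lives"):
# 99305 of the initial 100000 males are still alive at age 4
prob_from_lives = 1.0 - 99305 / 100000

print(prob_dying_within_4_years)
print(prob_from_lives)
```

Both values come out near 0.00695; the small difference is rounding in the published table.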

In [4]:
def calculate_death_probability(age, sex, years):
    '''
    Given a person's `age` and `sex`, what is the 
    probability they will die in the next `years`?
    '''
    # Validate inputs against the range of ages in the table
    assert 0 <= age <= max_age
    assert years > 0 and age + years <= max_age + 1
    assert sex in ('Male', 'Female')
    
    # Get data for correct `sex`
    df = dataframe[sex]
    prob = 1
    for i in range(years):
        # Multiply by the probability of surviving the year that starts at age `age + i`
        prob *= 1 - float(df.iloc[age+i, 0])
    # Subtract from 1 to get probability of dying
    return 1 - prob

In [5]:
calculate_death_probability(73, 'Male', 6)

0.20200351765453473
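
The same calculation can also be written without an explicit loop. The following is a minimal sketch, not part of the original notebook: the function name `death_probability_vectorized` and the `male_q` series are hypothetical, and the series holds only the five male death probabilities from the table preview rather than the full scraped table.

```python
import numpy as np
import pandas as pd

def death_probability_vectorized(death_probs, age, years):
    """
    Probability of dying within `years`, given a Series of yearly
    death probabilities whose position corresponds to age.
    """
    if age < 0 or years <= 0:
        raise ValueError("age must be >= 0 and years must be > 0")
    # Yearly death probabilities for ages age, age+1, ..., age+years-1
    window = death_probs.iloc[age:age + years].to_numpy(dtype=float)
    if len(window) != years:
        raise ValueError("age + years runs past the end of the table")
    # Product of yearly survival probabilities, subtracted from 1
    return 1.0 - np.prod(1.0 - window)

# Male death probabilities for ages 0-4, copied from the table preview
male_q = pd.Series([0.006081, 0.000425, 0.00026, 0.000194, 0.000154])

# Probability that a 1-year-old male dies before reaching age 4
print(death_probability_vectorized(male_q, age=1, years=3))
```

With the full table, `dataframe['Male'].iloc[:, 0]` could be passed in place of `male_q`, provided any non-numeric trailing rows are excluded first.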