# Life Tables: a deeper look at mortality

This week we will build on the survival probabilities that we developed last week, in the hope of learning more about the lives of US slaves.

The plan is to construct a 'life table', a way of presenting survival probabilities that allows us to calculate other interesting quantities and to compare the life course of our population of interest with that of other populations.


```python
# Run this cell, but please don't change it.

# These lines import the NumPy and datascience modules.
import numpy as np
from datascience import *
from datascience.predicates import are
import math

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
```

<h2>What do we think about these survival probabilities?</h2>

Over the last week, you (hopefully) spent some time developing a casual expertise in $19^{th}$-century infant and child mortality. Based on what you now know... what do you think?

Let's take a look at a few of them to remind ourselves of what we are working with.



```python
## Read the data and remind ourselves about the survival rates that we have wrested from the old census data
slavSurv = Table.read_table("SlaveSurvivalRates1850_60.csv").select(['sex','midAge','loAge','hiAge','surv5060'])
slavSurv.where('sex', 'FEMALES').show(5)
slavSurv.where('sex', 'MALES').show(5)
```


| sex | midAge | loAge | hiAge | surv5060 |
|---|---|---|---|---|
| FEMALES | 2.0 | 0 | 4 | 0.96641 |
| FEMALES | 7.0 | 5 | 9 | 0.952182 |
| FEMALES | 14.5 | 10 | 19 | 0.867193 |
| FEMALES | 24.5 | 20 | 29 | 0.780687 |
| FEMALES | 34.5 | 30 | 39 | 0.779428 |


| sex | midAge | loAge | hiAge | surv5060 |
|---|---|---|---|---|
| MALES | 2.0 | 0 | 4 | 1.03681 |
| MALES | 7.0 | 5 | 9 | 0.921583 |
| MALES | 14.5 | 10 | 19 | 0.893296 |
| MALES | 24.5 | 20 | 29 | 0.754211 |
| MALES | 34.5 | 30 | 39 | 0.803249 |


## Mortality is more than just survival (probabilities)

It turns out that we (demographers) can do quite a lot with a set of age-specific mortality rates.
For example, we can calculate life expectancies and, given known population growth rates, we can
also say something interesting about fertility.

To do these things, it is convenient to construct what is called a "life table". Like the other tables that we
have worked with in DS8, it is an array of numbers whose columns are "vectors" of quantities, one entry for each
age category displayed on the rows.

The columns of a life table include the following:
<ol>
<li> $x$ : the inclusive lower bound of the age category
<li> $n$ : the width of the age category
<li> $_na_x$ : the average number of years lived within the age interval by those who die in it; generally approximated as half the interval width, $n/2$ (as if deaths happen, on average, at the midpoint of the interval)
<li> $l_x$ : the number of people alive at the beginning of an age category, out of a hypothetical starting cohort (the "radix", e.g. 10,000 births). This is the key column, from which all the columns below are derived. We can figure this one out because we know the <b>rates of survival across all the age categories</b>, but it's going to take a little work.

<li> $_nq_x$ : the probability of dying before age $x+n$, conditional on having survived to age $x$: $1-(l_{x+n}/l_x)$
<li> $_nd_x$ : the number of deaths between ages $x$ and $x+n$: $l_x-l_{x+n}$
<li> $_nL_x$ : person-years lived between ages $x$ and $x+n$: $(n)(l_{x+n})+(_na_x)(_nd_x)$
<li> $T_x$ : person-years remaining at age $x$ until the cohort dies out: $\sum_{a=x}^{\infty} {_nL_a}$
<li> $e_x$ : expectation of life at age $x$: $T_x/l_x$

</ol>

Before you panic, please notice a couple of things. First, only a few columns are intrinsically interesting, chief among them the $e_x$ column, which gives us life expectancy. Is there a more important number in any social science?
The rest of the columns are there mainly to help us calculate $e_x$.

Second, while the notation might be confusing, <b>NONE of this is conceptually difficult.</b> 
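To make the list concrete, here is a minimal sketch in Python/NumPy of how the derived columns follow from $l_x$, using the formulas above. The function name `life_table` and the three-row cohort are hypothetical, chosen only so the arithmetic can be checked by hand; the sketch also assumes the last row is the final interval, so $l_{x+n}$ is taken to be 0 there (everyone still alive dies in it).

```python
import numpy as np

def life_table(x, n, lx, nax):
    """Hypothetical helper: derive nqx, ndx, nLx, Tx and ex from x, n, l_x and nax.

    Assumes the last row is the final interval, so l_{x+n} is 0 there.
    """
    x, n, lx, nax = (np.asarray(a, dtype=float) for a in (x, n, lx, nax))
    lx_next = np.append(lx[1:], 0.0)   # l_{x+n}: alive at the start of the next interval
    nqx = 1 - lx_next / lx             # probability of dying in the interval
    ndx = lx - lx_next                 # number of deaths in the interval
    nLx = n * lx_next + nax * ndx      # person-years lived in the interval
    Tx = np.cumsum(nLx[::-1])[::-1]    # person-years lived from age x onward
    ex = Tx / lx                       # expectation of life at age x
    return {"x": x, "nqx": nqx, "ndx": ndx, "nLx": nLx, "Tx": Tx, "ex": ex}

# Hypothetical toy cohort (NOT the slave data): three 10-year age groups, radix 1,000
cols = life_table(x=[0, 10, 20], n=[10, 10, 10], lx=[1000, 800, 500], nax=[5, 5, 5])
print(cols["ex"])   # e_0 = 18.0, e_10 = 11.25, e_20 = 5.0
```

Reversing $_nL_x$, taking a cumulative sum, and reversing again is just one compact way of adding up all the person-years from each age onward to get $T_x$.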


```python
# Convert each 10-year survival probability into an annual one: p = surv5060^(1/10)
slavSurv.append_column('s5060an', slavSurv['surv5060'] ** (1/10))
slavSurv.show(5)
```


<h2>Using survival probabilities to find the $l_x$ column</h2>

This is the key to the whole life-table construction enterprise.

Let's begin with some assumptions:
<ol>
<li> Let's assume that the mortality experience over the 10 years from 1850 to 1860 is constant across years. This means that the probability of a newborn (say, in 1852) surviving up to her 5th birthday (in 1857) is the same as that of a different newborn (say, in 1854) surviving to her 5th birthday (in 1859). In other words: same ages, same survival probabilities, regardless of the year (between 1850 and 1860). What do you think of this assumption?


<li> Let's assume further that (to the extent possible) the mortality probability <i>within</i> age categories is also constant. In other words, the probability of surviving from one's 2nd birthday to one's 3rd is the same as from the 3rd to the 4th, BUT NOT the same as between the 6th and 7th (which falls within a different age category on our table). What do you think of this assumption?
</ol>
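Written out, the arithmetic these two assumptions imply (as this notebook applies them) looks like this. Here $s$ stands for a category's ten-year survival probability (`surv5060`), $p$ for the one-year survival probability within that category, and ${}_np_x$ (standard demographic notation, not used elsewhere in this notebook) for the probability of surviving the whole $n$-year category, so that ${}_nq_x = 1 - {}_np_x$:

$$
\begin{aligned}
p^{10} &= s \quad\Longrightarrow\quad p = s^{1/10} \\
{}_np_x &= p^{\,n} = s^{n/10} \\
l_{x+n} &= l_x \cdot {}_np_x = l_x \cdot s^{n/10}
\end{aligned}
$$

For example, the female 0 to 4 category has $s = 0.96641$ and $n = 5$, so ${}_5p_0 = 0.96641^{5/10} \approx 0.9831$.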

For our present purpose, we can model survival as a coin toss. Each person is born with a coin whose probability of heads (survival) changes with age, and she tosses it once a year to determine whether or not she will see her next birthday. Happily, in this case it is NOT a "fair" coin but rather a coin that lands "heads" most of the time. Suppose you are a 20-year-old female slave in 1850 (and you have the time and curiosity, and I guess the courage and a metaphysical imagination). You could toss that coin of survival 10 times, repeat this 1,000 times (that's 1,000 sets of 10 coin tosses), and count the sets that came up 10 heads in a row: you should expect about 781 of them, because the ten-year survival probability for females aged 20 to 29 is 0.780687.

Mathematically, then, we can ask: what probability of heads on a single toss makes 10 heads in a row come up 78.0687 percent of the time?

The probability of 10 consecutive heads from a coin where the probability of heads on each toss is $p$ is $p^{10}$, or $p*p*p*p*p*p*p*p*p*p$ (there should be 10 $p$'s in that expression). Setting $p^{10} = 0.780687$ gives $p = 0.780687^{1/10} \approx 0.9755$, the probability that a woman in this age category survives any given year.
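The coin-toss story can be checked by simulation. The sketch below, which assumes NumPy's `default_rng` random generator and an arbitrary seed of 0, tosses a coin with $p = 0.780687^{1/10}$ in 1,000 sets of 10 and reports the share of sets that came up all heads; that share should land near 0.78, with some run-to-run noise.

```python
import numpy as np

# Ten-year survival for females aged 20-29, from the slavSurv table above
ten_year_surv = 0.780687
# One-year probability of heads (survival) that makes 10 heads in a row this likely
p = ten_year_surv ** (1 / 10)

rng = np.random.default_rng(seed=0)    # arbitrary seed so the run is reproducible
n_sets, tosses_per_set = 1000, 10
# Each toss is "heads" (survive the year) with probability p
heads = rng.random((n_sets, tosses_per_set)) < p
all_heads = heads.all(axis=1)          # True for sets with 10 consecutive heads
share = all_heads.mean()               # fraction of sets that survived all 10 years

print(round(p, 4))   # 0.9755
print(share)         # near 0.78; the exact value depends on the seed
```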



```python
# Now to build that life table (males first)
tab = slavSurv.where('sex', 'MALES')

x = tab['loAge']                       # inclusive lower bound of each age category
n = tab['hiAge'] - tab['loAge'] + 1    # width of each age category
nax = .5 * n                           # assume those who die do so, on average, mid-interval

## Now to the lx column. This is the important one and will require some thoughtful assumptions.
## First, because we can't do anything better, let's assume that one-year probabilities of survival are
## constant across time and within each age category. This is a HUGE assumption which is most certainly
## wrong for infants and the very old. It's probably OK for most ages, given the 10 years between observations.
## What it means is that a person who is, say, 22 years old in 1850 has the same probability of dying in the
## next year as a person who is, say, 27 years old in 1850, AND the same probability of dying in the next year
## as a person who is, say, 25 years old in 1857. Got that?
## It means that the probability of a 20-year-old female slave surviving to age 21 is .780687^(1/10).
## Why? Because the probability of this person surviving 10 years is given in the slavSurv table as .780687,
## and to survive 10 years is to toss the coin 10 times.
## To beat this to death... consider a fair coin: what is the probability of tossing 2 heads in a row?
## Why, it's (1/2)^2 = 1/4. And how about tossing ten heads in a row? (1/2)^10 = 0.0009765625
annual_surv = tab['surv5060'] ** (1/10)   # one-year survival probability in each category
interval_surv = annual_surv ** n          # probability of surviving the whole n-year category

# l_x counts those alive at the START of each category, out of a radix of 10,000 births:
# the first category starts with everyone, and each later one starts with the survivors
# of all earlier categories (a running product, shifted down by one row)
radix = 10000
lx = radix * np.append(1, np.cumprod(interval_surv)[:-1])
lx
```


```python
# Look at the male rows as a pandas DataFrame
malSurv = slavSurv.where('sex', 'MALES')
ptab = malSurv.to_df()
ptab
```

```python
slavSurv = Table.read_table("SlaveSurvivalRates1850_60.csv").select(['sex','midAge','loAge','hiAge','surv5060'])
## Convert each 10-year survival rate into the probability of surviving the whole age interval:
## (annual survival)^n = surv5060^(n/10), where n = hiAge - loAge + 1
width = slavSurv.column('hiAge') - slavSurv.column('loAge') + 1
slavSurv.append_column('survAC', slavSurv.column('surv5060') ** (width / 10))
malSurv = slavSurv.where('sex', 'MALES')
femSurv = slavSurv.where('sex', 'FEMALES')

radix = 10000                      # size of the hypothetical birth cohort, l_0
tab = malSurv.copy()
loAges = tab.column('loAge')
survAC = tab.column('survAC')

c_lx = []; c_x = []
for i in range(tab.num_rows):
    if i == 0:
        lx = radix
    else:
        # survivors at the start of interval i = survivors at the start of
        # interval i-1 times the probability of surviving interval i-1
        lx = lx * survAC[i - 1]
    c_lx.append(lx)
    c_x.append(loAges[i])

c_nqx = []; c_ndx = []
for i in range(tab.num_rows):
    if i < tab.num_rows - 1:
        c_nqx.append(1 - (c_lx[i + 1] / c_lx[i]))
        c_ndx.append(c_lx[i] - c_lx[i + 1])
    else:
        # last age interval: by convention everyone still alive dies in it
        c_nqx.append(1)
        c_ndx.append(c_lx[i])
```
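The two loops above can also be written with NumPy array operations. This sketch uses made-up interval survival probabilities (the `survAC` values here are hypothetical, not the census figures) so that it runs on its own; it also makes visible that, for every row but the last, $_nq_x$ is simply $1 -$ `survAC`, and that the deaths column adds back up to the radix.

```python
import numpy as np

# Hypothetical interval survival probabilities for four age groups (illustrative only)
survAC = np.array([0.95, 0.90, 0.80, 0.60])
radix = 10000

# l_x: the first group starts with the whole radix; each later group starts with
# the survivors of every earlier group (a running product of survAC, shifted by one)
lx = radix * np.append(1.0, np.cumprod(survAC[:-1]))

# Survivors at the start of the next group; 0 after the last group (everyone dies)
lx_next = np.append(lx[1:], 0.0)
nqx = 1 - lx_next / lx
ndx = lx - lx_next

# Except in the last row, nq_x just recovers 1 - survAC
print(np.allclose(nqx[:-1], 1 - survAC[:-1]))   # True
```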
    
    