In [None]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

def css_styling():
    styles = open('../notebook_styles.css', 'r').read()
    return HTML(styles)
css_styling()

In [None]:
#Loading testing data
from client.api.notebook import Notebook 
lab02 = Notebook('lab02.ok')
_ = lab02.auth(inline=True)

# Lab 02 - Life and death around the world

## Introductions

**What is your partner's name?**

[ANSWER HERE]

**What year is your partner in? (Freshman, Sophomore, etc)**

[ANSWER HERE]

**What is your partner's favorite food?**

[ANSWER HERE]

### Getting started

In the last lab, we explored life tables for men and women in the 50 U.S. states.

Today, we're going to broaden our scope and take a look at mortality around the globe.  We'll use data from the [United Nations Population Division](https://population.un.org/wpp/). The UNPD produces estimates for demographic quantities for countries around the world.

Our goals are:

* TODO
* practice iteration
* practice filtering data

First, we'll read in a dataset that has life tables for all of the countries around the world in the time period 2010-2015:

In [None]:
unpd_2015 = Table.read_table('../data/UNPD/unpd_life_tables_2015_cleaned.csv')
unpd_2015

You can see that there are life tables for the time period from 2010 to 2015 for many countries in this dataset.  It will be useful to make a list that has the names of all of the countries in the dataset.

In [None]:
all_countries = np.unique(unpd_2015['area'])
all_countries

In [None]:
print("There are", len(all_countries), "countries in the dataset.")

We'll start by extracting life expectancy at birth for males and for females from each country. As we've discussed over the past couple of classes, life expectancy at birth is a widely used indicator for the mortality experience of a population.

Recall that in a standard life table, life expectancy at birth will be in the `e` column for age 0.

**Question: Make a dataset that has the row corresponding to age 0 for each country for males; then make another dataset that has the row corresponding to age 0 for each country for females.**

In [None]:
# males
lt_m_age0 = unpd_2015.where(...).where(...)

# females
lt_f_age0 = unpd_2015.where(...).where(...)

In [None]:
_ = lab02.grade('test_lt_age0')

Using the two datasets we just made, we'll now iterate through all of the countries in the list `all_countries` to build up a dataset that has the life expectancy for males and females in each country.
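
The accumulate-in-a-loop pattern described above can be sketched in plain Python. This is only an illustration: the rows and values are made up (they are not from the UNPD data), and the names `toy_age0_rows`, `toy_countries`, `toy_e0_m`, and `toy_e0_f` are hypothetical. The lab itself uses `Table.where` and `np.append` rather than lists of dictionaries.

```python
# Toy rows standing in for age-0 life table rows (hypothetical values).
toy_age0_rows = [
    {'area': 'Country A', 'sex': 'male', 'e': 70.0},
    {'area': 'Country A', 'sex': 'female', 'e': 75.0},
    {'area': 'Country B', 'sex': 'male', 'e': 60.0},
    {'area': 'Country B', 'sex': 'female', 'e': 63.5},
]

# Unique country names, sorted (similar in spirit to np.unique).
toy_countries = sorted({row['area'] for row in toy_age0_rows})

# Start with empty collections, then add one value per country.
toy_e0_m = []
toy_e0_f = []

for country in toy_countries:
    for row in toy_age0_rows:
        # Keep only the row for this country and sex.
        if row['area'] == country and row['sex'] == 'male':
            toy_e0_m.append(row['e'])
        if row['area'] == country and row['sex'] == 'female':
            toy_e0_f.append(row['e'])

print(toy_countries, toy_e0_m, toy_e0_f)
```

Because both lists are filled in the same country order, the i-th entry of each list refers to the same country, which is what lets them become columns of one table.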

**Question: Fill in the missing parts of the code below to build a dataset that has the male and female life expectancy at birth for all of the countries in the UNPD dataset.**

In [None]:
country_e0_m = make_array()
country_e0_f = make_array()

for country in ...:
    country_e0_m = np.append(country_e0_m,
                             lt_m_age0.where(...)['e'])
    country_e0_f = np.append(country_e0_f,
                             lt_f_age0.where(...)['e'])
    
e0_2015 = Table().with_columns('country', ...,
                                  'e0_m', country_e0_m,
                                  'e0_f', country_e0_f)
e0_2015

In [None]:
_ = lab02.grade('test_country_e0')

**Question: Write some code to determine which country had the highest and lowest life expectancy for males and for females; please report the country and the life expectancy itself.**

In [None]:
lowest_male_le = ...
highest_male_le = ...
lowest_female_le = ...
highest_female_le = ...

print("Highest Female e0:", highest_female_le)
print("Lowest Female e0:",  lowest_female_le)
print("Highest Male e0:", highest_male_le)
print("Lowest Male e0:", lowest_male_le)

ANSWER:  
Highest female e0: [ANSWER HERE]  
Lowest female e0: [ANSWER HERE]  
Highest male e0: [ANSWER HERE]  
Lowest male e0: [ANSWER HERE]  

**Question: Make a histogram that shows the distribution of life expectancy values around the world for males and for females.**  
*[NOTE: try to see if you can plot the distribution of males and females on the same plot, to most easily compare them. If you get stuck on this, you can make a separate plot for males and for females]*

In [None]:
...

**Question: Now make a scatterplot that compares life expectancy for males (x axis) and life expectancy for females (y axis).**

In [None]:
...

**Question: Looking at the two plots you just made, what do you conclude about the relationship between male and female life expectancy?**

[ANSWER HERE]

[SOLUTION]

* male and female life expectancy are very strongly related to one another: countries that have high life expectancy for males also have high life expectancy for females, and vice-versa
* in every single country, female life expectancy is higher than male life expectancy
* there appears to be more variation across countries in female life expectancy than male life expectancy

**Question: Fill in the code below to make a histogram of the difference between female and male e0.**

In [None]:
e0_mf_comp = Table().with_column('e0_f_minus_e0_m', ...)
e0_mf_comp.hist()

**Question: Fill in the code below to summarize the difference between female and male e0 in a few different ways.**

In [None]:
print("Avg diff between female and male e0: ", ...)
print("Std. deviation of diff between female and male e0: ", ...)
print("Minimum diff between female and male e0: ", ...)
print("Maximum diff between female and male e0: ", ...)

### The components of mortality

Life expectancy at birth gives us an aggregate picture of mortality across all ages. To understand mortality more deeply, demographers often look at different components that contribute to life expectancy. Here, we'll explore child mortality and adult mortality.

We'll start by looking at child mortality. There are many different ways you could imagine trying to summarize mortality at childhood ages; we'll look at a frequently-used indicator: the life table probability of death before age 5.

Before we get started looking at the data, let's pause for a second and check that we understand what we want to measure here.  The life table probability of death before age 5, called ${}_5q_0$ in demographic language, is

$$
{}_5q_0 = 1 - \frac{\text{# life table survivors to age 5}}{\text{# life table births}}
$$

Take this example of the life table for Mexican females:

In [None]:
unpd_2015.where('area', are.equal_to('Mexico')).where('sex', are.equal_to('female')).show()

**Question: Use the life table above to calculate child mortality for Mexican females by hand.**

[ANSWER HERE]

You should get an answer of about 0.02. This means that in the synthetic population described by the life table, there is about a 2% chance that a baby will die before reaching age 5.
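
As a small worked example of the arithmetic, here is a sketch that assumes the life table reports the number of survivors at each exact age (often written $l_x$). The function name `prob_death_before_5` and the numbers below are hypothetical; they are not taken from the UNPD data.

```python
def prob_death_before_5(l0, l5):
    """Return the probability of dying before age 5.

    l0: number of life table births (survivors at exact age 0)
    l5: number of survivors at exact age 5
    """
    # probability of death = 1 - probability of surviving to age 5
    return 1 - l5 / l0


# Hypothetical life table values (not from the UNPD data).
toy_l0 = 100000
toy_l5 = 97500

toy_5q0 = prob_death_before_5(toy_l0, toy_l5)

# The same quantity, computed as deaths before age 5 divided by births.
toy_deaths = toy_l0 - toy_l5
toy_5q0_from_deaths = toy_deaths / toy_l0

print(round(toy_5q0, 4), round(toy_5q0_from_deaths, 4))
```

Both routes give the same number, since the deaths before age 5 are exactly the births that did not survive to age 5.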

**Question: Now fill in the code below to calculate child mortality for Mexican females (the same thing you just calculated by hand).**

In [None]:
lt_data = unpd_2015.where('area', are.equal_to(...)).where('sex', are.equal_to(...))
1 - (lt_data[...][...] / lt_data[...][...])

You should get the same answer that you got when calculating by hand.  
  
Now that we know how to calculate child mortality for a specific example, let's write a function that will help us calculate it everywhere.

**Question: Fill in the function below, which calculates child mortality for a given life table**

In [None]:
def get_child_mortality(lt_data):
    """
    Given the data for a life table, calculate the life table probability of death between 
    ages 0 and age 5.
    
    NOTE: this assumes that the life table is sorted by age; in particular, the first row
    (index 0) should be age 0, and the third row (index 2) should be age group 5.
    (The UNPD dataset should satisfy this requirement.)
    """
    child_survival = ... / ...
    # the probability of death is 1 - the probability of surviving
    child_mortality = 1 - child_survival

    return child_mortality

# check on Mexican females
mx_f_lt = unpd_2015.where('area', are.equal_to('Mexico')).where('sex', are.equal_to('female'))
get_child_mortality(mx_f_lt)

OK, now we're ready to calculate child mortality for every country and both sexes.

**Question: Fill in the code for the loop below to calculate child mortality for all countries and for both sexes.**  
*[HINT: remember that `all_countries` has a list of all of the countries in the dataset]*

In [None]:
country_childmort_m = make_array()
country_childmort_f = make_array()

lt_m = unpd_2015.where('sex', are.equal_to('male'))
lt_f = unpd_2015.where('sex', are.equal_to('female'))

for country in all_countries:
    country_childmort_m = np.append(country_childmort_m,
                                    ...)
    country_childmort_f = np.append(country_childmort_f,
                                    ...)
    
childmort_2015 = Table().with_columns('country', ...,
                                      'childmort_m', ...,
                                      'childmort_f', ...)

childmort_2015.show()

In [None]:
_ = lab02.grade('test_childmort_2015')

**Question: Make a scatterplot comparing child mortality for males (x axis) and females (y axis).**

In [None]:
...

**Question: Make a histogram showing the distribution of male and female child mortality across countries.**  
*[NOTE: You should be able to plot the histograms in a single plot.]*

In [None]:
...

**Question: Looking at the two plots you just made, what do you conclude about male and female child mortality? How similar or different are they?**

[ANSWER HERE]

[SOLUTION]  
Similar to patterns in life expectancy, there is a very strong positive relationship between male and female child mortality: when a country has low child mortality for one sex, it also has low child mortality for the other sex. However, within a country, male and female child mortality are not the same: instead, female child mortality is consistently lower than male child mortality.

### Make a dataset with both mortality indicators

Now let's examine how child mortality relates to life expectancy. To do so, we'll assemble a dataset with `e0` and `childmort` in the same place.
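
To illustrate what combining two datasets on a shared key means, here is a plain-Python sketch with made-up values. The names `toy_e0`, `toy_childmort`, and `toy_joined` are hypothetical, and this is not how the `datascience` library implements `join`. It only shows the idea of matching rows by country and keeping the countries that appear in both inputs.

```python
# Two toy "tables", each keyed by country (hypothetical values).
toy_e0 = {'Country A': 72.5, 'Country B': 61.0, 'Country C': 80.1}
toy_childmort = {'Country A': 0.02, 'Country B': 0.09}

# Match rows on the country key; keep only countries found in both.
toy_joined = [
    {'country': c, 'e0': toy_e0[c], 'childmort': toy_childmort[c]}
    for c in sorted(toy_e0)
    if c in toy_childmort
]

for row in toy_joined:
    print(row)
```

Country C is dropped here because it has no child mortality value to match against, so it is worth checking the number of rows after any join.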

**Question: Use `join` to make a dataset that has `e0_m`, `e0_f`, `childmort_m`, and `childmort_f` for all countries in the UNPD database. Call your combined dataset `all_mort`.**

In [None]:
all_mort = ...
all_mort.show()

**Question: Make a scatterplot that compares female life expectancy (x axis) to child mortality for males and females (both on the y axis)**

In [None]:
...

**Question: How would you describe the relationship between life expectancy and child mortality?**

[ANSWER HERE]

## Look at life expectancy over time

Finally, we'll start to explore how life expectancy has changed over time in some countries. This will be important for a big topic that we'll discuss a little later in the semester: the Demographic Transition.

The dataset we've used so far has the UNPD life tables for the period 2010-2015.  Now we'll open up the full UNPD dataset, which has many time periods in it.

In [None]:
unpd = Table.read_table('../data/UNPD/unpd_life_tables_cleaned.csv')
unpd

The `period` column in the new dataset has the year that is the end of a 5-year time interval. So, for example, when `period` is equal to 1955, the row refers to the 5-year window from 1950-1955.
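
The convention can be spelled out with a small helper. This is a sketch only: `period_label` is a hypothetical function, not something provided by the dataset or the `datascience` library.

```python
def period_label(period_end, width=5):
    """Turn the end year of an interval into a 'start-end' label.

    period_end: the value stored in the `period` column (end of the interval)
    width: length of the interval in years (5 for this dataset)
    """
    start = period_end - width
    return f"{start}-{period_end}"


print(period_label(1955))
print(period_label(2015))
```

A label like this can be handy for plot titles, where "1950-1955" is less ambiguous than "1955".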

**Question: Make a variable that has the unique periods that show up in the `unpd` dataset.**  
*[HINT: Use the same pattern we used above to create the `all_countries` variable]*

In [None]:
all_periods = ...
all_periods

In [None]:
_ = lab02.grade('test_all_periods')

Recall that in a standard life table, life expectancy at birth is given by the entry in the `e` column for age 0. Therefore, we can get a dataset that has life expectancy for each country/sex/period by identifying the rows in the full dataset that correspond to age 0.

**Question: Fill in the code below to produce a dataset that has life expectancy at birth for each country/sex/period.**

In [None]:
e0_byperiod = unpd.where(...).select('area', 'sex', 'period', 'e')
e0_byperiod

In [None]:
_ = lab02.grade('test_e0_byperiod')

Above, we determined which countries had the highest and lowest life expectancies in 2010-2015.

**Question: Determine which countries had the highest and lowest male and female life expectancy in the period ending in 1960.**

In [None]:
e0_1960_m = e0_byperiod.where(...).where('sex', are.equal_to('male'))
e0_1960_f = e0_byperiod.where(...).where('sex', are.equal_to('female'))

lowest_male_le = e0_1960_m.sort(...).take(0)
highest_male_le = e0_1960_m.sort(..., descending=True).take(0)
lowest_female_le = e0_1960_f.sort(...).take(0)
highest_female_le = e0_1960_f.sort(..., descending=True).take(0)

print("1955-60 Highest Female e0:", highest_female_le)
print("1955-60 Lowest Female e0:",  lowest_female_le)
print("1955-60 Highest Male e0:", highest_male_le)
print("1955-60 Lowest Male e0:", lowest_male_le)

ANSWER:  
Highest female e0: [ANSWER HERE]  
Lowest female e0: [ANSWER HERE]  
Highest male e0: [ANSWER HERE]  
Lowest male e0: [ANSWER HERE]  

**Question: Make a scatterplot that shows time (x axis) and life expectancy for females (y axis) for the following countries:**

* **Japan**
* **United States of America**
* **Mali**
* **China**

**(So this will be 4 plots in total)**

In [None]:
...

In [None]:
...

In [None]:
...

In [None]:
...

**Question: Do these countries show similar patterns in life expectancy over time? What do you conclude about how mortality has changed over the past 60 or so years?**

[ANSWER HERE]

**Question: Based on these plots, would you say that inequality in mortality has increased, decreased, or stayed the same over the past 60 or so years?**

[ANSWER HERE]

## Run all tests

This cell re-runs all of the unit tests in the notebook to summarize the results.

In [None]:
# this cell runs all the tests at once!
print("Running all tests...")
_ = [lab02.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('test')]
print("Finished running all tests.")

### SUBMIT your assignment by MIDNIGHT on the day of class

Please submit your lab by running the cell below. You can submit as many times as you want, up to midnight on the day of the class. **No late submissions are allowed**, and the system will prevent you from being able to submit late.

In [None]:
_ = lab02.submit()

## Optional challenge problems: Exploring adult mortality

In this lab, we looked at life expectancy and child mortality. Of course, mortality at adult ages is also important. One indicator that captures adult mortality is the life table probability of dying before age 60, among people who survive to age 15.
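
The definition above is a conditional probability, and its arithmetic mirrors the child mortality calculation. The sketch below assumes the life table reports survivors at each exact age; the function name `prob_death_between` and the numbers are hypothetical and are not taken from the UNPD data.

```python
def prob_death_between(l_start, l_end):
    """Probability of dying between two ages, among those alive at the first.

    l_start: number of survivors at the starting age (here, age 15)
    l_end: number of survivors at the ending age (here, age 60)
    """
    # Conditioning on survival to the starting age means dividing by l_start.
    return 1 - l_end / l_start


# Hypothetical survivor counts (not from the UNPD data).
toy_l15 = 96000
toy_l60 = 81600

toy_adult_mortality = prob_death_between(toy_l15, toy_l60)
print(round(toy_adult_mortality, 4))
```

Note that the denominator is the number of survivors at age 15, not the number of births, because the indicator only considers people who reach age 15.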

**Question: Using the model of child mortality, see if you can write code that calculates adult mortality for each country in the 2010-2015 period.**

In [None]:
...

**Question: Compare the indicators of child and adult mortality: which one tends to be higher? Which one tends to vary more from country to country?**

In [None]:
...

### Don't forget to SUBMIT your assignment by MIDNIGHT on the day of class

If you attempted the challenge questions, great! Be sure to submit afterwards using the instructions in the cell above.