In [None]:
# Initialize OK
from client.api.notebook import Notebook
ok = Notebook('hwk01.ok')

In [129]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('fivethirtyeight')
import os
def css_styling():
    # Read the stylesheet and close the file when done
    with open('../notebook_styles.css', 'r') as css_file:
        styles = css_file.read()
    return HTML(styles)
css_styling()

# Hwk 01 - Mortality

In this homework, we'll continue our exploration of the United Nations Population Division's [estimated life tables](https://esa.un.org/unpd/wpp/Download/Standard/Mortality/).  We'll draw upon the techniques we learned in Lab 01 (where we studied US life tables) and Lab 02 (where we studied the UN life tables).

We'll start by opening up the UNPD life table data for 2015:

In [130]:
unpd_2015 = Table.read_table('../data/UNPD/unpd_life_tables_2015_cleaned.csv')
unpd_2015

We'll also make a list with the names of all of the countries, like we did in Lab 02.

In [131]:
all_countries = np.unique(unpd_2015['area'])
all_countries

## Part 01 - Looking at life table functions

In Lab 01, we saw in US state life tables that the log of the mortality rate had a characteristic shape. In this first section, we'll check to see if this phenomenon seems to apply to life tables from other parts of the world.
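As a reminder of why the log scale is helpful, here is a minimal sketch using made-up death rates (the numbers and the names `toy_rates`, `toy_log_rates`, and `log_steps` are illustrative assumptions, not values from the UNPD data). Rates that grow by a constant factor from one age to the next become evenly spaced after taking the natural log, so exponential growth in mortality shows up as a straight line on a log plot.

```python
import math

# Hypothetical death rates that double with each age step (illustrative only).
toy_rates = [0.001, 0.002, 0.004, 0.008]

# Natural log of each rate.
toy_log_rates = [math.log(r) for r in toy_rates]

# Differences between consecutive log rates are constant (each equals log 2).
log_steps = [b - a for a, b in zip(toy_log_rates, toy_log_rates[1:])]
print(log_steps)
```

With a NumPy array, `np.log` applies the same transformation to every element at once.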

**Question 1: Add a column to `unpd_2015` that has the log of the death rate.**

<!--
BEGIN QUESTION
name: q_1
points: 1
-->

In [132]:
unpd_2015['log_death_rate'] = ...
unpd_2015

In [None]:
ok.grade("q_1");

**Question 2. Plot age (x axis) and the log mortality rate (y axis) for women in (a) Japan; (b) Mali; (c) Thailand; and (d) Brazil. [So, you will make four different plots.]**

**Question 2a - Japan**
<!--
BEGIN QUESTION
name: q_2a
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

In [134]:
unpd_2015.where('area', ...).where('sex', ...).plot(..., ...)

**Question 2b - Mali**
<!--
BEGIN QUESTION
name: q_2b
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

In [135]:
# Mali
unpd_2015.where('area', ...).where('sex', ...).plot(..., ...)

**Question 2c - Thailand**
<!--
BEGIN QUESTION
name: q_2c
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

In [136]:
# Thailand
unpd_2015.where('area', ...).where('sex', ...).plot(..., ...)

**Question 2d - Brazil**
<!--
BEGIN QUESTION
name: q_2d
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

In [137]:
# Brazil
unpd_2015.where('area', ...).where('sex', ...).plot(..., ...)

**Question 3. Do these four countries show the same general shape of log mortality rates that we saw for California in Lab 01?**
<!--
BEGIN QUESTION
name: q_3
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

## Part 02 - Adult mortality

In Lab 02, we looked at life expectancy at birth and at an indicator of child mortality called ${}_5q_0$. Now we'll extend our analysis to an indicator of adult mortality called ${}_{45}q_{15}$.  ${}_{45}q_{15}$ is the life table probability of death before age 60, *conditional on surviving to age 15*. So, ${}_{45}q_{15}$ can be written

$$
{}_{45}q_{15} = \frac{\text{# life table deaths between ages 15 and 60}}{\text{# life table survivors to age 15}}
$$

Note that, unlike child mortality, the denominator is **not** the life table number of births; instead, the denominator is the life table number of people who survive to age 15.
Similar to child mortality, we can also write ${}_{45}q_{15}$ as one minus a survival probability:

$$
{}_{45}q_{15} = 1 - \frac{\text{# life table survivors to 60}}{\text{# life table survivors to age 15}}.
$$

This second expression will turn out to be a bit more useful in practice.
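To see that the two expressions agree, here is a small worked sketch. The survivor counts below are invented for illustration (they are not taken from any UNPD life table), and the variable names are hypothetical.

```python
# Hypothetical life table survivors at two ages; illustrative numbers only.
survivors_to_15 = 95000
survivors_to_60 = 76000

# First expression: deaths between ages 15 and 60 divided by survivors to 15.
deaths_15_to_60 = survivors_to_15 - survivors_to_60
q_from_deaths = deaths_15_to_60 / survivors_to_15

# Second expression: one minus the probability of surviving from 15 to 60.
q_from_survival = 1 - survivors_to_60 / survivors_to_15

print(q_from_deaths, q_from_survival)
```

Both values equal 0.2 up to floating-point rounding. The second form needs only the two survivor counts, which can be read directly from two rows of a life table.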


Before we get started looking at the data, let's pause for a second and check that we understand what we want to measure here.  Take this example of the life table for Kenyan males:

In [138]:
unpd_2015.where('area', are.equal_to('Kenya')).where('sex', are.equal_to('male')).show()

**Question: Use the life table above to calculate adult mortality for Kenyan males by hand.**
<!--
BEGIN QUESTION
name: q_lt_adultmort_kenya_m
points: 1
-->

In [139]:
kenya_m_adultmort = ... # just type the numbers you need to calculate this in by hand (from the life table above)
kenya_m_adultmort

In [None]:
ok.grade("q_lt_adultmort_kenya_m");

You should get an answer of about 0.282. This means that in the synthetic population described by the life table, someone who survives to age 15 has about a 28% chance of dying before reaching age 60.

**Question: Look at the rows in the life table. What is the index of the row corresponding to (a) age 15? (b) age 60?**  

*[HINT: Remember that indexes start at zero; so, for example, the second row has index 1]*
<!--
BEGIN QUESTION
name: q_ffqf_index
points: 2
-->

In [141]:
index_for_age_15 = ...
index_for_age_60 = ...

In [None]:
ok.grade("q_ffqf_index");

**Question: Now fill in the code below to calculate adult mortality for Kenyan males (the same thing you just calculated by hand).**
<!--
BEGIN QUESTION
name: q_ffqf_kenya_m
points: 2
-->

In [144]:
lt_data = unpd_2015.where('area', are.equal_to(...)).where('sex', are.equal_to(...))
kenya_m_ffqf = 1 - (lt_data[...][...] / lt_data[...][...])
kenya_m_ffqf

In [None]:
ok.grade("q_ffqf_kenya_m");

You should get the same answer that you got when calculating by hand.  
  
Now that we know how to calculate adult mortality for a specific example, let's write a function that will help us calculate it everywhere.

**Question - Fill in the code below to make a function that calculates adult mortality for a given life table.**
<!--
BEGIN QUESTION
name: q_ffqf_kenya_m_fn
points: 3
-->

In [146]:
def get_adult_mortality(lt_data):
    """
    Given the data for a life table, calculate the life table probability of dying between
    ages 15 and 60, conditional on surviving to age 15.
    
    NOTE: this assumes that the life table is sorted by age
    """
    adult_survival = lt_data[...][...] / lt_data[...][...]
    adult_mortality = ...
    
    return adult_mortality

# check on Kenyan males
kenya_m_lt = unpd_2015.where('area', are.equal_to('Kenya')).where('sex', are.equal_to('male'))
kenya_m_am = get_adult_mortality(kenya_m_lt)
print("adult mortality for Kenyan males: ", kenya_m_am)

In [None]:
ok.grade("q_ffqf_kenya_m_fn");

Now that we have a function, we can calculate adult mortality for every country in the UNPD database by writing a loop.
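The general pattern is to start with an empty collection and add one value per pass through the loop. The sketch below shows that pattern with plain Python lists and made-up numbers; the areas, survivor counts, and the helper `toy_adult_mortality` are all hypothetical stand-ins, not the homework's data or function.

```python
# Hypothetical survivors to ages 15 and 60 for three made-up areas.
toy_survivors = {
    "Area A": (96000, 84000),
    "Area B": (90000, 63000),
    "Area C": (98000, 88200),
}

def toy_adult_mortality(l15, l60):
    """One minus the probability of surviving from age 15 to age 60."""
    return 1 - l60 / l15

areas = sorted(toy_survivors)
results = []
for area in areas:
    l15, l60 = toy_survivors[area]
    # Grow the result list by one value per pass through the loop.
    results.append(toy_adult_mortality(l15, l60))

for area, value in zip(areas, results):
    print(area, round(value, 3))
```

One difference to keep in mind: `list.append` changes the list in place, while `np.append` returns a new array, which is why the result of `np.append` has to be assigned back to the variable.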

**Question - Fill in the missing parts of the code below to calculate adult mortality for all of the countries in the UNPD database.**
<!--
BEGIN QUESTION
name: q_adultmort
points: 4
-->

In [148]:
country_adultmort_m = make_array()
country_adultmort_f = make_array()

lt_m = unpd_2015.where('sex', are.equal_to('male'))
lt_f = unpd_2015.where('sex', are.equal_to('female'))


for country in all_countries:
    country_adultmort_m = np.append(country_adultmort_m,
                                    ...)
    country_adultmort_f = np.append(country_adultmort_f,
                                    ...)
    
adultmort_2015 = Table().with_columns('country', ...,
                                      'adultmort_m', ...,
                                      'adultmort_f', ...)

adultmort_2015.show()

In [None]:
ok.grade("q_adultmort");

## Part 03 - Which economic and social factors are related to mortality?

Recall that in Lab 02, we compared life expectancy and child mortality. Now that we've calculated adult mortality, we can add it to our analysis.

To avoid having to rerun all of the analysis from Lab 02, I've saved the important results (life expectancy and child mortality) in a dataset. We'll load that dataset now.

In [151]:
all_mort = Table.read_table('lab02_all_mort.csv')
all_mort

`all_mort` has some of the results we calculated in Lab 02: for each country, it has life expectancy (`e0_m` and `e0_f`) and child mortality. Now we want to add the adult mortality values.
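Conceptually, a join matches rows from two tables on a shared key column and combines their other columns. The sketch below imitates that idea with plain Python dictionaries and invented values; it is only an illustration of the concept, not how the `datascience` library implements `join`, and all names in it are hypothetical.

```python
# Two hypothetical tables stored as lists of rows (dicts); values are made up.
toy_mort = [
    {"country": "Area A", "e0_f": 80.0},
    {"country": "Area B", "e0_f": 65.0},
    {"country": "Area C", "e0_f": 72.0},
]
toy_adult = [
    {"country": "Area A", "adultmort_f": 0.05},
    {"country": "Area C", "adultmort_f": 0.12},
]

# Index the second table by its key column for fast lookup.
adult_by_country = {row["country"]: row for row in toy_adult}

# Keep only rows whose key appears in both tables, merging their columns.
joined = [
    {**row, **adult_by_country[row["country"]]}
    for row in toy_mort
    if row["country"] in adult_by_country
]
print(joined)
```

In this sketch, Area B is dropped because it has no match in the second table, so it is worth comparing row counts before and after a join.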

**Question - Use `join` to add the adult mortality indicators you calculated (stored in `adultmort_2015`) to the `all_mort` dataset.**
<!--
BEGIN QUESTION
name: q_adult_join
points: 2
-->

In [152]:
all_mort = ...
all_mort.show()

In [None]:
ok.grade("q_adult_join");

**Question - Make a scatter plot that compares female life expectancy (x axis) and female adult mortality (y axis).**
<!--
BEGIN QUESTION
name: q_fffqf_vs_fe0
points: 2
manual: True
image: True
-->
<!-- EXPORT TO PDF -->

In [155]:
...

**Question - Now make a scatter plot that compares female child mortality (x axis) and female adult mortality (y axis)**
<!--
BEGIN QUESTION
name: q_fffqf_vs_ffq0
points: 2
manual: True
image: True
-->
<!-- EXPORT TO PDF -->

In [156]:
...

**Question - Thinking about the two plots above, which relationship is stronger? In other words, if you wanted to predict a country's female adult mortality, would you rather use female life expectancy or female child mortality?**
<!--
BEGIN QUESTION
name: q_twoplots
points: 1
manual: True
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

### Join in development indicators

The World Bank periodically estimates many indicators related to different aspects of international development. We'll look at three here:

1. [GDP per capita (Current US $)](https://datacatalog.worldbank.org/search?search_api_views_fulltext_op=AND&query=NY.GDP.PCAP.CD&nid=&sort_by=search_api_relevance&sort_by=search_api_relevance)
2. [Health Expenditure, Total (% of GDP)](https://datacatalog.worldbank.org/search?search_api_views_fulltext_op=AND&query=SH.XPD.TOTL.ZS&nid=&sort_by=search_api_relevance&sort_by=search_api_relevance)
3. [Percentage of secondary school students that is female](http://databank.worldbank.org/data/reports.aspx?source=world-development-indicators)

Note that these indicators are for the single year 2012, in order to approximate the midpoint of the 2010-2015 time period of the UNPD life tables.

We'll read these indicators in now.

In [157]:
wdi_dat = Table.read_table('../data/WB/wdi_health_2015_cleaned.csv')
wdi_dat

Now we'd like to add the economic/development indicators to the mortality dataset. This will enable us to study how the economic/development indicators are related to mortality.

**Question - Join the economic/development indicators (`wdi_dat`) onto the mortality data (`all_mort`); call the resulting table `all_mort_econ`.**  
*[HINT: the column 'country' in `all_mort` can be matched with the column called 'area' in `wdi_dat`]*
<!--
BEGIN QUESTION
name: q_econ_join
points: 2
-->

In [158]:
all_mort_econ = ...
all_mort_econ

In [None]:
ok.grade("q_econ_join");

Now that we have the data together, we'll make plots that compare life expectancy to the three social/economic indicators.

**Question - make a scatter plot that shows GDP on the x axis and life expectancy (for males and females) on the y axis**
<!--
BEGIN QUESTION
name: q_gdp_e0
points: 3
manual: True
image: True
-->
<!-- EXPORT TO PDF -->

In [161]:
...

**Question - make a scatter plot that shows health expenditure (`hlthexp`) on the x axis and life expectancy (for males and females) on the y axis**
<!--
BEGIN QUESTION
name: q_hlthexp_e0
points: 3
manual: True
image: True
-->
<!-- EXPORT TO PDF -->

In [162]:
...

**Question - make a scatter plot that shows the percentage of female secondary school students (`pctf_secondary`) on the x axis and life expectancy (for males and females) on the y axis**
<!--
BEGIN QUESTION
name: q_secondaryed_e0
points: 3
manual: True
image: True
-->
<!-- EXPORT TO PDF -->

In [163]:
...

**Question - For each of the three scatterplots, describe whether you think it suggests a strong relationship, weak relationship, or no relationship to life expectancy**
<!--
BEGIN QUESTION
name: q_e0_relationships
points: 2
manual: True
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

**Question - Suppose a new technology is discovered that allows very poor countries to increase their GDP dramatically.  Does the scatterplot of GDP and life expectancy above tell us anything about what would happen to life expectancy in those countries?**
<!--
BEGIN QUESTION
name: q_gdp_relationships
points: 2
manual: True
-->
<!-- EXPORT TO PDF -->

*Write your answer here, replacing this text.*

The scatterplot you just made comparing GDP and life expectancy is sometimes called the **Preston Curve**. It's famous! (Well, famous in some circles.)

### Run all tests

This cell re-runs all of the unit tests in the notebook to summarize the results.

In [164]:
# this cell runs all the tests at once!
print("Running all tests...")
if os.path.isdir("tests"):
    _ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q_')]
print("Finished running all tests.")

## SUBMIT your assignment by the deadline

Please submit your homework by running the cell below. You can submit as many times as you want, up to midnight on the day of the deadline.

# Submit
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output.
**Please save before submitting!**

<!-- EXPECT 13 EXPORTED QUESTIONS -->

In [None]:
# Save your notebook first, then run this cell to submit.
import jassign.to_pdf
jassign.to_pdf.generate_pdf('hwk01.ipynb', 'hwk01.pdf')
ok.submit()