In [None]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

def css_styling():
    # read the shared stylesheet, closing the file once it has been read
    with open('../notebook_styles.css', 'r') as f:
        styles = f.read()
    return HTML(styles)
css_styling()

In [None]:
#Loading testing data
from client.api.notebook import Notebook 
hwk03 = Notebook('hwk03.ok')
_ = hwk03.auth(inline=True)

# Hwk 03 - The demographic transition since 1950

## Part I - Background [10 pts]

The first set of questions refers to the Population Reference Bureau's report, [The World at 7 Billion](https://www.prb.org/wp-content/uploads/2011/07/world-at-7-billion.pdf). Please read the report and then answer the following questions **in about two sentences** each. (Each question is worth 2 points.)

**Question - According to the PRB Bulletin, developing countries had very high population growth rates in the second half of the 20th century. Why?**

[ANSWER HERE]

**Question - According to the PRB Bulletin, why does Uganda have a youthful age structure?**

[ANSWER HERE]

**Question - According to the PRB Bulletin, why is interpreting demographic trends in India more complex than it is for a country like Vietnam?**

[ANSWER HERE]

**Question - According to the PRB report, deaths have exceeded births in Germany since about 1972. Nonetheless, Germany has managed to avoid rapid population decline. Why?**

[ANSWER HERE]  

**Question - The PRB report divides the world's countries into four groups according to the stage of the demographic transition they are in. Which stage of the demographic transition covers the largest share of the world population?**

[ANSWER HERE]

## Part II - Looking at data [15 pts]

In this part of the homework, we'll dive into the UNPD data to examine the last 60 or so years of demographic change.

In addition to the reading you did for the first part of the homework, you can look at the [slides from lecture](https://docs.google.com/presentation/d/1DWktoPIhzQByRfruyIkqrJKtlP__79E9fsThni4JeIA/edit?usp=sharing) to review the main points of the demographic transition.

**Load the UNPD mortality data**

First, we'll read in a dataset that has life tables for all of the countries around the world in the time period 2010-2015:

In [None]:
unpd_lt = Table.read_table('../data/UNPD/unpd_life_tables_cleaned.csv')
unpd_lt

You can see that there are life tables for the time period from 2010 to 2015 for many countries in this dataset.

**Load the UNPD fertility data**

Next, we'll read in a couple of datasets that have estimated fertility measures for all of the countries in the world (we used these datasets in Lab 03). 

We'll start by loading estimated Total Fertility Rates (TFRs):

In [None]:
unpd_tfr = Table.read_table('../data/UNPD/unpd_tfr_cleaned.csv')
unpd_tfr

Now we'll extract life expectancy at birth for males and for females from each country. As we've discussed over the past couple of classes, life expectancy at birth is a widely used indicator for the mortality experience of a population.

**Question - Write some code that will produce a dataset that has the columns `area`, `period`, `sex`, `age`, and `e` (where `e` is life expectancy at birth).**  
*[HINT: You should only need to use `where` and `select`]*

In [None]:
unpd_e0 = unpd_lt....
unpd_e0

In [None]:
_ = hwk03.grade('test_unpd_e0')

**Question - Make a list that has all of the periods in the `unpd_lt` dataset.**

In [None]:
all_periods = ...
all_periods

In [None]:
_ = hwk03.grade('test_all_periods')
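
A column of period labels usually repeats each period once for every country, sex, and age group, so building a list of periods means dropping the duplicates. Here is a minimal sketch in plain Python, using made-up labels. The `toy_periods` values and the `unique_in_order` helper are hypothetical; they are not part of the UNPD data or the `datascience` package.

```python
def unique_in_order(values):
    """Return the distinct items of `values`, keeping first-seen order."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(values))

# Hypothetical period labels, repeated the way they would be across many rows
toy_periods = [1950, 1950, 1955, 1955, 1955, 1960]
print(unique_in_order(toy_periods))  # [1950, 1955, 1960]
```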

**Question - Fill in the code below to produce a function that calculates female child mortality from the life table for the given country and year.**  
*[HINT: We did something similar to this in one of the Labs]*

In [None]:
def get_f_child_mortality(country, year):
    """
    Given a country and year, look up the female life table and calculate the life table
    probability of death between ages 0 and 5.
    
    NOTE: this assumes that the life table is sorted by age; in particular, the first row
    (index 0) should be age 0, and the third row (index 2) should be age group 5.
    (The UNPD dataset should satisfy this requirement.)
    """
    
    lt_data = ...
    
    # calculate the life table probability of a child surviving from birth to age 5
    child_survival = ...
    
    # the probability of death is 1 - the probability of surviving
    child_mortality = 1 - child_survival

    return child_mortality

france_5q0_2010 = get_f_child_mortality('France', 2010)
france_5q0_2010

In [None]:
_ = hwk03.grade('test_get_f_child_mortality')
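
The docstring above describes child mortality as one minus the probability of surviving from birth to age 5, which in life table notation is $1 - l_5 / l_0$. Here is a small numeric sketch, assuming a made-up survivorship column. The `toy_lx` values and the `prob_death_before_age_5` helper are hypothetical, and the rows follow the ordering the docstring assumes (age 0 at index 0, age 5 at index 2).

```python
def prob_death_before_age_5(lx):
    """Probability of dying between birth and age 5, given survivors by age.

    Assumes lx[0] is the number alive at age 0 and lx[2] is the number
    alive at age 5, matching the row order described in the docstring.
    """
    survival_to_5 = lx[2] / lx[0]
    return 1 - survival_to_5

# Made-up survivorship values for ages 0, 1, 5, 10 (not real UNPD numbers)
toy_lx = [100000, 97000, 95000, 94500]
print(round(prob_death_before_age_5(toy_lx), 3))  # 0.05
```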

In practice, we'll actually want to grab all of the child mortality estimates over time for a particular country. Let's write a function to make that a bit easier.

**Question - Fill in the code below to produce a function that returns child mortality for a particular country over all of the periods in the UNPD data.**

In [None]:
def get_child_mortality_over_time(country):
    result = Table(labels=['area', 'period', 'child_mort'])
    
    for period in ...:
        ## cur_child_mort should have child mortality for the current
        ## country and time period        
        cur_child_mort = ...

        ## this builds up a Table with the results, row by row        
        result = result.with_row([country, period, cur_child_mort])
    
    return result

swe_5q0 = get_child_mortality_over_time('Sweden')
swe_5q0

In [None]:
_ = hwk03.grade('test_get_f_child_mortality_over_time')

Finally -- we're almost there! -- let's write a function that will return info about fertility, life expectancy, and child mortality for a given country over all of the time periods in the UNPD data.

**Question - Fill in the code below to produce a function that returns a Table that has TFR, life expectancy, and child mortality over time for a given country and sex.**  
*[HINT: you should use the `unpd_tfr` and `unpd_e0` datasets, and you'll also need the `get_child_mortality_over_time` function. You'll also use `join`.]*

In [None]:
def get_trends(country):
    
    ## get TFR over time
    tfr_dat = ...
    
    ## get e0 over time
    e0_dat = ...
    
    ## get child mortality over time
    child_mort_dat = ...
    
    ## combine TFR, e0, and child mortality into a single Table
    country_dat = tfr_dat....
    
    ## keep only the columns we need
    country_dat = country_dat.select('area', 'period', 'tfr', 'e', 'child_mort')
    
    return country_dat
    
france_trends = get_trends('France')
france_trends

In [None]:
_ = hwk03.grade('test_get_trends')

OK, now we have written some tools that will help us easily explore the demographic transition over the past 60 or so years.

**Question - If the demographic transition theory were exactly right, what would you expect to see happen to (a) mortality and (b) fertility in a country whose transition started around 1950?**

[ANSWER HERE]

**Question - When thinking about the demographic transition, we've mainly discussed overall mortality (which is captured here by life expectancy). In a second, we'll look at child mortality as well as life expectancy. How would you predict that child mortality will factor into the demographic transition? Do you think that it will (a) not be related to changes in life expectancy or fertility; (b) play the same role as life expectancy (i.e., change in the same way as life expectancy); or (c) play a different role from life expectancy? Why?**

[ANSWER HERE]

This helper function takes the data produced by `get_trends` (which you just wrote) and plots it nicely. Here's an example showing the trends for Kenya.

In [None]:
def plot_trends(country):
    dat = get_trends(country)
    dat.plot('period', 'e')
    plt.title(country + ' - female life expectancy');
    dat.plot('period', 'tfr')
    plt.title(country + ' - TFR');
    dat.plot('period', 'child_mort')
    plt.title(country + ' - child mortality');

plot_trends('Kenya')

**Question - Use `plot_trends` to plot the trends for the four countries discussed in the PRB report that you read for Part I.**


In [None]:
...
...
...
...

**Question - Now that you have seen the data, would you say that the prediction you made about the role of child mortality was accurate? Why or why not?**

[ANSWER HERE]

### Growth

**Question - According to the demographic transition theory, if a country started a demographic transition around 1950 and finished around 2010, what would you expect to see happen to an indicator of population growth such as RONI or the intrinsic growth rate $r$ over that period?**

[ANSWER HERE]

First, we'll open up the UNPD crude birth rate estimates:

In [None]:
unpd_cbr = Table.read_table('../data/UNPD/unpd_cbr_cleaned.csv').select(['area', 'period', 'cbr'])
unpd_cbr

And then we'll open up the UNPD crude death rate estimates:

In [None]:
unpd_cdr = Table.read_table('../data/UNPD/unpd_cdr_cleaned.csv').select(['area', 'period', 'cdr'])
unpd_cdr

The `join` function in the datascience package can only join on one column at once. Here, we need to line cbr and cdr up by country and by year simultaneously. So we'll have to turn to the `pandas` package to do this. (Don't worry if you haven't seen this before - I don't think it is covered in Data 8.)

In [None]:
# match rows on both country (area) and period
unpd_roni = pd.merge(unpd_cbr.to_df(), unpd_cdr.to_df(), on=['area', 'period'])
unpd_roni = Table.from_df(unpd_roni)
unpd_roni
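
To see what a merge on two key columns does, here is a small sketch with made-up data. The `toy_cbr` and `toy_cdr` frames and their values are hypothetical, and only the column names mirror the UNPD tables above. When `on` is omitted, `pd.merge` joins on the columns the two frames have in common; listing the keys explicitly states the matching rule outright.

```python
import pandas as pd

# Made-up rates per 1,000 population (not real UNPD values)
toy_cbr = pd.DataFrame({
    'area': ['A', 'A', 'B', 'B'],
    'period': [1950, 1955, 1950, 1955],
    'cbr': [40.0, 38.0, 20.0, 18.0],
})
toy_cdr = pd.DataFrame({
    'area': ['A', 'A', 'B', 'B'],
    'period': [1950, 1955, 1950, 1955],
    'cdr': [20.0, 17.0, 10.0, 10.0],
})

# Rows pair up only when area AND period both agree
toy_merged = pd.merge(toy_cbr, toy_cdr, on=['area', 'period'])
print(toy_merged)
```

The result has one row per area-period pair, with `cbr` and `cdr` side by side.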

**Question - Fill in the code below to calculate the rate of natural increase (RONI) for each country-year.**

In [None]:
unpd_roni = unpd_roni.with_column('roni',
                                  ...)
unpd_roni

In [None]:
_ = hwk03.grade('test_roni')
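
The rate of natural increase is conventionally defined as the crude birth rate minus the crude death rate. Here is a sketch with made-up rates, assuming both are expressed per 1,000 population. The `natural_increase` helper and the numbers are hypothetical and are not taken from the UNPD data.

```python
def natural_increase(cbr, cdr):
    """Rate of natural increase: births minus deaths, in the units of the inputs."""
    return cbr - cdr

# Made-up crude birth and death rates per 1,000 population
toy_cbr = [40.0, 38.0, 12.0]
toy_cdr = [20.0, 17.0, 13.0]

toy_roni = [natural_increase(b, d) for b, d in zip(toy_cbr, toy_cdr)]
print(toy_roni)  # [20.0, 21.0, -1.0]
```

A negative value, as in the last entry, means deaths outnumber births in that period.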

It will be helpful to write a helper function to grab the time series of RONI values for a specific country.

**Question - Fill in the code below to produce a function that returns a Table with the RONI values for a given country over time.**

In [None]:
def get_roni_over_time(country):
    this_roni_data = ...
    return this_roni_data

ug_roni = get_roni_over_time('Uganda')
ug_roni

In [None]:
_ = hwk03.grade('test_roni_over_time')

Finally, this function uses the `get_roni_over_time` function you just wrote to plot RONI values over time.

In [None]:
def plot_roni(country):
    rd = get_roni_over_time(country)
    rd.plot('period', 'roni')
    plt.title(country + " RONI");
    
plot_roni("Kenya")

**Question - Use `plot_roni` to plot the RONI time series for each of the four countries discussed in the PRB Report you read in Part I.**

In [None]:
...
...
...
...

**Question - Do these time trends in RONI seem consistent with the stage of the demographic transition each country is in?**

[ANSWER HERE]

## Run all tests

This cell re-runs all of the unit tests in the notebook to summarize the results.

In [None]:
# this cell runs all the tests at once!
print("Running all tests...")
_ = [hwk03.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('test')]
print("Finished running all tests.")
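
The list comprehension above drops the last three characters of each file name, which removes a `.py` extension. Here is a slightly more defensive sketch of the same idea, assuming the test files are named `test_*.py`. The `grade_targets` helper and the `toy_listing` file names are hypothetical.

```python
import os

def grade_targets(filenames):
    """Turn test file names into the names that would be passed to grade()."""
    return [os.path.splitext(f)[0] for f in filenames
            if f.startswith('test') and f.endswith('.py')]

# Made-up directory listing, including files that should be skipped
toy_listing = ['test_roni.py', 'test_unpd_e0.py', '__init__.py', 'test_notes.txt']
print(grade_targets(toy_listing))  # ['test_roni', 'test_unpd_e0']
```

Filtering on the extension skips stray files that happen to start with `test`.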

### SUBMIT your assignment

Please submit your assignment by running the cell below. You can submit as many times as you want, up to the due date and time. **No late submissions are allowed**, and the system will prevent you from submitting late.

In [None]:
_ = hwk03.submit()