In [1]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

def css_styling():
    styles = open('../notebook_styles.css', 'r').read()
    return HTML(styles)
css_styling()

In [2]:
#Loading testing data
from client.api.notebook import Notebook 
hwk04 = Notebook('hwk04.ok')
_ = hwk04.auth(inline=True)

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu


# Hwk 04

## Part I: Population growth and changes in mortality and fertility

In Labs 05 and 06, we took a look at formal demography and population projections.  We discovered, among other things, that both fertility and mortality can affect the intrinsic growth rate: when fertility increases, the growth rate increases; and, when mortality decreases, the growth rate increases.

In this part of the homework, we're going to (1) be sure we understand what different growth rates imply about population size over time; and (2) use the tools we developed in class to understand the relative importance of changes in fertility and mortality. In other words, we're going to try to determine which one makes a bigger difference to the growth rate.

As we did in Lab 6, we'll load the `leslie` module to make use of the various population projection functions.

In [3]:
import leslie

We have added a new function, called `build_leslie_matrix`, to the `leslie` module. `build_leslie_matrix` takes two arguments:

* `lt`, the first argument, is a lifetable, such as would be returned by the function `get_lt`
* `asfr`, the second argument, is a set of age-specific fertility rates, such as would be returned by the function `get_asfr`

Here is an example of `build_leslie_matrix` in action. The following code makes a Leslie matrix using (1) death rates for 1990 France (i.e. the 1990 France life table); and (2) fertility rates for 2000 Italy.

In [4]:
demo_lt = leslie.get_lt('France', 1990)
demo_asfr = leslie.get_asfr('Italy', 2000)

demo_lm = leslie.build_leslie_matrix(demo_lt, demo_asfr)
demo_lm.shape

(17, 17)

You'll remember that before we've only made Leslie matrices based on the real-world experience of a particular country in a particular year. The nice thing about `build_leslie_matrix` is that we can give it any life table and fertility data we like -- we're not restricted to fertility and mortality that actually happened.

What we're going to do now is to learn about whether changes in mortality or fertility have a bigger impact on the growth rate by constructing Leslie matrices for (i) a baseline scenario; (ii) fertility rates that are increased by 10%; (iii) mortality rates that are decreased by 10%. (We will look at mortality decreases and fertility increases since both with push the growth rate up, leading to more population growth.)

## What affects the growth rate more: mortality or fertility?

**Question - Grab the life tables and the fertility rates for 1990 Uganda and use `build_leslie_matrix` to make a Leslie matrix out of them.**

In [None]:
...
...

uganda_90_lm = leslie.build_leslie_matrix(..., ...)
uganda_90_lm

In [18]:
_ = hwk04.grade('test_uganda_lm')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



**Question - calculate the growth rate associated with the Leslie matrix you just created?**   
*[HINT: the `leslie` package has a function called `get_growth_rate` based on the one you used in Lab 05]*.

In [None]:
uganda_90_r = ...
uganda_90_r

In [24]:
_ = hwk04.grade('test_uganda_r')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Does the growth rate you just calculated imply (a) long-term population growth; (b) no long-term population change; or (c) long-term population decline?**

[ANSWER HERE]

Great - so we have our baseline scenario. Now we'll investigate whether the growth rate gets changed more by increasing birth rates by 10 percent, or by lowering death rates by 10 percent.

**Question - Make a copy of the 1990 Uganda age-specific fertility rate data and then change the fertility rates (the `asfr` column) to make them 10% higher.**   

In [None]:
uganda_90_high_asfr = uganda_90_asfr.copy()
uganda_90_high_asfr = ...
uganda_90_high_asfr

In [36]:
_ = hwk04.grade('test_uganda_high_asfr')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Make a Leslie matrix out of the increased Uganda ASFRs and Uganda's 1990 life table.**

In [None]:
uganda_high_fert_lm = ...
uganda_high_fert_lm

In [43]:
_ = hwk04.grade('test_uganda_high_asfr_lm')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Now calculate the growth rate that results from increasing fertility rates by 10 percent.**

In [None]:
uganda_high_fert_r = ...
uganda_high_fert_r

In [45]:
_ = hwk04.grade('test_uganda_high_fert_r')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



Now we'll turn to the final scenario, in which death rates are reduced by 10%.  This is a bit more complex than changing fertility rates: in the life table, when death rates change, that causes all of the other columns to change as well. So we've written a function that will take a life table and change the death rates (and all of the other columns) for you:

In [46]:
def change_lt_death_rates(lt, change, radix=100000):
    """
    Given a life table `lt`, such as would be returned by `get_lt()`,
    change the death rates by multiplying them by `change`
    """
    
    ## NOTE: this function assumes the a column does not change
    ## this is an approximation - a better approach would use graduation
    
    new_lt = lt.copy()

    # update m
    new_lt['death_rate'] = new_lt['death_rate'] * change

    # update q, assuming a stays the same (an approximation!)
    new_lt['q'] = (new_lt['age_interval_width']*new_lt['death_rate']) / \
                  (1 + (new_lt['age_interval_width'] - new_lt['a'])*new_lt['death_rate'])
    new_lt['q'][-1] = 0

    new_lt['p'] = 1 - new_lt['q']

    new_lt['l'][0] = radix

    for i in np.arange(start=1, stop=new_lt.num_rows):
        new_lt['l'][i] = new_lt['l'][i-1] * new_lt['p'][i-1]

    new_lt['d'] = np.append(new_lt['l'][:-1] - new_lt['l'][1:], new_lt['l'][-1])

    new_lt['L'] = np.append((new_lt['l'][1:]*new_lt['age_interval_width'][:-1]) + \
                                (new_lt['d'][:-1]*new_lt['a'][:-1]),\
                                new_lt['l'][-1]/new_lt['death_rate'][-1])
    
    new_lt['T'] = np.flip(np.cumsum(new_lt['L']), axis=0)
    
    new_lt['e'] = new_lt['T']/new_lt['l']
    
    return(new_lt)

## EXAMPLE: increase death rates for 2015 Canada by 10%
new_canada_lt = change_lt_death_rates(leslie.get_lt('Canada', 2015), 1.1)
new_canada_lt

row,variant,area,notes,country_code,period,age,age_interval_width,death_rate,q,p,l,d,L,S,T,e,a,sex
59148,Estimates,Canada,,124,2015,0,1,0.0047751,0.00475383,0.995246,100000.0,475.383,99554.6,0.995446,8270400.0,82.704,0.0629702,female
59149,Estimates,Canada,,124,2015,1,4,0.0001584,0.000633351,0.999367,99524.6,63.034,397942.0,0.99947,7906830.0,79.446,1.51697,female
59150,Estimates,Canada,,124,2015,5,5,8.14e-05,0.000406917,0.999593,99461.6,40.4726,497207.0,0.99958,7596450.0,76.3757,2.5,female
59151,Estimates,Canada,,124,2015,10,5,0.0001034,0.000516866,0.999483,99421.1,51.3874,496977.0,0.999229,7222630.0,72.6469,2.5,female
59152,Estimates,Canada,,124,2015,15,5,0.0002607,0.00130273,0.998697,99369.7,129.452,496556.0,0.998634,6806450.0,68.4962,2.73707,female
59153,Estimates,Canada,,124,2015,20,5,0.0003234,0.00161572,0.998384,99240.3,160.345,495809.0,0.998497,6362380.0,64.1109,2.55533,female
59154,Estimates,Canada,,124,2015,25,5,0.000341,0.00170359,0.998296,99079.9,168.791,494990.0,0.998222,5900140.0,59.5493,2.57117,female
59155,Estimates,Canada,,124,2015,30,5,0.0004565,0.00228003,0.99772,98911.1,225.521,494021.0,0.997549,5426010.0,54.8574,2.63108,female
59156,Estimates,Canada,,124,2015,35,5,0.0006424,0.00320719,0.996793,98685.6,316.504,492690.0,0.996312,4943870.0,50.0971,2.66689,female
59157,Estimates,Canada,,124,2015,40,5,0.001023,0.005103,0.994897,98369.1,501.977,490692.0,0.993952,4456440.0,45.3032,2.70101,female


**Question - Use `change_lt_death_rates` to produce a new lifetable with Uganda's 1990 death rates decreased by 10%.**

In [None]:
uganda_90_low_mort = ...
uganda_90_low_mort

In [53]:
_ = hwk04.grade('test_uganda_low_mort_lt')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Make a Leslie matrix out of the decreased death rates and Uganda's 1990 fertility rates.**

In [None]:
uganda_low_mort_lm = ...
uganda_low_mort_lm

In [57]:
_ = hwk04.grade('test_uganda_low_mort_lm')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



**Question - what is the growth rate that results from reducing fertility rates by 10 percent?**

In [None]:
uganda_low_mort_r = ...
uganda_low_mort_r

In [59]:
_ = hwk04.grade('test_uganda_low_mort_r')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - what made a bigger difference to the growth rate: reducing mortality rates or increasing fertility rates?**

[ANSWER HERE]

## Part II: Synthesizing in- and out-migration

In Lab 07, we looked at migration within the United States. We came up with two ways to measure migration for a particular county: the in-migration rate and the out-migration rate.

In this part of the homework, we're going to extend this analysis by trying to synthesize in-migration and out-migration for a given county.

I've created a library that has the code that we developed in Lab 07 for convenience; it's called `mig`. Let's load it now:

In [60]:
import mig

This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.



(Don't worry if you get a warrning and some red text from matplotlib.)   
The `mig` library has the `map_counties` function, which we'll use a little later.

In Lab 07, we made `in_migrants` and `out_migrants` Tables. We saved versions of these files to avoid having to repeat all of the analysis from Lab 07 for this homework. Let's load the files now:

In [61]:
out_migrants = Table.read_table('out_migrants.csv')
out_migrants

fips,num_out_migrants,county_name,state,pop_2015,omr
1001,1542,Autauga County,AL,54838,0.0281192
1003,2823,Baldwin County,AL,202863,0.0139158
1005,318,Barbour County,AL,26264,0.0121078
1007,464,Bibb County,AL,22561,0.0205665
1009,1444,Blount County,AL,57590,0.0250738
1011,120,Bullock County,AL,10419,0.0115174
1013,179,Butler County,AL,20141,0.00888734
1015,2091,Calhoun County,AL,115505,0.0181031
1017,962,Chambers County,AL,33968,0.0283208
1019,421,Cherokee County,AL,25741,0.0163552


In [62]:
in_migrants = Table.read_table('in_migrants.csv')
in_migrants

fips,num_in_migrants,county_name,state,pop_2015,imr
1001,1794,Autauga County,AL,54838,0.0327145
1003,3521,Baldwin County,AL,202863,0.0173565
1005,294,Barbour County,AL,26264,0.011194
1007,494,Bibb County,AL,22561,0.0218962
1009,1501,Blount County,AL,57590,0.0260636
1011,61,Bullock County,AL,10419,0.00585469
1013,150,Butler County,AL,20141,0.0074475
1015,1794,Calhoun County,AL,115505,0.0155318
1017,921,Chambers County,AL,33968,0.0271138
1019,438,Cherokee County,AL,25741,0.0170157


Recall that, for out migrants, `omr` is the out migration rate and for in-migrants, `imr` is the in-migration rate.

### Net migration rates

One way to synthesize in- and out-migration rates is to define a *net migration rate*:

$$
\text{Net-Migration Rate (NMR)} = \frac{\text{# people moving into county} - \text{# people moving out of county}}{\text{# people in county}}.
$$

(You'll notice that this is also equal to the in-migration rate minus the out-migration rate.)

The net migration rate is helpful because it tells us whether, overall, migration is increasing or decreasing the county population. Counties with a negative net migration rate are losing residents, while counties with a positive net migration rate are gaining residents.


**Question - Join the `out_migrants` and `in_migrants` tables together to produce a table that has `omr` and `imr` for each county.**

In [None]:
net_migrants = ...
net_migrants

In [70]:
_ = hwk04.grade('test_create_net_migrants')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Add a column to the `net_migrants` table with the net migration rate; you should call the new column `nmr`.**

In [None]:
net_migrants = ...
net_migrants

In [77]:
_ = hwk04.grade('test_nmr')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Make a histogram that shows the distribution of net migration rates across all of the counties in the dataset.**

In [None]:
...

Your histogram should reveal that most counties have net migration rates that are pretty close to zero, and that the distribution is pretty symmetric.

Now we'll try to understand which countries have high and low net migration rates.

**Question - which three counties have the lowest net migration rates?**

In [None]:
...

[ANSWER HERE]

**Question - which counties have highest net migration rates?**

In [None]:
...

[ANSWER HERE]

**Question - Make a scatterplot that compares each county's population (x axis) to its net migration rate (y axis)**

In [None]:
...

**Question - What does the scatterplot lead you to conclude about the relationship between how extreme net migration rates are and the size of each county?**

[ANSWER HERE]

**Question - Make a map of net migration rates**   
*[NOTE: some counties which have little data will not show up on your map]*

In [None]:
...

### Population turnover 

The net migration rate tells us how in-migration and out-migration balance out to affect population size. But there is another way to synthesize in- and out-migration rates that we will call *population turnover*:

$$
\text{Population turnover rate (PTR)} = \frac{\text{# people moving into county} + \text{# people moving out of county}}{\text{# people in county}}.
$$

(You'll notice that this is also equal to the in-migration rate plus the out-migration rate.)

The population turnover rate tells us how much movement there is into and out of the county, without worrying about whether this movement ends up increasing or decreasing the size of the population.

**Question - Give an example of a situation in which two counties might have the same net migration rate of 0, but different population turnover rates.**

[ANSWER HERE]

**Question - Add a new column to your `net_migrants` dataset that has population turnover; call the column `ptr`.**

In [None]:
net_migrants = ...
net_migrants

In [81]:
_ = hwk04.grade('test_ptr')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - Make a map of population turnover rates**   
*[NOTE: some counties which have little data will not show up on your map]*

In [None]:
...

Let's see if there appears to be any relationship between population turnover and net migration: do places with more change also tend to experience a net loss or gain of people?

**Question - Make a scatterplot comparing the population turnover rate (x axis) and the net migration rate (y axis) across all counties in the datset.**

In [None]:
...

It can be helpful to summarize the relationship shown in the scatterplot above with a single number. We'll use the correlation coefficient to do so. We haven't talked about the correlation coefficient in our class, but it was discussed in Data 8; briefly, the correlation coefficient summarizes the strength of the linear relationship between two variables: when the correlation coefficient is close to -1, the two variables have a very strong negative relationship; when the correlation coefficient is close to +1, the two variables have a very strong positive relationship; and when the correlation coefficients is near 0, the two variables are not related.

We'll use code from the Data 8 textbook to help calculate the correlation coefficient:

In [82]:
## from Data 8 textbook:
### https://www.inferentialthinking.com/chapters/15/2/Regression_Line.html
def standard_units(xyz):
    "Convert any array of numbers to standard units."
    return (xyz - np.mean(xyz))/np.std(xyz)  

def correlation(t, label_x, label_y):
    return np.mean(standard_units(t.column(label_x))*standard_units(t.column(label_y)))

**Question - Calculate the correlation between the net migration rate and the population turnover rate.**

In [None]:
nmr_ptr_corr = ...
nmr_ptr_corr

In [84]:
_ = hwk04.grade('test_corr_nmr_ptr')

Assignment: L&S88-02: Hwk 04
OK, version v1.12.5

Successfully logged in as feehan@berkeley.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



**Question - What does the correlation coefficient suggest about the relationship between turnover and net migration across US counties?**

[ANSWER HERE]

Note that a more careful analysis, which we'll have to leave for the future, would try to account for the fact that smaller counties tend to have more extreme NMR and PTR values.

## Run all tests

This cell just re-runs all of the unit tests in the notebook, to summarize the results

In [85]:
# this cell runs all the tests at once!
print("Running all tests...")
_ = [hwk04.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('test')]
print("Finished running all tests.")

Running all tests...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running t

### SUBMIT your assignment

Please submit your lab in by running the cell below. You can submit as many times as you want, up to the due date and time. **No late submissions are allowed**, and the system will prevent you from being able to submit late.

In [None]:
_ = hwk04.submit()