In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hwk03.ipynb")

In [1]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

# Hwk 03

## Part I: Population growth and changes in mortality and fertility

In Labs 05 and 06, we looked at formal demography and population projections. Among other things, we found that both fertility and mortality affect the intrinsic growth rate: when fertility increases, the growth rate increases, and when mortality decreases, the growth rate also increases.

In this part of the homework, we're going to (1) be sure we understand what different growth rates imply about population size over time; and (2) use the tools we developed in class to understand the relative importance of changes in fertility and mortality. In other words, we're going to try to determine which one makes a bigger difference to the growth rate.
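
As a reminder of what a growth rate implies about population size over time, here is a minimal sketch that assumes a constant, continuous (exponential) growth rate r, so that the population follows P(t) = P(0)·e^(rt). The starting population and the rates are made-up illustrative values, and `project_population` and `doubling_time` are hypothetical helpers, not part of the `leslie` module.

```python
import numpy as np

def project_population(p0, r, years):
    """Population after `years` under a constant continuous growth rate `r`."""
    return p0 * np.exp(r * years)

def doubling_time(r):
    """Years needed for the population to double when r > 0 (ln 2 / r)."""
    return np.log(2) / r

# Hypothetical starting population of 1 million
for r in [-0.01, 0.0, 0.01, 0.03]:
    size = project_population(1e6, r, 50)
    print(f"r = {r:+.2f}: population after 50 years = {size:,.0f}")

print(f"Doubling time at r = 0.03: {doubling_time(0.03):.1f} years")
```

A negative r shrinks the population, r = 0 leaves it unchanged, and even a modest positive r compounds quickly.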

As we did in Lab 06, we'll load the `leslie` module to make use of its population projection functions.

In [2]:
import leslie

We have added a new function, called `build_leslie_matrix`, to the `leslie` module. `build_leslie_matrix` takes two arguments:

* `lt`, the first argument, is a life table, such as would be returned by the function `get_lt`
* `asfr`, the second argument, is a set of age-specific fertility rates, such as would be returned by the function `get_asfr`
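
Recall that a Leslie matrix holds fertility contributions in its first row and survival proportions on its subdiagonal, and that multiplying it by a population vector projects the population forward one step. This minimal toy sketch uses three age groups and made-up numbers; `make_leslie` is a hypothetical helper and does not reproduce how `build_leslie_matrix` derives its entries from a life table and ASFRs.

```python
import numpy as np

# Hypothetical inputs for three age groups (illustrative values only)
fertility = np.array([0.0, 0.9, 0.4])   # first-row entries: births per person per step
survival = np.array([0.95, 0.85])       # share of each group surviving into the next group

def make_leslie(fertility, survival):
    """Put fertility in the first row and survival on the subdiagonal."""
    k = len(fertility)
    A = np.zeros((k, k))
    A[0, :] = fertility
    A[np.arange(1, k), np.arange(0, k - 1)] = survival
    return A

A = make_leslie(fertility, survival)
pop = np.array([1000.0, 800.0, 600.0])

# Project three steps ahead: each step is one matrix-vector product
for step in range(3):
    pop = A @ pop
    print(step + 1, pop.round(1))
```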

Here is an example of `build_leslie_matrix` in action. The following code makes a Leslie matrix using (1) death rates for 1990 France (i.e. the 1990 France life table); and (2) fertility rates for 2000 Italy.

In [3]:
demo_lt = leslie.get_lt('France', 1990)
demo_asfr = leslie.get_asfr('Italy', 2000)

demo_lm = leslie.build_leslie_matrix(demo_lt, demo_asfr)
demo_lm.shape

You'll remember that, until now, we have only made Leslie matrices from the real-world experience of a particular country in a particular year. The nice thing about `build_leslie_matrix` is that we can give it any life table and any set of fertility rates we like; we're not restricted to fertility and mortality that actually happened.

What we're going to do now is learn whether changes in mortality or fertility have a bigger impact on the growth rate by constructing Leslie matrices for (i) a baseline scenario; (ii) fertility rates that are increased by 10%; and (iii) mortality rates that are decreased by 10%. (We will look at mortality decreases and fertility increases since both will push the growth rate up, leading to more population growth.)
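
The logic of the three scenarios can be sketched on a toy example. This minimal sketch assumes the long-run growth rate is r = ln(λ)/n, where λ is the dominant eigenvalue of the Leslie matrix and n is the projection step in years; that is one standard approach, but `leslie.get_growth_rate` may compute r differently. Survival is derived from hypothetical death rates m as exp(-n·m), a simplification that is not how `change_lt_death_rates` works. All numbers and the helpers `toy_leslie` and `growth_rate` are made up for illustration.

```python
import numpy as np

STEP_YEARS = 5  # assumed width of each age group and projection step, in years

def growth_rate(A, step_years=STEP_YEARS):
    """Long-run growth rate r = ln(dominant eigenvalue) / step length."""
    dominant = np.max(np.abs(np.linalg.eigvals(A)))
    return np.log(dominant) / step_years

def toy_leslie(fert, death_rates, step_years=STEP_YEARS):
    """Toy Leslie matrix: fertility in row 0, survival exp(-n * m) on the subdiagonal."""
    k = len(fert)
    A = np.zeros((k, k))
    A[0, :] = fert
    # survival from group i to group i + 1 uses group i's death rate
    survival = np.exp(-step_years * death_rates[:-1])
    A[np.arange(1, k), np.arange(0, k - 1)] = survival
    return A

# Hypothetical baseline inputs for four age groups
fert = np.array([0.0, 0.6, 0.5, 0.1])
m = np.array([0.02, 0.01, 0.015, 0.04])

scenarios = {
    "baseline": toy_leslie(fert, m),
    "fertility +10%": toy_leslie(fert * 1.1, m),
    "mortality -10%": toy_leslie(fert, m * 0.9),
}
for name, A in scenarios.items():
    print(f"{name:>15}: r = {growth_rate(A):.4f}")
```

In this toy version mortality only enters the subdiagonal; in a real Leslie matrix the first row also depends on survival, so which change matters more for 1990 Uganda is something you need to compute.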

### What affects the growth rate more: mortality or fertility?

**Question - Grab the life table and the fertility rates for 1990 Uganda and use `build_leslie_matrix` to make a Leslie matrix out of them.**
<!--
BEGIN QUESTION
name: q_uganda_build_leslie
points: 3
-->

In [4]:
...
...

uganda_90_lm = leslie.build_leslie_matrix(..., ...)
uganda_90_lm

In [None]:
grader.check("q_uganda_build_leslie")

**Question - Calculate the growth rate associated with the Leslie matrix you just created.**   
*[HINT: the `leslie` package has a function called `get_growth_rate` based on the one you used in Lab 05]*.
<!--
BEGIN QUESTION
name: q_uganda_r
points: 1
-->

In [7]:
uganda_90_r = ...
uganda_90_r

In [None]:
grader.check("q_uganda_r")

<!-- BEGIN QUESTION -->

**Question - Does the growth rate you just calculated imply (a) long-term population growth; (b) no long-term population change; or (c) long-term population decline?**
<!--
BEGIN QUESTION
name: q_comment_uganda_r
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



Great, so we have our baseline scenario. Now we'll investigate whether the growth rate changes more when we increase birth rates by 10 percent or when we lower death rates by 10 percent.

**Question - Make a copy of the 1990 Uganda age-specific fertility rate data and then change the fertility rates (the `asfr` column) to make them 10% higher.**   
<!--
BEGIN QUESTION
name: q_uganda_high_asfr
points: 2
-->

In [9]:
uganda_90_high_asfr = uganda_90_asfr.copy()
...
uganda_90_high_asfr

In [None]:
grader.check("q_uganda_high_asfr")

**Question - Make a Leslie matrix out of the increased Uganda ASFRs and Uganda's 1990 life table.**
<!--
BEGIN QUESTION
name: uganda_high_asfr_lm
points: 2
-->

In [11]:
uganda_high_fert_lm = ...
uganda_high_fert_lm

In [None]:
grader.check("uganda_high_asfr_lm")

**Question - Now calculate the growth rate that results from increasing fertility rates by 10 percent.**
<!--
BEGIN QUESTION
name: q_uganda_high_fert_r
points: 2
-->

In [14]:
uganda_high_fert_r = ...
uganda_high_fert_r

In [None]:
grader.check("q_uganda_high_fert_r")

Now we'll turn to the final scenario, in which death rates are reduced by 10%. This is a bit more complex than changing fertility rates: in a life table, changing the death rates changes all of the other columns as well. So we've written a function that takes a life table, changes its death rates, and recomputes all of the other columns for you:

In [16]:
def change_lt_death_rates(lt, change, radix=100000):
    """
    Given a life table `lt`, such as would be returned by `get_lt()`,
    multiply its death rates by `change` and recompute the other columns.
    `radix` is the starting size of the life table cohort (l at age 0).
    """
    
    ## NOTE: this function assumes the `a` column (average person-years lived
    ## in the interval by those who die in it) does not change; this is an
    ## approximation, and a better approach would use graduation
    
    new_lt = lt.copy()

    # update m (the death rates)
    new_lt['death_rate'] = new_lt['death_rate'] * change

    # update q, assuming a stays the same (an approximation!)
    new_lt['q'] = (new_lt['age_interval_width']*new_lt['death_rate']) / \
                  (1 + (new_lt['age_interval_width'] - new_lt['a'])*new_lt['death_rate'])
    new_lt['q'][-1] = 0

    new_lt['p'] = 1 - new_lt['q']

    # rebuild survivors l from the radix, one age interval at a time
    new_lt['l'][0] = radix

    for i in np.arange(start=1, stop=new_lt.num_rows):
        new_lt['l'][i] = new_lt['l'][i-1] * new_lt['p'][i-1]

    # deaths in each interval; everyone alive in the open-ended last interval dies there
    new_lt['d'] = np.append(new_lt['l'][:-1] - new_lt['l'][1:], new_lt['l'][-1])

    # person-years lived: n*l(x+n) + a*d(x), and l/m for the open-ended last interval
    new_lt['L'] = np.append((new_lt['l'][1:]*new_lt['age_interval_width'][:-1]) + \
                                (new_lt['d'][:-1]*new_lt['a'][:-1]),\
                                new_lt['l'][-1]/new_lt['death_rate'][-1])
    
    # T(x) sums L from age x to the end of the table, i.e. a *reverse* cumulative sum
    new_lt['T'] = np.flip(np.cumsum(np.flip(new_lt['L'], axis=0)), axis=0)
    
    # life expectancy at each age
    new_lt['e'] = new_lt['T']/new_lt['l']
    
    return new_lt

## EXAMPLE: increase death rates for 2015 Canada by 10%
new_canada_lt = change_lt_death_rates(leslie.get_lt('Canada', 2015), 1.1)
new_canada_lt
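
The column updates in `change_lt_death_rates` follow standard life-table identities. This minimal NumPy sketch replays them on a hypothetical three-age-group table (made-up death rates, a 5-year interval width, and `a` at the interval midpoint), using plain arrays rather than a `datascience` Table. Unlike the function above, which sets the last q to 0, it sets the last q to 1, the usual convention for an open-ended age group; this choice does not affect l, d, L, T, or e. A handy check is that d/L recovers the death rates in every closed interval.

```python
import numpy as np

n = 5.0                                # width of each closed age interval (years)
m = np.array([0.010, 0.030, 0.200])    # hypothetical death rates; last group is open-ended
a = np.array([2.5, 2.5])               # avg. years lived in interval by those dying (closed intervals)
radix = 100_000

# m -> q with a held fixed; everyone alive in the open-ended last group dies there
q = n * m[:-1] / (1 + (n - a) * m[:-1])
q = np.append(q, 1.0)
p = 1 - q

# survivors l: start from the radix and apply p one interval at a time
l = radix * np.concatenate([[1.0], np.cumprod(p[:-1])])

# deaths d, then person-years L (l / m for the open-ended last group)
d = np.append(l[:-1] - l[1:], l[-1])
L = np.append(n * l[1:] + a * d[:-1], l[-1] / m[-1])

# T(x) sums L from age x to the end: reverse the array, cumulate, reverse back
T = np.flip(np.cumsum(np.flip(L)))
e = T / l
print(np.round(e, 2))
```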

**Question - Use `change_lt_death_rates` to produce a new life table with Uganda's 1990 death rates decreased by 10%.**
<!--
BEGIN QUESTION
name: q_uganda_low_mort_lt
points: 2
-->

In [17]:
uganda_90_low_mort = ...
uganda_90_low_mort

In [18]:
# Check your work: the `l` column of your new life table should match these values
expected_l = np.array([100000.        ,  90404.77226289,  84020.22998037,  81512.80757699,
                        80148.28013978,  78597.21928117,  76512.43986196,  73780.81185709,
                        70778.20669182,  67418.65487262,  64028.97850365,  60586.81967234,
                        56724.82073579,  52201.82195324,  46391.54729107,  38784.81996299,
                        29262.78369323,  18933.37636218,   9576.92948656])
np.all(np.isclose(uganda_90_low_mort.sort('age').column('l'), expected_l, rtol=.005))

**Question - Make a Leslie matrix out of the decreased death rates and Uganda's 1990 fertility rates.**
<!--
BEGIN QUESTION
name: q_uganda_low_mort_lm
points: 2
-->

In [19]:
uganda_low_mort_lm = ...
uganda_low_mort_lm

In [None]:
grader.check("q_uganda_low_mort_lm")

**Question - What is the growth rate that results from reducing mortality rates by 10 percent?**
<!--
BEGIN QUESTION
name: q_uganda_low_mort_r
points: 2
-->

In [22]:
uganda_low_mort_r = ...
uganda_low_mort_r

In [23]:
np.isclose(uganda_low_mort_r, 0.034288990578645963, rtol=.005)

<!-- BEGIN QUESTION -->

**Question - What made a bigger difference to the growth rate: reducing mortality rates or increasing fertility rates?**
<!--
BEGIN QUESTION
name: q_comment_uganda_mortvsfert
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## Part II: Synthesizing in- and out-migration

In Lab 07, we looked at migration within the United States. We came up with two ways to measure migration for a particular county: the in-migration rate and the out-migration rate.

In this part of the homework, we're going to extend this analysis by trying to synthesize in-migration and out-migration for a given county.

For convenience, I've put the code we developed in Lab 07 into a library called `mig`. Let's load it now:

In [27]:
import mig

(Don't worry if you get a warning and some red text from matplotlib.)   
The `mig` library has the `map_counties` function, which we'll use a little later.

In Lab 07, we made `in_migrants` and `out_migrants` Tables. We saved them as CSV files so that you don't have to repeat all of the analysis from Lab 07 for this homework. Let's load the files now:

In [29]:
out_migrants = Table.read_table('data/out_migrants.csv')
out_migrants

In [30]:
in_migrants = Table.read_table('data/in_migrants.csv')
in_migrants

Recall that, for out-migrants, `omr` is the out-migration rate, and for in-migrants, `imr` is the in-migration rate.

### Net migration rates

One way to synthesize in- and out-migration rates is to define a *net migration rate*:

$$
\text{Net migration rate (NMR)} = \frac{\text{number of people moving into county} - \text{number of people moving out of county}}{\text{number of people in county}}.
$$

(You'll notice that this is also equal to the in-migration rate minus the out-migration rate.)

The net migration rate is helpful because it tells us whether, overall, migration is increasing or decreasing the county population. Counties with a negative net migration rate are losing residents, while counties with a positive net migration rate are gaining residents.
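
The join-then-subtract workflow can be sketched with pandas on hypothetical data. The homework itself uses `datascience` Tables, whose `join` method plays the role of `merge` here; the county names, rates, and the `direction` column below are illustrative assumptions, not the contents of `data/in_migrants.csv` and `data/out_migrants.csv`.

```python
import pandas as pd

# Hypothetical per-county rates (migrants per resident)
out_mig = pd.DataFrame({"county": ["A", "B", "C"], "omr": [0.05, 0.10, 0.02]})
in_mig = pd.DataFrame({"county": ["A", "B", "C"], "imr": [0.08, 0.06, 0.02]})

# Match rows on the shared county key, then NMR = IMR - OMR
net = out_mig.merge(in_mig, on="county")
net["nmr"] = net["imr"] - net["omr"]

# Positive NMR: migration adds residents; negative NMR: migration removes them
net["direction"] = net["nmr"].apply(
    lambda r: "gaining" if r > 0 else ("losing" if r < 0 else "no net change")
)
print(net)
```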


**Question - Join the `out_migrants` and `in_migrants` tables together to produce a table that has `omr` and `imr` for each county.**
<!--
BEGIN QUESTION
name: q_create_net_migrants
points: 2
-->

In [32]:
net_migrants = ...
net_migrants

In [None]:
grader.check("q_create_net_migrants")

**Question - Add a column to the `net_migrants` table with the net migration rate; you should call the new column `nmr`.**
<!--
BEGIN QUESTION
name: q_nmr
points: 3
-->

In [36]:
net_migrants = ...
net_migrants

In [None]:
grader.check("q_nmr")

<!-- BEGIN QUESTION -->

**Question - Make a histogram that shows the distribution of net migration rates across all of the counties in the dataset.**
<!--
BEGIN QUESTION
name: q_nmr_hist
points: 2
manual: true
-->

In [78]:
...

<!-- END QUESTION -->



Your histogram should reveal that most counties have net migration rates close to zero, and that the distribution is fairly symmetric.

Now we'll try to understand which counties have high and low net migration rates.

**Question - Write some code to show the rows in `net_migrants` corresponding to the three counties that have the lowest net migration rates.**
<!--
BEGIN QUESTION
name: q_bottom3_nmr
points: 3
-->

In [44]:
bottom3_nmr = ...
bottom3_nmr

In [None]:
grader.check("q_bottom3_nmr")

_Type your answer here, replacing this text._

**Question - Write some code to show the rows in `net_migrants` corresponding to the three counties that have the highest net migration rates.**
<!--
BEGIN QUESTION
name: q_top3_nmr
points: 3
-->

In [52]:
top3_nmr = ...
top3_nmr

In [None]:
grader.check("q_top3_nmr")

_Type your answer here, replacing this text._

<!-- BEGIN QUESTION -->

**Question - Make a scatterplot that compares each county's population (x axis) to its net migration rate (y axis).**
<!--
BEGIN QUESTION
name: q_nmr_popn_scatterplot
points: 2
manual: true
-->

In [57]:
...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question - What does the scatterplot lead you to conclude about the relationship between how extreme net migration rates are and the size of each county?**
<!--
BEGIN QUESTION
name: q_comment_nmr_popn
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question - Make a map of net migration rates**   
*[NOTE: some counties which have little data will not show up on your map]*
<!--
BEGIN QUESTION
name: q_nmr_map
points: 2
manual: true
-->

In [58]:
...

<!-- END QUESTION -->



### Population turnover 

The net migration rate tells us how in-migration and out-migration balance out to affect population size. But there is another way to synthesize in- and out-migration rates that we will call *population turnover*:

$$
\text{Population turnover rate (PTR)} = \frac{\text{number of people moving into county} + \text{number of people moving out of county}}{\text{number of people in county}}.
$$

(You'll notice that this is also equal to the in-migration rate plus the out-migration rate.)

The population turnover rate tells us how much movement there is into and out of the county, without worrying about whether this movement ends up increasing or decreasing the size of the population.
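
A quick numerical sketch with made-up rates shows how PTR complements NMR. Because the in- and out-migration rates are both non-negative, PTR is always at least as large as the absolute value of NMR, and the gap PTR − |NMR| equals twice the smaller of the two rates: it measures movement that cancels out. The four counties below are hypothetical.

```python
import numpy as np

# Hypothetical in- and out-migration rates for four counties
imr = np.array([0.03, 0.12, 0.01, 0.07])
omr = np.array([0.05, 0.10, 0.01, 0.02])

nmr = imr - omr   # net effect of migration on population size
ptr = imr + omr   # total movement in either direction

# Movement that cancels out: PTR - |NMR| equals twice the smaller of the two rates
offsetting = ptr - np.abs(nmr)
print(np.column_stack([nmr, ptr, offsetting]).round(3))
```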

<!-- BEGIN QUESTION -->

**Question - Give an example of a situation in which two counties might have the same net migration rate of 0, but different population turnover rates.**
<!--
BEGIN QUESTION
name: q_comment_nmr_turnover
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question - Add a new column called `ptr` to your `net_migrants` table that contains the population turnover rate.**
<!--
BEGIN QUESTION
name: q_ptr
points: 3
-->

In [60]:
net_migrants = ...
net_migrants

In [None]:
grader.check("q_ptr")

<!-- BEGIN QUESTION -->

**Question - Make a map of population turnover rates**   
*[NOTE: some counties which have little data will not show up on your map]*
<!--
BEGIN QUESTION
name: q_ptr_map
points: 2
manual: true
-->

In [62]:
...

<!-- END QUESTION -->



Let's see whether there appears to be any relationship between population turnover and net migration: do places with more turnover also tend to experience a net loss or gain of people?

<!-- BEGIN QUESTION -->

**Question - Make a scatterplot comparing the population turnover rate (x axis) and the net migration rate (y axis) across all counties in the dataset.**
<!--
BEGIN QUESTION
name: q_scatter_nmr_ptr
points: 2
manual: true
-->

In [63]:
...

<!-- END QUESTION -->



It can be helpful to summarize the relationship shown in the scatterplot above with a single number. We'll use the correlation coefficient to do so. We haven't talked about the correlation coefficient in our class, but it was discussed in Data 8. Briefly, the correlation coefficient summarizes the strength of the *linear* relationship between two variables: when it is close to -1, the two variables have a very strong negative linear relationship; when it is close to +1, they have a very strong positive linear relationship; and when it is near 0, there is little or no linear relationship between them (although the variables could still be related in a nonlinear way).

We'll use code from the Data 8 textbook to help calculate the correlation coefficient:

In [64]:
## from Data 8 textbook:
### https://www.inferentialthinking.com/chapters/15/2/Regression_Line.html
def standard_units(xyz):
    "Convert any array of numbers to standard units."
    return (xyz - np.mean(xyz))/np.std(xyz)  

def correlation(t, label_x, label_y):
    "Correlation coefficient between columns `label_x` and `label_y` of Table `t`."
    return np.mean(standard_units(t.column(label_x))*standard_units(t.column(label_y)))
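
As a sanity check, the Data 8 approach (the mean product of standard units, with `np.std` using the population standard deviation) gives Pearson's correlation coefficient, so it should agree with NumPy's `np.corrcoef`. This minimal sketch works on plain made-up arrays; `to_standard_units` and `correlation_arrays` are hypothetical array-based counterparts of the Table-based functions above.

```python
import numpy as np

def to_standard_units(values):
    """Convert an array to standard units (population SD, as in Data 8)."""
    return (values - np.mean(values)) / np.std(values)

def correlation_arrays(x, y):
    """Pearson correlation as the mean product of standard units."""
    return np.mean(to_standard_units(x) * to_standard_units(y))

# Made-up data with a negative relationship
x = np.array([0.05, 0.10, 0.20, 0.30, 0.40])
y = np.array([0.02, 0.01, -0.01, -0.02, -0.05])

r_data8 = correlation_arrays(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(round(r_data8, 4), round(r_numpy, 4))
```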

**Question - Calculate the correlation between the net migration rate and the population turnover rate.**
<!--
BEGIN QUESTION
name: q_ptr_corr
points: 1
-->

In [65]:
nmr_ptr_corr = ...
nmr_ptr_corr

In [None]:
grader.check("q_ptr_corr")

<!-- BEGIN QUESTION -->

**Question - What does the correlation coefficient suggest about the relationship between turnover and net migration across US counties?**
<!--
BEGIN QUESTION
name: q_comment_ptr_nmr_corr
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



Note that a more careful analysis, which we'll have to leave for the future, would try to account for the fact that smaller counties tend to have more extreme NMR and PTR values.
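
One reason smaller counties show more extreme rates is sampling variability: a rate with a small denominator swings more from chance alone. This simulation is a hypothetical illustration, not an analysis of the homework data. Every simulated county has the same assumed true migration probability (5%); only the population size differs.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_rate = 0.05     # assumed identical underlying migration probability
n_counties = 2000    # simulated counties per size class

sds = {}
for population in [500, 5_000, 500_000]:
    # number of movers in each simulated county
    movers = rng.binomial(population, true_rate, size=n_counties)
    observed_rates = movers / population
    sds[population] = observed_rates.std()
    print(f"population {population:>7,}: SD of observed rate = {sds[population]:.4f}")
```

The spread shrinks roughly like one over the square root of the population, which is why rates for small counties scatter much more widely around the same underlying value.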

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Please be sure to upload your submission to Gradescope.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()