In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab03.ipynb")

In [203]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

# Lab 03 - Fertility

So far, we've focused on one aspect of population wellbeing, and a key cause of population change: mortality.

Today, we're going to start to look at another way that populations change: fertility (i.e., having babies).

In many ways, fertility is more complex than mortality. Everyone who is born dies exactly once. On the other hand, there is no fixed number of children that women have: some women have many children, while others have none. Moreover, women who do have children can do so in many different ways: some women have children at young ages, some at older ages. The number of children women want to have may change with age, and the relationship between desired and realized fertility can be less than perfect. Generally, fertility is a complex social phenomenon that resists overly simplistic explanations. As we will start to see in this lab, though, there are still patterns and structure to human fertility that we can use to understand it better.

### Age-Specific Fertility Rates

$$
ASFR_a = \frac{\text{Number of births to women in age group } a}{\text{Person-years of exposure among women in age group } a}
$$

Note that, technically, what we'll look at today is called a *period age-specific fertility rate*. There are some subtle differences between period and cohort age-specific fertility rates, and we won't have time to go into details here. If you're curious to learn more, take Demography 110!
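
As a quick illustration of the ASFR formula above, here is a minimal sketch in plain Python. The counts are hypothetical numbers chosen for easy arithmetic, not values from the UNPD data, and the variable names are our own.

```python
# Hypothetical counts for one age group (15-19) in a single year; not real data.
births = 1200             # births to women aged 15-19 during the year
person_years = 60000.0    # person-years lived by women aged 15-19 during the year

asfr = births / person_years      # births per person-year of exposure
asfr_per_1000 = asfr * 1000       # the same rate expressed per 1,000 women

print(asfr, asfr_per_1000)
```

Rates are often reported per 1,000 women because the raw per-person values are small decimals.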

### The Total Fertility Rate

The TFR answers a hypothetical question: if a woman survived to the end of her childbearing years giving birth to children at the rate implied by the ASFRs, then the number of children she would, on average, give birth to is the TFR. Thus, the TFR can be roughly interpreted as an average number of children per woman.

Again, there are some subtleties here; to fully understand this approach, you'll have to take Demography 110.  

Mathematically, the TFR is fairly easy to define:

$$
TFR = \sum_a ASFR_a
$$

where $a$ runs across each year of age that women have children. (Often, this is taken to be from age 15 to about 50.)  

In the data we look at below, we'll be using five year age groups. In that case, a crude way to calculate the TFR from the ASFRs is to assume that the ASFR is the same for each year of age in the five year age groups. That means that our formula becomes

$$
TFR = 5 \sum_g ASFR_g,
$$

where $g$ ranges over the 5-year age groups. We multiply by 5 because there are 5 years in each age group.
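
To make the five-year formula concrete, here is a small sketch in plain Python. The rates are made up for illustration and are assumed to be expressed per 1,000 women; the function name `tfr_from_five_year_asfrs` is hypothetical.

```python
# Hypothetical ASFRs per 1,000 women for age groups 15-19, 20-24, ..., 45-49.
asfr_per_1000 = [10, 50, 100, 100, 50, 10, 0]

def tfr_from_five_year_asfrs(rates_per_1000, width=5):
    """Sum the group rates, multiply by the group width in years,
    and convert from per-1,000 to per-woman."""
    return width * sum(rates_per_1000) / 1000

tfr = tfr_from_five_year_asfrs(asfr_per_1000)
print(tfr)  # 1.6
```

Here the rates sum to 320 per 1,000, so the TFR is 5 × 0.320 = 1.6 children per woman.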

### Mean age at childbearing

The age-specific fertility rates provide a lot of detail about the timing and level of fertility: we can see how high or low rates are, and we can see the pattern in which ages have the most childbearing.  The TFR, on the other hand, is a summary that only provides information about the fertility level; it does not tell us about the timing of fertility (at least not directly).

Since women in many countries are waiting until older ages to have children, it turns out to be useful to have an indicator that captures the timing of fertility. In fact, demographers tend to distinguish between the amount of fertility (captured by the TFR) and the *tempo* or timing of fertility.

The **mean age at childbearing (MAC)** is a way to summarize the timing of fertility. MAC can be derived from ASFRs as follows:

$$
MAC = \frac{\sum_a a \times ASFR_a}{\sum_a ASFR_a},
$$

where $a$ ranges over the ages of childbearing (usually 15-50).  

If you stare at the equation for a little while, you'll see that the mean age at childbearing is a weighted average of the childbearing ages, where the weights are given by the fertility rates at each age.
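
The weighted-average reading of the MAC formula can be sketched in a few lines of plain Python. The rates below are hypothetical, and each five-year age group is represented by a single age (its midpoint), which is an assumption made for this illustration.

```python
# Hypothetical inputs: one representative age per five-year group and its fertility rate.
mid_ages = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
asfr = [10, 50, 100, 100, 50, 10, 0]

def mean_age_at_childbearing(ages, rates):
    """Weighted average of ages, using the fertility rates as weights."""
    weighted_sum = sum(a * r for a, r in zip(ages, rates))
    return weighted_sum / sum(rates)

mac = mean_age_at_childbearing(mid_ages, asfr)
print(mac)  # 30.0
```

Because the rates appear in both the numerator and the denominator, it does not matter whether they are expressed per woman or per 1,000 women: the scaling cancels.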

We'll get a chance to explore these three concepts -- the age-specific fertility rate (ASFR), the total fertility rate (TFR), and the mean age at childbearing (MAC) -- in greater detail in the lab today.

## Introductions

<!-- BEGIN QUESTION -->

**What is your partner's name?**
<!--
BEGIN QUESTION
name: q_introname
points: 1
manual: True
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**What year is your partner in? (Freshman, Sophomore, etc.)**
<!--
BEGIN QUESTION
name: q_introyear
points: 1
manual: True
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**How many siblings does your partner have? What are their names?**
<!--
BEGIN QUESTION
name: q_introsiblings
points: 1
manual: True
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



 

### Getting started

First, we'll read in a couple of datasets that have estimated fertility measures for all of the countries in the world. Like the life tables we used in the last lab, these fertility measures come from the United Nations Population Division.

We'll start by loading estimated Total Fertility Rates (TFRs):

In [204]:
unpd_tfr = Table.read_table('data/unpd_tfr_cleaned.csv')
unpd_tfr

We'll also load a datafile that has additional information about the countries in the UNPD dataset.

In [205]:
geo = Table.read_table('data/unpd_geo_cleaned.csv')
# we'll only use a few columns from this dataset
geo = geo.select(['area', 'region_name', 'subregion_name', 'income_level'])
geo

Like before, it will be useful to make a list that has the names of all of the countries in the dataset.

In [206]:
all_countries = np.unique(unpd_tfr['area'])
all_countries

In [207]:
print("There are", len(all_countries), "countries in the dataset.")

**Question - Make a dataset called `unpd_tfr_2015` that filters the TFR dataset down into rows that correspond to the 2015 time period only**
<!--
BEGIN QUESTION
name: q_create_tfr_dataset
points: 1
-->

In [208]:
unpd_tfr_2015 = ...
unpd_tfr_2015

In [None]:
grader.check("q_create_tfr_dataset")

<!-- BEGIN QUESTION -->

**Question - Make a histogram of the TFR values around the world in 2015**
<!--
BEGIN QUESTION
name: q_tfr_histogram
points: 1
manual: true
-->

In [211]:
...

<!-- END QUESTION -->



Your histogram should reveal that TFR values vary enormously: some countries have quite high TFRs, in the range of 6 or 7 children per woman, while others have very low TFRs of two or fewer children per woman.

One reason for this huge amount of variation is that we are looking at all of the countries in the world at once.  Let's dig a little deeper to see if we can find groups of countries that have more similar TFRs.

**Question - There's more information about the countries in the dataset in the `geo` table. Please join the table `geo` onto `unpd_tfr_2015` so we can use this information; call your new table `unpd_tfr_2015`.**
<!--
BEGIN QUESTION
name: q_join_geo_tfr
points: 2
-->

In [212]:
unpd_tfr_2015 = ...
unpd_tfr_2015

In [None]:
grader.check("q_join_geo_tfr")

<!-- BEGIN QUESTION -->

**Question - Fill in the loop below to make histograms of the total fertility rate separately for each region in the world. Make sure your histogram has bins of width 0.5 that go from 0 up to 8; this will make it easier to compare the regions.**  
*[HINT: the variable `region_name` has the region that each country is in]*
<!--
BEGIN QUESTION
name: q_tfr_hist_loop
points: 3
manual: true
-->

In [215]:
# list of all regions that we can use to loop through
all_regions = np.unique(...)

for region in all_regions:
    
    ## be sure to use the dataset you made with TFR values for 2015 (not all time periods)
    cur_region_tfr = ...
    cur_region_tfr.hist(..., bins=...)
    
    ## (this adds a title to your plot)
    plt.title(region)



<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question - Using the same pattern as above, make a histogram for each value of the `income_level` variable. This will show how TFR is distributed for countries that are relatively wealthy and poor.**  
*[NOTE: income level is not available for some small countries; these countries will be at income level nan (for 'not a number'). This is essentially missing data.]*
<!--
BEGIN QUESTION
name: q_tfr_income_hist_loop
points: 5
manual: true
-->

In [216]:
# list of all income levels that we can use to loop through
all_incomes = ...

for ... in ...:
    
    cur_region_tfr = ...
    cur_region_tfr.hist(..., bins=...)
    
    ## (this adds a title to your plot)
    plt.title(income)

<!-- END QUESTION -->



To complement the visualization, let's calculate a quantitative summary for each region by taking the average of the region's TFR values.
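
The idea of a grouped average can be sketched in plain Python before doing it with table methods. The region labels and TFR values below are hypothetical placeholders, not values from the UNPD data.

```python
from collections import defaultdict

# Hypothetical (region, TFR) pairs; not values from the UNPD data.
rows = [("Region A", 2.0), ("Region A", 3.0),
        ("Region B", 5.0), ("Region B", 4.0), ("Region B", 6.0)]

# Accumulate a running total and a count for each region.
totals = defaultdict(float)
counts = defaultdict(int)
for region, tfr in rows:
    totals[region] += tfr
    counts[region] += 1

# Average = total / count, then sort regions from lowest to highest average.
averages = {region: totals[region] / counts[region] for region in totals}
ranked = sorted(averages.items(), key=lambda pair: pair[1])
print(ranked)  # [('Region A', 2.5), ('Region B', 5.0)]
```

This is the same split, summarize, and sort logic that a table-based group-and-sort performs in one or two method calls.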

**Question - Calculate the average TFR by region, and make a table that has a sorted list of the regions by average TFR**  
*[HINT: You can do this using the `select`, `group`, and `sort` methods]*
<!--
BEGIN QUESTION
name: q_regional_tfr_avg
points: 3
-->

In [217]:
regional_tfr = ...
regional_tfr

(If you've done the calculations correctly, you should see an average TFR very close to 3.05 for Oceania.)

If you have extra time, you could also calculate average TFR by income level; you should see a pretty clear pattern.

In [218]:
## (optional) average TFR by income level
income_tfr = ...
income_tfr

### Join in economic indicators

The analysis above suggested that both region and income seem to be pretty strongly related to levels of fertility. Let's bring in some economic and social data to dig deeper.  
  
This code reads in a dataset that has a few of the World Bank's World Development Indicators:

In [219]:
wdi_dat = Table.read_table('data/wdi_health_2015_cleaned.csv')
wdi_dat

**Question - Join the WDI data onto the TFR data for 2015**
<!--
BEGIN QUESTION
name: q_join_wdi_tfr
points: 3
-->

In [220]:
unpd_tfr_2015_econ = unpd_tfr_2015.join(..., ..., ...)
unpd_tfr_2015_econ

In [None]:
grader.check("q_join_wdi_tfr")

Now we'll make a few plots to get a sense for how a couple of these economic/social indicators are associated with TFR.

<!-- BEGIN QUESTION -->

**Question - Make a scatterplot that compares GDP (x axis) and TFR (y axis)**
<!--
BEGIN QUESTION
name: q_tfr_vs_gdp_scatter
points: 1
manual: true
-->

In [223]:
...

<!-- END QUESTION -->



**Question - Now calculate the log of gdp and store it in a column called `loggdp`**

In [224]:
...

<!-- BEGIN QUESTION -->

**Question - Now make a scatterplot that compares log gdp (x axis) to TFR (y axis).**
<!--
BEGIN QUESTION
name: q_tfr_vs_loggdp
points: 1
manual: true
-->

In [225]:
...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question - In general, what relationship is evident between a country's wealth (as measured by GDP per capita) and its fertility (as measured by TFR)?**
<!--
BEGIN QUESTION
name: q_tfr_vs_loggdp_comment
points: 1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



**Question - Now make a scatterplot that shows the relationship between the percentage of secondary students that is female (`pctf_secondary`, x axis) and TFR (y axis)**

In [226]:
...

<!-- BEGIN QUESTION -->

**Question - In general, what relationship is evident between a country's gender equity in education (as measured by `pctf_secondary`) and its fertility (as measured by TFR)?**
<!--
BEGIN QUESTION
name: q_gender_tfr_comment
points: 1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## Fertility by age: Age-specific fertility rates

We've seen that aggregate levels of fertility, as captured by the TFR, vary tremendously around the world. From our analysis so far, it appears that wealthier countries with more gender-balanced educational programs tend to have lower TFRs.

Now we'll dig deeper into the phenomenon of fertility, looking closely at age-specific fertility rates. These ASFRs contain much more detailed information about the fertility experience of women in the UNPD countries, as we will see.

We'll start by reading in Age-Specific Fertility Rates (ASFRs):

In [227]:
unpd_asfr = Table.read_table('data/unpd_asfr_cleaned.csv')
unpd_asfr

And we'll make a dataset that just has the ASFRs for 2015.

In [228]:
unpd_asfr_2015 = unpd_asfr.where('period', are.equal_to(2015))
unpd_asfr_2015

It will be helpful to work with a specific example to get started. Let's take a look at the age-specific rates in Canada:

In [229]:
can_asfr = unpd_asfr_2015.where('area', are.equal_to('Canada'))
can_asfr

<!-- BEGIN QUESTION -->

**Question - plot the age-specific fertility rates for Canada, with age on the x axis and the fertility rates on the y axis**
<!--
BEGIN QUESTION
name: q_asfr_canada_plot
points: 1
manual: true
-->

In [230]:
...

<!-- END QUESTION -->



You'll see that fertility in Canada looks like a hump: fertility rates increase with age until about 30-35, where they peak; then they decrease until 45-50.  This general shape is pretty common, though the level (height) of the ASFRs and the exact shape of the hump varies quite a bit from country to country.  

To help explore this variation, let's write some code to facilitate comparing two countries' ASFRs.

**Question - Fill in the missing code below to make a dataset that compares age-specific fertility rates for two countries**
<!--
BEGIN QUESTION
name: q_compare_asfr_fn
points: 6
-->

In [231]:
def compare_asfr(country1, country2, period):
    """
    This function returns a Table with three columns: the first column is age groups,
    the second column is the ASFRs of country1 in the given time period, and the third column
    is the ASFRs of country2 in the given time period.
    """
    
    c1_asfr = unpd_asfr.where('period', ...).where('area', ...)
    c2_asfr = unpd_asfr.where('period', ...).where('area', ...)
    
    result = Table().with_columns('age', ...,
                                  country1, ...,
                                  country2, ...)
    
    return result

canmex = compare_asfr('Canada', 'Mexico', 2015)
canmex

In [None]:
grader.check("q_compare_asfr_fn")

Now that we have written our function, we can easily compare two countries' ASFRs on a plot:

In [234]:
canmex.plot('age')

**Question - Pick two countries in each region (except North America, which we've done), and plot the ASFRs of the two countries to compare them.** 

In [235]:
...
...
...
...
...

<!-- BEGIN QUESTION -->

**Question - Do your plots suggest that the general pattern of ASFRs is similar within regions? Between regions?**
<!--
BEGIN QUESTION
name: q_region_asfr_plot_comment
points: 2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



### The relationship between TFR and ASFR

**Question - Using the `canmex` dataset you created above, check that the TFR for Canada is about 1.61.**  
*[HINT: Use the equation for TFR described at the top of the lab]*  
*[NOTE: the ASFRs in the UNPD data are shown per 1,000 to make them easier to read. So, for example, when the dataset has a value of 200, this corresponds to an ASFR of 200 per 1,000 which is otherwise known as 0.2]*
<!--
BEGIN QUESTION
name: q_canada_tfr
points: 1
-->

In [236]:
canada_tfr = ...
canada_tfr

In [None]:
grader.check("q_canada_tfr")

## Mean age at childbearing

Finally, we'll take a look at the timing of childbearing. As we discussed above, the mean age at childbearing is one indicator of timing.  

The ASFR dataset has age groups listed by the lowest age in the 5-year group. So, for example, the row in the dataset corresponding to age 15 means the age group of 15 to 19 year olds.  In order to calculate the mean age at childbearing, we need the midpoint of the age group, which we can find by adding 2.5 (which is half of 5) to the lowest age in the group.
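
The midpoint rule described above can be illustrated on a plain Python list. The list of lower ages assumes the seven standard five-year childbearing groups (15-19 through 45-49); the variable names are our own.

```python
# Lowest age in each five-year group (assumed groups: 15-19, 20-24, ..., 45-49).
lower_ages = [15, 20, 25, 30, 35, 40, 45]
group_width = 5

# The midpoint is the lowest age plus half the width of the group.
mid_ages = [age + group_width / 2 for age in lower_ages]
print(mid_ages)  # [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
```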

Let's look again at the ASFR data for Canada in 2015:

In [238]:
can_asfr

**Question - Create a new column in `can_asfr` that has the midpoint of each age group. Call the new column `mid_age`.**
<!--
BEGIN QUESTION
name: q_mid_age_asfr
points: 1
-->

In [239]:
can_asfr = ...
can_asfr

In [None]:
grader.check("q_mid_age_asfr")

**Question - Using your new column `mid_age` and the formula at the start of the lab, calculate the mean age at childbearing for Canada in 2015.**
<!--
BEGIN QUESTION
name: q_canada_mac
points: 1
-->

In [242]:
canada_mac = ...
canada_mac

In [None]:
grader.check("q_canada_mac")

You should get an answer close to 30.27.  
  
Following our usual pattern, let's take this example and generalize it by writing a function. Then we can use the function to calculate the mean age at childbearing for every country in the world.

**Question - Fill in the code below to write a function that calculates MAC given the age-specific fertility rates**
<!--
BEGIN QUESTION
name: q_calculate_mac_fn
points: 2
-->

In [244]:
def calculate_mac(asfr_data):
    asfr_data = asfr_data.with_column('mid_age', ...)
    mac = ...
    return mac

canada_mac_from_function = calculate_mac(can_asfr)
canada_mac_from_function # you should get the same answer you did above for Canada's 2015 MAC

In [None]:
grader.check("q_calculate_mac_fn")

**Question - Now complete the loop below to calculate MAC for all of the countries in the world in 2015**  
*[Hint: loop through each country in the dataset to do this]*
<!--
BEGIN QUESTION
name: q_loop_mac_2015
points: 4
-->

In [246]:
mac_values = make_array()

for country in ... :
    cur_asfrs = unpd_asfr_2015.where(..., ...)
    mac_values = np.append(mac_values, ...)
    
mac_2015 = Table().with_columns('area', all_countries,
                                'mac', mac_values)
mac_2015

In [None]:
grader.check("q_loop_mac_2015")

**Question - Finally, join the MAC values onto the TFR values and economic indicators in `unpd_tfr_2015_econ`**
<!--
BEGIN QUESTION
name: q_mac_tfr_join
points: 2
-->

In [248]:
unpd_tfr_2015_econ = ...
unpd_tfr_2015_econ

In [None]:
grader.check("q_mac_tfr_join")

**Question - Use your combined dataset to plot the mean age at childbearing (x axis) and the TFR (y axis).**
<!--
BEGIN QUESTION
name: q_plot_tfr_vs_mac
points: 1
-->

In [251]:
...

<!-- BEGIN QUESTION -->

**Question - What relationship do you see between MAC and TFR? Does it appear as though the amount of fertility (TFR) and the timing of fertility (MAC) are linked?**
<!--
BEGIN QUESTION
name: q_comment_tfr_vs_mac
points: 1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



### Submit your assignment by MIDNIGHT on the day of class

Please submit your lab by running the cell below at the end of the lab. You can submit as many times as you want, up to midnight on the day of the class. No late submissions are allowed, and the system will prevent you from submitting late.

## Optional challenge questions

**Question - Using a loop similar to the ones we constructed above, make scatterplots that show the relationship between MAC (x axis) and TFR (y axis) within each region. Do the within-region relationships look qualitatively different from the global analysis we performed above?**

In [252]:
...

### Look at Age-Specific Fertility Rates over time

So far, we've examined age-specific fertility rates in one time period (2010-15), and we've summarized the timing of age-specific fertility by looking at the mean age at childbearing.

Now we're going to explore another visualization that will help us understand how a country's fertility has changed over time.

**Question - Fill in the missing code to produce a function that will plot the age-specific fertility rates for a single country over many time periods**

In [253]:
def compare_asfr_bytime(country):
    
    ## get the data for the country (across all periods)
    c1_asfr = unpd_asfr.where(..., ...)
    
    ## get a list with each period
    all_periods = ...
    
    ## get a list with all of the ages, which we'll include with the output
    all_ages = ...
    
    ## we'll store our output in this table;
    ## to start, we'll include the ages
    result = Table().with_columns('age', ...)
    
    for period in all_periods:
        
        these_asfrs = ...
        result = result.with_column(str(period), these_asfrs)

    return result

can_history = compare_asfr_bytime('Canada')
can_history

You can use your function to make plots like this one:

In [254]:
can_history = compare_asfr_bytime('Canada')
can_history.plot('age')
plt.title('Canada')

**Question - Pick several countries that you expect to have very different socioeconomic conditions and use the `compare_asfr_bytime` function to look at how the shape of their ASFRs has changed. Do you notice anything systematic about how fertility has been changing?**

In [255]:
...

In [256]:
...

In [257]:
...

**Challenge question (optional) -- See if you can get the colors on that plot to smoothly change from light to dark, based on the period. (I couldn't figure out how to do this!)**

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()