## Global Population Growth

Welcome to Lab 10! In this lab, we will use data to explore demography, practice making plots, and review table methods.

In [None]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
Table.interactive_plots()

# These lines load the tests.
import otter
grader = otter.Notebook()

The global population of humans reached 1 billion around 1800, 3 billion around 1960, and 7 billion around 2011. (Estimates before 1800, pictured below, are sparse.)  The potential impact of exponential population growth has concerned scientists, economists, and politicians alike.

**Question 1.** Load the table `old_pop.csv`, which contains population data going back to the year 1 CE, using `Table().read_table("file.csv")`. Then make a line plot of the population over time.

Remember that the line plot method, `tbl.plot("x", "y")`, takes two arguments: first, the name of the column holding the quantitative variable for the x-axis, and second, the name of the column holding the quantitative variable for the y-axis.

In this case, we want to use a line plot rather than a scatter plot because the data follows a specific order (from 1 CE to the 21st century) with only one y-value per x-value.

In [None]:
old_pop = ...
...

Notice the exponential growth curve. Demographers attribute the rapid growth of human populations, especially after 1500, to various advances in technology, such as agriculture and sanitation. 

The UN Population Division estimates that the world population will likely continue to grow throughout the 21st century, but at a slower rate, perhaps reaching 11 billion by 2100. However, the UN does not rule out more extreme growth scenarios, or major events such as the COVID-19 pandemic.

<a href="http://www.pewresearch.org/fact-tank/2015/06/08/scientists-more-worried-than-public-about-worlds-growing-population/ft_15-06-04_popcount/"> 
 <img src="pew_population_projection.png"/> 
</a>

In this lab, we will examine some of the more recent factors that influence population growth and how they are changing around the world.

The main dataset that we'll use today, `population.csv`, contains the total population of each country over time in more recent years as well as predictions for the future. This information comes from [Gapminder](https://www.gapminder.org/), a nonprofit that promotes sustainable global development. Run the cell below to load it.

In [None]:
# The population.csv file can also be found online here:
# https://github.com/open-numbers/
# The version here was downloaded in July 2020.
population = Table.read_table('population.csv')
population.show(5)

### Bangladesh

In the `population` table, the `geo` column contains three-letter codes established by the [International Organization for Standardization](https://en.wikipedia.org/wiki/International_Organization_for_Standardization) (ISO) in the [Alpha-3](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3#Current_codes) standard. We will begin by taking a close look at Bangladesh. Inspect the standard to find the 3-letter code for Bangladesh.

**Question 2.** Create a table called `b_pop` that has **two columns** labeled `time` and `population_total` containing the corresponding values for Bangladesh. The first column should only contain the years from 1970 through 2015 (including both 1970 and 2015) and the second should contain the population of Bangladesh in each of those years.

*Hint:* How do we create a table that contains rows with only a certain value? What are the values we're looking for? Make sure you read the Alpha-3 standard.

<img src="predicates.jpg"> 
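The hint above is about keeping only the rows that match one value and fall within a range of years. As a hedged illustration of the same logic outside the `datascience` library, the sketch below filters made-up NumPy arrays with boolean masks. The column names mirror `population`, but the codes `"aaa"` and `"bbb"` and every number are hypothetical; in the lab itself you would express each condition with `where` and an `are` predicate instead.

```python
import numpy as np

# Hypothetical, made-up rows standing in for the geo / time / population_total columns.
geo = np.array(["aaa", "aaa", "aaa", "bbb", "bbb", "aaa"])
time = np.array([1965, 1970, 2015, 1970, 2015, 2020])
population_total = np.array([10, 12, 20, 5, 7, 22])

# Keep rows for one country code AND a year range that includes both endpoints.
# Each comparison yields a True/False array; & combines them element by element.
mask = (geo == "aaa") & (time >= 1970) & (time <= 2015)

filtered_time = time[mask]
filtered_population = population_total[mask]
print(filtered_time, filtered_population)  # [1970 2015] [12 20]
```

Note that the endpoints use `>=` and `<=`, so 1970 and 2015 are both kept; a strict `<` on the upper bound would silently drop the last year.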

In [None]:
b_pop = ...
b_pop

In [None]:
grader.check('q2')

Run the following cell to create a table called `b_five` that has the population of Bangladesh every five years. At a glance, it appears that the population of Bangladesh has been growing quickly indeed!
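The `b_five` cell keeps only the rows whose `time` appears in the `fives` array via `are.contained_in(fives)`. A minimal sketch of that membership test with plain NumPy, assuming a hypothetical yearly `years` column, uses `np.isin`:

```python
import numpy as np

# Every fifth year from 1970 up to, but not including, 2016 (same call as in the b_five cell).
fives = np.arange(1970, 2016, 5)

# Hypothetical yearly time column covering 1970 through 2015.
years = np.arange(1970, 2016)

# np.isin marks each year that also appears in fives, similar in spirit to
# keeping rows whose value is contained in the given array.
in_fives = np.isin(years, fives)
print(years[in_fives])  # 10 years: 1970, 1975, ..., 2015
```

The stop value 2016 is exclusive in `np.arange`, which is why it is one past 2015.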

In [None]:
## Just run this cell.

b_pop.set_format('population_total', NumberFormatter)

fives = np.arange(1970, 2016, 5) # 1970, 1975, 1980, ...
b_five = b_pop.sort('time').where('time', are.contained_in(fives))
b_five

**Question 3.** Create a table called `b_five_growth` that includes three columns: `time`, `population_total`, and `annual_growth`. There should be one row for every five years from **1970 through 2010 (but not 2015)**. The first two columns are the same as `b_five`. The third column is the **annual** growth rate for each five-year period. For example, the annual growth rate for 1975 is the yearly exponential growth rate that describes the total growth from 1975 to 1980 when applied 5 times.

Recall that the formula for the exponential growth rate is:

`(changed / initial) ** (1 / time_unit) - 1`

What is the time unit equal to here (i.e., how many years pass between `initial` and `changed`)? Which `are.???` predicates do we need to use to get the years?

*Hint*: Only your `b_five_growth` table is graded for correctness; the other names are suggestions that you are welcome to use, change, or delete.
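To see how the growth-rate formula behaves on a whole column at once, here is a minimal NumPy sketch on made-up populations measured every **10** years (deliberately not the lab's spacing). It pairs each value with the next one by dropping the last element for `initial` and the first element for `changed`; this pairing is one possible approach, not the required one.

```python
import numpy as np

# Hypothetical populations measured every 10 years (made-up numbers).
years = np.array([1950, 1960, 1970, 1980])
pops = np.array([100.0, 150.0, 210.0, 260.0])
gap = 10  # years between consecutive measurements in this made-up series

# initial drops the last value and changed drops the first,
# so initial[i] and changed[i] are exactly `gap` years apart.
initial = pops[:-1]
changed = pops[1:]

# Annual growth rate: the yearly rate that, compounded `gap` times,
# turns initial into changed.
annual_growth = (changed / initial) ** (1 / gap) - 1

# Each rate describes the period that starts in the given year.
for year, rate in zip(years[:-1], annual_growth):
    print(year, round(rate, 4))
```

A quick sanity check is to compound each rate back up: `initial * (1 + annual_growth) ** gap` should reproduce `changed`.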

In [None]:
b_1970_through_2010 = ...
initial = ...
changed = ...
b_five_growth = ...
b_five_growth

In [None]:
grader.check('q3')

In [None]:
## This cell creates a line plot of the changes in annual growth rate. Just run it. 
b_five_growth.plot("time", "annual_growth")

While the population has grown every five years since 1970, the annual growth rate decreased dramatically from 1985 to 2005. Let's look at some other information in order to develop a possible explanation. Run the next cell to load three additional tables of measurements about countries over time.

In [None]:
## Just run this cell.
life_expectancy = Table.read_table('life_expectancy.csv')
child_mortality = Table.read_table('child_mortality.csv').relabeled(2, 'child_mortality_under_5_per_1000_born')
fertility = Table.read_table('fertility.csv')
life_expectancy.show(5)

The `life_expectancy` table contains a statistic that is often used to measure how long people live, called *life expectancy at birth*. This number, for a country in a given year, [does not measure how long babies born in that year are expected to live](http://blogs.worldbank.org/opendata/what-does-life-expectancy-birth-really-mean). Instead, it measures how long someone would live, on average, if the *mortality conditions* in that year persisted throughout their lifetime. These "mortality conditions" describe what fraction of people at each age survived the year. In other words, life expectancy at birth summarizes how many people are staying alive, aggregated across all the age groups in the population.
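To make "mortality conditions" concrete, the sketch below computes a simplified period life expectancy from hypothetical age-by-age survival fractions. It is a textbook-style approximation (it assumes deaths fall, on average, halfway through each year of age), not Gapminder's or the World Bank's exact method, and the function name `period_life_expectancy` is made up for illustration.

```python
import numpy as np

def period_life_expectancy(survival_prob):
    """Approximate life expectancy at birth from one year's mortality conditions.

    Hypothetical helper: survival_prob[x] is the fraction of people aged x who
    survive to age x + 1. The last entry should be 0 so nobody outlives the table.
    """
    survival_prob = np.asarray(survival_prob, dtype=float)
    # alive[x]: fraction of an imaginary birth cohort still alive at exact age x,
    # if this single year's survival fractions applied for their entire lives.
    alive = np.concatenate(([1.0], np.cumprod(survival_prob)))
    # Years lived between ages x and x + 1, assuming deaths are spread evenly
    # through the year (those who die contribute half a year on average).
    years_lived = (alive[:-1] + alive[1:]) / 2
    return years_lived.sum()

# Made-up survival fractions for ages 0-99 (not real data).
baseline = np.full(100, 0.99)
baseline[-1] = 0.0
higher_infant_mortality = baseline.copy()
higher_infant_mortality[0] = 0.90  # worse survival in the first year of life

print(round(period_life_expectancy(baseline), 1))
print(round(period_life_expectancy(higher_infant_mortality), 1))
```

Because every later survival fraction is multiplied by the first one, raising infant mortality lowers life expectancy at birth even though conditions at older ages are unchanged.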

**Question 4.** Perhaps population is growing more slowly because people aren't living as long. 

Using the `life_expectancy` table, create a line plot with `tbl.plot` that places the years 1970 through 2010 (including 2010) on the horizontal axis and Bangladesh's *life expectancy at birth* for each year on the vertical axis. What trend do you see?

In [None]:
...

**Question 5.** Does the graph above help directly explain why the population growth rate decreased from 1985 to 2010 in Bangladesh? Why or why not? What happened in Bangladesh in 1991, and does that event explain the change in population growth rate?

*Hint:* Try googling what happened in Bangladesh in 1991.

*Write your answer here, replacing this text.*

The `fertility` table contains a statistic that is often used to measure how many babies are being born, the *total fertility rate*. This number describes the [number of children a woman would have in her lifetime](https://www.measureevaluation.org/prh/rh_indicators/specific/fertility/total-fertility-rate), on average, if the current rates of birth by age of the mother persisted throughout her childbearing years, assuming she survived through age 49.
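As a rough illustration, the total fertility rate is commonly computed by adding up age-specific birth rates. The sketch below assumes made-up annual birth rates (births per woman per year) for seven five-year age groups, 15-19 through 45-49, and multiplies their sum by 5 because a woman spends five years in each group. The rates are hypothetical, not real data.

```python
import numpy as np

# Hypothetical age-specific fertility rates, in order:
# ages 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 (made-up values).
asfr = np.array([0.05, 0.15, 0.14, 0.09, 0.05, 0.015, 0.005])

# Each five-year group contributes its yearly rate five times over a lifetime.
total_fertility_rate = 5 * asfr.sum()
print(round(total_fertility_rate, 2))  # 2.5 children per woman
```

The calculation stops at age 49, which is why the definition above assumes the woman survives through age 49.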

**Question 6.** Using the `fertility` table, create a line plot that shows how Bangladesh's fertility rate changed from 1970 through 2010 (but not 2015).

If you're interested in looking at other countries as well, we've created some variables that you may use in your code. To view another country, reassign `country_code` or `start` (for example, `country_code = "phl"` for the Philippines and `start = 1980` to begin in 1980). In class and in the next lab, we'll learn a more useful way to write "adaptable" code: *defining a function*.
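As a preview of that idea, here is a minimal sketch of wrapping the filtering step in a function whose parameters play the role of `country_code` and `start`. It uses NumPy arrays with made-up values instead of a `datascience` table, and the helper `fertility_since` is hypothetical, not part of the lab or the library.

```python
import numpy as np

# Hypothetical columns standing in for the fertility table (made-up values).
geo = np.array(["aaa", "aaa", "aaa", "bbb", "bbb"])
time = np.array([1975, 1985, 2015, 1985, 2005])
children_per_woman = np.array([6.1, 5.2, 2.3, 4.0, 3.1])

def fertility_since(country_code, start, end=2010):
    """Hypothetical helper: return (years, rates) for one country,
    from start through end, inclusive."""
    mask = (geo == country_code) & (time >= start) & (time <= end)
    return time[mask], children_per_woman[mask]

# Changing the arguments replaces editing country_code and start by hand.
print(fertility_since("aaa", 1970))
print(fertility_since("bbb", 1990))
```

The default `end=2010` means callers only pass what usually changes, while still being able to override the end year.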

In [None]:
country_code = "bgd"
start = 1970

...

**Question 7.** Does the graph above help directly explain why the population growth rate decreased from 1985 to 2010 in Bangladesh? Why or why not?

*Write your answer here, replacing this text.*

It has been observed that lower fertility rates are often associated with lower child mortality rates. The link has been attributed to family planning: if parents can expect that their children will all survive into adulthood, then they will choose to have fewer children. We can see if this association is evident in Bangladesh by plotting the relationship between total fertility rate and [child mortality rate per 1000 children](https://en.wikipedia.org/wiki/Child_mortality).

In short, the cell below combines the `fertility` and `child_mortality` tables together into a single table, to facilitate comparisons. Run it and read the output.

In general, the `tbl.join("col1", tbl2, "col2")` method combines two tables that contain different data but share values (such as country codes) in a column: `col1` in `tbl` and `col2` in `tbl2`.
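Conceptually, joining on a shared column pairs up rows whose values match. The sketch below is a minimal pure-Python inner join (rows without a match in both tables are dropped) on made-up `(year, value)` rows; it is an illustration of the idea, not how `datascience` implements `join`, and `inner_join` is a hypothetical name.

```python
# Hypothetical (year, value) rows for two made-up tables.
fertility_rows = [(1970, 6.9), (1980, 6.4), (1990, 4.5)]
mortality_rows = [(1980, 200), (1990, 140), (2000, 90)]

def inner_join(left, right):
    """Hypothetical helper: keep only keys present in both lists of
    (key, value) rows, pairing up their values."""
    right_lookup = dict(right)  # assumes each key appears at most once in `right`
    return [(key, value, right_lookup[key])
            for key, value in left
            if key in right_lookup]

print(inner_join(fertility_rows, mortality_rows))
# [(1980, 6.4, 200), (1990, 4.5, 140)]
```

Years that appear in only one table (1970 and 2000 here) do not show up in the result, which is worth keeping in mind when a joined table has fewer rows than either input.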

In [None]:
## Just run this cell. 
fertility_and_child_mortality = fertility.where('geo', 'bgd').where("time", are.above_or_equal_to(1950)).drop('geo').join('time', child_mortality.where('geo', 'bgd').drop('geo')).relabeled(1, 'Children per woman').relabeled(2, 'Child deaths per 1000 born')
fertility_and_child_mortality

**Question 8.** 
There's a lot going on in the line of code above, and the description we provided doesn't explain much. Take some time to read through it, then, in the Markdown cell below, explain in words what each method call does, step by step.

*Replace the ellipses below.*

1. Using the fertility table, create a table with only the rows that have the value "bgd" in the "geo" column.

2. ...

3. ...

4. Join the table with a version of the `child_mortality` table that has only the "bgd" rows and no "geo" column, matching rows on their year (`time`) values.

5. ...

6. ...

**Question 9.** Using the new `fertility_and_child_mortality` table, draw a **scatter diagram** with Bangladesh's fertility rate (in children per woman) on the horizontal axis and its child mortality (in child deaths per 1000 born) on the vertical axis. As with the previous graphs, only show the data from **1970 through 2010 (but not 2015).**

In this case, we are graphing two quantitative variables that do not necessarily follow a specific order, so we should use a scatter plot. Although we could technically use a line plot (this dataset happens to have only one y-value per x-value), a line plot would **not** necessarily be the best choice, because the x-axis values are not ordered in the same meaningful way as a variable such as time. In general, scatter plots are better for revealing a correlation or association between two quantitative variables, while line plots are better for showing how a variable changes over time.
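One way to put a number on what a scatter diagram shows is the correlation coefficient *r*, which measures how tightly the points cluster around a straight line (values near +1 or -1 mean a strong linear association). The sketch below assumes made-up paired values, not the Bangladesh data, and uses NumPy's `np.corrcoef`.

```python
import numpy as np

# Made-up paired measurements: one (x, y) point per hypothetical year.
x = np.array([6.5, 5.8, 4.9, 3.7, 3.0, 2.4])  # e.g., children per woman
y = np.array([220, 200, 160, 110, 80, 55])    # e.g., child deaths per 1000 born

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

*r* only summarizes linear association; it does not tell you which variable, if either, drives the other.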

In [None]:
...

**Question 10.** In one or two sentences, describe the association (if any) that is illustrated by this scatter diagram. Does the diagram show that reduced child mortality causes parents to choose to have fewer children?

*Write your answer here, replacing this text.*

### The World (Histograms Review)

The change observed in Bangladesh since 1970 can also be observed in many other developing countries: health services improve, life expectancy increases, and child mortality decreases. At the same time, the fertility rate often plummets, and so the population growth rate decreases despite increasing longevity.

Run the cell below to generate two overlaid histograms, one for 1960 and one for 2010, that show the distributions of total fertility rates for these two years among all 201 countries in the `fertility` table.

In [None]:
Table().with_columns(
    '1960', fertility.where('time', 1960).column(2),
    '2010', fertility.where('time', 2010).column(2)
).hist(bins=np.arange(0, 10, 0.5), unit='child')
_ = plots.xlabel('Children per woman')
_ = plots.xticks(np.arange(10))

**Question 11.** Assign `fertility_statements` to an array containing the numbers of all the statements below that can be correctly inferred from these histograms.
1. About the same number of countries had a fertility rate between 3.5 and 4.5 in both 1960 and 2010.
1. In 2010, about 40% of countries had a fertility rate between 1.5 and 2.
1. In 1960, less than 20% of countries had a fertility rate below 3.
1. More countries had a fertility rate above 3 in 1960 than in 2010.
1. At least half of countries had a fertility rate between 5 and 8 in 1960.
1. At least half of countries had a fertility rate below 3 in 2010.

To approach this question, evaluate each statement separately: if it is true, include its number in the array; if not, leave it out. We recommend working through the whole problem on your own before running the grader, since the grader's output may reveal the correct answer.
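When reading percentages off a histogram, it helps to remember that on a density scale the *area* of each bar, not its height, is the proportion of values in that bin. The sketch below assumes the histograms above use such a scale (the y-axis is labeled in percent per unit), and demonstrates the idea with NumPy's `np.histogram` on ten made-up fertility rates rather than the real data.

```python
import numpy as np

# Made-up fertility rates for ten hypothetical countries (not the real data).
rates = np.array([1.4, 1.8, 1.9, 2.2, 2.6, 3.1, 4.0, 5.5, 6.2, 7.0])
bins = np.arange(0, 10, 0.5)  # same bin edges as the cell above

# density=True scales bar heights so that the total area of all bars is 1.
heights, edges = np.histogram(rates, bins=bins, density=True)
widths = np.diff(edges)
areas = heights * widths  # proportion of values in each bin

# Proportion below 3: add the areas of the bins whose left edge is below 3.
# This works only because 3 is itself a bin edge.
below_3 = areas[edges[:-1] < 3].sum()
print(below_3)            # area-based proportion
print(np.mean(rates < 3))  # direct count gives the same proportion
```

If a cutoff falls in the middle of a bin, the histogram alone cannot say exactly how many values lie on each side of it, so such statements may not be inferable.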

In [None]:
fertility_statements = ...

In [None]:
grader.check('q11')

If you still have time, take a break and watch this spirited presentation by the late [Hans Rosling in a 2010 TEDx talk](https://www.gapminder.org/videos/reducing-child-mortality-a-moral-and-environmental-imperative) on mortality and fertility in the world.

## Submission

You're done with this lab!

To submit this notebook, download it as a .ipynb file and upload it to Gradescope. To download it, click File > Download as... > Notebook (.ipynb) in the toolbar at the top of this page. Then go to our class's Gradescope page [here](https://www.gradescope.com/courses/136698) and upload your file under "Lab 10."

To check your work for all autograded questions, run the cell below. 

It's fine to submit multiple times, but we will only grade the final notebook you submit for each assignment. Make sure you pass all tests to receive credit.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
grader.check_all()