In [None]:
from IPython.core.display import HTML
from datascience import *

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
plt.style.use('fivethirtyeight')

import pandas as pd
import zipfile
import io
import math

def css_styling():
    # Read the shared stylesheet and close the file when done.
    with open('../notebook_styles.css', 'r') as f:
        styles = f.read()
    return HTML(styles)
css_styling()

In [None]:
#Loading testing data
from client.api.notebook import Notebook 
lab04 = Notebook('lab04.ok')
_ = lab04.auth(inline=True)

# Lab 04 - Introduction to population growth

## Introductions

**What is your partner's name?**

[ANSWER HERE]

**What year is your partner in? (Freshman, Sophomore, etc)**

[ANSWER HERE]

**Who is your partner's favorite band or singer?**

[ANSWER HERE]

## Growth

Today, we started class by talking about population growth in general. [Here are the slides](https://docs.google.com/presentation/d/1GGKyk3dl_yB3Gd5EO9rM_nLSgqRytQf5H9Gpm333lpQ/edit?usp=sharing), which have the formulas we developed.

We'll start by opening up estimated population counts from the United Nations Population Division.

In [None]:
unpd_pop = Table.read_table('../data/UNPD/unpd_pop_cleaned.csv')
unpd_pop

Note that the population counts are reported in thousands of people. So Djibouti's 1950 `population` value of 62.001 means that UNPD estimates that there were 62.001 $\times$ 1,000 = 62,001 people in Djibouti in 1950.

Following the pattern we've used in previous labs, we'll look closely at one country, and then we'll generalize our analysis.

**Question - Plot the population for Malawi over time (so, time on the x axis and population on the y axis)**

In [None]:
...

Now we'll deepen our understanding of growth rates by calculating the growth rate in Malawi from 1960 to 1961 by hand. We'll do this in two steps: first, we'll filter down to Malawi data; second, we'll look at those filtered data and plug the population for 1961 and 1960 into the formula for the growth rate $r$.

**Question - Grab the population data for Malawi**

In [None]:
malawi = ...
malawi.show()

### The growth rate $r$

As we discussed, we can measure the rate at which a population's size is changing over time by calculating the growth rate $r$. If a population starts out being size $K(0)$ at time $t=0$ and ends up being size $K(T)$ at time $t=T$, then the growth rate $r$ satisfies the equation

$$
K(T) = e^{rT}K(0).
$$

In particular, when we consider a time period of $T=1$ year, we have

$$
K(1) = e^{r}K(0).
$$
  
We can now solve for $r$:
  
$$
\begin{aligned}
e^{r} &= \frac{K(1)}{K(0)}\\
\Leftrightarrow r &= \log\left(\frac{K(1)}{K(0)}\right)\\
\Leftrightarrow r &= \log\left(K(1)\right) - \log\left(K(0)\right).
\end{aligned}
$$

If we wish to express $r$ as a percentage, we can multiply it by 100.
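
As a quick check of the algebra above, here is a minimal sketch that computes $r$ for made-up numbers. The population values and the helper name `growth_rate` are assumptions for illustration only; they do not come from the UNPD data.

```python
import math

def growth_rate(k_start, k_end, years=1.0):
    """Growth rate r that solves k_end = exp(r * years) * k_start."""
    return math.log(k_end / k_start) / years

# Hypothetical population counts, in thousands (not taken from the UNPD data).
k_0 = 1000.0
k_1 = 1030.0

r = growth_rate(k_0, k_1)   # roughly 0.0296
r_percent = 100 * r         # roughly 2.96 percent

# The two forms of the formula agree: log(K(1)/K(0)) == log(K(1)) - log(K(0)).
r_alt = math.log(k_1) - math.log(k_0)

print(round(r, 4), round(r_percent, 2))
```

Note that `math.log` is the natural logarithm, which is the one the formula for $r$ calls for.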

**Question - Looking at the table above, calculate the growth rate for Malawi from 1960 to 1961. Express your answer as a percentage.**

In [None]:
## Malawi growth rate from 1960 to 1961
malawi_r_1960 = ...
malawi_r_1960

In [None]:
_ = lab04.grade('test_malawi_1960_r')

You should get an answer of $r \approx 0.0222$, which is a growth rate of about 2.2 percent.

To develop some intuition for growth rates, we'll calculate growth rates for Malawi over time.  There are different ways to do this; since we're practicing loops in this class, we'll write a loop.  
  
The first step is to get a set of periods to loop over.

**Question - Get a list of the unique periods in the dataset**

In [None]:
all_periods = ...
all_periods

In [None]:
_ = lab04.grade('test_all_periods')

**Question - Fill in the code of the loop below to calculate the growth rate in Malawi for all periods in the dataset, and to then plot the growth rates and the population.**  
*[HINT: Since a growth rate is calculated from two time periods, you will end up with one fewer growth rate than time periods. For example, if there are 10 time periods, you will only be able to calculate 9 growth rates. This is why the `for` loops over `all_periods[1:]`]*
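
The pattern in the hint can be sketched on plain Python data. Everything here is hypothetical: the periods and populations are made up, a dictionary stands in for the table, and the sketch assumes the periods are consecutive years.

```python
import math

# Hypothetical yearly populations (in thousands); not real UNPD values.
periods = [2000, 2001, 2002, 2003]
population = {2000: 100.0, 2001: 102.0, 2002: 105.0, 2003: 105.0}

growth_rates = []
# Start from the second period so that every period has a predecessor.
for period in periods[1:]:
    t0_pop = population[period - 1]   # population one year earlier
    t1_pop = population[period]       # population in this period
    growth_rates.append(math.log(t1_pop / t0_pop))

# There is one fewer growth rate than there are periods.
print(len(periods), len(growth_rates))
```

In the lab itself, the two populations come from filtering the table rather than from a dictionary lookup, but the pairing of each period with the one before it is the same idea.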

In [None]:
country = 'Malawi'
country_pop = unpd_pop.where(...)

## SOLUTION
growth_rates = make_array()

for period in all_periods[1:]:
    t0_pop = ...
    t1_pop = ...
    growth_rates = np.append(growth_rates,
                             ...)

country_r = Table().with_columns('period', ...,
                                'r', ...)
country_r
country_r.plot('period')
unpd_pop.where('area', are.equal_to(country)).sort('period').plot('period', 'population')

In [None]:
_ = lab04.grade('test_malawi_r')

**Question - The code you filled in above is very general, in the sense that you can change the value of the `country` variable to make it do the same analysis for any other country. To make it even more useful, turn it into a function called `plot_country_growth`. Your function should take the name of the country to plot as an argument, and it should produce the two plots that the loop above produced.**

In [None]:
# (NB: this just means you have to fill in several lines, not necessarily exactly 3 of them)
...
...
... 

In [None]:
# test your function out
plot_country_growth('Sweden')

**Question - Now use your function to plot the growth rate and population size over time for several different countries. (Try to pick a diverse set to explore.)  What does your exploration suggest about the difference between the growth rate $r$ and the population size over time? For example, which of the two quantities tends to vary more?**

In [None]:
### explore several countries
...

[ANSWER HERE]

Now we want to explore growth across many countries at a fixed point in time.

**Question - Get a list of the unique countries in the dataset**

In [None]:
all_countries = ...
all_countries

In [None]:
_ = lab04.grade('test_all_countries')

**Fill in the code below to calculate the growth rate r from 1990 to 1991 in each country in the UNPD dataset.**

In [None]:
growth_rates = make_array()

for country in all_countries:
    t0_pop = ...  # take the 1990 population for this country from the unpd_pop dataset
    t1_pop = ...  # take the 1991 population for this country from the unpd_pop dataset
    
    growth_rates = np.append(growth_rates,
                            ...)

unpd_1990_r = Table().with_columns('area', ...,
                                   'r', ...)
unpd_1990_r

In [None]:
_ = lab04.grade('test_unpd_1990_r')

**Question - Make a histogram of growth rates across the world in 1990**

In [None]:
...

**Question - How much variation is there in growth rates around the world? What is a high and what is a low growth rate?**

[ANSWER HERE]

### Doubling times

**Question - Given a population's growth rate, what is the formula for its doubling time? What does the doubling time mean?**  
*[HINT: You can look at the slides for today if you have forgotten the formula for doubling time]*

[ANSWER HERE]

**Question - Write some code to add a new column `Tdbl` to the `unpd_1990_r` dataset; the new column should have the doubling time for each country.**  
*[HINT: When calculating doubling time, be sure that the growth rate $r$ is not expressed as a percentage, but in its natural units]*
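
To see why the units matter, here is a small sketch with a made-up growth rate. It assumes the standard result that setting $K(T) = 2K(0)$ in the growth equation gives a doubling time of $\log(2)/r$; the helper name `doubling_time` is hypothetical.

```python
import math

def doubling_time(r):
    """Years needed to double at a constant rate r (natural units, r != 0)."""
    return math.log(2) / r

r = 0.02              # hypothetical growth rate in natural units (2 percent)
r_percent = 100 * r   # the same rate expressed as a percentage

right = doubling_time(r)          # about 34.7 years
wrong = doubling_time(r_percent)  # about 0.35 years: off by a factor of 100

print(round(right, 1), round(wrong, 2))
```

Plugging the percentage into the formula shrinks the doubling time by a factor of 100, which is why the rate has to stay in its natural units.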

In [None]:
unpd_1990_r = ...
unpd_1990_r

In [None]:
_ = lab04.grade('test_unpd_doubling')

**Question - Make a histogram of doubling times across all of the countries in the world in 1990**

In [None]:
...

Your plot probably looks a little strange. This is because some of the growth rates are near 0. (Why might that cause the doubling time estimate to take on extreme values?)

**Question - Make a histogram of doubling times for countries whose growth rate is greater than 0.1%**

In [None]:
...

**Question - If growth rates were held constant at their 1990 values, how many countries would double in size within 50 years?**

In [None]:
## calculate your answer here

In [None]:
## SOLUTION
unpd_1990_r.where('Tdbl', are.above(0)).where('Tdbl', are.below(50)).num_rows

### Crude birth and death rates

Our final look at population growth will compare an alternate measure of growth, called the **rate of natural increase** (RONI), to the growth rate $r$.  
  
The rate of natural increase is defined as

$$
\begin{aligned}
\text{RONI} &= \text{CBR} - \text{CDR}\\
&= \frac{B}{K} - \frac{D}{K},
\end{aligned}
$$

where $\text{CBR}$ is the crude birth rate, which is equal to the number of births $B$ divided by population size $K$; similarly, the crude death rate $\text{CDR}$ is equal to the number of deaths $D$ divided by population size $K$.
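
The definition can be checked on made-up counts. The numbers below are hypothetical and are not taken from the UNPD data; the sketch also shows the conversion between per-person rates and rates quoted per 1,000 people.

```python
# Hypothetical counts for one year in one population (not UNPD data).
births = 40_000
deaths = 15_000
population = 1_000_000

cbr = births / population   # crude birth rate, per person
cdr = deaths / population   # crude death rate, per person
roni = cbr - cdr            # rate of natural increase, per person

# Rates are often quoted per 1,000 people; multiply by 1,000 to convert.
cbr_per_1000 = 1000 * cbr
cdr_per_1000 = 1000 * cdr

# Going back to natural units means dividing by 1,000 again.
roni_from_per_1000 = (cbr_per_1000 - cdr_per_1000) / 1000

print(round(roni, 3), round(cbr_per_1000, 1), round(cdr_per_1000, 1))
```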

**Question - Given the definition above, what range of RONI values would you expect a population to have when it is:  
(a) Not changing in size?  
(b) Shrinking?  
(c) Growing?**

[ANSWER HERE]

Now we'll calculate the rate of natural increase from UNPD data.

First, we'll open up the UNPD crude birth rate estimates:

In [None]:
unpd_cbr = Table.read_table('../data/UNPD/unpd_cbr_cleaned.csv')
unpd_cbr

And then we'll open up the UNPD crude death rate estimates:

In [None]:
unpd_cdr = Table.read_table('../data/UNPD/unpd_cdr_cleaned.csv')
unpd_cdr

**Question - Fill in the code below to  
(1) filter the crude birth and death rate data down to 1990 only  
(2) then join the CBR and CDR data together  
(3) finally, calculate RONI from the joined dataset; store it in a column called `roni`**  
*[NOTE: the CBR and CDR are reported per 1,000 people]*

In [None]:
unpd_cbr_1990 = ...
unpd_cdr_1990 = ...
unpd_roni_1990 = ... # join cbr and cdr together
unpd_roni_1990 = ... # calculate RONI
unpd_roni_1990

In [None]:
_ = lab04.grade('test_roni')

What is the difference between the growth rate, $r$, and RONI? Let's find out!

**Join the growth rates (from `unpd_1990_r`) into the RONI values you calculated in `unpd_roni_1990`.**

In [None]:
unpd_roni_r = ...
unpd_roni_r

**Question - Make a scatterplot comparing RONI and the growth rate r**

In [None]:
...

**Question - RONI and r are very similar, but they are not identical. Thinking about how they are calculated, what might explain the differences between them?**

[ANSWER HERE]

## Run all tests

This cell just re-runs all of the unit tests in the notebook, to summarize the results.
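
The cell works by turning each test file name into a test name. A minimal sketch of that step, using a hypothetical list of file names in place of the real `tests` directory:

```python
# Hypothetical file names, standing in for the contents of a "tests" directory.
filenames = ["test_roni.py", "test_all_periods.py", "__init__.py"]

# Keep only the test files and drop the 3-character ".py" suffix.
test_names = [name[:-3] for name in filenames if name.startswith("test")]

print(test_names)
```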

In [None]:
# this cell runs all the tests at once!
print("Running all tests...")
_ = [lab04.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('test')]
print("Finished running all tests.")

### Submit your assignment by MIDNIGHT on the day of class

Please submit your lab by running the cell below. You can submit as many times as you want, up to midnight on the day of the class. No late submissions are allowed, and the system will prevent you from being able to submit late.

In [None]:
_ = lab04.submit()