# 1.3.4: World Population (Interpretation: Prediction)

<br>

---

*Modeling and Simulation in Python*

Copyright 2021 Allen Downey (License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/))

Revised, Mike Augspurger (2021-present)

<br>

---

First, download the population data and functions we'll need:

In [None]:
#@title
# Import libraries
from os.path import basename, exists
from os import mkdir

def download(url,folder):
    filename = folder + basename(url)
    if not exists(folder):
        mkdir(folder)
    # fetches the file at the given url if it is not already present
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

download('https://github.com/MAugspurger/ModSimPy_MAugs/raw/main/Notebooks/'
        + 'ModSimPy_Functions/modsim.py', 'ModSimPy_Functions/')
download('https://github.com/MAugspurger/ModSimPy_MAugs/raw/main/Notebooks/'
        + 'ModSimPy_Functions/chap02.py', 'ModSimPy_Functions/')

from ModSimPy_Functions.modsim import *
from ModSimPy_Functions.chap02 import *
import pandas as pd
import numpy as np

In [None]:
#@title
filename = 'https://github.com/MAugspurger/ModSimPy_MAugs/raw/main/Images_and_Data/Data/World_population_estimates.html'
# If you are using this notebook offline, you will need to upload this data
# from the Images_and_Data folder on your local computer.  
# Comment out the line above, and uncomment the
# line below this one, and run this cell
# filename = '../Images_and_Data/Data/World_population_estimates.html'

tables = pd.read_html(filename, header=0, index_col=0, decimal='M')
table2 = tables[2]
table2.columns = ['census', 'prb', 'un', 'maddison', 
                  'hyde', 'tanton', 'biraben', 'mj', 
                  'thomlinson', 'durand', 'clark']
un = table2.un / 1e9
census = table2.census / 1e9

def plot_estimates():
    census.plot(style=':', label='US Census', legend=True)
    un.plot(style='--', label='UN DESA', xlabel='Year',
            ylabel='World population (billion)',
            legend=True)
    


---

## Part 1: Generating a projection

In the previous notebook we developed a quadratic model of world
population growth from 1950 to 2016. It is a simple model, but it fits
the data well and the mechanisms upon which it is based are plausible.  This has served one purpose of modeling: we have a better understanding of some of the "rules" that (seem to) govern this system.   We can better interpret the system: the idea of *carrying capacity*, for instance, provides a plausible explanation for why the growth rate is getting lower with time.

<br>

Now that we've found a plausible model, we can move on to one of the other interpretive aims of modeling: predicting future behavior.  Let's run the quadratic model, extending the results until 2100, and see how our projections compare to the professionals'.  Here's the quadratic growth function again, and the system parameters:

In [None]:
def growth_func_quad(t, pop, system):
    return system['alpha'] * pop + system['beta'] * pop**2

t_0 = census.index[0]
p_0 = census[t_0]
t_end = census.index[-1]

system = dict(t_0 = t_0,
                p_0 = p_0,
                alpha = 25 / 1000,
                beta = -1.8 / 1000,
                t_end = t_end)

def run_simulation(system, change_func):
    results = pd.Series([],dtype=object)
    results[system['t_0']] = system['p_0']
    
    for t in range(system['t_0'], system['t_end']):
        growth = change_func(t, results[t], system)
        results[t+1] = results[t] + growth
        
    return results

✅ What can we change to make the simulation run until 2100?  Do we need to change anything in `run_simulation`? 

In [None]:
results = run_simulation(system, growth_func_quad)
results.plot(color='gray', label='model', xlabel='Year',
             ylabel='World population (billion)',
             title='Quadratic Model Projection',
             legend=True);

According to the model, population growth will slow gradually after 2020, approaching 12.6 billion by 2100.  Notice that even after this much time, the population still falls short of the model's carrying capacity of about 13.9 billion, which is the population at which net growth is zero (`-alpha / beta`).
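
The cell below is a minimal, self-contained sketch of this projection, not the notebook's own cell. It assumes a rounded 1950 starting population of 2.557 billion instead of reading `census`, and it collects values in a plain dictionary before building the `Series`. The only change needed to reach 2100 is the value of `t_end`; the carrying capacity follows from setting the growth function to zero.

```python
import pandas as pd

def growth_func_quad(t, pop, system):
    """Net annual growth (billions) under the quadratic model."""
    return system['alpha'] * pop + system['beta'] * pop**2

def run_simulation(system, change_func):
    """Same update rule as above, collected in a dict for brevity."""
    values = {system['t_0']: system['p_0']}
    for t in range(system['t_0'], system['t_end']):
        values[t + 1] = values[t] + change_func(t, values[t], system)
    return pd.Series(values)

system = dict(t_0=1950,
              p_0=2.557,          # assumed: rounded 1950 estimate, in billions
              alpha=25 / 1000,
              beta=-1.8 / 1000,
              t_end=2100)         # the only change needed to project to 2100

results = run_simulation(system, growth_func_quad)

# Growth is zero where alpha*p + beta*p**2 = 0, i.e. at p = -alpha/beta
carrying_capacity = -system['alpha'] / system['beta']

print(round(carrying_capacity, 2))   # 13.89
print(round(results[2100], 2))       # projected population in 2100
```

Because `run_simulation` reads the end year from `system`, the function itself does not need to change.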

## Part 2: Comparing Projections

From the same Wikipedia page where we got the past population estimates, we'll read `table3`, which contains predictions for population growth over the next 50-100 years, generated by the U.S. Census, U.N. DESA, and the Population Reference Bureau.

In [None]:
table3 = tables[3]
table3.head()

Some values are `NaN`, which indicates missing data, because some organizations did not publish projections for some years.  The column names are long strings; for convenience, we'll replace them with abbreviations.

In [None]:
table3.columns = ['census', 'prb', 'un']

The following function plots projections from the U.N. DESA and U.S. Census.  It uses the `Series` method `dropna` to remove the `NaN` values from each series before plotting it.

In [None]:
def plot_projections(table):
    """Plot world population projections.
    
    table: DataFrame with columns 'un' and 'census'
    """
    census_proj = table.census.dropna() / 1e9
    un_proj = table.un.dropna() / 1e9
    
    census_proj.plot(style=':', label='US Census',legend=True)
    un_proj.plot(style='--', label='UN DESA',xlabel='Year', 
             ylabel='World population (billion)',legend=True)
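
As a small illustration of what `dropna` does, here is a sketch on a made-up `Series` (the values are placeholders, not the published projections). It shows that `dropna` returns a new `Series` without the missing entries and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Made-up projections in people, indexed by year; NaN marks a missing value
proj = pd.Series([7.6e9, np.nan, 9.7e9, np.nan],
                 index=[2020, 2030, 2050, 2100])

clean = proj.dropna() / 1e9   # drop missing years, then convert to billions

print(list(clean.index))      # [2020, 2050]
print(len(proj), len(clean))  # 4 2
```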

Here are the professional projections compared to the results of the quadratic model.

In [None]:
plot_projections(table3)
results.plot(color='gray', label='model',
             title='Quadratic Model Projection',
            legend=True);

The U.N. DESA expects the world population to reach 11 billion around 2100, and then level off.
Projections by the U.S. Census are a little lower, and they only go until 2050.  Professional demographers expect world population to grow more slowly than our model does, probably because their models are broken down by region and country, where conditions differ, and they take into account expected economic development.

<br>

Nevertheless, their projections are qualitatively similar to ours, and
theirs differ from each other almost as much as they differ from ours.
So the results from our model, simple as it is, are not entirely unreasonable.

<br> 

---

## Part 3: Predicting future growth by extrapolating growth rate trends

The net growth rate of world population has been declining for several decades.  That observation suggests one more way to generate more realistic projections, by extrapolating observed changes in growth rate.  'Extrapolate' means that we will discover a pattern in existing data, and extend that pattern into future behavior.

<br>

To compute past growth rates, we'll use a "method" of Series called `diff`, which computes the difference between successive elements in a `Series`.  For example, here are the changes from one year to the next in `census`:

In [None]:
diff = census.diff()
# Only the last expression in a cell is displayed, so show just the first rows
pd.DataFrame(diff.head())


The first element is `NaN` because we don't have the data for 1949, so we can't compute the first difference.  
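
Here is a minimal sketch of `diff` on a three-element `Series` with made-up values, which makes the `NaN` in the first position easy to see.

```python
import pandas as pd

# Made-up populations in billions, for illustration only
pop = pd.Series([2.5, 2.6, 2.8], index=[1950, 1951, 1952])

# change[t] = pop[t] - pop[t-1]; there is no value before 1950,
# so the first element is NaN
change = pop.diff()

print(change.round(2))
```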

<br>

✅ How can we calculate the growth rate *alpha* for `census`?

In [None]:
# Calculate alpha for the Series census

The following function computes and plots the growth rates for the `census` and `un` estimates:

In [None]:
def plot_alpha():
    alpha_census = census.diff() / census
    alpha_census.plot(style='.', label='US Census',legend=True)

    alpha_un = un.diff() / un
    alpha_un.plot(style='.', label='UN DESA',
                  xlabel='Year',ylabel='Growth Rate',
                  title='Annual Growth Rate, 1950-present',
                 legend=True)

It uses `style='.'` to plot each data point with a small circle.
And here's what it looks like.

In [None]:
plot_alpha()
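
One detail worth noting: `plot_alpha` divides each year's change by that same year's population, while the simulation applies the rate to the earlier year's population. The sketch below compares the two conventions on a made-up `Series` that grows by exactly 10% per year. With rates of 1 to 2%, as in the population data, the two versions differ only slightly.

```python
import pandas as pd

# Made-up values that grow by exactly 10% each year
pop = pd.Series([100.0, 110.0, 121.0], index=[2000, 2001, 2002])

# Convention used in plot_alpha: change divided by the current year's value
alpha_current = pop.diff() / pop

# Alternative: change divided by the previous year's value
alpha_previous = pop.diff() / pop.shift(1)

print(alpha_current.round(4))    # about 0.0909 after the first year
print(alpha_previous.round(4))   # 0.1 after the first year
```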

Other than a bump around 1990, net growth rate has been declining roughly linearly since 1970.  We can model the decline by fitting a line to this data and extrapolating into the future.
Here's a function that takes a time stamp and computes a growth rate for a given year.  Notice that in 1970, the rate is equal to `intercept`, which here is 0.02.

In [None]:
def alpha_func(t):
    intercept = 0.02
    slope = -0.0001
    return intercept + slope * (t - 1970)

To evaluate the linear model, we'll create an array of time stamps from 1960 to 2020 and use `alpha_func` to compute the corresponding growth rates.  Notice that we are passing an array (created by `linspace`) as the argument to `alpha_func`: the result is an array of points along a line.

In [None]:
t_array = linspace(1960, 2020, 5)
alpha_array = alpha_func(t_array)
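
To make the array arithmetic concrete, here is a self-contained sketch that calls NumPy's `linspace` directly (assuming the `linspace` imported above behaves the same way). The subtraction and multiplication inside `alpha_func` are applied to every element at once.

```python
import numpy as np

def alpha_func(t):
    intercept = 0.02
    slope = -0.0001
    return intercept + slope * (t - 1970)

# 5 evenly spaced time stamps, including both end points
t_array = np.linspace(1960, 2020, 5)

# One growth rate per time stamp, computed element by element
alpha_array = alpha_func(t_array)

print(t_array)       # [1960. 1975. 1990. 2005. 2020.]
print(alpha_array)   # about 0.021, 0.0195, 0.018, 0.0165, 0.015
```

The same function also accepts a single year, for example `alpha_func(1970)`, which returns the intercept.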

To compare the linear model with the data, we'll create a `Series` out of the array of dates and the array of linear growth values, and plot it alongside the known growth rates:

In [None]:
linear_alpha = pd.Series(data=alpha_array,index=t_array)
plot_alpha()
linear_alpha.plot(label='model', legend=True,color='gray',
                  ylabel='Net growth rate', xlabel='Year',
                  title='Linear model of net growth rate');

The line here doesn't fit the data very well, so first adjust the `intercept` and `slope` parameters in `alpha_func` and rerun the last couple of cells.  Do this until you get a decent "fit".
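
Adjusting the parameters by hand is the approach intended here. As a cross-check, a least-squares fit is one alternative way to find a line; the sketch below uses NumPy's `polyfit` on synthetic growth rates (a made-up linear trend plus a small wiggle, not the census data). Applying it to the real data would also require dropping the `NaN` values and choosing which years to include.

```python
import numpy as np

# Synthetic growth rates for illustration only (not the census data):
# a straight-line trend plus a small deterministic wiggle
years = np.arange(1970, 2017)
rates = 0.021 - 0.00025 * (years - 1970) + 0.0005 * np.sin(years)

# Least-squares line through (years - 1970, rates); polyfit returns
# coefficients from the highest degree down, so the slope comes first
slope, intercept = np.polyfit(years - 1970, rates, 1)

print(f"intercept = {intercept:.4f}, slope = {slope:.6f}")
```

Measuring time from 1970 keeps `intercept` comparable to the `intercept` in `alpha_func`.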

Now let's use these values to create a projection:

1. Create a system `dictionary` object that includes `alpha_func` as a system parameter.

2. Define a growth function that uses `alpha_func` to compute the net growth rate at the given time `t`.

3. Run a simulation from 1960 to 2100 with your growth function, and plot the results.

4. Compare your projections with those from the US Census and UN.
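
If you get stuck, the first three steps could take roughly the following shape. This is a sketch under stated assumptions, not the only solution: the function name `growth_func_time_varying` is hypothetical, the 1960 starting population of 3.04 billion is a rounded stand-in for `census[1960]`, and `alpha_func` still uses the unadjusted parameters, so the numbers it produces are not a good projection.

```python
import pandas as pd

def alpha_func(t):
    intercept = 0.02
    slope = -0.0001
    return intercept + slope * (t - 1970)

def growth_func_time_varying(t, pop, system):   # hypothetical name
    """Growth in year t, using a net growth rate that depends on t."""
    return system['alpha_func'](t) * pop

def run_simulation(system, growth_func):
    """Same update rule as above, collected in a dict for brevity."""
    values = {system['t_0']: system['p_0']}
    for t in range(system['t_0'], system['t_end']):
        values[t + 1] = values[t] + growth_func(t, values[t], system)
    return pd.Series(values)

system = dict(t_0=1960,
              t_end=2100,
              p_0=3.04,               # assumed: rounded 1960 estimate, in billions
              alpha_func=alpha_func)  # a function stored as a system parameter

results = run_simulation(system, growth_func_time_varying)
print(results.tail())
```

Storing the function in `system` means a different rate model can be swapped in without touching `run_simulation` or the growth function.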

In [None]:
# Create a system dictionary that includes `alpha_func` as a system parameter.
# Set the start and end dates, and the starting population p_0, in the system dictionary



In [None]:
# Here are the alpha function and the run_simulation function from above
def alpha_func(t):
    intercept = 0.02
    slope = -0.0001
    return intercept + slope * (t - 1970)

def run_simulation(system, growth_func):
    results = pd.Series([],dtype=object)
    results[system['t_0']] = system['p_0']
    
    for t in range(system['t_0'], system['t_end']):
        growth = growth_func(t, results[t], system)
        results[t+1] = results[t] + growth
        
    return results

# Now define a growth function called `growth_func_lin_change`
# that uses `alpha_func` to compute 
# the net growth rate for each time step.
def growth_func_lin_change(t, pop, system):
    pass  # Replace this line with your code


In [None]:
# Bonus! Define a function that models an exponential decrease in alpha


In [None]:
# Run a simulation from 1960 to 2100 with your growth function



In [None]:
# Here is a function that plots the Census and UN projections
def plot_estimates():
    census.plot(style=':', label='US Census', legend=True)
    un.plot(style='--', label='UN DESA', xlabel='Year',
            ylabel='World population (billion)',
            legend=True)
    

# The next two lines plot the UN and Census estimates (up to the present)
# and also the projections (through 2100)
plot_estimates()
plot_projections(table3)

# Add your model to the plot in a line below this
# `results` is a Series, so we can use the Series method `plot()`
# Provide a label for the plot of the model
