## Population Lecture I



### Introduction



Today we'll introduce some key "stylized facts" about human
population and its growth.  None of these are "causal" statements,
just observations about relationships.

-   **Fact I:** Population growth is fundamentally exponential, but the
    rate of growth has fallen over time.
-   **Fact II:** Population growth rates are generally higher in places
    where people are poorer.
-   **Fact III:** Variation in growth rates across countries is
    accounted for more by variation in fertility than by mortality.



### Getting Data



#### The World Development Indicators & `wbdata`



The World Bank maintains a large set of "World Development Indicators" (WDI),
including information on population.

-   The API for the WDI is documented at [https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation)

-   A Python module that uses the API is `wbdata`, written by Oliver Sherouse.

-   Available at [http://github.com/OliverSherouse/wbdata](http://github.com/OliverSherouse/wbdata).

-   Documented at [https://wbdata.readthedocs.io](https://wbdata.readthedocs.io).



#### Getting Population Data Using wbdata



##### Goals



We want to devise ways to visualize the following:

-   Global population growth from 1960 to the present;
-   Population growth rates versus GDP per capita;
-   Age-sex population pyramids.



##### Methods (using wbdata)



We walk through the process of getting data from the WDI into a
`pandas` DataFrame. 

The `wbdata` module has several key functions we'll want to use:

-   **`search_countries()`:** Returns codes for countries or
    regions matching a search string.
-   **`get_source()`:** Lists the data sources that can
    be accessed using the module, each with a numeric key.
-   **`get_indicator()`:** Given a source, returns a list of
    available variables (indicators).
-   **`get_dataframe()`:** Given a set of indicators (and, optionally,
    countries), returns a DataFrame populated with the requested data.

Begin by importing the module:



In [2]:
## If import fails with "ModuleNotFoundError"
## uncomment below & try again
!pip install wbdata

import wbdata

Collecting wbdata
  Using cached wbdata-0.3.0-py3-none-any.whl (14 kB)
Installing collected packages: wbdata
Successfully installed wbdata-0.3.0


###### `wbdata.search_countries()`



What countries and regions are available?  We can list all of the
country codes, or search for particular strings:



In [3]:
import wbdata

# Return list of all country/region codes:
wbdata.get_country()

# Return list matching a query term:
#wbdata.search_countries("World")

## Try your own search!
# wbdata.search_countries("")

id    name
----  --------------------------------------------------------------------------------
ABW   Aruba
AFE   Africa Eastern and Southern
AFG   Afghanistan
AFR   Africa
AFW   Africa Western and Central
AGO   Angola
ALB   Albania
AND   Andorra
ARB   Arab World
ARE   United Arab Emirates
ARG   Argentina
ARM   Armenia
ASM   American Samoa
ATG   Antigua and Barbuda
AUS   Australia
AUT   Austria
AZE   Azerbaijan
BDI   Burundi
BEA   East Asia & Pacific (IBRD-only countries)
BEC   Europe & Central Asia (IBRD-only countries)
BEL   Belgium
BEN   Benin
BFA   Burkina Faso
BGD   Bangladesh
BGR   Bulgaria
BHI   IBRD countries classified as high income
BHR   Bahrain
BHS   Bahamas, The
BIH   Bosnia and Herzegovina
BLA   Latin America & the Caribbean (IBRD-only countries)
BLR   Belarus
BLZ   Belize
BMN   Middle East & North Africa (IBRD-only countries)
BMU   Bermuda
BOL   Bolivia
BRA   Brazil
BRB   Barbados
BRN   Brunei Darussalam
BSS   Sub-Saharan Africa (IBRD-only countries)
BTN   Bhutan
BWA  

###### `wbdata.get_source()`



To see the datasets we can access via the API, use `get_source()`:



In [4]:
wbdata.get_country('ABW')[0]['name']

'Aruba'

In [5]:
wbdata.get_source()

  id  name
----  --------------------------------------------------------------------
   1  Doing Business
   2  World Development Indicators
   3  Worldwide Governance Indicators
   5  Subnational Malnutrition Database
   6  International Debt Statistics
  11  Africa Development Indicators
  12  Education Statistics
  13  Enterprise Surveys
  14  Gender Statistics
  15  Global Economic Monitor
  16  Health Nutrition and Population Statistics
  18  IDA Results Measurement System
  19  Millennium Development Goals
  20  Quarterly Public Sector Debt
  22  Quarterly External Debt Statistics SDDS
  23  Quarterly External Debt Statistics GDDS
  25  Jobs
  27  Global Economic Prospects
  28  Global Financial Inclusion
  29  The Atlas of Social Protection: Indicators of Resilience and Equity
  30  Exporter Dynamics Database – Indicators at Country-Year Level
  31  Country Policy and Institutional Assessment
  32  Global Financial Development
  33  G20 Financial Inclusion Indicators
  34  Glob

###### `wbdata.get_indicator()`



"Population estimates and projections" looks promising.
What indicators/variables are available from that source?



In [6]:
SOURCE = 40 # "Population estimates and projections"
indicators = wbdata.get_indicator(source=SOURCE)
indicators

id                 name
-----------------  -------------------------------------------------------------------
SH.DTH.0509        Number of deaths ages 5-9 years
SH.DTH.1014        Number of deaths ages 10-14 years
SH.DTH.1019        Number of deaths ages 10-19 years
SH.DTH.1519        Number of deaths ages 15-19 years
SH.DTH.2024        Number of deaths ages 20-24 years
SH.DTH.IMRT        Number of infant deaths
SH.DTH.IMRT.FE     Number of infant deaths, female
SH.DTH.IMRT.MA     Number of infant deaths, male
SH.DTH.MORT        Number of under-five deaths
SH.DTH.MORT.FE     Number of under-five deaths, female
SH.DTH.MORT.MA     Number of under-five deaths, male
SH.DTH.NMRT        Number of neonatal deaths
SH.DYN.0509        Probability of dying among children ages 5-9 years (per 1,000)
SH.DYN.1014        Probability of dying among adolescents ages 10-14 years (per 1,000)
SH.DYN.1019        Probability of dying among adolescents ages 10-19 years (per 1,000)
SH.DYN.1519        Probabil

##### Getting Population Over Time



Let's get data on the global population and see how it has changed over
time. The variable `SP.POP.TOTL` seems like a reasonable place to
start.

We want to get a `pandas.DataFrame` of total population:



In [7]:
# Give the variable a readable label for clarity
variable_labels = {"SP.POP.TOTL":"World Population"}

world = wbdata.get_dataframe(variable_labels, country="WLD")

# Date index is of type string; change to integers
world.index = world.index.astype(int)

# Print a few years' data
world.head()

date,World Population
2021,7888409000.0
2020,7820982000.0
2019,7742682000.0
2018,7661776000.0
2017,7578158000.0


In [8]:
def dataframefunction(agelowerbound=0, ageupperbound=80, givencountry='WLD'):
    """Return age-sex population counts for one country or region code."""
    age_ranges = []

    # Ranges top out at 80, and go in five year increments
    for i in range(agelowerbound, ageupperbound, 5):
        age_ranges.append(f"{i:02d}"+f"{i+4:02d}")

    # The open-ended top group is included only when the range reaches 80
    if ageupperbound == 80:
        age_ranges.append("80UP")

    male_variables = {"SP.POP."+age_range+".MA":"Males "+age_range for age_range in age_ranges}
    female_variables = {"SP.POP."+age_range+".FE":"Females "+age_range for age_range in age_ranges}

    variables = male_variables
    variables.update(female_variables)

    df = wbdata.get_dataframe(variables, country=givencountry)
    # Record the country code in the first column
    df.insert(0, "Country", givencountry)
    return df

dataframefunction(0, 50, 'MMR')

date,Country,Males 0004,Males 0509,Males 1014,Males 1519,Males 2024,Males 2529,Males 3034,Males 3539,Males 4044,...,Females 0004,Females 0509,Females 1014,Females 1519,Females 2024,Females 2529,Females 3034,Females 3539,Females 4044,Females 4549
2021,MMR,2310513.0,2271404.0,2287917.0,2351085.0,2280229.0,2234744.0,2139374.0,2099590.0,1893296.0,...,2180883.0,2154367.0,2180882.0,2249244.0,2197402.0,2174846.0,2094718.0,2073224.0,1895750.0,1731185.0
2020,MMR,2318768.0,2268374.0,2300425.0,2360501.0,2278275.0,2232782.0,2138928.0,2078186.0,1866538.0,...,2188793.0,2152601.0,2192878.0,2258470.0,2196937.0,2172995.0,2093747.0,2052893.0,1869607.0,1709572.0
2019,MMR,2320731.0,2272722.0,2315383.0,2362041.0,2284447.0,2229585.0,2147964.0,2048967.0,1839680.0,...,2190786.0,2157646.0,2207231.0,2260617.0,2204992.0,2169927.0,2102526.0,2025198.0,1843339.0,1687972.0
2018,MMR,2318211.0,2282127.0,2335591.0,2358455.0,2295470.0,2221996.0,2163090.0,2016469.0,1815137.0,...,2188786.0,2167221.0,2226698.0,2258212.0,2217477.0,2162127.0,2117284.0,1994137.0,1819268.0,1665460.0
2017,MMR,2313547.0,2294378.0,2360918.0,2350164.0,2305666.0,2213232.0,2171965.0,1985639.0,1792966.0,...,2184875.0,2179329.0,2251046.0,2251712.0,2228820.0,2153031.0,2126060.0,1964709.0,1797561.0,1641722.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1964,MMR,1940328.0,1624079.0,1378332.0,1062397.0,915000.0,864320.0,820503.0,720349.0,618131.0,...,1866845.0,1586730.0,1359506.0,1036670.0,881613.0,840488.0,802957.0,721353.0,613565.0,542941.0
1963,MMR,1905143.0,1584722.0,1319238.0,1024788.0,903413.0,864333.0,809531.0,704501.0,611037.0,...,1835101.0,1550729.0,1306283.0,991008.0,871601.0,839544.0,796443.0,703279.0,606083.0,528105.0
1962,MMR,1872075.0,1542314.0,1251804.0,1000488.0,896565.0,864135.0,796036.0,688656.0,605449.0,...,1805248.0,1512001.0,1241512.0,961421.0,866552.0,838177.0,788338.0,683943.0,600948.0,511596.0
1961,MMR,1839764.0,1497163.0,1190921.0,976197.0,893178.0,862143.0,781703.0,674505.0,598370.0,...,1775951.0,1470935.0,1177277.0,937232.0,864493.0,836335.0,778041.0,666607.0,594470.0,496160.0


In [9]:
dataframefunction(0, 80, 'MMR')

date,Country,Males 0004,Males 0509,Males 1014,Males 1519,Males 2024,Males 2529,Males 3034,Males 3539,Males 4044,...,Females 3539,Females 4044,Females 4549,Females 5054,Females 5559,Females 6064,Females 6569,Females 7074,Females 7579,Females 80UP
2021,MMR,2310513.0,2271404.0,2287917.0,2351085.0,2280229.0,2234744.0,2139374.0,2099590.0,1893296.0,...,2073224.0,1895750.0,1731185.0,1563398.0,1349404.0,1115453.0,867252.0,576126.0,335041.0,276239.0
2020,MMR,2318768.0,2268374.0,2300425.0,2360501.0,2278275.0,2232782.0,2138928.0,2078186.0,1866538.0,...,2052893.0,1869607.0,1709572.0,1537268.0,1316917.0,1089436.0,839680.0,544629.0,326585.0,274011.0
2019,MMR,2320731.0,2272722.0,2315383.0,2362041.0,2284447.0,2229585.0,2147964.0,2048967.0,1839680.0,...,2025198.0,1843339.0,1687972.0,1509382.0,1284612.0,1061486.0,810525.0,512123.0,318683.0,269853.0
2018,MMR,2318211.0,2282127.0,2335591.0,2358455.0,2295470.0,2221996.0,2163090.0,2016469.0,1815137.0,...,1994137.0,1819268.0,1665460.0,1479138.0,1254048.0,1032064.0,775241.0,486284.0,312733.0,265549.0
2017,MMR,2313547.0,2294378.0,2360918.0,2350164.0,2305666.0,2213232.0,2171965.0,1985639.0,1792966.0,...,1964709.0,1797561.0,1641722.0,1447066.0,1224864.0,1001402.0,733651.0,469619.0,308673.0,260970.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1964,MMR,1940328.0,1624079.0,1378332.0,1062397.0,915000.0,864320.0,820503.0,720349.0,618131.0,...,721353.0,613565.0,542941.0,442358.0,372893.0,298392.0,212654.0,139769.0,77578.0,45136.0
1963,MMR,1905143.0,1584722.0,1319238.0,1024788.0,903413.0,864333.0,809531.0,704501.0,611037.0,...,703279.0,606083.0,528105.0,433735.0,367092.0,289902.0,206746.0,135677.0,74951.0,43332.0
1962,MMR,1872075.0,1542314.0,1251804.0,1000488.0,896565.0,864135.0,796036.0,688656.0,605449.0,...,683943.0,600948.0,511596.0,426896.0,361291.0,280987.0,201382.0,131782.0,72361.0,41637.0
1961,MMR,1839764.0,1497163.0,1190921.0,976197.0,893178.0,862143.0,781703.0,674505.0,598370.0,...,666607.0,594470.0,496160.0,420669.0,354902.0,272419.0,196191.0,128005.0,69823.0,40003.0


In [10]:
wbdata.get_country('ABW')[0]['name']
wbdata.search_countries('Zimbabwe')[0]['id']

'ZWE'

In [11]:
def population(year, sex, age_range=(0, 100), place='WLD'):
    # Accept either a three-letter code or a name to search for
    newplace = place
    if len(place) != 3:
        newplace = wbdata.search_countries(place)[0]['id']
    lowerage, upperage = age_range
    # WDI age groups top out at "80UP"
    if upperage > 80:
        upperage = 80
    df = dataframefunction(lowerage, upperage, newplace)
    # Select columns by label rather than by position, so that
    # narrower age ranges don't pick up the other sex's columns
    prefix = {'Male': 'Males', 'Female': 'Females'}[sex]
    columns = [c for c in df.columns if c.startswith(prefix)]
    # The date index holds strings, so look the year up by label
    value = df.loc[str(year), columns].sum()
    return 'In ' + str(year) + ', there are ' + str(value) + ' ' + str(sex) + " aged" + " 00 to aged " + str(age_range) + " in the " + wbdata.get_country(newplace)[0]['name']


population(year='2019', sex='Female', age_range=(0,100), place='WLD')

'In 2019, there are 3847484070.0 Female aged 00 to aged (0, 100) in the World'

In [7]:
sample = wbdata.get_dataframe({'SP.POP.TOTL.FE.ZS':'Female Population', 'SP.POP.TOTL.MA.ZS':'Male Population' }, country='KOR')
sample

date,Female Population,Male Population
2021,50.058097,49.941903
2020,50.040110,49.959892
2019,50.014382,49.985618
2018,49.981562,50.018440
2017,49.949341,50.050659
...,...,...
1964,49.687409,50.312591
1963,49.754419,50.245581
1962,49.826077,50.173927
1961,49.894037,50.105963


### Plotting Data



##### Plotting data from pandas.DataFrame



Let's make a time-series plot of global population.  We'll use the
`plot.ly` `cufflinks` module, which integrates with `pandas`.  Here are two lines to set up the plotting environment:



In [8]:
#!pip install cufflinks # IF NECESSARY
import cufflinks as cf
cf.go_offline()





##### Plotting Global Population Over time



With that done, once we have a DataFrame, making a plot takes just one
line of code:



In [9]:
# Useful arguments to pass include xTitle, yTitle, title
world.iplot(title="Fact I: Growth Rates Falling over Time",xTitle='Year',yTitle='Population')

##### Plotting Different Countries' Population Growth Rates



Globally, population growth has been basically linear over the last 60
 years.

-   Increases by 1 billion about every 12 years.
-   Implies *rate* of growth falling over time.
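
To see why roughly linear growth implies a falling growth *rate*, here is a minimal sketch using only the stylized numbers from the list: one extra billion people every 12 years. The starting level of 3 billion and the helper name `annual_growth_rate` are illustrative assumptions, not values pulled from the WDI.

```python
import math

# Stylized numbers from the text: +1 billion roughly every 12 years.
YEARS_PER_BILLION = 12

# Illustrative path: 3, 4, ..., 8 billion (the starting level is an assumption).
populations = [3e9 + k * 1e9 for k in range(6)]


def annual_growth_rate(start, end, years):
    """Average continuously compounded growth rate per year."""
    return math.log(end / start) / years


# One rate per 12-year step along the linear path.
rates = [annual_growth_rate(a, b, YEARS_PER_BILLION)
         for a, b in zip(populations, populations[1:])]

for start, rate in zip(populations, rates):
    print(f"from {start / 1e9:.0f} billion: {100 * rate:.2f}% per year")
```

Each step adds the same absolute number of people to a larger base, so the proportional growth rate shrinks at every step.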

How do population growth rates vary by country?



In [10]:
import numpy as np

variable_labels = {"SP.POP.TOTL":"Population"}

# Three letter codes come from wbdata.get_country()
countries = {"WLD":"World",
             "LIC":"Low income",
             "LMC":"Lower middle income",
             "UMC":"Upper middle income",
             "HIC":"High income",
            }

df = wbdata.get_dataframe(variable_labels, country = countries).squeeze()

df = df.unstack('country')
# Date index is of type string; change to integers
df.index = df.index.astype(int)

# Differences (over time) in logs give us growth rates
np.log(df).diff().iplot(title="Fact II: Poorer places have higher growth rates",
                        yTitle="Growth Rate",xTitle='Year')
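
As a check on the claim that differences in logs give growth rates, the sketch below applies the same idea to the five world population figures printed earlier in this lecture, using only the standard library. The helper name `log_growth_rates` is hypothetical; the point is that the years must be in ascending order before differencing, and that the log difference is close to the ordinary percentage change when growth is small.

```python
import math

# World population (SP.POP.TOTL) as printed earlier in this lecture.
world_population = {
    2017: 7578158000.0,
    2018: 7661776000.0,
    2019: 7742682000.0,
    2020: 7820982000.0,
    2021: 7888409000.0,
}


def log_growth_rates(series):
    """Map each year to log(N_t) - log(N_{t-1}), taking years in ascending order."""
    years = sorted(series)
    return {year: math.log(series[year]) - math.log(series[prev])
            for prev, year in zip(years, years[1:])}


growth = log_growth_rates(world_population)

for year, g in growth.items():
    # Ordinary proportional change, for comparison with the log difference.
    exact = world_population[year] / world_population[year - 1] - 1
    print(f"{year}: log difference {g:.5f}, proportional change {exact:.5f}")
```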

##### Population Growth vs Per capita GDP



Our second stylized fact was that there's an inverse association between
income and population growth.  We'll investigate this fact here,
constructing a scatter plot relating population growth rates to (log) GDP per capita.



In [11]:
import numpy as np
# wbdata.search_indicators("GDP per capita")

indicators = {"NY.GDP.PCAP.CD":"GDP per capita",
              "SP.DYN.TFRT.IN":"Total Fertility Rate",
              "SP.POP.GROW":"Population Growth Rate",
              "SP.DYN.AMRT.MA":"Male Mortality",
              "SP.DYN.AMRT.FE":"Female Mortality",
              "SP.POP.1564.FE.ZS":"% Adult Female",
              "SP.POP.TOTL.FE.ZS":"% Female"}

data = wbdata.get_dataframe(indicators)

# Make years ints instead of strings
data.reset_index(inplace=True)
data['date'] = data['date'].astype(int)
data.set_index(['country','date'],inplace=True)

df = data.query("date==2020").copy() # Latest year missing some data; copy so we can add columns

# All dates now the same; not a useful index
df.index = df.index.droplevel('date')

df['Log GDP per capita'] = np.log(df['GDP per capita'])

df.iplot(kind='scatter', mode='markers', symbol='circle-dot',
         x="Log GDP per capita",y="Population Growth Rate",
         text=df.reset_index('country')['country'].values.tolist(),
         xTitle="Log GDP per capita",yTitle="Population Growth Rate",
         title="Fact II: Population growth is lower in higher-income countries")

##### Decomposing Population Growth



Consider the human population at a particular time $t$, and let
$N_t$ denote its size.  Also, let
$\phi_t$ be the *share* of the population at time $t$ that are women
of child-bearing age (e.g., 15–49).

Now, as a matter of accounting, population in the next period $t+1$ will be given by
$$
    N_{t+1} = (1-\mbox{mortality rate})N_t + \mbox{TFR}\cdot\phi_t N_t.
 $$

Thus, we can think of population growth as depending on mortality, fertility, and the share of the population that can bear children.  
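
The accounting identity can be sketched directly in code. This is only an illustration: the numbers are hypothetical, and the fertility term is read as births per woman of child-bearing age *per period* (an assumption made so the units line up with a one-period step). The function names are likewise made up for this example.

```python
def next_population(n_t, mortality_rate, fertility_rate, phi_t):
    """One step of N_{t+1} = (1 - mortality) * N_t + fertility * phi_t * N_t."""
    survivors = (1.0 - mortality_rate) * n_t
    births = fertility_rate * phi_t * n_t
    return survivors + births


def growth_rate(mortality_rate, fertility_rate, phi_t):
    """Implied proportional growth, (N_{t+1} - N_t) / N_t."""
    return fertility_rate * phi_t - mortality_rate


# Hypothetical inputs: 1,000 people, 1% mortality, a quarter of the
# population are women of child-bearing age, 0.08 births per such woman.
n_next = next_population(1000.0, mortality_rate=0.01,
                         fertility_rate=0.08, phi_t=0.25)
g = growth_rate(mortality_rate=0.01, fertility_rate=0.08, phi_t=0.25)
print(n_next, g)
```

Dividing the identity through by $N_t$ shows that the growth rate is the fertility term times $\phi_t$, minus the mortality rate, so a fall in growth must come from one of those three pieces.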

We've seen that the rate of population growth is falling over time.  Is the fall due to changes in mortality, fertility, or $\phi_t$?



##### Mortality Over Time



Can mortality changes account for declining population growth?  Look at
adult mortality rates, which the WDI reports per 1,000 adults.



In [12]:
# Copy so that columns can be modified later without warnings
world = data.query("country=='World'").copy()

# Drop country index for World data
world.index = world.index.droplevel('country')

world[["Male Mortality","Female Mortality"]].iplot(title="Adult mortality rate (per 1,000 adults)")

##### Adult female share of population over time



Decreases in population growth could also be due to a decreasing share of adult women, perhaps due to gender selection at birth.  How does this share ($\phi_t$) vary over time?



In [13]:
# % Adult Female is % of females who are adult.
# To make a share of total population take product
world["% Adult Female"] = world["% Adult Female"]*world["% Female"]/100

world["% Adult Female"].iplot(title="% of Adult Females in World Population")

##### Fertility over time



Finally, decreases in population growth could be due to reduced fertility.  How does global fertility vary over time?



In [14]:
world["Total Fertility Rate"].iplot()

##### Relation between income and fertility



In [15]:
df.iplot(kind='scatter', mode='markers', symbol='circle-dot',
         x="Log GDP per capita",y="Total Fertility Rate",
         text=df.reset_index('country')['country'].values.tolist(),
         xTitle="Log GDP per capita",yTitle="Total Fertility Rate",
         title="Fact II: Women in Poorer Countries Have Higher Fertility")

### Understanding Age-Sex Composition



To relate the total fertility rate (TFR) of a country to population
 growth, we need to know some other things about the country:

1.  Women of child-bearing age, as a proportion of population
2.  Mortality rates (which will vary with age)
3.  Rates of net migration

We won't have much to say about migration yet, but the number of
women of child-bearing age and rates of mortality can both be
helpfully visualized by constructing *population pyramids* that
report information on the age and sex composition of a population at
a point in time.



#### Building a population pyramid



The next code builds a list of the age-sex counts we want
 (e.g., how many males are there between the ages of 10-14?).



In [16]:
# WDI age-sex data come as variables named
# "SP.POP.LLHH.MA" for males
# and "SP.POP.LLHH.FE" for females, where LL is the *low* end of the
# age range, like "05" for 5-year-olds, and HH is the *high* end.

# We construct a list of age ranges.

# Start with an empty list of age ranges
age_ranges = []

# Ranges top out at 80, and go in five year increments
for i in range(0,80,5):
    age_ranges.append(f"{i:02d}"+f"{i+4:02d}")

age_ranges.append("80UP")

print(age_ranges)

['0004', '0509', '1014', '1519', '2024', '2529', '3034', '3539', '4044', '4549', '5054', '5559', '6064', '6569', '7074', '7579', '80UP']
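
The same label-building logic can be wrapped in a small helper so that narrower age windows (say, 15 to 49) produce the right indicator codes. This is a sketch under the naming convention described in the comments above; the function names `age_range_labels` and `indicator_codes` are hypothetical and not part of `wbdata`.

```python
def age_range_labels(lower=0, upper=80):
    """Return WDI-style five-year age labels covering ages from lower up to upper."""
    if lower % 5 or upper % 5:
        raise ValueError("bounds must be multiples of 5")
    if not 0 <= lower < upper:
        raise ValueError("need 0 <= lower < upper")
    # Closed five-year bands stop at 75-79; everything older is "80UP".
    capped = min(upper, 80)
    labels = [f"{low:02d}{low + 4:02d}" for low in range(lower, capped, 5)]
    if upper >= 80:
        labels.append("80UP")
    return labels


def indicator_codes(labels, sex):
    """Build SP.POP.<label>.<MA|FE> codes for the given age labels."""
    suffix = {"male": "MA", "female": "FE"}[sex]
    return [f"SP.POP.{label}.{suffix}" for label in labels]


print(age_range_labels(15, 50))
print(indicator_codes(age_range_labels(15, 50), "female"))
```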


Next we construct a dictionary of indicators, with labels, that we
 want to grab.



In [17]:
male_variables = {"SP.POP."+age_range+".MA":"Males "+age_range for age_range in age_ranges}
female_variables = {"SP.POP."+age_range+".FE":"Females "+age_range for age_range in age_ranges}

variables = male_variables
variables.update(female_variables)

print(variables)

{'SP.POP.0004.MA': 'Males 0004', 'SP.POP.0509.MA': 'Males 0509', 'SP.POP.1014.MA': 'Males 1014', 'SP.POP.1519.MA': 'Males 1519', 'SP.POP.2024.MA': 'Males 2024', 'SP.POP.2529.MA': 'Males 2529', 'SP.POP.3034.MA': 'Males 3034', 'SP.POP.3539.MA': 'Males 3539', 'SP.POP.4044.MA': 'Males 4044', 'SP.POP.4549.MA': 'Males 4549', 'SP.POP.5054.MA': 'Males 5054', 'SP.POP.5559.MA': 'Males 5559', 'SP.POP.6064.MA': 'Males 6064', 'SP.POP.6569.MA': 'Males 6569', 'SP.POP.7074.MA': 'Males 7074', 'SP.POP.7579.MA': 'Males 7579', 'SP.POP.80UP.MA': 'Males 80UP', 'SP.POP.0004.FE': 'Females 0004', 'SP.POP.0509.FE': 'Females 0509', 'SP.POP.1014.FE': 'Females 1014', 'SP.POP.1519.FE': 'Females 1519', 'SP.POP.2024.FE': 'Females 2024', 'SP.POP.2529.FE': 'Females 2529', 'SP.POP.3034.FE': 'Females 3034', 'SP.POP.3539.FE': 'Females 3539', 'SP.POP.4044.FE': 'Females 4044', 'SP.POP.4549.FE': 'Females 4549', 'SP.POP.5054.FE': 'Females 5054', 'SP.POP.5559.FE': 'Females 5559', 'SP.POP.6064.FE': 'Females 6064', 'SP.POP.6569.

Get the data!



In [18]:
# WLD is the World; substitute your own code or list of codes.
# Remember you can search for the appropriate codes using
# wbdata.search_countries("")

df = wbdata.get_dataframe(variables,country="WLD")
print(df.query("date=='2020'").sum(axis=0))

Males 0004      348843527.0
Males 0509      350861843.0
Males 1014      336694403.0
Males 1519      319186586.0
Males 2024      307711749.0
Males 2529      306243509.0
Males 3034      306298288.0
Males 3539      278670739.0
Males 4044      248265041.0
Males 4549      239380760.0
Males 5054      220302564.0
Males 5559      189880511.0
Males 6064      154455368.0
Males 6569      126216520.0
Males 7074       87919280.0
Males 7579       55253441.0
Males 80UP       57363155.0
Females 0004    329071642.0
Females 0509    329225029.0
Females 1014    315221758.0
Females 1519    299112617.0
Females 2024    289115754.0
Females 2529    290548637.0
Females 3034    293679967.0
Females 3539    269562260.0
Females 4044    242158101.0
Females 4549    236459269.0
Females 5054    221445209.0
Females 5559    195902317.0
Females 6064    165545454.0
Females 6569    141973645.0
Females 7074    103935445.0
Females 7579     70462837.0
Females 80UP     94014398.0
dtype: float64
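
These counts are enough to compute a rough value of $\phi_t$, the share of the population made up of women of child-bearing age. The sketch below copies the 2020 female counts for ages 15 to 49 from the output above and divides by the 2020 world total from the earlier `SP.POP.TOTL` table. Treating 15 to 49 as "child-bearing age" follows the example range given earlier; the variable names are illustrative.

```python
# Female counts for ages 15-49, world, 2020, copied from the output above.
females_15_49 = {
    "1519": 299112617.0,
    "2024": 289115754.0,
    "2529": 290548637.0,
    "3034": 293679967.0,
    "3539": 269562260.0,
    "4044": 242158101.0,
    "4549": 236459269.0,
}

# World total for 2020 (SP.POP.TOTL), from the earlier table.
world_total_2020 = 7820982000.0

# Share of the total population that is female and aged 15-49.
phi_2020 = sum(females_15_49.values()) / world_total_2020
print(f"phi in 2020: {phi_2020:.3f}")
```

The result is a little under one quarter of the world's population.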


#### Plotting Population Pyramid



Now we put together some code for the population pyramid.  The structure
of the DataFrames is more complicated than it was above, so using the simple `cufflinks` library won't work here (or at least I don't see quite how to do it).   We use a more general `plot.ly` library instead.



In [19]:
import plotly.offline as py
import plotly.graph_objs as go
import pandas as pd
import numpy as np

py.init_notebook_mode(connected=True)

layout = go.Layout(barmode='overlay',
                   yaxis=go.layout.YAxis(range=[0, 90], title='Age'),
                   xaxis=go.layout.XAxis(title='Number'))

year = 2020

bins = [go.Bar(x = df.loc[str(year),:].filter(regex="Male").values,
               y = [int(s[:2])+1 for s in age_ranges],
               orientation='h',
               name='Men',
               marker=dict(color='purple'),
               hoverinfo='skip'
               ),

        go.Bar(x = -df.loc[str(year),:].filter(regex="Female").values,
               y=[int(s[:2])+1 for s in age_ranges],
               orientation='h',
               name='Women',
               marker=dict(color='pink'),
               hoverinfo='skip',
               )
        ]
py.iplot(dict(data=bins, layout=layout))

#### Changes in Pyramid Over Time



Let's try a more ambitious visualization, showing how the shape of the population pyramid has changed at twenty-year intervals.



In [20]:
# Count down by increments of 20 years
years = range(2020,1960,-20)

# This makes a list of graphs, year by year
bins = [go.Bar(x = df.loc[str(year),:].filter(regex="Male").values,
               y = [int(s[:2])+1 for s in age_ranges],
               orientation='h',
               name='Men {:d}'.format(year),
               hoverinfo='skip'
              )
        for year in years]
          
bins += [go.Bar(x = -df.loc[str(year),:].filter(regex="Female").values,
                y=[int(s[:2])+1 for s in age_ranges],
                orientation='h',
                name='Women {:d}'.format(year),
                hoverinfo='skip',
               )
         for year in years]

py.iplot(dict(data=bins, layout=layout))
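
Because total population grows over time, pyramids for different years differ in size as well as shape. One possible refinement, sketched here with made-up counts, is to convert each age band to a share of that year's total before plotting, keeping the sign convention used in the plots above (men positive, women negative). The helper `pyramid_shares` is hypothetical and is not used elsewhere in this lecture.

```python
def pyramid_shares(males, females):
    """Convert age-band counts to signed shares of the total population.

    Men are returned as positive shares and women as negative shares,
    so the two sides can be drawn on opposite sides of a common axis.
    """
    total = sum(males.values()) + sum(females.values())
    male_shares = {band: count / total for band, count in males.items()}
    female_shares = {band: -count / total for band, count in females.items()}
    return male_shares, female_shares


# Made-up counts for two age bands, purely for illustration.
males = {"0004": 60.0, "0509": 40.0}
females = {"0004": 55.0, "0509": 45.0}

male_shares, female_shares = pyramid_shares(males, females)
print(male_shares)
print(female_shares)
```

With shares on the horizontal axis, the absolute values of all bars in a given year sum to one, so differences between years reflect changes in composition only.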