# Project 1: Population and Poverty

In this project, you'll explore data from [Gapminder.org](http://gapminder.org), which
collects data from many sources and compiles them into tables that describe many countries around the world. All of the data they aggregate are published in the [Systema Globalis](https://github.com/open-numbers/ddf--gapminder--systema_globalis/blob/master/README.md). Their goal is "to compile all public statistics; Social, Economic and Environmental; into a comparable total dataset [sic]." All data sets in this project are copied directly from the Systema Globalis without any changes.

For engaging lectures related to the material in this project, see these TED talks by Hans Rosling: [here](https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?referrer=playlist-the_best_hans_rosling_talks_yo) and [here](https://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty?referrer=playlist-the_best_hans_rosling_talks_yo).

### Partners

You may work alone or with one partner, who must be from your recitation section. Use Vocareum to invite your partner to join your team.  Once you've formed a team, you will share the same notebook in Vocareum, and any changes to it or submission of it will reflect on both partners.  **We strongly recommend that you form your Vocareum partnership before either partner begins editing the notebook,** so that you don't have to redo any work.

### Checkpoint

There are two parts to this assignment in Vocareum, although both parts have you editing the same notebook.  For full credit, you must complete the first eight questions and submit them **to Part 1** by 11:59pm on Thursday 2/21. You will have some lab time to work on these questions, but we recommend that you start the project before lab and leave time to finish the checkpoint afterward.

### Deadline

Your final submission of the notebook will be **to Part 2**.  You are welcome to change any answers to the first eight questions in your final submission. The final submission of the project is due at 11:59 pm on Monday 3/4. Late submissions will be accepted until 11:59 pm on Wednesday 3/6, but a 33.33% late penalty will be applied to submissions after 11:59 pm on Monday, and a 66.66% late penalty to submissions after 11:59 pm on Tuesday.  It's much better to be early than late, so start working now.

### Other Logistics

**Academic Integrity.** Do not share your code with anybody but your partner. You are welcome to discuss questions with other students, but don't share the answers. The experience of solving the problems in this project will prepare you for exams (and life). If someone asks you for an answer or a hint, resist! Instead, you can demonstrate how you would solve a similar problem, or point them to somewhere in the course materials that solves a similar problem.

**Support.** You are not alone! Come to office hours, come to recitation, and talk to your partner.

**Checks and tests.** Projects are an opportunity for you to demonstrate what you have learned from labs and homeworks, so the checks in this project are relatively minimal. Passing the checks for a question does not mean that you answered the question correctly. More rigorous tests will be used to ascertain the correctness of your answers during grading.

**Advice.** Develop your answers incrementally. To perform a complicated table manipulation, break it up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names or functions you want to the provided cells.

### <font color="red">Please run the next cell to import the packages required for the project.</font>

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

from test import *

## 1. Global Population Growth


The global population of humans reached 1 billion around 1800, 3 billion around 1960, and 7 billion around 2011. The potential impact of exponential population growth has concerned scientists, economists, and politicians alike.

The UN Population Division estimates that the world population will likely continue to grow throughout the 21st century, but at a slower rate, perhaps reaching 11 billion by 2100. However, the UN does not rule out scenarios of more extreme growth.

<a href="http://www.pewresearch.org/fact-tank/2015/06/08/scientists-more-worried-than-public-about-worlds-growing-population/ft_15-06-04_popcount/"> 
 <img src="pew_population_projection.png"/> 
</a>
*Note:* The green shaded bars around the graphs are *confidence intervals*, which we will discuss later in the course.

In this section, we will examine some of the factors that influence population growth and how they are changing around the world.

The first table we will consider is the total population of each country over time. Run the cell below.

In [None]:
# The population.csv file can also be found online here:
# https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--datapoints--population_total--by--geo--time.csv
# The version in this project was downloaded in February, 2019.
population = Table.read_table('population.csv')
population.show(3)

### Bangladesh

In the `population` table, the `geo` column contains three-letter codes established by the [International Organization for Standardization](https://en.wikipedia.org/wiki/International_Organization_for_Standardization) (ISO) in the [Alpha-3](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3#Current_codes) standard. We will begin by taking a close look at Bangladesh. Inspect the standard to find the 3-letter code for Bangladesh.

**Question 1.** Create a table called `b_pop` that has two columns labeled `time` and `population_total`. The first column should contain the years from 1970 through 2015 (including both 1970 and 2015) and the second should contain the population of Bangladesh in each of those years.

In [None]:
b_pop = ...
# Uncomment the next line to improve the formatting of the table
#b_pop.set_format('population_total', NumberFormatter(0))
b_pop

In [None]:
check1_1(b_pop)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


Uncomment and run the following cell to create a table called `b_five` that has the population of Bangladesh every five years. At a glance, it appears that the population of Bangladesh has been growing quickly indeed!

In [None]:
# Uncomment and run this cell
#fives = np.arange(1970, 2016, 5) # 1970, 1975, 1980, ...
#b_five = b_pop.sort('time').where('time', are.contained_in(fives))
#b_five

**Question 2.** Create a table called `b_five_growth` that includes three columns, `time`, `population_total`, and `annual_growth`. There should be one row for every five years from 1970 through 2010 (but not 2015). The first two columns are the same as `b_five`. The third column is the *annual growth rate* for each five-year period. Consult the [Growth Rates](http://www.cs.cornell.edu/courses/cs1380/2018sp/textbook/chapters/03/3/1/example-growth-rates.html) example in the textbook to understand how to compute the annual growth rate.

*Hint:* You may need to use the `are.contained_in` in a where statement for a table. See the documentation [here](http://data8.org/datascience/predicates.html).

*Hint 2:* More information about the PercentFormatter is [here](http://data8.org/datascience/formats.html).
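If a quantity grows from `initial` to `changed` over `t` years at a constant annual rate `g`, then `changed = initial * (1 + g) ** t`, which gives `g = (changed / initial) ** (1 / t) - 1`. The sketch below shows this formula in NumPy on made-up numbers, not the Bangladesh data. The function name `annual_growth_rate_between` is hypothetical and was chosen so it does not clash with the `annual_growth_rate` name suggested in the cell below.

```python
import numpy as np

def annual_growth_rate_between(initial, changed, t):
    """Constant annual growth rate that turns `initial` into `changed` over `t` years.

    NumPy arithmetic is element-wise, so this also works on aligned arrays.
    """
    return (changed / initial) ** (1 / t) - 1

# Hypothetical values at the start and end of three five-year periods
start_values = np.array([100.0, 120.0, 150.0])
end_values = np.array([120.0, 150.0, 165.0])
rates = annual_growth_rate_between(start_values, end_values, 5)

# Sanity check: compounding each rate for 5 years recovers the end values
recovered = start_values * (1 + rates) ** 5
```

Because the arithmetic is element-wise, passing one array of start-of-period values and one array of values five years later yields one rate per period.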

In [None]:
# Only your `b_five_growth` table will be graded for correctness; the other 
# variable names below are suggestions that you are welcome to use, change, or delete.
b_1970_2010 = ...
b_1975_2015 = ...
initial = ...
changed = ...
annual_growth_rate = ...
b_five_growth = ...
# Uncomment the following line to make your table a little easier to read.
# b_five_growth.set_format('annual_growth', PercentFormatter)



In [None]:
check1_2(b_five_growth)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


Even though the population has grown every five years since 1970, the annual growth rate decreased dramatically from 1985 to 2015. Let's look at some other information and attempt to develop a possible explanation. Run the next cell to load three tables of measurements about countries over time.

Often in data science, our data come from many different sources (tables, CSV files, and so on), and it is our job to figure out how to combine them to tell a complete story.

In [None]:
# Source: https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--datapoints--life_expectancy_years--by--geo--time.csv
life_expectancy = Table.read_table('life_expectancy.csv')
# Source: https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--datapoints--child_mortality_0_5_year_olds_dying_per_1000_born--by--geo--time.csv
child_mortality = Table.read_table('child_mortality.csv').relabeled(2, 'child_mortality_under_5_per_1000_born')
# Source: https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--datapoints--children_per_woman_total_fertility--by--geo--time.csv
fertility = Table.read_table('fertility.csv')

The `life_expectancy` table contains a statistic that is often used to measure how long people live, called *life expectancy at birth*. This number, for a country in a given year, [does not measure how long babies born in that year are expected to live](http://blogs.worldbank.org/opendata/what-does-life-expectancy-birth-really-mean). Instead, it measures how long someone would live, on average, if the *mortality conditions* in that year persisted throughout their lifetime. These "mortality conditions" describe what fraction of people at each age survived the year. So, it is a way of measuring the proportion of people that are staying alive, aggregated over different age groups in the population.
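To make "if the mortality conditions in that year persisted" concrete, here is a simplified sketch of a period life expectancy computed from one year's survival fractions. It assumes hypothetical one-year age groups, that `survival[x]` is the fraction of people aged `x` who survived the year, that nobody survives past the last age group, and that people who die during a year live half of it on average. Real life tables, including those behind Gapminder's figures, use more careful methods, so treat this only as an illustration.

```python
import numpy as np

def period_life_expectancy(survival):
    """Simplified life expectancy at birth from one year's survival fractions.

    survival[x] is the fraction of people aged x who survived that year.
    Assumes the last entry is 0 (nobody survives past the final age group)
    and that people who die during a year live half of it on average.
    """
    survival = np.asarray(survival, dtype=float)
    # alive[x] = chance of reaching age x if this year's conditions persisted
    alive = np.concatenate([[1.0], np.cumprod(survival)])
    # Expected years lived between age x and x + 1
    years_lived = (alive[:-1] + alive[1:]) / 2
    return years_lived.sum()

# Hypothetical: everyone survives each year up to age 80, nobody survives age 80
survival = np.append(np.ones(80), 0.0)
e0 = period_life_expectancy(survival)   # 80.5
```

Lowering the survival fraction at any age lowers the result, which is why this statistic reflects current mortality conditions rather than the actual lifespans of babies born that year.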

**Question 3.** Perhaps population is growing more slowly because people aren't living as long. Use the `life_expectancy` table to draw a line graph with the years 1970 and later on the horizontal axis that shows how the *life expectancy at birth* has changed in Bangladesh.

In [None]:
...


**Question 4.** Does the graph above help directly explain why the population growth rate decreased from 1985 to 2015 in Bangladesh? Why or why not? What happened in Bangladesh in 1991, and does that event explain the change in population growth rate?

*Write your answer here, replacing this text.*

In [None]:
# DO NOT CHANGE THIS CELL


The `fertility` table contains a statistic that is often used to measure how many babies are being born, the *total fertility rate*. This number describes the [number of children a woman would have in her lifetime](https://www.measureevaluation.org/prh/rh_indicators/specific/fertility/total-fertility-rate), on average, if the current rates of birth by age of the mother persisted throughout her childbearing years, assuming she survived through age 49.
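The definition above corresponds to a standard calculation: add up the *age-specific fertility rates* (births per woman per year in each age group) across the childbearing years. With five-year age groups, each rate applies to five years of a woman's life, so the sum is multiplied by 5. The rates below are made up for illustration; the `fertility` table already reports the final total fertility rate, and the function name `total_fertility_rate` is hypothetical.

```python
import numpy as np

# Hypothetical births per woman per year in the age groups
# 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49
age_specific_rates = np.array([0.05, 0.15, 0.14, 0.09, 0.05, 0.015, 0.005])

def total_fertility_rate(rates, group_width=5):
    """Children a woman would have if these age-specific rates persisted
    through every age group, assuming she survives to the end of age 49."""
    return group_width * np.sum(rates)

tfr = total_fertility_rate(age_specific_rates)   # about 2.5
```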

**Question 5.** Write a function `fertility_over_time` that takes the Alpha-3 code of a `country` and a `start` year. It returns a two-column table with labels "`Year`" and "`Children per woman`" that can be used to generate a line chart of the country's fertility rate each year, starting at the `start` year. The plot should include the `start` year and all later years that appear in the `fertility` table (which includes all years up to 2018). 

Then, in the next cell, call your `fertility_over_time` function on the Alpha-3 code for Bangladesh and the year 1970 in order to plot how Bangladesh's fertility rate has changed since 1970.
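Keeping "the `start` year and all later years" is a filter on a comparison. In the `datascience` library this is usually written with `where` and a predicate such as `are.above_or_equal_to`. The sketch below shows the same idea with a plain NumPy boolean mask on hypothetical arrays (the codes `'aaa'`/`'bbb'` and the helper `rows_for` are made up), so it runs without the course library.

```python
import numpy as np

# Hypothetical rows for several countries: code, year, and a value
codes = np.array(['aaa', 'aaa', 'aaa', 'bbb', 'bbb'])
years = np.array([1968, 1969, 1970, 1969, 1970])
values = np.array([6.9, 6.9, 6.8, 5.1, 5.0])

def rows_for(code, start):
    """Years and values for one code, from `start` onward (inclusive)."""
    keep = (codes == code) & (years >= start)   # element-wise comparisons
    return years[keep], values[keep]

kept_years, kept_values = rows_for('aaa', 1969)
```

Using `>=` keeps the start year itself; `>` would silently drop it, which is an easy off-by-one mistake here.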

In [None]:
def fertility_over_time(country, start):
    """Create a two-column table that describes a country's total fertility rate each year."""
    country_fertility = ...
    country_fertility_after_start = ...
    ...
    


In [None]:
# You should uncomment the next line and run this cell to make sure your code works correctly, 
# but otherwise DO NOT change the code.
#fertility_over_time('bgd', 1970).plot('Year', 'Children per woman')

In [None]:
check1_5(fertility_over_time)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 6.** Does the graph above help directly explain why the population growth rate decreased from 1985 to 2018 in Bangladesh? Why or why not?

*Write your answer here, replacing this text.*

In [None]:
# DO NOT CHANGE THIS CELL


It has been observed that lower fertility rates are often associated with lower child mortality rates. The link has been attributed to family planning: if parents can expect that their children will all survive into adulthood, then they will choose to have fewer children. We can see if this association is evident in Bangladesh by plotting the relationship between total fertility rate and [child mortality rate per 1000 children](https://en.wikipedia.org/wiki/Child_mortality).

**Question 7.** Using both the `fertility` and `child_mortality` tables, draw a scatter diagram with one point for each year, starting with 1970 and ending in 2018, that has Bangladesh's total fertility on the horizontal axis and its child mortality on the vertical axis. 

**The expression that draws the scatter diagram is provided for you; please don't change it.** Instead, create a table called `fertility_and_child_mortality` with the appropriate column labels and data in order to generate the chart correctly. Use the label "`Children per woman`" to describe total fertility and the label "`Child deaths per 1000 born`" to describe child mortality.
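Building this table means matching rows from two tables that describe the same year, which is what a *join* does (in the `datascience` library, `Table.join`). The sketch below shows only the matching logic, using plain Python dictionaries and made-up numbers; the helper `join_by_year` is hypothetical. Like a join, it keeps only the years present in both sources.

```python
# Hypothetical per-year values from two different sources
fertility_by_year = {1970: 6.9, 1971: 6.9, 1972: 6.8}
mortality_by_year = {1971: 220.0, 1972: 215.0, 1973: 210.0}

def join_by_year(left, right):
    """Pair values that share a year; years missing from either side are dropped."""
    shared_years = sorted(set(left) & set(right))
    return [(year, left[year], right[year]) for year in shared_years]

paired = join_by_year(fertility_by_year, mortality_by_year)
# [(1971, 6.9, 220.0), (1972, 6.8, 215.0)]
```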

In [None]:
country_fertility = ...
country_child_mortality = ...
fertility_and_child_mortality = ...

fertility_and_child_mortality

In [None]:
# You should uncomment the next line and run this cell to make sure your code works correctly, 
# but otherwise DO NOT change the code.
#fertility_and_child_mortality.scatter('Children per woman', 'Child deaths per 1000 born') 

In the scatter plot, you should see three points where children per woman is about 7 but child deaths per 1000 born are somewhat higher than the trend formed by the other points. Points that depart from the overall pattern like this are called *outliers*. The data are accurate; outliers simply show that real data rarely follow a trend perfectly. We will see more of this when we start to talk about regression.

In [None]:
check1_7(fertility_and_child_mortality)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 8.** In one or two sentences, describe the association (if any) that is evidenced by this scatter diagram. Does the diagram show that reduced child mortality causes parents to choose to have fewer children?

*Write your answer here, replacing this text.*

In [None]:
# DO NOT CHANGE THIS CELL


### Congratulations: Checkpoint Reached

You have reached the project checkpoint. Please submit now **to Part 1** in order to record your progress. If you go back and revise your answers in the section above after the checkpoint is due, that's ok. Your revised answers will be graded. However, you will only get credit for your checkpoint submission if you have passed the autograder tests provided for every question above.

If you are working with a partner, only one of you needs to submit. For both of you to receive credit, you must have already formed a partnership in Vocareum before you submit&mdash;and really you should form that partnership before even beginning work on the assignment.

Now, switch to **Part 2** in Vocareum.  All of your work from Part 1 will still be in the notebook after you switch.

### Beyond Bangladesh

The change observed in Bangladesh since 1970 can also be observed in many other developing countries: health services improve, life expectancy increases, and child mortality decreases. At the same time, the fertility rate often plummets, and so the population growth rate decreases despite increasing longevity.

Run the cell below to generate two overlaid histograms, one for 1960 and one for 2010, that show the distributions of total fertility rates for these two years among all 201 countries in the `fertility` table.

In [None]:
Table().with_columns(
    '1960', fertility.where('time', 1960).column('children_per_woman_total_fertility'),
    '2010', fertility.where('time', 2010).column('children_per_woman_total_fertility')
).hist(bins=np.arange(0, 10, 0.5), unit='child')
_ = plots.xlabel('Children per woman')
_ = plots.xticks(np.arange(10))

**Question 9.** Assign `fertility_statements` to a list of the numbers for each statement below that can be correctly inferred from these histograms.
1. About the same number of countries had a fertility rate between 3.5 and 4.5 in both 1960 and 2010.
1. In 2010, about 40% of countries had a fertility rate between 1.5 and 2 (inclusive).
1. In 1960, less than 20% of countries had a fertility rate below 3.
1. More countries had a fertility rate above 3 in 1960 than in 2010.
1. At least half of countries had a fertility rate between 5 and 8 (inclusive) in 1960.
1. At least half of countries had a fertility rate below 3 in 2010.
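When reading these histograms, keep in mind that they use a density scale (the vertical axis is in percent per child), so the share of countries in a bin is the bar's *area*: height times bin width. The NumPy sketch below checks that relationship on made-up fertility rates with `np.histogram(..., density=True)`, which scales the heights so that the total area is 1.

```python
import numpy as np

# Hypothetical fertility rates for ten countries
rates = np.array([1.6, 1.8, 1.9, 2.1, 2.4, 3.2, 4.1, 5.5, 6.0, 6.8])
bins = np.arange(0, 10, 0.5)             # same bin edges as the cell above

heights, edges = np.histogram(rates, bins=bins, density=True)
widths = np.diff(edges)
proportions = heights * widths           # fraction of countries in each bin

# The bin starting at 1.5 holds 1.6, 1.8 and 1.9, i.e. 3 of the 10 countries
share_in_1_5_bin = proportions[3]
```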

In [None]:
fertility_statements = [...]


In [None]:
check1_9(fertility_statements)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 10.** Draw a line plot of the world population from 1800 through 2018. The world population is the sum of all the countries' populations. *Hint: recall the `population` table from the beginning of this notebook.*
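Summing values across countries for each year is a group-and-sum operation; the `datascience` `group` method accepts a function for combining the values within each group. For intuition, the NumPy sketch below does the same on hypothetical rows, using `np.unique` to label each year and `np.bincount` with weights to add up the values sharing a label. The helper `total_by_year` is made up.

```python
import numpy as np

# Hypothetical (year, population) rows for several countries
years = np.array([1800, 1800, 1801, 1801, 1801])
pops = np.array([10.0, 20.0, 11.0, 21.0, 5.0])

def total_by_year(years, pops):
    """Return the distinct years and the sum of `pops` within each year."""
    unique_years, year_index = np.unique(years, return_inverse=True)
    totals = np.bincount(year_index, weights=pops)
    return unique_years, totals

world_years, world_totals = total_by_year(years, pops)
# world_years -> [1800, 1801], world_totals -> [30.0, 37.0]
```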

In [None]:
...


**Question 11.** Create a function `stats_for_year` that takes a `year` and returns a table of statistics. The table it returns should have four columns: `geo`, `population_total`, `children_per_woman_total_fertility`, and `child_mortality_under_5_per_1000_born`. Each row should contain one Alpha-3 country code and three statistics: population, fertility rate, and child mortality for that `year` from the `population`, `fertility`, and `child_mortality` tables. Include only those rows for which all three statistics are available for the country and year.  Your function needs to work for any year between 1960 and 2018.

In addition, restrict the result to country codes that appear in `big_50`, an array (created below) of the 50 most populous countries in 2010. This restriction will speed up computations later in the project.

*Hint*: Nearly all the code is provided; you need to write only about one line yourself.

In [None]:
# We first create an array with the codes of the 50 biggest countries.
big_50 = population.where('time', 2010).sort('population_total', descending=True).take(np.arange(50)).column('geo')

# Next we create a population table that includes only the 
# 50 countries with the largest 2010 populations, and only the
# years after 1959.
population_big_50 = population.where('time', are.above(1959)).where('geo', are.contained_in(big_50))

def stats_for_year(year):
    """Return a table of the stats for each country that year."""
    p = population_big_50.where('time', year).drop('time')
    f = fertility.where('time', year).drop('time')
    c = child_mortality.where('time', year).drop('time')
    # Hint: you need just one more, short line of code
    ...
    


Try calling your function `stats_for_year` on any year between 1960 and 2010 in the cell below.  Try to understand the output of `stats_for_year`.

In [None]:
...

In [None]:
check1_11(stats_for_year)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 12.** Create a table called `pop_by_decade` with two columns named `decade` and `population`. The `decade` column contains the years that commence a decade: 1960, 1970, ..., 2010. The `population` column contains the total population of all `big_50` countries in the corresponding year.

*Hint:* We suggest that you define a function `pop_for_year` that computes the total population (the `stats_for_year` function from the previous question is useful here).  Then `apply` it to the `decade` column.

In [None]:
def pop_for_year(year):
    ...

decades = Table().with_column(
    'decade', ...
)

pop_by_decade = ...

# Uncomment the next line to improve the formatting of the table
#pop_by_decade.set_format('population', NumberFormatter(0))
pop_by_decade

In [None]:
check1_12(pop_by_decade)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


Next we'll load a new table.  The `countries` table describes various characteristics of countries. The `country` column contains the same codes as the `geo` column in each of the other data tables (`population`, `fertility`, and `child_mortality`). The `world_6region` column classifies each country into a region of the world. Run the cell below to inspect the data.

In [None]:
# Source: https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--entities--geo--country.csv
countries = Table.read_table('countries.csv').where('country', are.contained_in(population.group('geo').column(0)))
countries.select('country', 'name', 'world_6region')

**Question 13.** Create a table called `region_counts` that has two columns, `region` and `count`. It should describe the count of how many of the `big_50` countries are in each region. For example, one row would have `south_asia` as its `region` value and an integer as its `count` value.  That integer would be the number of `big_50` countries in `south_asia`.
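Counting how many items fall into each category is another grouping operation; with `datascience`, `group` counts rows when no combining function is given. For intuition, here is the same count in plain NumPy with `np.unique(..., return_counts=True)` on hypothetical region labels. In this question the region labels live in `countries` rather than in `big_50`, so some matching step is needed before counting.

```python
import numpy as np

# Hypothetical region label for each of six countries
regions = np.array(['south_asia', 'america', 'south_asia',
                    'sub_saharan_africa', 'america', 'south_asia'])

labels, counts = np.unique(regions, return_counts=True)
region_count_pairs = dict(zip(labels.tolist(), counts.tolist()))
# {'america': 2, 'south_asia': 3, 'sub_saharan_africa': 1}
```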

In [None]:
region_counts = ...
region_counts

In [None]:
check1_13(region_counts)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


The following scatter diagram compares total fertility rate and child mortality rate for each country in 1960. The area of each dot represents the population of the country, and the color represents its region of the world. Run the cell, but don't worry about understanding all the code.

In [None]:
from functools import lru_cache as cache

# This cache annotation makes sure that if the same year
# is passed as an argument twice, the work of computing
# the result is only carried out once. That speeds up 
# the computation.
@cache(None)
def stats_relabeled(year):
    """Relabeled and cached version of stats_for_year."""
    return stats_for_year(year)\
           .relabeled('children_per_woman_total_fertility', 'Children per woman')\
           .relabeled('child_mortality_under_5_per_1000_born', 'Child deaths per 1000 born')

def fertilty_vs_child_mortality(year):
    """Draw a color scatter diagram comparing child mortality and fertility."""
    with_region = stats_relabeled(year).join('geo', 
                                             countries.select('country', 'world_6region'), 
                                             'country')
    with_region.scatter('Children per woman', 
                        'Child deaths per 1000 born', 
                        sizes='population_total', 
                        colors='world_6region', 
                        s=500)
    plots.xlim(0,10)
    plots.ylim(-50, 500)
    plots.title(year)
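The `@cache(None)` line above is `functools.lru_cache` with no size limit, imported under the name `cache`. The small example below, built around a hypothetical `slow_square` function, shows the effect: the function body runs once per distinct argument, and repeated calls return the stored result.

```python
from functools import lru_cache

calls = []  # records each time the function body actually runs

@lru_cache(maxsize=None)
def slow_square(x):
    """Stand-in for an expensive computation such as stats_for_year."""
    calls.append(x)
    return x * x

first = slow_square(12)    # computed: body runs
second = slow_square(12)   # cached: body does not run again
third = slow_square(13)    # new argument: body runs
```

One caveat: caching assumes the same argument always produces the same result. If you change `stats_for_year` after calling `stats_relabeled`, the cache keeps returning the old results until it is cleared, for example with `stats_relabeled.cache_clear()` or by rerunning the cell that defines it.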

In [None]:
# Uncomment and run the next line to actually draw the scatter diagram
# for Question 14.
#fertilty_vs_child_mortality(1960)

**Question 14.** Assign `scatter_statements` to a list of the numbers for each statement below that can be inferred from this scatter diagram for 1960. 
1. The `europe_central_asia` region has the most countries with child mortality rates below 100.
2. The lowest child mortality rate of any country is from an `east_asia_pacific` country.
3. Most countries have a fertility rate above 5.
4. There is an association between child mortality and fertility.
5. The two largest countries by population also have the two highest child mortality rates.

In [None]:
scatter_statements = ...


In [None]:
check1_14(scatter_statements)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


The result of the cell below is interactive. Drag the slider to the right to see how countries have changed over time. You'll find that the great divide between so-called "developed" and "developing" countries that existed in the 1960s has nearly disappeared. This shift in fertility rates is the reason that the global population is expected to grow more slowly in the 21st century than it did in the 19th and 20th centuries.

On the other hand, many countries still place great importance on having many children. For example, in Ghana, parents often have several children in the hope that, once the children are older, they will help their parents on the farm. An old tradition also honors mothers who give birth to their tenth child in a ceremony called *nyongmato*. Read more about it [here](https://www.bbc.com/news/world-africa-45344870).

In [None]:
import ipywidgets as widgets

# If you uncomment the next two lines, the animation will go a lot more smoothly,
# but it will take some time to run the cell.
#for y in np.arange(1960, 2015+1):
#    stats_relabeled(y)

_ = widgets.interact(fertilty_vs_child_mortality, 
                     year=widgets.IntSlider(min=1960, max=2015, value=1960))

Now is a great time to take a break and [watch the same data](https://www.gapminder.org/videos/reducing-child-mortality-a-moral-and-environmental-imperative) presented by Hans Rosling (d. 2017) in a 2010 TEDx talk with smoother animation and witty commentary (and a perfect Swedish accent).

## 2. Global Poverty


In 1800, 85% of the world's 1 billion people lived in *absolute poverty*, [defined by the United Nations](http://www.un.org/esa/socdev/wssd/text-version/agreements/poach2.htm) as "a condition characterized by severe deprivation of basic human needs, including food, safe drinking water, sanitation facilities, health, shelter, education and information." One measure of absolute poverty, also known as [extreme poverty](https://en.wikipedia.org/wiki/Extreme_poverty), is living on less than $1.00 per day, adjusted to 1996 US prices.

Although the world rate of extreme poverty has declined consistently for hundreds of years, a [recent estimate](http://www.worldbank.org/en/publication/poverty-and-shared-prosperity) of the proportion of people living in extreme poverty is 10.7%, or about 1 in 10 people. That estimate, made in 2016, is of the poverty rate in 2013. The United Nations recently adopted an [ambitious goal](http://www.un.org/sustainabledevelopment/poverty/): "By 2030, eradicate extreme poverty for all people everywhere."
In this section, we will examine extreme poverty trends around the world.

First, (re)load some tables. Although the `population` table has values for every recent year for many countries, the `poverty` table includes only the years for each country in which a measurement of the rate of extreme poverty was available (and the data stop in 2015). The table defines extreme poverty as living on less than 1.90 dollars a day in 2011 US prices, based on a [World Bank report](https://openknowledge.worldbank.org/bitstream/handle/10986/25078/9781464809583.pdf#page=55). It contains data for many [periphery](https://en.wikipedia.org/wiki/World-systems_theory) and semi-periphery countries, but not for many core countries.

In [None]:
# These two tables were already loaded above, but we'll reload them here just in case.
population = Table.read_table('population.csv')
countries = Table.read_table('countries.csv').where('country', are.contained_in(population.group('geo').column(0)))

# Source: https://github.com/open-numbers/ddf--gapminder--systema_globalis/raw/master/ddf--datapoints--extreme_poverty_percent_people_below_190_a_day--by--geo--time.csv
poverty = Table.read_table('poverty.csv')
poverty

In [None]:
poverty.where('geo', 'usa')

**Question 1.** Assign `latest` to a three-column table with one row for each country that appears in the `poverty` table. The first column should be labeled `geo` and contain the 3-letter code for the country. The second column should be labeled `time` and contain the *most recent* year for which an extreme poverty rate is available for the country. The third column should be labeled `poverty_percent` and contain the poverty rate in that year.

*Hint*: Sort `poverty` appropriately, then call `group` on it, passing `first` as an argument.
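The sort-then-take-first strategy in the hint works because, once the rows are sorted from newest to oldest, the first row seen for each country is its most recent one. A plain-Python sketch of that idea on hypothetical rows (the helper `most_recent` and the codes are made up):

```python
# Hypothetical (geo, time, poverty_percent) rows, in no particular order
rows = [('aaa', 2005, 30.0), ('bbb', 2010, 12.0),
        ('aaa', 2012, 25.0), ('bbb', 2001, 18.0)]

def most_recent(rows):
    """Keep the newest (time, percent) pair for each code."""
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    result = {}
    for geo, time, percent in newest_first:
        if geo not in result:          # first row seen is the most recent
            result[geo] = (time, percent)
    return result

latest_by_geo = most_recent(rows)
# {'aaa': (2012, 25.0), 'bbb': (2010, 12.0)}
```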

In [None]:
def first(values):
    """Return the first element of an array."""
    return values.item(0)

latest = ...
latest

In [None]:
check2_1(latest)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 2.** Using both `latest` and `population`, create a four-column table called `recent` with one row for each country in `latest`. The four columns should have the following labels and contents:
1. `geo` contains the 3-letter country code,
1. `poverty_percent` contains the most recent poverty percent,
1. `population_total` contains the population of the country in 2010,
1. `poverty_total` contains the number of people in poverty **rounded to the nearest integer** (*hint*: `np.round`), based on the 2010 population and most recent poverty rate (which might not be from 2010, but don't worry about that).
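Because `poverty_percent` is a percentage, the number of people in poverty is the population times the percent divided by 100, then rounded. A quick NumPy check with made-up numbers (the `example_` names are hypothetical, chosen so they do not clash with names you define in this project):

```python
import numpy as np

# Hypothetical 2010 populations and most recent poverty percentages
example_population = np.array([1_000_000, 2_500_000, 333_333])
example_percent = np.array([12.5, 3.0, 40.0])

# Convert percent to a fraction before multiplying, then round to whole people
example_poverty_total = np.round(example_population * example_percent / 100)
# [125000., 75000., 133333.]
```

`np.round` returns floats, but each value is a whole number of people.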

In [None]:
recent = ...
# Uncomment the line below to improve the formatting of the table
#recent.set_format(['population_total', 'poverty_total'], NumberFormatter(0))
#recent

In [None]:
check2_2(recent)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 3.** Assign the name `poverty_percent` to the percentage of the world's 2010 population that were living in extreme poverty.  To do that, you need to know the number of people in extreme poverty in 2010, as well as the total world population in 2010.  

* For the former, use the `poverty_total` numbers from the `recent` table.  (Those numbers, as we observed above in Question 2, are not completely accurate for 2010, because not all countries had a 2010 poverty rate in the `latest` table.  Don't worry about that.)  

* For the latter, use the `population` table, not the `recent` table, because the sum of the `population_total` column in the `recent` table is not the world population: only a subset of the world's countries have known poverty rates.

*Hint*: you should get an answer that is close to the 10.7% estimate mentioned at the beginning of this section, but yours should be a little higher, since your estimate is for 2010 rather than 2013, and poverty is generally declining.

In [None]:
poverty_percent = ...
poverty_percent

In [None]:
check2_3(poverty_percent)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


For most countries, the `countries` table includes their position on the globe.  Some countries are missing position data, so the table has `nan` for their position. (Recall that `nan` [means *not a number*](https://en.wikipedia.org/wiki/NaN).)  Don't worry about those; they won't be part of our analysis.

In [None]:
countries.select('country', 'name', 'world_4region', 'latitude', 'longitude')

**Question 4.** Using `countries` and `recent`, create a five-column table called `poverty_map` with one row for every country in `recent`.  The columns should have the following labels and contents:

1. `name` contains the country's name
2. `region` contains the country's region from the `world_4region` column of `countries`
3. `latitude` contains the country's latitude
4. `longitude` contains the country's longitude
5. `poverty_total` contains the country's poverty total

In [None]:
poverty_map = ...
poverty_map

In [None]:
check2_4(poverty_map)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


Uncomment and run the cell below to draw a map of the world in which the areas of circles represent the number of people living in extreme poverty. (Don't worry about understanding the code.) Double-click on the map to zoom in.

In [None]:
# Uncomment and run this code
#colors = {'africa': 'blue', 'europe': 'black', 'asia': 'red', 'americas': 'green'}
#scaled = poverty_map.select('latitude', 'longitude', 'name', 'region', 'poverty_total')\
#   .with_column(
#     'poverty_total', 2e4 * poverty_map.column('poverty_total'),
#     'region', poverty_map.apply(colors.get, 'region')
# )
#Circle.map_table(scaled)

Although people live in extreme poverty throughout the world, the largest numbers in this dataset are in Asia and Africa.

**Question 5.** Assign `largest` to a two-column table with the `name` (not the 3-letter code) and `poverty_total` of the 10 countries with the largest number of people living in extreme poverty, according to the `poverty_map` table.

In [None]:
largest = ...
largest

In [None]:
check2_5(largest)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 6.** Write a function called `country_poverty` that takes the name of a country as its argument. It should return a table with two columns, `Year` and `Number in poverty`.  The values in the `Year` column should be all the years for which the `poverty` table has data for that country.  The values in the `Number in poverty` column should be the number of people in extreme poverty in that country in that year, rounded to the nearest integer.  You can compute that quantity from the poverty percentage in the `poverty` table and the population (for that year and country) in the `population` table.

*Hint*: Any way that you want to compute the table is fine. Our solution used about five assignment statements and computed some intermediate tables.

In [None]:
def country_poverty(country):
    ...
    

country_poverty('India')

In [None]:
check2_6(country_poverty)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


Finally, use the function below to draw timelines to see how the world is changing. You can check your work by comparing your graphs to the ones on [gapminder.org](https://goo.gl/lPujuh).

In [None]:
def poverty_timeline(country_name):
    """Draw a timeline of the poverty in a country."""
    country_poverty(country_name).plot('Year')    

Call `poverty_timeline` on India, Nigeria, Ghana, and China to see some interesting charts.

In [None]:
# try calling poverty_timeline on some countries
...

Although the number of people living in extreme poverty has been increasing in Nigeria, the massive decreases in China and India have shaped the overall trend that extreme poverty is decreasing worldwide, both in percentage and in absolute number. 

To learn more, watch [Hans Rosling in a 2015 film](https://www.gapminder.org/videos/dont-panic-end-poverty/) about the UN goal of eradicating extreme poverty from the world. 

## 3. Submit

**You're finished!** Congratulations on mastering data visualization and table manipulation. Time to submit. Make sure you're submitting to **Part 2** of the assignment.