![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Finteresting-problems&branch=main&subPath=notebooks/populations-of-countries.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Populations of Countries

[Watch on YouTube](https://www.youtube.com/watch?v=RajGSduDvqo&list=PL-j7ku2URmjZYtWzMCS4AqFS5SXPXRHwf)

What are the most and least populated countries in the world?

We are going to use Gapminder data from http://gapm.io/dpop to find out.

First we need to download the data from the Google spreadsheet. Select the following code cell and use the `Run` button to run the code.

In [None]:
import pandas as pd

spreadsheet_key = '18Ep3s1S0cvlT1ovQG9KdipLEoQ1Ktz5LtTTQpDcWbX0' # from the URL
spreadsheet_gid = '1668956939' # the first sheet
# build the link that exports that sheet as a CSV file
csv_link = 'https://docs.google.com/spreadsheets/d/'+spreadsheet_key+'/export?gid='+spreadsheet_gid+'&format=csv'
# read the CSV file into a pandas DataFrame
data = pd.read_csv(csv_link)
data
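
The link in the cell above is built by joining strings with `+`. As a minimal sketch, the same link could be built with an f-string inside a small helper function. The function name `build_csv_link` is made up for this example and is not part of pandas or Google Sheets.

```python
def build_csv_link(key, gid):
    """Return the CSV export link for one sheet of a Google spreadsheet."""
    return f'https://docs.google.com/spreadsheets/d/{key}/export?gid={gid}&format=csv'

spreadsheet_key = '18Ep3s1S0cvlT1ovQG9KdipLEoQ1Ktz5LtTTQpDcWbX0' # from the URL
spreadsheet_gid = '1668956939' # the first sheet
csv_link = build_csv_link(spreadsheet_key, spreadsheet_gid)
print(csv_link)
```

The resulting `csv_link` can be passed to `pd.read_csv` in the same way as before.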

## Current Population

Since the data set covers the years 1800 to 2100 (including projected future populations), we need to filter the data by year.

The following code will also sort the countries by population (in descending order) and re-number the rows. Run the code cell.

In [None]:
year = 2020
current_population = data[data['time']==year].sort_values('population', ascending=False).reset_index(drop=True)
current_population
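
The cell above does three things in one line: filter, sort, and re-number. Here is a sketch of the same steps done one at a time on a tiny made-up table. The country names and numbers are invented for illustration and are not Gapminder values; only the column names `name`, `time`, and `population` match the real data.

```python
import pandas as pd

# Made-up numbers for illustration only
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'A', 'B', 'C'],
    'time': [2019, 2019, 2019, 2020, 2020, 2020],
    'population': [100, 300, 200, 110, 310, 190],
})

rows_for_year = sample[sample['time'] == 2020]   # step 1: keep only the rows for one year
sorted_rows = rows_for_year.sort_values('population', ascending=False)   # step 2: largest first
ranked = sorted_rows.reset_index(drop=True)      # step 3: re-number the rows from 0
print(ranked)
```

Without `reset_index(drop=True)`, each row would keep its row number from the original table.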

### Ten Most Populated Countries

To find the 10 most populated countries from this data set, we use the `.head(10)` method. Run the cell.

Notice that the row numbering starts from 0, so the most populated country is in row 0.

In [None]:
current_population.head(10)

### Ten Least Populated Countries

Display the 10 least populated countries using `.tail(10)`.

In [None]:
current_population.tail(10)

## Population Rank of a Specific Country

To see the population and rank of a particular country, run the following code cell. Because the row numbering starts from 0, the country's rank is its row number plus 1. Change the name in the first line to look at a different country.

In [None]:
country = 'Canada'
current_population[current_population['name']==country]
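
Since the rows are numbered from 0, a country's rank is one more than its row number. The sketch below works this out on a small made-up table that stands in for `current_population`. The country names and numbers are invented for this example.

```python
import pandas as pd

# Made-up table, already sorted from largest to smallest and numbered from 0
ranked = pd.DataFrame({
    'name': ['Country B', 'Country C', 'Country A'],
    'population': [310, 190, 110],
})

country = 'Country C'
row_number = ranked[ranked['name'] == country].index[0]   # row number of the matching row
rank = int(row_number) + 1                                 # ranks start at 1, rows start at 0
print(country, 'is ranked number', rank)
```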

## Listing All Countries

To see a list of all countries in the data set, run the next cell.

In [None]:
current_population['name'].values

## Comparing Populations

To compare populations of countries, we can make a horizontal bar graph.

After you run the following cell, you can mouse over and zoom in on parts of the graph to have a closer look.

In [None]:
import plotly.express as px
year = 2020
px.bar(data[data['time']==year].sort_values('population'), y='name', x='population', 
       title='Populations of Countries in '+str(year), orientation='h', height=1500)

Or, if we want to chart just the 20 largest countries by population, we can use `.tail(20)` on the sorted data.

In [None]:
year = 2020
px.bar(data[data['time']==year].sort_values('population').tail(20), y='name', x='population', 
       title='Populations of Countries in '+str(year), orientation='h')
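
As an alternative sketch, pandas also has an `.nlargest()` method that picks the rows with the largest values directly. The small table below is made up for illustration. Sorting the result from smallest to largest afterwards keeps the same row order as the chart above.

```python
import pandas as pd

# Made-up numbers for illustration only
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'population': [110, 310, 190, 50],
})

top_two = sample.nlargest(2, 'population')       # the 2 rows with the largest populations
for_chart = top_two.sort_values('population')    # smallest first, same order as the chart above
print(for_chart)
```

With the real data, `2` would become `20` and `sample` would be the rows of `data` for one year.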

## Population Change

To look at the population of a country over time, we can make a line graph.

In [None]:
country = 'Canada'
px.line(data[data['name']==country], x='time', y='population', title='Population of '+country)

Or we can compare a few different countries over time.

In [None]:
countries = ['Canada', 'Mexico', 'Costa Rica']
px.line(data[data['name'].isin(countries)], x='time', y='population', color='name', title='Populations of Countries over Time')
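
A line graph shows the change visually. To put a number on it, one option is to compute the percent change between two years. The following is a sketch on made-up data; the countries, years, and populations are invented for this example.

```python
import pandas as pd

# Made-up numbers for illustration only
sample = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B'],
    'time': [2000, 2020, 2000, 2020],
    'population': [100, 150, 200, 210],
})

# one row per year, one column per country
wide = sample.pivot(index='time', columns='name', values='population')

# percent change from 2000 to 2020 for each country
percent_change = (wide.loc[2020] - wide.loc[2000]) / wide.loc[2000] * 100
print(percent_change.round(1))
```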

# Summary

This notebook allowed you to discover and rank the populations of countries. For more information about this data set, and other data sets to explore, check out [Gapminder.org](https://www.gapminder.org).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)