![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Countries - Becoming a Global Citizen!

**Submitted by: A, B, C, D**

<table><tr>
<td> <img src="data/map.png" alt="World Map" style="width: 550px;"/> </td>
<td> <img src="data/globe.jpeg" alt="Globe" style="width: 420px;"/> </td>
</tr></table>

The world is a big place with over [7.7 billion](https://www.worldometers.info/) people distributed in over [193](https://www.worldometers.info/geography/how-many-countries-are-there-in-the-world/) nations. It's not hard to imagine that there are many similarities and differences between people and nations from across the globe with respect to cultural, environmental, political, or economic systems. One way to get to know and engage our global neighbours better is through data. In this hackathon notebook you will play with some global datasets that will hopefully encourage you to become a better global citizen. Happy hacking!

## Getting ready

This section sets up many things behind the scenes that are required for the rest of this notebook. Most of the code cells in this section are *ready-to-run*, so you won't have to make any modifications. You don't need to understand everything the code cells in this section do to complete the challenges. However, feel free to ask mentors about anything that makes you curious.

### 1. Install/Import libraries

Run the cell below to load the required Python libraries and set up interactive plotting. It may take a few minutes to complete.

```python
# load libraries
import pandas as pd
from IPython.display import display, HTML
from plotly.offline import init_notebook_mode

# to enable plotting in Colab
def enable_plotly_in_cell():
    display(HTML('''<script src="/static/components/requirejs/require.js"></script>'''))
    init_notebook_mode(connected=False)

get_ipython().events.register('pre_run_cell', enable_plotly_in_cell)
```

### 2. Import data and create a dataframe

This dataset was created by the [Bootstrap](https://www.bootstrapworld.org/index.shtml) program based out of Brown University and can be downloaded from [here](https://docs.google.com/spreadsheets/d/19VoYxPw0tmuSViN1qFIkyUoepjNSRsuQCe0TZZDmrZs/edit#gid=213565368).

The data were aggregated from the following sources:
- The World Factbook:
  - [GDP (PPP)](https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.html)
  - [Life expectancy at birth](https://www.cia.gov/library/publications/the-world-factbook/fields/355rank.html)
  - [Population](https://www.cia.gov/library/publications/the-world-factbook/fields/335rank.html)
- Wikipedia:
  - [Universal Health Care](https://en.wikipedia.org/wiki/List_of_countries_with_universal_health_care)

Some countries/territories/regions were omitted from the dataset due to incomplete data.

For this hackathon, the dataset is stored in cloud storage, so we can import it directly into this notebook. Executing the cells below will also create a dataframe and show you some interesting facts about the dataset.

```python
# location of the dataset in cloud object storage
target_url = "https://swift-yeg.cloud.cybera.ca:8080/v1/AUTH_d22d1e3f28be45209ba8f660295c84cf/hackaton/countries2.csv"
```

```python
# read the input file and create a dataframe
countries = pd.read_csv(target_url)
```
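`pd.read_csv` accepts a URL, a local file path, or any file-like object. As a minimal offline sketch, the cell below parses a few rows from an in-memory string with `io.StringIO`. The column names follow this notebook, but the countries and numbers are hypothetical and are not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical CSV text using some of this notebook's column names;
# the rows are made up for illustration and are NOT from the real dataset.
csv_text = """country,code,continent,population,gdp ($US),life-expectancy (yrs)
Country A,AAA,Europe,1000000,50000000000,80.5
Country B,BBB,Africa,20000000,40000000000,62.1
"""

# read_csv works the same way on a URL, a file path, or a file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
print(sample.dtypes)  # numbers are parsed as numeric columns, text as object
```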

```python
# how many rows and columns does the dataframe have?
countries.shape
```

How many rows are in this dataset (now stored as a dataframe)?

* Would you consider this a **big** dataset?
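For reference, `shape` returns a `(rows, columns)` tuple, and `len()` of a dataframe gives the number of rows. A small sketch on a made-up three-row dataframe (`toy` is a hypothetical stand-in for `countries`):

```python
import pandas as pd

# A tiny hypothetical dataframe standing in for `countries`
toy = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "population": [1_000_000, 20_000_000, 5_000_000],
})

rows, cols = toy.shape                    # shape is a (rows, columns) tuple
print(rows, "rows and", cols, "columns")  # 3 rows and 2 columns
print(len(toy))                           # len() also gives the row count: 3
```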

```python
# what are the column names?
countries.columns
```

Now you know which columns are in the dataset, but what do those columns refer to? Here are descriptions of some of the columns:

* **life-expectancy (yrs)** - the average number of years that a group of people born in the same year would live, if mortality at each age remained constant in the future. Life expectancy at birth is also a measure of overall quality of life in a country and summarizes the mortality at all ages.

* **gdp (\$US)** - the total value of all goods and services produced in the country, valued at prices prevailing in the United States.

* **population** - population of the country.

* **has-univ-healthcare** - whether the country has universal health care. Universal health coverage is a broad concept that has been implemented in several ways. The common denominator for all such programs is some form of government action aimed at extending access to health care as widely as possible and setting minimum standards.

* **code** - Country code

```python
# display the first 5 rows to explore what the data look like
countries.head()
```

```python
# let's create another column - GDP per person
countries['gdp ($US) per person'] = countries['gdp ($US)'] / countries["population"]
countries.head()
```
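Dividing one column by another is done element by element, so each row's GDP is divided by that same row's population. The sketch below uses two hypothetical countries with round numbers so the result can be checked by hand; note that the country with the smaller total GDP can still have the higher GDP per person.

```python
import pandas as pd

# Hypothetical values, chosen so the arithmetic is easy to check by hand
toy = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "gdp ($US)": [30_000_000_000, 40_000_000_000],
    "population": [1_000_000, 20_000_000],
})

# Division between two columns is applied row by row (element-wise)
toy["gdp ($US) per person"] = toy["gdp ($US)"] / toy["population"]
print(toy)
# Country A: 30,000,000,000 / 1,000,000  = 30,000 per person
# Country B: 40,000,000,000 / 20,000,000 =  2,000 per person
```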

Now everything is set up for crunching the **countries** dataframe. Your group can go through the rest of the notebook and work on challenges.

**While working on the challenges, feel free to add new code/markdown cells as needed.**

## Part A: Exploring country data

We can plot a map of all the countries in our dataframe using the `px.choropleth()` function. Let's try creating a map that colors countries according to their life expectancy.

Look closely at the map and try hovering over different countries. Which country has the highest life expectancy? 

```python
# plotly should be installed already; if not, uncomment the next line and run this cell
#!pip install plotly
```

```python
import plotly.express as px
```

```python
fig = px.choropleth(countries, locations="code",
                    color="life-expectancy (yrs)",  # coloring by life expectancy
                    hover_name="country")  # country name appears when you hover your mouse over it
fig.show()
```

### Challenges:

OK, you probably spent a bunch of time hovering over different countries to see which one had the largest value for life expectancy. Let's see if we can use our **countries** dataframe to verify the result of our hovering. Don't worry, we'll get back to mapping shortly :)

* Using your **countries** dataframe, which country has the highest life expectancy and what is the exact number? Hint: remember your Python and dataframes tutorial.
  > `countries.sort_values("my favourite column", ascending=False)`
* Which country has the lowest life expectancy and what is the exact number?
* What is the exact life expectancy for people in China? Hint: this is how you could do this for Japan:
  > `countries[countries["country"]=="Japan"]`
* What is the exact life expectancy for people in Canada? How different are the life expectancies in China and Canada?

* Ok back to mapping! Using the cells above as an example, can you draw a country map that is coloured by `gdp ($US) per person`?
* You have now created two different maps. Do they look similar or different? Any ideas as to why they might look similar or different?
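The hints above combine two pandas techniques: sorting with `sort_values` and filtering rows with a boolean condition. Here is a minimal sketch of both on a hypothetical three-country dataframe (names and values are invented for practice), including how to subtract one country's value from another's:

```python
import pandas as pd

# Hypothetical life-expectancy values (not real data) for practice
toy = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "life-expectancy (yrs)": [80.5, 62.1, 74.0],
})

# Sort descending: the first row then has the highest life expectancy
top = toy.sort_values("life-expectancy (yrs)", ascending=False).iloc[0]
print(top["country"], top["life-expectancy (yrs)"])  # Country A 80.5

# Boolean filtering keeps only the rows where the condition is True
row_c = toy[toy["country"] == "Country C"]
value_c = row_c["life-expectancy (yrs)"].iloc[0]

# Difference between two countries' values
diff = top["life-expectancy (yrs)"] - value_c
print(round(diff, 1))  # 6.5
```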

## Part B: Diving deeper into country data

Let us explore the **countries** dataframe even further by finding the top 20 countries with the highest "gdp per person" values.   

```python
# library should be installed already; if not, uncomment the next line and run this cell
#!pip install cufflinks ipywidgets
```

```python
# we are using this library to create plots
import cufflinks as cf
cf.go_offline()
```

```python
# select only two columns - "gdp ($US) per person" and "country"
gdp_person = countries[["gdp ($US) per person", "country"]]

# order by "gdp ($US) per person", with the highest numbers on top, and keep the top 20
gdp_person = gdp_person.sort_values("gdp ($US) per person", ascending=False).head(20)

gdp_person
```
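Sorting and then calling `head(20)` is one way to get the top rows; pandas also provides `nlargest(n, column)` (and `nsmallest`) as a shortcut. A small sketch on made-up values shows that both approaches return the same rows when there are no ties:

```python
import pandas as pd

# Hypothetical GDP-per-person values (not real data)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "gdp ($US) per person": [2000.0, 50000.0, 12000.0, 800.0],
})

# Approach used above: sort descending, then keep the first n rows
via_sort = toy.sort_values("gdp ($US) per person", ascending=False).head(2)

# Equivalent shortcut: nlargest(n, column)
via_nlargest = toy.nlargest(2, "gdp ($US) per person")

print(via_sort["country"].tolist())      # ['B', 'C']
print(via_nlargest["country"].tolist())  # ['B', 'C']
```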

```python
# plotting the top 20 countries, setting the index to country so the bars are labelled with country names
gdp_person.set_index("country").iplot(kind="bar", yTitle='GDP (USD) Per Person', xTitle="Country")
```

It looks like some of the countries in the top 20 are quite small, like Luxembourg and Brunei. 

Let's find out what the population is for these countries.

```python
# creating a new column - population in thousands
countries["population_t"] = countries["population"] / 1000
```

```python
# this time we select 3 columns - "gdp ($US) per person", "population_t" and "country"
gdp_person_pop = countries[["gdp ($US) per person", "population_t", "country"]]

# sorting again by "gdp ($US) per person" and keeping the top 20
gdp_person_pop = gdp_person_pop.sort_values("gdp ($US) per person", ascending=False).head(20)

gdp_person_pop.set_index("country").iplot(kind="bar", yTitle="GDP per person, Population (thousands)", xTitle="Country")
```

We can see that the majority of countries in the top 10 have smaller populations, with the exception of the United States, whose population is significantly larger than all the others in our list. GDP per person (or per capita) is often used as an indicator of how successful a country is economically. The United States and Germany are examples of countries with large populations that also have large economies.

### Challenges 

* Using the cells above as an example, can you find the top 20 countries with the lowest life expectancy?
  * Do any of these countries have Universal Health Care?
* How many countries in the entire **countries** dataframe have Universal Health Care?
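For the challenges above, `nsmallest` finds the rows with the lowest values, and summing a True/False column counts how many values are True. This sketch **assumes** that `has-univ-healthcare` stores booleans; the real column might use another encoding (for example text labels), so check it with `countries["has-univ-healthcare"].unique()` before relying on `.sum()`. The data here are hypothetical.

```python
import pandas as pd

# Hypothetical data; has-univ-healthcare is ASSUMED to hold True/False values.
# Check the real column with countries["has-univ-healthcare"].unique() first.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "life-expectancy (yrs)": [80.5, 62.1, 74.0, 55.3, 68.9],
    "has-univ-healthcare": [True, False, True, False, True],
})

# The 3 rows with the lowest life expectancy (like sort_values(...).head(3))
lowest = toy.nsmallest(3, "life-expectancy (yrs)")
print(lowest)

# Summing a boolean column counts the True values
print(lowest["has-univ-healthcare"].sum())  # 1
print(toy["has-univ-healthcare"].sum())     # 3

# value_counts shows how many rows have each value
print(toy["has-univ-healthcare"].value_counts())
```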

## Part C: Exploring data by continent

In addition to countries, our dataframe also contains information about continents. There are seven large landmasses known as continents, including Antarctica.

How are countries distributed amongst the continents?

```python
# unique continents
continents = countries["continent"].unique()

# how many of them?
print(len(continents), "continents")

continents
```

```python
# group by continent and calculate how many rows/countries each one has
counts_by_continent = countries.groupby("continent").size()

# turn the result into a dataframe with "continent" and "count" columns
counts_by_continent = counts_by_continent.reset_index(name="count")

counts_by_continent
```
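The same counts can be obtained in a couple of ways. This sketch, on a hypothetical five-country dataframe, compares `nunique()`, `groupby(...).size()` followed by `reset_index(name="count")`, and `value_counts()`:

```python
import pandas as pd

# Hypothetical countries and continents (not real data)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "continent": ["Asia", "Europe", "Asia", "Africa", "Asia"],
})

# nunique() counts the distinct values directly
print(toy["continent"].nunique())  # 3

# groupby(...).size() returns a Series (continent -> number of rows);
# reset_index(name="count") turns it into a two-column dataframe
counts = toy.groupby("continent").size().reset_index(name="count")
print(counts)

# value_counts() gives the same counts, sorted from most to least common
print(toy["continent"].value_counts())
```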

```python
# using kind="pie" to create a pie chart
counts_by_continent.iplot(kind="pie", labels="continent", values="count")
```

It looks like Asia, Africa, and Europe have almost an equal number of countries. 

What does the continental **population** distribution look like?

```python
# create a new sum_by_continent dataframe by
# grouping by continent and calculating the sum for every numeric column
sum_by_continent = countries.groupby("continent").sum(numeric_only=True)

# convert the index (row names) into an additional column
sum_by_continent = sum_by_continent.reset_index()

sum_by_continent
```
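Passing `numeric_only=True` to the groupby `sum()` restricts the result to numeric columns; depending on your pandas version, text columns such as `country` may otherwise be concatenated into long strings or trigger a warning. A small sketch with hypothetical populations, which also converts the sums into percentage shares like the slices of a pie chart:

```python
import pandas as pd

# Hypothetical data (not real populations)
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "continent": ["Asia", "Europe", "Asia"],
    "population": [1_000_000, 2_000_000, 3_000_000],
})

# numeric_only=True keeps only numeric columns, so "country" is dropped
sums = toy.groupby("continent").sum(numeric_only=True).reset_index()
print(sums)

# Each continent's share of the total population, in percent
sums["share (%)"] = 100 * sums["population"] / sums["population"].sum()
print(sums)
```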

```python
# we select only one column, population, and create a pie chart
sum_by_continent.iplot(kind="pie", values="population", labels="continent")
```

Wow, looks like the majority of the world's population lives in Asia!

### Challenges 

* Using the cells above as an example, can you plot the number of countries by continent that have Universal Health Care?
  * Which continent has the most countries with Universal Health Care?
* Create a **mean_by_continent** dataframe that calculates the average for every numeric column in the **countries** dataframe. Hint: use the `mean()` function with `numeric_only=True`, so that text columns such as `country` are skipped.
* Using the **mean_by_continent** dataframe that you created, plot the average life expectancy per continent.
  * Which continents have the highest and lowest average life expectancy?
  * What is the difference between the two values?
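A minimal sketch of the techniques these challenges call for, on hypothetical data: `groupby(...).mean(numeric_only=True)` for continent averages, `idxmax()`/`idxmin()` to find which continent holds the extreme values, and a boolean filter before counting. As before, it **assumes** `has-univ-healthcare` holds True/False values.

```python
import pandas as pd

# Hypothetical data; has-univ-healthcare is ASSUMED to hold True/False values
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["Asia", "Europe", "Asia", "Africa"],
    "life-expectancy (yrs)": [80.0, 82.0, 70.0, 60.0],
    "has-univ-healthcare": [True, True, False, False],
})

# Average of every numeric column per continent
means = toy.groupby("continent").mean(numeric_only=True)
avg_life = means["life-expectancy (yrs)"]

# Continents with the highest/lowest average, and the gap between them
print(avg_life.idxmax(), avg_life.idxmin())  # Europe Africa
print(avg_life.max() - avg_life.min())       # 22.0

# Keep only countries with universal health care, then count per continent
uhc_counts = toy[toy["has-univ-healthcare"]].groupby("continent").size()
print(uhc_counts)
```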

## Summary

This notebook analyzed global data on population, life expectancy, health care, and [GDP](https://en.wikipedia.org/wiki/Gross_domestic_product).

Through these hackathon challenges you learned how to analyze a dataset, create visualizations, and develop [*computational thinking*](https://en.wikipedia.org/wiki/Computational_thinking) skills that can be used to solve various problems. Hopefully this also encourages you to become a better global citizen.

## Hackathon Reflections
Write about some or all of the following questions, either individually in separate markdown cells or as a group.
- What is something you learned through this process?
- How well did your group work together? Why do you think that is?
- What were some of the hardest parts?
- What are you proud of? What would you like to show others?
- Are you curious about anything else related to this? Did anything surprise you?
- How can you apply your learning to future activities?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)