In [72]:
import pandas as pd
import os

## Data from World Happiness Report

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors.

In this notebook we will explore the happiness of different countries and the features associated with it.
The datasets that we will use are available in *Data*: **happiness2020.pkl** and **countries_info.csv**.

Although the features are self-explanatory, here is a summary:

**happiness2020.pkl**
* country: *Name of the country*
* happiness_score: *Happiness score*
* social_support: *Social support (mitigating the effects of inequality)*
* healthy_life_expectancy: *Healthy Life Expectancy*
* freedom_of_choices: *Freedom to make life choices*
* generosity: *Generosity (charity, volunteers)*
* perception_of_corruption: *Corruption Perception*
* world_region: *Area of the world of the country*

**countries_info.csv**
* country_name: *Name of the country*
* area: *Area in sq mi*
* population: *Number of people*
* literacy: *Literacy percentage*

In [73]:
!head Data/countries_info.csv

country_name,area,population,literacy
afghanistan,647500,31056997,"36,0"
albania,28748,3581655,"86,5"
algeria,2381740,32930091,"70,0"
argentina,2766890,39921833,"97,1"
armenia,29800,2976372,"98,6"
australia,7686850,20264082,"100,0"
austria,83870,8192880,"98,0"
azerbaijan,86600,7961619,"97,0"
bahrain,665,698585,"89,1"


In [74]:
%matplotlib inline
# Folder that holds the datasets (same path as in the `!head` call above)
DATA_FOLDER = "Data/"
COUNTRIES_DATASET = DATA_FOLDER + "countries_info.csv"

## Task 1: Load the data

Load the two datasets into pandas dataframes (called *happiness* and *countries*), and show the first rows.


**Hint**: Use the correct reader and verify the data has the expected format.
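
As a minimal sketch of what "the correct reader" means here: the `head` output above shows that `literacy` is written with a decimal comma inside quotes, so `pd.read_csv` needs `decimal=','`, while a `.pkl` file would be read with `pd.read_pickle` instead. The snippet inlines the first rows shown above through `io.StringIO` rather than reading from disk; the names `sample_csv` and `countries_sample` are illustrative only.

```python
import io
import pandas as pd

# First rows of countries_info.csv, copied from the `head` output above.
sample_csv = io.StringIO(
    'country_name,area,population,literacy\n'
    'afghanistan,647500,31056997,"36,0"\n'
    'albania,28748,3581655,"86,5"\n'
    'algeria,2381740,32930091,"70,0"\n'
)

# decimal=',' makes pandas parse "36,0" as the float 36.0 instead of a string.
countries_sample = pd.read_csv(sample_csv, decimal=',')

# Verify the format: literacy should be a float column, not object.
print(countries_sample.dtypes)
print(countries_sample.head())

# A pickled dataframe would be loaded with pd.read_pickle(path) instead.
```

Checking `dtypes` right after loading is a cheap way to catch a numeric column that was silently read as text.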

In [75]:
PATH_IN = './data/'

# Happiness data: lower-case the country names so they match countries_info.csv
fname_h = os.path.join(PATH_IN, 'happiness2020.csv')
df_happiness = pd.read_csv(fname_h, decimal=',')
df_happiness['country'] = df_happiness['country'].str.lower()

# Country data: literacy uses a decimal comma; rename the key column for merging
fname_c = os.path.join(PATH_IN, 'countries_info.csv')
df_countries = pd.read_csv(fname_c, decimal=',')
df_countries = df_countries.rename(columns={'country_name': 'country'})

In [76]:
df_happiness.head(5)

Unnamed: 0,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region
0,afghanistan,2.566900015,0.470366955,52.59000015,0.396573007,-0.0964294,0.933686554,South Asia
1,albania,4.882699966000001,0.671070457,68.70813751,0.7819942240000001,-0.042309489,0.896303713,Central and Eastern Europe
2,algeria,5.005099773,0.8033851390000001,65.90517426,0.4666109089999999,-0.121105164,0.7354851370000001,Middle East and North Africa
3,argentina,5.974699974,0.900567949,68.80380249,0.831132412,-0.194913864,0.8420098420000001,Latin America and Caribbean
4,armenia,4.676799774,0.7574794290000001,66.75065613,0.7120178340000001,-0.13877961,0.7735447879999999,Commonwealth of Independent States


In [77]:
df_countries.head(5)

Unnamed: 0,country,area,population,literacy
0,afghanistan,647500,31056997,36.0
1,albania,28748,3581655,86.5
2,algeria,2381740,32930091,70.0
3,argentina,2766890,39921833,97.1
4,armenia,29800,2976372,98.6


## Task 2: Let's merge the data

Create a dataframe called *country_features* by merging *happiness* and *countries*. A row of this dataframe must describe all the features that we have about a country.

**Hint**: Verify that the final dataframe contains all the rows.
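
One possible way to check that no rows are lost, sketched on toy data: an outer merge with `indicator=True` flags keys that exist in only one of the two frames. The frames below are made up for illustration (including the deliberately unmatched `atlantis` row and all numeric values); they are not the notebook's datasets.

```python
import pandas as pd

# Toy data for illustration only; "atlantis" is a made-up entry with no match.
happiness_toy = pd.DataFrame({
    'country': ['finland', 'denmark', 'atlantis'],
    'happiness_score': [7.8, 7.6, 5.0],   # made-up values
})
countries_toy = pd.DataFrame({
    'country': ['finland', 'denmark'],
    'area': [1000, 2000],                 # made-up values
})

# An outer merge keeps unmatched rows; indicator=True adds a `_merge` column
# saying whether each row came from the left frame, the right frame, or both.
check = pd.merge(happiness_toy, countries_toy, on='country',
                 how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['country', '_merge']])

# The default inner merge silently drops the unmatched row.
merged = pd.merge(happiness_toy, countries_toy, on='country')
print(len(happiness_toy), len(merged))
```

If `unmatched` is empty and the row counts agree, the inner merge has kept every country.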

In [88]:
country_features = pd.merge(df_happiness, df_countries, on="country")

In [89]:
country_features.head(5)

Unnamed: 0,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region,country_name,area,population,literacy
0,afghanistan,2.566900015,0.470366955,52.59000015,0.396573007,-0.0964294,0.933686554,South Asia,afghanistan,647500,31056997,36.0
1,albania,4.882699966000001,0.671070457,68.70813751,0.7819942240000001,-0.042309489,0.896303713,Central and Eastern Europe,albania,28748,3581655,86.5
2,algeria,5.005099773,0.8033851390000001,65.90517426,0.4666109089999999,-0.121105164,0.7354851370000001,Middle East and North Africa,algeria,2381740,32930091,70.0
3,argentina,5.974699974,0.900567949,68.80380249,0.831132412,-0.194913864,0.8420098420000001,Latin America and Caribbean,argentina,2766890,39921833,97.1
4,armenia,4.676799774,0.7574794290000001,66.75065613,0.7120178340000001,-0.13877961,0.7735447879999999,Commonwealth of Independent States,armenia,29800,2976372,98.6


## Task 3: Where are people happier?

Print the top 10 countries based on their happiness score (high is better).

In [80]:
print('Top 10 countries based on their happiness score:')
top_10 = country_features.sort_values(by=['happiness_score'],ascending=False).country.values[:10]
for x in top_10:
    print('- '+x)

Top 10 countries based on their happiness score:
- finland
- denmark
- switzerland
- iceland
- norway
- netherlands
- sweden
- new zealand
- austria
- luxembourg


We would like to know in which world region people are happier.

Create and print a dataframe with (1) the average happiness score and (2) the number of countries for each world region.
Sort the result to show the happiness ranking.
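
A minimal sketch of one way to build this ranking, using `groupby` with named aggregation. It runs on six rows copied from this notebook's outputs (scores rounded to four decimals), not on the full dataset; the column names `mean_happiness` and `n_countries` are illustrative choices.

```python
import pandas as pd

# Six rows taken from the outputs in this notebook (scores rounded).
sample = pd.DataFrame({
    'country': ['australia', 'canada', 'new zealand', 'united states',
                'afghanistan', 'albania'],
    'happiness_score': [7.2228, 7.2321, 7.2996, 6.9396, 2.5669, 4.8827],
    'world_region': ['North America and ANZ'] * 4
                    + ['South Asia', 'Central and Eastern Europe'],
})

# One row per region: mean score and number of countries, best region first.
region_ranking = (
    sample.groupby('world_region')
    .agg(mean_happiness=('happiness_score', 'mean'),
         n_countries=('country', 'count'))
    .sort_values('mean_happiness', ascending=False)
)
print(region_ranking)
```

On the real data the same chain would be applied to `country_features` instead of `sample`.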

The first region has only a few countries! Which are they and what are their scores?

In [90]:
country_features[country_features.world_region=='North America and ANZ'][['country', 'happiness_score']]

Unnamed: 0,country,happiness_score
5,australia,7.222799778
21,canada,7.23210001
88,new zealand,7.299600123999999
127,united states,6.939599991000001


## Task 4: How literate is the world?

Print the names of the countries with a literacy level of 100%.

For each country, print the world region, the name, and the happiness score in the format: *{region name} - {country name} ({happiness score})*

In [91]:
country_features.head(5)

Unnamed: 0,country,happiness_score,social_support,healthy_life_expectancy,freedom_of_choices,generosity,perception_of_corruption,world_region,country_name,area,population,literacy
0,afghanistan,2.566900015,0.470366955,52.59000015,0.396573007,-0.0964294,0.933686554,South Asia,afghanistan,647500,31056997,36.0
1,albania,4.882699966000001,0.671070457,68.70813751,0.7819942240000001,-0.042309489,0.896303713,Central and Eastern Europe,albania,28748,3581655,86.5
2,algeria,5.005099773,0.8033851390000001,65.90517426,0.4666109089999999,-0.121105164,0.7354851370000001,Middle East and North Africa,algeria,2381740,32930091,70.0
3,argentina,5.974699974,0.900567949,68.80380249,0.831132412,-0.194913864,0.8420098420000001,Latin America and Caribbean,argentina,2766890,39921833,97.1
4,armenia,4.676799774,0.7574794290000001,66.75065613,0.7120178340000001,-0.13877961,0.7735447879999999,Commonwealth of Independent States,armenia,29800,2976372,98.6


In [92]:
# literacy was parsed as a float (decimal=','), so compare with a number, not the string '100,0'
for _, row in country_features[country_features.literacy == 100].iterrows():
    print("{} - {} ({})".format(row.world_region, row.country, row.happiness_score))

What is the global average literacy?

In [93]:
country_features.literacy.mean()

81.85112781954886

Calculate the proportion of countries with a literacy level below 50%. Print the value in percentage, formatted with 2 decimals.

In [96]:
percentage = len(country_features[country_features.literacy<50])/len(country_features)
print("Percentage of countries with literacy level < 50%: {:.2%}".format(percentage))

Percentage of countries with literacy level < 50%: 11.85%


Print the raw number and the percentage of world population that is illiterate.

In [97]:
illiterate_people = country_features.population * (100 - country_features.literacy)/100
illiterate_fraction = illiterate_people.sum() / country_features.population.sum()

print("Illiterate people: {:.0f} ({:.2%})".format(illiterate_people.sum(), illiterate_fraction))

Illiterate people: 1249372988 (20.33%)


## Task 5: Population density

Add to the dataframe a new field called *population_density* computed by dividing *population* by *area*.
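
As a rough sketch of the column arithmetic, here is the computation on three rows copied from the `countries_info.csv` preview above. The frame name `sample` is illustrative; on the real data the same assignment would target `country_features`.

```python
import pandas as pd

# Three rows copied from the countries_info.csv preview above.
sample = pd.DataFrame({
    'country': ['afghanistan', 'albania', 'algeria'],
    'area': [647500, 28748, 2381740],            # sq mi
    'population': [31056997, 3581655, 32930091],
})

# Element-wise division of two columns gives people per square mile.
sample['population_density'] = sample['population'] / sample['area']
print(sample)
```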

In [None]:
# Write your code here

What are the happiness scores of the 3 countries with the lowest population density?
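
One way to answer this is `DataFrame.nsmallest`, sketched here on the five countries shown in the previews above (happiness scores rounded to four decimals). This is an illustration on a subset, so the three countries it returns are not the answer for the full dataset.

```python
import pandas as pd

# Five rows copied from the previews above (scores rounded).
sample = pd.DataFrame({
    'country': ['afghanistan', 'albania', 'algeria', 'argentina', 'armenia'],
    'happiness_score': [2.5669, 4.8827, 5.0051, 5.9747, 4.6768],
    'area': [647500, 28748, 2381740, 2766890, 29800],
    'population': [31056997, 3581655, 32930091, 39921833, 2976372],
})
sample['population_density'] = sample['population'] / sample['area']

# nsmallest returns the n rows with the lowest values, in ascending order.
lowest = sample.nsmallest(3, 'population_density')
print(lowest[['country', 'population_density', 'happiness_score']])
```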

In [None]:
# Write your code here

## Task 6: Healthy and happy?

Draw a scatter plot of the happiness score (x) against healthy life expectancy (y).
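
A minimal matplotlib sketch of such a plot, drawn from the five rows shown in the previews above (values rounded) rather than from the full dataset. The axis labels and the frame name `sample` are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Five rows copied from the happiness preview above (values rounded).
sample = pd.DataFrame({
    'happiness_score': [2.5669, 4.8827, 5.0051, 5.9747, 4.6768],
    'healthy_life_expectancy': [52.59, 68.71, 65.91, 68.80, 66.75],
})

# One point per country: happiness on x, healthy life expectancy on y.
fig, ax = plt.subplots()
ax.scatter(sample['happiness_score'], sample['healthy_life_expectancy'])
ax.set_xlabel('Happiness score')
ax.set_ylabel('Healthy life expectancy')
```

With `%matplotlib inline` active, the figure is displayed when the cell finishes.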

In [None]:
# Write your code here

Feel free to continue the exploration of the dataset! We'll release the solutions next week.

----
Enjoy EPFL and be happy; next year Switzerland must be #1.