## Data from the World Happiness Report

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors.

In this notebook we will explore the happiness of different countries and the features associated with it.
The datasets we will use are available in the *Data* folder: **happiness2020.pkl** and **countries_info.csv**.

Although the features are self-explanatory, here is a summary:

**happiness2020.pkl**
* country: *Name of the country*
* happiness_score: *Happiness score*
* social_support: *Social support (mitigating the effects of inequality)*
* healthy_life_expectancy: *Healthy Life Expectancy*
* freedom_of_choices: *Freedom to make life choices*
* generosity: *Generosity (charity, volunteers)*
* perception_of_corruption: *Corruption Perception*
* world_region: *Area of the world of the country*

**countries_info.csv**
* country_name: *Name of the country*
* area: *Area in sq mi*
* population: *Number of people*
* literacy: *Literacy percentage*

In [1]:
!head Data/countries_info.csv

country_name,area,population,literacy
afghanistan,647500,31056997,"36,0"
albania,28748,3581655,"86,5"
algeria,2381740,32930091,"70,0"
argentina,2766890,39921833,"97,1"
armenia,29800,2976372,"98,6"
australia,7686850,20264082,"100,0"
austria,83870,8192880,"98,0"
azerbaijan,86600,7961619,"97,0"
bahrain,665,698585,"89,1"


In [2]:
import pandas as pd
%matplotlib inline

DATA_FOLDER = 'Data/'

HAPPINESS_DATASET = DATA_FOLDER+"happiness2020.pkl"
COUNTRIES_DATASET = DATA_FOLDER+"countries_info.csv"

## Task 1: Load the data

Load the two datasets into pandas DataFrames (named *happiness* and *countries*) and show the first rows.


**Hint**: Use the correct reader and verify the data has the expected format.

In [4]:
# load the happiness dataset (a pickle file, so use read_pickle rather than read_csv)

happiness = pd.read_pickle(HAPPINESS_DATASET)
happiness.head()



| | country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2.5669 | 0.470367 | 52.59 | 0.396573 | -0.096429 | 0.933687 | South Asia |
| 1 | Albania | 4.8827 | 0.67107 | 68.708138 | 0.781994 | -0.042309 | 0.896304 | Central and Eastern Europe |
| 2 | Algeria | 5.0051 | 0.803385 | 65.905174 | 0.466611 | -0.121105 | 0.735485 | Middle East and North Africa |
| 3 | Argentina | 5.9747 | 0.900568 | 68.803802 | 0.831132 | -0.194914 | 0.84201 | Latin America and Caribbean |
| 4 | Armenia | 4.6768 | 0.757479 | 66.750656 | 0.712018 | -0.13878 | 0.773545 | Commonwealth of Independent States |
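The happiness data comes as a pickle (`.pkl`), a binary serialization of a DataFrame, so a text reader such as `pd.read_csv` is the wrong tool; `pd.read_pickle` restores the frame together with its column dtypes. A minimal round-trip sketch, assuming a small hypothetical frame (not the real file) written to a temporary directory:

```python
import os
import tempfile

import pandas as pd

# hypothetical two-row stand-in for happiness2020.pkl
df = pd.DataFrame({"country": ["Finland", "Denmark"],
                   "happiness_score": [7.8087, 7.6456]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    df.to_pickle(path)               # write the DataFrame as a pickle
    restored = pd.read_pickle(path)  # read it back with the matching reader

print(restored.dtypes)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.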


In [5]:
# load the dataset countries

countries = pd.read_csv(COUNTRIES_DATASET)
countries.head()

| | country_name | area | population | literacy |
|---|---|---|---|---|
| 0 | afghanistan | 647500 | 31056997 | 36,0 |
| 1 | albania | 28748 | 3581655 | 86,5 |
| 2 | algeria | 2381740 | 32930091 | 70,0 |
| 3 | argentina | 2766890 | 39921833 | 97,1 |
| 4 | armenia | 29800 | 2976372 | 98,6 |
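The `literacy` values above are strings such as `"36,0"`: the file uses a comma as decimal mark, so the default reader cannot treat them as numbers. As an alternative to replacing the comma and casting afterwards, `read_csv` can parse them at load time with `decimal=","`. A sketch on the first rows printed by `!head` (copied into a string here, so no file is needed):

```python
import io

import pandas as pd

# first rows of countries_info.csv, copied from the `!head` output
raw = """country_name,area,population,literacy
afghanistan,647500,31056997,"36,0"
albania,28748,3581655,"86,5"
"""

as_text = pd.read_csv(io.StringIO(raw))                # literacy stays text
as_float = pd.read_csv(io.StringIO(raw), decimal=",")  # comma read as decimal mark

print(as_text["literacy"].tolist())   # ['36,0', '86,5']
print(as_float["literacy"].tolist())  # parsed as floats
```

This works here because the literacy values are quoted, so the comma inside them is not taken as a field separator.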


## Task 2: Let's merge the data

Create a dataframe called *country_features* by merging *happiness* and *countries*. A row of this dataframe must describe all the features that we have about a country.

**Hint**: Verify that all the rows are in the final dataframe.

In [27]:
# merge the two datasets

# Clean country names in both dataframes
happiness['country_cleaned'] = happiness['country'].str.strip().str.lower()
countries['country_name_cleaned'] = countries['country_name'].str.strip().str.lower()

# Merge on the cleaned country name columns
country_features = happiness.merge(countries, left_on='country_cleaned', right_on='country_name_cleaned', how='outer')

country_features.head()


| | country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region | country_cleaned | country_name | area | population | literacy | country_name_cleaned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2.5669 | 0.470367 | 52.59 | 0.396573 | -0.096429 | 0.933687 | South Asia | afghanistan | afghanistan | 647500 | 31056997 | 36,0 | afghanistan |
| 1 | Albania | 4.8827 | 0.67107 | 68.708138 | 0.781994 | -0.042309 | 0.896304 | Central and Eastern Europe | albania | albania | 28748 | 3581655 | 86,5 | albania |
| 2 | Algeria | 5.0051 | 0.803385 | 65.905174 | 0.466611 | -0.121105 | 0.735485 | Middle East and North Africa | algeria | algeria | 2381740 | 32930091 | 70,0 | algeria |
| 3 | Argentina | 5.9747 | 0.900568 | 68.803802 | 0.831132 | -0.194914 | 0.84201 | Latin America and Caribbean | argentina | argentina | 2766890 | 39921833 | 97,1 | argentina |
| 4 | Armenia | 4.6768 | 0.757479 | 66.750656 | 0.712018 | -0.13878 | 0.773545 | Commonwealth of Independent States | armenia | armenia | 29800 | 2976372 | 98,6 | armenia |
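The hint asks to verify that no row is lost in the merge. One way, sketched below on small hypothetical frames (the names, scores and areas are made up), is to pass `indicator=True`, which adds a `_merge` column telling whether each row matched (`both`) or came from only one side, and `validate="one_to_one"`, which raises a `MergeError` if a cleaned name appears twice on either side. The `key` column is a hypothetical name for the cleaned country name.

```python
import pandas as pd

# hypothetical inputs: " Denmark " differs only by case/whitespace,
# "Country X" and "country y" have no counterpart on the other side
happiness = pd.DataFrame({"country": ["Finland", " Denmark ", "Country X"],
                          "happiness_score": [7.8, 7.6, 5.0]})
countries = pd.DataFrame({"country_name": ["finland", "denmark", "country y"],
                          "area": [338145, 43094, 1000]})

# normalise names the same way as in the cell above
happiness["key"] = happiness["country"].str.strip().str.lower()
countries["key"] = countries["country_name"].str.strip().str.lower()

merged = happiness.merge(countries, on="key", how="outer",
                         indicator=True, validate="one_to_one")

# rows that did not find a partner on the other side
unmatched = merged[merged["_merge"] != "both"]
print(merged["_merge"].value_counts())
print(unmatched[["key", "_merge"]])
```

On the real data, an empty `unmatched` frame would mean every country found its partner.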


## Task 3: Where are people happier?

Print the top 10 countries based on their happiness score (higher is better).

In [28]:
# sort the countries by their happiness score

country_features.sort_values(by='happiness_score', ascending=False).head(10)

| | country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region | country_cleaned | country_name | area | population | literacy | country_name_cleaned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | Finland | 7.8087 | 0.95433 | 71.900825 | 0.949172 | -0.059482 | 0.195445 | Western Europe | finland | finland | 338145 | 5231372 | 100,0 | finland |
| 31 | Denmark | 7.6456 | 0.955991 | 72.402504 | 0.951444 | 0.066202 | 0.168489 | Western Europe | denmark | denmark | 43094 | 5450661 | 100,0 | denmark |
| 115 | Switzerland | 7.5599 | 0.942847 | 74.102448 | 0.921337 | 0.105911 | 0.303728 | Western Europe | switzerland | switzerland | 41290 | 7523934 | 99,0 | switzerland |
| 50 | Iceland | 7.5045 | 0.97467 | 73.0 | 0.948892 | 0.246944 | 0.71171 | Western Europe | iceland | iceland | 103000 | 299388 | 99,9 | iceland |
| 92 | Norway | 7.488 | 0.952487 | 73.200783 | 0.95575 | 0.134533 | 0.263218 | Western Europe | norway | norway | 323802 | 4610820 | 100,0 | norway |
| 87 | Netherlands | 7.4489 | 0.939139 | 72.300919 | 0.908548 | 0.207612 | 0.364717 | Western Europe | netherlands | netherlands | 41526 | 16491461 | 99,0 | netherlands |
| 114 | Sweden | 7.3535 | 0.926311 | 72.600769 | 0.939144 | 0.111615 | 0.25088 | Western Europe | sweden | sweden | 449964 | 9016596 | 99,0 | sweden |
| 88 | New Zealand | 7.2996 | 0.949119 | 73.202629 | 0.936217 | 0.191598 | 0.221139 | North America and ANZ | new zealand | new zealand | 268680 | 4076140 | 99,0 | new zealand |
| 6 | Austria | 7.2942 | 0.928046 | 73.002502 | 0.899989 | 0.085429 | 0.499955 | Western Europe | austria | austria | 83870 | 8192880 | 98,0 | austria |
| 72 | Luxembourg | 7.2375 | 0.906912 | 72.599998 | 0.905636 | -0.004621 | 0.367084 | Western Europe | luxembourg | luxembourg | 2586 | 474413 | 100,0 | luxembourg |
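`DataFrame.nlargest(n, columns)` returns the same top-n rows as sorting in descending order and taking `head(n)`, and states the intent directly. A quick check on a hypothetical four-row frame (no ties, so both orderings agree):

```python
import pandas as pd

# hypothetical scores, for illustration only
df = pd.DataFrame({"country": ["A", "B", "C", "D"],
                   "happiness_score": [5.1, 7.3, 6.2, 4.0]})

top2 = df.nlargest(2, "happiness_score")
same = df.sort_values("happiness_score", ascending=False).head(2)

print(top2)
```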


We want to know in which world region people are happiest.

Create and print a dataframe with (1) the average happiness score and (2) the number of countries for each world region.
Sort the result to show the happiness ranking.

In [29]:
# average happiness score and number of countries per world region, sorted by score
happiness_region = (country_features
                    .groupby('world_region')
                    .agg({'happiness_score': 'mean', 'country': 'count'})
                    .sort_values(by='happiness_score', ascending=False))
happiness_region

| world_region | happiness_score | country |
|---|---|---|
| North America and ANZ | 7.173525 | 4 |
| Western Europe | 6.967405 | 20 |
| Latin America and Caribbean | 5.97128 | 20 |
| Central and Eastern Europe | 5.891393 | 14 |
| Southeast Asia | 5.517788 | 8 |
| East Asia | 5.483633 | 3 |
| Commonwealth of Independent States | 5.358342 | 12 |
| Middle East and North Africa | 5.269306 | 16 |
| Sub-Saharan Africa | 4.393856 | 32 |
| South Asia | 4.355083 | 6 |
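In the table above the per-region count ends up under the column name `country`, which is easy to misread. Named aggregation (available since pandas 0.25) gives each output column its own name. A sketch on hypothetical data, where `avg_happiness` and `n_countries` are names chosen here for illustration:

```python
import pandas as pd

# hypothetical regions and scores
df = pd.DataFrame({"world_region": ["R1", "R1", "R2"],
                   "country": ["A", "B", "C"],
                   "happiness_score": [7.0, 6.0, 5.5]})

by_region = (df.groupby("world_region")
               .agg(avg_happiness=("happiness_score", "mean"),
                    n_countries=("country", "count"))
               .sort_values("avg_happiness", ascending=False))
print(by_region)
```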


The top region has only a few countries! Which are they, and what are their scores?

In [30]:
# countries of the top-ranked region and their happiness scores
country_features[country_features.world_region == happiness_region.index[0]][['country', 'happiness_score']]

| | country | happiness_score |
|---|---|---|
| 5 | Australia | 7.2228 |
| 21 | Canada | 7.2321 |
| 88 | New Zealand | 7.2996 |
| 127 | United States | 6.9396 |


## Task 4: How literate is the world?

Print the names of the countries with a literacy rate of 100%.

For each country, print the world region, the name and the happiness score in the format: *{region name} - {country name} ({happiness score})*

In [35]:
# literacy is still stored as text at this point, so 100% appears as "100,0"
full_literacy = country_features[country_features.literacy == "100,0"]
for row in full_literacy.itertuples():
    print(f"{row.world_region} - {row.country} ({row.happiness_score:.4f})")

North America and ANZ - Australia (7.2228)
Western Europe - Denmark (7.6456)
Western Europe - Finland (7.8087)
Western Europe - Luxembourg (7.2375)
Western Europe - Norway (7.4880)


What is the global average literacy rate?

In [40]:
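# literacy is stored as text with a comma decimal mark ("36,0"); convert it to float
# (astype(str) keeps this cell re-runnable once the column is already numeric)
country_features['literacy'] = pd.to_numeric(
    country_features['literacy'].astype(str).str.replace(',', '.', regex=False),
    errors='coerce')
# global average literacy rate (unweighted mean over countries, NaN skipped)
global_avg_literacy = country_features['literacy'].mean()
global_avg_literacy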

81.85112781954888
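This figure is an unweighted mean over countries with missing values skipped: a small country counts as much as a very populous one. If the question is about people rather than countries, a population-weighted mean is the relevant quantity. A sketch on hypothetical numbers, using `numpy.average` with `weights` over the rows where both values are known:

```python
import numpy as np
import pandas as pd

# hypothetical countries: literacy in %, one value missing
df = pd.DataFrame({"literacy": [100.0, 50.0, np.nan],
                   "population": [1_000, 3_000, 2_000]})

simple_mean = df["literacy"].mean()  # NaN skipped: (100 + 50) / 2

# weighted mean over the rows where both values are known
known = df.dropna(subset=["literacy", "population"])
weighted_mean = np.average(known["literacy"], weights=known["population"])

print(simple_mean, weighted_mean)  # 75.0 62.5
```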

Calculate the proportion of countries with a literacy level below 50%. Print the value as a percentage, formatted with 2 decimals.

In [47]:
# share of countries with a known literacy rate that are below 50%
below_fifty = (country_features.literacy < 50).sum()
total = country_features.literacy.count()  # count() skips missing values
print(f"{below_fifty / total:.2%}")

12.03%
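`count()` skips missing values, so the denominator above is the number of countries with a known literacy rate. A shorter route is the mean of a boolean mask, but the mask must be built on the non-missing values: `NaN < 50` evaluates to `False`, so countries without data would silently inflate the denominator. A sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# hypothetical literacy rates (%), one missing
lit = pd.Series([30.0, 80.0, np.nan, 45.0])

share_known = (lit.dropna() < 50).mean()  # 2 of 3 known values
share_naive = (lit < 50).mean()           # NaN -> False, so 2 of 4

print(f"{share_known:.2%}")  # 66.67%
print(f"{share_naive:.2%}")  # 50.00%
```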


Print the raw number and the percentage of world population that is illiterate.

In [52]:
# literate people per country = literacy (%) x population / 100
country_features['literacy_population'] = (country_features['literacy'] * country_features['population']) / 100
total_population = country_features['population'].sum()
# rows with a missing literacy value (if any) add to total_population but not to the literate count
illiterate_population = total_population - country_features['literacy_population'].sum()
print(f"Illiterate people: {illiterate_population:,.0f}")
print(f"Share of world population: {illiterate_population / total_population:.2%}")


Illiterate people: 1,256,862,990
Share of world population: 20.45%
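The percentage above divides by the whole population column, so any country with a missing literacy value adds people to the total but none to the literate count, which pushes the illiterate share up. A sketch, on hypothetical numbers, of restricting both sums to the rows where literacy and population are known:

```python
import numpy as np
import pandas as pd

# hypothetical countries; the third has no literacy figure
df = pd.DataFrame({"literacy": [90.0, 50.0, np.nan],
                   "population": [1_000, 2_000, 500]})

known = df.dropna(subset=["literacy", "population"])
literate = (known["literacy"] * known["population"] / 100).sum()  # 900 + 1000
covered = known["population"].sum()                               # people with data
illiterate = covered - literate

# for comparison: dividing by everyone, including the country without data
naive_share = (df["population"].sum() - literate) / df["population"].sum()

print(f"{illiterate:,.0f} people, {illiterate / covered:.2%}")  # 1,100 people, 36.67%
print(f"naive: {naive_share:.2%}")                               # naive: 45.71%
```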


## Task 5: Population density

Add to the dataframe a new field called *population_density* computed by dividing *population* by *area*.

In [12]:
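# population per square mile (area is given in sq mi)
country_features['population_density'] = country_features['population'] / country_features['area']

What is the happiness score of the 3 countries with the lowest population density?

In [13]:
# three least densely populated countries; rows without a happiness score are skipped
country_features.dropna(subset=['happiness_score']).nsmallest(3, 'population_density')[['country', 'happiness_score']]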

## Task 6: Healthy and happy?

Draw a scatter plot of the happiness score (x) against healthy life expectancy (y).

In [14]:
# one point per country
country_features.plot.scatter(x='happiness_score', y='healthy_life_expectancy')
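A scatter plot shows the relationship; `Series.corr` puts a number on it (Pearson by default; the `method` argument also accepts `"spearman"` and `"kendall"`). A sketch on hypothetical values that lie exactly on a straight line, so the coefficient comes out as 1:

```python
import pandas as pd

# hypothetical values on an exact straight line
df = pd.DataFrame({"happiness_score": [4.0, 5.0, 6.0, 7.0],
                   "healthy_life_expectancy": [55.0, 60.0, 65.0, 70.0]})

pearson = df["happiness_score"].corr(df["healthy_life_expectancy"])
print(round(pearson, 6))  # 1.0
```

On the real data the coefficient only measures linear association; it says nothing about whether health causes happiness or the other way round.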

Feel free to continue the exploration of the dataset! We'll release the solutions next week.

----
Enjoy EPFL and be happy: next year Switzerland must be #1.