## Data from World Happiness Report

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors.

In this notebook we will explore the happiness of different countries and the features associated with it.
The datasets that we will use are available in *Data*: **happiness2020.csv** and **countries_info.csv**.

Although the features are self-explanatory, here is a summary:

**happiness2020.csv**
* country: *Name of the country*
* happiness_score: *Happiness score*
* social_support: *Social support (mitigating the effects of inequality)*
* healthy_life_expectancy: *Healthy Life Expectancy*
* freedom_of_choices: *Freedom to make life choices*
* generosity: *Generosity (charity, volunteers)*
* perception_of_corruption: *Corruption Perception*
* world_region: *Area of the world of the country*

**countries_info.csv**
* country_name: *Name of the country*
* area: *Area in sq mi*
* population: *Number of people*
* literacy: *Literacy percentage*

In [1]:
!head Data/countries_info.csv

'head' is not recognized as an internal or external command,
operable program or batch file.
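
The `head` command is not available in every shell (the cell above was run on Windows). A minimal, portable sketch of the same idea in plain Python follows; the helper name `preview` and the sample rows are made up for illustration and are not part of the dataset.

```python
from itertools import islice
from pathlib import Path
import tempfile


def preview(path, n=10, encoding="utf-8"):
    """Return the first n lines of a text file, similar to the Unix `head` command."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Demo on a throwaway file with made-up rows (NOT the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        'country_name,area,population,literacy\n'
        'aland,1000,50000,"36,0"\n'
        'borduria,2500,120000,"86,5"\n',
        encoding="utf-8",
    )
    first_two = preview(sample, n=2)

for line in first_two:
    print(line)
```

In the notebook, `preview('Data/countries_info.csv')` would play the role of `!head Data/countries_info.csv`.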


In [2]:
import pandas as pd
%matplotlib inline


DATA_FOLDER = 'Data/'

HAPPINESS_DATASET = DATA_FOLDER+"happiness2020.csv"
COUNTRIES_DATASET = DATA_FOLDER+"countries_info.csv"

## Task 1: Load the data

Load the 2 datasets in Pandas dataframes (called *happiness* and *countries*), and show the first rows.


**Hint**: Use the correct reader and verify the data has the expected format.
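
One way to check the format is to look at the column dtypes right after reading. The sketch below assumes a comma-delimited file whose `literacy` column uses a quoted decimal comma; the rows are made up for illustration. In that situation `pd.read_csv` can do the conversion itself through its `decimal` parameter.

```python
import io

import pandas as pd

# Made-up rows mimicking a file with a quoted decimal comma in `literacy`
# (an assumption about the layout, not a copy of the real data).
raw = io.StringIO(
    'country_name,area,population,literacy\n'
    'aland,1000,50000,"36,0"\n'
    'borduria,2500,120000,"86,5"\n'
)

# decimal="," tells the parser to treat the comma as the decimal separator.
sample = pd.read_csv(raw, index_col="country_name", decimal=",")

# A numeric column should show up as float64, not object.
print(sample.dtypes)
```

If a numeric column is reported as `object`, it was read as text and still needs converting.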

In [3]:
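# Read both CSV files, using the country name as the index
happiness = pd.read_csv(HAPPINESS_DATASET, index_col=['country'])
countries = pd.read_csv(COUNTRIES_DATASET, index_col=['country_name'])
# literacy is stored with a decimal comma; convert it to float
countries["literacy"] = countries["literacy"].str.replace(",", ".").astype(float)
# Lowercase the country names so that they match the index of `countries`
happiness.index = happiness.index.str.lower()
happiness.head(20)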


| country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region |
|---|---|---|---|---|---|---|---|
| afghanistan | 2.5669 | 0.470367 | 52.59 | 0.396573 | -0.096429 | 0.933687 | South Asia |
| albania | 4.8827 | 0.67107 | 68.708138 | 0.781994 | -0.042309 | 0.896304 | Central and Eastern Europe |
| algeria | 5.0051 | 0.803385 | 65.905174 | 0.466611 | -0.121105 | 0.735485 | Middle East and North Africa |
| argentina | 5.9747 | 0.900568 | 68.803802 | 0.831132 | -0.194914 | 0.84201 | Latin America and Caribbean |
| armenia | 4.6768 | 0.757479 | 66.750656 | 0.712018 | -0.13878 | 0.773545 | Commonwealth of Independent States |
| australia | 7.2228 | 0.944855 | 73.604538 | 0.915432 | 0.19046 | 0.415169 | North America and ANZ |
| austria | 7.2942 | 0.928046 | 73.002502 | 0.899989 | 0.085429 | 0.499955 | Western Europe |
| azerbaijan | 5.1648 | 0.819308 | 65.5084 | 0.786824 | -0.240255 | 0.552538 | Commonwealth of Independent States |
| bahrain | 6.2273 | 0.876342 | 68.5 | 0.905856 | 0.133729 | 0.739347 | Middle East and North Africa |
| bangladesh | 4.8328 | 0.687293 | 64.503067 | 0.900625 | -0.033665 | 0.661844 | South Asia |


In [4]:
countries.head(20)

| country_name | area | population | literacy |
|---|---|---|---|
| afghanistan | 647500 | 31056997 | 36.0 |
| albania | 28748 | 3581655 | 86.5 |
| algeria | 2381740 | 32930091 | 70.0 |
| argentina | 2766890 | 39921833 | 97.1 |
| armenia | 29800 | 2976372 | 98.6 |
| australia | 7686850 | 20264082 | 100.0 |
| austria | 83870 | 8192880 | 98.0 |
| azerbaijan | 86600 | 7961619 | 97.0 |
| bahrain | 665 | 698585 | 89.1 |
| bangladesh | 144000 | 147365352 | 43.1 |


## Task 2: Let's merge the data

Create a dataframe called *country_features* by merging *happiness* and *countries*. A row of this dataframe must describe all the features that we have about a country.

**Hint**: Verify that all the rows are in the final dataframe.
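
One way to follow this hint is an outer merge with `indicator=True`, which labels every row by the side(s) it came from. This is a minimal sketch on made-up data; the country names and values are hypothetical.

```python
import pandas as pd

# Made-up frames: "syldavia" exists only on the left side.
left = pd.DataFrame(
    {"happiness_score": [7.1, 5.2, 4.3]},
    index=pd.Index(["aland", "borduria", "syldavia"], name="country"),
)
right = pd.DataFrame(
    {"population": [50000, 120000]},
    index=pd.Index(["aland", "borduria"], name="country_name"),
)

# how="outer" keeps every row; the extra `_merge` column takes the values
# "both", "left_only" or "right_only".
check = left.merge(right, left_index=True, right_index=True,
                   how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)

# The default inner merge would silently drop the unmatched row.
inner = left.merge(right, left_index=True, right_index=True)
print(len(left), len(right), len(inner))
```

An empty `unmatched` frame means no row is lost by the default inner join.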

In [5]:
country_features = pd.merge(happiness, countries, left_index=True, right_index=True)
country_features


| country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region | area | population | literacy |
|---|---|---|---|---|---|---|---|---|---|---|
| afghanistan | 2.5669 | 0.470367 | 52.590000 | 0.396573 | -0.096429 | 0.933687 | South Asia | 647500 | 31056997 | 36.0 |
| albania | 4.8827 | 0.671070 | 68.708138 | 0.781994 | -0.042309 | 0.896304 | Central and Eastern Europe | 28748 | 3581655 | 86.5 |
| algeria | 5.0051 | 0.803385 | 65.905174 | 0.466611 | -0.121105 | 0.735485 | Middle East and North Africa | 2381740 | 32930091 | 70.0 |
| argentina | 5.9747 | 0.900568 | 68.803802 | 0.831132 | -0.194914 | 0.842010 | Latin America and Caribbean | 2766890 | 39921833 | 97.1 |
| armenia | 4.6768 | 0.757479 | 66.750656 | 0.712018 | -0.138780 | 0.773545 | Commonwealth of Independent States | 29800 | 2976372 | 98.6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| venezuela | 5.0532 | 0.890408 | 66.505341 | 0.623278 | -0.169091 | 0.837038 | Latin America and Caribbean | 912050 | 25730435 | 93.4 |
| vietnam | 5.3535 | 0.849987 | 67.952736 | 0.939593 | -0.094533 | 0.796421 | Southeast Asia | 329560 | 84402966 | 90.3 |
| yemen | 3.5274 | 0.817981 | 56.727283 | 0.599920 | -0.157735 | 0.800288 | Middle East and North Africa | 527970 | 21456188 | 50.2 |
| zambia | 3.7594 | 0.698824 | 55.299377 | 0.806500 | 0.078037 | 0.801290 | Sub-Saharan Africa | 752614 | 11502010 | 80.6 |


## Task 3: Where are people happier?

Print the top 10 countries based on their happiness score (higher is better).

In [6]:
country_features.happiness_score.sort_values(ascending=False).head(10)

country
finland        7.8087
denmark        7.6456
switzerland    7.5599
iceland        7.5045
norway         7.4880
netherlands    7.4489
sweden         7.3535
new zealand    7.2996
austria        7.2942
luxembourg     7.2375
Name: happiness_score, dtype: float64

In [7]:
df = country_features.reset_index()
df.pivot_table(index='world_region',columns='country',values='happiness_score').mean(axis='columns',numeric_only=True)

world_region
Central and Eastern Europe            5.891393
Commonwealth of Independent States    5.358342
East Asia                             5.483633
Latin America and Caribbean           5.971280
Middle East and North Africa          5.269306
North America and ANZ                 7.173525
South Asia                            4.355083
Southeast Asia                        5.517788
Sub-Saharan Africa                    4.393856
Western Europe                        6.967405
dtype: float64

We want to know in which world region people are happiest.

Create and print a dataframe with (1) the average happiness score and (2) the number of countries for each world region.
Sort the result to show the happiness ranking.
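
A compact way to get both statistics at once is `groupby` with named aggregation, followed by `sort_values`. The sketch below runs on made-up data; the region names and scores are hypothetical, and only the column names mirror the notebook.

```python
import pandas as pd

# Made-up scores for two hypothetical regions.
toy = pd.DataFrame(
    {
        "world_region": ["North", "North", "South", "South", "South"],
        "happiness_score": [7.0, 6.0, 4.0, 5.0, 6.0],
    },
    index=pd.Index(["a", "b", "c", "d", "e"], name="country"),
)

summary = (
    toy.groupby("world_region")["happiness_score"]
    # Named aggregation: output column name = aggregation function.
    .agg(average_happiness="mean", count="count")
    # Highest average first, so the table reads as a ranking.
    .sort_values("average_happiness", ascending=False)
)
print(summary)
```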

In [8]:
df = country_features.reset_index()
country_count = pd.DataFrame(country_features.world_region.value_counts())
avg_happiness = pd.DataFrame(df.pivot_table(index='world_region', columns='country', values='happiness_score').mean(axis='columns', numeric_only=True))
avg_happiness.rename(columns={0: 'average_happiness'}, inplace=True)
region_happiness = pd.merge(country_count, avg_happiness, left_index=True, right_index=True)
# Sort by average happiness so that the table reads as a ranking
region_happiness = region_happiness.sort_values('average_happiness', ascending=False)
region_happiness.head(15)

| world_region | count | average_happiness |
|---|---|---|
| North America and ANZ | 4 | 7.173525 |
| Western Europe | 20 | 6.967405 |
| Latin America and Caribbean | 20 | 5.97128 |
| Central and Eastern Europe | 14 | 5.891393 |
| Southeast Asia | 8 | 5.517788 |
| East Asia | 3 | 5.483633 |
| Commonwealth of Independent States | 12 | 5.358342 |
| Middle East and North Africa | 16 | 5.269306 |
| Sub-Saharan Africa | 32 | 4.393856 |
| South Asia | 6 | 4.355083 |


The first region has only a few countries! Which countries are they, and what are their scores?

In [9]:
country_features[country_features.world_region=='North America and ANZ'].happiness_score

country
australia        7.2228
canada           7.2321
new zealand      7.2996
united states    6.9396
Name: happiness_score, dtype: float64

## Task 4: How literate is the world?

Print the names of the countries with a level of literacy of 100%. 

For each country, print the name and the world region in the format: *{region name} - {country name} ({happiness score})*

In [10]:
# Print each fully literate country as "{region} - {country} ({score})"
for i in country_features[country_features['literacy'] == 100.0].index:
    print(f"{country_features.loc[i, 'world_region']} - {i} ({country_features.loc[i, 'happiness_score']})")

North America and ANZ - australia (7.222799778)
Western Europe - denmark (7.645599842)
Western Europe - finland (7.808700085)
Western Europe - luxembourg (7.237500191)
Western Europe - norway (7.487999916000001)


What is the global average?

In [11]:
print(country_features.literacy.mean())

81.85112781954888


Calculate the proportion of countries with a literacy level below 50%. Print the value in percentage, formatted with 2 decimals.

In [12]:
print(f"{len(country_features.query('literacy < 50')) / len(country_features):.2%}")
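
The share of rows that satisfy a condition can also be read off as the mean of a boolean mask. This sketch uses made-up literacy values and shows how missing values affect the result: `NaN < 50` is `False`, so rows without a literacy value stay in the denominator unless they are dropped first.

```python
import pandas as pd

# Made-up literacy values, one of them missing.
literacy = pd.Series([36.0, 86.5, 49.9, 100.0, None], name="literacy")

# Mean of a boolean mask = fraction of True values (NaN compares as False).
share_below = (literacy < 50).mean()

# Same computation restricted to the countries with a known literacy value.
share_below_known = (literacy.dropna() < 50).mean()

print(f"{share_below:.2%}")        # 40.00%
print(f"{share_below_known:.2%}")  # 50.00%
```

Which denominator is appropriate depends on how the question treats countries with no reported literacy.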


Print the raw number and the percentage of world population that is illiterate.

In [13]:
# Estimated number of illiterate people per country
country_features["illiterate_population"] = (100 - country_features["literacy"]) * country_features["population"] / 100
total_illiterate = country_features.illiterate_population.sum()
total_population = country_features.population.sum()
print(f"There are approximately {int(total_illiterate)} illiterate individuals in the world, representing {total_illiterate / total_population:.2%} of the world population")

There are approximately 1249372988 illiterate individuals in the world, representing 20.33% of the world population
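
The arithmetic behind this estimate is `illiterate = (100 - literacy) / 100 * population`, summed over countries. If some countries have no literacy value, `sum()` skips them in the numerator while their population still counts in the denominator. The sketch below, on made-up numbers, restricts both sums to countries with a known literacy value; this is one possible convention, not necessarily the intended one.

```python
import pandas as pd

# Made-up countries; "c" has no literacy value.
toy = pd.DataFrame(
    {"population": [1000, 3000, 6000], "literacy": [50.0, 90.0, None]},
    index=pd.Index(["a", "b", "c"], name="country"),
)

# Keep only the rows where literacy is known, for numerator and denominator.
known = toy.dropna(subset=["literacy"])
illiterate = ((100 - known["literacy"]) / 100 * known["population"]).sum()
share = illiterate / known["population"].sum()

# 0.5 * 1000 + 0.1 * 3000 = 800 people out of 4000
print(int(round(illiterate)), f"{share:.2%}")
```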


## Task 5: Population density

Add to the dataframe a new field called *population_density* computed by dividing *population* by *area*.

In [14]:
country_features["population_density"]=country_features["population"]/country_features["area"]
country_features.head(15)

| country | happiness_score | social_support | healthy_life_expectancy | freedom_of_choices | generosity | perception_of_corruption | world_region | area | population | literacy | illiterate_population | population_density |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| afghanistan | 2.5669 | 0.470367 | 52.59 | 0.396573 | -0.096429 | 0.933687 | South Asia | 647500 | 31056997 | 36.0 | 19876480.0 | 47.964474 |
| albania | 4.8827 | 0.67107 | 68.708138 | 0.781994 | -0.042309 | 0.896304 | Central and Eastern Europe | 28748 | 3581655 | 86.5 | 483523.4 | 124.587971 |
| algeria | 5.0051 | 0.803385 | 65.905174 | 0.466611 | -0.121105 | 0.735485 | Middle East and North Africa | 2381740 | 32930091 | 70.0 | 9879027.0 | 13.826065 |
| argentina | 5.9747 | 0.900568 | 68.803802 | 0.831132 | -0.194914 | 0.84201 | Latin America and Caribbean | 2766890 | 39921833 | 97.1 | 1157733.0 | 14.428413 |
| armenia | 4.6768 | 0.757479 | 66.750656 | 0.712018 | -0.13878 | 0.773545 | Commonwealth of Independent States | 29800 | 2976372 | 98.6 | 41669.21 | 99.878255 |
| australia | 7.2228 | 0.944855 | 73.604538 | 0.915432 | 0.19046 | 0.415169 | North America and ANZ | 7686850 | 20264082 | 100.0 | 0.0 | 2.636201 |
| austria | 7.2942 | 0.928046 | 73.002502 | 0.899989 | 0.085429 | 0.499955 | Western Europe | 83870 | 8192880 | 98.0 | 163857.6 | 97.685466 |
| azerbaijan | 5.1648 | 0.819308 | 65.5084 | 0.786824 | -0.240255 | 0.552538 | Commonwealth of Independent States | 86600 | 7961619 | 97.0 | 238848.6 | 91.935554 |
| bahrain | 6.2273 | 0.876342 | 68.5 | 0.905856 | 0.133729 | 0.739347 | Middle East and North Africa | 665 | 698585 | 89.1 | 76145.77 | 1050.503759 |
| bangladesh | 4.8328 | 0.687293 | 64.503067 | 0.900625 | -0.033665 | 0.661844 | South Asia | 144000 | 147365352 | 43.1 | 83850890.0 | 1023.3705 |


What is the happiness score of the 3 countries with the lowest population density?

In [15]:
country_features.loc[country_features.population_density.sort_values(ascending=True).head(3).index].happiness_score

country
mongolia     5.4562
australia    7.2228
botswana     3.4789
Name: happiness_score, dtype: float64
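
`DataFrame.nsmallest` is a more direct way to pick the rows with the lowest values of a column. A minimal sketch on made-up data (the country labels and numbers are hypothetical):

```python
import pandas as pd

# Made-up densities and scores for four hypothetical countries.
toy = pd.DataFrame(
    {
        "population_density": [2.6, 1050.5, 1.8, 47.9],
        "happiness_score": [7.2, 6.2, 5.5, 2.6],
    },
    index=pd.Index(["w", "x", "y", "z"], name="country"),
)

# Rows come back ordered from the lowest density upwards.
lowest = toy.nsmallest(2, "population_density")["happiness_score"]
print(lowest)
```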

## Task 6: Healthy and happy?

Plot in a scatter plot the happiness score (x) and healthy life expectancy (y).

In [18]:
country_features.plot.scatter(x='happiness_score',y='healthy_life_expectancy')

<Axes: xlabel='happiness_score', ylabel='healthy_life_expectancy'>

Feel free to continue the exploration of the dataset! We'll release the solutions next week.

----
Enjoy EPFL and be happy, next year Switzerland must be #1.