# World Happiness Report Analysis

## Importing Libraries

In [90]:
import numpy as np
import pandas as pd

## Reading Data

Read the raw data into Python, replacing each file's original column headers with short, consistent names:

In [105]:
data_2015 = pd.read_csv('data/2015.csv', header = 0, 
                        names=['country', 'region', 'rank', 
                               'score', 'se', 'economy', 
                               'family', 'health', 'freedom', 
                               'trust', 'generosity', 'dystopian'])

data_2016 = pd.read_csv('data/2016.csv', header = 0, 
                        names=['country', 'region', 'rank', 
                               'score', 'lower', 'upper', 
                               'economy', 'family', 'health', 
                               'freedom', 'trust', 'generosity', 
                               'dystopian'])

data_2017 = pd.read_csv('data/2017.csv', header = 0, 
                        names=['country', 'rank', 'score', 
                               'high', 'low', 'economy', 'family', 
                               'health', 'freedom', 'generosity', 
                               'trust', 'dystopian'])

data_2018 = pd.read_csv('data/2018.csv', header = 0, 
                        names=['rank', 'country', 'score', 
                               'economy', 'family', 'health', 
                               'freedom', 'generosity', 'trust'])

data_2019 = pd.read_csv('data/2019.csv', header = 0, 
                        names=['rank', 'country', 'score', 
                               'economy', 'family', 'health', 
                               'freedom', 'generosity', 'trust'])
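Because `names=` is applied by position, a name list that does not line up with a file's columns would mislabel the data without raising an error. The following is a minimal sketch of one way to check this, using a small in-memory CSV as a stand-in for a yearly file; the header labels and values in `raw` are illustrative assumptions, not the real file contents.

```python
import io
import pandas as pd

# Toy CSV standing in for one yearly file (labels and values are made up).
raw = io.StringIO(
    "Country,Happiness Rank,Happiness Score\n"
    "Country A,1,7.5\n"
    "Country B,2,7.1\n"
)
new_names = ['country', 'rank', 'score']

# Read only the header row to see the file's original column labels.
original_columns = pd.read_csv(raw, nrows=0).columns.tolist()
raw.seek(0)

# names= is positional, so the two lists should at least have equal length.
assert len(original_columns) == len(new_names)

# header=0 discards the file's own header row; new_names replaces it.
toy = pd.read_csv(raw, header=0, names=new_names)

# Show the old-to-new mapping so it can be checked by eye.
print(dict(zip(original_columns, new_names)))
```

Printing the mapping makes it easy to spot cases where two columns appear in a different order from one year to the next.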

### Data manipulation and standardization

The 2015 dataset includes a region label for every country. To add region to the other years, we first build a lookup DataFrame that maps each country to its region. We save this lookup before subsetting the 2015 DataFrame, because the region column is dropped in a later step.

In [92]:
country_region_df = data_2015[['country', 'region']]
country_region_df

Unnamed: 0,country,region
0,Switzerland,Western Europe
1,Iceland,Western Europe
2,Denmark,Western Europe
3,Norway,Western Europe
4,Canada,North America
...,...,...
153,Rwanda,Sub-Saharan Africa
154,Benin,Sub-Saharan Africa
155,Syria,Middle East and Northern Africa
156,Burundi,Sub-Saharan Africa
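A lookup like this is only safe to merge on if each country appears exactly once. The sketch below shows one possible check on a three-row excerpt of the table above; the names `lookup` and `region_by_country` are introduced here for illustration only.

```python
import pandas as pd

# Three rows taken from the country/region table shown above.
lookup = pd.DataFrame({
    'country': ['Switzerland', 'Iceland', 'Canada'],
    'region': ['Western Europe', 'Western Europe', 'North America'],
})

# A duplicated country would duplicate rows in any later merge on 'country'.
assert lookup['country'].is_unique

# The same lookup expressed as a Series indexed by country.
region_by_country = lookup.set_index('country')['region']
print(region_by_country['Canada'])
```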


Variables differed across the five years, so we created a list of the variables that are common to all five DataFrames and sufficient for the analysis. Selecting these columns from each dataset gives every year the same layout, so the years can be combined.

In [124]:
standard_variables = ['country', 'rank', 
                      'score', 'economy', 'family', 
                      'health', 'freedom', 'trust', 
                      'generosity']

We can then subset every DataFrame to the standard variables. This also puts the columns in the same order in every year, which makes combining them straightforward.

In [158]:
data_2015 = data_2015[standard_variables]
data_2016 = data_2016[standard_variables]
data_2017 = data_2017[standard_variables]
data_2018 = data_2018[standard_variables]
data_2019 = data_2019[standard_variables]
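Selecting with a list of names raises a `KeyError` if any name is absent, and it reorders the columns to match the list. As a minimal sketch, a small helper can make the failure message explicit. The helper `select_standard`, the shortened list `wanted`, and the placeholder `dystopian` value are assumptions made for this example.

```python
import pandas as pd

def select_standard(df, columns):
    """Return a copy of df restricted to `columns`, in that order."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[columns].copy()

# Shortened column list, for illustration only.
wanted = ['country', 'rank', 'score', 'trust', 'generosity']

# Toy frame laid out like the 2017 file: generosity before trust,
# plus an extra column. The dystopian value is a placeholder.
toy = pd.DataFrame({
    'country': ['Switzerland'],
    'rank': [4],
    'score': [7.494],
    'generosity': [0.290549],
    'trust': [0.367007],
    'dystopian': [0.0],
})

subset = select_standard(toy, wanted)
print(subset.columns.tolist())
```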

From here, we can construct a DataFrame containing all of the years, by using `concat`.

In [159]:
full_data = pd.concat([data_2015, data_2016, data_2017,
                       data_2018, data_2019],
                      keys=['2015', '2016', '2017',
                            '2018', '2019'],
                      names=['year'],
                      axis=0)
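To see what `keys` and `names` do, here is a small sketch on two toy frames built from the Switzerland and Togo rows for 2015 and 2016. The variable names `toy_2015`, `toy_2016`, and `combined` are illustrative.

```python
import pandas as pd

toy_2015 = pd.DataFrame({'country': ['Switzerland', 'Togo'], 'rank': [1, 158]})
toy_2016 = pd.DataFrame({'country': ['Switzerland', 'Togo'], 'rank': [2, 155]})

# keys adds an outer index level labelling the source of each row;
# names gives that level a name. The inner level stays unnamed.
combined = pd.concat([toy_2015, toy_2016],
                     keys=['2015', '2016'],
                     names=['year'])

print(combined.index.nlevels)   # number of index levels
print(combined.loc['2016'])     # all rows that came from toy_2016
```

Note that the keys are strings here, as in the cell above, so the resulting `year` values are strings rather than integers.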

The result is indexed by both year and each file's original row numbers. We reset the index, which turns both index levels into columns, and then drop the row-number column (named `level_1` because that index level has no name).

In [160]:
full_data.reset_index(inplace = True)
full_data.drop('level_1', inplace = True, axis = 1)
full_data

Unnamed: 0,year,country,rank,score,economy,family,health,freedom,trust,generosity
0,2015,Switzerland,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678
1,2015,Iceland,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.43630
2,2015,Denmark,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139
3,2015,Norway,4,7.522,1.45900,1.33095,0.88521,0.66973,0.36503,0.34699
4,2015,Canada,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811
...,...,...,...,...,...,...,...,...,...,...
777,2019,Rwanda,152,3.334,0.35900,0.71100,0.61400,0.55500,0.41100,0.21700
778,2019,Tanzania,153,3.231,0.47600,0.88500,0.49900,0.41700,0.14700,0.27600
779,2019,Afghanistan,154,3.203,0.35000,0.51700,0.36100,0.00000,0.02500,0.15800
780,2019,Central African Republic,155,3.083,0.02600,0.00000,0.10500,0.22500,0.03500,0.23500
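The same result can be reached without creating `level_1` at all, by dropping the unnamed inner level before moving `year` into a column. The sketch below compares both routes on toy data; it is an alternative under the stated assumptions, not a change to the cell above.

```python
import pandas as pd

toy_2015 = pd.DataFrame({'country': ['Switzerland', 'Togo'], 'rank': [1, 158]})
toy_2016 = pd.DataFrame({'country': ['Switzerland', 'Togo'], 'rank': [2, 155]})
combined = pd.concat([toy_2015, toy_2016], keys=['2015', '2016'], names=['year'])

# Route 1: reset everything, then drop the automatically named column.
flat = combined.reset_index().drop(columns='level_1')

# Route 2: discard the inner level first, then turn 'year' into a column.
alt = combined.reset_index(level=1, drop=True).reset_index()

print(flat.columns.tolist())
print(alt.columns.tolist())
```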


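We can now merge the full DataFrame with the region labels, which were not included in every year's data. This is easier than adding the region column to every DataFrame individually. Because `pd.merge` performs an inner join by default, rows for countries that do not appear in the 2015 lookup are dropped: the displayed index ends at 755 after the merge, compared with 780 before it.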

In [156]:
full_data = pd.merge(full_data, country_region_df, on='country')
full_data

Unnamed: 0,year,country,rank,score,economy,family,health,freedom,trust,generosity,region
0,2015,Switzerland,1,7.587,1.396510,1.349510,0.941430,0.665570,0.419780,0.296780,Western Europe
1,2016,Switzerland,2,7.509,1.527330,1.145240,0.863030,0.585570,0.412030,0.280830,Western Europe
2,2017,Switzerland,4,7.494,1.564980,1.516912,0.858131,0.620071,0.367007,0.290549,Western Europe
3,2018,Switzerland,5,7.487,1.420000,1.549000,0.927000,0.660000,0.357000,0.256000,Western Europe
4,2019,Switzerland,6,7.480,1.452000,1.526000,1.052000,0.572000,0.343000,0.263000,Western Europe
...,...,...,...,...,...,...,...,...,...,...,...
752,2015,Togo,158,2.839,0.208680,0.139950,0.284430,0.364530,0.107310,0.166810,Sub-Saharan Africa
753,2016,Togo,155,3.303,0.281230,0.000000,0.248110,0.346780,0.115870,0.175170,Sub-Saharan Africa
754,2017,Togo,150,3.495,0.305445,0.431883,0.247106,0.380426,0.095665,0.196896,Sub-Saharan Africa
755,2018,Togo,139,3.999,0.259000,0.474000,0.253000,0.434000,0.101000,0.158000,Sub-Saharan Africa
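To find out which countries have no match in the region lookup, one option is a left join with `indicator=True`, which adds a `_merge` column recording where each row came from. This is a sketch on toy data; the country "Hypothetica" and its score are invented for the example.

```python
import pandas as pd

scores = pd.DataFrame({
    'year': ['2015', '2019'],
    'country': ['Switzerland', 'Hypothetica'],  # second country is made up
    'score': [7.587, 5.0],
})
regions = pd.DataFrame({
    'country': ['Switzerland'],
    'region': ['Western Europe'],
})

# Default inner join: the unmatched country disappears silently.
inner = pd.merge(scores, regions, on='country')

# Left join keeps every score row; _merge flags rows with no region.
checked = pd.merge(scores, regions, on='country', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].unique().tolist()

print(len(inner), len(checked), unmatched)
```

Unmatched names can come from countries that were added after 2015 or from spelling differences between years, so a list like `unmatched` is a useful starting point for cleaning.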


### Reindexing with hierarchical index

We can now perform the final step of creating our full DataFrame: building a hierarchical index by region, then country, then year. This makes it easy to select and aggregate the data at any of those levels.

In [157]:
full_data = full_data.set_index(['region', 'country', 'year'])  # assign: set_index returns a new DataFrame
full_data

region,country,year,rank,score,economy,family,health,freedom,trust,generosity
Western Europe,Switzerland,2015,1,7.587,1.396510,1.349510,0.941430,0.665570,0.419780,0.296780
Western Europe,Switzerland,2016,2,7.509,1.527330,1.145240,0.863030,0.585570,0.412030,0.280830
Western Europe,Switzerland,2017,4,7.494,1.564980,1.516912,0.858131,0.620071,0.367007,0.290549
Western Europe,Switzerland,2018,5,7.487,1.420000,1.549000,0.927000,0.660000,0.357000,0.256000
Western Europe,Switzerland,2019,6,7.480,1.452000,1.526000,1.052000,0.572000,0.343000,0.263000
...,...,...,...,...,...,...,...,...,...,...
Sub-Saharan Africa,Togo,2015,158,2.839,0.208680,0.139950,0.284430,0.364530,0.107310,0.166810
Sub-Saharan Africa,Togo,2016,155,3.303,0.281230,0.000000,0.248110,0.346780,0.115870,0.175170
Sub-Saharan Africa,Togo,2017,150,3.495,0.305445,0.431883,0.247106,0.380426,0.095665,0.196896
Sub-Saharan Africa,Togo,2018,139,3.999,0.259000,0.474000,0.253000,0.434000,0.101000,0.158000
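As a brief sketch of what the hierarchical index allows, the example below selects and aggregates by level on four rows taken from the table above. The variable names are illustrative, and `sort_index` is added here because sorted MultiIndexes generally give faster and warning-free lookups.

```python
import pandas as pd

rows = pd.DataFrame({
    'region': ['Western Europe', 'Western Europe',
               'Sub-Saharan Africa', 'Sub-Saharan Africa'],
    'country': ['Switzerland', 'Switzerland', 'Togo', 'Togo'],
    'year': ['2015', '2016', '2015', '2016'],
    'score': [7.587, 7.509, 2.839, 3.303],
})
indexed = rows.set_index(['region', 'country', 'year']).sort_index()

# Select every row in one region (outermost level).
western = indexed.loc['Western Europe']

# Cross-section on an inner level: all countries in a single year.
scores_2015 = indexed.xs('2015', level='year')

# Aggregate over one level: mean score per region.
mean_by_region = indexed.groupby(level='region')['score'].mean()

print(western)
print(scores_2015)
print(mean_by_region)
```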
