# World Happiness Report Exploratory Data Analysis
----------------------------------------------------------------
- Course : UCS548, Foundations of Data Science
- Name : Jasween Kaur Brar
- Roll No : 102017187
- Sub-Group : 3CS8
----------------------------------------------------------------

- The World Happiness Report is released by the UN Sustainable Development Solutions Network every year. 
- It is a landmark survey of the state of global happiness that ranks countries by how happy their citizens perceive themselves to be.
- The rankings are based on several factors such as GDP per capita, social support, healthy life expectancy, the freedom to make life choices, generosity and trust in government.

In [1]:
# Importing Libraries
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
import pandas as pd
import geopandas as gpd
import plotly
import plotly.express as px

- Source of data : UN Sustainable Development Solutions Network (https://worldhappiness.report/)

In [2]:
# Read world happiness dataset files
df_2015 = pd.read_csv("2015.csv")
df_2016 = pd.read_csv("2016.csv")
df_2017 = pd.read_csv("2017.csv")
df_2018 = pd.read_csv("2018.csv")
df_2019 = pd.read_csv("2019.csv")
df_2020 = pd.read_csv("2020.csv")
df_2021 = pd.read_csv("2021.csv")
df_2022 = pd.read_csv("2022.csv")

### Defining the columns

- Happiness_Rank - the country's rank for that year; rank 1 is the happiest nation
- year - the report year, one of 8 years (2015 to 2022)
- Happiness_Score - the country's overall happiness score (called "Ladder score" in the 2020 and 2021 files); a higher score means people rate their lives better
- GDP_per_Capita - Gross Domestic Product (GDP) per capita
- Social_support - whether a person has family, friends or other people who can help them in a difficult life situation
- Life_Expectancy - the years of healthy life a newborn can expect to have
- Freedom - how free people are to choose their job, family, friends, government and place to live
- Trust_in_Government - the degree of corruption in the country and how much people trust the government
- Generosity - how kind and generous people are

### Data transformation & data preprocessing
Steps :

In [3]:
# A quick view of the dataset helps us understand its inconsistencies.
# We are going to tackle these and make our dataset consistent:

# 1) The 2020 & 2021 datasets do not have columns named "happiness rank" and "happiness score".
#    The Happiness Score is called "Ladder score".
#    We add the Happiness Rank by sorting on the "Ladder score" column so that the highest score gets rank 1.

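# sort_values returns a new DataFrame, so the result must be assigned back.
# Sort in descending order so that the highest score comes first and gets rank 1.
df_2020 = df_2020.sort_values('Ladder score', ascending=False).reset_index(drop=True)
df_2020['Happiness_Rank'] = range(1, len(df_2020) + 1)

df_2021 = df_2021.sort_values('Ladder score', ascending=False).reset_index(drop=True)
df_2021['Happiness_Rank'] = range(1, len(df_2021) + 1)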
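
As a cross-check on the ranking step, here is a minimal sketch on made-up data. It assumes the rank should follow the descending order of `Ladder score`; the toy frame and its values are illustrative and not taken from the report files. `Series.rank` assigns the ranks directly, so the result does not depend on the row order.

```python
import pandas as pd

# Toy data (hypothetical values), deliberately not sorted by score
toy = pd.DataFrame({
    "Country name": ["A", "B", "C"],
    "Ladder score": [5.1, 7.8, 6.4],
})

# Highest score gets rank 1; method="first" breaks ties by row order
toy["Happiness_Rank"] = (
    toy["Ladder score"].rank(ascending=False, method="first").astype(int)
)

ranks = dict(zip(toy["Country name"], toy["Happiness_Rank"]))
print(ranks)
```

Because the rank is computed from the values, the frame can stay in its original order.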



In [4]:
# 2) Some datasets have a "Region" column mapping each country to a specific region.
#    We want to add this column across all years. 
#    Here we will store this mapping from 2015 data and later merge it with our consolidated dataset.

regions_df = df_2015[['Country','Region']]

In [5]:
# 3) We have inconsistent column names across the different years.
#    So, let us make them consistent.

# 2015 world happiness report data
df_2015 = df_2015[['Country','Happiness Rank','Happiness Score','Economy (GDP per Capita)','Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)','Generosity']]
df_2015 = df_2015.rename(columns = {'Economy (GDP per Capita)': 'GDP_per_Capita','Family': 'Social_support','Health (Life Expectancy)':'Life_Expectancy','Trust (Government Corruption)': 'Trust_in_Government','Happiness Rank':'Happiness_Rank','Happiness Score':'Happiness_Score'})
df_2015[['year']]= '2015'

df_2015.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Switzerland,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2015
1,Iceland,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2015
2,Denmark,3,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2015
3,Norway,4,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2015
4,Canada,5,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2015


In [6]:
#  2016 world happiness report data
df_2016 = df_2016[['Country','Happiness Rank','Happiness Score','Economy (GDP per Capita)','Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)','Generosity']]
df_2016 = df_2016.rename(columns = {'Economy (GDP per Capita)': 'GDP_per_Capita','Family': 'Social_support','Health (Life Expectancy)':'Life_Expectancy','Trust (Government Corruption)': 'Trust_in_Government','Happiness Rank':'Happiness_Rank','Happiness Score':'Happiness_Score'})
df_2016[['year']]= '2016'
df_2016.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Denmark,1,7.526,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2016
1,Switzerland,2,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2016
2,Iceland,3,7.501,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2016
3,Norway,4,7.498,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2016
4,Finland,5,7.413,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2016


In [7]:
#  2017 world happiness report data
df_2017 = df_2017[['Country','Happiness.Rank','Happiness.Score','Economy..GDP.per.Capita.','Family', 'Health..Life.Expectancy.', 'Freedom', 'Trust..Government.Corruption.','Generosity']]
df_2017 = df_2017.rename(columns = {'Economy..GDP.per.Capita.': 'GDP_per_Capita','Family': 'Social_support','Health..Life.Expectancy.':'Life_Expectancy','Trust..Government.Corruption.': 'Trust_in_Government','Happiness.Rank':'Happiness_Rank','Happiness.Score':'Happiness_Score'})
df_2017[['year']]= '2017'

df_2017.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Norway,1,7.537,1.616463,1.533524,0.796667,0.635423,0.315964,0.362012,2017
1,Denmark,2,7.522,1.482383,1.551122,0.792566,0.626007,0.40077,0.35528,2017
2,Iceland,3,7.504,1.480633,1.610574,0.833552,0.627163,0.153527,0.47554,2017
3,Switzerland,4,7.494,1.56498,1.516912,0.858131,0.620071,0.367007,0.290549,2017
4,Finland,5,7.469,1.443572,1.540247,0.809158,0.617951,0.382612,0.245483,2017


In [8]:
#  2018 world happiness report data
df_2018 = df_2018[['Country or region','Overall rank','Score','GDP per capita','Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Perceptions of corruption','Generosity']]
df_2018 = df_2018.rename(columns = {'GDP per capita': 'GDP_per_Capita','Social support': 'Social_support','Healthy life expectancy':'Life_Expectancy','Perceptions of corruption': 'Trust_in_Government','Overall rank':'Happiness_Rank','Score':'Happiness_Score','Freedom to make life choices':'Freedom','Country or region':'Country'})
df_2018[['year']]= '2018'

df_2018.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Finland,1,7.632,1.305,1.592,0.874,0.681,0.393,0.202,2018
1,Norway,2,7.594,1.456,1.582,0.861,0.686,0.34,0.286,2018
2,Denmark,3,7.555,1.351,1.59,0.868,0.683,0.408,0.284,2018
3,Iceland,4,7.495,1.343,1.644,0.914,0.677,0.138,0.353,2018
4,Switzerland,5,7.487,1.42,1.549,0.927,0.66,0.357,0.256,2018


In [9]:
#  2019 world happiness report data
df_2019 = df_2019[['Country or region','Overall rank','Score','GDP per capita','Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Perceptions of corruption','Generosity']]
df_2019 = df_2019.rename(columns = {'GDP per capita': 'GDP_per_Capita','Social support': 'Social_support','Healthy life expectancy':'Life_Expectancy','Perceptions of corruption': 'Trust_in_Government','Overall rank':'Happiness_Rank','Score':'Happiness_Score','Freedom to make life choices':'Freedom','Country or region':'Country'})
df_2019[['year']]= '2019'

df_2019.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Finland,1,7.769,1.34,1.587,0.986,0.596,0.393,0.153,2019
1,Denmark,2,7.6,1.383,1.573,0.996,0.592,0.41,0.252,2019
2,Norway,3,7.554,1.488,1.582,1.028,0.603,0.341,0.271,2019
3,Iceland,4,7.494,1.38,1.624,1.026,0.591,0.118,0.354,2019
4,Netherlands,5,7.488,1.396,1.522,0.999,0.557,0.298,0.322,2019


In [10]:
#  2020 world happiness report data
df_2020 = df_2020[['Country name','Happiness_Rank','Ladder score','Explained by: Log GDP per capita','Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Perceptions of corruption','Explained by: Generosity']]
df_2020 = df_2020.rename(columns = {'Explained by: Log GDP per capita': 'GDP_per_Capita','Explained by: Social support': 'Social_support','Explained by: Healthy life expectancy':'Life_Expectancy','Explained by: Perceptions of corruption': 'Trust_in_Government','Ladder score':'Happiness_Score','Explained by: Freedom to make life choices':'Freedom','Country name':'Country','Explained by: Generosity': 'Generosity'})
df_2020[['year']]= '2020'

df_2020.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Finland,1,7.8087,1.28519,1.499526,0.961271,0.662317,0.477857,0.15967,2020
1,Denmark,2,7.6456,1.326949,1.503449,0.979333,0.66504,0.49526,0.242793,2020
2,Switzerland,3,7.5599,1.390774,1.472403,1.040533,0.628954,0.407946,0.269056,2020
3,Iceland,4,7.5045,1.326502,1.547567,1.000843,0.661981,0.144541,0.36233,2020
4,Norway,5,7.488,1.424207,1.495173,1.008072,0.670201,0.434101,0.287985,2020


In [11]:
#  2021 world happiness report data
df_2021 = df_2021[['Country name','Happiness_Rank','Ladder score','Explained by: Log GDP per capita','Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Perceptions of corruption','Explained by: Generosity']]
df_2021 = df_2021.rename(columns = {'Explained by: Log GDP per capita': 'GDP_per_Capita','Explained by: Social support': 'Social_support','Explained by: Healthy life expectancy':'Life_Expectancy','Explained by: Perceptions of corruption': 'Trust_in_Government','Ladder score':'Happiness_Score','Explained by: Freedom to make life choices':'Freedom','Country name':'Country','Explained by: Generosity': 'Generosity'})
df_2021[['year']]= '2021'

df_2021.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Finland,1,7.842,1.446,1.106,0.741,0.691,0.481,0.124,2021
1,Denmark,2,7.62,1.502,1.108,0.763,0.686,0.485,0.208,2021
2,Switzerland,3,7.571,1.566,1.079,0.816,0.653,0.413,0.204,2021
3,Iceland,4,7.554,1.482,1.172,0.772,0.698,0.17,0.293,2021
4,Netherlands,5,7.464,1.501,1.079,0.753,0.647,0.384,0.302,2021


In [12]:
#  2022 world happiness report data
df_2022 = df_2022[['Country','RANK','Happiness score','Explained by: GDP per capita','Explained by: Social support', 'Explained by: Healthy life expectancy', 'Explained by: Freedom to make life choices', 'Explained by: Perceptions of corruption','Explained by: Generosity']]
df_2022 = df_2022.rename(columns = {'Explained by: GDP per capita': 'GDP_per_Capita','Explained by: Social support': 'Social_support','Explained by: Healthy life expectancy':'Life_Expectancy','Explained by: Perceptions of corruption': 'Trust_in_Government','Happiness score':'Happiness_Score','Explained by: Freedom to make life choices':'Freedom','Country name':'Country','Explained by: Generosity': 'Generosity','RANK':'Happiness_Rank'})
df_2022[['year']]= '2022'

df_2022.head()

Unnamed: 0,Country,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,year
0,Finland,1,7.821,1.892,1.258,0.775,0.736,0.534,0.109,2022
1,Denmark,2,7.636,1.953,1.243,0.777,0.719,0.532,0.188,2022
2,Iceland,3,7.557,1.936,1.32,0.803,0.718,0.191,0.27,2022
3,Switzerland,4,7.512,2.026,1.226,0.822,0.677,0.461,0.147,2022
4,Netherlands,5,7.415,1.945,1.206,0.787,0.651,0.419,0.271,2022
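
The per-year cells above all follow the same pattern: select columns, rename them, add a `year` column. A rough sketch of how that pattern could be wrapped in one function is shown here. The helper name `standardize` and the toy input are hypothetical; the rename map is the one used for the 2018 file.

```python
import pandas as pd

# Target columns shared by every year (names from the cells above)
COLUMNS = [
    "Country", "Happiness_Rank", "Happiness_Score", "GDP_per_Capita",
    "Social_support", "Life_Expectancy", "Freedom",
    "Trust_in_Government", "Generosity",
]

def standardize(df, rename_map, year):
    """Hypothetical helper: rename, keep the shared columns, tag the year."""
    out = df.rename(columns=rename_map)[COLUMNS].copy()
    out["year"] = year
    return out

# Rename map for the 2018-style headers
map_2018 = {
    "Country or region": "Country", "Overall rank": "Happiness_Rank",
    "Score": "Happiness_Score", "GDP per capita": "GDP_per_Capita",
    "Social support": "Social_support",
    "Healthy life expectancy": "Life_Expectancy",
    "Freedom to make life choices": "Freedom",
    "Perceptions of corruption": "Trust_in_Government",
}

# Toy stand-in for one raw file (hypothetical values)
raw = pd.DataFrame({
    "Country or region": ["A"], "Overall rank": [1], "Score": [7.5],
    "GDP per capita": [1.3], "Social support": [1.5],
    "Healthy life expectancy": [0.9], "Freedom to make life choices": [0.6],
    "Perceptions of corruption": [0.4], "Generosity": [0.2],
    "Extra column": ["dropped"],
})

clean = standardize(raw, map_2018, "2018")
print(clean.columns.tolist())
```

Calling `.copy()` inside the helper means the `year` assignment works on an independent frame rather than on a slice of the raw data.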


In [13]:
# 4) Number of countries in report for each year is different. 
#    We want to create a unified set of countries for which we have data across 8 years to perform analysis. 
#    Hence we would not perform analysis on countries with missing/incomplete information.

df_2015 = df_2015.set_index(['Country','year'])
df_2016 = df_2016.set_index(['Country','year'])
df_2017 = df_2017.set_index(['Country','year'])
df_2018 = df_2018.set_index(['Country','year'])
df_2019 = df_2019.set_index(['Country','year'])
df_2020 = df_2020.set_index(['Country','year'])
df_2021 = df_2021.set_index(['Country','year'])
df_2022 = df_2022.set_index(['Country','year'])

## Let us filter out the countries for which we have data available for all the years
# Merge on the 'Country' column only: carrying 'year' along creates duplicate 'year_x'/'year_y' columns.
tmp = df_2015.reset_index()[['Country']]
for df_year in [df_2016, df_2017, df_2018, df_2019, df_2020, df_2021, df_2022]:
    tmp = pd.merge(tmp, df_year.reset_index()[['Country']], on='Country', how='inner')
country_list = tmp.Country.unique().tolist()



In [14]:
# DataFrame.append is deprecated (and removed in pandas 2.0); pd.concat stacks all years in one call.
happiness_index_df_final = pd.concat([df_2015, df_2016, df_2017, df_2018, df_2019, df_2020, df_2021, df_2022])



In [15]:
happiness_index_df_final = happiness_index_df_final.reset_index()
happiness_index_df_final = happiness_index_df_final.merge(regions_df,left_on='Country',right_on='Country')
happiness_index_df_final.head(10)

Unnamed: 0,Country,year,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,Region
0,Switzerland,2015,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,Western Europe
1,Switzerland,2016,2,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,Western Europe
2,Switzerland,2017,4,7.494,1.56498,1.516912,0.858131,0.620071,0.367007,0.290549,Western Europe
3,Switzerland,2018,5,7.487,1.42,1.549,0.927,0.66,0.357,0.256,Western Europe
4,Switzerland,2019,6,7.48,1.452,1.526,1.052,0.572,0.343,0.263,Western Europe
5,Switzerland,2020,3,7.5599,1.390774,1.472403,1.040533,0.628954,0.407946,0.269056,Western Europe
6,Switzerland,2021,3,7.571,1.566,1.079,0.816,0.653,0.413,0.204,Western Europe
7,Switzerland,2022,4,7.512,2.026,1.226,0.822,0.677,0.461,0.147,Western Europe
8,Iceland,2015,2,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,Western Europe
9,Iceland,2016,3,7.501,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,Western Europe
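
The merge with `regions_df` uses the default inner join, so any country missing from the 2015 mapping is dropped without notice. A small sketch of how such countries could be listed first, using a left join with `indicator=True`; the toy frames and their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the consolidated data and the region mapping
scores = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness_Score": [7.1, 6.2, 5.3],
})
regions = pd.DataFrame({
    "Country": ["A", "B"],
    "Region": ["North", "South"],
})

# A left join keeps every row of `scores`; `_merge` records where each row matched
merged = scores.merge(regions, on="Country", how="left", indicator=True)
missing_region = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(missing_region)
```

Countries listed in `missing_region` would need a manually assigned region or a name fix before they can be kept.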


In [16]:
happiness_index_df_final = happiness_index_df_final[happiness_index_df_final.Country.isin(country_list)]
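
A quick sanity check after this filter is to count the distinct years per country and confirm that each one has the full set. The sketch below uses a toy frame with three years instead of eight; the data is made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): country B repeats 2016 and lacks 2017
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "year": ["2015", "2016", "2017", "2015", "2016", "2016"],
})
n_years = 3  # would be 8 for the 2015 to 2022 data

# Number of distinct years available for each country
years_per_country = toy.groupby("Country")["year"].nunique()
incomplete = years_per_country[years_per_country != n_years].index.tolist()
print(incomplete)
```

An empty `incomplete` list would indicate that every remaining country is covered in every year.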

In [17]:
#  There are null values in the "Trust_in_Government" column, specifically for the UAE.
#  We impute them with the UAE's average "Trust_in_Government" value, calculated from the available data.
#  The fill is restricted to UAE rows so that no other country can receive the UAE average.
is_uae = happiness_index_df_final['Country'] == 'United Arab Emirates'
uae_mean = happiness_index_df_final.loc[is_uae, 'Trust_in_Government'].mean()
happiness_index_df_final.loc[is_uae & happiness_index_df_final['Trust_in_Government'].isna(), 'Trust_in_Government'] = uae_mean
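
If more than one country had gaps, the same idea could be applied per country in one step. This is a sketch under that assumption, on made-up data: each missing value is filled with the mean of that country's own available values.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one gap per country
toy = pd.DataFrame({
    "Country": ["X", "X", "X", "Y", "Y"],
    "Trust_in_Government": [0.2, np.nan, 0.4, 0.1, np.nan],
})

# Per-country mean, aligned row by row with the original frame (NaN values are skipped)
country_mean = toy.groupby("Country")["Trust_in_Government"].transform("mean")

# Fill only the missing entries; existing values stay unchanged
toy["Trust_in_Government"] = toy["Trust_in_Government"].fillna(country_mean)
filled = toy["Trust_in_Government"].tolist()
print(filled)
```

A country with no available values at all would keep its `NaN` entries, since its mean is undefined.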

In [18]:
# Let us now print the head of our compiled dataset to make sure all our preprocessing was successful.

happiness_index_df_final.head()

Unnamed: 0,Country,year,Happiness_Rank,Happiness_Score,GDP_per_Capita,Social_support,Life_Expectancy,Freedom,Trust_in_Government,Generosity,Region
0,Switzerland,2015,1,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,Western Europe
1,Switzerland,2016,2,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,Western Europe
2,Switzerland,2017,4,7.494,1.56498,1.516912,0.858131,0.620071,0.367007,0.290549,Western Europe
3,Switzerland,2018,5,7.487,1.42,1.549,0.927,0.66,0.357,0.256,Western Europe
4,Switzerland,2019,6,7.48,1.452,1.526,1.052,0.572,0.343,0.263,Western Europe


In [19]:
## Calculate mean feature scores by country across the 8 years

mean_index_df = happiness_index_df_final.groupby(['Country'])[['Happiness_Score', 'GDP_per_Capita','Social_support','Life_Expectancy','Freedom','Trust_in_Government','Generosity']].mean().reset_index()

In [20]:
happiness_index_df_final.to_csv("HappinessIndex.csv")
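
With default settings, `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below compares the default with `index=False` on a toy frame, using an in-memory buffer instead of a file; the data is made up for illustration.

```python
import io
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"Country": ["A", "B"], "Happiness_Score": [7.1, 6.2]})

# Default: the index is written as an extra, unnamed column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = pd.read_csv(without_index).columns.tolist()

print(cols_default)
print(cols_no_index)
```

Whether to drop the index depends on the frame: it is only a row counter here, but the per-year frames carry `Country` and `year` in their index, which should be kept when saving.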

In [None]:
df_2015.to_csv("2015_new.csv")
df_2016.to_csv("2016_new.csv")
df_2017.to_csv("2017_new.csv")
df_2018.to_csv("2018_new.csv")
df_2019.to_csv("2019_new.csv")
df_2020.to_csv("2020_new.csv")
df_2021.to_csv("2021_new.csv")
df_2022.to_csv("2022_new.csv")