<h1><center> Happiness Within Countries </center></h1>

<h4><center> Iris Truong, Jason Zhong, Yun Qi </center></h4>

<h2><center>Introduction</center></h2>

Humans are complex, and it is hard to gauge how they are doing. Happiness, likewise, is hard to measure with data. Still, the welfare of humanity matters, particularly to world organizations like the WHO and the UN. The UN even has a <a href= https://www.un.org/en/observances/happiness-day>day</a> dedicated to celebrating happiness in cultures around the world. Independent research on how happy people are around the world has been done by the <a href= https://worldhappiness.report/archive/>World Happiness Report</a> (WHR). The <a href= https://worldhappiness.report/ed/2020/>2020</a> report discusses in detail the categories used to group its variables. It discusses how the environment affects a population's happiness, especially social environments such as personal connections and a country's institutions. The report then compares happiness in urban areas with happiness elsewhere. It is an interesting analysis of what determines happiness for people around the world, and it brings an analytic point of view to the question.

In this tutorial, we use this <a href= https://www.kaggle.com/mathurinache/world-happiness-report>dataset</a> from Kaggle, which was gathered from the WHR. Our goal is to tidy the data and provide insight into what it tells us. We would like to see whether there is a formula that tells us how strongly each variable affects people's happiness. In addition, the data may show how the world changes over time, since people may value different factors from year to year. We could also look at other groupings, such as whether different regions value different factors or whether first and third world countries have different values. Finally, we could compare our results with other analyses of happiness and see whether they differ. We hope to show how humanity progresses over time: the measure of happiness may shift as more technology becomes available and different philosophies arise. Hopefully we can teach people how different countries see happiness.

<h2><center>Preparing the Data</center></h2>

Using Python 3, we import some libraries to help with data munging, analysis, and visual representation. The libraries are <a href= https://pandas.pydata.org/>pandas</a> as pd for organizing the data, <a href= https://matplotlib.org/contents.html>matplotlib.pyplot</a> as plt for visualizations, <a href= https://scikit-learn.org/stable/>sklearn</a> for training the linear regression model, and <a href= https://numpy.org/doc/stable/user/>numpy</a> as np for math operations.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Before the data can be used, we have to import it into the notebook and clean it so that it is manageable, easier to represent, and useful for the analysis. We downloaded the files from the website and put them into the GitHub repository so that we can read from them. Then, using the pandas CSV reader, we read the files into pandas dataframes.

In [2]:
data2015 = pd.read_csv("Data/2015")
data2016 = pd.read_csv("Data/2016")
data2017 = pd.read_csv("Data/2017")
data2018 = pd.read_csv("Data/2018")
data2019 = pd.read_csv("Data/2019")
data2020 = pd.read_csv("Data/2020")

The data collected by the WHR differs from year to year: the report gathered the information it considered important to that year's research, and some columns were renamed. For example, the 2015 data has a feature called Family, while in 2020 it was replaced with Social support. Such columns are named differently but represent the same information; "Freedom" in one year could be "Explained by: Freedom to make life choices" in another. These columns therefore have to be renamed so that it is easier to merge the datasets into one big set.
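
The renaming idea can be sketched on toy data before it is applied to the real files. The column names below are taken from the rename cells in this tutorial, but the helper `to_shared_schema`, the constants `COLUMN_MAPS` and `SHARED`, and the toy values are assumptions made only for illustration.

```python
import pandas as pd

# Per-year mappings from a year's column names to the shared names.
# Only three columns are shown to keep the sketch short.
COLUMN_MAPS = {
    "2019": {"Country or region": "Country", "Score": "Happiness Score",
             "Social support": "Family"},
    "2020": {"Country name": "Country", "Ladder score": "Happiness Score",
             "Explained by: Social support": "Family"},
}
SHARED = ["Country", "Happiness Score", "Family"]


def to_shared_schema(frame, year):
    """Rename one year's columns and fail loudly if a shared column is missing."""
    renamed = frame.rename(columns=COLUMN_MAPS[year])
    missing = [c for c in SHARED if c not in renamed.columns]
    if missing:
        raise KeyError(f"{year} is missing columns: {missing}")
    return renamed[SHARED].assign(Year=year)


# Toy rows, not real WHR values.
toy2019 = pd.DataFrame({"Country or region": ["A"], "Score": [5.0],
                        "Social support": [1.0]})
toy2020 = pd.DataFrame({"Country name": ["A"], "Ladder score": [5.5],
                        "Explained by: Social support": [1.1]})

combined = pd.concat([to_shared_schema(toy2019, "2019"),
                      to_shared_schema(toy2020, "2020")], ignore_index=True)
```

Checking for missing columns right after the rename makes a misspelled source name show up immediately, instead of as a column of NaN values after the datasets are combined.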

In [3]:
data2020['Happiness Rank'] =  range(1, len(data2020.index)+1)
data2020['Year'] = "2020"

In [4]:
# Drop the raw 'Generosity' column so that 'Explained by: Generosity' can take that name below
# without creating a duplicate column, which would break concat later on in the code
data2020 = data2020.drop(['Generosity'], axis = 1)

data2020 = data2020.rename(columns = {
    'Country name' : 'Country',
    'Regional indicator' : 'Region',
    'Ladder score' : 'Happiness Score',
    'Standard error of ladder score' : 'Standard Error',
    'Explained by: Log GDP per capita' : 'Economy (GDP per Capita)',
    'Explained by: Social support' : 'Family',
    'Explained by: Healthy life expectancy' : 'Health (Life Expectancy)',
    'Explained by: Freedom to make life choices' : 'Freedom',
    'Explained by: Perceptions of corruption' : 'Trust (Government Corruption)',
    'Explained by: Generosity' : 'Generosity'})

In [5]:
#Rename some regions in 2020, since the report changed some regions around. The Commonwealth of Independent States
#countries are all treated as Central and Eastern European countries. For the Asian regions, the 2020 report uses
#the noun forms of the cardinal directions rather than the adjectives.
data2020['Region'] = data2020['Region'].replace(['Commonwealth of Independent States'],'Central and Eastern Europe')
data2020['Region'] = data2020['Region'].replace(['South Asia'],'Southern Asia')
data2020['Region'] = data2020['Region'].replace(['Southeast Asia'],'Southeastern Asia')
data2020['Region'] = data2020['Region'].replace(['East Asia'],'Eastern Asia')
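
Several label replacements like these can also be written as a single dictionary lookup. The following is a minimal sketch on a toy frame; the names `toy` and `REGION_FIXES` are hypothetical, and only three of the mappings from the cell above are shown.

```python
import pandas as pd

# Toy frame; the region labels mirror ones named in the cell above.
toy = pd.DataFrame({"Region": ["South Asia", "Southeast Asia", "Western Europe",
                               "Commonwealth of Independent States"]})

REGION_FIXES = {
    "Commonwealth of Independent States": "Central and Eastern Europe",
    "South Asia": "Southern Asia",
    "Southeast Asia": "Southeastern Asia",
}

# Series.replace accepts a dict; labels without an entry are left unchanged.
toy["Region"] = toy["Region"].replace(REGION_FIXES)
```

Keeping the mapping in one dictionary makes it easier to review every old and new label side by side.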

In [6]:
#Split North America regions from Australia and New Zealand
for i, row in data2020.loc[data2020['Region'] == 'North America and ANZ'].iterrows():
    if row['Country'] == 'United States' or row['Country'] == 'Canada':
        data2020.at[i,'Region'] = 'North America'
    else:
        data2020.at[i,'Region'] = 'Australia and New Zealand'
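
The same split can be expressed without a row-by-row loop by using boolean masks. This is a sketch on toy data, with `toy`, `combined_region`, and `north_american` as hypothetical names; it is meant to produce the same labels as the loop above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["United States", "Canada", "Australia", "New Zealand", "Finland"],
    "Region": ["North America and ANZ"] * 4 + ["Western Europe"],
})

# Both masks are computed up front, before any label is overwritten.
combined_region = toy["Region"] == "North America and ANZ"
north_american = toy["Country"].isin(["United States", "Canada"])

toy.loc[combined_region & north_american, "Region"] = "North America"
toy.loc[combined_region & ~north_american, "Region"] = "Australia and New Zealand"
```

Rows outside the combined region, such as the Western Europe row here, are not touched by either assignment.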

In [7]:
data2019['Region'] = ''
data2019 = data2019.rename(columns = {'Overall rank' : 'Happiness Rank', 'Country or region' : 'Country', 
                                      'Score' : 'Happiness Score', 'GDP per capita' : 'Economy (GDP per Capita)',
                                      'Social support': 'Family', 'Healthy life expectancy' : 'Health (Life Expectancy)',
                                      'Freedom to make life choices':'Freedom','Perceptions of corruption' : 'Trust (Government Corruption)'})
#Match the countries with their respective regions
data2019 = pd.merge(data2015,data2019, how = 'left',left_on=['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)',
                                                             'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)','Generosity'], 
                    right_on = ['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)', 
                                'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)', 'Generosity'])
data2019['Year'] = "2019"
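
A left merge keeps every row of the left frame and only pulls in right-hand rows whose key columns all match. When the keys include the rank, score, and factor columns, rows match only if those values are identical in both years. If the goal is just to attach each country's region, one option is to join on `Country` alone. Below is a minimal sketch under that assumption; `toy2015`, `toy2019`, and all values are hypothetical stand-ins, not real WHR data.

```python
import pandas as pd

toy2015 = pd.DataFrame({
    "Country": ["Finland", "Canada"],
    "Region": ["Western Europe", "North America"],
    "Happiness Score": [7.4, 7.4],
})
toy2019 = pd.DataFrame({
    "Country": ["Finland", "Canada", "Newland"],
    "Happiness Score": [7.8, 7.3, 5.0],
})

# Keep only the lookup columns so 2015 scores cannot leak into the 2019 rows.
regions = toy2015[["Country", "Region"]].drop_duplicates("Country")

# how="left" keeps every 2019 row; countries absent from 2015 get NaN for Region.
with_region = toy2019.merge(regions, on="Country", how="left")
with_region["Year"] = "2019"
```

With this layout the 2019 scores stay on the 2019 rows, and any country that did not appear in 2015 is easy to spot because its `Region` is missing.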

In [8]:
data2018['Region'] = ''
data2018 = data2018.rename(columns = {'Happiness.Rank' : 'Happiness Rank', 'Country or region':'Country', 'Overall rank':'Happiness Rank',
                                      'Score' : 'Happiness Score', 'GDP per capita' : 'Economy (GDP per Capita)',
                                      'Social support': 'Family', 'Healthy life expectancy' : 'Health (Life Expectancy)',
                                      'Freedom to make life choices':'Freedom','Perceptions of corruption' : 'Trust (Government Corruption)'})
data2018 = pd.merge(data2015,data2018, how = 'left',left_on=['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)',
                                                             'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)','Generosity'], 
                    right_on = ['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)',
                                'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)','Generosity'])
data2018['Year'] = "2018"
data2018

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Year
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,2018
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.43630,2.70201,2018
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,2018
3,Norway,Western Europe,4,7.522,0.03880,1.45900,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,2018
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,2018
...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,Rwanda,Sub-Saharan Africa,154,3.465,0.03464,0.22208,0.77370,0.42864,0.59201,0.55191,0.22628,0.67042,2018
154,Benin,Sub-Saharan Africa,155,3.340,0.03656,0.28665,0.35386,0.31910,0.48450,0.08010,0.18260,1.63328,2018
155,Syria,Middle East and Northern Africa,156,3.006,0.05015,0.66320,0.47489,0.72193,0.15684,0.18906,0.47179,0.32858,2018
156,Burundi,Sub-Saharan Africa,157,2.905,0.08658,0.01530,0.41587,0.22396,0.11850,0.10062,0.19727,1.83302,2018


In [9]:
data2017['Region'] = ''
data2017 = data2017.rename(columns = {'Happiness.Rank' : 'Happiness Rank', 'Country or region' : 'Country', 
                                      'Happiness.Score' : 'Happiness Score', 'Economy..GDP.per.Capita.' : 'Economy (GDP per Capita)',
                                      'Health..Life.Expectancy.' : 'Health (Life Expectancy)',
                                      'Freedom to make life choices':'Freedom','Trust..Government.Corruption.' : 'Trust (Government Corruption)'})
data2017 = pd.merge(data2015,data2017, how = 'left',left_on=['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)',
                                                             'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)','Generosity'], 
                    right_on = ['Region','Happiness Rank','Country','Happiness Score','Economy (GDP per Capita)',
                                'Family','Health (Life Expectancy)','Freedom','Trust (Government Corruption)','Generosity'])
data2017['Year'] = "2017"
data2017

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual,Whisker.high,Whisker.low,Dystopia.Residual,Year
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738,,,,2017
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.43630,2.70201,,,,2017
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204,,,,2017
3,Norway,Western Europe,4,7.522,0.03880,1.45900,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531,,,,2017
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176,,,,2017
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,Rwanda,Sub-Saharan Africa,154,3.465,0.03464,0.22208,0.77370,0.42864,0.59201,0.55191,0.22628,0.67042,,,,2017
154,Benin,Sub-Saharan Africa,155,3.340,0.03656,0.28665,0.35386,0.31910,0.48450,0.08010,0.18260,1.63328,,,,2017
155,Syria,Middle East and Northern Africa,156,3.006,0.05015,0.66320,0.47489,0.72193,0.15684,0.18906,0.47179,0.32858,,,,2017
156,Burundi,Sub-Saharan Africa,157,2.905,0.08658,0.01530,0.41587,0.22396,0.11850,0.10062,0.19727,1.83302,,,,2017


In [10]:
data2016['Year'] = "2016"
data2015['Year'] = "2015"

In [11]:
dataAll = pd.concat([data2020,data2019,data2018,data2017,data2016,data2015])
dataAll = dataAll[['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Economy (GDP per Capita)',
                   'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)',
                   'Generosity', 'Year']]
dataAll

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Year
0,Finland,Western Europe,1,7.8087,1.285190,1.499526,0.961271,0.662317,0.477857,0.159670,2020
1,Denmark,Western Europe,2,7.6456,1.326949,1.503449,0.979333,0.665040,0.495260,0.242793,2020
2,Switzerland,Western Europe,3,7.5599,1.390774,1.472403,1.040533,0.628954,0.407946,0.269056,2020
3,Iceland,Western Europe,4,7.5045,1.326502,1.547567,1.000843,0.661981,0.144541,0.362330,2020
4,Norway,Western Europe,5,7.4880,1.424207,1.495173,1.008072,0.670201,0.434101,0.287985,2020
...,...,...,...,...,...,...,...,...,...,...,...
153,Rwanda,Sub-Saharan Africa,154,3.4650,0.222080,0.773700,0.428640,0.592010,0.551910,0.226280,2015
154,Benin,Sub-Saharan Africa,155,3.3400,0.286650,0.353860,0.319100,0.484500,0.080100,0.182600,2015
155,Syria,Middle East and Northern Africa,156,3.0060,0.663200,0.474890,0.721930,0.156840,0.189060,0.471790,2015
156,Burundi,Sub-Saharan Africa,157,2.9050,0.015300,0.415870,0.223960,0.118500,0.100620,0.197270,2015
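
The row labels in the combined table restart for each year because `pd.concat` keeps each frame's own index by default. The sketch below, on toy frames with hypothetical values, shows one way to stack the years with a fresh index and then check how many rows each year contributed.

```python
import pandas as pd

# Toy frames with hypothetical values; the 2015 frame carries an extra column.
toy2015 = pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [5.0, 4.0],
                        "Standard Error": [0.1, 0.2], "Year": "2015"})
toy2020 = pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [5.5, 4.5],
                        "Year": "2020"})

shared = ["Country", "Happiness Score", "Year"]

# ignore_index=True gives the stacked frame a fresh 0..n-1 index instead of
# repeating each year's own row labels.
toy_all = pd.concat([toy2020, toy2015], ignore_index=True)[shared]

# Count the rows contributed by each year as a quick sanity check.
rows_per_year = toy_all.groupby("Year").size()
```

A per-year row count is a cheap way to confirm that no year was dropped or duplicated during the merge.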


For the meaning of each column/variable, refer to the index <a href= https://happiness-report.s3.amazonaws.com/2020/WHR20_Ch2_Statistical_Appendix.pdf>here</a>. The first two pages explain what the report asked in its survey and how it converted the information to numbers.