# hapPyness: Project One

Team Boa (Aashita, Alannah, Ben, and Brian)

Charles Schulz once said that “happiness is a warm puppy.” According to Walt Disney, “happiness is a state of mind,” and author Tom Bodett offered that “a person needs just three things to be truly happy in this world: Someone to love, something to do, and something to hope for.” For many, happiness is hard to quantify. However, the first World Happiness Report, based on Gallup poll data, was published in 2012. The report uses six factors to measure happiness: economic production, social support, life expectancy, freedom, absence of corruption, and generosity. (Tom may have been on to something.)

Our team will analyze this data to find correlations between metrics (such as GDP per capita versus the happiness index, and life expectancy versus freedom) and to track how those metrics change across years. We will also focus on the position of the United States on the list, first comparing it with the happiest and unhappiest nations, then looking more closely at the individual metrics to see what story they tell. It will also be interesting to see how 2020 compares with prior years, considering the COVID-19 pandemic, widespread social unrest, and the general consensus that it was a terrible year.
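
As a rough illustration of the kind of correlation check described above, the sketch below computes a Pearson correlation between GDP per capita and happiness score. It uses only the five 2016 rows that are visible in the combined table printed later in this notebook, so the resulting number says nothing about the full dataset. The name `sample_df` is made up for this example.

```python
import pandas as pd

# Five 2016 rows copied from the combined table displayed in this notebook.
# This is far too small a sample to draw conclusions from; it only shows the call.
sample_df = pd.DataFrame({
    "Country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
    "Happiness Score": [7.526, 7.509, 7.501, 7.498, 7.413],
    "GDP per Capita": [1.44178, 1.52733, 1.42666, 1.57744, 1.40598],
})

# DataFrame.corr() returns the pairwise Pearson correlation matrix
# of the selected numeric columns.
corr_matrix = sample_df[["Happiness Score", "GDP per Capita"]].corr()

# Pull out the single coefficient of interest.
r = corr_matrix.loc["GDP per Capita", "Happiness Score"]
print(f"Pearson r (GDP per Capita vs Happiness Score): {r:.3f}")
```

The same call on the full combined DataFrame would give the coefficient for all countries and years; `linregress`, which the notebook imports, additionally returns the slope and intercept of the fitted line.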

In [1]:
# Import dependencies
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress

In [2]:
# Store csv file paths in variables
path_2016 = "../Resources/clean_data_2016.csv"
path_2017 = "../Resources/clean_data_2017.csv"
path_2018 = "../Resources/clean_data_2018.csv"
path_2019 = "../Resources/clean_data_2019.csv"
path_2020 = "../Resources/clean_data_2020.csv"

In [3]:
# Read each year's cleaned CSV into its own DataFrame
clean_data_2016_df = pd.read_csv(path_2016)
clean_data_2017_df = pd.read_csv(path_2017)
clean_data_2018_df = pd.read_csv(path_2018)
clean_data_2019_df = pd.read_csv(path_2019)
clean_data_2020_df = pd.read_csv(path_2020)

In [4]:
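# Stack the five yearly DataFrames into one long table.
# Each frame keeps its own row labels, so index values repeat across years.
frames = [clean_data_2016_df, clean_data_2017_df, clean_data_2018_df, clean_data_2019_df, clean_data_2020_df]

all_years_df = pd.concat(frames)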
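
A minimal sketch of how `pd.concat` treats row labels, using two tiny stand-in frames (`frame_a` and `frame_b` are placeholders, not survey data). By default each frame's own labels are kept, which is why the same index value can appear once per year in the combined table; passing `ignore_index=True` is one way to renumber the rows if unique labels are wanted.

```python
import pandas as pd

# Two tiny stand-in frames; the values are placeholders, not survey data.
frame_a = pd.DataFrame({"Country": ["A", "B"], "Year": [2016, 2016]})
frame_b = pd.DataFrame({"Country": ["A", "B"], "Year": [2017, 2017]})

# Default concat keeps each frame's own row labels, so 0 and 1 appear twice.
stacked = pd.concat([frame_a, frame_b])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
renumbered = pd.concat([frame_a, frame_b], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```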

In [5]:
all_years_df["Country"].value_counts().tail(50)

Bulgaria                     5
Spain                        5
Colombia                     5
Cameroon                     5
Ireland                      5
Azerbaijan                   5
Poland                       5
Jordan                       5
Romania                      5
Niger                        5
China                        5
Switzerland                  5
Zambia                       5
Mali                         5
Bosnia and Herzegovina       5
Gabon                        5
Mexico                       5
France                       5
Argentina                    5
Singapore                    5
Mauritius                    5
Laos                         4
Macedonia                    4
Syria                        4
Central African Republic     4
Qatar                        4
Bhutan                       4
Lesotho                      4
Somalia                      4
Mozambique                   4
Sudan                        3
Hong Kong                    3
Belize  

In [6]:
# Unify the two spellings of Trinidad and Tobago (literal match, not a regex)
all_years_df["Country"] = all_years_df["Country"].str.replace("Trinidad & Tobago", "Trinidad and Tobago", regex=False)

In [7]:
all_years_df

Unnamed: 0.1,Unnamed: 0,Country,Happiness Score,GDP per Capita,Social Support,Health/Life Expectancy,Freedom to Make Choices,Generosity,Perceptions of Gov’t Corruption,Year
0,0,Denmark,7.5260,1.441780,1.163740,0.795040,0.579410,0.361710,0.444530,2016
1,1,Switzerland,7.5090,1.527330,1.145240,0.863030,0.585570,0.280830,0.412030,2016
2,2,Iceland,7.5010,1.426660,1.183260,0.867330,0.566240,0.476780,0.149750,2016
3,3,Norway,7.4980,1.577440,1.126900,0.795790,0.596090,0.378950,0.357760,2016
4,4,Finland,7.4130,1.405980,1.134640,0.810910,0.571040,0.254920,0.410040,2016
...,...,...,...,...,...,...,...,...,...,...
148,148,Central African Republic,3.4759,0.041072,0.000000,0.000000,0.292814,0.253513,0.028265,2020
149,149,Rwanda,3.3123,0.343243,0.522876,0.572383,0.604088,0.235705,0.485542,2020
150,150,Zimbabwe,3.2992,0.425564,1.047835,0.375038,0.377405,0.151349,0.080929,2020
151,151,South Sudan,2.8166,0.289083,0.553279,0.208809,0.065609,0.209935,0.111157,2020


In [8]:
all_years_df["Country"].value_counts()

Liberia                      5
Costa Rica                   5
Venezuela                    5
Uruguay                      5
Philippines                  5
                            ..
Puerto Rico                  1
Hong Kong S.A.R., China      1
Somaliland Region            1
Suriname                     1
Hong Kong S.A.R. of China    1
Name: Country, Length: 168, dtype: int64
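
The counts above list Hong Kong under more than one spelling, and each spelling is counted as a separate country. One possible way to handle this, sketched below, is to map variant names to a single canonical name before counting. Which variants refer to the same entity is an assumption made here for illustration, and `canonical_names` is a hypothetical mapping, not part of the original notebook.

```python
import pandas as pd

# A few country-name spellings that appear in this notebook's output.
names = pd.Series([
    "Hong Kong",
    "Hong Kong S.A.R., China",
    "Hong Kong S.A.R. of China",
    "Trinidad & Tobago",
    "Trinidad and Tobago",
])

# Hypothetical mapping from variant spelling to canonical name.
# Treating these variants as the same entity is an assumption.
canonical_names = {
    "Hong Kong S.A.R., China": "Hong Kong",
    "Hong Kong S.A.R. of China": "Hong Kong",
    "Trinidad & Tobago": "Trinidad and Tobago",
}

# Series.replace with a dict swaps whole values that match a key exactly.
normalized = names.replace(canonical_names)
print(normalized.value_counts())
```

With a DataFrame, the same mapping could be applied to the `Country` column before any per-country counting or filtering.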

In [9]:
all_years_df.drop(columns="Unnamed: 0", inplace=True)

In [10]:
all_years_df

Unnamed: 0,Country,Happiness Score,GDP per Capita,Social Support,Health/Life Expectancy,Freedom to Make Choices,Generosity,Perceptions of Gov’t Corruption,Year
0,Denmark,7.5260,1.441780,1.163740,0.795040,0.579410,0.361710,0.444530,2016
1,Switzerland,7.5090,1.527330,1.145240,0.863030,0.585570,0.280830,0.412030,2016
2,Iceland,7.5010,1.426660,1.183260,0.867330,0.566240,0.476780,0.149750,2016
3,Norway,7.4980,1.577440,1.126900,0.795790,0.596090,0.378950,0.357760,2016
4,Finland,7.4130,1.405980,1.134640,0.810910,0.571040,0.254920,0.410040,2016
...,...,...,...,...,...,...,...,...,...
148,Central African Republic,3.4759,0.041072,0.000000,0.000000,0.292814,0.253513,0.028265,2020
149,Rwanda,3.3123,0.343243,0.522876,0.572383,0.604088,0.235705,0.485542,2020
150,Zimbabwe,3.2992,0.425564,1.047835,0.375038,0.377405,0.151349,0.080929,2020
151,South Sudan,2.8166,0.289083,0.553279,0.208809,0.065609,0.209935,0.111157,2020


In [11]:
# Select the United States rows across all five years
us_df = all_years_df.loc[all_years_df["Country"] == "United States"]
us_df

Unnamed: 0,Country,Happiness Score,GDP per Capita,Social Support,Health/Life Expectancy,Freedom to Make Choices,Generosity,Perceptions of Gov’t Corruption,Year
12,United States,7.104,1.50796,1.04782,0.779,0.48163,0.41077,0.14868,2016
13,United States,6.993,1.546259,1.419921,0.774287,0.505741,0.392579,0.135639,2017
17,United States,6.886,1.398,1.471,0.819,0.547,0.291,0.133,2018
18,United States,6.892,1.433,1.457,0.874,0.454,0.28,0.128,2019
17,United States,6.9396,1.373987,1.404787,0.831618,0.534608,0.298143,0.152285,2020


In [12]:
# Keep only countries that appear in all five years (2016-2020)
counts = all_years_df["Country"].value_counts()
clean_all_years_df = all_years_df[~all_years_df["Country"].isin(counts[counts < 5].index)]
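
The filter above can also be written with a per-row group count. The sketch below shows both forms on a toy frame and checks that they select the same rows. `toy_df` and `required_years` are made-up names, and the threshold is 3 here only because the toy data covers three years (the notebook uses 5).

```python
import pandas as pd

# Toy data: countries A and C have three years each, B has only two.
toy_df = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Year": [2016, 2017, 2018, 2016, 2017, 2016, 2017, 2018],
})
required_years = 3  # the notebook uses 5

# Form 1 (as in the notebook): drop countries whose count is below the threshold.
counts = toy_df["Country"].value_counts()
via_isin = toy_df[~toy_df["Country"].isin(counts[counts < required_years].index)]

# Form 2: attach each country's row count to every row, then filter on it.
rows_per_country = toy_df.groupby("Country")["Year"].transform("count")
via_transform = toy_df[rows_per_country >= required_years]

print(sorted(via_transform["Country"].unique()))  # ['A', 'C']
print(via_isin.equals(via_transform))             # True
```

Note that either form counts rows, not distinct years, so a country listed twice in one year would still pass the check.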

In [13]:
clean_all_years_df

Unnamed: 0,Country,Happiness Score,GDP per Capita,Social Support,Health/Life Expectancy,Freedom to Make Choices,Generosity,Perceptions of Gov’t Corruption,Year
0,Denmark,7.5260,1.441780,1.163740,0.795040,0.579410,0.361710,0.444530,2016
1,Switzerland,7.5090,1.527330,1.145240,0.863030,0.585570,0.280830,0.412030,2016
2,Iceland,7.5010,1.426660,1.183260,0.867330,0.566240,0.476780,0.149750,2016
3,Norway,7.4980,1.577440,1.126900,0.795790,0.596090,0.378950,0.357760,2016
4,Finland,7.4130,1.405980,1.134640,0.810910,0.571040,0.254920,0.410040,2016
...,...,...,...,...,...,...,...,...,...
147,Tanzania,3.4762,0.457163,0.872675,0.442678,0.509343,0.271541,0.203881,2020
149,Rwanda,3.3123,0.343243,0.522876,0.572383,0.604088,0.235705,0.485542,2020
150,Zimbabwe,3.2992,0.425564,1.047835,0.375038,0.377405,0.151349,0.080929,2020
151,South Sudan,2.8166,0.289083,0.553279,0.208809,0.065609,0.209935,0.111157,2020


In [14]:
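# Save the combined data; index=False avoids writing the row labels as an extra "Unnamed: 0" column
clean_all_years_df.to_csv("../Resources/clean_data_all_years.csv", index=False)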
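
A short sketch of where columns named `Unnamed: 0` typically come from: by default `to_csv` writes the row labels as a first column with an empty header, and `read_csv` names that column `Unnamed: 0` when the file is loaded again. The example uses an in-memory buffer and a placeholder frame (`toy_df`) rather than the project files.

```python
import io
import pandas as pd

# Placeholder frame, not survey data.
toy_df = pd.DataFrame({"Country": ["A", "B"], "Year": [2016, 2016]})

# Default to_csv writes the index as an unnamed first column.
with_index = io.StringIO()
toy_df.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'Country', 'Year']

# index=False leaves the row labels out of the file.
without_index = io.StringIO()
toy_df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())
# ['Country', 'Year']
```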