# Happiness: _A Reflection on a Worldly Subjective System_

Caleb Jensen, Undergrad \
Christian Allen, Undergrad \
Jessyca Allen, Undergrad 

---

## Table of Contents

1. [Introduction](#Introduction)
2. [Data](#Data)  
    2.1 [World Happiness Reports 2015-2022](#World-Happiness-Reports-2015-2022)
3. [Goal](#Goal)
4. [Imports](#Imports)
5. [Preprocessing](#Preprocessing)  
    5.1 [World Happiness Reports 2015-2022](#World-Happiness-Reports)
    

### **Introduction**

**ADD INTRO HERE**

### **Data**

#### _World Happiness Reports 2015-2022_

The World Happiness Report is an annual global survey conducted in over 150 countries. The report scores each country's happiness based on its citizens' responses. It also contains major contributing factors such as economy, health, and government trust. Each year's edition adds new features to the dataset. The structure of the data is as follows:

- Country: The country the data is from
- Region: The region the country is in, e.g., Central and Eastern Europe, Sub-Saharan Africa, etc.
- Happiness Rank: The rank of the country based on the happiness score
- Happiness Score: The happiness score of the country
- Standard Error: The standard error of the happiness score
- Economy (GDP per Capita): The GDP per capita of the country
- Family (Social Support): The social support score of the country
- Freedom: The freedom score of the country
- Generosity: The generosity score of the country
- Health (Life Expectancy): The life expectancy of the country
- Trust (Government Corruption): The trust score of the country
- Dystopia Residual: The dystopia residual score of the country
    - Dystopia is a hypothetical country used as a baseline. It is assigned the lowest value found in the dataset for each category. The residual is the difference between the actual happiness score and the dystopian happiness score.

### Goal 

Our goal in this project is to visualize the data and find any trends that may exist. We will also look for supplementary data that could reveal additional causes of a country's happiness score.



### Imports

In [139]:
# Imports go here

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import os
import glob

### Preprocessing

#### World Happiness Reports

The World Happiness Reports are all formatted differently. Each year adds or removes features, and feature names change from year to year. In this preprocessing step, we rename the features to a common naming convention that includes the following features for each year:

- Country
- Region
- Rank
- Score
- GDP
- Social Support
- Health
- Freedom
- Trust
- Generosity
- Dystopia Residual

Some notes on the features after renaming and dropping:

- Some years will not have a region feature or the values in the region feature will be named differently. We are keeping them in the dataset for visualization purposes.
- Some years will not have a rank or a dystopia residual feature. We will generate these features in the next step.
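One way to picture the target schema is to rename a frame and then reindex it to the canonical column list, so that features a given year lacks show up as empty columns. The snippet below is a minimal sketch on a made-up two-row frame; `TARGET_COLUMNS` and `to_common_schema` are illustrative names and are not part of this notebook's pipeline.

```python
import pandas as pd

# Canonical feature names listed above (illustrative constant, not from the notebook)
TARGET_COLUMNS = ['Country', 'Region', 'Rank', 'Score', 'GDP', 'Social Support',
                  'Health', 'Freedom', 'Trust', 'Generosity', 'Dystopia Residual']

def to_common_schema(df, renames):
    """Rename columns, then reindex so every target column exists (NaN if absent)."""
    return df.rename(columns=renames).reindex(columns=TARGET_COLUMNS)

# Made-up rows that use 2018-style column names
toy = pd.DataFrame({
    'Overall rank': [1, 2],
    'Country or region': ['A', 'B'],
    'Score': [7.5, 7.1],
    'GDP per capita': [1.3, 1.2],
})
renames = {'Overall rank': 'Rank', 'Country or region': 'Country', 'GDP per capita': 'GDP'}

common = to_common_schema(toy, renames)

# Columns that are entirely empty are the features this frame does not provide
missing = common.columns[common.isna().all()].tolist()
print(missing)
```

Note that `reindex` also discards any column that is not in the target list, so a sketch like this would replace an explicit drop step rather than sit alongside it.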

In [140]:
years = ['2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022']
wh_dfs = {}

path = os.getcwd()
# glob.glob does not guarantee any ordering, so sort the paths before pairing
# them with `years` (this assumes the file names sort chronologically)
wh_files = sorted(glob.glob(os.path.join(path, "data/world_happiness_reports/*.csv")))

for year, file in zip(years, wh_files):
    wh_dfs[year] = pd.read_csv(file)
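Pairing files with years by position depends on the order in which the files are listed. A possible alternative, sketched below, is to read the year out of each file name. This assumes each file name contains its four-digit year, which is not confirmed here; `year_from_filename` and the example paths are hypothetical.

```python
import os
import re

def year_from_filename(path):
    """Return the first 20xx year found in the file name, or None if there is none."""
    match = re.search(r'20\d{2}', os.path.basename(path))
    return match.group(0) if match else None

# Hypothetical file names; the real names in data/world_happiness_reports/ may differ
example_files = [
    'data/world_happiness_reports/2016.csv',
    'data/world_happiness_reports/2015.csv',
    'data/world_happiness_reports/notes.csv',
]

# Keep only files whose name yields a year, keyed by that year
files_by_year = {}
for f in example_files:
    year = year_from_filename(f)
    if year is not None:
        files_by_year[year] = f

print(sorted(files_by_year))
```

With a mapping like this, each DataFrame would be keyed by the year in its own file name, regardless of listing order.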


In [141]:
for year, df in wh_dfs.items():
    print(year)
    print(df.shape)
    print(list(df.columns))
    print('-'*10)

2015
(158, 12)
['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
----------
2016
(157, 13)
['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
----------
2017
(155, 12)
['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high', 'Whisker.low', 'Economy..GDP.per.Capita.', 'Family', 'Health..Life.Expectancy.', 'Freedom', 'Generosity', 'Trust..Government.Corruption.', 'Dystopia.Residual']
----------
2018
(156, 9)
['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
----------
201

In [142]:
DROPPED_MAPPER = {
    '2015': ['Standard Error'],
    '2016': ['Lower Confidence Interval', 'Upper Confidence Interval'],
    '2017': ['Whisker.high', 'Whisker.low'],
    '2020': ['Standard error of ladder score', 'upperwhisker', 'lowerwhisker', 
                'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
                'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
                'Dystopia + residual'],
    '2021': ['Standard error of ladder score', 'upperwhisker', 'lowerwhisker',
                'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
                'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
                'Dystopia + residual'],
    '2022': ['Whisker-high', 'Whisker-low', 'Dystopia (1.83) + residual']
}

RENAMED_MAPPER = {
    '2015': {'Happiness Rank': 'Rank', 'Happiness Score': 'Score', 'Economy (GDP per Capita)': 'GDP', 'Family' : 'Social Support',
                'Health (Life Expectancy)': 'Health', 'Trust (Government Corruption)': 'Trust'},
    '2016': {'Happiness Rank': 'Rank', 'Happiness Score': 'Score', 'Economy (GDP per Capita)': 'GDP', 'Family' : 'Social Support',
                'Health (Life Expectancy)': 'Health', 'Trust (Government Corruption)': 'Trust'},
    '2017': {'Happiness.Rank': 'Rank', 'Happiness.Score': 'Score', 'Economy..GDP.per.Capita.': 'GDP', 'Family' : 'Social Support',
                'Health..Life.Expectancy.': 'Health', 'Trust..Government.Corruption.': 'Trust', 'Dystopia.Residual': 'Dystopia Residual'},
    '2018': {'Overall rank': 'Rank', 'Country or region': 'Country', 'GDP per capita': 'GDP', 'Social support': 'Social Support',
                'Healthy life expectancy': 'Health', 'Freedom to make life choices': 'Freedom', 'Perceptions of corruption': 'Trust'},
    '2019': {'Overall rank': 'Rank', 'Country or region': 'Country', 'GDP per capita': 'GDP', 'Social support': 'Social Support',
                'Healthy life expectancy': 'Health', 'Freedom to make life choices': 'Freedom', 'Perceptions of corruption': 'Trust'},
    '2020': {'Country name': 'Country', 'Regional indicator': 'Region', 'Ladder score': 'Score', 'Explained by: Log GDP per capita': 'GDP',
                'Explained by: Social support': 'Social Support', 'Explained by: Healthy life expectancy': 'Health',
                'Explained by: Freedom to make life choices': 'Freedom', 'Explained by: Generosity': 'Generosity',
                'Explained by: Perceptions of corruption': 'Trust', 'Ladder score in Dystopia': 'Dystopia Residual'},
    '2021': {'Country name': 'Country', 'Regional indicator': 'Region', 'Ladder score': 'Score', 'Explained by: Log GDP per capita': 'GDP',
                'Explained by: Social support': 'Social Support', 'Explained by: Healthy life expectancy': 'Health',
                'Explained by: Freedom to make life choices': 'Freedom', 'Explained by: Generosity': 'Generosity',
                'Explained by: Perceptions of corruption': 'Trust', 'Ladder score in Dystopia': 'Dystopia Residual'},
    '2022': {'Happiness score': 'Score', 'RANK': 'Rank', 'Explained by: GDP per capita': 'GDP', 'Explained by: Social support': 'Social Support',
                'Explained by: Healthy life expectancy': 'Health', 'Explained by: Freedom to make life choices': 'Freedom',
                'Explained by: Generosity': 'Generosity', 'Explained by: Perceptions of corruption': 'Trust'}
}


In [143]:
# Drop and rename columns as shown in our mappers above
for year, df in wh_dfs.items():
    if year in DROPPED_MAPPER:
        df.drop(columns=DROPPED_MAPPER[year], inplace=True)
    if year in RENAMED_MAPPER:
        df.rename(columns=RENAMED_MAPPER[year], inplace=True)
    

In [144]:
# Check to see if the columns were changed
for year, df in wh_dfs.items():
    print(year)
    print(df.shape)
    print(list(df.columns.sort_values()))
    print('-'*10)

2015
(158, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2016
(157, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2017
(155, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2018
(156, 9)
['Country', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2019
(156, 9)
['Country', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2020
(153, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Region', 'Score', 'Social Support', 'Trust']
----------
2021
(149, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Region', 'Score', 'Social Support', 'Trust']
----------
2022


Now we will generate rank and dystopia residual features for years that do not have them.

These are the years without rank:

- 2020
- 2021

These are the years without dystopia residual:

- 2018
- 2019
- 2022
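These lists can also be derived from the frames instead of being written by hand. The following is a small sketch using stand-in frames; `years_missing` and `toy_dfs` are illustrative names, and the real `wh_dfs` has many more columns.

```python
import pandas as pd

def years_missing(dfs, feature):
    """Return the keys of `dfs` whose DataFrame lacks the column `feature`."""
    return [year for year, df in dfs.items() if feature not in df.columns]

# Stand-in frames with only a few columns each
toy_dfs = {
    '2019': pd.DataFrame(columns=['Country', 'Rank', 'Score']),
    '2020': pd.DataFrame(columns=['Country', 'Score', 'Dystopia Residual']),
}

print(years_missing(toy_dfs, 'Rank'))               # ['2020']
print(years_missing(toy_dfs, 'Dystopia Residual'))  # ['2019']
```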

In [145]:
def generate_rank_feature(df):
    df['Rank'] = df['Score'].rank(ascending=False, method='min').astype(int)
    return df

def generate_dystopia_residual_feature(df):
    # Residual = score minus the sum of the six factor columns
    df['Dystopia Residual'] = df['Score'] - (df['GDP'] + df['Social Support'] + df['Health'] + df['Freedom'] + df['Generosity'] + df['Trust'])
    return df
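To illustrate how the ranking call in `generate_rank_feature` treats ties, here is a minimal sketch on made-up scores. With `method='min'`, tied scores share the lowest rank of their group and the following rank is skipped.

```python
import pandas as pd

# Made-up scores with a tie between A and C
scores = pd.DataFrame({'Country': ['A', 'B', 'C', 'D'],
                       'Score': [7.1, 6.4, 7.1, 5.0]})

# Same call as in generate_rank_feature: highest score gets rank 1
scores['Rank'] = scores['Score'].rank(ascending=False, method='min').astype(int)

print(scores['Rank'].tolist())  # [1, 3, 1, 4]
```

The `astype(int)` step raises an error if any score is missing, since a missing score produces a missing rank, so this approach assumes the `Score` column is complete.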

In [146]:
wh_dfs['2021']

|     | Country | Region | Score | Dystopia Residual | GDP | Social Support | Health | Freedom | Generosity | Trust |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Finland | Western Europe | 7.842 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 |
| 1 | Denmark | Western Europe | 7.620 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 |
| 2 | Switzerland | Western Europe | 7.571 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 |
| 3 | Iceland | Western Europe | 7.554 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.170 |
| 4 | Netherlands | Western Europe | 7.464 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 144 | Lesotho | Sub-Saharan Africa | 3.512 | 2.43 | 0.451 | 0.731 | 0.007 | 0.405 | 0.103 | 0.015 |
| 145 | Botswana | Sub-Saharan Africa | 3.467 | 2.43 | 1.099 | 0.724 | 0.340 | 0.539 | 0.027 | 0.088 |
| 146 | Rwanda | Sub-Saharan Africa | 3.415 | 2.43 | 0.364 | 0.202 | 0.407 | 0.627 | 0.227 | 0.493 |
| 147 | Zimbabwe | Sub-Saharan Africa | 3.145 | 2.43 | 0.457 | 0.649 | 0.243 | 0.359 | 0.157 | 0.075 |


In [147]:
YEARS_WITHOUT_RANK = ['2020', '2021']
YEARS_WITHOUT_DYSTOPIA_RESIDUAL = ['2018', '2019', '2022']

for year in YEARS_WITHOUT_RANK:
    wh_dfs[year] = generate_rank_feature(wh_dfs[year])

wh_dfs['2021']

# for year in YEARS_WITHOUT_DYSTOPIA_RESIDUAL:
#     wh_dfs[year] = generate_dystopia_residual_feature(wh_dfs[year])

|     | Country | Region | Score | Dystopia Residual | GDP | Social Support | Health | Freedom | Generosity | Trust | Rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Finland | Western Europe | 7.842 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 1 |
| 1 | Denmark | Western Europe | 7.620 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2 |
| 2 | Switzerland | Western Europe | 7.571 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 3 |
| 3 | Iceland | Western Europe | 7.554 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.170 | 4 |
| 4 | Netherlands | Western Europe | 7.464 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 144 | Lesotho | Sub-Saharan Africa | 3.512 | 2.43 | 0.451 | 0.731 | 0.007 | 0.405 | 0.103 | 0.015 | 145 |
| 145 | Botswana | Sub-Saharan Africa | 3.467 | 2.43 | 1.099 | 0.724 | 0.340 | 0.539 | 0.027 | 0.088 | 146 |
| 146 | Rwanda | Sub-Saharan Africa | 3.415 | 2.43 | 0.364 | 0.202 | 0.407 | 0.627 | 0.227 | 0.493 | 147 |
| 147 | Zimbabwe | Sub-Saharan Africa | 3.145 | 2.43 | 0.457 | 0.649 | 0.243 | 0.359 | 0.157 | 0.075 | 148 |
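The commented-out loop above would apply `generate_dystopia_residual_feature` to the years that lack the feature. As a rough illustration of what that calculation yields, the sketch below applies the same formula (score minus the sum of the six factor columns) to the first two rows of the 2021 table. `FACTORS` and `residual_after_factors` are illustrative names, not part of the notebook.

```python
import pandas as pd

FACTORS = ['GDP', 'Social Support', 'Health', 'Freedom', 'Generosity', 'Trust']

def residual_after_factors(df):
    """Score minus the sum of the six factor columns."""
    return df['Score'] - df[FACTORS].sum(axis=1)

# First two rows of the 2021 table shown above
sample = pd.DataFrame({
    'Country': ['Finland', 'Denmark'],
    'Score': [7.842, 7.620],
    'GDP': [1.446, 1.502],
    'Social Support': [1.106, 1.108],
    'Health': [0.741, 0.763],
    'Freedom': [0.691, 0.686],
    'Generosity': [0.124, 0.208],
    'Trust': [0.481, 0.485],
})

residual = residual_after_factors(sample).round(3)
print(residual.tolist())
```

For these rows the formula gives roughly 3.253 for Finland and 2.868 for Denmark. The 2021 frame itself carries a constant 2.43 in `Dystopia Residual`, because that column was renamed from `Ladder score in Dystopia`, so the two quantities are not directly comparable across years without further reconciliation.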
