# Happiness: _A Reflection on a Worldly Subjective System_

Caleb Jensen, Undergrad \
Christian Allen, Undergrad \
Jessyca Allen, Undergrad 

---

## Table of Contents

1. [Introduction](#Introduction)
2. [Data](#Data)  
    2.1 [World Happiness Reports 2015-2022](#World-Happiness-Reports-2015-2022)
3. [Goal](#Goal)
4. [Imports](#Imports)
5. [Preprocessing](#Preprocessing)  
    5.1 [World Happiness Reports](#World-Happiness-Reports)
6. [Trends and Analysis](#Trends-and-Analysis)  
    6.1 [Initial trends in the WHR data](#Initial-trends-in-the-whr-data)
    
### **Introduction** 

**ADD INTRO HERE**

### **Data** 

#### _World Happiness Reports 2015-2022_ 

The World Happiness Report (WHR) is an annual global survey conducted in over 150 countries. The report scores each country's happiness based on its citizens' responses. Various contributing factors are also used to help calculate the happiness score. The set of features varies from year to year, and some feature names change over time. We are going to look at the dataset with the following features in mind:

- Country: The country the data is from
- Region: The region the country is in, e.g. Central and Eastern Europe, Sub-Saharan Africa, etc.
- Rank: The rank of the country based on the happiness score
- Score: The happiness score of the country
- Standard Error: The standard error of the happiness score
- Economy (GDP per Capita): The GDP per capita of the country
- Family (Social Support): How much social support the country's population has
- Freedom: The freedom score of the country
- Generosity: The generosity score of the country
- Health (Life Expectancy): The life expectancy of the country
- Trust (Government Corruption): The trust score of the country
- Dystopia Residual: The dystopia residual score of the country
    - Dystopia is a hypothetical country used as a baseline. It is assigned the lowest value found in the dataset for each category. The residual is the difference between the actual happiness score and the dystopian happiness score.

### Goal 

The goal of this project is to visualize the data and find any trends that may exist. The report will also include supplementary data that could help reveal additional contributors to a country's happiness score.



### Imports 

In [1]:
# Imports go here

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import os
import glob

### Preprocessing

#### World Happiness Reports

The World Happiness Reports are all formatted differently. Each year adds or removes features, and feature names change as the years progress. In this preprocessing step, the features in each report are renamed or removed to follow a common naming convention, so that every year includes the following features:

- Country
- Region
- Rank
- Score
- GDP
- Social Support
- Health
- Freedom
- Trust
- Generosity
- Dystopia Residual

Some notes on the features after renaming and dropping:

- Some years do not have a region feature, and others name the region values differently. The feature is kept in the dataset where available for visualization purposes.
- Some years do not have a rank or a dystopia residual feature. These will be generated in the next step.
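
One way to keep track of which target features a given year is still missing after renaming is a small check against the list above. The sketch below is only illustrative: `EXPECTED_COLUMNS`, `missing_columns`, and the toy frame are hypothetical names and made-up values, not part of the original notebook.

```python
import pandas as pd

# Target schema taken from the feature list above.
EXPECTED_COLUMNS = ['Country', 'Region', 'Rank', 'Score', 'GDP', 'Social Support',
                    'Health', 'Freedom', 'Trust', 'Generosity', 'Dystopia Residual']

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Toy frame standing in for a report with no region or residual (values are made up).
toy = pd.DataFrame({'Country': ['A', 'B'], 'Rank': [1, 2], 'Score': [7.0, 6.5],
                    'GDP': [1.3, 1.2], 'Social Support': [1.5, 1.4], 'Health': [0.9, 0.8],
                    'Freedom': [0.6, 0.5], 'Trust': [0.3, 0.2], 'Generosity': [0.2, 0.1]})

print(missing_columns(toy))
```

Running a check like this for each year makes it easy to see which years still need a generated rank or dystopia residual.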

In [2]:
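years = ['2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022']
wh_dfs = {}

path = os.getcwd()
# glob.glob returns files in arbitrary order, so sort them to line up with `years`
# (this assumes the file names sort chronologically)
wh_files = sorted(glob.glob(os.path.join(path, "data/world_happiness_reports/*.csv")))

# Pair each year with its report file and load it
for year, file in zip(years, wh_files):
    wh_dfs[year] = pd.read_csv(file)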
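
Since `glob.glob` makes no guarantee about file order, a more defensive option might be to read the year out of each file name instead of relying on position. This is a minimal sketch that assumes each file name contains its four-digit year; the example paths and the helper `year_from_filename` are hypothetical.

```python
import os
import re

def year_from_filename(path):
    """Return the first 4-digit year (20xx) found in the file name, or None."""
    match = re.search(r'20\d{2}', os.path.basename(path))
    return match.group(0) if match else None

# Hypothetical file names, deliberately listed out of order.
example_paths = ['data/world_happiness_reports/2016.csv',
                 'data/world_happiness_reports/2015.csv']

# Key each path by the year found in its name, independent of listing order.
by_year = {year_from_filename(p): p for p in example_paths}
print(sorted(by_year))
```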


In [3]:
for year, df in wh_dfs.items():
    print(year)
    print(df.shape)
    print(list(df.columns))
    print('-'*10)

2015
(158, 12)
['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Standard Error', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
----------
2016
(157, 13)
['Country', 'Region', 'Happiness Rank', 'Happiness Score', 'Lower Confidence Interval', 'Upper Confidence Interval', 'Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Dystopia Residual']
----------
2017
(155, 12)
['Country', 'Happiness.Rank', 'Happiness.Score', 'Whisker.high', 'Whisker.low', 'Economy..GDP.per.Capita.', 'Family', 'Health..Life.Expectancy.', 'Freedom', 'Generosity', 'Trust..Government.Corruption.', 'Dystopia.Residual']
----------
2018
(156, 9)
['Overall rank', 'Country or region', 'Score', 'GDP per capita', 'Social support', 'Healthy life expectancy', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
----------
201

In [4]:
DROPPED_MAPPER = {
    '2015': ['Standard Error'],
    '2016': ['Lower Confidence Interval', 'Upper Confidence Interval'],
    '2017': ['Whisker.high', 'Whisker.low'],
    '2020': ['Standard error of ladder score', 'upperwhisker', 'lowerwhisker', 
                'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
                'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
                'Dystopia + residual'],
    '2021': ['Standard error of ladder score', 'upperwhisker', 'lowerwhisker',
                'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
                'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
                'Dystopia + residual'],
    '2022': ['Whisker-high', 'Whisker-low', 'Dystopia (1.83) + residual']
}

RENAMED_MAPPER = {
    '2015': {'Happiness Rank': 'Rank', 'Happiness Score': 'Score', 'Economy (GDP per Capita)': 'GDP', 'Family' : 'Social Support',
                'Health (Life Expectancy)': 'Health', 'Trust (Government Corruption)': 'Trust'},
    '2016': {'Happiness Rank': 'Rank', 'Happiness Score': 'Score', 'Economy (GDP per Capita)': 'GDP', 'Family' : 'Social Support',
                'Health (Life Expectancy)': 'Health', 'Trust (Government Corruption)': 'Trust'},
    '2017': {'Happiness.Rank': 'Rank', 'Happiness.Score': 'Score', 'Economy..GDP.per.Capita.': 'GDP', 'Family' : 'Social Support',
                'Health..Life.Expectancy.': 'Health', 'Trust..Government.Corruption.': 'Trust', 'Dystopia.Residual': 'Dystopia Residual'},
    '2018': {'Overall rank': 'Rank', 'Country or region': 'Country', 'GDP per capita': 'GDP', 'Social support': 'Social Support',
                'Healthy life expectancy': 'Health', 'Freedom to make life choices': 'Freedom', 'Perceptions of corruption': 'Trust'},
    '2019': {'Overall rank': 'Rank', 'Country or region': 'Country', 'GDP per capita': 'GDP', 'Social support': 'Social Support',
                'Healthy life expectancy': 'Health', 'Freedom to make life choices': 'Freedom', 'Perceptions of corruption': 'Trust'},
    '2020': {'Country name': 'Country', 'Regional indicator': 'Region', 'Ladder score': 'Score', 'Explained by: Log GDP per capita': 'GDP',
                'Explained by: Social support': 'Social Support', 'Explained by: Healthy life expectancy': 'Health',
                'Explained by: Freedom to make life choices': 'Freedom', 'Explained by: Generosity': 'Generosity',
                'Explained by: Perceptions of corruption': 'Trust', 'Ladder score in Dystopia': 'Dystopia Residual'},
    '2021': {'Country name': 'Country', 'Regional indicator': 'Region', 'Ladder score': 'Score', 'Explained by: Log GDP per capita': 'GDP',
                'Explained by: Social support': 'Social Support', 'Explained by: Healthy life expectancy': 'Health',
                'Explained by: Freedom to make life choices': 'Freedom', 'Explained by: Generosity': 'Generosity',
                'Explained by: Perceptions of corruption': 'Trust', 'Ladder score in Dystopia': 'Dystopia Residual'},
    '2022': {'Happiness score': 'Score', 'RANK': 'Rank', 'Explained by: GDP per capita': 'GDP', 'Explained by: Social support': 'Social Support',
                'Explained by: Healthy life expectancy': 'Health', 'Explained by: Freedom to make life choices': 'Freedom',
                'Explained by: Generosity': 'Generosity', 'Explained by: Perceptions of corruption': 'Trust'}
}


In [5]:
# Drop and rename columns as shown in our mappers above
for year, df in wh_dfs.items():
    if year in DROPPED_MAPPER:
        df.drop(columns=DROPPED_MAPPER[year], inplace=True)
    if year in RENAMED_MAPPER:
        df.rename(columns=RENAMED_MAPPER[year], inplace=True)
    

In [15]:
# Check to see if the columns were changed
for year, df in wh_dfs.items():
    print(year)
    print(df.shape)
    print(list(df.columns.sort_values()))
    print('-'*10)

2015
(158, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2016
(157, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2017
(155, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2018
(156, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2019
(156, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2020
(153, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2021
(149, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'R

Before the rank and the dystopia residual are generated, note that the 2022 dataset uses European-formatted decimals (a comma as the decimal separator), so its numeric columns are read in as strings. They need to be converted to floats in order to complete the dystopia residual calculation.
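
As an aside, `pd.read_csv` accepts a `decimal` argument that can handle this at load time. The sketch below uses a small in-memory sample rather than the real file, and assumes the comma-decimal values are quoted in the CSV; it illustrates an alternative and is not the approach taken in this notebook.

```python
import io
import pandas as pd

# Small in-memory sample: fields are comma-separated, decimals use a comma,
# so each numeric value is quoted (an assumption about the file layout).
raw = 'Country,Score,GDP\nFinland,"7,821","1,892"\nDenmark,"7,636","1,953"\n'

# decimal=',' tells pandas to treat the comma inside each value as the decimal point.
sample = pd.read_csv(io.StringIO(raw), decimal=',')
print(sample.dtypes)
```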

In [7]:
df_2022 = wh_dfs['2022']
df_2022.head()

Unnamed: 0,Rank,Country,Score,GDP,Social Support,Health,Freedom,Generosity,Trust
0,1,Finland,"7,821","1,892","1,258","0,775","0,736","0,109","0,534"
1,2,Denmark,"7,636","1,953","1,243","0,777","0,719","0,188","0,532"
2,3,Iceland,"7,557","1,936","1,320","0,803","0,718","0,270","0,191"
3,4,Switzerland,"7,512","2,026","1,226","0,822","0,677","0,147","0,461"
4,5,Netherlands,"7,415","1,945","1,206","0,787","0,651","0,271","0,419"


In [8]:
df_2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Rank            147 non-null    int64 
 1   Country         147 non-null    object
 2   Score           146 non-null    object
 3   GDP             146 non-null    object
 4   Social Support  146 non-null    object
 5   Health          146 non-null    object
 6   Freedom         146 non-null    object
 7   Generosity      146 non-null    object
 8   Trust           146 non-null    object
dtypes: int64(1), object(8)
memory usage: 10.5+ KB


In [9]:
CONVERT_COLUMNS = ['Score', 'GDP', 'Social Support', 'Health', 'Freedom', 'Generosity', 'Trust']

# Swap the decimal comma for a decimal point, then cast each column to float
for col in CONVERT_COLUMNS:
    df_2022[col] = df_2022[col].str.replace(',', '.').astype(float)

df_2022.head()

Unnamed: 0,Rank,Country,Score,GDP,Social Support,Health,Freedom,Generosity,Trust
0,1,Finland,7.821,1.892,1.258,0.775,0.736,0.109,0.534
1,2,Denmark,7.636,1.953,1.243,0.777,0.719,0.188,0.532
2,3,Iceland,7.557,1.936,1.32,0.803,0.718,0.27,0.191
3,4,Switzerland,7.512,2.026,1.226,0.822,0.677,0.147,0.461
4,5,Netherlands,7.415,1.945,1.206,0.787,0.651,0.271,0.419


In [10]:
df_2022.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Rank            147 non-null    int64  
 1   Country         147 non-null    object 
 2   Score           146 non-null    float64
 3   GDP             146 non-null    float64
 4   Social Support  146 non-null    float64
 5   Health          146 non-null    float64
 6   Freedom         146 non-null    float64
 7   Generosity      146 non-null    float64
 8   Trust           146 non-null    float64
dtypes: float64(7), int64(1), object(1)
memory usage: 10.5+ KB


In [11]:
wh_dfs['2022'] = df_2022

Now the rank and dystopia residual features will be calculated for the years that do not have them.

These are the years without the rank feature:

- 2020
- 2021

These are the years without the dystopia residual feature:

- 2018
- 2019
- 2022

In [12]:
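def generate_rank_feature(df):
    """Add a 'Rank' column (1 = highest score); tied scores share the lowest rank."""
    df['Rank'] = df['Score'].rank(ascending=False, method='min').astype(int)
    return df

def generate_dystopia_residual_feature(df):
    """Add 'Dystopia Residual': the part of 'Score' left after subtracting the six factors."""
    df['Dystopia Residual'] = df['Score'] - (df['GDP'] + df['Social Support'] + df['Health'] + df['Freedom'] + df['Generosity'] + df['Trust'])
    return df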

In [13]:
YEARS_WITHOUT_RANK = ['2020', '2021']
YEARS_WITHOUT_DYSTOPIA_RESIDUAL = ['2018', '2019', '2022']

for year in YEARS_WITHOUT_RANK:
    wh_dfs[year] = generate_rank_feature(wh_dfs[year])

for year in YEARS_WITHOUT_DYSTOPIA_RESIDUAL:
    wh_dfs[year] = generate_dystopia_residual_feature(wh_dfs[year])
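
To make the two calculations concrete, here is a small worked example on made-up numbers (the countries and values are hypothetical). It applies the same `rank(ascending=False, method='min')` call and the same score-minus-factors subtraction used above.

```python
import pandas as pd

FACTORS = ['GDP', 'Social Support', 'Health', 'Freedom', 'Generosity', 'Trust']

# Made-up data: countries A and C are tied on Score.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Score': [7.0, 6.0, 7.0],
    'GDP': [1.5, 1.0, 1.25],
    'Social Support': [1.0, 1.0, 1.0],
    'Health': [0.5, 0.5, 0.5],
    'Freedom': [0.5, 0.5, 0.5],
    'Generosity': [0.25, 0.25, 0.25],
    'Trust': [0.25, 0.25, 0.25],
})

# Highest score gets rank 1; with method='min', tied scores share the lowest rank.
toy['Rank'] = toy['Score'].rank(ascending=False, method='min').astype(int)

# Residual: what remains of the score after subtracting the six factor columns.
toy['Dystopia Residual'] = toy['Score'] - toy[FACTORS].sum(axis=1)

print(toy[['Country', 'Score', 'Rank', 'Dystopia Residual']])
```

Here A and C both receive rank 1 and B receives rank 3, since `method='min'` skips rank 2 after a two-way tie.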

With the rank and dystopia residual features generated, the dataset is ready for analysis and visualization. As shown below, the dataframes contain the same features, aside from 'Region'. The number of rows (_the set of countries surveyed_) differs from year to year; these differences are left as they are, since they should not affect the analysis in this report.

In [14]:
for year, df in wh_dfs.items():
    print(year)
    print(df.shape)
    print(list(df.columns.sort_values()))
    print('-'*10)

2015
(158, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2016
(157, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2017
(155, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2018
(156, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2019
(156, 10)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Score', 'Social Support', 'Trust']
----------
2020
(153, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'Region', 'Score', 'Social Support', 'Trust']
----------
2021
(149, 11)
['Country', 'Dystopia Residual', 'Freedom', 'GDP', 'Generosity', 'Health', 'Rank', 'R
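
Because the set of countries differs between years, it may help to stack the yearly frames into one long frame and see which countries appear in every year. The sketch below uses toy stand-ins for `wh_dfs` with made-up countries and scores; `toy_dfs`, `combined`, and `common` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for wh_dfs (country names and scores are made up).
toy_dfs = {
    '2015': pd.DataFrame({'Country': ['A', 'B', 'C'], 'Score': [7.0, 6.0, 5.0]}),
    '2016': pd.DataFrame({'Country': ['A', 'B', 'D'], 'Score': [7.1, 5.9, 4.0]}),
}

# Stack the yearly frames into one long frame with a 'Year' column.
combined = pd.concat(
    [df.assign(Year=year) for year, df in toy_dfs.items()],
    ignore_index=True,
)

# Countries that appear in every year.
country_sets = [set(df['Country']) for df in toy_dfs.values()]
common = set.intersection(*country_sets)

print(combined.shape)
print(sorted(common))
```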

### Trends and Analysis

#### Initial trends in the WHR data

The WHR datasets have few features, but they can still show a lot about what contributes to people's happiness or dissatisfaction within a country. The features can also point to additional data that might better explain the score. In this section, the initial trends found in the reports themselves will be displayed and analyzed.

Even though the number of features is small, the features themselves are very different. What matters most to people? Does the freedom to make life choices create a happier country? Does health matter more, or does wealth? Some of these features may be stronger than others throughout the years. They may also differ from country to country.
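
One possible first look at which features move most closely with the score is a per-year correlation. The sketch below runs on made-up numbers, not WHR data, and uses only three of the features; the Pearson correlation computed by `DataFrame.corrwith` is an assumed starting point rather than the analysis this report settles on.

```python
import pandas as pd

FACTORS = ['GDP', 'Health', 'Freedom']

# Made-up values for five hypothetical countries in a single year.
toy = pd.DataFrame({
    'Score':   [7.5, 6.8, 5.9, 5.1, 4.2],
    'GDP':     [1.9, 1.7, 1.4, 1.1, 0.8],
    'Health':  [0.8, 0.7, 0.7, 0.5, 0.4],
    'Freedom': [0.6, 0.7, 0.5, 0.6, 0.4],
})

# Pearson correlation of each factor with Score, strongest first.
correlations = toy[FACTORS].corrwith(toy['Score']).sort_values(ascending=False)
print(correlations)
```

Repeating this for each year in `wh_dfs` would give a rough view of whether a feature's relationship with the score strengthens or weakens over time. Correlation alone does not show that a feature causes a higher score.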