In [15]:
import pandas as pd
import numpy as np
import bokeh
from bokeh.plotting import figure, output_notebook, output_file, show
from bokeh.models import ColumnDataSource

# Purpose

I will start my analysis of the 2021 World Happiness Report by comparing the most recent year's findings with those from five years earlier. I expect little change overall and, given the current political climate, I do not expect the United States to rank in the top ten. I expect countries with socialized medicine and truly democratic processes to dominate the rankings: presumably many Nordic countries and, because of COVID-19, New Zealand, which has garnered media attention for its government's exceptional handling of the pandemic.
I hope to show that life_expectancy and freedom_score are the primary indicators of what constitutes a happy country, and that the aforementioned countries will be at the top of the rankings.

# Context

To understand how the World Happiness Report is evaluated, I want to provide context on what the scoring means, as some columns sit on a 0-10 scale and others on a binary 0-1 (no/yes) scale. I also consulted the appendix to the 2021 World Happiness Report to better understand what questions pollsters ask these nations' residents, and I have included that file for further reading.

The definitions below briefly summarize those found in the appendix, which contains more technical detail on how each measure is constructed than is needed for our current purposes.

# Definitions

ladder_score = General happiness score. Participants are asked the following question, and this column contains the average of their responses: "Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?"

gdp = GDP per capita.

social_support = National average response to the following question: "If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?"

life_expectancy = National average of healthy life expectancy. 

freedom_score = National average response to the following question on a 0-1 scale (no/yes): "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"

generosity = National average response to the following question on a 0-1 scale (no/yes): "Have you donated money to a charity in the past month?"

corruption = National average response to the following questions on a 0-1 scale (no/yes): "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"



# Initial Project Proposal

For my final, after combing through some of the data sets that were made available to us, I thought it would be interesting to see which country is the "happiest" country in the world and to offer additional, speculative insight into how I believe this data can be used to make further arguments (https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021). My reasoning is that some readers may value certain key metrics above others and so disagree with how much weight the existing model gives each category. The six key metrics evaluated in this report are Economic Production, Social Support, Life Expectancy, Freedom, Absence of Corruption, and Generosity. The report may, for example, weigh Freedom higher than Life Expectancy, which some may not agree with. I would ask how one can appreciate this freedom if life expectancy in one's nation is extremely low, although this approaches a more philosophical argument, since the alternative question is whether living very long in a form of captivity or enslavement, in a society devoid of Freedom, is any better.

To return to the final's requirements, I believe this dataset will allow me to display my skills in data acquisition and to transform the data to find the happiest countries based upon a statistical grouping of the six key metrics, which I can display via embedded graphs and figures. I will also compare my results against others', such as Forbes, which I believe is known for publishing these types of rankings from the same data set (https://www.forbes.com/sites/laurabegleybloom/2021/03/19/the-20-happiest-countries-in-the-world-in-2021/?sh=7e85a73870a0).

In [16]:
# Retrieving the downloaded .CSV of the 2021 World Happiness Report.
# Pulling the columns mentioned in my project proposal for relevance.
whr_df_2021 = pd.read_csv(r'world-happiness-report-2021.csv',
                          header=None, usecols=[0, 2, 6, 7, 8, 9, 10, 11],
                          names=['country', 'ladder_score', 'gdp', 'social_support',
                                 'life_expectancy', 'freedom_score', 'generosity', 'corruption'])

In [17]:
display(whr_df_2021)

Unnamed: 0,country,ladder_score,gdp,social_support,life_expectancy,freedom_score,generosity,corruption
0,Finland,7.842,10.775,0.954,72.000,0.949,-0.098,0.186
1,Denmark,7.620,10.933,0.954,72.700,0.946,0.030,0.179
2,Switzerland,7.571,11.117,0.942,74.400,0.919,0.025,0.292
3,Iceland,7.554,10.878,0.983,73.000,0.955,0.160,0.673
4,Netherlands,7.464,10.932,0.942,72.400,0.913,0.175,0.338
...,...,...,...,...,...,...,...,...
144,Lesotho,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
145,Botswana,3.467,9.782,0.784,59.269,0.824,-0.246,0.801
146,Rwanda,3.415,7.676,0.552,61.400,0.897,0.061,0.167
147,Zimbabwe,3.145,7.943,0.750,56.201,0.677,-0.047,0.821


In [18]:
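# Pulling the 2016 World Happiness Report to match against 2021's.
# Pulling all relevant columns to match the 2021 report.
# Note: pandas ignores the order of usecols, so [..., 11, 10] selects the same
# columns as [..., 10, 11] and the names are applied in file order. Confirm that
# 'generosity' and 'corruption' line up with the intended source columns.
whr_df_2016 = pd.read_csv(r'world-happiness-report-2016.csv',
                          header=None, usecols=[0, 3, 6, 7, 8, 9, 11, 10],
                          names=['country', 'ladder_score', 'gdp', 'social_support',
                                 'life_expectancy', 'freedom_score', 'generosity', 'corruption'])
display(whr_df_2016)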

Unnamed: 0,country,ladder_score,gdp,social_support,life_expectancy,freedom_score,generosity,corruption
0,Denmark,7.526,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171
1,Switzerland,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083
2,Iceland,7.501,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678
3,Norway,7.498,1.57744,1.12690,0.79579,0.59609,0.35776,0.37895
4,Finland,7.413,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492
...,...,...,...,...,...,...,...,...
152,Benin,3.484,0.39499,0.10419,0.21028,0.39747,0.06681,0.20180
153,Afghanistan,3.360,0.38227,0.11037,0.17344,0.16430,0.07112,0.31268
154,Togo,3.303,0.28123,0.00000,0.24811,0.34678,0.11587,0.17517
155,Syria,3.069,0.74719,0.14866,0.62994,0.06912,0.17233,0.48397


# Preliminary Analysis

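We can see from the top 10 records of each dataset that the majority of the countries are still included five years later, with the exception of Australia and Canada, which fell out of the top 10 between 2016 and 2021. Austria and Luxembourg have replaced them, and most countries remained in roughly the same position, the exception being Finland, which rose from 5th to 1st. Note that the columns other than ladder_score are on different scales in the two files (for example, Denmark's gdp is 1.44178 in 2016 and 10.933 in 2021), so only the ladder_score and the rankings are directly comparable across years.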
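The comparison above can be checked with a few set operations. This is a minimal sketch in plain Python, with the country names and ladder scores copied by hand from the two `head(10)` outputs so that it runs without the CSV files; the variable names are my own and are not part of the notebook.

```python
# Top-10 countries and ladder scores copied from the two head(10) outputs.
top10_2016 = {
    'Denmark': 7.526, 'Switzerland': 7.509, 'Iceland': 7.501, 'Norway': 7.498,
    'Finland': 7.413, 'Canada': 7.404, 'Netherlands': 7.339,
    'New Zealand': 7.334, 'Australia': 7.313, 'Sweden': 7.291,
}
top10_2021 = {
    'Finland': 7.842, 'Denmark': 7.620, 'Switzerland': 7.571, 'Iceland': 7.554,
    'Netherlands': 7.464, 'Norway': 7.392, 'Sweden': 7.363,
    'Luxembourg': 7.324, 'New Zealand': 7.277, 'Austria': 7.268,
}

# Countries that dropped out of, entered, or stayed in the top 10.
left_top10 = sorted(set(top10_2016) - set(top10_2021))
entered_top10 = sorted(set(top10_2021) - set(top10_2016))
stayed = set(top10_2016) & set(top10_2021)

# Change in ladder_score for countries in the top 10 in both years.
score_change = {c: round(top10_2021[c] - top10_2016[c], 3) for c in stayed}

print('Left the top 10:', left_top10)
print('Entered the top 10:', entered_top10)
for country, delta in sorted(score_change.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{country}: {delta:+.3f}')
```

With the DataFrames loaded, the same sets could be built from `set(whr_df_2016.head(10)['country'])` and `set(whr_df_2021.head(10)['country'])` instead of the hand-copied dictionaries.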

In [19]:
# Preliminary top 10 analysis
whr_df_2021.head(10)

Unnamed: 0,country,ladder_score,gdp,social_support,life_expectancy,freedom_score,generosity,corruption
0,Finland,7.842,10.775,0.954,72.0,0.949,-0.098,0.186
1,Denmark,7.62,10.933,0.954,72.7,0.946,0.03,0.179
2,Switzerland,7.571,11.117,0.942,74.4,0.919,0.025,0.292
3,Iceland,7.554,10.878,0.983,73.0,0.955,0.16,0.673
4,Netherlands,7.464,10.932,0.942,72.4,0.913,0.175,0.338
5,Norway,7.392,11.053,0.954,73.3,0.96,0.093,0.27
6,Sweden,7.363,10.867,0.934,72.7,0.945,0.086,0.237
7,Luxembourg,7.324,11.647,0.908,72.6,0.907,-0.034,0.386
8,New Zealand,7.277,10.643,0.948,73.4,0.929,0.134,0.242
9,Austria,7.268,10.906,0.934,73.3,0.908,0.042,0.481


In [20]:
# Preliminary top 10 analysis
whr_df_2016.head(10)

Unnamed: 0,country,ladder_score,gdp,social_support,life_expectancy,freedom_score,generosity,corruption
0,Denmark,7.526,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171
1,Switzerland,7.509,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083
2,Iceland,7.501,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678
3,Norway,7.498,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895
4,Finland,7.413,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492
5,Canada,7.404,1.44015,1.0961,0.8276,0.5737,0.31329,0.44834
6,Netherlands,7.339,1.46468,1.02912,0.81231,0.55211,0.29927,0.47416
7,New Zealand,7.334,1.36066,1.17278,0.83096,0.58147,0.41904,0.49401
8,Australia,7.313,1.44443,1.10476,0.8512,0.56837,0.32331,0.47407
9,Sweden,7.291,1.45181,1.08764,0.83121,0.58218,0.40867,0.38254


# Life Expectancy + Freedom

I would argue that life_expectancy and freedom_score are the two most important indicators in this dataset. One typically wishes for a long and healthy life, with the freedom from oppression to enjoy that life to the fullest. The analysis below sorts by life_expectancy and then by freedom_score, which places Singapore first: it has the highest healthy life expectancy (76.953) together with a high freedom_score (0.927). Singapore also has a higher GDP and lower perceived corruption than Finland, yet it sits at row index 31 (32nd overall), roughly thirty places below Finland.

In [21]:
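# life_expectancy and freedom_score analysis.
# Sort by life_expectancy first, then by freedom_score, both descending.
# The groupby keeps at most 10 rows for each distinct life_expectancy value.

whr_df_2021.sort_values(['life_expectancy', 'freedom_score'], ascending=False).groupby('life_expectancy').head(10)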

Unnamed: 0,country,ladder_score,gdp,social_support,life_expectancy,freedom_score,generosity,corruption
31,Singapore,6.377,11.488,0.915,76.953,0.927,-0.018,0.082
76,Hong Kong S.A.R. of China,5.477,11.000,0.836,76.820,0.717,0.067,0.403
55,Japan,5.940,10.611,0.884,75.100,0.796,-0.258,0.638
26,Spain,6.491,10.571,0.932,74.700,0.761,-0.081,0.745
2,Switzerland,7.571,11.117,0.942,74.400,0.919,0.025,0.292
...,...,...,...,...,...,...,...,...
129,Swaziland,4.308,9.065,0.770,50.833,0.647,-0.185,0.708
84,Ivory Coast,5.306,8.551,0.644,50.114,0.741,-0.016,0.794
115,Nigeria,4.759,8.533,0.740,50.102,0.737,0.037,0.878
144,Lesotho,3.512,7.926,0.787,48.700,0.715,-0.131,0.915
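
Because the sort above orders by life_expectancy first, freedom_score only breaks ties. A minimal sketch of one alternative is to rescale both columns to a 0-1 range and average them. The equal weighting, the min-max scaling, and the helper name `min_max` are my own assumptions for illustration, and the rows are a hand-copied subset of the outputs above, so the resulting order depends on which rows are included.

```python
# (country, life_expectancy, freedom_score) copied from the outputs above.
rows = [
    ('Singapore', 76.953, 0.927),
    ('Hong Kong S.A.R. of China', 76.820, 0.717),
    ('Japan', 75.100, 0.796),
    ('Spain', 74.700, 0.761),
    ('Switzerland', 74.400, 0.919),
    ('Norway', 73.300, 0.960),
    ('Finland', 72.000, 0.949),
    ('Lesotho', 48.700, 0.715),
]


def min_max(values):
    """Rescale a list of numbers to the 0-1 range (hypothetical helper)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


life_scaled = min_max([r[1] for r in rows])
freedom_scaled = min_max([r[2] for r in rows])

# Equal weighting of the two scaled columns is an assumption, not the report's method.
combined = [
    (name, (life + free) / 2)
    for (name, _, _), life, free in zip(rows, life_scaled, freedom_scaled)
]
combined.sort(key=lambda item: item[1], reverse=True)

for name, score in combined:
    print(f'{name}: {score:.3f}')
```

Under this scoring a country needs to do well on both measures to rank highly, so a long life expectancy alone no longer guarantees a top position.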


In [22]:
# Here I learned how to use bokeh to show the relationship between life_expectancy and freedom_score.
# A random sample of 50 countries is plotted; pass whr_df_2021 to ColumnDataSource instead to plot all countries.

output_file('scatterplot_data.html')

sample = whr_df_2021.sample(50)
source = ColumnDataSource(sample)

p = figure()
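# scatter() draws circle markers by default; circle() with size is deprecated in newer bokeh releases.
p.scatter(x='life_expectancy', y='freedom_score',
          source=source,
          size=10, color='green')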

p.title.text = 'Life Expectancy and Freedom Score'
p.xaxis.axis_label = 'life_expectancy'
p.yaxis.axis_label = 'freedom_score'

show(p)

# Conclusion

I had hoped to find that life_expectancy and freedom_score are the primary indicators of what it takes to be a top 10 country in the World Happiness Report. What I found instead was that Singapore comes first when sorting by life_expectancy and then freedom_score, yet it places only 32nd overall (row index 31) because of its comparatively low ladder_score of 6.377. Life expectancy also appears to skew the results: because the sort orders by life_expectancy first, Japan and Spain, with lower freedom_scores than many of the Nordic countries, still rank higher in my grouping because their citizens live longer.
My thought process was that, in general, people want to live long, healthy lives and have enough freedom to enjoy them. Whether one lives long in enslavement or briefly in opulence, both cases make for a poor quality of life and little happiness.
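
One way to test the hypothesis more directly is to measure how strongly each indicator correlates with ladder_score. This is a minimal sketch in plain Python using only rows that appear in the outputs above; the `pearson` helper is my own, and because these rows come mostly from the top and bottom of the tables, the coefficients are likely to overstate the relationship in the full dataset.

```python
from math import sqrt

# (life_expectancy, freedom_score, ladder_score) copied from the outputs above.
rows = [
    (72.000, 0.949, 7.842),  # Finland
    (72.700, 0.946, 7.620),  # Denmark
    (74.400, 0.919, 7.571),  # Switzerland
    (76.953, 0.927, 6.377),  # Singapore
    (76.820, 0.717, 5.477),  # Hong Kong S.A.R. of China
    (75.100, 0.796, 5.940),  # Japan
    (74.700, 0.761, 6.491),  # Spain
    (50.833, 0.647, 4.308),  # Swaziland
    (50.114, 0.741, 5.306),  # Ivory Coast
    (50.102, 0.737, 4.759),  # Nigeria
    (48.700, 0.715, 3.512),  # Lesotho
    (59.269, 0.824, 3.467),  # Botswana
    (61.400, 0.897, 3.415),  # Rwanda
    (56.201, 0.677, 3.145),  # Zimbabwe
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


life = [r[0] for r in rows]
freedom = [r[1] for r in rows]
ladder = [r[2] for r in rows]

r_life = pearson(life, ladder)
r_freedom = pearson(freedom, ladder)

print(f'life_expectancy vs ladder_score: {r_life:.3f}')
print(f'freedom_score vs ladder_score:   {r_freedom:.3f}')
```

On the full DataFrame, `whr_df_2021[['life_expectancy', 'freedom_score', 'ladder_score']].corr()` would give the same kind of coefficients for every country at once.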