World Happiness Data Analysis

Happiness can be considered the most important aspect determining quality of life. The 2015 World Happiness Report ranks countries from the happiest population to the least happy based on the Happiness Score, a metric obtained by asking sampled respondents: "How would you rate your happiness on a scale of 0 to 10, where 10 is the happiest?" The report also considers variables such as national GDP (economy) and trust in government, in terms of how much each contributes to a country's Happiness Score.

Data Loading

In [27]:
import pandas as pd

In [28]:
%matplotlib inline

import matplotlib as mpl
mpl.rcParams["figure.figsize"] = (12,12)

In [29]:
world = pd.read_csv("data/2015_happiness.csv")

In [30]:
world.head()

,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,2.51738
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,2.70201
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,2.49204
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,2.46531
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,2.45176


In [31]:
world.shape

(158, 12)

In [32]:
world.dtypes

Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Standard Error                   float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dystopia Residual                float64
dtype: object

In [33]:
world = world.drop(["Dystopia Residual"], axis=1)

The columns are renamed to simpler snake_case identifiers.

In [34]:
world = world.rename(columns={
    "Country":"country",
    "Region":"region", 
    "Happiness Rank":"happiness_rank",
    "Happiness Score":"happiness_score",
    "Standard Error":"standard_error", 
    "Economy (GDP per Capita)":"economy",
    "Family":"family",
    "Health (Life Expectancy)":"health",
    "Freedom":"freedom",
    "Trust (Government Corruption)":"gov_trust",
    "Generosity":"generosity"
})

Goal of the Analysis: Determine, based on region, economic factors, and political factors, which elements matter most for happiness in a country or continent. A secondary goal is to predict countries' happiness scores and their likely future rankings.
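As a first, rough way to approach this goal, one could compare mean scores by region and rank the factor columns by their correlation with the happiness score. The sketch below is only an illustration under stated assumptions: it rebuilds the five rows printed by world.head() above (using the renamed columns) so that it runs on its own, and five rows are far too few to support any conclusion. On the full dataset, the same two calls would be applied to world instead of sample.

```python
import pandas as pd

# Illustrative sample: the five rows shown in the world.head() output,
# with the renamed (snake_case) column names.
sample = pd.DataFrame({
    "country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "region": ["Western Europe"] * 4 + ["North America"],
    "happiness_score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "economy": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
    "family": [1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
    "health": [0.94143, 0.94784, 0.87464, 0.88521, 0.90563],
    "freedom": [0.66557, 0.62877, 0.64938, 0.66973, 0.63297],
    "gov_trust": [0.41978, 0.14145, 0.48357, 0.36503, 0.32957],
    "generosity": [0.29678, 0.43630, 0.34139, 0.34699, 0.45811],
})

factors = ["economy", "family", "health", "freedom", "gov_trust", "generosity"]

# Regional view: mean happiness score per region.
region_means = sample.groupby("region")["happiness_score"].mean()

# Factor view: Pearson correlation of each factor with the score,
# sorted from most positively to most negatively correlated.
correlations = (
    sample[factors]
    .corrwith(sample["happiness_score"])
    .sort_values(ascending=False)
)

print(region_means)
print(correlations)
```

Correlation is only a descriptive starting point here; it does not by itself show which factors determine happiness.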

Data Dictionary

    Country (categorical). The name of the country
    Region (categorical). The continent or general area of the country
    Happiness Rank (int). The country's position when ordered by Happiness Score (1 is the happiest)
    Happiness Score (float). Based on responses to the 0-to-10 happiness survey question
    Standard Error (float). The standard error of the Happiness Score
    Economy (float). The extent to which GDP per capita contributes to the Happiness Score
    Family (float). The extent to which family contributes to the Happiness Score
    Health (float). The extent to which life expectancy contributes to the Happiness Score
    Freedom (float). The extent to which freedom contributes to the Happiness Score
    Trust (float). The extent to which the perception of government corruption contributes to the Happiness Score
    Generosity (float). The extent to which generosity contributes to the Happiness Score
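To make "the extent to which a factor contributes" concrete: in the rows printed by world.head() above, the six factor columns plus the Dystopia Residual add up to the Happiness Score, to within rounding. The sketch below checks this on those five rows only. The name dystopia_residual is a hypothetical snake_case label for the "Dystopia Residual" column, which this notebook drops before renaming, so the check would have to run before that drop if applied to the full dataset.

```python
import pandas as pd

# The five rows shown in the world.head() output.
# "dystopia_residual" is a hypothetical name for the dropped column.
sample = pd.DataFrame({
    "country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "happiness_score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "economy": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
    "family": [1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
    "health": [0.94143, 0.94784, 0.87464, 0.88521, 0.90563],
    "freedom": [0.66557, 0.62877, 0.64938, 0.66973, 0.63297],
    "gov_trust": [0.41978, 0.14145, 0.48357, 0.36503, 0.32957],
    "generosity": [0.29678, 0.43630, 0.34139, 0.34699, 0.45811],
    "dystopia_residual": [2.51738, 2.70201, 2.49204, 2.46531, 2.45176],
})

components = [
    "economy", "family", "health", "freedom",
    "gov_trust", "generosity", "dystopia_residual",
]

# Sum the component columns row by row and compare with the reported score.
reconstructed = sample[components].sum(axis=1)
max_gap = (reconstructed - sample["happiness_score"]).abs().max()

print(reconstructed.round(3).tolist())
print(f"largest absolute gap: {max_gap:.5f}")
```

Because the score decomposes additively in these rows, each factor column can be read as the number of score points attributed to that factor.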
    

In [35]:
world.to_csv("data/world_happiness.1.initial_process.csv", index=False)