Linear Regression on the World Happiness Report

In [1]:
from IPython.display import Image
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
import warnings
warnings.simplefilter("ignore")
%matplotlib inline

matplotlib.rcParams['figure.figsize'] = [12, 12]

In [2]:
world = pd.read_csv("data/world_happiness.1.initial_process.csv")

In [3]:
world.head()

Unnamed: 0,country,region,happiness_rank,happiness_score,standard_error,economy,family,health,freedom,gov_trust,generosity
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811


Select the numerical variables that are expected to influence the target variable, which is the happiness score.

In [4]:
numerical_data = world[["happiness_score", "economy", "family", "health", "freedom", "gov_trust", "generosity"]]
numerical_data = numerical_data.fillna(0)
numerical_data.head()

Unnamed: 0,happiness_score,economy,family,health,freedom,gov_trust,generosity
0,7.587,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678
1,7.561,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363
2,7.527,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139
3,7.522,1.459,1.33095,0.88521,0.66973,0.36503,0.34699
4,7.427,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811


Creating a Linear Regression Model

In [5]:
from sklearn.linear_model import LinearRegression

In [6]:
model = LinearRegression()

In [7]:
target_variable = numerical_data["happiness_score"]
independent_variables = numerical_data[["economy", "family", "health", "freedom", "gov_trust", "generosity"]]

In [8]:
model.fit(X=independent_variables, y=target_variable)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
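
Once a model like this is fitted, its coefficients and in-sample R^2 can be inspected to see how each feature contributes. The following is a minimal sketch, not the notebook's own analysis: it refits the same kind of model on only the ten top-ranked rows displayed in this notebook (hard-coded so the block runs without the CSV), so its numbers will differ from those of the model fitted on the full dataset. The names `rows`, `sample`, `sketch_model`, and `coefficients` are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["economy", "family", "health", "freedom", "gov_trust", "generosity"]

# Ten top-ranked 2015 rows as displayed in this notebook (not the full dataset).
rows = [
    (7.587, 1.39651, 1.34951, 0.94143, 0.66557, 0.41978, 0.29678),
    (7.561, 1.30232, 1.40223, 0.94784, 0.62877, 0.14145, 0.4363),
    (7.527, 1.32548, 1.36058, 0.87464, 0.64938, 0.48357, 0.34139),
    (7.522, 1.459, 1.33095, 0.88521, 0.66973, 0.36503, 0.34699),
    (7.427, 1.32629, 1.32261, 0.90563, 0.63297, 0.32957, 0.45811),
    (7.406, 1.29025, 1.31826, 0.88911, 0.64169, 0.41372, 0.23351),
    (7.378, 1.32944, 1.28017, 0.89284, 0.61576, 0.31814, 0.4761),
    (7.364, 1.33171, 1.28907, 0.91087, 0.6598, 0.43844, 0.36262),
    (7.286, 1.25018, 1.31967, 0.90837, 0.63938, 0.42922, 0.47501),
    (7.284, 1.33358, 1.30923, 0.93156, 0.65124, 0.35637, 0.43562),
]
sample = pd.DataFrame(rows, columns=["happiness_score"] + features)

sketch_model = LinearRegression()
sketch_model.fit(sample[features], sample["happiness_score"])

# One coefficient per feature, in the same order as the feature columns.
coefficients = pd.Series(sketch_model.coef_, index=features)
print(coefficients)
print("intercept:", sketch_model.intercept_)

# score() returns the coefficient of determination (R^2) on the data passed in.
r_squared = sketch_model.score(sample[features], sample["happiness_score"])
print("in-sample R^2:", r_squared)
```

With the full dataset, the same three attributes (`coef_`, `intercept_`, and `score`) can be read from the fitted `model` directly.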

In [9]:
predictions = model.predict(independent_variables)

In [10]:
# Work on a copy so that adding columns does not affect the slice of numerical_data.
df = independent_variables.copy()
df["happiness_score"] = target_variable
df.head()

Unnamed: 0,economy,family,health,freedom,gov_trust,generosity,happiness_score
0,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,7.587
1,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,7.561
2,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,7.527
3,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,7.522
4,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,7.427


In [11]:
df["score_pred"] = model.predict(df[["economy", "family", "health", "freedom", "gov_trust", "generosity"]])

In [12]:
df.head(10)

Unnamed: 0,economy,family,health,freedom,gov_trust,generosity,happiness_score,score_pred
0,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678,7.587,7.213854
1,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363,7.561,7.00015
2,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139,7.527,7.148984
3,1.459,1.33095,0.88521,0.66973,0.36503,0.34699,7.522,7.168777
4,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811,7.427,7.029107
5,1.29025,1.31826,0.88911,0.64169,0.41372,0.23351,7.406,6.96614
6,1.32944,1.28017,0.89284,0.61576,0.31814,0.4761,7.378,6.934631
7,1.33171,1.28907,0.91087,0.6598,0.43844,0.36262,7.364,7.075677
8,1.25018,1.31967,0.90837,0.63938,0.42922,0.47501,7.286,7.055432
9,1.33358,1.30923,0.93156,0.65124,0.35637,0.43562,7.284,7.07846


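Strangely, every predicted score in the ten rows shown is lower than the observed score. Note that the model was fitted on this same 2015 data, so these are in-sample fitted values for 2015 rather than forecasts for a future year. Perhaps the data given in the original set is not intended to be used as a predictor. However, when compared to the next year's list (2016) of the happiest nations, many of the top nations drop in rank or score.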
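
The gap between observed and predicted scores in the table above can be quantified as residuals. This is a small sketch that uses only the ten rows displayed in the output of `df.head(10)`; the variable names `observed`, `predicted`, and `residuals` are illustrative and not part of the original notebook.

```python
# Observed 2015 scores and model predictions for the ten rows shown above.
observed = [7.587, 7.561, 7.527, 7.522, 7.427, 7.406, 7.378, 7.364, 7.286, 7.284]
predicted = [7.213854, 7.00015, 7.148984, 7.168777, 7.029107,
             6.96614, 6.934631, 7.075677, 7.055432, 7.07846]

# Residual = observed minus predicted; a positive value means the model under-predicts.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]
mean_residual = sum(residuals) / len(residuals)

for obs, pred, res in zip(observed, predicted, residuals):
    print(f"observed={obs:.3f}  predicted={pred:.3f}  residual={res:+.3f}")
print(f"mean residual over these ten rows: {mean_residual:.3f}")
```

Because an ordinary least squares fit with an intercept has residuals that average to zero over its training data, positive residuals among the top-ranked countries must be balanced by negative residuals elsewhere in the ranking.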

In [13]:
# 2015 data
world.head()

Unnamed: 0,country,region,happiness_rank,happiness_score,standard_error,economy,family,health,freedom,gov_trust,generosity
0,Switzerland,Western Europe,1,7.587,0.03411,1.39651,1.34951,0.94143,0.66557,0.41978,0.29678
1,Iceland,Western Europe,2,7.561,0.04884,1.30232,1.40223,0.94784,0.62877,0.14145,0.4363
2,Denmark,Western Europe,3,7.527,0.03328,1.32548,1.36058,0.87464,0.64938,0.48357,0.34139
3,Norway,Western Europe,4,7.522,0.0388,1.459,1.33095,0.88521,0.66973,0.36503,0.34699
4,Canada,North America,5,7.427,0.03553,1.32629,1.32261,0.90563,0.63297,0.32957,0.45811


In [14]:
# 2016 data
world_2016 = pd.read_csv("data/2016.csv")
world_2016.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Lower Confidence Interval,Upper Confidence Interval,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Denmark,Western Europe,1,7.526,7.46,7.592,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939
1,Switzerland,Western Europe,2,7.509,7.428,7.59,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463
2,Iceland,Western Europe,3,7.501,7.333,7.669,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137
3,Norway,Western Europe,4,7.498,7.421,7.575,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465
4,Finland,Western Europe,5,7.413,7.351,7.475,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2.82596


Of the five top-ranked nations in 2015, the four that also appear in the 2016 top five (Switzerland, Iceland, Denmark, and Norway) have lower happiness scores in 2016 than in 2015, and Canada no longer appears in the top five.
However, the regression model seems to exaggerate the size of the drop, and for that reason, this dataset is not the most reliable basis for predicting a country's future happiness.
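
The year-over-year comparison can be made explicit by joining the two tables on country name. The sketch below hard-codes only the top-five rows displayed above for each year, so it is an illustration rather than a full comparison; the names `top_2015`, `top_2016`, and `both_years` are assumptions introduced here. With the real files, the 2016 columns `Country` and `Happiness Score` would first need to be renamed to match the 2015 column names.

```python
import pandas as pd

# Top-five rows as displayed above for each year (not the full files).
top_2015 = pd.DataFrame({
    "country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "score_2015": [7.587, 7.561, 7.527, 7.522, 7.427],
})
top_2016 = pd.DataFrame({
    "country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
    "score_2016": [7.526, 7.509, 7.501, 7.498, 7.413],
})

# An inner join keeps only the countries present in both displayed tables.
both_years = top_2015.merge(top_2016, on="country", how="inner")

# A negative change means the score fell from 2015 to 2016.
both_years["change"] = both_years["score_2016"] - both_years["score_2015"]
print(both_years)
```

Canada and Finland each appear in only one of the two displayed tables, so the inner join drops them; an outer join on the full files would keep every country.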

The main issue with this dataset is likely that it is extremely difficult to accurately measure a person's happiness on a scale of 0-10, let alone average the scores across an entire country. The report can be affected by various sources of bias, such as respondents not answering the questions honestly or having different relative scales for what counts as "maximum happiness".

The dataset should ultimately still be considered informative, since it shows that a stronger economy, better healthcare, and better general treatment of citizens are important factors in a nation's collective happiness. However, once again, it is not well suited to making accurate predictions.