# The Happiest Places on Earth 2016

## Table of Contents

- [Introduction](#introduction)
- [Happiness Trend Over Time](#happiness_trend)
- [Happiness by Country (2016)](#happiness_2016)
- [Happiness vs GDP per capita](#happiness_gdp)
- [Why are Latin Americans Happier?](#happiness_latin)
   - [Happiness vs Social Support](#happiness_social)
   - [Happiness vs Religion](#happiness_religion)
   - [Happiness vs Gene](#happiness_gene)
- [Other Data Exploration](#happiness_other)
- [Conclusion](#conclusion)
- [Limitations & Future Research Directions](#happiness_limitation)
- [References](#happiness_reference)

<a id='introduction'></a>

## Introduction

The World Happiness Report is a landmark survey by the United Nations Sustainable Development Solutions Network about the state of global happiness. Each year it surveys people from more than 150 countries, asking about their happiness and other aspects of their lives such as social support and freedom. The published data contains country-level survey results, as well as some country performance indicators such as GDP and life expectancy.

This project is inspired by that report. By visualizing the 2016 dataset, we look at which countries are happier and what makes people in a particular region happier than others.

<a id='happiness_trend'></a>

## Happiness Trend Over Time

The happiness score is the national average of responses to the following survey question:

"Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?"

Our data is a panel dataset that measures happiness by country over time. We suspect that happiness does not fluctuate much from year to year, in which case we can simply use the most recent data. Let's check this assumption by looking at the trend of the happiness score over time.

In [91]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
%matplotlib inline

In [28]:
df_ts = pd.read_csv('Original_2017_full.csv')

In [29]:
new_names = ['country', 'year', 'happiness', 'log_gdp_per_cap', 'social_support', 
             'life_expectancy', 'freedom', 'generosity', 'corruption_perception', 
             'positive_affect', 'negative_affect', 'confidence_in_government', 
             'democratic_quality', 'delivery_quality', 'happiness_sd', 
             'happiness_sd/mean', 'gini_index', 'gini_index 2000-15', 
             'household_income_gini']

In [30]:
## rename variables for better naming convention
df_ts.columns = new_names

In [31]:
len(df_ts['country'].unique())

164

The data covers 164 countries, which is too many to inspect one by one. We therefore look at the trend by major region instead.

In [32]:
df_re = pd.read_csv('Original_2017_region.csv').rename(columns={'Region indicator': 'region'})

In [33]:
## merge with dataset that has region indicator
df_ts = pd.merge(df_ts, df_re, on='country')

In [92]:
layout = go.Layout(title="Happiness Trend (2006 - 2017)", font=dict(size=18), 
                   xaxis=dict(title='Year', titlefont=dict(size=18), 
                              tickfont=dict(size=14)),
                   yaxis=dict(range=[0, 8], title='Happiness', 
                              titlefont=dict(size=18), tickfont=dict(size=14)),legend=dict(font=dict(size=10)))
## average happiness per region and year, computed once and reused for every trace
df_trend = df_ts.groupby(['region', 'year'], as_index=False)['happiness'].mean()
fig = {'data': [{'x': df_trend.loc[df_trend['region'] == region, 'year'],
                 'y': df_trend.loc[df_trend['region'] == region, 'happiness'],
                 'name': region, 'mode': 'lines'} for region in df_ts['region'].unique()],
       'layout': layout}
py.iplot(fig)

We can see some fluctuations between 2005 and 2009. Looking closely at the data, the fluctuations in the early years are probably due to a data collection methodology that was not yet well established, which left quite a few missing values. The trend has been fairly stable in recent years, so we decided to focus on the 2016 data only. (2016 is used instead of 2017 because 2017 has more missing values.)
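One way to back the choice of 2016 over 2017 is to count the missing cells per year. The sketch below uses a small made-up frame (`toy_ts`, with invented values) in place of `df_ts`; the same approach should carry over to the real data, assuming it has a `year` column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_ts; all values are invented for illustration.
toy_ts = pd.DataFrame({
    'year': [2016, 2016, 2017, 2017],
    'happiness': [5.0, 6.0, 5.1, 6.1],
    'social_support': [0.8, 0.9, np.nan, 0.9],
    'gini_index': [0.3, np.nan, np.nan, np.nan],
})

# Count the missing cells in each row, then total them per year.
missing_per_year = toy_ts.isna().sum(axis=1).groupby(toy_ts['year']).sum()
print(missing_per_year)
```

A year with a clearly higher total is a weaker candidate for a single-year snapshot.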

<a id='happiness_2016'></a>

## Happiness by Country (2016)

In [101]:
df_2016 = df_ts[df_ts['year'] == 2016]

The map needs country codes, so we merge this dataset with another one that contains them. The same merge also brings in the percentage of religious people per country, which we use later.

In [102]:
df_religion = pd.read_csv('relig_iso.csv')
# df_religion

In [103]:
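## copy the selected columns so that adding a new column does not act on a view
df_religion = df_religion[['iso', 'country', 'percentage_non_religious']].copy()
## share of people with a religion = 100 - share of non-religious people
df_religion['religion_pct'] = 100 - df_religion['percentage_non_religious']
df_religion = df_religion[['iso', 'country', 'religion_pct']].rename(columns={'iso': 'country_code'})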

A few country names in this dataset do not match those in our original happiness dataset, so we need to rename them.

In [41]:
new_names = {'Bosnia Herzegovina': 'Bosnia and Herzegovina',
             'Republic of Congo': 'Congo (Brazzaville)',
             'Democratic Republic of the Congo': 'Congo (Kinshasa)',
             'Finland ': 'Finland',
             'Kyrgyz Republic': 'Kyrgyzstan',
             'Macedonia (FYR)': 'Macedonia',
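             ## caution: Sudan and South Sudan are different countries; this mapping
             ## reuses Sudan's country code and religion figure for South Sudan
             'Sudan': 'South Sudan',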
             'Taiwan': 'Taiwan Province of China',
             'United States of America': 'United States'}

In [104]:
df_religion.replace({'country': new_names}, inplace=True)

In [106]:
## Merge the datasets, keep all rows in the df dataset
df_2016 = pd.merge(df_2016, df_religion, how='left', on='country')
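
To check that the renaming caught every mismatch, the merge can be run with pandas' `indicator=True` option, which adds a `_merge` column marking rows that found no partner. A minimal sketch on invented data (`toy_happy`, `toy_relig` and all their values are illustrative and not taken from the real files):

```python
import pandas as pd

# Invented stand-ins for the happiness and religion tables.
toy_happy = pd.DataFrame({'country': ['Finland', 'Kyrgyzstan', 'Macedonia'],
                          'happiness': [7.6, 5.2, 5.3]})
toy_relig = pd.DataFrame({'country': ['Finland ', 'Kyrgyz Republic', 'Macedonia'],
                          'religion_pct': [70.0, 90.0, 95.0]})

# A left merge keeps every happiness row; _merge records whether a match was found.
checked = pd.merge(toy_happy, toy_relig, how='left', on='country', indicator=True)

# Countries that are still unmatched and may need an entry in the rename dict.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].tolist()
print(unmatched)
```

Here the trailing space in `'Finland '` and the different name for Kyrgyzstan prevent a match, which is the kind of problem the rename dictionary above is meant to fix.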

In [107]:
df_hm = df_2016[['country', 'country_code', 'happiness']]

In [108]:
data = [dict(type='choropleth', locations=df_hm['country_code'], z=df_hm['happiness'],
             text=df_hm['country'],
             colorscale=[[0, "rgb(0, 255, 0)"], [0.25, "rgb(122, 255, 122)"],
                         [0.5, "rgb(220, 220, 220)"], [0.75, "rgb(255, 128, 128)"],
                         [1, "rgb(255, 0, 0)"]],
             autocolorscale=False, reversescale=True,
             marker=dict(line=dict(color='rgb(180, 180, 180)', width=0.5)),
             colorbar=dict(autotick=False, title='Happiness'),)]
layout = dict(title='Happiness by Country (2016)', font=dict(size=18),
              geo=dict(showframe=False, showcoastlines=False,
                       projection=dict(type='Mercator')))
fig = dict(data=data, layout=layout)
py.iplot(fig, validate=False, filename='world-heatmap')

The map shows that people in North America, Australia, New Zealand and Western Europe are the happiest, followed by Latin America. People in Africa generally have the lowest happiness scores.

The low happiness scores of Venezuela and Haiti stand out among Latin American countries. This is not surprising: Venezuela's inflation has soared to more than 4000%, and Haiti is very vulnerable to natural disasters (Ref: 2, 3).

<a id='happiness_gdp'></a>

## Can money buy happiness?

The countries that top the happiness list tend to be wealthy. Next, we want to find out whether there really is a relationship between a country's happiness and its GDP per capita.

In [109]:
df_2016['gdp_per_cap'] = np.exp(df_2016['log_gdp_per_cap'])

In [120]:
df_gdp = df_2016[['region', 'happiness', 'gdp_per_cap', 'country']]

In [121]:
## Prepare the data for the right format for charting
df_gdp_2 = pd.get_dummies(df_gdp['region'])

In [122]:
df_gdp = pd.concat([df_gdp, df_gdp_2], axis=1)

In [123]:
region_list = df_gdp_2.columns

In [124]:
for a in region_list:
    df_gdp[a + '_happiness'] = df_gdp[a] * df_gdp['happiness']
    df_gdp[a + '_gdp_per_cap'] = df_gdp[a] * df_gdp['gdp_per_cap']
    df_gdp[a + '_country'] = df_gdp[a] * df_gdp['country']

In [125]:
df_gdp.replace(0, np.nan, inplace=True)
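
The dummy-column steps above can also be written as a plain `groupby`, which yields one trace per region without multiplying columns or replacing zeros (a step that would also blank out any genuine zero in the data). This is only a sketch, assuming plain dicts are acceptable as traces, as in the trend chart earlier; `toy_gdp` and its values are invented for illustration.

```python
import pandas as pd

# Invented stand-in for df_gdp; region labels and numbers are placeholders.
toy_gdp = pd.DataFrame({
    'region': ['Latin America', 'Latin America', 'Western Europe'],
    'country': ['Country A', 'Country B', 'Country C'],
    'happiness': [7.1, 6.8, 7.6],
    'gdp_per_cap': [15000.0, 17000.0, 64000.0],
})

# Build one scatter trace per region directly from the grouped rows.
traces = []
for region, rows in toy_gdp.groupby('region'):
    traces.append({'type': 'scatter', 'mode': 'markers', 'name': region,
                   'x': rows['gdp_per_cap'].tolist(),
                   'y': rows['happiness'].tolist(),
                   'text': rows['country'].tolist()})

print([t['name'] for t in traces])
```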

In [126]:
trace = [go.Scatter(y=df_gdp[a + '_happiness'], x=df_gdp[a + '_gdp_per_cap'],
                    text=df_gdp[a + '_country'], mode='markers', name=a) for a in region_list]
data = trace
layout = go.Layout(title='Relationship Between Happiness and GDP', font=dict(size=18),
                   yaxis=dict(title='happiness', range=[0, 8]),
                   xaxis=dict(title='gdp per capita', range=[0, 100000]))
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

Overall, there seems to be a positive relationship between happiness and GDP per capita, with Latin America as an exception: its countries have fairly low GDP, yet their happiness level is comparable to that of some countries with much higher GDP. Compared with other countries that have similarly low GDP, Latin America's happiness level is also the highest.
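
The comparison with countries at a similar GDP level can be made explicit by restricting the data to Latin America's GDP range and averaging happiness by region. The sketch below uses an invented frame (`toy`) with placeholder region labels and made-up numbers; the real region names in `df_2016` may differ.

```python
import pandas as pd

# Invented data; region labels and values are placeholders for illustration.
toy = pd.DataFrame({
    'region': ['Latin America', 'Latin America', 'Sub-Saharan Africa',
               'Southeast Asia', 'Western Europe'],
    'gdp_per_cap': [12000.0, 16000.0, 13000.0, 15000.0, 45000.0],
    'happiness': [6.5, 6.9, 4.5, 5.5, 7.2],
})

# GDP range spanned by the Latin American countries.
latam = toy[toy['region'] == 'Latin America']
gdp_low, gdp_high = latam['gdp_per_cap'].min(), latam['gdp_per_cap'].max()

# Keep only countries inside that range, then compare mean happiness by region.
peers = toy[toy['gdp_per_cap'].between(gdp_low, gdp_high)]
peer_means = peers.groupby('region')['happiness'].mean().sort_values(ascending=False)
print(peer_means)
```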

<a id='happiness_latin'></a>

## Why are Latin Americans happier?

So why can people in Latin America stay happy despite their low GDP? We searched online for possible explanations, and the commonly suggested reasons are the following:
- They don't go solo: They love doing things in big groups, and having someone to share ups and downs with makes people happier. (Ref: 4)
- They are faithful: They love going to church, which helps them keep a positive attitude. (Ref: 5)
- It's in their genes: The presence of a particular allele in the FAAH gene enhances sensory pleasure and helps to reduce pain. Studies have shown that Latin Americans and Scandinavians have more of this allele. (Ref: 1, 6)

We will try to see if our data support these possible explanations.

<a id='happiness_social'></a>

## 1. Latin Americans don't go solo

To examine whether this is the reason why Latin Americans are happier, we refer to the social_support column in our data. Social_support ranges from 0 to 1 and measures how much social support people in a country can get: the higher the value, the more support. It is defined as follows:

Social support (or having someone to count on in times of trouble) is the national average of the binary responses (either 0 or 1) to the question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

In [128]:
df_social = df_2016[['region', 'happiness', 'social_support', 'country']]

In [129]:
df_social_2 = pd.get_dummies(df_social['region'])

In [130]:
df_social = pd.concat([df_social, df_social_2], axis=1)

In [131]:
for a in region_list:
    df_social[a + '_happiness'] = df_social[a] * df_social['happiness']
    df_social[a + '_social_support'] = df_social[a] * df_social['social_support']
    df_social[a + '_country'] = df_social[a] * df_social['country']

In [132]:
df_social.replace(0, np.nan, inplace=True)

In [133]:
trace4 = [go.Scatter(y=df_social[a + '_happiness'], x=df_social[a + '_social_support'],
                     text=df_social[a + '_country'], mode='markers',
                     name=a) for a in region_list]
data4 = trace4
layout4 = go.Layout(title='Relationship Between Happiness and Social Support', 
                    font=dict(size=18), 
                    yaxis=dict(title='happiness', range=[0, 8]),
                    xaxis=dict(title='social support', range=[0, 1]))
fig4 = go.Figure(data=data4, layout=layout4)
py.iplot(fig4)

There appears to be a fairly strong positive correlation between happiness and social support. Thus, support from family and friends may be one of the reasons why Latin Americans, who have high social support scores, are happy.
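
The strength of the relationship seen in the scatter plot can be summarized with a Pearson correlation coefficient. The sketch below runs on made-up numbers (`toy`); on the real data the same call would use the `happiness` and `social_support` columns of `df_2016`.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({'social_support': [0.5, 0.6, 0.7, 0.8, 0.9],
                    'happiness': [4.0, 4.5, 5.5, 6.0, 7.0]})

# Series.corr computes the Pearson correlation by default and skips missing pairs.
r = toy['happiness'].corr(toy['social_support'])
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship; it describes association only and does not show that social support causes happiness.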

<a id='happiness_religion'></a>

## 2. They are faithful

To answer this question, we look at the relationship between a country's happiness and the percentage of its people who are religious. The religion percentage comes from another dataset, which we merged in an earlier step.

In [134]:
df_reli = df_2016[['region', 'happiness', 'religion_pct', 'country']]

In [135]:
df_reli_2 = pd.get_dummies(df_reli['region'])

In [136]:
df_reli = pd.concat([df_reli, df_reli_2], axis=1)

In [137]:
for a in region_list:
    df_reli[a + '_happiness'] = df_reli[a] * df_reli['happiness']
    df_reli[a + '_religion_pct'] = df_reli[a] * df_reli['religion_pct']
    df_reli[a + '_country'] = df_reli[a] * df_reli['country']

In [138]:
df_reli.replace(0, np.nan, inplace=True)

In [139]:
trace1 = [go.Scatter(y=df_reli[a + '_happiness'], x=df_reli[a + '_religion_pct'],
                     text=df_reli[a + '_country'], mode='markers',
                     name=a) for a in region_list]
data1 = trace1
layout1 = go.Layout(title='Relationship Between Happiness and Religion', 
                    font=dict(size=18),
                    yaxis=dict(title='happiness', range=[0, 8]),
                    xaxis=dict(title='religion percentage (%)', range=[0, 110]))
fig1 = go.Figure(data=data1, layout=layout1)
py.iplot(fig1)

This graph shows no obvious relationship between happiness and religion. Some countries with a low percentage of religious people are happy, such as the Netherlands, New Zealand and Sweden; on the other hand, there are also faithful but unhappy countries, such as those in Sub-Saharan Africa. Thus, our data do not support this explanation.

<a id='happiness_gene'></a>

## 3. It's in their genes

Finally, we come to the explanation based on genetic variation. Since studies have shown that Latin Americans and Scandinavians have more of the "happy allele" that helps reduce pain, we check whether the countries where this allele is more common are really happier.

In [140]:
happy_list = ['Denmark', 'Norway', 'Sweden', 'Argentina', 'Bolivia', 'Brazil',
              'Chile', 'Colombia', 'Costa Rica', 'Dominican Republic', 'Ecuador',
              'El Salvador', 'Guatemala', 'Haiti', 'Honduras', 'Mexico', 'Nicaragua',
              'Panama', 'Paraguay', 'Peru', 'Uruguay', 'Venezuela']

In [141]:
## 1 if the country is in the "more allele" list, 0 otherwise
df_2016['happy_gene'] = df_2016['country'].isin(happy_list).astype(int)

In [142]:
trace0 = go.Box(y=df_2016['happiness'][df_2016['happy_gene'] == 0].values, 
                name="Less allele")
trace1 = go.Box(y=df_2016['happiness'][df_2016['happy_gene'] == 1].values,
                name="More allele")
data = [trace0, trace1]
layout = go.Layout(title="Happiness between groups with more vs less happy allele",
                   font=dict(size=18),
                   yaxis=dict(range=[0, 8], title='Happiness', 
                              titlefont=dict(size=18), tickfont=dict(size=14)))
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

Countries in the group with more of the "happy allele", which enhances sensory pleasure and helps reduce pain, do seem to be happier. It is therefore possible that part of the reason why Latin Americans are happier is inborn.
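
The box plot comparison can be complemented with a small summary table per group. The sketch below uses placeholder country names and invented scores (`toy`, `toy_happy_list`); with the real data, the grouping column would be `happy_gene` in `df_2016`.

```python
import pandas as pd

# Placeholder countries and invented scores, for illustration only.
toy = pd.DataFrame({'country': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'happiness': [7.5, 6.8, 3.4, 5.9, 4.4, 3.9]})
toy_happy_list = ['A', 'B', 'C']  # hypothetical "more allele" group

# Flag group membership, then summarize happiness within each group.
toy['happy_gene'] = toy['country'].isin(toy_happy_list).astype(int)
summary = toy.groupby('happy_gene')['happiness'].agg(['count', 'median', 'mean'])
print(summary)
```

Reporting the group sizes next to the medians is useful here because the "more allele" group is much smaller than the rest in the real data.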

<a id='happiness_other'></a>

## Other Data Exploration

We also explore other possible reasons why Latin Americans are happier by looking at the relationship between happiness and freedom, and between happiness and life expectancy.

### Relationship Between Happiness and Freedom

Freedom ranges from 0 to 1 and measures how much freedom people in a country enjoy: the higher the value, the more freedom. It is defined as follows:

Freedom is the national average of responses to the question “Are you satisfied (code as 1) or dissatisfied (code as 0) with your freedom to choose what you do with your life?”

In [143]:
df_freedom = df_2016[['region', 'happiness', 'freedom', 'country']]

In [144]:
df_freedom_2 = pd.get_dummies(df_freedom['region'])

In [145]:
df_freedom = pd.concat([df_freedom, df_freedom_2], axis=1)

In [146]:
for a in region_list:
    df_freedom[a + '_happiness'] = df_freedom[a] * df_freedom['happiness']
    df_freedom[a + '_freedom'] = df_freedom[a] * df_freedom['freedom']
    df_freedom[a + '_country'] = df_freedom[a] * df_freedom['country']

In [147]:
df_freedom.replace(0, np.nan, inplace=True)

In [148]:
trace5 = [go.Scatter(y=df_freedom[a + '_happiness'], x=df_freedom[a + '_freedom'],
                     text=df_freedom[a + '_country'], mode='markers', name=a) for a in region_list]
data5 = trace5
layout5 = go.Layout(title='Relationship Between Happiness and Freedom', font=dict(size=18),
                    yaxis=dict(title='happiness', range=[0, 8]),
                    xaxis=dict(title='freedom level', range=[0, 1]))
fig5 = go.Figure(data=data5, layout=layout5)
py.iplot(fig5)

Happiness appears to be positively correlated with freedom. This may help explain why Latin Americans are happier than people in other countries with similarly low GDP: Latin Americans have more freedom than people in those countries.

### Relationship Between Happiness and Life Expectancy

In [149]:
df_life = df_2016[['region', 'happiness', 'life_expectancy', 'country']]

In [150]:
df_life_2 = pd.get_dummies(df_life['region'])

In [151]:
df_life = pd.concat([df_life, df_life_2], axis=1)

In [152]:
for a in region_list:
    df_life[a + '_happiness'] = df_life[a] * df_life['happiness']
    df_life[a + '_life_expectancy'] = df_life[a] * df_life['life_expectancy']
    df_life[a + '_country'] = df_life[a] * df_life['country']

In [153]:
df_life.replace(0, np.nan, inplace=True)

In [154]:
trace3 = [go.Scatter(y=df_life[a + '_happiness'], x=df_life[a + '_life_expectancy'],
                     text=df_life[a + '_country'], mode='markers', name=a) for a in region_list]
data3 = trace3
layout3 = go.Layout(title='Relationship Between Happiness and Life Expectancy',
                    font=dict(size=18),
                    yaxis=dict(title='happiness', range=[0, 8]),
                    xaxis=dict(title='life expectancy (year)', range=[0, 85]))
fig3 = go.Figure(data=data3, layout=layout3)
py.iplot(fig3)

Not surprisingly, happiness is positively correlated with life expectancy. This may help explain why Latin Americans are happier than people in other countries with similarly low GDP.

### Relationship Between Religion and GDP

In [155]:
df_reli_gdp = df_2016[['region', 'gdp_per_cap', 'religion_pct', 'country']]

In [156]:
df_reli_gdp_2 = pd.get_dummies(df_reli_gdp['region'])

In [157]:
df_reli_gdp = pd.concat([df_reli_gdp, df_reli_gdp_2], axis=1)

In [158]:
for a in region_list:
    df_reli_gdp[a + '_gdp_per_cap'] = df_reli_gdp[a] * df_reli_gdp['gdp_per_cap']
    df_reli_gdp[a + '_religion_pct'] = df_reli_gdp[a] * df_reli_gdp['religion_pct']
    df_reli_gdp[a + '_country'] = df_reli_gdp[a] * df_reli_gdp['country']

In [159]:
df_reli_gdp.replace(0, np.nan, inplace=True)

In [160]:
trace2 = [go.Scatter(y=df_reli_gdp[a + '_gdp_per_cap'], x=df_reli_gdp[a + '_religion_pct'],
                     text=df_reli_gdp[a + '_country'], mode='markers', name=a) for a in region_list]
data2 = trace2
layout2 = go.Layout(title='Relationship Between GDP and Religion',
                    font=dict(size=18),
                    yaxis=dict(title='GDP per capita',range=[0, 100000]),
                    xaxis=dict(title='religion percentage', range=[0, 110]))
fig2 = go.Figure(data=data2, layout=layout2)
py.iplot(fig2)

Interestingly, countries with lower GDP tend to be more religious. Although religion might not be a sure path to happiness, one possible reading is that people turn to religion for comfort when they suffer.

<a id='conclusion'></a>

## Conclusion

We looked at happiness levels by country and found that people in North America, Australia, New Zealand and Western Europe are the happiest, followed by Latin America. When examining the positive relationship between happiness and GDP per capita, we found that Latin Americans are an exception: they are happy despite their low GDP. We then looked for the reasons behind this, and our visualizations suggest that Latin Americans' happiness is probably the product of strong social support from family and friends, together with their genetic makeup, high level of freedom and long life expectancy.

<a id='happiness_limitation'></a>

## Limitations & Future Research Directions

Our dataset and analysis approach have some limitations, which also point to directions for future research:

1. We look at the relationship between happiness and other variables (GDP, social support, etc.) one at a time through visualization. With this approach it is hard to control for other variables, because it is hard to look at many variables at once. For example, the scatter plot of happiness vs GDP shows that Latin Americans are happier than people in other countries with similar GDP, but we cannot be sure that Latin America is comparable to these countries in other respects such as social support and life expectancy. For an apples-to-apples comparison, we should compare Latin America with places that have similar ratings in all other respects, or run a regression model to control for the other variables.

2. For the relationship between happiness and genes, it would help to have actual data on how common the "happy allele" is in different groups of people, so that we can better understand how this allele affects happiness.
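
The regression idea in the first point could look roughly like the sketch below: an ordinary least squares fit of happiness on several variables plus a Latin America indicator, so that the regional effect is estimated while holding the other variables fixed. Everything here is an assumption for illustration: the data is synthetic, the `latin_america` column is hypothetical, and the coefficients used to generate the outcome are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic data with a known regional premium of 0.5 (illustration only).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    'log_gdp_per_cap': rng.normal(9.0, 1.0, n),
    'social_support': rng.uniform(0.4, 1.0, n),
    'latin_america': rng.integers(0, 2, n),  # hypothetical 0/1 indicator
})
toy['happiness'] = (-2.0 + 0.6 * toy['log_gdp_per_cap']
                    + 2.0 * toy['social_support']
                    + 0.5 * toy['latin_america']
                    + rng.normal(0, 0.1, n))

# Design matrix: intercept column plus the explanatory variables.
predictors = ['log_gdp_per_cap', 'social_support', 'latin_america']
X = np.column_stack([np.ones(n), toy[predictors].to_numpy(dtype=float)])

# Ordinary least squares via numpy's least-squares solver.
coef, *_ = np.linalg.lstsq(X, toy['happiness'].to_numpy(), rcond=None)
estimates = dict(zip(['intercept'] + predictors, coef))
print(estimates)
```

The coefficient on `latin_america` is the estimated happiness gap for the region after accounting for GDP and social support; on real data it would also need standard errors before drawing conclusions.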

<a id='happiness_reference'></a>

## References

(1) https://www.huffingtonpost.com/daniel-cubias/why-are-latinos-more-like_b_9012348.html

(2) http://thehill.com/opinion/international/367204-venezuelas-not-so-happy-new-year

(3) https://www.cnn.com/2016/10/04/world/haiti-disasters/index.html

(4) https://www.independent.co.uk/life-style/health-and-families/health-news/the-secret-of-happiness-family-friends-and-your-environment-2053053.html

(5) https://www.washingtonpost.com/national/religion/religion-is-a-sure-route-to-true-happiness/2014/01/23/f6522120-8452-11e3-bbe5-6a2a3141e3a9_story.html?noredirect=on&utm_term=.696011d8c9e5

(6) https://www.sciencedaily.com/releases/2008/03/080304103308.htm