# World Happiness Index

## Preparing the dataset

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns

In [None]:
df2015 = pd.read_csv('2015.csv')
df2016 = pd.read_csv('2016.csv')
df2017 = pd.read_csv('2017.csv')
df2018 = pd.read_csv('2018.csv')
df2019 = pd.read_csv('2019.csv')

df2015.drop(['Standard Error', 'Dystopia Residual'],axis=1,inplace=True)
df2015.rename(columns = {'Economy (GDP per Capita)':'GDP_per_capita', 'Happiness Rank':'Happiness_Rank', 'Happiness Score':'Happiness_Score', 'Family':'Social_Support', 'Health (Life Expectancy)':'Life_Expectancy', 'Trust (Government Corruption)':'Corruption'}, inplace = True)
#get all regions and proper column order for later on
country_region = df2015[['Country', 'Region']].copy()
cols = df2015.columns.tolist()

df2016.drop(['Lower Confidence Interval', 'Upper Confidence Interval', 'Dystopia Residual'],axis=1,inplace=True)
df2016.rename(columns = {'Economy (GDP per Capita)':'GDP_per_capita', 'Happiness Rank':'Happiness_Rank', 'Happiness Score':'Happiness_Score', 'Family':'Social_Support', 'Health (Life Expectancy)':'Life_Expectancy', 'Trust (Government Corruption)':'Corruption' }, inplace = True)

df2017.drop(['Whisker.high', 'Whisker.low', 'Dystopia.Residual'],axis=1,inplace=True)
df2017.rename(columns = {'Happiness.Rank':'Happiness_Rank', 'Happiness.Score':'Happiness_Score', 'Economy..GDP.per.Capita.':'GDP_per_capita', 'Family': 'Social_Support', 'Health..Life.Expectancy.': 'Life_Expectancy', 'Trust..Government.Corruption.': 'Corruption'},inplace=True)
df2017 = df2017.merge(country_region, on='Country') #add the missing region for year 2017
df2017 = df2017[cols] #sort columns

df2018.rename(columns = {'Overall rank':'Happiness_Rank', 'Country or region':'Country', 'GDP per capita':'GDP_per_capita', 'Healthy life expectancy':'Life_Expectancy', 'Perceptions of corruption':'Corruption', 'Social support':'Social_Support', 'Freedom to make life choices':'Freedom', 'Score':'Happiness_Score'},inplace=True)
df2018 = df2018.merge(country_region, on='Country') #add the missing region for year 2018
df2018 = df2018[cols] #sort columns

df2019.rename(columns = {'Overall rank':'Happiness_Rank', 'Country or region':'Country', 'GDP per capita':'GDP_per_capita', 'Healthy life expectancy':'Life_Expectancy', 'Perceptions of corruption':'Corruption',  'Social support':'Social_Support', 'Freedom to make life choices':'Freedom', 'Score':'Happiness_Score'},inplace=True)
df2019 = df2019.merge(country_region, on='Country') #add the missing region for year 2019
df2019 = df2019[cols] #sort columns

Now combine all years into a single DataFrame.
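
Before combining the years, it may be worth checking how many countries the region merge drops. `merge` defaults to an inner join, so a country whose name is missing from (or spelled differently in) the 2015 file disappears from the later years. The sketch below uses small made-up frames (`later_year` and the country names are illustrative, not taken from the CSV files) and the `indicator=True` option of `merge` to list unmatched rows.

```python
import pandas as pd

# Toy stand-ins for country_region (from 2015) and a later year's data.
# The rows are invented for illustration only.
country_region = pd.DataFrame({
    "Country": ["Switzerland", "Benin", "Ukraine"],
    "Region": ["Western Europe", "Sub-Saharan Africa", "Central and Eastern Europe"],
})
later_year = pd.DataFrame({
    "Country": ["Switzerland", "Benin", "Atlantis"],  # "Atlantis" has no 2015 entry
    "Happiness_Score": [7.5, 4.9, 5.0],
})

# A left join keeps every row of later_year; _merge marks rows without a match.
checked = later_year.merge(country_region, on="Country", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].tolist()

# The default inner join (as used in the preparation cell) drops those rows.
inner = later_year.merge(country_region, on="Country")

print("unmatched:", unmatched)
print("rows kept by inner join:", len(inner), "of", len(later_year))
```

If the unmatched list is not empty for the real files, the affected countries would need a manual region entry or a name mapping before they can appear in `df_all`.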

In [None]:
df2015["year"] = str(2015)
df2016["year"] = str(2016)
df2017["year"] = str(2017)
df2018["year"] = str(2018)
df2019["year"] = str(2019)
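# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_all = pd.concat([df2015, df2016, df2017, df2018, df2019], ignore_index=True)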

# Correlation between happiness and the individual factors

The dataset considers the following factors as contributing to happiness. Values are given relative to Dystopia, a hypothetical unhappiest country used as the benchmark.
- GDP per Capita
- Family
- Life Expectancy
- Freedom
- Generosity
- Trust Government Corruption


### Correlation: influence of separate factors on Happiness Rank
We use a heatmap to show the correlation.
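
As a numeric complement to the heatmap, the factors can be ranked by their correlation with `Happiness_Score`. This is a minimal sketch on invented values (the toy frame and its numbers are assumptions; only the column names follow the renamed 2015 frame). It selects numeric columns first, because the real frames also contain text columns such as `Country` and `Region`.

```python
import pandas as pd

# Invented values for five countries; column names follow the renamed 2015 frame.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Happiness_Score": [7.5, 6.8, 5.9, 4.7, 3.6],
    "GDP_per_capita":  [1.40, 1.30, 1.00, 0.60, 0.30],
    "Generosity":      [0.30, 0.10, 0.35, 0.15, 0.25],
})

# Pearson correlation between all numeric columns (text columns are excluded).
corr = toy.select_dtypes("number").corr()

# Keep only the Happiness_Score column, drop its self-correlation of 1.0,
# and sort so the most strongly correlated factor comes first.
ranking = corr["Happiness_Score"].drop("Happiness_Score").sort_values(ascending=False)
print(ranking)
```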

### Year 2015

In [None]:
# numeric_only skips the text columns (Country, Region, year)
corr2015 = df2015.corr(numeric_only=True)
corr2015

In [None]:
sns.heatmap(corr2015, annot=True, linewidths=.5, square = True, cmap = 'Blues_r');

In [None]:
# Mask the upper triangle (including the diagonal) so each pair is shown once
mask = np.zeros_like(corr2015, dtype=bool)

mask[np.triu_indices_from(mask)] = True

with sns.axes_style("white"):

    f, ax = plt.subplots(figsize=(7, 5))

    ax = sns.heatmap(corr2015, mask = mask, annot=True, linewidths=.5, square = True, cmap = 'Blues_r')

Conclusion: Happiness correlates strongly with GDP per capita, with social support (family), and with healthy life expectancy.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15,10));

df2015.plot.scatter(ax=axes[0,0], x = 'GDP_per_capita', y = 'Happiness_Score');
df2015.plot.scatter(ax=axes[0,1], x = 'Social_Support', y = 'Happiness_Score');
df2015.plot.scatter(ax=axes[0,2], x = 'Life_Expectancy', y = 'Happiness_Score');
df2015.plot.scatter(ax=axes[1,0], x = 'Freedom', y = 'Happiness_Score');
df2015.plot.scatter(ax=axes[1,1], x = 'Corruption', y = 'Happiness_Score');
df2015.plot.scatter(ax=axes[1,2], x = 'Generosity', y = 'Happiness_Score');


This also shows that GDP and social support are most strongly associated with the happiness score, while generosity and perception of corruption show the weakest correlation with reported happiness.
Next, a scatter plot colored by happiness rank looks at a single factor, Freedom, in the year 2015.

In [None]:
sns.relplot(x="Freedom", y="Happiness_Score", hue="Happiness_Rank", data=df2015);

### Change in correlation from 2015-2019 <br>
Does the correlation change over the years, and do different factors matter more in 2019 than in 2015? <br>
Correlation in general:

In [None]:
corr2019 = df2019.corr(numeric_only=True)
corr2015 = df2015.corr(numeric_only=True)

In [None]:
mask = np.zeros_like(corr2015)

mask[np.triu_indices_from(mask)] = True

with sns.axes_style("white"):

    f, ax = plt.subplots()

    ax = sns.heatmap(corr2015, mask = mask, annot=True, linewidths=.5, square = True, cmap = 'Blues_r')

In [None]:
mask = np.zeros_like(corr2019)

mask[np.triu_indices_from(mask)] = True

with sns.axes_style("white"):

    f, ax = plt.subplots()

    ax = sns.heatmap(corr2019, mask = mask, annot=True, linewidths=.5, square = True, cmap = 'Blues_r')

In more detail, the development of each attribute over time from 2015 to 2019:

In [None]:
sns.relplot(data=df_all, x="GDP_per_capita", y="Happiness_Score", hue="Happiness_Rank", col="year");
sns.relplot(data=df_all, x="Social_Support", y="Happiness_Score", hue="Happiness_Rank", col="year");
sns.relplot(data=df_all, x="Life_Expectancy", y="Happiness_Score", hue="Happiness_Rank", col="year");
sns.relplot(data=df_all, x="Freedom", y="Happiness_Score", hue="Happiness_Rank", col="year");
sns.relplot(data=df_all, x="Corruption", y="Happiness_Score", hue="Happiness_Rank", col="year");
sns.relplot(data=df_all, x="Generosity", y="Happiness_Score", hue="Happiness_Rank", col="year");

Conclusion: The correlations in 2015 and 2019 depend on the same attributes: mainly GDP, social support, and life expectancy, and less on corruption or generosity.
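
To back the visual comparison with numbers, the per-factor correlations of two years can be placed side by side. The following is a sketch under assumptions: `corr_with_score` is a hypothetical helper, and both toy frames contain invented values rather than the real 2015 and 2019 data.

```python
import pandas as pd

def corr_with_score(df: pd.DataFrame) -> pd.Series:
    """Correlation of each numeric factor with Happiness_Score."""
    numeric = df.select_dtypes("number")
    return numeric.corr()["Happiness_Score"].drop("Happiness_Score")

# Invented numbers standing in for two yearly frames.
toy_2015 = pd.DataFrame({
    "Happiness_Score": [7.5, 6.8, 5.9, 4.7, 3.6],
    "GDP_per_capita":  [1.40, 1.30, 1.00, 0.60, 0.30],
    "Generosity":      [0.30, 0.10, 0.35, 0.15, 0.25],
})
toy_2019 = pd.DataFrame({
    "Happiness_Score": [7.6, 6.9, 5.8, 4.6, 3.5],
    "GDP_per_capita":  [1.45, 1.25, 1.05, 0.65, 0.35],
    "Generosity":      [0.28, 0.12, 0.30, 0.18, 0.22],
})

# One row per factor, one column per year, plus the change between them.
comparison = pd.DataFrame({
    "2015": corr_with_score(toy_2015),
    "2019": corr_with_score(toy_2019),
})
comparison["delta"] = comparison["2019"] - comparison["2015"]
print(comparison.round(3))
```

Applied to `df2015` and `df2019`, a small `delta` for every factor would support the conclusion that the same attributes dominate in both years.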

## Mean values of top10 countries, bottom10 countries and regions in year 2015

In [None]:
df2015_mean_happiness = df2015.copy()
# numeric_only avoids averaging text columns such as Country
location_mean_byregion = df2015.groupby(['Region']).mean(numeric_only=True)
mean_happiness_scores = location_mean_byregion['Happiness_Score'].to_dict()
df2015_mean_happiness['Mean_Happiness_Score'] = df2015_mean_happiness['Region'].map(mean_happiness_scores)

Let's look at the distribution of happiness per region. Are there regions with only good or only bad ranks? Are there regions with a broad spectrum from very happy to very unhappy?

In [None]:
fig = plt.gcf()
fig.set_size_inches(30, 8)

sns.violinplot(x="Region", y="Happiness_Rank", data=df2015)

plt.show()

Conclusion: Australia and New Zealand consists of only 2 countries, so its distribution says little. Western Europe is mainly happy, the Middle East shows a wide band, and in Sub-Saharan Africa as well as Southern Asia the countries sit mainly in the bottom part of the ranking.

Todo: one consideration to explore is whether individual countries diverge strongly from the mean of their respective region.
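
One possible way to approach this is `groupby(...).transform("mean")`, which attaches the regional mean to every country row so that a deviation can be computed directly. The frame below is invented for illustration; `Region_Mean` and `Deviation` are hypothetical column names.

```python
import pandas as pd

# Invented scores; only the column names mirror the notebook's frames.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Region":  ["North", "North", "North", "South", "South", "South"],
    "Happiness_Score": [7.0, 6.0, 5.0, 3.0, 6.0, 6.0],
})

# transform("mean") returns the regional mean aligned to every row.
toy["Region_Mean"] = toy.groupby("Region")["Happiness_Score"].transform("mean")
toy["Deviation"] = toy["Happiness_Score"] - toy["Region_Mean"]

# The country furthest from its regional mean, in either direction.
most_divergent = toy.loc[toy["Deviation"].abs().idxmax(), "Country"]
print(toy)
print("most divergent:", most_divergent)
```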

### Top 10 countries from 2015 side by side, comparing attributes with stacked bar charts <br>
We now focus mainly on the year 2015 and evaluate the separate attributes. Are there differences among the 10 happiest countries in the world? Does one country derive its happiness more from generosity than the others?
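
A stacked bar chart of this kind needs one row per country and one column per factor. The sketch below, on invented numbers, also converts the values to shares so that countries can be compared on the same scale; `factors` and `toy_top` are illustrative names, and normalizing to shares is an assumption about what the chart should show.

```python
import pandas as pd

factors = ["GDP_per_capita", "Social_Support", "Generosity"]

# Invented values for two countries, indexed by country name.
toy_top = pd.DataFrame({
    "Country": ["A", "B"],
    "GDP_per_capita": [1.4, 1.2],
    "Social_Support": [1.3, 1.4],
    "Generosity": [0.3, 0.4],
}).set_index("Country")

# Share of each factor in the sum of the listed factors, per country.
shares = toy_top[factors].div(toy_top[factors].sum(axis=1), axis=0)
print(shares.round(3))

# shares.plot.bar(stacked=True) would then draw one stacked bar per country.
```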

In [None]:
# drop() without inplace returns a new frame and avoids SettingWithCopyWarning
df2015_top10 = df2015.head(10).drop(columns=['Happiness_Rank'])

In [None]:
df2015_top10

### Development of top10 countries from 2015 over time from 2015 - 2019 regarding rank and factors

In [None]:
#line chart time series small multiples

## Maps
### global
Let's get an overview by looking at a world map to visualize the distribution of happiness around the world and the development and changes from 2015-2019.

In [None]:
import plotly.express as px

fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

### Regional Averages for 2015

In [None]:
fig = px.choropleth(df2015_mean_happiness, locationmode = 'country names', locations="Country",
                    color="Mean_Happiness_Score",
                    hover_name="Region",
                    animation_frame="year",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

Based on the analysis of just the mean values for each region, the expected result of an overall happy North America, Australia and Western Europe can be seen clearly. Additionally, the Latin America and Caribbean region is happier on average than all of Asia and Africa. Sub-Saharan Africa is on average the unhappiest region, followed closely by Southern Asia, which includes India, Pakistan and Afghanistan.

# Regional exploration

In [None]:
set(df_all.Region)

In [None]:
df_all_oceania = df_all[df_all.Region == 'Australia and New Zealand']
df_all_EuropeCentralEast = df_all[df_all.Region == 'Central and Eastern Europe']
df_all_EuropeWestern = df_all[df_all.Region == 'Western Europe']
df_all_AmericaSouth = df_all[df_all.Region == 'Latin America and Caribbean']
df_all_AmericaNorth = df_all[df_all.Region == 'North America']
df_all_AfricaMiddleEastNorth = df_all[df_all.Region == 'Middle East and Northern Africa']
df_all_AfricaSubSahara = df_all[df_all.Region == 'Sub-Saharan Africa']
df_all_AsiaEast = df_all[df_all.Region == 'Eastern Asia']
df_all_AsiaSouthEast = df_all[df_all.Region == 'Southeastern Asia']
df_all_AsiaSouth = df_all[df_all.Region == 'Southern Asia']

#### Happiness ratio per region as well as mean value per attribute per region

In [None]:
# todo: new df with values per region, also deviation from countries?

In [None]:
df_all_oceania.Freedom.mean()

In [None]:
df_all_EuropeCentralEast.Freedom.mean()

Happiness ratio per region in year 2015
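
The "happiness ratio" computed here is the arithmetic mean of `Happiness_Score` within each region, so a `groupby` gives the same numbers more compactly. The sketch below compares both approaches on an invented frame; `loop_result` and `by_region` are illustrative names.

```python
import pandas as pd

# Invented scores for two regions.
toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South"],
    "Happiness_Score": [7.0, 6.0, 3.0, 4.0, 5.0],
})

# Loop version, following the notebook: sum of (score / group size).
loop_result = {}
for name in toy["Region"].unique():
    region = toy[toy["Region"] == name]
    loop_result[name] = sum(region["Happiness_Score"] / len(region))

# Equivalent groupby version, sorted from happiest to unhappiest region.
by_region = (
    toy.groupby("Region")["Happiness_Score"]
       .mean()
       .sort_values(ascending=False)
)
print(by_region)
```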

In [None]:
region_lists=list(df2015['Region'].unique())
region_happiness_ratio=[]
for each in region_lists:
    region=df2015[df2015['Region']==each]
    region_happiness_rate=sum(region['Happiness_Score']/len(region))
    region_happiness_ratio.append(region_happiness_rate)
    
data=pd.DataFrame({'region':region_lists,'region_happiness_ratio':region_happiness_ratio})
new_index=(data['region_happiness_ratio'].sort_values(ascending=False)).index.values
sorted_data = data.reindex(new_index)

sorted_data

In [None]:
plt.figure(figsize=(8,5))
sns.barplot(x=sorted_data['region'], y=sorted_data['region_happiness_ratio'],palette=sns.cubehelix_palette(len(sorted_data['region'])))
plt.xticks(rotation= 90)
plt.xlabel('Region')
plt.ylabel('Region Happiness Ratio')
plt.title('Happiness rate for regions')
plt.show()

### Map visualization development of happiness rank from 2015-2019 per country

In [None]:
fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    scope="north america",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

In [None]:
fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    scope="europe",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

In [None]:
fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    scope="asia",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

In [None]:
fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    scope="africa",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

In [None]:
fig = px.choropleth(df_all, locationmode = 'country names', locations="Country",
                    color="Happiness_Rank",
                    hover_name="Country",
                    animation_frame="year",
                    scope="south america",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

# Top 10 changes in happiness rank from 2015 to 2019: are there any reasons for this?

Which country made the biggest gain or loss in Happiness Rank from 2015 to 2019?

In [None]:
# rename() without inplace returns new frames and avoids SettingWithCopyWarning
df2015_ranks = df2015[['Country','Happiness_Rank']].rename(columns = {'Happiness_Rank':'2015'})

df_ranks_change = df2019[['Country','Happiness_Rank']].rename(columns = {'Happiness_Rank':'2019'})

df_ranks_change = df_ranks_change.merge(df2015_ranks, on='Country')
# positive change = the country climbed (its rank number got smaller)
df_ranks_change['change']=df_ranks_change['2015']-df_ranks_change['2019']
df_ranks_change

In [None]:
change_top10 = df_ranks_change.sort_values('change', ascending = False).head(10)
change_bottom10 = df_ranks_change.sort_values('change', ascending = True).head(10)

## Best performers: who jumped the most?

In [None]:
change_top10

### Let's look at the change in detail from 2015-2019 for Benin and Honduras
#### Benin:

In [None]:
benin = df_all[df_all.Country == 'Benin']
benin

In [None]:
plt.plot(benin.year, benin.Happiness_Rank);
plt.title('Development Happiness in Benin')
plt.xlabel('year')
plt.ylabel('Happiness Rank')
plt.show();

Note: the falling line makes it look as if Benin has been performing worse since 2015, but a lower rank number is better, so it is actually climbing the ranks. A different line graph or visualization may be clearer.
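
One option for the readability issue noted above is to invert the y-axis, so that rank 1 sits at the top and an improving country trends upward. This is a sketch with invented ranks, not Benin's actual values; the `Agg` backend line is only there so the example also runs without a display and can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt

# Invented ranks for a country that improves over time.
years = ["2015", "2016", "2017", "2018", "2019"]
ranks = [150, 148, 140, 130, 100]

fig, ax = plt.subplots()
ax.plot(years, ranks, marker="o")
ax.invert_yaxis()  # rank 1 at the top, so improvement points upward
ax.set_xlabel("year")
ax.set_ylabel("Happiness Rank (1 = happiest)")
ax.set_title("Rank over time with inverted y-axis")
```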

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(11,5));

benin.plot(ax=axes[0,0], x = 'year', y = 'GDP_per_capita');
benin.plot(ax=axes[0,1], x = 'year', y = 'Social_Support');
benin.plot(ax=axes[0,2], x = 'year', y = 'Life_Expectancy');
benin.plot(ax=axes[1,0], x = 'year', y = 'Freedom');
benin.plot(ax=axes[1,1], x = 'year', y = 'Corruption');
benin.plot(ax=axes[1,2], x = 'year', y = 'Generosity');

## Worst performers: who slipped the most?

In [None]:
change_bottom10

There are quite heavy changes in happiness rank. The worst performer is Venezuela, a country in a prolonged crisis.

### Let's look at the change in detail from 2015-2019 for Venezuela and Ukraine
#### Venezuela: country crisis

In [None]:
venezuela = df_all[df_all.Country == 'Venezuela']
venezuela

In [None]:
plt.plot(venezuela.year, venezuela.Happiness_Rank);
plt.title('Development Happiness in Venezuela')
plt.xlabel('year')
plt.ylabel('Happiness Rank')
plt.show();

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(11,5));

venezuela.plot(ax=axes[0,0], x = 'year', y = 'GDP_per_capita');
venezuela.plot(ax=axes[0,1], x = 'year', y = 'Social_Support');
venezuela.plot(ax=axes[0,2], x = 'year', y = 'Life_Expectancy');
venezuela.plot(ax=axes[1,0], x = 'year', y = 'Freedom');
venezuela.plot(ax=axes[1,1], x = 'year', y = 'Corruption');
venezuela.plot(ax=axes[1,2], x = 'year', y = 'Generosity');

#### Ukraine: war

In [None]:
ukraine = df_all[df_all.Country == 'Ukraine']
ukraine

In [None]:
plt.plot(ukraine.year, ukraine.Happiness_Rank);
plt.title('Development Happiness in Ukraine')
plt.xlabel('year')
plt.ylabel('Happiness Rank')
plt.show();

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(11,5));

ukraine.plot(ax=axes[0,0], x = 'year', y = 'GDP_per_capita');
ukraine.plot(ax=axes[0,1], x = 'year', y = 'Social_Support');
ukraine.plot(ax=axes[0,2], x = 'year', y = 'Life_Expectancy');
ukraine.plot(ax=axes[1,0], x = 'year', y = 'Freedom');
ukraine.plot(ax=axes[1,1], x = 'year', y = 'Corruption');
ukraine.plot(ax=axes[1,2], x = 'year', y = 'Generosity');

# Development of Switzerland from 2015-2019

In [None]:
switzerland = df_all[df_all.Country == 'Switzerland']
switzerland = switzerland.drop(columns=['Country', 'Region'])
switzerland

In [None]:
#fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(11,5));
plt.figure()
plt.plot(switzerland.year, switzerland.GDP_per_capita, label='GDP per capita')
plt.plot(switzerland.year, switzerland.Social_Support, label='Social Support')
plt.plot(switzerland.year, switzerland.Life_Expectancy, label='Life Expectancy')
plt.plot(switzerland.year, switzerland.Freedom, label='Freedom')
plt.plot(switzerland.year, switzerland.Corruption, label='Corruption')
plt.plot(switzerland.year, switzerland.Generosity, label='Generosity')
plt.xlabel('Year')
plt.ylabel('Importance factor')
plt.legend(bbox_to_anchor=(1.4,1), loc='upper right')
plt.title('Development of Switzerland importance factors for Happiness')
plt.show()



### Analysis of development in Switzerland from 2015-2019

The visualizations of the different factors in the happiness calculation for Switzerland between 2015 and 2019 show an estimate of the importance of each factor in each year.

General Analysis:

The two most important factors are 'GDP per capita' and 'Social Support', which makes sense for a well-functioning country with many social structures. 
Corruption and Generosity are factors of fairly low importance, since most of the population neither feels suppressed by the government nor has to rely on others to survive. 

Life Expectancy and Freedom lie between these 4 factors: they are somewhat important, but they are not under threat for most people and are therefore not as important as Social Support or money itself.


Yearly Development:
The most unstable factor is Social Support, which proves to be an important factor and rose over the years, but also dropped from 2015 to 2016 (todo: research why).

Income is important for the Swiss and has been stable overall. Money will probably always be of importance for Switzerland and its population.

Life Expectancy is rising overall (open question: why is health getting more important?).

Freedom, Generosity and Corruption are almost stable and only decline slowly. One interpretation is that these factors had no negative influence over the years, so their importance fades slowly.
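
Statements such as "most unstable" or "almost stable" can be checked numerically with year-over-year differences and a simple range per factor. The sketch below uses invented values rather than Switzerland's actual numbers, and using the range (max minus min) as an instability measure is an assumption.

```python
import pandas as pd

# Invented values for one country over five years.
toy = pd.DataFrame({
    "year": ["2015", "2016", "2017", "2018", "2019"],
    "GDP_per_capita": [1.40, 1.50, 1.55, 1.45, 1.45],
    "Social_Support": [1.35, 1.15, 1.50, 1.55, 1.55],
    "Generosity":     [0.30, 0.28, 0.29, 0.26, 0.26],
}).set_index("year")

# Change from one year to the next (the first row is NaN by construction).
yearly_change = toy.diff()

# Range (max - min) per factor as a simple instability measure.
value_range = (toy.max() - toy.min()).sort_values(ascending=False)
most_unstable = value_range.index[0]

print(yearly_change.round(2))
print("largest range:", most_unstable)
```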