In [1]:
import pandas as pd

In [2]:
PATH_HAPPY = "../Data/world-happiness-report.csv"
PATH_HAPPY_2021 = "../Data/world-happiness-report-2021.csv"
PATH_ALCOHOL = "../Data/alcohol-consumption.csv"

# Data Visualisation (COM-480) - 2022
## <i>If Not Alcohol, What Makes Us Happy?</i>
### Milestone 1
---

<i>NB: Please note that the terms "happiness score", "life ladder" and "Cantril scale rating" all refer to the same metric throughout the report, as well as in the datasets and the cited articles.</i>

## Datasets
Our work focuses mainly on 2 datasets:
- [Dataset 1](https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?resource=download&select=world-happiness-report.csv): Results of the [World Happiness Report](https://en.wikipedia.org/wiki/World_Happiness_Report) research initiative conducted by the Gallup World Poll (GWP) in about 150 countries, for the period from 2005 to 2020.
- [Dataset 2](https://www.kaggle.com/datasets/sveneschlbeck/alcohol-consumption-per-capita-year-and-country): Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) in 217 historically identifiable countries and 49 aggregates/areas (e.g. EU). Depending on the country, values are only available for certain years.

A [third dataset](https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?resource=download&select=world-happiness-report-2021.csv) is also used to integrate the 2021 results of the World Happiness Report.

### Dataset 1: World Happiness Report

#### About
The dataset contains yearly happiness scores by country (= item) for the period 2005 through 2020.
To obtain the scores, respondents are asked to rate their own lives on a 0 to 10 scale, i.e. the survey uses the [Cantril Scale](https://news.gallup.com/poll/122453/understanding-gallup-uses-cantril-scale.aspx).

#### Quality
The survey is conducted at a regular frequency, ranging from semi-annual to biennial depending on the country.
The sample size for each country is on average 1,000 people, but can reach 2,000 for bigger countries like China or Russia. Samples are weighted to correct for selection bias, nonresponse and other issues (Source: https://www.gallup.com/178667/gallup-world-poll-work.aspx).

Some countries do not have a happiness score for some years. In those cases we interpolate, so that all countries have a happiness score for every year from 2005 to 2020. In addition, the 2021 data from the third dataset is used.
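A minimal pandas sketch of this interpolation, assuming dataset 1 is loaded as `df_happy` with the `Country name`, `year` and `Life Ladder` columns shown below, and at most one row per country and year. The helper name `fill_missing_years` is our own. Linear interpolation only fills gaps lying between two surveyed years; years before a country's first survey or after its last one stay empty here, since filling them would be extrapolation, which is a separate decision.

```python
import pandas as pd


def fill_missing_years(df, start=2005, end=2020, country_col="Country name",
                       year_col="year", value_col="Life Ladder"):
    """Reindex every country onto the years start..end and linearly
    interpolate the gaps between surveyed years (no extrapolation)."""
    years = pd.RangeIndex(start, end + 1, name=year_col)
    parts = []
    for country, group in df.groupby(country_col):
        series = (
            group.set_index(year_col)[value_col]
            .reindex(years)  # one row per year, NaN where no survey took place
            .interpolate(method="linear", limit_area="inside")  # interior gaps only
        )
        part = series.reset_index()  # columns: year_col, value_col
        part[country_col] = country
        parts.append(part)
    return pd.concat(parts, ignore_index=True)[[country_col, year_col, value_col]]


# Usage (hypothetical): df_happy_filled = fill_missing_years(df_happy)
```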

#### Attributes
Other than the happiness score, the dataset also contains the following attributes (note that the regional indicator, standard error, whiskers, `Explained by: [attribute]` and Dystopia columns only appear in the 2021 file):
- regional indicator of the country,
- standard error of the score,
- lower and upper whiskers,
- 6 indicators (columns) that may contribute to a happier life:
    - Economic production
    - Social support
    - Life expectancy
    - Freedom
    - Absence of corruption
    - Generosity
- attributes named `Explained by: [attribute]` for each of the 6 indicators mentioned above,
- "Ladder score in Dystopia",
- "Dystopia + residual".

### Dataset 2: Alcohol Consumption per Capita
#### About
The dataset contains the total annual alcohol consumption per capita for each country of the world, as well as for regions (e.g. Africa Eastern and Southern, Arab World) and some aggregates (e.g. Low income, Upper middle income, and Least developed countries per the UN classification).

The value corresponds to the liters of pure alcohol consumed per capita by people aged 15 or older over a calendar year, and the metric is adjusted for tourist consumption.

#### Quality
The data covers the years 2000 to 2018, with entries for 2000, 2005, 2010, 2015 and 2018. Most countries have all 5 entries. For countries with fewer entries (e.g. Afghanistan, which only has 2010, 2015 and 2018), the missing values will be interpolated.
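Before interpolating, the incomplete entities can be listed. A minimal sketch, assuming `df_alcohol` has the `Entity` and `Year` columns shown below and that the reference years are exactly the 5 listed above; the helper `missing_years` is hypothetical. Note that interpolation can only fill a gap lying between two available years: for Afghanistan, whose first entry is 2010, the years 2000 and 2005 would require extrapolation instead.

```python
import pandas as pd

REFERENCE_YEARS = {2000, 2005, 2010, 2015, 2018}  # years listed in the dataset description


def missing_years(df, entity_col="Entity", year_col="Year"):
    """Map each entity lacking some reference years to the sorted list of those years."""
    result = {}
    for entity, group in df.groupby(entity_col):
        present = set(group[year_col].astype(int).tolist())  # plain Python ints
        absent = REFERENCE_YEARS - present
        if absent:
            result[entity] = sorted(absent)
    return result


# Usage (hypothetical): incomplete = missing_years(df_alcohol)
```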

In [11]:
df_happy = pd.read_csv(PATH_HAPPY)
display(df_happy.head(14))

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2008 | 3.724 | 7.37 | 0.451 | 50.8 | 0.718 | 0.168 | 0.882 | 0.518 | 0.258 |
| 1 | Afghanistan | 2009 | 4.402 | 7.54 | 0.552 | 51.2 | 0.679 | 0.19 | 0.85 | 0.584 | 0.237 |
| 2 | Afghanistan | 2010 | 4.758 | 7.647 | 0.539 | 51.6 | 0.6 | 0.121 | 0.707 | 0.618 | 0.275 |
| 3 | Afghanistan | 2011 | 3.832 | 7.62 | 0.521 | 51.92 | 0.496 | 0.162 | 0.731 | 0.611 | 0.267 |
| 4 | Afghanistan | 2012 | 3.783 | 7.705 | 0.521 | 52.24 | 0.531 | 0.236 | 0.776 | 0.71 | 0.268 |
| 5 | Afghanistan | 2013 | 3.572 | 7.725 | 0.484 | 52.56 | 0.578 | 0.061 | 0.823 | 0.621 | 0.273 |
| 6 | Afghanistan | 2014 | 3.131 | 7.718 | 0.526 | 52.88 | 0.509 | 0.104 | 0.871 | 0.532 | 0.375 |
| 7 | Afghanistan | 2015 | 3.983 | 7.702 | 0.529 | 53.2 | 0.389 | 0.08 | 0.881 | 0.554 | 0.339 |
| 8 | Afghanistan | 2016 | 4.22 | 7.697 | 0.559 | 53.0 | 0.523 | 0.042 | 0.793 | 0.565 | 0.348 |
| 9 | Afghanistan | 2017 | 2.662 | 7.697 | 0.491 | 52.8 | 0.427 | -0.121 | 0.954 | 0.496 | 0.371 |


In [12]:
df_happy_2021 = pd.read_csv(PATH_HAPPY_2021)
df_happy_2021.head()

| | Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.78 | 10.775 | 0.954 | 72.0 | 0.949 | -0.098 | 0.186 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 3.253 |
| 1 | Denmark | Western Europe | 7.62 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.03 | 0.179 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2.868 |
| 2 | Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.5 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 2.839 |
| 3 | Iceland | Western Europe | 7.554 | 0.059 | 7.67 | 7.438 | 10.878 | 0.983 | 73.0 | 0.955 | 0.16 | 0.673 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.17 | 2.967 |
| 4 | Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.41 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 2.798 |


In [13]:
df_happy[df_happy["year"] == 2020].sort_values(by="Life Ladder", ascending=False).head()

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 563 | Finland | 2020 | 7.889 | 10.75 | 0.962 | 72.1 | 0.962 | -0.116 | 0.164 | 0.744 | 0.193 |
| 731 | Iceland | 2020 | 7.575 | 10.824 | 0.983 | 73.0 | 0.949 | 0.16 | 0.644 | 0.863 | 0.172 |
| 463 | Denmark | 2020 | 7.515 | 10.91 | 0.947 | 73.0 | 0.938 | 0.052 | 0.214 | 0.818 | 0.227 |
| 1661 | Switzerland | 2020 | 7.508 | 11.081 | 0.946 | 74.7 | 0.917 | -0.064 | 0.28 | 0.769 | 0.193 |
| 1224 | Netherlands | 2020 | 7.504 | 10.901 | 0.944 | 72.5 | 0.935 | 0.151 | 0.281 | 0.784 | 0.247 |


In [14]:
df_alcohol = pd.read_csv(PATH_ALCOHOL)
df_alcohol.head()

| | Entity | Code | Year | Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) |
|---|---|---|---|---|
| 0 | Afghanistan | AFG | 2010 | 0.21 |
| 1 | Afghanistan | AFG | 2015 | 0.21 |
| 2 | Afghanistan | AFG | 2018 | 0.21 |
| 3 | Africa Eastern and Southern | | 2000 | 5.014051 |
| 4 | Africa Eastern and Southern | | 2005 | 4.856588 |


## Problematic

### Overview

Money does not make you happy… or does it? The most popular view of the relation between happiness* and income presents it as [logarithmic](https://www.cnbc.com/2015/12/14/money-can-buy-happiness-but-only-to-a-point.html). But what happens at the country level? Happiness is certainly correlated with GDP/capita, but some very surprising cases lie far from the trend. Which countries are they, and what other factors play a role? Is there any interesting correlation that could “explain” happy but not-so-rich countries, or unhappy rich ones?

We would indeed expect a tight positive relationship between a country's happiness ("life ladder") and its social support, health, freedom of choice and generosity. Conversely, perceived corruption is expected to lower happiness. Plenty of regression analysis has already been done on this point (see the Related Work section), and we will not reinvent the wheel. For us, the most interesting part will be to visualize and highlight outliers. Further analysis of this topic is addressed in the next sections.
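The "expected" happiness used to flag outliers could be sketched as a one-variable linear fit of `Life Ladder` on `Log GDP per capita`, with each country labeled by its standardized residual. This is an illustrative assumption, not the World Happiness Report's own model: the function name `flag_outliers` and the 1.5 standard-deviation threshold are hypothetical choices.

```python
import numpy as np
import pandas as pd


def flag_outliers(df, x="Log GDP per capita", y="Life Ladder", z_threshold=1.5):
    """Fit y ~ a*x + b by least squares and label each row by how far its
    residual lies from the fit, in standard deviations of the residuals."""
    d = df.dropna(subset=[x, y]).copy()
    slope, intercept = np.polyfit(d[x], d[y], deg=1)
    d["expected"] = slope * d[x] + intercept
    d["residual"] = d[y] - d["expected"]
    z = (d["residual"] - d["residual"].mean()) / d["residual"].std()
    d["category"] = np.select(
        [z > z_threshold, z < -z_threshold],
        ["above", "below"],     # e.g. green and red shades on the Outlier Map
        default="as expected",  # e.g. gray
    )
    return d
```

Applied to a single year, e.g. `flag_outliers(df_happy[df_happy["year"] == 2020])`, the `category` column could drive the map colors; the magnitude of `residual` could set the shade.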

As a second part, we will investigate a more amusing relation (the project aims to make its visitors *and* its authors happy). The Association for Psychological Science has reported research showing that alcohol works as a social [lubricant](https://www.psychologicalscience.org/news/alcohol-is-a-social-lubricant-study-confirms.html), and alcohol is supposed to minimize negative emotions. We will therefore try to show such relations at the country level: do some countries tend to be happier than others, all other parameters being equal, just because they drink alcohol? Or, at least, is social support higher where alcohol consumption is higher? While we will of course not claim any causal link, such patterns could be of interest for future research.

*Here, happiness means positive affect: not feeling blue and having low stress.
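One way to make "all other parameters being equal" concrete is a partial correlation: remove the linear effect of a control variable (for instance logged GDP/capita) from both happiness and alcohol consumption, then correlate what remains. A minimal sketch, assuming the two datasets have already been merged into one table per country and year; the column name `alcohol` and the helper `partial_corr` are hypothetical. Like any correlation, the result says nothing about causality.

```python
import numpy as np
import pandas as pd


def partial_corr(df, x, y, control):
    """Pearson correlation of x and y after regressing each on `control`
    (simple linear fit) and keeping only the residuals."""
    d = df[[x, y, control]].dropna()

    def residuals(col):
        slope, intercept = np.polyfit(d[control], d[col], deg=1)
        return d[col] - (slope * d[control] + intercept)

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])


# Usage (hypothetical merged table):
# partial_corr(merged, x="alcohol", y="Life Ladder", control="Log GDP per capita")
```

If happiness and alcohol consumption were both driven only by wealth, their raw correlation could be high while the partial correlation stays close to zero.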

### Target Audience

The results of this visualization project should be especially useful for the following target personas:
- Individuals interested in insights on the well-being and consumption habits worldwide
- Casual alcohol consumers eager to compare drinking habits in different countries
- Young professionals looking to find the country that will bring the most joy to their future family
- Politicians who would like to strategize on how to improve their citizens' lives

### Motivation

The main motivation of the project is to tell the visitor a story about happiness across the world's nations and to demonstrate that one's cash balance is not the only source of happiness: there are other levers to improve citizens' well-being. The project will present every country’s perceived position on the “life ladder” and connect it with relevant indicators that might help explain the variation among states. The project’s webpage will also briefly describe interesting outlier cases, mentioning cultural differences and linking them to world events.

Finally, another intended purpose of the project is entertainment. Although a correlation may exist between happiness and alcohol consumption, our motivation is not to demonstrate a causal relation. Therefore, we will NOT attempt to show that alcohol is capable of boosting happiness levels in the long term.

### Visualizations
For the sake of practicality and scalability, we propose several visualizations of medium complexity rather than a single large, complex one.

A central <b>timeline slider</b> on the webpage lets the user query entries from 2005 to 2021. As discussed, missing values are interpolated. For the selected year, the following visualizations are planned:
- <b>Outlier Map.</b>
A colored world map showing outliers: here, countries whose happiness level is very different from what would have been expected from GDP/capita alone. For example, countries close to their expected happiness level are represented in gray, those far above in green shades and those below in red shades.
- <b>Distribution Plot for Happiness /  Alcohol Consumption.</b>
The objective is to plot the distributions of the Happiness Score and of alcohol consumption, showing their skew to highlight how uniform or unequal they are, and to discuss it. When hovering over a “bin” (i.e. a vertical bar), the countries in that bin become visible with their respective values.
- <b>Change of Ranking over Time (Bar Chart).</b>
Using a horizontal bar chart, the objective is to visualize the evolution of the ranking of the happiest countries.
- <b>Correlation: Scatterplot.</b>
The objective is to create an interactive 2D scatterplot where the user can select the X-axis attribute (default: alcohol consumption), while the Y-axis attribute is fixed to the Happiness Score. Each point represents a country and can be color-coded by georegion or shown as the country's flag. When a country is selected, the user sees its positions from 2005 to 2021 as a connected scatterplot.
- <b>Magnitude: [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart).</b>
The objective is to create an interactive radar chart with the Happiness Score on the top axis and the other attributes on the remaining axes.
- <b>Correlation Map.</b>
A world map color-coded based on the value of the attribute selected in the scatter-plot.
- <i>Optional: Fun animation.</i>
An animation which demonstrates the magnitude of the alcohol consumed per capita by country. For example, a bottle that gets filled by country-themed bubbles; their number corresponds to the respective value for that country.

Note that the Scatterplot, the Radar Chart and the Map are connected: hovering/tapping on a country on the scatterplot would also update the radar chart line and would highlight the country on the map, and vice versa.
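Two small helpers could feed the Bar Chart and the Radar Chart: a per-year ranking, and a min-max rescaling so that attributes measured in different units (liters, years, logged dollars) share a common 0 to 1 radius on the radar axes. A sketch, assuming dataset 1's `year` and `Life Ladder` columns; the helper names are hypothetical, and min-max scaling is only one possible normalization (z-scores would be another).

```python
import pandas as pd


def rank_by_year(df, score_col="Life Ladder", year_col="year"):
    """Add a `rank` column: 1 is the happiest country of that year (ties share the best rank)."""
    out = df.copy()
    out["rank"] = (
        out.groupby(year_col)[score_col]
        .rank(ascending=False, method="min")
        .astype("Int64")  # nullable integer, so missing scores stay <NA>
    )
    return out


def min_max_scale(df, cols):
    """Rescale each column in `cols` to [0, 1]; a constant column becomes 0."""
    out = df.copy()
    for col in cols:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out
```

Scaling over all years at once, rather than per year, keeps a country's radar shape comparable as the timeline slider moves.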

## Exploratory data analysis

### Preprocessing
The two datasets will require preprocessing for the following reasons:
- Missing data across years of datasets 1 and 2. Solution: interpolate data (e.g. linear interpolation), explain in the final report.
- Dataset 1 is missing the 2021 data, which is available separately (third dataset). Solution: merge with the 2021 data.
- Some countries are in dataset 1 but not in dataset 2. Solution: drop data where visualizations require both values.
- Some countries are in dataset 2 but not in dataset 1. Solution: drop data where visualizations require both values.
- Some country names are inconsistent between datasets. Solution: correct or drop.
- Dataset 2 contains aggregates (regions, income groups) which may not be desired. Solution: keep 2 versions, with and without the aggregates.
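A sketch of the 2021 merge and of the country-name check, assuming the column headers printed above. The rename mapping covers only the 2021 columns whose names differ from dataset 1; columns absent from the 2021 file (e.g. `Positive affect`) are left empty for 2021. Treating rows without a `Code` as aggregates is an assumption based on the aggregate rows shown above (e.g. Africa Eastern and Southern), and the helper names are hypothetical.

```python
import pandas as pd

# 2021 column names mapped to their dataset 1 equivalents (from the headers above).
RENAME_2021 = {
    "Ladder score": "Life Ladder",
    "Logged GDP per capita": "Log GDP per capita",
    "Healthy life expectancy": "Healthy life expectancy at birth",
}


def append_2021(df_happy, df_2021):
    """Append the 2021 rows, keeping only columns that exist in dataset 1."""
    recent = df_2021.rename(columns=RENAME_2021).assign(year=2021)
    shared = [c for c in df_happy.columns if c in recent.columns]
    return pd.concat([df_happy, recent[shared]], ignore_index=True)


def unmatched_names(df_happy, df_alcohol):
    """Country names found in only one dataset (aggregates excluded),
    to be corrected by hand or dropped."""
    happy = set(df_happy["Country name"])
    countries = df_alcohol[df_alcohol["Code"].notna()]  # assumption: aggregates have no Code
    alcohol = set(countries["Entity"])
    return sorted(happy - alcohol), sorted(alcohol - happy)
```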

### Basic Insights

In [15]:
df_alcohol.sort_values(by="Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)", ascending=False).head(1)

| | Entity | Code | Year | Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) |
|---|---|---|---|---|
| 931 | Seychelles | SYC | 2018 | 20.5 |


In [8]:
df_alcohol.sort_values(by="Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)", ascending=True).head(2)

| | Entity | Code | Year | Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) |
|---|---|---|---|---|
| 962 | Somalia | SOM | 2000 | 0.0 |
| 554 | Kuwait | KWT | 2015 | 0.003 |


In [9]:
df_happy.sort_values(by="Life Ladder", ascending=False).head(1)

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 449 | Denmark | 2005 | 8.019 | 10.851 | 0.972 | 69.6 | 0.971 | | 0.237 | 0.86 | 0.154 |


In [10]:
df_happy.sort_values(by="Life Ladder", ascending=True).head(1)

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | Afghanistan | 2019 | 2.375 | 7.697 | 0.42 | 52.4 | 0.394 | -0.108 | 0.924 | 0.351 | 0.502 |


<b>Country with the highest alcohol consumption</b>: Seychelles (2018, 20.5 liters).<br>
<b>Country with the lowest non-zero alcohol consumption</b>: Kuwait (2015, 0.003 liters).<br>
<b>Country with the highest happiness score</b>: Denmark (2005, 8.019).<br>
<b>Country with the lowest happiness score</b>: Afghanistan (2019, 2.375).
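The four lookups above can also be written with `idxmax`/`idxmin`, filtering zeros out first so that the lowest non-zero consumption is returned directly instead of being read off the second row of a sorted table. A sketch; the helper `extreme_rows` is hypothetical, and ties resolve to the first occurrence.

```python
import pandas as pd


def extreme_rows(df, value_col, nonzero=False):
    """Return the rows holding the maximum and the minimum of value_col,
    ignoring missing values (and zeros if nonzero=True)."""
    d = df.dropna(subset=[value_col])
    if nonzero:
        d = d[d[value_col] > 0]
    return d.loc[d[value_col].idxmax()], d.loc[d[value_col].idxmin()]


# Usage (hypothetical): extreme_rows(df_alcohol, ALCOHOL_COL, nonzero=True),
# where ALCOHOL_COL holds the long consumption column name shown above.
```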

## Related work

At the individual level, a first approach to happiness and income [mentioned above](https://www.cnbc.com/2015/12/14/money-can-buy-happiness-but-only-to-a-point.html) tends to show a logarithmic relation between the two. Interestingly, even at the country level, the World Happiness Report's [linear regression](https://worldhappiness.report/ed/2020/social-environments-for-world-happiness/) uses the **logged** GDP/capita to explain happiness: doing so, **together with the other columns mentioned above**, they achieve an adjusted R² of 0.75. Since GDP/capita enters the regression in log form, a coefficient of about 0.3 means that each one-unit increase in logged GDP/capita (i.e. GDP/capita multiplied by about 2.7) is associated with a happiness increase of about 0.3 points (as a reminder, on a 0 to 10 scale); a 1% increase in GDP/capita corresponds to only about 0.003 points.
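A worked version of how a coefficient on logged GDP/capita translates into score changes, assuming that 0.3 is the regression coefficient $\beta$ on the natural log of GDP/capita (as the WHR's logged variable suggests):

```latex
\begin{aligned}
\Delta\,\text{Ladder} &\approx \beta\,\Delta\ln(\text{GDP/capita}), \qquad \beta \approx 0.3 \\
\text{GDP/capita} \times 1.01:\quad \Delta\,\text{Ladder} &\approx 0.3 \ln(1.01) \approx 0.3 \times 0.00995 \approx 0.003 \\
\text{GDP/capita} \times 2:\quad \Delta\,\text{Ladder} &\approx 0.3 \ln 2 \approx 0.3 \times 0.693 \approx 0.21
\end{aligned}
```

In other words, under this model a gap of a few tenths of a point between two countries corresponds to a multiplicative, not marginal, difference in GDP/capita.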

Interactive scatterplots of happiness and GDP/capita can easily be found in [many publications](https://ourworldindata.org/grapher/gdp-vs-happiness). Our originality lies simply in including multiple parameters at the same time, together with an unexpected one (alcohol consumption).

Of course, maps of happiness score per country are easy to find (e.g. on [Wikipedia](https://fr.wikipedia.org/wiki/Rapport_mondial_sur_le_bonheur)), but we believe they are not ideal visualizations, as they look too similar to a GDP/capita map.

Inspiration for the "Change of Ranking over Time (Bar Chart)" visualization comes from the YouTube channel [RankingCharts](https://www.youtube.com/c/RankingCharts).

The originality of our "Outlier Map" lies in highlighting out-of-the-norm countries on the map. To convince the reader of the purpose of such a visualization, we simply quote a [Berkeley Economic Review article](https://econreview.berkeley.edu/beyond-gdp-economics-and-happiness/#:~:text=According%20to%20regression%20estimates%20of,of%20the%20variance%20in%20happiness.) mentioning the case of Costa Rica.

> Berkeley Economic Review (Staff)
> > Take, for example, Costa Rica. The UN honored them as the 13th happiest country in the world. Yet GDP per capita only explains 14.1% of the nation’s overall happiness score, whereas social support explains substantially more, about 20% of the score. The United States, on the other hand, explains 19% of its happiness score with per capita income, and is ranked 5 spots below Costa Rica. Statistically speaking, Costa Ricans “use” substantially less GDP to generate a level of happiness greater than what Americans generate with far more GDP.

Regarding the relation between happiness and alcohol, we have not found much research at the country level. However, as mentioned above, at the individual level alcohol has been confirmed as a [social lubricant](https://www.psychologicalscience.org/news/alcohol-is-a-social-lubricant-study-confirms.html). The recent film "[Another Round](https://en.wikipedia.org/wiki/Another_Round_(film))" turned out to be a huge success with a scenario based on a theory by psychiatry professor [Finn Skårderud](https://en.wikipedia.org/wiki/Finn_Sk%C3%A5rderud): maintaining a blood alcohol content of 0.05% would make humans more creative and relaxed.

Even if the theory has been partly misinterpreted, pop culture can also be an interesting accelerator for future research. After all, the 0.75 R² mentioned earlier is certainly a compelling result, yet it leaves about a quarter of the variance in country-level happiness unexplained, so there is room for other explanations (but again, showing causality is not the aim of this project).