In [2]:
import pandas as pd

In [3]:
PATH_HAPPY = "Data/world-happiness-report-2021.csv"
PATH_DRUNK = "Data/alcohol-consumption.csv"

# Data Visualisation
### Milestone 1
---

## Datasets
Our work focuses mainly on 2 datasets:
- Dataset 1: Results of the [World Happiness Report](https://en.wikipedia.org/wiki/World_Happiness_Report) research initiative conducted by the Gallup World Poll (GWP) in about 150 countries.
- Dataset 2: Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) in 217 historically identifiable countries and 49 aggregates/areas (e.g. EU). Values are available only for certain years, several years apart, and the years covered depend on the country.

A third dataset is also used to integrate the results from the 2022 World Happiness Report.

### Dataset 1: World Happiness Report

#### About
The dataset contains yearly happiness scores by country (= item) in the period 2006 through 2021.
To obtain the scores, respondents are asked to rate their own lives on a scale from 0 to 10, known as the [Cantril Scale](https://news.gallup.com/poll/122453/understanding-gallup-uses-cantril-scale.aspx).

#### Quality
The survey is conducted at a regular frequency ranging from semi-annual to biennial, depending on the country.
The sample size for each country is about 1,000 people on average, but can reach 2,000 for bigger countries like China or Russia. Samples are weighted to correct for selection bias, nonresponse and other issues.
[[1](https://www.gallup.com/178667/gallup-world-poll-work.aspx)]

Some countries do not have a happiness score for every year. In those cases we interpolate the missing values so that every country has a happiness score for each year from 2006 to 2021. In addition, the 2022 data is taken from the Kaggle dataset [World Happiness Report up to 2022](https://www.kaggle.com/datasets/mathurinache/world-happiness-report?select=2022.csv).

#### Attributes
Other than the happiness score, the dataset also contains the following attributes:
- regional indicator of the country,
- standard error of the score,
- lower and upper whiskers,
- 6 indicators (columns) that may contribute to a happier life:
    - Economic production
    - Social support
    - Life expectancy
    - Freedom
    - Absence of corruption
    - Generosity
- "Ladder score in Dystopia",
- "Dystopia + residual",
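- attributes named "Explained by:" corresponding to each of the 6 indicators. In the rows shown below, these six values plus "Dystopia + residual" add up to the ladder score (within rounding), so they appear to be each indicator's additive contribution to the score rather than a coefficient of determination (R<sup>2</sup>).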
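
The relation between the "Explained by:" columns and the ladder score can be checked directly. The following is a minimal sketch, assuming only the five rows printed by `df_happy.head()` (values copied by hand); the helper name `explained_total` is hypothetical. It sums the six "Explained by:" columns with "Dystopia + residual" and compares the total with "Ladder score".

```python
import pandas as pd

# Values copied by hand from the first five rows of the 2021 file.
rows = pd.DataFrame(
    {
        "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
        "Ladder score": [7.842, 7.620, 7.571, 7.554, 7.464],
        "Explained by: Log GDP per capita": [1.446, 1.502, 1.566, 1.482, 1.501],
        "Explained by: Social support": [1.106, 1.108, 1.079, 1.172, 1.079],
        "Explained by: Healthy life expectancy": [0.741, 0.763, 0.816, 0.772, 0.753],
        "Explained by: Freedom to make life choices": [0.691, 0.686, 0.653, 0.698, 0.647],
        "Explained by: Generosity": [0.124, 0.208, 0.204, 0.293, 0.302],
        "Explained by: Perceptions of corruption": [0.481, 0.485, 0.413, 0.170, 0.384],
        "Dystopia + residual": [3.253, 2.868, 2.839, 2.967, 2.798],
    }
)


def explained_total(df):
    """Sum the six "Explained by:" columns and "Dystopia + residual" per row."""
    parts = [c for c in df.columns if c.startswith("Explained by:")]
    return df[parts + ["Dystopia + residual"]].sum(axis=1)


# Largest absolute gap between the reconstructed total and the ladder score.
max_gap = (explained_total(rows) - rows["Ladder score"]).abs().max()
print(f"largest gap: {max_gap:.3f}")
```

On these five rows the totals match the ladder score to within rounding, which is consistent with reading the columns as additive contributions to the score.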

### Dataset 2: Alcohol Consumption per Capita
#### About
The dataset contains yearly alcohol consumption per capita for each country of the world, as well as for regions of the world (e.g. Africa Eastern and Southern, Arab World) and some aggregates (Low income, Upper middle income, Least developed countries: UN classification).

The value corresponds to the liters of pure alcohol consumed per capita over a calendar year by people aged 15 or older. It is adjusted for tourist consumption.

#### Quality
The data covers the years 2000 to 2018 at five points in time (2000, 2005, 2010, 2015 and 2018), so most countries have 5 entries. For countries that do not have all 5 entries (for example Afghanistan), we plan to interpolate the missing values, as for the other dataset.
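
A minimal sketch of how such gaps could be filled, assuming linear interpolation on a yearly grid (the method has not been fixed yet). The rows are copied from the first lines of the alcohol file, `VALUE` is a hypothetical short name for the long value column, and `to_yearly` is a hypothetical helper. Only gaps between two known observations are filled; years before an entity's first entry or after its last one stay missing and would need a separate decision.

```python
import pandas as pd

VALUE = "alcohol"  # hypothetical short name for the long value column

# First rows of the alcohol file, copied by hand (only a subset of each series).
sample = pd.DataFrame(
    {
        "Entity": ["Afghanistan"] * 3 + ["Africa Eastern and Southern"] * 2,
        "Year": [2010, 2015, 2018, 2000, 2005],
        VALUE: [0.21, 0.21, 0.21, 5.014051, 4.856588],
    }
)


def to_yearly(df, years):
    """Put each entity in its own column on a yearly grid and fill inner gaps."""
    wide = df.pivot(index="Year", columns="Entity", values=VALUE).reindex(years)
    # limit_area="inside" interpolates between known values only (no extrapolation).
    return wide.interpolate(method="linear", limit_area="inside")


yearly = to_yearly(sample, list(range(2000, 2019)))
print(yearly.loc[[2000, 2001, 2005, 2012]])
```

Passing `limit_direction="both"` instead of `limit_area="inside"` would also fill the leading and trailing years with the nearest known value; whether that is acceptable for countries such as Afghanistan is an open choice.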

In [8]:
df_happy = pd.read_csv(PATH_HAPPY)
df_happy.head()

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798


In [9]:
df_drunk = pd.read_csv(PATH_DRUNK)
df_drunk.head()

Unnamed: 0,Entity,Code,Year,"Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)"
0,Afghanistan,AFG,2010,0.21
1,Afghanistan,AFG,2015,0.21
2,Afghanistan,AFG,2018,0.21
3,Africa Eastern and Southern,,2000,5.014051
4,Africa Eastern and Southern,,2005,4.856588


## Problem Statement

### Overview

Money does not make you happy… or does it? The most popular view of the relation between happiness* and life quality presents it as [logarithmic](https://www.cnbc.com/2015/12/14/money-can-buy-happiness-but-only-to-a-point.html). But what if we look at the country level? Happiness is certainly correlated with GDP per capita, but some very surprising cases fall outside this trend. Which countries are they, and what other factors play a role? Is there any interesting correlation that could “explain” happy but not so rich countries, or sad rich states?

We would indeed expect a positive link between a country's happiness ("life ladder") and its social support, health, freedom of choice and generosity. Conversely, perceived corruption is expected to decrease happiness. Further analysis of this topic will be addressed in the next section.

As a second part, we will investigate a less serious relation (at least it should make **us** happy). The Association for Psychological Science has reported on studies showing that alcohol works as a social [lubricant](https://www.psychologicalscience.org/news/alcohol-is-a-social-lubricant-study-confirms.html). Alcohol is supposed to minimize negative emotions. We will therefore look for such relations at the country level: do some countries tend to be happier than others, all other parameters being equal, just because they drink alcohol? Or at least, is social support higher due to alcohol consumption? Although we will of course not be able to show any direct causation, this could be of interest for future research.

*happiness = positive affect, not feeling blue, and low stress

### Target Audience

- Individuals interested in insights on the well-being and consumption habits worldwide
- Casual alcohol consumers eager to compare drinking habits in different countries
- Young professionals looking to find the country that will bring the most joy to their future family
- Politicians who would like to strategize on how to improve their citizens' lives

### Motivation

The main motivation of the project is to tell the visitor a story about happiness across world nations. The project will present every country’s perceived position on the “life-ladder” and make a connection with relevant indicators that might help understand the variation among states. The project’s webpage will also briefly describe interesting outlier cases while mentioning cultural differences and connecting to world events.

Finally, another intended purpose of the project is entertainment. Although a correlation may exist between happiness and alcohol consumption, our motivation is not to demonstrate a causal relation. Therefore, we will NOT attempt to show that alcohol is capable of boosting happiness levels in the long term.

### Visualization
For the sake of practicality and scalability, we propose several visualizations of medium complexity rather than one large, complex visualization.

By having a central <b>timeline slider</b> on the webpage, the user can query entries from 2006 to 2022. As discussed, missing values are interpolated. For the selected year, the following visualizations are planned:
- <b>Distribution Plot for Happiness /  Alcohol Consumption.</b>
The objective is to plot the distributions of the Happiness Score and of alcohol consumption in order to show their skew, highlight how evenly or unevenly the values are spread across countries, and discuss it. When hovering over each “bin” (i.e. vertical bar), the countries in that bin become visible with their respective values.
- <b>Change of Ranking over Time (Bar Chart).</b>
Using a horizontal bar chart, the objective is to visualize the evolution of the ranking of the happiest countries.
- <b>Correlation: Scatterplot.</b>
The objective is to create an interactive 2D scatterplot, where the user can select the X-axis attribute (default: alcohol consumption), while the Y-axis attribute is fixed to Happiness Score. Each point in the scatterplot represents a country. The points can be color-coded by geographic region, or drawn as the country's flag. When a country is selected, the user sees that country's positions from 2006 to 2022 as a connected scatterplot.
- <b>Magnitude: [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart).</b>
The objective is to create an interactive radar chart with the Happiness Score attribute at the top corner and the other attributes at the remaining corners.
- <b>Map.</b>
A world map color-coded based on the value of the attribute selected in the scatterplot.

Note that the Scatterplot, the Radar Chart and the Map are connected: hovering/tapping on a country on the scatterplot would also update the radar chart line and would highlight the country on the map, and vice versa.
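
The data behind the distribution plot could be prepared along these lines. This is a sketch under assumptions: the five scores are copied from the first rows of the 2021 file, the bin edges and labels are hypothetical, and `bin_members` is an illustrative helper name. Each bin keeps the list of its countries, which is what the hover tooltip would display.

```python
import pandas as pd

# Scores copied by hand from the first five rows of the 2021 file.
scores = pd.DataFrame(
    {
        "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
        "Ladder score": [7.842, 7.620, 7.571, 7.554, 7.464],
    }
)

EDGES = [7.4, 7.6, 7.8, 8.0]                 # hypothetical bin edges
LABELS = ["7.4-7.6", "7.6-7.8", "7.8-8.0"]   # one label per bin


def bin_members(df, column, edges, labels):
    """Return, for each bin, the countries whose value falls in it (highest first)."""
    ordered = df.sort_values(column, ascending=False)
    bins = pd.cut(ordered[column], bins=edges, labels=labels)
    return ordered.groupby(bins, observed=True)["Country name"].apply(list)


members = bin_members(scores, "Ladder score", EDGES, LABELS)
skewness = scores["Ladder score"].skew()  # sample skewness of the plotted values
print(members)
```

The same helper could be applied to the alcohol column; on the full dataset the bin edges would be derived from the data range rather than fixed by hand.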

## Exploratory data analysis

### Preprocessing
The two datasets will require preprocessing for the following reasons:
- Dataset 1 is missing 2022 data which is available. Solution: merge with 2022 data.
- Missing data across years of datasets 1 and 2. Solution: interpolate data (e.g. linear interpolation), explain in the final report.
- Some countries are in dataset 1 but not in dataset 2. Solution: drop data where visualizations require both values.
- Some countries are in dataset 2 but not in dataset 1. Solution: drop data where visualizations require both values.
- Some country names are inconsistent between datasets. Solution: correct or drop.
- Dataset 2 contains aggregates which may not be desired. Solution: 2 versions, keep or drop certain items.
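
The matching steps above can be sketched with pandas. This is a sketch under assumptions, not the final pipeline: it uses only names that appear in this notebook, treats a missing `Code` as the marker of an aggregate (true for the rows shown earlier, but not verified for every aggregate), and uses an outer merge with `indicator=True` to flag countries that appear in only one dataset. The helper name `match_countries` is hypothetical.

```python
import pandas as pd

# Names only, taken from the outputs and insights in this notebook.
happy_names = pd.DataFrame({"Country name": ["Finland", "Denmark", "Afghanistan"]})
drunk_names = pd.DataFrame(
    {
        "Entity": ["Afghanistan", "Africa Eastern and Southern"],
        "Code": ["AFG", None],
    }
)


def match_countries(happy, drunk):
    """Drop aggregates, then flag each country as present in one or both datasets."""
    # Assumption: aggregates are the rows without a country code.
    countries = drunk.dropna(subset=["Code"])
    merged = happy.merge(
        countries,
        left_on="Country name",
        right_on="Entity",
        how="outer",
        indicator=True,
    )
    # Use whichever name is available as the display name.
    merged["name"] = merged["Country name"].fillna(merged["Entity"])
    return merged.set_index("name")["_merge"].astype(str)


report = match_countries(happy_names, drunk_names)
both = sorted(report[report == "both"].index)
only_happy = sorted(report[report == "left_only"].index)
print(report)
```

Names flagged `left_only` or `right_only` are the candidates for correcting inconsistent spellings or dropping the rows; switching to `how="inner"` keeps only countries present in both datasets.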

### Basic Insights
<b>Country with the least alcohol consumption</b>: Kuwait (2015).<br>
<b>Country with the most alcohol consumption</b>: Seychelles (2018).<br>
<b>Country with the highest happiness score</b>: Denmark (2015).<br>
<b>Country with the lowest happiness score</b>: Afghanistan (2019).

## Related work

TODO:
- What others have already done with the data?
- Why is your approach original?
- What source of inspiration do you take? Visualizations that you found on other websites or magazines (might be unrelated to your data).
- In case you are using a dataset that you have already explored in another context (ML or ADA course, semester project...), you are required to share the report of that work to outline the differences with the submission for this class