In [None]:
import pandas as pd
import geopandas as gpd
import mapclassify
import plotly.express as px
import pycountry
import numpy as np

In [None]:
PATH_HAPPY = "../Data/world-happiness-report.csv"
PATH_HAPPY_2021 = "../Data/world-happiness-report-2021.csv"
PATH_ALCOHOL = "../Data/alcohol-consumption.csv"

# Data Visualisation (COM-480) - 2022
## <i>If Not Alcohol, What Makes Us Happy?</i>
### Milestone 2
---

### Visualizations
For practicality and scalability, we propose several medium-complexity visualizations rather than one large, complex visualization.

A central <b>timeline slider</b> on the webpage lets the user query entries from 2005 to 2021. As discussed, missing values are interpolated. For the selected year, the following visualizations are planned:
- <b>Outlier Map.</b>
A colored world map showing outliers: here, countries whose happiness level is very different from what would have been expected from GDP/capita alone. For example, countries close to their expected happiness level are represented in gray, those far above in green shades and those below in red shades.
- <b>Distribution Plot for Happiness /  Alcohol Consumption.</b>
The objective is to plot the distributions of the Happiness Score and of alcohol consumption, to show how uniform or skewed they are and to discuss the result. Hovering over a “bin” (i.e. a vertical bar) would reveal the countries in that bin with their respective values.
- <b>Change of Ranking over Time (Bar Chart).</b>
Using a horizontal bar chart, the objective is to visualize the evolution of the ranking of the happiest countries.
- <b>Correlation: Scatterplot.</b>
The objective is to create an interactive 2D scatterplot, where the user can select the X-axis attribute (default: alcohol consumption), while the Y-axis attribute is fixed to Happiness Score. Each point in the scatterplot would represent a country. The points can be color-coded by georegion, or drawn as the actual flag of the country. When a country is selected, the user would see that country’s scatterplot positions from 2005 to 2021, connected by a line.
- <b>Magnitude: [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart).</b>
The objective is to create an interactive radar chart with the Happiness Score attribute at the top corner and the other attributes at the remaining corners.
- <b> Correlation Map.</b>
A world map color-coded based on the value of the attribute selected in the scatter-plot.
- <i>Optional: Fun animation.</i>
An animation which demonstrates the magnitude of the alcohol consumed per capita by country. For example, a bottle that gets filled by country-themed bubbles; their number corresponds to the respective value for that country.

Note that the Scatterplot, the Radar Chart and the Map are connected: hovering/tapping on a country on the scatterplot would also update the radar chart line and would highlight the country on the map, and vice versa.
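
The interpolation of missing values mentioned above could look roughly like the sketch below. It is only an illustration on toy data: the column names `country`, `year` and `score` and the helper `interpolate_by_country` are placeholders, not the real CSV headers or our final code. It assumes one row per country and year, with gaps stored as `NaN`.

```python
import numpy as np
import pandas as pd

# Toy long-format table; names and values are made up for illustration.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "year": [2005, 2006, 2007, 2008] * 2,
    "score": [5.0, np.nan, np.nan, 6.5, 4.0, 4.2, np.nan, 4.6],
})


def interpolate_by_country(df, value_col="score"):
    """Linearly fill gaps inside each country's series (hypothetical helper)."""
    out = df.sort_values(["country", "year"]).copy()
    # limit_area="inside" fills only gaps surrounded by known values,
    # so nothing is extrapolated before the first or after the last entry.
    out[value_col] = out.groupby("country")[value_col].transform(
        lambda s: s.interpolate(method="linear", limit_area="inside")
    )
    return out


filled = interpolate_by_country(toy)
print(filled)
```

Interpolating per country keeps one country's values from leaking into another's series. If some years are missing as rows rather than as `NaN`, the table would first need to be reindexed to a full year range.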

## Tools we will use 

We decided to start with a mockup of all the visualisations listed above. These can be either quick and dirty Python implementations or, when nothing satisfying can be coded easily, hand-drawn mockups.

The Python implementation is also useful for the next step: using pandas, we obtain fairly complete dataframes and should not need any further data processing in JS.

Below is a list of tools to be used for our real implementation. 

| Visualisation| Lectures and tools we will need |
| :----------- | :----------- |
| Outlier Map     | Lecture 4  : D3.js (Drawing the map) <br /> Lecture 5 : D3.js (Making it interactive) <br /> Lecture 8 : GDAL, Leaflet.js  (Drawing the map)  |
| Distribution Plot for Happiness / Alcohol Consumption.  | Lecture 4 : D3.js (Drawing the plot) <br /> Lecture 5 : D3.js (Making it interactive)       |
| Change of Ranking over Time (Bar Chart).  | Lecture 4 : D3.js (Drawing the plot)<br /> Lecture 5 : D3.js (Making it interactive)      |
| Correlation: Scatterplot.  | Lecture 4 : D3.js (Drawing the plot)<br /> Lecture 5 : D3.js (Making it interactive)      |
| Magnitude: [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart).  | Lecture 4 : D3.js (Drawing the plot)<br /> Lecture 5 : D3.js (Making it interactive)       |
| Correlation Map.   | Lecture 4 : D3.js (Drawing the plot)<br /> Lecture 5 : D3.js (Making it interactive)      |
| Fun animation (a 2D bottle that gets filled by country-themed bubbles)      | Two.js framework            |

### Ordered list of pieces to implement

<ins>Minimal viable product: Core</ins>
1. [ ] Outlier Map 
2. [ ] Correlation: Scatterplot
3. [ ] Magnitude: [Radar Chart](https://en.wikipedia.org/wiki/Radar_chart)
4. [ ] Make the 3 plots interactive and connected (Outlier Map, Radar Chart, Scatterplot)

<ins>End of minimal viable product  </ins>
1. [ ] Change of Ranking over Time (Bar Chart)
2. [ ] Distribution Plot for Happiness / Alcohol Consumption.
3. [ ] Correlation Map.
4. [ ] Change of Ranking over Time animated
<br />  

<ins>If we have time </ins>

1. [ ] The Fun animation (2D bottle that gets filled by country-themed bubbles)

### Data preprocessing

In [None]:
df_happy_2021 = pd.read_csv(PATH_HAPPY_2021)
df_happy_2021.head()

In [None]:
# Mapping from the non-standard country names used in the report to the standard names known to pycountry.
names_conversion = {"Czech Republic": "Czechia", 
                    "Taiwan Province of China": "Taiwan, Province of China", 
                    "South Korea": "Korea, Republic of", 
                    "Moldova": "Moldova, Republic of", 
                    "Bolivia": "Bolivia, Plurinational State of", 
                    "Russia" : "Russian Federation", 
                    "Hong Kong S.A.R. of China": "Hong Kong", 
                    "Vietnam": "Viet Nam", 
                    "Congo (Brazzaville)": "Congo", 
                    "Ivory Coast": "Côte d'Ivoire",
                    "Laos": "Lao People's Democratic Republic", 
                    "Venezuela": "Venezuela, Bolivarian Republic of",
                    "Iran": "Iran, Islamic Republic of", 
                    "Palestinian Territories": "Palestine, State of", 
                    "Swaziland": "Eswatini",
                    "Tanzania": "Tanzania, United Republic of"}

In [None]:
df_happy_2021["Country name"] = df_happy_2021["Country name"].replace(names_conversion)

A first step is to ensure name consistency for countries. To do so, we will use the pycountry library and retrieve the iso_2 and iso_3 codes for country names. 

In [None]:
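# Country names as they appear in the report (after the manual conversions above).
input_countries = df_happy_2021["Country name"].tolist()

# Lookup tables from pycountry's standard name to the ISO alpha-2 / alpha-3 code.
countries_2 = {country.name: country.alpha_2 for country in pycountry.countries}
countries_3 = {country.name: country.alpha_3 for country in pycountry.countries}

# Names that pycountry does not know are flagged so they can be fixed by hand.
codes_2 = [countries_2.get(country, 'Unknown code') for country in input_countries]
codes_3 = [countries_3.get(country, 'Unknown code') for country in input_countries]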

print(codes_3)

In [None]:
df_happy_2021["iso_2"] = codes_2
df_happy_2021["iso_3"] = codes_3

In [None]:
df_happy_2021[df_happy_2021['iso_3'] == "Unknown code"]

In [None]:
# Some ISO codes are not defined yet. We enter the temporary ones by hand.
df_happy_2021.loc[32,"iso_2"] = "XK"
df_happy_2021.loc[32, "iso_3"] = "XKX"
df_happy_2021.loc[73, "iso_3"] = "CTR"
df_happy_2021.loc[73, "iso_2"] = "CT"
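
The manual fix above addresses rows by their index label, which breaks if the file is re-sorted or filtered. A minimal sketch of a name-based variant follows. It runs on a toy table, and it assumes that the two flagged rows are named `Kosovo` and `North Cyprus`; the `manual_codes` dictionary is a hypothetical helper that reuses the temporary codes entered above.

```python
import pandas as pd

# Toy stand-in for df_happy_2021 after the pycountry lookup.
toy = pd.DataFrame({
    "Country name": ["Finland", "Kosovo", "North Cyprus"],
    "iso_2": ["FI", "Unknown code", "Unknown code"],
    "iso_3": ["FIN", "Unknown code", "Unknown code"],
})

# Hypothetical mapping: country name -> (iso_2, iso_3) temporary codes.
manual_codes = {
    "Kosovo": ("XK", "XKX"),
    "North Cyprus": ("CT", "CTR"),
}

for name, (iso_2, iso_3) in manual_codes.items():
    # Select rows by name rather than by position in the file.
    mask = toy["Country name"] == name
    toy.loc[mask, "iso_2"] = iso_2
    toy.loc[mask, "iso_3"] = iso_3

# Any row still flagged would need another manual entry.
print(toy[toy["iso_3"] == "Unknown code"])
```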

### Outlier Map

Let's start by building a table containing the Happiness Ranking and the GDP ranking of a country. From there, we will show how strong the difference between the two of them is. 

In [None]:
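# The 2021 file is assumed to be sorted from happiest to least happy,
# so the row index serves as a 0-based happiness ranking.
df_happy_2021["Happiness ranking"] = df_happy_2021.index.array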

In [None]:
df_2021_indexed = df_happy_2021.sort_values(by = "Logged GDP per capita", ascending = False).reset_index()
df_2021_indexed["GDP Ranking"] = df_2021_indexed.index.array

In [None]:
df_2021_indexed.head()

In [None]:
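# .copy() creates an independent table, so adding a column does not raise SettingWithCopyWarning.
df_2021_outliers = df_2021_indexed[["Country name", "iso_2", "iso_3", "Happiness ranking", "GDP Ranking"]].copy()
# Positive values: the country ranks better on happiness than on GDP.
df_2021_outliers["Difference"] = df_2021_outliers["GDP Ranking"] - df_2021_outliers["Happiness ranking"]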

In [None]:
df_2021_outliers.head()
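
The rankings above are derived from row order, which only works if the file is already sorted by happiness. As a hedged alternative, the sketch below computes both rankings with `Series.rank` on toy data. The `Ladder score` column name is an assumption about the CSV header, and the numbers are made up.

```python
import pandas as pd

# Toy data; values are invented for illustration only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Ladder score": [7.8, 7.6, 5.0, 6.1],
    "Logged GDP per capita": [10.8, 9.0, 11.0, 9.5],
})

# Rank 1 is the highest value; ties share the smallest rank of their group.
toy["Happiness ranking"] = (
    toy["Ladder score"].rank(ascending=False, method="min").astype(int)
)
toy["GDP Ranking"] = (
    toy["Logged GDP per capita"].rank(ascending=False, method="min").astype(int)
)

# Same sign convention as above: positive means happier than GDP alone suggests.
toy["Difference"] = toy["GDP Ranking"] - toy["Happiness ranking"]
print(toy)
```

This variant does not depend on the order of the rows and handles ties explicitly, at the cost of producing 1-based instead of 0-based ranks.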

The geopandas library provides a dataframe of all countries with their associated polygons, which makes plotting data on a map very easy.

In [None]:
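# Note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this call needs an older geopandas release.
countries = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))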
countries.head()

In [None]:
countries.plot()

In [None]:
# CAUTION: this is just a mockup. Some countries are probably lost at this stage,
# because inconsistent names or ISO codes leave them unmatched in the merge.
countries = countries.merge(right = df_2021_outliers, how = "left", left_on = "iso_a3", right_on = "iso_3")
countries.head()
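
To see which countries are lost in a merge like the one above, pandas can label each row with its origin. The sketch below uses two toy tables in place of the geopandas frame and the outlier table. The `-99` codes and all values are invented here to simulate mismatches.

```python
import pandas as pd

# Toy stand-ins: "shapes" plays the role of the map table, "scores" of the outlier table.
shapes = pd.DataFrame({
    "name": ["Finland", "France", "Kosovo"],
    "iso_a3": ["FIN", "-99", "-99"],
})
scores = pd.DataFrame({
    "iso_3": ["FIN", "FRA", "XKX"],
    "Difference": [20, -3, 40],
})

# An outer merge keeps unmatched rows from both sides;
# indicator=True adds a "_merge" column: left_only, right_only or both.
merged = shapes.merge(
    scores, how="outer", left_on="iso_a3", right_on="iso_3", indicator=True
)

shapes_without_data = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
data_without_shape = merged.loc[merged["_merge"] == "right_only", "iso_3"].tolist()

print("Map shapes without data:", shapes_without_data)
print("Data rows without a map shape:", data_without_shape)
```

Both lists together indicate which codes would need a manual correction before the final map is built.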

In [None]:
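# .copy() avoids pandas' SettingWithCopyWarning when the column is filled below.
plot_differences = countries[["name", "geometry", "Difference"]].copy()
# Countries without data get 0, i.e. the same neutral colour as "no difference".
plot_differences['Difference'] = plot_differences['Difference'].fillna(0)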
plot_differences.plot(column='Difference', figsize=(25, 20),
           legend=True, cmap='coolwarm')

<b> Title: Difference between GDP per capita ranking and happiness ranking, by country </b>
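
The mockup above compares rankings. Since the planned Outlier Map is described as showing how far a country's happiness is from what GDP per capita alone would predict, another possible measure is the residual of a simple linear fit. The following is only a sketch on invented numbers, not the method we have settled on; the `Ladder score` column name is an assumption about the CSV header.

```python
import numpy as np
import pandas as pd

# Toy data; values are invented for illustration only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E"],
    "Logged GDP per capita": [8.0, 9.0, 10.0, 11.0, 10.0],
    "Ladder score": [4.0, 5.0, 6.0, 7.0, 7.5],
})

# Least-squares line: expected happiness as a function of logged GDP per capita.
slope, intercept = np.polyfit(
    toy["Logged GDP per capita"], toy["Ladder score"], deg=1
)
toy["Expected"] = intercept + slope * toy["Logged GDP per capita"]

# Positive residual: happier than GDP alone predicts (green on the map);
# negative residual: less happy than predicted (red on the map).
toy["Residual"] = toy["Ladder score"] - toy["Expected"]
print(toy.sort_values("Residual", ascending=False))
```

Unlike the ranking difference, the residual is expressed in happiness-score units, so it stays comparable across years even when the number of countries changes.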

### Distribution Plot for Happiness / Alcohol Consumption

Credit for this quickly built visualisation goes to the [solution shown here](https://community.plotly.com/t/put-images-inside-bubbles/41364/5).

In [None]:
# only iso2 can be used with the country flag dictionary mentioned below
iso3_to_iso2 = {c.alpha_3: c.alpha_2 for c in pycountry.countries}

df = px.data.gapminder().query("year==2007")
df["iso_alpha2"] = df["iso_alpha"].map(iso3_to_iso2)

In [None]:
df = df.merge(right = df_happy_2021, how = "left", left_on = "iso_alpha", right_on = "iso_3")

In [None]:
df.head()

In [None]:
# TODO: use the ladder score here instead of the gapminder placeholder columns.
fig = px.scatter(
    df,
    x="lifeExp",
    y="gdpPercap",
    hover_name="country",
    hover_data=["lifeExp", "gdpPercap", "pop"],
)
fig.update_traces(marker_color="rgba(0,0,0,0)")

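# Column (lifeExp or gdpPercap) with the larger maximum; that maximum sets the scale of the flag images.
max_dim = df[["lifeExp", "gdpPercap"]].max().idxmax()
maxi = df[max_dim].max()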
for i, row in df.iterrows():
    country_iso = row["iso_alpha2"]
    fig.add_layout_image(
        dict(
            source=f"https://raw.githubusercontent.com/matahombres/CSS-Country-Flags-Rounded/master/flags/{country_iso}.png",
            xref="x",
            yref="y",
            xanchor="center",
            yanchor="middle",
            x=row["lifeExp"],
            y=row["gdpPercap"],
            sizex=np.sqrt(row["pop"] / df["pop"].max()) * maxi * 0.15 + maxi * 0.03,
            sizey=np.sqrt(row["pop"] / df["pop"].max()) * maxi * 0.15+ maxi * 0.03,
            sizing="contain",
            opacity=0.8,
            layer="above"
        )
    )

fig.update_layout(height=600, width=1000, plot_bgcolor="#dfdfdf", yaxis_range=[-5e3, 55e3])

fig.show()
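
The flag size expression appears twice in the loop above (`sizex` and `sizey`). As a small sketch, it could be factored into a helper. The name `flag_size` and its parameters are hypothetical; the defaults 0.15 and 0.03 are the constants used in the mockup.

```python
import numpy as np


def flag_size(pop, pop_max, axis_max, scale=0.15, floor=0.03):
    """Side length of a flag image in axis units (hypothetical helper).

    The square root makes the image area, rather than its side length,
    grow in proportion to the population; `floor` keeps the flags of
    small countries visible.
    """
    return np.sqrt(pop / pop_max) * axis_max * scale + axis_max * floor


# Example with made-up numbers: axis maximum 1000, largest population 100.
print(flag_size(100, 100, 1000))  # largest country
print(flag_size(25, 100, 1000))   # a quarter of the population, half the variable part
print(flag_size(0, 100, 1000))    # only the minimum size remains
```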