![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://colab.research.google.com/github/callysto/jupyterlite/blob/main/content/data-viz-of-the-week/world-happiness/world-happiness.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# World Happiness and Income

# Question

Is the happiness score of citizens in a country directly correlated to income in that country, or are there other factors that also influence the happiness scores?


# Gather

We will use data from the World Happiness Report. 

Run the code in the following cell to install and import the code libraries needed for this project. Code libraries are sets of code that make it easier to accomplish a specific purpose. For instance, Plotly Express is a code library used for making visualizations. The first line installs the libraries, the next two lines import them into this notebook, and the remaining lines load the data we are using from a website.


In [None]:
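# xlrd lets pandas read .xls files; statsmodels is needed for the 'ols' trendlines later on
%pip install plotly nbformat pandas xlrd statsmodels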

import pandas as pd
import plotly.express as px

url = 'https://happiness-report.s3.amazonaws.com/2023/DataForFigure2.1WHR2023.xls'
data = pd.read_excel(url)
data

## Happiness Score by Country

Run the following code to generate a bar graph of the ladder (happiness) score for each country.

In [None]:
# Alternative (unsorted, vertical bars):
#px.bar(data, x='Country name', y='Ladder score', title='World Happiness Report 2023', height=800)
# Horizontal bars sorted by score, so the countries are easier to compare
px.bar(data.sort_values('Ladder score'), x='Ladder score', y='Country name', title='World Happiness Report 2023', height=2100, orientation='h')

The data also include `upperwhisker` and `lowerwhisker` values that give the range around each score. Let's see whether those ranges are large enough to matter for our purposes.

In [None]:
data['error y'] = data['upperwhisker'] - data['Ladder score']
data['error y minus'] = data['Ladder score'] - data['lowerwhisker']
px.scatter(data, x='Country name', y='Ladder score', error_y='error y', error_y_minus='error y minus')

Those whiskers don't seem large enough to worry about.
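
To back up that visual impression with a number, one option is to compare the widest whisker range with the overall spread of the scores. The sketch below uses a few made-up rows (the values are illustrative only, not taken from the report) with the same column names as the data. The helper name `whisker_summary` is hypothetical.

```python
import pandas as pd

# Illustrative values only (NOT from the report), using the report's column names
sample = pd.DataFrame({
    'Country name': ['A', 'B', 'C'],
    'Ladder score': [7.5, 5.0, 2.0],
    'upperwhisker': [7.6, 5.125, 2.25],
    'lowerwhisker': [7.4, 4.875, 1.75],
})

def whisker_summary(df):
    """Compare the width of the whisker ranges with the spread of the scores."""
    width = df['upperwhisker'] - df['lowerwhisker']
    score_range = df['Ladder score'].max() - df['Ladder score'].min()
    return {
        'largest whisker width': width.max(),
        'range of scores': score_range,
        'ratio': width.max() / score_range,
    }

summary = whisker_summary(sample)
print(summary)
```

In the notebook, calling `whisker_summary(data)` would give the same comparison for the real data. A small ratio suggests the whiskers can be ignored when comparing countries that are far apart, although they may still matter for countries with very similar scores.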

We can also color-code the bars by continent using continent names from [Gapminder](https://gapminder.org/).

In [None]:
geonames = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--geo_entity_domain/master/ddf--entities--geo--country.csv')
geonames = geonames.rename(columns={'name':'Country name', 'world_4region':'Continent'}) # we could instead use the 'world_6region' column
geonames = geonames[['Country name', 'Continent']]
data['Continent'] = data['Country name'].map(geonames.set_index('Country name')['Continent']).fillna('unknown').replace('', 'unknown')
px.bar(data, x='Country name', y='Ladder score', title='World Happiness Report 2023 by Continent', height=800, color='Continent')

Instead of using 'unknown' for the countries that were named differently in the Gapminder data set, we can manually set their continents. First we will print the ones that have an `unknown` continent.

In [None]:
data[data['Continent'] == 'unknown']['Country name']
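
To see why a differently spelled name ends up as `unknown`, here is a minimal sketch of the lookup on made-up rows. The two small tables and the spelling 'Czech Republic' are assumptions for illustration, not values copied from the Gapminder file.

```python
import pandas as pd

# Hypothetical stand-in for the Gapminder lookup table
lookup = pd.DataFrame({
    'Country name': ['Canada', 'Czech Republic'],
    'Continent': ['americas', 'europe'],
})
# Hypothetical stand-in for the happiness data, with one name spelled differently
sample = pd.DataFrame({'Country name': ['Canada', 'Czechia']})

# map() looks up each name in the index of the lookup Series;
# names without an exact match come back as NaN, which fillna() turns into 'unknown'
continents = sample['Country name'].map(lookup.set_index('Country name')['Continent'])
sample['Continent'] = continents.fillna('unknown')
print(sample)

# List the names that still need a manual correction
unmatched = sample.loc[sample['Continent'] == 'unknown', 'Country name'].tolist()
print(unmatched)
```

Because the match must be exact, even a small spelling difference between the two data sets is enough to leave a country without a continent.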

Then we can set each of their continents and recreate the visualization.

In [None]:
# Continents for the countries whose names did not match the Gapminder data
continent_corrections = {'Czechia':'europe',
                         'Taiwan Province of China':'asia',
                         'Slovakia':'europe',
                         'Kyrgyzstan':'asia',
                         'Hong Kong S.A.R. of China':'asia',
                         'Congo (Brazzaville)':'africa',
                         'North Macedonia':'europe',
                         'Laos':'asia',
                         'Ivory Coast':'africa',
                         'State of Palestine':'asia',
                         'Turkiye':'asia',
                         'Congo (Kinshasa)':'africa'}

for country, continent in continent_corrections.items():
    data.loc[data['Country name'] == country, 'Continent'] = continent

px.bar(data, x='Country name', y='Ladder score', title='World Happiness Report 2023 by Continent', height=800, color='Continent')

What do you notice about the happiness scores of specific countries, or about the pattern of happiness scores across continents?


# Mapping the Data

Run the code below to make a map of the countries colored by their happiness scores. 

In [None]:
px.choropleth(data, locations='Country name', locationmode='country names', color='Ladder score', title='World Happiness Report 2023', height=800)

What observations can you make about world happiness based on the map above?

The code below will generate a scatter plot with the data. The country will be on the x-axis with the happiness score on the y-axis. The size of each dot represents the amount of social support and the colour represents the life expectancy. 


In [None]:
px.scatter(data, x='Country name', y='Ladder score', size='Social support', color='Healthy life expectancy', title='World Happiness Report 2023', height=800)

We can also generate individual scatter plots for each of the factors, with trendlines, to see how they correlate with the happiness score. In the visualizations below the happiness score is on the y-axis, and the x-axis values are:

* Gross Domestic Product (logarithm of GDP per capita)
* Social Support ("If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?")
* Life Expectancy (years)
* Freedom ("Are you satisfied or dissatisfied with your freedom to choose what you do with your life?")
* Generosity (Residual of regressing national average of "Have you donated money to a charity in the past month?" on GDP per capita.)
* Corruption (Average of "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?")

In [None]:
from plotly.subplots import make_subplots

factors = ['Logged GDP per capita','Social support','Healthy life expectancy','Freedom to make life choices','Generosity','Perceptions of corruption']

# One subplot per factor, arranged in a grid of 3 rows and 2 columns
fig = make_subplots(rows=3, cols=2, subplot_titles=factors)
for i, factor in enumerate(factors):
    # trendline='ols' requires the statsmodels library
    new_plot = px.scatter(data, x=factor, y='Ladder score', hover_data=['Country name'], trendline='ols')
    for t in new_plot.data: # add the scatterplot and the trendline
        fig.add_trace(t, row=i//2+1, col=i%2+1)
fig.update_layout(title='World Happiness Report 2023', showlegend=False, height=1000)
fig.update_yaxes(title_text='Happiness Score')
fig.show()
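
To put a number on how closely each factor follows the happiness score, one possibility is to calculate a correlation coefficient for each factor. This is a minimal sketch rather than part of the original analysis: the helper name `factor_correlations` and the small sample table are made up for illustration, and it uses the Pearson correlation that pandas calculates by default.

```python
import pandas as pd

def factor_correlations(df, factors, target='Ladder score'):
    """Pearson correlation of each factor column with the target, strongest first."""
    corr = df[factors].corrwith(df[target])
    # Sort by absolute value so strong negative correlations also rank high
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)

# Illustrative values only (NOT from the report)
sample = pd.DataFrame({
    'Ladder score': [1.0, 2.0, 3.0, 4.0],
    'Factor up': [10.0, 20.0, 30.0, 40.0],
    'Factor down': [8.0, 6.0, 4.0, 2.0],
    'Factor mixed': [1.0, 3.0, 2.0, 4.0],
})

result = factor_correlations(sample, ['Factor up', 'Factor down', 'Factor mixed'])
print(result)
```

In the notebook, `factor_correlations(data, factors)` would apply the same calculation to the real data. Values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one. A strong correlation does not by itself show that a factor causes higher or lower happiness.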

# Reflection Questions

What factors contribute to a higher score of happiness? 

What factors contribute to a lower score of happiness?

Are there factors that were not explored in this data visualization that could contribute to a country's happiness score?

How do you think factors that influence quality of life affect the happiness score in a country? 

What advice would you give to a country leader who wanted to increase the happiness score in a country? 

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)