# World Happiness Report
Happiness scored according to economic production, social support, etc.

This [dataset](https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv) is from the Sustainable Development Solutions Network. We use the 2019 data.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

## Analyzing
Let's see what kind of dataset we have on hand.

In [2]:
df = pd.read_csv('2019.csv')
df.head()

Unnamed: 0,Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,1,Finland,7.769,1.34,1.587,0.986,0.596,0.153,0.393
1,2,Denmark,7.6,1.383,1.573,0.996,0.592,0.252,0.41
2,3,Norway,7.554,1.488,1.582,1.028,0.603,0.271,0.341
3,4,Iceland,7.494,1.38,1.624,1.026,0.591,0.354,0.118
4,5,Netherlands,7.488,1.396,1.522,0.999,0.557,0.322,0.298


## Content
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative.

The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.
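To make the role of the six factor columns more concrete, the sketch below sums them for the five rows shown in the `df.head()` output and subtracts that sum from `Score`. The column names `Explained by factors` and `Remainder` are made up for this illustration. Reading the remainder as the Dystopia baseline plus an unexplained residual is an assumption based on the description above; the 2019 file has no such column to confirm it.

```python
import pandas as pd

# First five rows of 2019.csv, copied from the df.head() output.
sample = pd.DataFrame({
    "Country or region": ["Finland", "Denmark", "Norway", "Iceland", "Netherlands"],
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488],
    "GDP per capita": [1.340, 1.383, 1.488, 1.380, 1.396],
    "Social support": [1.587, 1.573, 1.582, 1.624, 1.522],
    "Healthy life expectancy": [0.986, 0.996, 1.028, 1.026, 0.999],
    "Freedom to make life choices": [0.596, 0.592, 0.603, 0.591, 0.557],
    "Generosity": [0.153, 0.252, 0.271, 0.354, 0.322],
    "Perceptions of corruption": [0.393, 0.410, 0.341, 0.118, 0.298],
})

factor_cols = [
    "GDP per capita",
    "Social support",
    "Healthy life expectancy",
    "Freedom to make life choices",
    "Generosity",
    "Perceptions of corruption",
]

# Hypothetical helper columns: the part of the score attributed to the six
# factors, and whatever is left over (assumed: Dystopia baseline + residual).
sample["Explained by factors"] = sample[factor_cols].sum(axis=1)
sample["Remainder"] = sample["Score"] - sample["Explained by factors"]

print(sample[["Country or region", "Score", "Explained by factors", "Remainder"]].round(3))
```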


In [3]:
df.info()
df.columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156 entries, 0 to 155
Data columns (total 9 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Overall rank                  156 non-null    int64  
 1   Country or region             156 non-null    object 
 2   Score                         156 non-null    float64
 3   GDP per capita                156 non-null    float64
 4   Social support                156 non-null    float64
 5   Healthy life expectancy       156 non-null    float64
 6   Freedom to make life choices  156 non-null    float64
 7   Generosity                    156 non-null    float64
 8   Perceptions of corruption     156 non-null    float64
dtypes: float64(7), int64(1), object(1)
memory usage: 11.1+ KB


Index(['Overall rank', 'Country or region', 'Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')

The columns `GDP per capita`, `Social support`, `Healthy life expectancy`, `Freedom to make life choices`, `Generosity` and `Perceptions of corruption` describe the extent to which each factor contributes to the happiness evaluation of each country.

In [4]:
df.describe()

Unnamed: 0,Overall rank,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
count,156.0,156.0,156.0,156.0,156.0,156.0,156.0,156.0
mean,78.5,5.407096,0.905147,1.208814,0.725244,0.392571,0.184846,0.110603
std,45.177428,1.11312,0.398389,0.299191,0.242124,0.143289,0.095254,0.094538
min,1.0,2.853,0.0,0.0,0.0,0.0,0.0,0.0
25%,39.75,4.5445,0.60275,1.05575,0.54775,0.308,0.10875,0.047
50%,78.5,5.3795,0.96,1.2715,0.789,0.417,0.1775,0.0855
75%,117.25,6.1845,1.2325,1.4525,0.88175,0.50725,0.24825,0.14125
max,156.0,7.769,1.684,1.624,1.141,0.631,0.566,0.453


Let's write a function that returns the median value of each category we're interested in. We'll use these medians as a baseline for comparison in the analysis that follows.

In [5]:
# Median function.
def median(col, df):
    return df[col].median()

# Defining median values to variables.
median_score = median('Score', df)
median_gdp = median('GDP per capita', df)
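One way these medians might be used later is to flag whether a country sits above or below them. The sketch below is only an illustration: the four rows are copied from the notebook's outputs, the median values are taken from the 50% row of `df.describe()`, and the two `Above median ...` column names are hypothetical.

```python
import pandas as pd

# A few rows copied from the notebook's outputs (not the full dataset).
sample = pd.DataFrame({
    "Country or region": ["Finland", "Norway", "Rwanda", "Afghanistan"],
    "Score": [7.769, 7.554, 3.334, 3.203],
    "GDP per capita": [1.340, 1.488, 0.359, 0.350],
})

# Medians over all 156 countries, read from the 50% row of df.describe().
median_score = 5.3795
median_gdp = 0.96

# Hypothetical flag columns: True when the country is above the median.
sample["Above median score"] = sample["Score"] > median_score
sample["Above median GDP"] = sample["GDP per capita"] > median_gdp

print(sample)
```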

## Research
What are interesting correlations we might want to find in this dataset?

### Are money and happiness correlated?
Let's analyze the happiness index of the dataset's countries through the `Score` column, and compare it to each country's wealth through the `GDP per capita` column.

Let's start by looking at the richest countries and poorest countries.

In [6]:
# Keep only the first four columns: rank, country, score and GDP per capita.
only_gdp_score = df.iloc[:, :4].copy()

Which countries rank as the richest, and which as the poorest?

In [7]:
richest_countries = only_gdp_score.sort_values(by='GDP per capita', ignore_index=True, ascending=False)
poorest_countries = only_gdp_score.sort_values(by='GDP per capita', ignore_index=True, ascending=True)

# Taking the richest and poorest 10.
richest_10 = richest_countries[:10]
poorest_10 = poorest_countries[:10]
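As an alternative to sorting twice and slicing, pandas offers `DataFrame.nlargest` and `DataFrame.nsmallest`. The sketch below shows the idea on six rows copied from the notebook's outputs, taking the top and bottom 2 rather than 10; the names `richest_2` and `poorest_2` are illustrative only.

```python
import pandas as pd

# Six rows copied from the notebook's outputs (not the full dataset).
sample = pd.DataFrame({
    "Country or region": ["Finland", "Denmark", "Norway",
                          "Rwanda", "Afghanistan", "Central African Republic"],
    "Score": [7.769, 7.600, 7.554, 3.334, 3.203, 3.083],
    "GDP per capita": [1.340, 1.383, 1.488, 0.359, 0.350, 0.026],
})

# nlargest/nsmallest return the rows already ordered by the chosen column.
richest_2 = sample.nlargest(2, "GDP per capita")
poorest_2 = sample.nsmallest(2, "GDP per capita")

print(richest_2)
print(poorest_2)
```

On the full dataframe, the same calls with `10` instead of `2` should give the equivalent of `richest_10` and `poorest_10`, apart from how ties are ordered.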

In [15]:
# Assign a "Richest 10" / "Poorest 10" / "Other" status to each country.
richest_cutoff = richest_10['GDP per capita'].min()
poorest_cutoff = poorest_10['GDP per capita'].max()

status = []
for gdp in only_gdp_score['GDP per capita']:
    if gdp >= richest_cutoff:
        status.append('Richest 10')
    elif gdp <= poorest_cutoff:
        status.append('Poorest 10')
    else:
        status.append('Other')

only_gdp_score['Income ranked'] = status
only_gdp_score

Unnamed: 0,Overall rank,Country or region,Score,GDP per capita,Income ranked
0,1,Finland,7.769,1.340,Other
1,2,Denmark,7.600,1.383,Other
2,3,Norway,7.554,1.488,Richest 10
3,4,Iceland,7.494,1.380,Other
4,5,Netherlands,7.488,1.396,Other
...,...,...,...,...,...
151,152,Rwanda,3.334,0.359,Other
152,153,Tanzania,3.231,0.476,Other
153,154,Afghanistan,3.203,0.350,Other
154,155,Central African Republic,3.083,0.026,Poorest 10
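The same labelling can also be written without an explicit loop. The sketch below uses `np.select` on six rows copied from the outputs, with a hypothetical group size of 2 instead of 10. Note that because the cutoffs are compared with `>=` and `<=`, countries tied at a cutoff value would all receive the label, so a group could end up with more members than its name suggests.

```python
import numpy as np
import pandas as pd

# Six rows copied from the notebook's outputs (not the full dataset).
sample = pd.DataFrame({
    "Country or region": ["Finland", "Denmark", "Norway",
                          "Rwanda", "Afghanistan", "Central African Republic"],
    "GDP per capita": [1.340, 1.383, 1.488, 0.359, 0.350, 0.026],
})

n = 2  # hypothetical group size for this small sample
top_cutoff = sample["GDP per capita"].nlargest(n).min()
bottom_cutoff = sample["GDP per capita"].nsmallest(n).max()

# Conditions are checked in order; rows matching none get the default label.
conditions = [
    sample["GDP per capita"] >= top_cutoff,
    sample["GDP per capita"] <= bottom_cutoff,
]
labels = [f"Richest {n}", f"Poorest {n}"]
sample["Income ranked"] = np.select(conditions, labels, default="Other")

print(sample)
```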


A scatter plot lets us see whether the two variables are correlated.

In [40]:
scatter = px.scatter(only_gdp_score,
                     x="GDP per capita",
                     y="Score",
                     symbol="Income ranked",
                     color="Income ranked",
                     trendline="ols",
                     trendline_scope="overall",
                     trendline_color_override="black",)

scatter.update_layout(
    title="How are a country's income and happiness score correlated?",
    xaxis_title="GDP per capita",
    yaxis_title="Happiness score (scale out of 10)",
    legend_title="Country income rank",
    font=dict(
        size=14,
        color="black"
    )
)


scatter.update_traces(marker=dict(size=10, line=dict(width=2, color='DarkSlateGrey')),
                      selector=dict(mode='markers'))
scatter.show()
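The `trendline="ols"` option draws an ordinary least squares line through the points (Plotly relies on the `statsmodels` package for this, so it needs to be installed). As a rough illustration of what that line represents, the sketch below fits a straight line with `np.polyfit` as a stand-in. It uses only the nine rows visible in the notebook's outputs, so its coefficients will differ from those of the full 156-country fit.

```python
import numpy as np

# (GDP per capita, Score) pairs copied from the rows shown in the outputs.
gdp = np.array([1.340, 1.383, 1.488, 1.380, 1.396, 0.359, 0.476, 0.350, 0.026])
score = np.array([7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083])

# A degree-1 polynomial fit is a least squares straight line.
slope, intercept = np.polyfit(gdp, score, deg=1)

# Scores the fitted line predicts for each GDP value, and the residuals.
predicted = slope * gdp + intercept
residuals = score - predicted

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope means that, in this sample, a higher `GDP per capita` goes along with a higher `Score`.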

In [10]:
# Let's get the median happiness score and GDP of the richest and poorest 10.
richest_10_median_score = median('Score', richest_10)
richest_10_median_gdp = median('GDP per capita', richest_10)

poorest_10_median_score = median('Score', poorest_10)
poorest_10_median_gdp = median('GDP per capita', poorest_10)
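Since each country already carries an `Income ranked` label, the group medians could also be computed in one step with `groupby`. The sketch below uses six labelled rows copied from the output table, so each group holds only one to four countries and the numbers are not the real group medians.

```python
import pandas as pd

# Six labelled rows copied from the only_gdp_score output (not the full data).
labelled = pd.DataFrame({
    "Country or region": ["Finland", "Denmark", "Norway",
                          "Rwanda", "Tanzania", "Central African Republic"],
    "Score": [7.769, 7.600, 7.554, 3.334, 3.231, 3.083],
    "GDP per capita": [1.340, 1.383, 1.488, 0.359, 0.476, 0.026],
    "Income ranked": ["Other", "Other", "Richest 10",
                      "Other", "Other", "Poorest 10"],
})

# One row per status, one column per measure.
group_medians = labelled.groupby("Income ranked")[["Score", "GDP per capita"]].median()

print(group_medians)
```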

### Do happy people live longer?
We'll be analyzing the `Score` and `Healthy life expectancy` columns.

### Can corrupt countries be happy?
We'll be analyzing the `Score` and `Perceptions of corruption` columns.
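A possible starting point for both this question and the life expectancy one is the Pearson correlation between `Score` and the column of interest. The helper `score_correlation` below is hypothetical, and the five rows come from the `df.head()` output, which contains only the top-ranked countries. The values it prints therefore say nothing about the dataset as a whole; the sketch only shows the mechanics.

```python
import pandas as pd

# First five rows of 2019.csv, copied from the df.head() output.
sample = pd.DataFrame({
    "Country or region": ["Finland", "Denmark", "Norway", "Iceland", "Netherlands"],
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488],
    "Healthy life expectancy": [0.986, 0.996, 1.028, 1.026, 0.999],
    "Perceptions of corruption": [0.393, 0.410, 0.341, 0.118, 0.298],
})


def score_correlation(df, col):
    """Hypothetical helper: Pearson correlation between Score and `col`."""
    return df["Score"].corr(df[col])


for col in ["Healthy life expectancy", "Perceptions of corruption"]:
    print(col, round(score_correlation(sample, col), 3))
```

On the full dataframe, the same helper could be called as `score_correlation(df, 'Healthy life expectancy')`.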