<a href="https://www.kaggle.com/code/varshinipj/world-happiness-report-2021-using-plotly?scriptVersionId=111166050" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# WORLD HAPPINESS REPORT - 2021

In [2]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import plotly.express as px
import warnings
warnings.simplefilter("ignore")

# Reading the csv file:

In [3]:
data = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report-2021.csv')

# The data looks like:

In [4]:
data.head()

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.842,0.032,7.904,7.78,10.775,0.954,72.0,0.949,-0.098,0.186,2.43,1.446,1.106,0.741,0.691,0.124,0.481,3.253
1,Denmark,Western Europe,7.62,0.035,7.687,7.552,10.933,0.954,72.7,0.946,0.03,0.179,2.43,1.502,1.108,0.763,0.686,0.208,0.485,2.868
2,Switzerland,Western Europe,7.571,0.036,7.643,7.5,11.117,0.942,74.4,0.919,0.025,0.292,2.43,1.566,1.079,0.816,0.653,0.204,0.413,2.839
3,Iceland,Western Europe,7.554,0.059,7.67,7.438,10.878,0.983,73.0,0.955,0.16,0.673,2.43,1.482,1.172,0.772,0.698,0.293,0.17,2.967
4,Netherlands,Western Europe,7.464,0.027,7.518,7.41,10.932,0.942,72.4,0.913,0.175,0.338,2.43,1.501,1.079,0.753,0.647,0.302,0.384,2.798


# Shape of the dataframe:

In [5]:
data.shape

(149, 20)

# Checking for NaN values:

In [6]:
data.isnull().sum()

Country name                                  0
Regional indicator                            0
Ladder score                                  0
Standard error of ladder score                0
upperwhisker                                  0
lowerwhisker                                  0
Logged GDP per capita                         0
Social support                                0
Healthy life expectancy                       0
Freedom to make life choices                  0
Generosity                                    0
Perceptions of corruption                     0
Ladder score in Dystopia                      0
Explained by: Log GDP per capita              0
Explained by: Social support                  0
Explained by: Healthy life expectancy         0
Explained by: Freedom to make life choices    0
Explained by: Generosity                      0
Explained by: Perceptions of corruption       0
Dystopia + residual                           0
dtype: int64

**So there are no NULL values in this dataset.**

# DATA VISUALIZATION AND ANALYSIS:

# 1. SUNBURST PLOT USING PLOTLY:

In [7]:
fig = px.sunburst(data, path=['Regional indicator', 'Country name'], values='Ladder score',
                  color='Healthy life expectancy',
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=np.average(data['Healthy life expectancy'], weights=data['Ladder score']))
fig.update_layout(
    margin = dict(t=10, l=10, r=10, b=10)
)
fig.show()

> *We can clearly see that the countries with a higher healthy life expectancy are mostly in WESTERN EUROPE, while the countries with a lower healthy life expectancy are mostly in SUB-SAHARAN AFRICA.*
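The `color_continuous_midpoint` used above is a weighted mean: each country's life expectancy is weighted by its ladder score. The sketch below shows what `np.average` computes in that case. The three values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Illustrative values only, not taken from the dataset
life_expectancy = np.array([72.0, 60.0, 50.0])
ladder_score = np.array([8.0, 5.0, 3.0])

# Weighted mean: sum(w * x) / sum(w)
weighted_mid = np.average(life_expectancy, weights=ladder_score)
manual_mid = (life_expectancy * ladder_score).sum() / ladder_score.sum()

# Unweighted mean, for comparison
plain_mean = life_expectancy.mean()

print(weighted_mid, manual_mid, plain_mean)
```

In this toy example the weighted midpoint sits above the plain mean, because the entries with higher ladder scores also have higher life expectancy and pull the average up.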

# 2. TREEMAPS USING PLOTLY:

> A treemap is a visual method for displaying hierarchical data that uses nested rectangles to represent the branches of a tree diagram. Each rectangle has an area proportional to the amount of data it represents.

In [8]:
fig = px.treemap(data, path=['Regional indicator','Country name'], values='Freedom to make life choices')
fig.show()

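> *We can see that the Sub-Saharan Africa block is the largest and the North American block is the smallest. Note that a region's area is the sum of the Freedom to make life choices scores of its countries, so it also reflects how many countries the region contains, not only how free they are.*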
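In `px.treemap`, a parent rectangle is sized by the sum of its children's `values`, so a region with many countries can look large even if each country scores low. A minimal sketch with hypothetical toy data (the region names and numbers are invented) that compares the per-region sum with the per-region mean:

```python
import pandas as pd

# Hypothetical toy data: region "A" has more countries but lower scores
toy = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "B"],
    "freedom": [0.6, 0.6, 0.6, 0.6, 0.9, 0.9],
})

# sum   -> what the treemap area reflects
# mean  -> the typical score of a country in the region
# count -> number of countries in the region
per_region = toy.groupby("region")["freedom"].agg(["sum", "mean", "count"])
print(per_region)
```

Here region "A" has the larger sum but the smaller mean. Comparing regions by the mean is one way to separate "more countries" from "more freedom".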

# 3. SCATTER PLOT USING PLOTLY:

In [9]:
fig = px.scatter(data, x="Healthy life expectancy", y="Ladder score", color="Regional indicator",
                 size="Social support",log_x=True, size_max=20,
                 hover_name = "Country name",template = 'simple_white')
fig.show()

> *We can see that there is an approximately linear relationship between Healthy life expectancy (plotted on a log scale) and the happiness score.*

# 4. HEAT MAP:

In [10]:
subset = data.loc[:,['Ladder score','Logged GDP per capita','Freedom to make life choices','Social support']]
pd.plotting.register_matplotlib_converters()
px.imshow(subset.corr(), height = 600, width = 600)

> *We can visualize the correlation between the key features influencing the happiness score. Of the three features shown, Logged GDP per capita has the highest correlation with Ladder score.*
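Reading the strongest correlation off a heat map by eye can be error-prone, so it may help to rank the features numerically as well. The sketch below does this on synthetic data. The column names `score`, `gdp` and `noise` are hypothetical stand-ins, not columns of the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: "score" is built from "gdp", "noise" is unrelated
rng = np.random.default_rng(0)
n = 200
gdp = rng.normal(size=n)
toy = pd.DataFrame({
    "score": gdp + 0.1 * rng.normal(size=n),
    "gdp": gdp,
    "noise": rng.normal(size=n),
})

# Pearson correlation matrix (the same matrix px.imshow would display)
corr = toy.corr()

# Correlation of every feature with the target, strongest first
ranking = corr["score"].drop("score").sort_values(ascending=False)
print(ranking)
```

Applied to the notebook's `subset`, the same two lines with `'Ladder score'` in place of `"score"` would list the features in order of correlation.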

# 5. SCATTER PLOT USING PLOTLY: 

> *Visualizing the correlation found above between Logged GDP per capita and Ladder score, colored by region.*

In [11]:
plt.style.use('dark_background') 
fig = px.scatter(data, x='Logged GDP per capita', y="Ladder score",hover_name = 'Country name',
                 color = 'Regional indicator',template = 'plotly_dark')

fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))

fig.show()

# LABELLING LADDER SCORE AS 'SAD' , 'MODERATELY SAD' , 'HAPPY' AND 'VERY HAPPY' :

In [12]:
data_ = data['Ladder score']
data['label']=""

# Calculating quantiles to set intervals for the labels:

In [13]:
# Quartile cut points; note that Q2 is the mean of the scores, not the median
Q1 = data_.quantile(0.25, interpolation='midpoint')
Q2 = data_.mean()
Q3 = data_.quantile(0.75, interpolation='midpoint')

# Assign every label in one vectorized step (no chained assignment).
# Conditions are checked in order, so a score equal to Q2 is labelled 'Happy'.
data['label'] = np.select(
    [data_ <= Q1, data_ < Q2, data_ < Q3],
    ['Sad', 'Moderately Sad', 'Happy'],
    default='Very Happy',
)
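An alternative way to bin scores into labels is `pd.cut`, which assigns each value to exactly one interval. The sketch below is an assumption-laden variant, not the notebook's method: it uses hypothetical scores and takes the median as the middle cut point, whereas the cell above uses the mean.

```python
import pandas as pd

# Hypothetical scores, not taken from the dataset
scores = pd.Series([3.0, 4.5, 5.0, 5.5, 6.0, 7.0, 7.5, 8.0])

# Quartile cut points (default linear interpolation)
q1, q2, q3 = scores.quantile([0.25, 0.50, 0.75])

# right=False gives half-open bins [a, b), so the bins tile the whole
# range and no score can fall between two labels
labels = pd.cut(
    scores,
    bins=[-float("inf"), q1, q2, q3, float("inf")],
    labels=["Sad", "Moderately Sad", "Happy", "Very Happy"],
    right=False,
)
print(labels.value_counts().sort_index())
```

Because the bins are contiguous, every score receives a label, including scores that land exactly on a cut point.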
    

# 6. BAR CHART USING CATEGORICAL AXES: 

In [14]:
fig = px.bar(data.sort_values("Ladder score"), x='label',color="Regional indicator", hover_data=['Country name','Ladder score'])
fig.show()


> *We can see that most of the Sub-Saharan African countries fall in the lowest happiness category, while most of the Western European countries are among the happiest.*
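The counts behind a stacked bar chart like this one can also be tabulated directly, which makes "most of" easier to check. A minimal sketch using `pd.crosstab` on hypothetical toy data (regions "X" and "Y" and their labels are invented):

```python
import pandas as pd

# Hypothetical toy data: one row per country
toy = pd.DataFrame({
    "region": ["X", "X", "Y", "Y", "Y"],
    "label": ["Very Happy", "Happy", "Sad", "Sad", "Happy"],
})

# Rows: happiness label, columns: region, cells: number of countries
counts = pd.crosstab(toy["label"], toy["region"])
print(counts)
```

On the notebook's data, the equivalent call would be `pd.crosstab(data['label'], data['Regional indicator'])`.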

# 7. USING LINEAR TRENDLINES ON A SCATTER PLOT FACETED BY LABEL TO COMPARE THE LINEAR RELATIONSHIP BETWEEN THE FEATURES:

In [15]:
fig = px.scatter(data, x="Freedom to make life choices", y="Ladder score",trendline='ols',hover_name = 'Country name',
                 facet_col="label",template = 'plotly_dark')

fig.update_traces(marker_line=dict(width=1, color='DarkSlateGray'))

fig.show()

The slope of the regression line is called the regression coefficient. It measures how much the dependent variable Y is expected to change for a one-unit change in the independent variable X.

>Comparing the slopes of the OLS trendlines shows how strongly the Ladder score changes with Freedom to make life choices within each group. The slope is steepest for the Very Happy countries, which suggests that in those countries the freedom to make life choices contributes the most to the happiness score.
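To compare slopes numerically rather than by eye, a least-squares line can be fitted per group. The sketch below uses `np.polyfit` on hypothetical data whose true slopes are known. The group names and the helper `ols_slope` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "steep" has slope 2, group "flat" has slope 0.5
x = np.array([0.0, 1.0, 2.0, 3.0])
toy = pd.DataFrame({
    "group": ["steep"] * 4 + ["flat"] * 4,
    "x": np.tile(x, 2),
    "y": np.concatenate([2.0 * x + 1.0, 0.5 * x + 3.0]),
})

def ols_slope(frame):
    # np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
    slope, _intercept = np.polyfit(frame["x"], frame["y"], deg=1)
    return slope

# One fitted slope per group
slopes = {name: ols_slope(group) for name, group in toy.groupby("group")}
print(slopes)
```

For the Plotly figure itself, `px.get_trendline_results(fig)` returns the fitted model for each facet, from which the same coefficients can be read (this requires `statsmodels` to be installed).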

# 8. CHOROPLETH USING PLOTLY: 

> The color of each country encodes its Ladder score, so happier and less happy parts of the world can be compared by hue.

In [16]:
fig = px.choropleth(data,locations='Country name',locationmode='country names',color='Ladder score',
    title='Happiness score across the globe'
)
fig.show()

**REFERENCES:**

[https://plotly.com/python/basic-charts/]
