<h1 Style="font-family:Georgia, serif;">Exploratory Data Analysis on World Happiness Index Report </h1>
    <hr>

## Introduction

<p style="font-family:Georgia, serif;">Happiness is an emotional state characterized by feelings of joy, satisfaction, contentment, and fulfillment. While happiness has many different definitions, it is often described as involving positive emotions and life satisfaction.

 <h3> World Happiness Report: </h3>

<p style="font-family:Georgia, serif;">
The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields (economics, psychology, survey analysis, national statistics, health, public policy and more) describe how measurements of well-being can be used effectively to assess the progress of nations.
The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
</p>
<p style="font-family:Georgia, serif;">The happiness scores and rankings use data from the Gallup World Poll. The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. These columns have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.
</p>
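
As a rough illustration of how these columns fit together: in the 2021 data file, a country's ladder score can be reconstructed as the sum of the six factor contributions (the `Explained by:` columns) plus a `Dystopia + residual` term. The sketch below assumes those column names. It uses made-up numbers for one imaginary country, not values from the report. `reconstruct_ladder_score` is a hypothetical helper.

```python
# Hypothetical contributions for one imaginary country (not real report values).
contributions = {
    "Explained by: Log GDP per capita": 1.30,
    "Explained by: Social support": 1.00,
    "Explained by: Healthy life expectancy": 0.70,
    "Explained by: Freedom to make life choices": 0.55,
    "Explained by: Generosity": 0.20,
    "Explained by: Perceptions of corruption": 0.15,
}
dystopia_plus_residual = 2.10  # Dystopia benchmark plus this country's unexplained residual


def reconstruct_ladder_score(contribs, dystopia_residual):
    """Sum the six factor contributions and the Dystopia + residual term."""
    return sum(contribs.values()) + dystopia_residual


score = reconstruct_ladder_score(contributions, dystopia_plus_residual)
print(round(score, 2))  # 6.0
```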

<h3> Objective:</h3>
<ul style="font-family:Georgia, serif;font-size:1.1em;line-height:1.75em;">
    <li>To identify trends in happiness scores for various countries over the past years.</li>
    <li>To identify the factors that determine the state of happiness in various countries.</li>
    <li>To study how much impact these factors have on the happiness score of a country.</li>
    <li>To identify the reasons why some countries have low scores.</li>
    <li>To identify why India has consistently ranked near the bottom over the past decade.</li>
    <li>To identify the areas India needs to work on to improve its ranking.</li>
</ul>    

<h3>Importing Libraries:</h3>
<ul>
<li>
In this section we import all the libraries used in this kernel.
</li>
</ul>

In [None]:
!pip install plotly

import warnings
warnings.filterwarnings("ignore")
import plotly.io as pio
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import matplotlib.gridspec as grid_spec
import seaborn as sns
import plotly.express as px 
pd.options.mode.chained_assignment = None  # default='warn' 
%matplotlib inline

### Data Content: 
The happiness scores and rankings use data from the Gallup World Poll.
Gallup World Poll:
In 2005, Gallup began its World Poll, which continually surveys citizens in 160 countries, representing more than 98% of the world's adult population. The Gallup World Poll consists of more than 100 global questions as well as region-specific items.
<ul style="line-height:2em;">
<li><strong>Ladder score:</strong>
Happiness score or subjective well-being. This is the national average response to the question of life evaluations.

<li><strong>Logged GDP per capita:</strong>
GDP per capita on a logarithmic scale. The 2020 values are obtained by extending the GDP-per-capita time series from 2019 to 2020 using country-specific forecasts of real GDP growth in 2020.

<li><strong>Social support:</strong>
Social support refers to assistance or support provided by members of social networks to an individual. </li>

<li><strong>Healthy life expectancy:</strong>
Healthy life expectancy is the average number of years lived in good health (that is, without irreversible limitation of activity in daily life or incapacities) by a fictitious generation subject to the conditions of mortality and morbidity prevailing that year.

<li><strong>Freedom to make life choices:</strong>
Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

<li><strong>Generosity:</strong>
Generosity is the residual of regressing national average of response to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.

<li><strong>Perceptions of corruption:</strong>
The measure is the national average of the survey responses to two questions in the GWP: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?”

<li><strong>Ladder score in Dystopia:</strong>
It has values equal to the world’s lowest national averages. Dystopia serves as a benchmark against which to compare contributions from each of the six factors. Dystopia is an imaginary country that has the world's least-happy people. ... Since life would be very unpleasant in a country with the world's lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as “Dystopia,” in contrast to Utopia.
</ul>

World Happiness Report official website: <a href="https://worldhappiness.report/">here</a>
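
The Generosity definition above describes a regression residual. The sketch below shows that idea with synthetic numbers, using an ordinary least-squares line fitted with `np.polyfit`. It is an assumption for illustration only; the report's exact regression specification may differ.

```python
import numpy as np

# Synthetic national averages (illustrative only, not GWP data).
log_gdp = np.array([7.5, 8.5, 9.5, 10.5, 11.0])     # GDP per capita on a log scale
donated = np.array([0.10, 0.20, 0.25, 0.40, 0.45])  # share answering "yes" to the donation question

# Fit donated = slope * log_gdp + intercept by least squares.
slope, intercept = np.polyfit(log_gdp, donated, deg=1)

# Generosity = the part of donating that income does not predict.
generosity = donated - (slope * log_gdp + intercept)
print(np.round(generosity, 3))
```

A positive residual means a country donates more than its GDP per capita alone would predict. A negative residual means it donates less.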

## Reading and Analysing Data:

### Past Records of World Happiness Report:

In [None]:
# reading dataset
whr=pd.read_csv(r"Datasets/world-happiness-report.csv")

In [None]:
# Show Information of data  
whr.info()

In [None]:
# Show First five rows of data 
whr.head()

In [None]:
# describe basic statistics of data
whr.describe()

In [None]:
# which years does the data cover?
print(whr['year'].unique())

In [None]:
# checking the share of null values in each column
def check_null(df):
    for col in df.columns:
        pct = df[col].isnull().mean() * 100  # fraction of missing values, expressed as a percentage
        print(f'{col} --- \t{pct:.2f}% null values')
check_null(whr)
# The missing values are left in place; pandas aggregations such as mean() skip NaN by default
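
For reference, the same per-column percentages can be computed in one vectorised step. Below is a minimal sketch on a small made-up DataFrame. The helper name `null_percentages` is hypothetical.

```python
import numpy as np
import pandas as pd


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values in each column, largest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


toy = pd.DataFrame({
    "score": [5.1, 6.2, 4.8, 7.0],
    "generosity": [0.1, np.nan, -0.05, np.nan],
    "corruption": [0.8, 0.7, np.nan, 0.3],
})
print(null_percentages(toy))
```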

### World Happiness Report 2021:

In [None]:
# read data:
whr21=pd.read_csv(r"Datasets/world-happiness-report-2021.csv")

In [None]:
# 2021 dataset 
whr21.info()

In [None]:
# show the first five rows of the data
whr21.head()

In [None]:
# describe basic statistics of the data 
whr21.describe()

In [None]:
# checking null values 
check_null(whr21)
# No null values present ... Good to go

## Data Cleaning:

In [None]:
# show all the columns in the whr dataset
whr.columns

In [None]:
# show all the columns in the whr21 dataset
whr21.columns
# contains a few additional columns that the historical dataset does not have

In [None]:
# Renaming columns for easier access :

whr.rename(columns={'Country name':'country','Social support':'social support','Life Ladder':'score','Log GDP per capita':'gdp per capita','Healthy life expectancy at birth':'healthy life expectancy','Freedom to make life choices':'freedom','Generosity':'generosity','Perceptions of corruption':'corruption'},inplace=True)

In [None]:
# renaming columns for easier access:

whr21.rename(columns={'Country name':'country','Regional indicator':'region','Social support':'social support','Ladder score':'score','Logged GDP per capita':'gdp per capita','Healthy life expectancy':'healthy life expectancy','Freedom to make life choices':'freedom','Generosity':'generosity','Perceptions of corruption':'corruption'},inplace=True)
whr21

In [None]:
# remove unneeded columns:

whr21=whr21[['country','region','score','gdp per capita','social support','healthy life expectancy','generosity','freedom','corruption']]

In [None]:
# adding year column for 2021 dataset
whr21['year']=2021

In [None]:
# merging both data frames:

# attach each country's 2021 region to the historical rows by country name
region_by_country = whr21.set_index('country')['region']
whr['region'] = whr['country'].map(region_by_country)

# stack the historical rows and the 2021 rows into one frame
eda_happy = pd.concat([whr, whr21], ignore_index=True)
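
Attaching the region by country name, rather than by row position, keeps the labels correct even when the two tables list countries in different orders. The sketch below shows this with small toy frames. The rows are illustrative and are not taken from the datasets.

```python
import pandas as pd

# Historical table: several years per country, in its own order.
hist = pd.DataFrame({
    "country": ["Finland", "Finland", "India", "Denmark"],
    "year": [2019, 2020, 2020, 2020],
})
# 2021 table: one row per country, carrying the region, in a different order.
latest = pd.DataFrame({
    "country": ["India", "Denmark", "Finland"],
    "region": ["South Asia", "Western Europe", "Western Europe"],
})

# Look up each historical row's region by its country name.
region_by_country = latest.set_index("country")["region"]
hist["region"] = hist["country"].map(region_by_country)
print(hist)
```

`Series.map` with a Series argument needs unique index labels, which holds here because the 2021 table has one row per country.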

In [None]:
# rearranging columns

eda_happy=eda_happy[['region', 'country', 'year', 'score', 'gdp per capita', 'social support', 'healthy life expectancy', 'freedom','generosity', 'corruption']]
eda_happy=eda_happy.sort_values(by='year')

In [None]:
eda_happy.tail()

In [None]:
# The remaining missing values are left in place (pandas aggregations skip NaN by default)
check_null(eda_happy)

#### Now our DataFrame is ready for analysis!

## Analysis 

First of all, let us look at global trends in happiness scores over the past years.

In [None]:
# The 2005 and 2006 records look incomplete, so we filter those years out
fig=px.choropleth(eda_happy[(eda_happy['year'] != 2005) &  (eda_happy['year'] != 2006)],locations='country',width=800,height=800,locationmode='country names',hover_name='country',color='score',animation_frame='year',projection='natural earth', title='Happiness Level in Countries over past years') 
fig.show()

From the above plot, it can be observed that most countries in the North America & ANZ and Western Europe regions have maintained high levels of happiness throughout the years.

Similarly, most countries in the Sub-Saharan African, Asian and Southeast Asian regions have happiness levels fluctuating between low and medium, indicating lower happiness scores over many years.

We will try to find the factors behind this.

#### Happiest Countries in the World:

Before we continue, let us find out which countries have performed well in terms of happiness score over the past few years and which have underperformed.

To assess how countries have performed over the past years, we first compute the mean score across all countries and years. A country can then be classified as happy or unhappy depending on whether its average score lies above or below this mean.

In [None]:
# average score over the years
mean=np.mean(eda_happy['score'])
mean

Mean score: about 5 (the exact value is printed by the cell above).

Therefore, any country whose average score lies above this mean can be considered happy, and any country whose average score lies below it can be considered unhappy.
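
The classification rule above can be written out directly. Below is a minimal sketch on made-up country-year scores for placeholder countries. The overall mean weights every country-year equally, so countries with more survey years pull it more.

```python
import pandas as pd

# Toy country-year scores (illustrative only).
scores = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year":    [2019, 2020, 2019, 2020, 2019, 2020],
    "score":   [7.0, 7.2, 4.0, 4.4, 5.5, 5.3],
})

overall_mean = scores["score"].mean()                    # mean over all country-years
country_mean = scores.groupby("country")["score"].mean()  # each country's own average
label = country_mean.ge(overall_mean).map({True: "happy", False: "unhappy"})
print(overall_mean)
print(label)
```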

First of all, let's see which countries have performed well over the past years:

In [None]:
%matplotlib inline
# Compute each country's mean score across all years, then take the 10 highest and the 10 lowest

top_happy = eda_happy.groupby('country', as_index=False)['score'].mean().sort_values(
    by='score',ascending=False)[:10]
top_unhappy = eda_happy.groupby('country', as_index=False)['score'].mean().sort_values(
    by='score', ascending=True)[:10]
top_unhappy =top_unhappy.sort_values(by='score', ascending=False)
plt.style.use('seaborn')
plt.figure(2, figsize=(12,8))
sns.barplot(data=top_happy, x='score', y='country',palette='Blues_d')
plt.xlabel('Score',fontsize=14, fontweight='bold')
plt.ylabel('Country', fontsize=14, fontweight='bold')
plt.title('Top 10 Happiest Countries in the World', fontsize=16, fontweight='bold')
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()

From the above plot, it can be observed that Denmark, Finland and Switzerland have performed very well in terms of happiness score.
Also, most of the happiest countries are in Europe.

## Happiest Countries over the Past Few Years:

In [None]:
# rows whose score equals the maximum score of their year
df1 = eda_happy[eda_happy['score'] == eda_happy.groupby('year')['score'].transform('max').values]
df1 = df1.sort_values(by='year')
plt.figure(figsize=(15,8))
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.ylim(0,10)
plt.title('Happiest Countries over the Past Years', fontsize=16, fontweight='bold')
sns.barplot(x='year', y='score', data=df1, hue='country', dodge=False)
plt.show()
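
The cell above keeps every row whose score equals its year's maximum, so ties produce several bars for one year. If exactly one country per year is wanted, `groupby(...).idxmax()` is an alternative. The sketch below compares the two approaches on made-up data with placeholder country names.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year":    [2019, 2019, 2019, 2020, 2020, 2020],
    "score":   [7.6, 7.8, 7.4, 7.5, 7.9, 7.9],
})

# Every row equal to its year's maximum (keeps ties, like the cell above).
with_ties = toy[toy["score"] == toy.groupby("year")["score"].transform("max")]

# One row per year: idxmax returns the index label of the first maximum in each group.
first_max = toy.loc[toy.groupby("year")["score"].idxmax()]
print(with_ties)
print(first_max)
```

`idxmax` keeps only the first of tied rows, whereas the `transform('max')` filter keeps all of them. The same distinction applies to `idxmin` versus `transform('min')` for the saddest countries.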

Now let's see which countries have had the lowest happiness scores over the past years.

#### Top 10 Unhappiest Countries in the World:

In [None]:
%matplotlib inline

plt.style.use('seaborn')
plt.figure(3, figsize=(12,8))
sns.barplot(data=top_unhappy, x='score', y='country',palette='Reds_d')
plt.xlabel('Score',fontsize=14, fontweight='bold')
plt.ylabel('Country', fontsize=14, fontweight='bold')
plt.title('Top 10 Countries with the Lowest Happiness Scores', fontsize=16, fontweight='bold')
plt.xlim(0,5)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

From the above graph, we can observe that South Sudan has the lowest mean happiness score throughout the years. Also, most of the lowest-ranking countries are in Sub-Saharan Africa and Asia.

## Saddest Countries over the Past Few Years:

In [None]:
# rows whose score equals the minimum score of their year
df2 = eda_happy[eda_happy['score'] == eda_happy.groupby('year')['score'].transform('min').values]
df2 = df2.sort_values(by='year')
plt.figure(figsize=(15,8))
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.ylim(0,10)
plt.title('Saddest Countries over the Past Years', fontsize=16, fontweight='bold')
sns.barplot(x='year', y='score', data=df2, hue='country', dodge=False)
plt.show()

## Regional Distribution

Before we continue, let's look at how these countries perform by region.


In [None]:
%matplotlib inline

plt.style.use('seaborn')
plt.figure(figsize=(15.5,7),dpi=150)
sns.kdeplot(data=eda_happy, x='score', hue='region', fill=True, linewidth=2)
plt.axvline(eda_happy["score"].mean())
plt.title("Score Distribution by Region",fontsize=16,fontweight='bold')
plt.xlabel('Score',fontsize=14, fontweight='bold')
plt.ylabel('Density', fontsize=14, fontweight='bold')
plt.xlim(0,10)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.show()

From the plot above, we can divide the regions by where their score distributions peak. Regions whose peak lies to the left of the mean line can be classified as 'unhappy regions', whereas regions whose peak lies to the right of it can be classified as 'happy regions'.
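
The peak-versus-mean rule can also be computed rather than read off the plot. The sketch below assumes a Gaussian kernel with Scott's-rule bandwidth. Seaborn's `kdeplot` also uses Scott's rule by default, but its exact curves may differ. The scores for the two hypothetical regions are made up, and `kde_peak` is a hypothetical helper.

```python
import numpy as np


def kde_peak(values, grid_points=512):
    """Location of the highest point of a Gaussian KDE (Scott's rule bandwidth)."""
    values = np.asarray(values, dtype=float)
    bandwidth = values.std(ddof=1) * values.size ** (-1 / 5)  # Scott's rule for 1-D data
    grid = np.linspace(values.min() - 3 * bandwidth, values.max() + 3 * bandwidth, grid_points)
    # Sum of Gaussian bumps centred on each observation; normalisation does not change the argmax.
    density = np.exp(-0.5 * ((grid[:, None] - values[None, :]) / bandwidth) ** 2).sum(axis=1)
    return grid[np.argmax(density)]


# Toy scores for two hypothetical regions (not report data).
region_scores = {
    "Region X": [7.1, 7.3, 7.0, 7.4, 7.2],
    "Region Y": [4.1, 4.5, 3.9, 4.3, 4.2],
}
overall_mean = np.mean([s for scores in region_scores.values() for s in scores])
for region, scores in region_scores.items():
    peak = kde_peak(scores)
    print(region, round(peak, 2), "happy" if peak > overall_mean else "unhappy")
```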

Thus, it is clear that Western European countries perform best, followed by the North America and ANZ region, whereas Sub-Saharan Africa performs worst, followed by the South Asian region.

### Variation in Scores over the years:

Now, to understand how different factors affect happiness scores, let us first look at how the scores of the 20 happiest countries of 2021 have varied over the years.

In [None]:
background = "#fbfbfb"
fig, ax = plt.subplots(1,1, figsize=(10, 5),dpi=150)
fig.patch.set_facecolor(background) # figure background color
ax.set_facecolor(background)

# 2021 dataframe 
eda_happy21=eda_happy[eda_happy['year']==2021]
# top 20 happiest countries of 2021
top_list_ = eda_happy21.groupby('country')['score'].mean().sort_values(ascending=False).reset_index()[:20].sort_values(by='score',ascending=True)

plot = 1
# for top 20 happiest countries of 2021 do: 
for country in top_list_['country']:
    m = eda_happy[eda_happy['country'] == country].groupby('country')['score'].mean()
    sns.scatterplot(data=eda_happy[eda_happy['country'] == country], y=plot, x='score',color='grey',s=35,ax=ax)
    sns.scatterplot(data=eda_happy21[eda_happy21['country'] == country], y=plot, x='score',color='red',ec='black',linewidth=1,s=75,ax=ax)   
    sns.scatterplot(data=eda_happy21[eda_happy21['country'] == country], y=plot, x=m,color='gold',ec='black',linewidth=1,s=75,ax=ax)
    plot += 1
ax.set_yticks(top_list_.index+1)
ax.set_yticklabels(top_list_['country'][::-1], fontdict={'horizontalalignment': 'right'}, alpha=0.7)
ax.tick_params(axis=u'both', which=u'both',length=0)
ax.set_xlabel("Happiness Index Score",fontfamily='monospace',loc='left',color='gray')
Xstart, Xend = ax.get_xlim()
Ystart, Yend = ax.get_ylim()
ax.hlines(y=top_list_.index+1, xmin=Xstart, xmax=Xend, color='gray', alpha=0.5, linewidth=.3, linestyles='--')
ax.set_axisbelow(True)
ax.text(6.25, Yend+4.3, 'Happiness Index Scores through the years', fontsize=17, fontweight='bold',color='#323232')
plt.annotate('2021\nscore', xy=(7.842, 20), xytext=(8.2, 11),
             arrowprops=dict(facecolor='steelblue',arrowstyle="->",connectionstyle="arc3,rad=.3"), fontsize=10,fontfamily='monospace',ha='center', color='red')
plt.annotate('Mean\nscore', xy=(7.615, 20), xytext=(8.2, 16),
             arrowprops=dict(facecolor='steelblue',arrowstyle="->",connectionstyle="arc3,rad=.5"), fontsize=10,fontfamily='monospace',ha='center', color='gold')
plt.show()


This plot shows all scores from 2005 to 2021 for the 20 happiest countries of 2021, with each country's mean score and its 2021 score highlighted.

It is remarkable that many countries' 2021 scores are higher than their means, despite the pandemic.
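
The comparison in the plot can be counted directly. Below is a minimal sketch on made-up data for three placeholder countries. As in the plot above, each country's mean includes its 2021 score.

```python
import pandas as pd

# Toy history for three hypothetical countries (not report data).
history = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "year": [2019, 2020, 2021] * 3,
    "score": [7.0, 7.1, 7.3, 6.8, 6.9, 6.6, 7.5, 7.4, 7.6],
})

mean_all_years = history.groupby("country")["score"].mean()
score_2021 = history[history["year"] == 2021].set_index("country")["score"]

# .gt() aligns the two Series on country name before comparing.
above_mean = score_2021.gt(mean_all_years)
print(above_mean)
print(f"{above_mean.sum()} of {above_mean.size} countries scored above their mean in 2021")
```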

Although the scores do vary, they still remain relatively high.


Let's see which factors contributed to this.

In [None]:
# correlation matrix (numeric columns only) to see which factors are most strongly correlated with happiness
happy_corr = eda_happy.corr(numeric_only=True)

In [None]:
plt.figure(figsize=(12,8))
plt.style.use('seaborn')
sns.heatmap(happy_corr, annot=True)
plt.xticks(fontsize=11, fontstyle='normal')
plt.yticks(fontsize=11, fontstyle='normal')
plt.title("Correlation between Factors and Happiness Score", fontsize=14, fontweight='bold')
plt.show()

#### It seems that GDP per capita, healthy life expectancy and social support are the factors most strongly correlated with a country's overall happiness level!
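
The same ranking can be read off as a single column of the correlation matrix. The sketch below uses made-up values, not report data, and orders the factors by the absolute size of their correlation with the score.

```python
import pandas as pd

toy = pd.DataFrame({
    "score":          [4.0, 5.0, 6.0, 7.0, 7.5],
    "gdp per capita": [7.5, 8.6, 9.4, 10.5, 10.9],
    "generosity":     [0.05, -0.10, 0.00, 0.10, -0.02],
    "corruption":     [0.90, 0.85, 0.80, 0.50, 0.30],
})

# Pearson correlation of every numeric column with the score, strongest (in absolute value) first.
corr_with_score = (
    toy.corr(numeric_only=True)["score"]
    .drop("score")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_score)
```

A strong correlation indicates association only. It does not by itself show that a factor causes higher happiness.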

Let's look more closely at how Western Europe differs from the rest of the world on these factors.

In [None]:
# Relationship between healthy life expectancy, GDP per capita and happiness score: Western Europe vs. other regions
background = "#fbfbfb"
fig, ax = plt.subplots(1,1, figsize=(10, 5),dpi=150)
fig.patch.set_facecolor(background) 
cmap = ['#dd4124','#009473']
ax.set_facecolor(background)
sns.scatterplot(data=eda_happy, x='healthy life expectancy', y='score',hue=eda_happy['region'] == 'Western Europe',palette=cmap, alpha=0.9,ec='black',size=eda_happy["gdp per capita"]*1000, legend=True, sizes=(5, 500))
ax.set_xlabel("Life Expectancy",fontfamily='monospace',loc='left',color='gray')
ax.set_ylabel("Happiness Index Score",fontfamily='monospace',loc='top',color='gray')
ax.tick_params(axis = 'both', which = 'major', labelsize = 10)
for s in ["top","right","left"]:
    ax.spines[s].set_visible(False)
ax.text(45,9.2,'Happiness Score, Life Expectancy, and GDP per Capita',fontfamily='sans-serif',fontsize=17,fontweight='bold',color='#323232')
L = ax.legend(frameon=False,loc="upper center", bbox_to_anchor=(1.25, 0.8), ncol= 1)
plt.setp(L.texts, family='monospace')
L.get_frame().set_facecolor('none')
L.get_texts()[1].set_text('Outside Western Europe')
L.get_texts()[2].set_text('Western Europe')
L.get_texts()[3].set_text('GDP p/Capita [log]')

ax.tick_params(axis='both', which='both',left=False, bottom=False,labelbottom=True) 

plt.show()

We see a strong relationship between these variables and happiness scores, and Western European countries consistently have high scores.

Happier countries tend to be those with longer healthy life expectancies and higher GDP per capita, which describes most of Western Europe.

In [None]:
# Relationship between healthy life expectancy, GDP per capita and happiness score: Sub-Saharan Africa vs. other regions
background = "#fbfbfb"
fig, ax = plt.subplots(1,1, figsize=(10, 5),dpi=150)
fig.patch.set_facecolor(background) 
cmap = ['#dd4124','#009473']
ax.set_facecolor(background)
sns.scatterplot(data=eda_happy, x='healthy life expectancy', y='score',hue=eda_happy['region'] == 'Sub-Saharan Africa',palette=cmap, alpha=0.9,ec='black',size=eda_happy["gdp per capita"]*1000, legend=True, sizes=(5, 500))
ax.set_xlabel("Life Expectancy",fontfamily='monospace',loc='left',color='gray')
ax.set_ylabel("Happiness Index Score",fontfamily='monospace',loc='top',color='gray')
ax.tick_params(axis = 'both', which = 'major', labelsize = 10)
ax.text(45,9.2,'Happiness Score, Life Expectancy, and GDP per Capita',fontfamily='sans-serif',fontsize=17,fontweight='bold',color='#323232')
L = ax.legend(frameon=False,loc="upper center", bbox_to_anchor=(1.25, 0.8), ncol= 1)
plt.setp(L.texts, family='monospace')
L.get_frame().set_facecolor('none')
L.get_texts()[1].set_text('Outside Sub-Saharan Africa')
L.get_texts()[2].set_text('Sub-Saharan Africa')
L.get_texts()[3].set_text('GDP p/Capita [log]')
ax.tick_params(axis='both', which='both',left=False, bottom=False,labelbottom=True) 
plt.show()

By and large, Sub-Saharan African countries have lower life expectancy, lower GDP per capita and, ultimately, lower happiness index scores.

#### Thus, we can conclude that GDP per capita, healthy life expectancy and social support are three of the factors most closely associated with the state of happiness in a country.

### Other factors
So GDP per capita and life expectancy matter. What other factors should we consider?

In [None]:
background = "#fbfbfb"
fig, ax = plt.subplots(1,1, figsize=(10, 5),dpi=150)
fig.patch.set_facecolor(background) # figure background color
cmap = ['#dd4124','#009473']
ax.set_facecolor(background)
sns.scatterplot(data=eda_happy, x='freedom', y='corruption',hue=eda_happy['region'] == 'Western Europe',palette=cmap, alpha=0.9,ec='black',size=eda_happy["score"], legend=True, sizes=(5, 600))
ax.set_xlabel("Freedom",fontfamily='monospace',loc='left',color='gray')
ax.set_ylabel("Corruption",fontfamily='monospace',loc='top',color='gray')
ax.tick_params(axis = 'both', which = 'major', labelsize = 10)
for s in ["top","right","left"]:
    ax.spines[s].set_visible(False)
L = ax.legend(frameon=False,loc="upper center", bbox_to_anchor=(1.25, 0.8), ncol= 1)
plt.setp(L.texts, family='monospace')
L.get_frame().set_facecolor('none')
L.get_texts()[1].set_text('Outside Western Europe')
L.get_texts()[2].set_text('Western Europe')
L.get_texts()[3].set_text('Happiness Score')
start, end = ax.get_ylim()
ax.yaxis.set_ticks(np.arange(0, end+0.2, 0.2))
ax.text(0.31,1.155,'Happiness Score, Freedom, and Corruption',fontfamily='sans-serif',fontsize=17,fontweight='bold',color='#323232')
ax.tick_params(axis='both', which='both',left=False, bottom=False,labelbottom=True) 
plt.show()


It appears that freedom and perceived corruption are inversely related: higher perceived corruption tends to be accompanied by lower freedom.

It also seems that as perceived corruption decreases and freedom increases, happiness increases too.
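
One way to put a number on "inversely related" is a rank (Spearman) correlation. It only assumes a monotonic relationship rather than a straight line. The sketch below uses made-up values.

```python
import pandas as pd

toy = pd.DataFrame({
    "freedom":    [0.55, 0.65, 0.75, 0.85, 0.95],
    "corruption": [0.92, 0.88, 0.80, 0.60, 0.25],
    "score":      [4.2, 4.9, 5.8, 6.5, 7.6],
})

# Rank-based correlations: only the ordering of the values matters.
rho = toy.corr(method="spearman")
rho_freedom_corruption = rho.loc["freedom", "corruption"]
rho_freedom_score = rho.loc["freedom", "score"]
print(rho_freedom_corruption, rho_freedom_score)
```

A value near -1 means that, across countries, higher freedom almost always comes with lower perceived corruption. Real data will be noisier than this toy example.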

However, it is interesting to note that several European nations also have high perceived levels of corruption.

## A Continental view
Let's group the countries into their regions, which roughly correspond to continents, to see if we can learn more.

Of course, we expect Western Europe to score highly, but are there any other regions that perform particularly well or poorly in the happiness rankings?

In [None]:
continent_score = eda_happy.groupby('region')[['healthy life expectancy','gdp per capita','corruption','freedom','score']].mean().reset_index()
background = "#fbfbfb"
fig, ax = plt.subplots(1,1, figsize=(10, 5),dpi=150)
fig.patch.set_facecolor(background) # figure background color
cmap = ['#dd4124','#009473']
color_map = ['#e7e9e7' for _ in range(10)]
# regions appear in alphabetical order: highlight North America and ANZ (5) and Western Europe (9) in green,
# South Asia (6) and Sub-Saharan Africa (8) in red
color_map[9] = '#009473'
color_map[5] = '#009473'
color_map[8] = '#dd4124'
color_map[6] = '#dd4124'
ax.set_facecolor(background)
sns.scatterplot(data=continent_score, x='healthy life expectancy', y='score', hue='region', alpha=0.9, ec='black', palette=color_map, size='score', legend=False, sizes=(5, 600))
ax.set_xlabel("Life Expectancy",fontfamily='monospace',loc='left',color='gray')
ax.set_ylabel("Happiness Index Score",fontfamily='monospace',loc='top',color='gray')
ax.tick_params(axis = 'both', which = 'major', labelsize = 10)
for s in ["top","right","left"]:
    ax.spines[s].set_visible(False)
ax.text(55,7.5,'Happiness Score & Life Expectancy by Continent',fontfamily='sans-serif',fontsize=17,fontweight='bold',color='#323232')
ax.text(55,7.3,'There are clear distinctions, with four stand-out continents',fontfamily='monospace',fontweight='light',fontsize=12,color='gray')
ax.tick_params(axis='both', which='both',left=False, bottom=False,labelbottom=True) 
for i, txt in enumerate(continent_score['region']):
    ax.annotate(txt, (continent_score['healthy life expectancy'][i]+0.5, continent_score['score'][i]),fontfamily='monospace')
plt.show()

There are three clearly visible clusters of regions.
Sub-Saharan Africa and South Asia have the lowest scores, whilst Western Europe and North America & ANZ are far ahead at the top.

### Differences between those above & below the mean happiness level
Let's plot several features at once, splitting the records by whether their score lies above or below the mean happiness score. As before, the happier group is shown in green.
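
The same split can be summarised numerically. The sketch below uses made-up records. It labels each row by whether its score is at or above the overall mean, which matches the split used in the plotting cell, and then compares feature medians between the two groups.

```python
import pandas as pd

toy = pd.DataFrame({
    "score":          [3.8, 4.5, 5.2, 6.4, 7.1, 7.6],
    "gdp per capita": [7.4, 8.1, 9.0, 10.2, 10.6, 10.9],
    "social support": [0.55, 0.62, 0.75, 0.88, 0.91, 0.94],
    "generosity":     [0.08, 0.05, -0.02, -0.04, 0.01, 0.06],
})

# Label each record relative to the overall mean score.
group = (
    toy["score"].ge(toy["score"].mean())
    .map({True: "above mean", False: "below mean"})
    .rename("group")
)

# Median of each feature within the two groups, one row per feature.
summary = toy.drop(columns="score").groupby(group).median().T
print(summary)
```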

In [None]:
# (these two summaries are not used by the plots below)
continent_score = eda_happy.groupby('region')[['healthy life expectancy','gdp per capita','corruption','freedom','score']].mean().mean().sort_values(ascending=True)[:10]
df_bottom = eda_happy.groupby('country')[['gdp per capita','corruption','freedom','social support','score']].mean().sort_values(by='score',ascending=True)[:10]
df_bottom['gdp per capita'] = df_bottom['gdp per capita']/10
df_bottom['score'] = df_bottom['score']/5

categorical = [var for var in eda_happy.columns if eda_happy[var].dtype=='O']
continuous = [var for var in eda_happy.columns if eda_happy[var].dtype!='O']
#refined
continuous = ['gdp per capita',
 'social support',
 'healthy life expectancy',
 'freedom',
 'generosity',
 'corruption']

In [None]:
background_color = '#fbfbfb'
fig = plt.figure(figsize=(12, 6), dpi=150, facecolor=background_color)
gs = fig.add_gridspec(2, 3)
gs.update(wspace=0.2, hspace=0.5)

# one subplot per feature in a 2 x 3 grid, with the y-axis and the top/right/left spines hidden
axes = []
for row in range(0, 2):
    for col in range(0, 3):
        ax = fig.add_subplot(gs[row, col])
        ax.set_facecolor(background_color)
        ax.tick_params(axis='y', left=False)
        ax.get_yaxis().set_visible(False)
        ax.set_axisbelow(True)
        for s in ["top","right","left"]:
            ax.spines[s].set_visible(False)
        axes.append(ax)

# split the records at the overall mean score computed earlier
Yes = eda_happy[eda_happy['score'] >= mean]   # at or above the mean
No = eda_happy[eda_happy['score'] < mean]     # below the mean
for ax, variable in zip(axes, continuous):
    sns.kdeplot(Yes[variable], ax=ax, color='#009473', ec='black', fill=True, linewidth=1.5, alpha=0.9, zorder=3, legend=False)
    sns.kdeplot(No[variable], ax=ax, color='#dd4124', ec='black', fill=True, linewidth=1.5, alpha=0.9, zorder=3, legend=False)
    ax.grid(which='major', axis='x', zorder=0, color='gray', linestyle=':', dashes=(1,5))
    ax.set_xlabel(variable, fontfamily='monospace')

ax0 = axes[0]
Xstart, Xend = ax0.get_xlim()
Ystart, Yend = ax0.get_ylim()

ax0.text(Xstart, Yend+(Yend*0.5), 'Differences between happy & unhappy countries', fontsize=17, fontweight='bold', fontfamily='sans-serif',color='#323232')
ax0.text(Xstart, Yend+(Yend*0.25), 'There are large differences, with GDP & Social Support being clear.\nPerhaps more interestingly, unhappy countries appear to be more generous', fontsize=12, fontweight='light', fontfamily='monospace',color='gray')

plt.show()

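The plots above confirm what we saw earlier, with social support standing out in particular.
That generosity appears higher in unhappier countries is very interesting. Note that this measure is the residual of regressing donation responses on GDP per capita, so it reflects giving relative to what income alone would predict.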

In [None]:
sns.pairplot(eda_happy21,x_vars=[
 'gdp per capita',
 'social support',
 'healthy life expectancy',
 'freedom',
 'generosity',
 'corruption'],
    y_vars=['score'])

Now let us see how these factors relate to each other.

In [None]:
sns.pairplot(eda_happy21,x_vars=[
 'score',
 'social support',
 'healthy life expectancy',
 'freedom',
 'generosity',
 'corruption'],
    y_vars=['gdp per capita'])

In [None]:
sns.pairplot(eda_happy21,x_vars=[
 'score',
 'gdp per capita',
 'healthy life expectancy',
 'freedom',
 'generosity',
 'corruption'],
    y_vars=['social support'])

In [None]:
sns.pairplot(eda_happy21,x_vars=[
 'score',
 'gdp per capita',
 'social support',
 'freedom',
 'generosity',
 'corruption'],
    y_vars=['healthy life expectancy'])

In [None]:
sns.pairplot(eda_happy21,x_vars=[
 'score',
 'gdp per capita',
 'healthy life expectancy',
 'social support',
 'generosity',
 'corruption'],
    y_vars=['freedom'])

#### So far we have seen how these factors relate to the happiness scores of various regions and to each other.
##### We have identified the major factors associated with happiness scores; now let us see how these factors differ across a few selected countries (India, Japan, Denmark and Finland).

In [None]:
ddf=eda_happy
dfi=ddf.loc[ddf['country']=='India']
dff=ddf.loc[ddf['country']=='Finland']
dfj=ddf.loc[ddf['country']=='Japan']
dfd=ddf.loc[ddf['country']=='Denmark']
h=ddf.loc[ddf['country'].isin(['India','Japan','Denmark','Finland'])]
dfj.head()

In [None]:
dfd.head()

In [None]:
dfi.head()

In [None]:
dff.head()

In [None]:
plt.figure(figsize=(20,10))
sns.lineplot(x="year", y="score",data=h,hue='country',palette='flare',markers=True,style=True,)
plt.ylabel('Score',fontsize=25)
plt.title('Score',fontsize=30)
plt.xlabel('Year',fontsize=25)
plt.show()
fig, ax =plt.subplots(3,2,figsize=(20,15))
sns.lineplot(x="year", y="generosity",data=h,hue='country',palette='flare',markers=True,style=True,ax=ax[0,0])
sns.lineplot(x="year", y="gdp per capita",data=h,hue='country',palette='flare',markers=True,style=True,ax=ax[0,1])
sns.lineplot(x="year", y="healthy life expectancy",data=h,hue='country',palette='flare',markers=True,style=True,ax=ax[1,0])
sns.lineplot(x="year", y="social support",data=h,hue='country',markers=True,style=True,palette='flare',ax=ax[1,1])
sns.lineplot(x="year", y="freedom",data=h,hue='country',palette='flare',markers=True,style=True,ax=ax[2,0])
sns.lineplot(x="year", y="corruption",data=h,hue='country',markers=True,style=True,palette='flare',ax=ax[2,1])
ax[0,0].set_title('Generosity',fontsize=20)
ax[0,1].set_title('GDP per Capita',fontsize=20)
ax[1,0].set_title('Healthy life expectancy',fontsize=20)
ax[1,1].set_title('Social Support',fontsize=20)
ax[2,0].set_title('Freedom',fontsize=20)
ax[2,1].set_title('Corruption',fontsize=20)
ax[0,0].set_ylabel('Generosity',fontsize=15)
ax[0,1].set_ylabel('GDP per Capita',fontsize=15)
ax[1,0].set_ylabel('Healthy life expectancy',fontsize=15)
ax[1,1].set_ylabel('Social Support',fontsize=15)
ax[2,0].set_ylabel('Freedom',fontsize=15)
ax[2,1].set_ylabel('Corruption',fontsize=15)
ax[0,0].set_xlabel('',fontsize=15)
ax[0,1].set_xlabel('',fontsize=15)
ax[1,0].set_xlabel('',fontsize=15)
ax[1,1].set_xlabel('',fontsize=15)
ax[2,0].set_xlabel('Year',fontsize=15)
ax[2,1].set_xlabel('Year',fontsize=15)
plt.show()

##### It is now quite clear that countries with higher happiness scores also score considerably higher on GDP per capita, healthy life expectancy, social support and freedom. These countries also have considerably lower perceptions of corruption.
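
To turn the line plots above into numbers, one could compare each factor's average between two countries. The sketch below uses hypothetical values for two placeholder countries ("P" and "Q"), not figures from the report. For `corruption`, which measures perceived corruption, a negative gap is the favourable direction.

```python
import pandas as pd

factors = ["gdp per capita", "social support", "healthy life expectancy", "freedom", "corruption"]

# Hypothetical per-year records for two placeholder countries.
records = pd.DataFrame({
    "country": ["P", "P", "Q", "Q"],
    "gdp per capita": [8.7, 8.8, 10.7, 10.8],
    "social support": [0.60, 0.62, 0.95, 0.94],
    "healthy life expectancy": [60.0, 61.0, 71.0, 72.0],
    "freedom": [0.80, 0.82, 0.95, 0.96],
    "corruption": [0.78, 0.76, 0.20, 0.18],
})

means = records.groupby("country")[factors].mean()
gap = means.loc["Q"] - means.loc["P"]  # positive = Q is higher on that factor
print(gap.round(3))
```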

# Case Study: Where Does India Stand?

In [None]:
asian_countries= ['Afghanistan', 'Bangladesh', 'Bhutan', 'China', 'Cambodia', 'Hong Kong S.A.R. of China', 'India', 'Indonesia', 'Japan', 'Kazakhstan', 'Kyrgyzstan','Laos', 'Malaysia', 'Maldives', 'Mongolia', 'Myanmar', 'Nepal', 'Pakistan', 'Philippines', 'Singapore', 'South Korea', 'Sri Lanka', 'Taiwan Province of China', 'Tajikistan', 'Thailand', 'Turkmenistan', 'Uzbekistan', 'Vietnam']
# comparison group for India (note: the WHR 2021 regional indicator places China in East Asia and Myanmar in Southeast Asia)
sa=['Afghanistan','Bangladesh','Bhutan','China','India','Nepal','Maldives','Myanmar','Pakistan','Sri Lanka']
df_asian= eda_happy[eda_happy['country'].isin(asian_countries)]
df_asian=df_asian.reset_index(drop=True)
print("Number of South-Asian countries:",len(sa))
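
Python joins adjacent string literals, so a missing comma in a hand-typed list such as `asian_countries` silently merges two names into one (`'Cambodia' 'Hong Kong S.A.R. of China'` becomes a single string), and `isin` then drops both countries without any error. A minimal sanity check is sketched below, assuming the list is compared against the `country` column of `eda_happy`; the helper name `unmatched_countries` is hypothetical.

```python
import pandas as pd

def unmatched_countries(names, countries: pd.Series) -> list:
    """Return the entries of `names` that never appear in the `countries` column."""
    available = set(countries.unique())
    return [name for name in names if name not in available]

# Illustration: the missing comma turns two entries into one
typo_list = ['China', 'Cambodia' 'Hong Kong S.A.R. of China', 'India']
countries = pd.Series(['China', 'Cambodia', 'Hong Kong S.A.R. of China', 'India'])
print(unmatched_countries(typo_list, countries))
# ['CambodiaHong Kong S.A.R. of China']
```

In the notebook this would be run as `unmatched_countries(asian_countries, eda_happy['country'])`. A country can also be legitimately absent from the survey data, so the result is a list to inspect rather than an error.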

In [None]:
df_sa = eda_happy[eda_happy['country'].isin(sa)]
df_sa = df_sa.reset_index(drop=True)
# Keep the row(s) whose score equals the maximum score of that year
df_sa_h = df_sa[df_sa['score'] == df_sa.groupby('year')['score'].transform('max')]
df_sa_h = df_sa_h.sort_values(by='year')
plt.figure(figsize=(12, 5))
sns.barplot(x='year', y='score', data=df_sa_h, hue='country', dodge=False)
plt.title('Happiest countries in South-Asia by year', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.ylim(0, 10)  # Life Ladder scores range from 0 to 10
plt.show()

In [None]:
# Keep the row(s) whose score equals the minimum score of that year
df_sa_s = df_sa[df_sa['score'] == df_sa.groupby('year')['score'].transform('min')]
df_sa_s = df_sa_s.sort_values(by='year')
plt.figure(figsize=(12, 5))
sns.barplot(x='year', y='score', data=df_sa_s, hue='country', dodge=False)
plt.title('Saddest countries in South-Asia by year', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.ylim(0, 10)  # Life Ladder scores range from 0 to 10
plt.show()
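
The two cells above keep every row whose score equals the yearly maximum (or minimum), so tied countries produce more than one bar for the same year. An alternative is `groupby(...).idxmax()` / `idxmin()`, which returns exactly one row label per year (the first one if scores tie). The sketch below compares both on made-up rows; the column names follow this notebook, the data is hypothetical.

```python
import pandas as pd

# Hypothetical toy rows, not real report values
toy = pd.DataFrame({
    'year':    [2019, 2019, 2019, 2020, 2020, 2020],
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'score':   [5.0, 4.0, 6.0, 5.5, 3.5, 5.5],
})

# Approach used above: boolean mask against the per-year maximum (keeps ties)
mask_max = toy['score'] == toy.groupby('year')['score'].transform('max')
happiest_all = toy[mask_max]

# Alternative: one row per year, selected by the index label of the max / min score
happiest = toy.loc[toy.groupby('year')['score'].idxmax()]
saddest = toy.loc[toy.groupby('year')['score'].idxmin()]

print(happiest_all['country'].tolist())  # ['C', 'A', 'C']
print(happiest['country'].tolist())      # ['C', 'A']
print(saddest['country'].tolist())       # ['B', 'B']
```

Which one is preferable depends on whether ties should be shown; the mask keeps them, `idxmax` hides them.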

## How happy have people in each South-Asian country been, on average, over the years?

In [None]:
# Average each numeric column per country over all years, then sort by mean score
df5 = (df_sa.drop(columns='year')
            .groupby('country', as_index=False)
            .mean(numeric_only=True)
            .sort_values(by='score'))
pal = sns.color_palette('flare', len(df5))
plt.figure(figsize=(15, 3))
sns.barplot(x='country', y='score', data=df5, palette=pal, dodge=False)
plt.xticks(fontsize=15, rotation=45)
plt.yticks(fontsize=15)
plt.ylim(0, 10)
plt.ylabel('Average score', fontsize=15)
plt.show()

## Why has India been among the saddest South-Asian countries for the past 15 years?
We will first compare the Life Ladder score of India and the other South-Asian countries over the years. Then we will look at the factors that may have contributed to India's low happiness over this period.
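
To make "among the saddest" concrete, the score can be ranked within each year so that rank 1 is the lowest score in the group. This is a minimal sketch, assuming `df_sa`'s columns `year`, `country` and `score`; the helper `yearly_rank_from_bottom` is hypothetical and the toy rows are invented for illustration.

```python
import pandas as pd

def yearly_rank_from_bottom(df: pd.DataFrame, country: str) -> pd.Series:
    """Rank of `country` in each year, where 1 = lowest score in the group that year."""
    # Rank scores within each year; tied scores share the smaller rank
    ranks = df.groupby('year')['score'].rank(method='min', ascending=True)
    mask = df['country'] == country
    # Re-index the chosen country's ranks by year for readability
    return pd.Series(ranks[mask].to_numpy(), index=df.loc[mask, 'year']).astype(int).sort_index()

# Hypothetical toy rows (not real report values)
toy = pd.DataFrame({
    'year':    [2018, 2018, 2018, 2019, 2019, 2019],
    'country': ['India', 'Nepal', 'Bhutan', 'India', 'Nepal', 'Bhutan'],
    'score':   [4.0, 4.8, 5.1, 4.95, 5.0, 4.9],
})
print(yearly_rank_from_bottom(toy, 'India').tolist())  # [1, 2]
```

Applied as `yearly_rank_from_bottom(df_sa, 'India')`, it would list India's position from the bottom for every survey year.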

## Life Ladder score comparison of South-Asian countries over the years

In [None]:
plt.figure(figsize=(20, 10))
# One line per South-Asian country, sorted by year so points connect chronologically
for country in sa:
    life_ldr_sc = df_sa[df_sa['country'] == country].sort_values('year')
    plt.plot(life_ldr_sc['year'], life_ldr_sc['score'], label=country, marker='o')

plt.xlabel('Year', fontsize=20)
plt.ylabel('Life Ladder Score', fontsize=20)
plt.title('Life Ladder score of South-Asian countries over the years', fontsize=20)
plt.legend(fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show()


From the graph above, India's happiness score (Life Ladder) declined broadly from 2006 to 2019 and then partially recovered in 2020. It peaked at around 5.4 in 2006, fell to its lowest value of around 3.4 in 2019, and was around 4.2 in 2020.
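
Values such as the peak year, the lowest year and the overall change can be read off programmatically instead of from the chart. This is a minimal sketch, assuming the `year`/`country`/`score` columns used above; `trend_summary` is a hypothetical helper and the toy series is made up.

```python
import pandas as pd

def trend_summary(df: pd.DataFrame, country: str) -> dict:
    """Peak year, lowest year and first-to-last change of one country's score."""
    s = (df[df['country'] == country]
         .set_index('year')['score']
         .sort_index()
         .dropna())
    return {
        'peak_year': int(s.idxmax()), 'peak_score': float(s.max()),
        'low_year': int(s.idxmin()), 'low_score': float(s.min()),
        'change': round(float(s.iloc[-1] - s.iloc[0]), 2),
    }

# Hypothetical toy series (not real report values)
toy = pd.DataFrame({
    'year': [2006, 2010, 2015, 2019, 2020],
    'country': ['X'] * 5,
    'score': [5.5, 5.0, 4.5, 3.5, 4.0],
})
print(trend_summary(toy, 'X'))
# {'peak_year': 2006, 'peak_score': 5.5, 'low_year': 2019, 'low_score': 3.5, 'change': -1.5}
```

On the notebook's data this would be `trend_summary(df_sa, 'India')`.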

## How have the individual factors evolved for India over these years?
We will now visualize how the following factors have changed for India over the period 2005 to 2021:

* Generosity
* Healthy life expectancy at birth
* Freedom to make life choices
* Log GDP per capita
* Corruption
* Social Support

In [None]:

# India's value for each factor per year, with the best South-Asian value as a dashed reference line
india = df_sa[df_sa['country'] == 'India']
factors = [  # (column, axis label, bar colour, which extreme counts as "best")
    ('generosity', 'Generosity', 'r', 'max'),
    ('healthy life expectancy', 'Healthy life expectancy', 'g', 'max'),
    ('freedom', 'Freedom to make life choices', 'b', 'max'),
    ('gdp per capita', 'Log GDP per capita', 'y', 'max'),
    ('corruption', 'Corruption', 'c', 'min'),  # lower perceived corruption is better
    ('social support', 'Social Support', 'm', 'max'),
]

plt.figure(figsize=(15, 8))
for i, (col, label, colour, best) in enumerate(factors):
    plt.subplot(2, 3, i + 1)
    # Series.max()/min() skip missing values, unlike the built-in max()/min()
    plt.axhline(df_sa[col].agg(best), ls='--', color='grey')
    plt.bar(india['year'], india[col], color=colour)
    if i >= 3:  # bottom row only
        plt.xlabel('Year')
    plt.ylabel(label)
    plt.title(f'{label} in India')
plt.show()
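
The dashed grey line in each panel marks the best South-Asian value for that factor (the maximum, or the minimum for corruption). The gap between India's latest value and that benchmark can also be tabulated directly. This is a minimal sketch, assuming the column names used in this notebook; `gap_to_best` and the toy rows are hypothetical.

```python
import pandas as pd

# Higher is better for every factor except corruption (lower perceived corruption is better)
BEST_IS = {'generosity': 'max', 'healthy life expectancy': 'max', 'freedom': 'max',
           'gdp per capita': 'max', 'social support': 'max', 'corruption': 'min'}

def gap_to_best(df: pd.DataFrame, country: str) -> pd.Series:
    """Shortfall of `country`'s latest value from the group's best value (0 = at best)."""
    latest = df[df['country'] == country].sort_values('year').iloc[-1]
    gaps = {}
    for col, best in BEST_IS.items():
        benchmark = df[col].agg(best)  # same benchmark as the dashed reference lines
        gaps[col] = benchmark - latest[col] if best == 'max' else latest[col] - benchmark
    return pd.Series(gaps).round(3).sort_values(ascending=False)

# Hypothetical toy rows (not real report values)
toy = pd.DataFrame({
    'year': [2019, 2020, 2020],
    'country': ['India', 'India', 'Nepal'],
    'generosity': [0.05, 0.10, 0.20],
    'healthy life expectancy': [60.0, 61.0, 64.0],
    'freedom': [0.80, 0.90, 0.85],
    'gdp per capita': [8.8, 8.7, 8.1],
    'social support': [0.60, 0.62, 0.80],
    'corruption': [0.78, 0.75, 0.70],
})
print(gap_to_best(toy, 'India'))
```

Because the factors are on different scales (years of life expectancy versus 0-1 survey shares), the gaps are comparable only within a factor, not across factors.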


### Key takeaways from the graphs above:
<ul style="line-height:2.5em;">
<li> <u>Factors where India is doing well:</u> healthy life expectancy at birth, freedom to make life choices and log GDP per capita.</li>
<li> <u>Factors where India needs a lot of improvement:</u> generosity, social support and corruption.</li>
<li> Perceived corruption in India has been lower since 2014 than before it, with 2011 showing the highest level of perceived corruption.</li>
<li> Freedom to make life choices has improved noticeably in the last 2-3 years.</li>
<li> India still has a long way to go before it reaches the best South-Asian score on these factors, as the comparison below shows.</li>
</ul>

In [None]:
# Compare each factor across the South-Asian countries, one panel per factor
factors = [  # (column, y-axis label, panel title)
    ('generosity', 'Generosity', 'Generosity of South-Asian countries over the years'),
    ('healthy life expectancy', 'Healthy life expectancy',
     'Healthy life expectancy of South-Asian countries over the years'),
    ('freedom', 'Freedom to make life choices',
     'Freedom to make life choices of South-Asian countries over the years'),
    ('gdp per capita', 'Log GDP per capita', 'Log GDP per capita of South-Asian countries over the years'),
    ('corruption', 'Corruption', 'Corruption in South-Asian countries over the years'),
    ('social support', 'Social support', 'Social support in South-Asian countries over the years'),
]

plt.figure(figsize=(20, 70))
for i, (col, label, title) in enumerate(factors):
    plt.subplot(6, 1, i + 1)
    for country in sa:
        # Sort by year so each country's line connects points chronologically
        country_df = df_sa[df_sa['country'] == country].sort_values('year')
        plt.plot(country_df['year'], country_df[col], label=country, marker='o')
    plt.xlabel('Year', fontsize=20)
    plt.ylabel(label, fontsize=20)
    plt.title(title, fontsize=20)
    plt.legend(fontsize=20)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
plt.show()
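
The takeaways below rest on visual comparison of these panels. Averaging each factor per country over all years and ranking the averages gives a compact cross-check of where India sits in the group. This is a minimal sketch, assuming the column names above and treating lower corruption as better; `factor_ranks` is a hypothetical helper and the toy numbers are invented.

```python
import pandas as pd

FACTORS = ['generosity', 'healthy life expectancy', 'freedom',
           'gdp per capita', 'social support', 'corruption']

def factor_ranks(df: pd.DataFrame, country: str) -> pd.Series:
    """Rank of `country` on each factor's multi-year average (1 = best in the group)."""
    means = df.groupby('country')[FACTORS].mean()
    ranks = means.rank(ascending=False, method='min')
    # For corruption a lower value is better, so rank it in ascending order instead
    ranks['corruption'] = means['corruption'].rank(ascending=True, method='min')
    return ranks.loc[country].astype(int)

# Hypothetical toy rows (not real report values)
toy = pd.DataFrame({
    'country': ['India', 'India', 'Nepal', 'Bhutan'],
    'generosity': [0.0, 0.1, 0.2, 0.3],
    'healthy life expectancy': [60.0, 62.0, 63.0, 65.0],
    'freedom': [0.9, 0.9, 0.7, 0.8],
    'gdp per capita': [8.8, 9.0, 8.0, 9.1],
    'social support': [0.6, 0.6, 0.8, 0.8],
    'corruption': [0.7, 0.9, 0.7, 0.6],
})
print(factor_ranks(toy, 'India'))
```

A multi-year average hides recent trends (such as the recent gains in freedom), so these ranks complement the line charts rather than replace them.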

## Key takeaways from the graphs above
<ul style='line-height:2em;'>
<li><b>Generosity:</b> Even though India's generosity score has been roughly in line with most other South-Asian countries, it still needs considerable improvement to match Myanmar's average score.</li>
<li><b>Healthy Life Expectancy:</b> India and the other South-Asian countries have improved steadily on this factor. India still needs to strengthen its healthcare sector to raise healthy life expectancy at birth significantly, as countries like China, Bangladesh and Sri Lanka are still ahead.</li>
<li><b>Freedom to make life choices:</b> India has improved markedly on this factor in the last 2-3 years compared with the other countries. In fact, in 2020 India recorded the highest score in the group, and it should maintain this level.</li>
<li><b>Log GDP per capita:</b> India's log GDP per capita grew steadily over most of this period, but it dropped in 2020, possibly because of the COVID-19 pandemic.</li>
<li><b>Corruption:</b> Perceived corruption in India has been falling since 2014, and India should sustain this progress.</li>
<li><b>Social Support:</b> India lags well behind on this factor and needs substantial improvement.</li>
</ul>

# Inferences based on the analysis above:

India's low Life Ladder score could be due to its lag in social support, generosity and corruption. To improve significantly and, at the very least, become the happiest South-Asian country:

* India should work substantially on generosity and social support.
* It should also work on reducing corruption and increasing healthy life expectancy at birth.
* It should sustain its progress on freedom to make life choices and GDP per capita.