 
 # **World Happiness Report**


> Happiness is a key indicator of societal well-being.

This project analyzes global happiness data from the World Happiness Report 2024, whose scores are based on Gallup World Poll surveys. It explores how economic, social, and health factors are associated with happiness levels across countries.

## Understanding columns:

1. **Ladder / happiness score**: Respondents rate their current life on a ladder from 0 (worst possible life) to 10 (best possible life), choosing where they think they stand on this ladder.
2. **GDP**: A score derived from the logarithm of GDP per person in a country. Its typical range is around 0.3 – 2.0: poorer countries are closer to 0.4, whereas richer countries are around 1.8 – 2.0.
3. **Social support**: This represents how strongly people feel they have someone to rely on in times of trouble. Its range is roughly 0 – 2. Countries with strong social networks score around 1.5 – 1.6, whereas those with weak social ties score 0.5 – 0.8.
4. **Generosity**: This reflects whether people have recently donated money or goods to charity. Its range is -0.2 to 0.6. Positive values indicate more generosity, values near zero are neutral, and negative values indicate low generosity.
5. **Life expectancy**: A score derived from healthy life expectancy, i.e., the number of years a newborn can expect to live in good health. In this dataset it is scaled to a range of 0.2 – 1.0.
6. **Freedom**: This represents how strongly people feel they can make their own life choices freely. Its range is 0 – 1.0.
7. **Corruption**: This reflects how much people perceive corruption in government and organizations. Its range is 0 – 0.5.
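To make the ranges above checkable, here is a minimal sketch that flags values falling outside them. The `TYPICAL_RANGES` dict and the `out_of_range` helper are hypothetical names, the column names assume the raw CSV headers that are renamed later in this notebook, and the sample rows are invented.

```python
import pandas as pd

# Typical ranges taken from the column descriptions above; they are approximate
# guides, not official bounds, so flagged values deserve a look rather than removal
TYPICAL_RANGES = {
    "Ladder score": (0.0, 10.0),
    "Log GDP per capita": (0.3, 2.0),
    "Social support": (0.0, 2.0),
    "Generosity": (-0.2, 0.6),
    "Healthy life expectancy": (0.2, 1.0),
    "Freedom to make life choices": (0.0, 1.0),
    "Perceptions of corruption": (0.0, 0.5),
}


def out_of_range(frame, ranges):
    """Return a boolean DataFrame that is True where a value falls outside its range.

    Only columns present in `frame` are checked; NaN compares as False, so
    missing values are never flagged here.
    """
    flags = {}
    for col, (low, high) in ranges.items():
        if col in frame.columns:
            flags[col] = (frame[col] < low) | (frame[col] > high)
    return pd.DataFrame(flags, index=frame.index)


# Hypothetical two-row example; the values are invented for illustration
sample = pd.DataFrame({
    "Ladder score": [7.7, 1.7],
    "Social support": [1.5, 2.4],  # 2.4 is above the typical 0 - 2 range
    "Generosity": [0.1, -0.3],     # -0.3 is below the typical -0.2 floor
})
flags = out_of_range(sample, TYPICAL_RANGES)
print(flags)
```

On the real data, `out_of_range(df, TYPICAL_RANGES).sum()` would count the flagged rows per column.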


## Objectives

* Explore happiness scores across various countries and regions
* Visualize relationships between happiness and factors like GDP, social support, and life expectancy
* Visualize the happiness level of each country on a world map using color intensity
* Identify key insights and patterns in the data to summarize conclusions
  

****

## Breakdown of this notebook:

1. **Loading Dataset**
2. **Data Cleaning**
    * Understanding columns
    * Deleting unneeded columns
    * Renaming columns
    * Filling missing values with column means
3. **Data Analysis**
    * Analyzing happiest and unhappiest countries
    * Binning the Happiness score column
    * Pivot table of happiness score and top 3 factors
    * Correlation matrix
    * Happiness trend across regions
4. **Data Visualization**
    * Frequency of happiness levels
    * Heatmap of correlation matrix
    * Scatterplot of relationship between happiness score and other factors
    * Boxplot displaying happiness across regions
    * Choropleth map of happiness scores by country

**Importing Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style= 'whitegrid')

In [None]:
%matplotlib inline

****

## STEP 1: Load Dataset

In [None]:
import os 
input_path = '/kaggle/input/world-happiness-report-2024'
os.listdir(input_path)

In [None]:
df = pd.read_csv("/kaggle/input/world-happiness-report-2024/World-happiness-report-2024.csv")
df.head()

****

## STEP 2: Clean Dataset

**Understanding columns**

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df.columns

**INSIGHT:**
1. Number of countries – 143
2. Missing values – in the columns Log GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption and Dystopia + residual
3. Data types of all columns are correct and suitable
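A compact way to list only the columns that have gaps, together with the share of rows affected, is sketched below. The `missing_summary` helper is a hypothetical name and the sample frame is invented; on the real data it would be called as `missing_summary(df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    counts = counts[counts > 0]
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })


# Hypothetical frame with gaps (values invented for illustration)
sample = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Log GDP per capita": [1.8, np.nan, 1.2, 0.9],
    "Generosity": [0.1, 0.2, np.nan, np.nan],
})
summary = missing_summary(sample)
print(summary)
```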

**Deleting unneeded columns**

In [None]:
df= df.drop(['upperwhisker','lowerwhisker','Dystopia + residual'],axis=1)

**Renaming columns**

In [None]:
df= df.rename(columns={ 'Country name':'Country',
                      'Regional indicator':'Region',
                      'Ladder score':'Happiness score',
                      'Log GDP per capita':'GDP (log)',
                      'Healthy life expectancy':'Life expectancy',
                      'Freedom to make life choices':'Freedom',
                      'Perceptions of corruption':'Corruption'
                      })
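By default, `DataFrame.rename` silently skips mapping keys that are not column names, so a typo in a key leaves that column unrenamed without any error. A small sketch with an invented one-row frame shows how `errors="raise"` surfaces the problem instead:

```python
import pandas as pd

# Hypothetical frame whose header differs from the mapping key ("Ladder Score" vs "Ladder score")
sample = pd.DataFrame({"Country name": ["A"], "Ladder Score": [7.1]})
mapping = {"Country name": "Country", "Ladder score": "Happiness score"}

# Default errors="ignore": the unmatched key is silently skipped
quiet = sample.rename(columns=mapping)

# errors="raise": pandas raises KeyError for any key not found in the columns
try:
    sample.rename(columns=mapping, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(quiet.columns), raised)
```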

In [None]:
df.columns

**Replacing missing values with the column mean**

In [None]:
cols_to_fill = ['GDP (log)', 'Social support','Life expectancy', 'Freedom', 'Generosity', 'Corruption']

for col in cols_to_fill:
    df[col]= df[col].fillna(df[col].mean())
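The loop above can also be written as a single vectorized `fillna`, since passing a Series of column means fills each column with its own mean. This sketch additionally records which rows were imputed so they can be inspected or excluded later; the `was_missing` flag and the sample values are hypothetical additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data (invented values) with one gap per column
sample = pd.DataFrame({
    "GDP (log)": [1.8, np.nan, 1.0],
    "Freedom": [0.9, 0.5, np.nan],
})
cols_to_fill = ["GDP (log)", "Freedom"]

# Record which rows had at least one imputed value before filling
was_missing = sample[cols_to_fill].isnull().any(axis=1)

# fillna with a Series fills each column with its own mean (NaN is skipped when averaging)
sample[cols_to_fill] = sample[cols_to_fill].fillna(sample[cols_to_fill].mean())
print(sample, was_missing, sep="\n")
```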

In [None]:
df.isnull().sum()

In [None]:
df.head()

****

## STEP 3: Analyzing data

In [None]:
df.describe()

**Top 5 Happiest countries**

In [None]:
df.nlargest(5, 'Happiness score')[['Country','Happiness score']]

**Top 5 Unhappiest countries**

In [None]:
df.nsmallest(5, 'Happiness score')[['Country','Happiness score']]
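`head()` and `tail()` return rows in file order, so they only give the five happiest and unhappiest countries if the CSV is already sorted by score. `nlargest()` and `nsmallest()` sort by the column themselves. A small sketch on deliberately unsorted, invented data:

```python
import pandas as pd

# Hypothetical, deliberately unsorted data (invented values)
sample = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Happiness score": [5.2, 7.7, 3.1, 6.9, 1.7, 7.5],
})

# head() returns the first rows in file order, regardless of score
first_rows = sample.head(2)["Country"].tolist()

# nlargest()/nsmallest() pick rows by the score column
happiest = sample.nlargest(2, "Happiness score")["Country"].tolist()
unhappiest = sample.nsmallest(2, "Happiness score")["Country"].tolist()
print(first_rows, happiest, unhappiest)
```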

**Creating bins (low, medium, high) for the Happiness score column**

In [None]:
# 4 equally spaced edges give 3 equal-width bins between the lowest and highest score
bins = np.linspace(df['Happiness score'].min(), df['Happiness score'].max(), 4)
group_names = ['low', 'medium', 'high']
df['Happiness level'] = pd.cut(df['Happiness score'], bins, labels=group_names, include_lowest=True)
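A worked example of the same equal-width binning on invented scores makes the edge handling visible. `pd.cut` uses right-closed intervals by default, so a score sitting exactly on an inner edge goes to the lower bin, and `include_lowest=True` keeps the minimum score from becoming NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical scores (invented); min 2.0 and max 8.0 give edges 2, 4, 6, 8
scores = pd.Series([2.0, 3.5, 4.0, 5.0, 6.5, 8.0])
edges = np.linspace(scores.min(), scores.max(), 4)  # 4 edges -> 3 equal-width bins
levels = pd.cut(scores, edges, labels=["low", "medium", "high"], include_lowest=True)

print(edges)            # the bin edges
print(levels.tolist())  # 4.0 lands in "low" because bins are right-closed
```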

In [None]:
df['Happiness level'].value_counts()

**INSIGHT:** 
* 73 countries have a high happiness level, i.e., slightly more than half of all countries (73 of 143, about 51%).
* 91.6% of countries have a high or medium happiness level.
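The percentages above can be computed directly with `value_counts(normalize=True)`; the same arithmetic gives 73 / 143 ≈ 51.0% for the high level. A sketch on invented labels (in the notebook the input would be `df['Happiness level']`):

```python
import pandas as pd

# Hypothetical level labels (invented): 5 high, 3 medium, 2 low
levels = pd.Series(["high"] * 5 + ["medium"] * 3 + ["low"] * 2)

shares = levels.value_counts(normalize=True) * 100  # percentage of countries per level
high_or_medium = shares[["high", "medium"]].sum()
print(shares.round(1), round(high_or_medium, 1))
```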


**Correlation of each numerical column with the happiness score**

In [None]:
# Correlation of every numeric column with the happiness score, computed from the data
happiness_corr = df.corr(numeric_only=True)['Happiness score']
# Keep a plain dict copy for the heatmap later in the notebook
corr_matrix = happiness_corr.to_dict()
happiness_corr


**INSIGHT:**
1. Social support has the strongest correlation with happiness score, whereas generosity has the weakest.
2. GDP and life expectancy are also strongly correlated with a nation's happiness score.
3. These factors are likely interdependent, so correlation alone cannot show how much each one drives happiness.
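These insights rest on Pearson correlation coefficients (the default method of `DataFrame.corr`): the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand on invented values and checks it against pandas; `pearson_r` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the root of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Hypothetical values (invented) for five countries
support = pd.Series([1.6, 1.4, 1.1, 0.8, 0.6])
score = pd.Series([7.5, 6.8, 5.9, 4.2, 3.9])

r_manual = pearson_r(support, score)
r_pandas = support.corr(score)  # Series.corr defaults to method="pearson"
print(round(r_manual, 4), round(r_pandas, 4))
```

A coefficient near 1 only says the two columns move together linearly; it does not separate the effect of one factor from others it is correlated with.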

**Visualize key factors by happiness level**

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
pivot = df.pivot_table( index = 'Happiness level',values=['GDP (log)','Social support','Life expectancy'],
                        aggfunc='mean'
                        )
pivot.plot(kind='bar', color=['lightblue','peachpuff','palegreen'])
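A pivot table with `aggfunc='mean'` is a grouped mean, so the same numbers can be obtained with `groupby`. A sketch on invented rows confirming the two forms agree:

```python
import pandas as pd

# Hypothetical rows (invented values)
sample = pd.DataFrame({
    "Happiness level": ["low", "high", "high", "low"],
    "GDP (log)": [0.6, 1.8, 1.6, 0.8],
    "Social support": [0.7, 1.5, 1.4, 0.9],
})
cols = ["GDP (log)", "Social support"]

via_pivot = sample.pivot_table(index="Happiness level", values=cols, aggfunc="mean")
via_groupby = sample.groupby("Happiness level")[cols].mean()
print(via_groupby)
```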

**Happiness trends across regions**

In [None]:
# Number of countries in each region
df['Region'].value_counts()

In [None]:
region_analysis = df.groupby('Region')['Happiness score'].agg(['mean','std','count']).sort_values(by='mean',ascending=False)
region_analysis

**INSIGHT:**
1. Sub-Saharan Africa contains the most countries, but its average happiness score is very low (4.33).
2. North America and ANZ has the highest mean happiness score (6.93), followed by Western Europe and Central and Eastern Europe.
3. South Asia consists of 6 countries and has the lowest average happiness score (3.90).
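When reading the regional table, the `count` column matters: pandas' `std` is the sample standard deviation (`ddof=1`), so a region with a single country gets NaN, and means from small regions are less stable. A sketch of the same aggregation on invented regions:

```python
import pandas as pd

# Hypothetical regions (invented): "Region Z" has only one country
sample = pd.DataFrame({
    "Region": ["Region X", "Region X", "Region Y", "Region Y", "Region Y", "Region Z"],
    "Happiness score": [7.0, 6.0, 4.0, 5.0, 6.0, 5.5],
})

region_summary = (
    sample.groupby("Region")["Happiness score"]
    .agg(["mean", "std", "count"])
    .sort_values(by="mean", ascending=False)
)
print(region_summary)  # Region Z's std is NaN: one value has no sample spread
```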

****

## STEP 4: Data Visualization

**Frequency of happiness levels**

In [None]:
plt.figure(figsize=(10,5))
happiness_count = df['Happiness level'].value_counts()
plt.bar(happiness_count.index, happiness_count.values,color=['pink','coral','yellow'])
for index, value in enumerate(happiness_count.values):
    plt.text(index, value, str(value), ha='center', va='bottom')
plt.title('Frequency of Happiness levels')
plt.xlabel('Happiness levels')
plt.ylabel('Frequency')
plt.show()
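The chart above labels the bars with a manual `plt.text` loop and keeps the order returned by `value_counts`, which sorts by frequency. A sketch of an alternative, assuming matplotlib 3.4 or later for `Axes.bar_label`: reindexing fixes the order to low, medium, high so each color stays tied to the same level. The labels are invented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical level labels (invented): 1 low, 2 medium, 3 high
levels = pd.Series(["high", "low", "medium", "high", "medium", "high"])
counts = levels.value_counts().reindex(["low", "medium", "high"])

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(counts.index, counts.values, color=["pink", "coral", "yellow"])
labels = ax.bar_label(bars)  # writes each bar's height above it
ax.set_title("Frequency of Happiness levels")
ax.set_xlabel("Happiness levels")
ax.set_ylabel("Frequency")
plt.close(fig)  # in the notebook, plt.show() would display it instead
```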

**Heatmap displaying the correlation of happiness factors**

In [None]:
corr_df= pd.DataFrame(corr_matrix, index=['Happiness score'])
plt.figure(figsize=(10,6))
sns.heatmap(corr_df, annot=True, cmap='YlGnBu', square=True)

plt.title('Correlation heatmap of World happiness Report factors')
plt.show()
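The heatmap only needs a one-row DataFrame whose single index entry is the target column. A sketch that builds that input directly from computed correlations with `Series.to_frame().T`, using invented random data in place of `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data (invented values); in the notebook this would be df
rng = np.random.default_rng(0)
score = rng.normal(5.5, 1.0, 50)
sample = pd.DataFrame({
    "Happiness score": score,
    "Social support": score * 0.2 + rng.normal(0, 0.1, 50),
    "Generosity": rng.normal(0.1, 0.05, 50),
})

# One-row DataFrame: correlation of every numeric column with the happiness score
corr_row = sample.corr(numeric_only=True)["Happiness score"].to_frame().T
print(corr_row.round(2))
```

The result can be passed straight to `sns.heatmap(corr_row, annot=True, cmap='YlGnBu')`.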

**Relation between Happiness score and key factors**

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,4))

# Relation between happiness score and GDP of a country
sns.scatterplot(data=df, x='GDP (log)', y='Happiness score', hue='Happiness score', palette='spring', ax=ax1)
ax1.set_title('Happiness score vs GDP(log)')

# Relation between happiness score and Social support in a country
sns.scatterplot(data=df, x='Social support', y='Happiness score', hue='Happiness score', palette='mako', ax=ax2)
ax2.set_title('Happiness score vs Social support')

plt.tight_layout()
plt.show()

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(12,4))

# Relation between happiness score and Life expectancy in a country
sns.scatterplot(data=df, x='Life expectancy', y='Happiness score', hue='Happiness score', palette='cool', ax=ax1)
ax1.set_title('Happiness score vs Life expectancy')

# Relation between happiness score and Corruption in a country
sns.scatterplot(data=df, x='Corruption', y='Happiness score', hue='Happiness score', palette='autumn', ax=ax2)
ax2.set_title('Happiness score vs Corruption')

plt.tight_layout()
plt.show()
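The two scatterplot cells repeat the same pattern per factor. A sketch of a loop-based helper that draws one panel per factor on a grid and hides unused panels; `plot_factor_scatter` is a hypothetical name, it uses plain matplotlib `scatter` rather than seaborn, and the data is invented.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_factor_scatter(frame, factors, target="Happiness score", ncols=2):
    """Draw one scatter panel per factor against the target column."""
    nrows = -(-len(factors) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(6 * ncols, 4 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, factor in zip(flat, factors):
        ax.scatter(frame[factor], frame[target], s=15)
        ax.set_xlabel(factor)
        ax.set_ylabel(target)
        ax.set_title(f"{target} vs {factor}")
    for ax in flat[len(factors):]:
        ax.set_visible(False)  # hide panels left over when factors don't fill the grid
    fig.tight_layout()
    return fig


# Hypothetical data (invented values)
sample = pd.DataFrame({
    "Happiness score": [7.7, 6.1, 4.3, 3.9],
    "GDP (log)": [1.8, 1.4, 0.9, 0.6],
    "Social support": [1.5, 1.3, 0.9, 0.7],
    "Life expectancy": [0.7, 0.6, 0.4, 0.3],
})
fig = plot_factor_scatter(sample, ["GDP (log)", "Social support", "Life expectancy"])
plt.close(fig)
```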

**Box plot displaying happiness score across regions**

In [None]:
short_labels = {
    'Western Europe': 'W. Europe',
    'Sub-Saharan Africa': 'Sub-Sah. Africa',
    'North America and ANZ': 'N. Am. & ANZ',
    'South Asia': 'S. Asia',
    'Latin America and Caribbean': 'Lat. Am.',
    'East Asia': 'E. Asia',
    'Southeast Asia': 'SE Asia',
    'Commonwealth of Independent States': 'CIS',
    'Middle East and North Africa': 'MENA',
    'Central and Eastern Europe': 'C. & E. Europe'
}

# Add a new column for plotting only (original data stays intact);
# any region name missing from short_labels keeps its full name instead of becoming NaN
df['Region_Short'] = df['Region'].map(short_labels).fillna(df['Region'])
plt.figure(figsize=(12,4))
sns.boxplot(x='Region_Short', y='Happiness score', data=df, palette='Set2')
plt.title('Distribution of Happiness Scores by Region')
plt.xlabel('Region')
plt.ylabel('Happiness score')
plt.xticks(rotation=30)
plt.show()

**INSIGHT:**
1. Western Europe and Latin America and Caribbean have the highest median happiness scores
2. Sub-Saharan Africa and South Asia have the lowest median happiness scores
3. The Middle East and North Africa region shows the largest spread in happiness scores, indicating a wide range of happiness levels within the region
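Each box in the plot spans the interquartile range (IQR, the 75th minus the 25th percentile), so the "largest spread" reading can be checked numerically. A sketch computing median and IQR per region on invented data; in the notebook the input would be `df` grouped by `'Region'`.

```python
import pandas as pd

# Hypothetical data (invented): R2 is clearly more spread out than R1
sample = pd.DataFrame({
    "Region": ["R1"] * 4 + ["R2"] * 4,
    "Happiness score": [6.0, 6.5, 7.0, 7.5, 3.0, 4.0, 6.0, 8.0],
})
grouped = sample.groupby("Region")["Happiness score"]

spread = pd.DataFrame({
    "median": grouped.median(),
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),  # linear interpolation by default
}).sort_values("iqr", ascending=False)
print(spread)
```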

**Map of Happiness levels by country**

In [None]:
import plotly.io as pio
pio.renderers.default = 'iframe_connected'
import plotly.express as px
fig = px.choropleth( df, locations='Country', locationmode='country names', color='Happiness score', 
                    hover_name='Country', color_continuous_scale='RdYlGn', 
                   title='Global Happiness scores by country')
fig.show()

**INSIGHT:**
1. Countries in North America, Western Europe and Oceania generally show the highest scores
2. Countries in South America, Eastern Europe, Russia and parts of Central and Southeast Asia tend to have moderate happiness scores
3. Many countries in Africa and parts of the Middle East and Asia show the lowest happiness scores
4. The map suggests a correlation between a country's level of development and its happiness score

****

## Final Insights

* Developed countries tend to have higher happiness scores.
* The happiest regions are mostly in Western Europe and North America, while Sub-Saharan Africa and South Asia show the lowest scores.
* Social support, GDP per capita and Life expectancy have the strongest positive correlation with happiness.
* Generosity shows a weak relationship with happiness, and perceptions of corruption only a moderate one.
* A few countries show higher happiness despite lower GDP, suggesting that cultural and social factors may also contribute strongly.
* Finland is ranked the happiest country in 2024, whereas Afghanistan is ranked the unhappiest.
* Improving health, trust and social connections may raise national happiness levels even without major GDP growth.