# **Introduction**


The purpose of this project is to explore world happiness data and reveal the correlations between key metrics, including life expectancy, freedom, GDP and the overall happiness score. In addition, we intend to uncover trends in the happiness score across different countries over the years. As I'm living in Finland, I'm particularly keen to dive deeper into the happiness data of the country that has been called the happiest in the world for 5 consecutive years. We'll mainly use 2 datasets:
- World Happiness Report 2023, to explore the 2023 happiness metrics in depth
- World Happiness Index 2013-2023, to see the trends of the happiness index across countries over the whole decade

In [None]:
#Import libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Data Cleaning

> World happiness index 2013-2023

In [None]:
world_index = pd.read_csv('/kaggle/input/world-happiness-report-2013-2023/WorldHappinessIndex2013-2023.csv')
world_index.head(10).style

In [None]:
world_index.info()

In [None]:
world_index.isna().sum()

There are 147 observations with missing Index and Rank values, and the affected countries appear to be quite random, so we'll drop the rows with missing values to keep the data consistent and easier to work with.
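
Before dropping rows, it can help to check where the gaps actually fall. The sketch below is only an illustration: it uses a small invented frame with the same `Country`, `Year`, `Index` and `Rank` column names as `world_index`, and the values are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same column names as world_index;
# every value here is invented for illustration.
sample = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Year": [2013, 2014, 2013, 2014, 2013, 2014],
    "Index": [7.1, 7.2, np.nan, 5.0, 4.3, np.nan],
    "Rank": [3.0, 2.0, np.nan, 60.0, 90.0, np.nan],
})

# Rows where Index or Rank is missing
missing_rows = sample[sample[["Index", "Rank"]].isna().any(axis=1)]

# Number of incomplete rows per country and per year
missing_by_country = missing_rows.groupby("Country").size()
missing_by_year = missing_rows.groupby("Year").size()

# Drop the incomplete rows, as done in the notebook
cleaned = sample.dropna()

print(missing_by_country)
print(missing_by_year)
print(len(sample), "->", len(cleaned))
```

If the counts cluster in a few countries or years, the gaps are probably not random and dropping them deserves a second look.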

In [None]:
world_index = world_index.dropna()

> World happiness report 2023

In [None]:
world_2023 = pd.read_csv('/kaggle/input/world-happiness-report-2013-2023/World Happiness Report 2023.csv')

In [None]:
world_2023.head(10).style

Each column in the dataset corresponds to a distinct category and metric, and here's what each of them signifies:

* Ladder score: This metric represents the overall happiness score, or subjective well-being. It comes with its standard error and with the upperwhisker and lowerwhisker columns, which give the upper and lower bounds of the 95% confidence interval.

* Logged GDP per capita: This is the logarithm of a country's Gross Domestic Product divided by its population, a measure of the average economic output per person.

* Social support: This metric measures whether individuals have someone to rely on during challenging times, assessing the strength of social connections within a society.

* Healthy life expectancy: This assesses an individual's life expectancy while considering both physical and mental health, providing a holistic perspective on well-being.

* Freedom to make life choices: This metric gauges an individual's satisfaction with their ability to make life decisions and encompasses their freedom to exercise their human rights.

* Generosity: It evaluates an individual's willingness to be charitable, such as whether they've donated money to a charitable cause in the past month.

* Perception of corruption: This measures people's perceptions of corruption within their government and their trust in the integrity of government institutions and fellow citizens.
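
To make the confidence interval around the ladder score more concrete, here is a minimal sketch of how such bounds can be derived from a score and its standard error. It assumes a normal approximation with a 1.96 multiplier and a column called `Standard error of ladder score`; both are assumptions for illustration, and the rows are invented.

```python
import pandas as pd

# Hypothetical rows: scores and standard errors are invented,
# and the column names are assumed to mirror the 2023 report file.
df = pd.DataFrame({
    "Country name": ["X", "Y"],
    "Ladder score": [7.50, 5.20],
    "Standard error of ladder score": [0.04, 0.10],
})

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval (assumption)

margin = Z_95 * df["Standard error of ladder score"]
df["lower_bound"] = df["Ladder score"] - margin
df["upper_bound"] = df["Ladder score"] + margin

print(df[["Country name", "lower_bound", "upper_bound"]].round(3))
```

Comparing bounds computed this way with the whisker columns of the real file is a quick way to check whether the assumption holds.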

In [None]:
world_2023.info()

As with the World Happiness Index, we'll drop the few rows with missing values in the World Happiness Report 2023 so that we have cleaner data to work with. The countries with missing values also appear to be random, so dropping them shouldn't introduce a noticeable bias into our analysis.

In [None]:
world_2023 = world_2023.dropna()

In [None]:
# Keep two decimals; round() with no argument would flatten the 0-1 metrics to 0 or 1
world_2023.describe().round(2)

# Exploratory Data Analysis - Index and Ranking of world countries from 2013 to 2023


- Index score refers to the number that measures the level of happiness of a country
- Ranking refers to the position of a particular country in a list that ranks countries by index score
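
The relationship between the two definitions above can be sketched in a few lines: within each year, the country with the highest index gets rank 1. The data below is invented, and the tie-breaking rule (`method="min"`) is our assumption rather than the dataset's documented method.

```python
import pandas as pd

# Invented scores for three placeholder countries over two years
scores = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C"],
    "Year": [2022, 2022, 2022, 2023, 2023, 2023],
    "Index": [7.8, 6.9, 7.2, 7.7, 7.3, 7.1],
})

# Rank 1 = highest Index within each year; tied countries share the smallest rank
scores["Rank"] = (
    scores.groupby("Year")["Index"]
    .rank(ascending=False, method="min")
    .astype(int)
)

print(scores.sort_values(["Year", "Rank"]))
```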

> Average index score from 2013-2023

In [None]:
#Average index 
world_index['Year'] = pd.to_datetime(world_index['Year'], format='%Y')
average_index_by_year = world_index.groupby('Year')['Index'].mean()
sns.set_style("whitegrid")
plt.figure(figsize=(12, 8)) 
plt.plot(average_index_by_year.index, average_index_by_year.values, marker='o', markersize=8, color='blue', label='Average Index')
plt.xlabel('Year', fontsize=14, labelpad=12)  
plt.ylabel('Average Index', fontsize=14, labelpad=12) 
plt.title('Average Happiness Index Across Countries Over Time', fontsize=18)
plt.legend(fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7, color='gray')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.tight_layout() 
plt.show()

2017 saw the lowest average index of the past decade. Political conflicts, social unrest, terrorist attacks and climate-related disasters such as hurricanes may have contributed to the low happiness level across many countries. The average index has risen sharply since then, and it's interesting to see that the pandemic, massive layoffs and fears of recession haven't dragged the happiness scores down.
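
Reading the lowest point off a chart can be double-checked numerically. The following sketch shows one way to do it with `idxmin` on the yearly averages. The toy values are invented, and `Year` is kept as a plain integer here for simplicity.

```python
import pandas as pd

# Invented values; two placeholder observations per year
toy = pd.DataFrame({
    "Year": [2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018],
    "Index": [5.4, 5.6, 5.3, 5.5, 5.0, 5.2, 5.5, 5.7],
})

yearly_mean = toy.groupby("Year")["Index"].mean()

lowest_year = yearly_mean.idxmin()   # year with the lowest average
lowest_value = yearly_mean.min()     # the average in that year

# Change between the lowest year and the most recent year
change_since_low = yearly_mean.loc[yearly_mean.index.max()] - lowest_value

print(lowest_year, round(lowest_value, 2), round(change_since_low, 2))
```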

> Index over 10 years by country

In [None]:
#index score by country
world_index['Year'] = pd.to_datetime(world_index['Year'], format='%Y')
countries = world_index['Country'].unique()
plt.figure(figsize=(10, 9))
line_styles = ['-', '--', '-.', ':']

for i, country in enumerate(countries):
    country_data = world_index[world_index['Country'] == country]
    line_style = line_styles[i % len(line_styles)]  
    plt.plot(country_data['Year'], country_data['Index'], label=country, linestyle=line_style, linewidth=1)

plt.xlabel('Year')
plt.ylabel('Index')
plt.title('Index Over Time By Country', fontsize = 18)
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=3)  
plt.show()

The index scores for different countries over time show a varied pattern. While some countries maintain relatively stable index scores, others experience significant fluctuations in their happiness index scores across the years.
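
With so many lines on one chart, "stable" versus "fluctuating" is easier to judge from a summary table. As a rough sketch on invented data, the standard deviation and the min-max range of the index per country can serve as simple volatility measures (the choice of these two measures is ours).

```python
import pandas as pd

# Invented series for two placeholder countries
toy = pd.DataFrame({
    "Country": ["Stable"] * 4 + ["Volatile"] * 4,
    "Year": [2013, 2014, 2015, 2016] * 2,
    "Index": [7.0, 7.1, 7.0, 7.1, 4.0, 5.5, 3.8, 5.9],
})

# Per-country summary, most volatile countries first
spread = (
    toy.groupby("Country")["Index"]
    .agg(["mean", "std", "min", "max"])
    .assign(value_range=lambda d: d["max"] - d["min"])
    .sort_values("std", ascending=False)
)

print(spread.round(2))
```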

# Exploratory Data Analysis - Happiness index score of Finland

In [None]:
#Happiness index trends of Finland
world_index['Year'] = pd.to_datetime(world_index['Year'], format='%Y')
finland_data = world_index[world_index['Country'] == 'Finland']
plt.figure(figsize=(10, 6)) 
plt.plot(finland_data['Year'], finland_data['Index'], marker='o', color='blue', linestyle='-', linewidth=2)
plt.xlabel('Year', fontsize=14, labelpad=12)  
plt.ylabel('Index', fontsize=14, labelpad=12)  
plt.title('Happiness Index Score Over Time Of Finland', fontsize=18) 
plt.grid(True, linewidth=0.5)
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.tight_layout()  
plt.show()

Finland's happiness score rose substantially from 2016 but has declined since 2021. This decline may be linked to events in Europe, such as the Russia-Ukraine war, as well as economic and political concerns.

> Happiness index rankings of Finland in 2013-2023

In [None]:
#Happiness index rankings of Finland
world_index['Year'] = pd.to_datetime(world_index['Year'], format='%Y')
finland_data = world_index[world_index['Country'] == 'Finland']
plt.figure(figsize=(10, 6)) 
plt.plot(finland_data['Year'], finland_data['Rank'], marker='o', color='blue', linestyle='-', linewidth=2)
plt.xlabel('Year', fontsize=14, labelpad=12)
plt.ylabel('Rank', fontsize=14, labelpad=12) 
plt.title('Happiness Index Ranks Over Time Of Finland', fontsize=18)  
plt.grid(True, linewidth=0.5)
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.gca().invert_yaxis()  # rank 1 is the best position, so show it at the top
plt.tight_layout()
plt.show()

After moving from 5th position in the index rankings in 2017 to first in 2018, Finland has been called the happiest country for 5 years in a row. Looking forward to more years of Finland carrying the crown!
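
A streak of first places can also be counted directly from a rank series instead of being read from the chart. The sketch below uses a deliberately invented rank history (it is not Finland's actual record) to show the run-length idea.

```python
import pandas as pd

# Invented rank history, NOT real data
ranks = pd.Series(
    [5, 1, 1, 1, 2, 1, 1],
    index=[2017, 2018, 2019, 2020, 2021, 2022, 2023],
    name="Rank",
)

is_first = ranks.eq(1)

# A new run starts whenever the True/False value changes from the previous year
run_id = is_first.ne(is_first.shift()).cumsum()

# Number of first-place years inside each run
run_lengths = is_first.groupby(run_id).sum()
longest_streak = int(run_lengths.max())

print(longest_streak)
```

Applied to the Finland rows of `world_index`, the same steps would give the length of the real streak.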

# Exploratory Data Analysis - A closer look at Happiness report in 2023

In [None]:
world_2023.head(5)

> Adding Finland in perspective - top 10 happiest countries in 2023

In [None]:
#Top 10 countries with the highest happiness score
top_10_countries = world_2023.sort_values("Ladder score", ascending=False).head(10)
plt.figure(figsize=(10, 6))
plt.barh(top_10_countries["Country name"], top_10_countries["Ladder score"], color="purple")
plt.xlabel("Ladder score(Happiness score)", fontsize=14, labelpad=12)
plt.ylabel("Country name", fontsize=14, labelpad=12)
plt.title("Top 10 Countries with the Highest Ladder Score", fontsize=18)
plt.gca().invert_yaxis()
plt.show()



The top 10 happiest countries are mainly in Europe, and it's also interesting to see 2 of Finland's neighbors, Sweden and Norway, on the list.

> Top 10 happiest countries with the score of key metrics

In [None]:
top_10_countries = world_2023.sort_values(by="Ladder score", ascending=False).head(10)
metrics = ["Logged GDP per capita", "Social support", "Healthy life expectancy", "Generosity", "Freedom to make life choices"]

# Extract data for the top 10 countries and selected metrics
top_10_data = top_10_countries[["Country name"] + metrics]
top_10_data = top_10_data.set_index("Country name").T

# Create a grouped bar chart
# Let pandas create the figure; a separate plt.figure() call would leave an empty figure behind
top_10_data.plot(kind="bar", figsize=(12, 8), width=0.8, colormap="viridis", edgecolor="black")
plt.xlabel("Metrics", fontsize=12)
plt.ylabel("Values", fontsize=12)
plt.title("Happiness Metrics of the Happiest Countries in 2023", fontsize=18)
plt.legend(title="Country", bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=11)
plt.xticks(rotation=45, fontsize=11)
plt.yticks(fontsize=11)
plt.show()

The top 10 happiest countries in 2023 have fairly similar values for key metrics such as GDP per capita, social support and life expectancy, and Finland doesn't have the highest GDP per capita or life expectancy among them.
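
Claims such as "similar values" and "not the highest" can be checked with two small summaries: which country leads each metric, and how wide the gap is between the highest and lowest value. This is only a sketch on invented numbers with placeholder country names.

```python
import pandas as pd

# Invented values; rows are placeholder countries ordered by happiness score
top = pd.DataFrame(
    {
        "Logged GDP per capita": [10.8, 11.6, 11.0, 10.7],
        "Healthy life expectancy": [71.2, 71.5, 72.9, 71.0],
        "Generosity": [-0.02, 0.13, 0.08, 0.21],
    },
    index=["A", "B", "C", "D"],
)

leaders = top.idxmax()            # country with the highest value per metric
spread = top.max() - top.min()    # gap between the highest and lowest value

print(leaders)
print(spread.round(2))
```

In this toy table the happiest country, "A", leads none of the metrics, which is the kind of pattern described above.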

> Distribution of ladder score (happiness score) across countries in 2023

In [None]:
#Distribution of Ladder score
sns.set(style="whitegrid")
plt.figure(figsize=(10, 6))
ax = sns.histplot(data=world_2023, x="Ladder score", bins=20, kde=True, color='blue')
plt.xlabel("Ladder Score", fontsize=14)
plt.ylabel("Frequency", fontsize=14)
plt.title("Distribution of Ladder Score", fontsize=16)
mean_score = world_2023["Ladder score"].mean()
plt.axvline(mean_score, color='red', linestyle='--', label=f"Mean Score ({mean_score:.2f})", linewidth=2)
plt.legend()
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

The next part of the analysis is to reveal the correlation of each selected factor used to measure the happiness of a country and the happiness score itself.
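
As a reminder of what "correlation" means in the cells that follow, here is a minimal sketch that computes the Pearson correlation coefficient by hand and compares it with pandas' `Series.corr`. The five data points are invented.

```python
import numpy as np
import pandas as pd

# Invented data points
data = pd.DataFrame({
    "Ladder score": [4.0, 5.0, 6.0, 7.0, 7.5],
    "Logged GDP per capita": [8.0, 9.0, 9.5, 10.5, 11.0],
})

x = data["Ladder score"].to_numpy()
y = data["Logged GDP per capita"].to_numpy()

# Pearson r: sum of products of deviations from the mean,
# divided by the square root of the product of the squared deviations
x_c = x - x.mean()
y_c = y - y.mean()
r_manual = (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

# The same coefficient via pandas (Pearson is the default method)
r_pandas = data["Ladder score"].corr(data["Logged GDP per capita"])

print(round(r_manual, 4), round(r_pandas, 4))
```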

> Correlation between GDP per capita and Happiness score

In [None]:
# Scatter plot with a linear trendline drawn by seaborn
x = world_2023["Ladder score"]
y = world_2023["Logged GDP per capita"]
plt.scatter(x, y, color="purple", label="Data Points")
sns.regplot(x=x, y=y, color="blue", scatter=False, label="Trendline")
plt.title("Happiness Ladder Score and Logged GDP per Capita", fontsize=18)
plt.xlabel("Happiness Ladder Score")
plt.ylabel("Logged GDP per Capita")
plt.legend()
plt.show()

The regression line indicates a strong positive correlation between GDP per capita and the overall happiness score, suggesting that economic development is closely tied to happiness.
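
The trendline itself can be described by two numbers, a slope and an intercept. A small sketch with `np.polyfit` on invented points shows how to obtain them and use the fitted line for a prediction.

```python
import numpy as np

# Invented points lying on a straight line
x = np.array([4.0, 5.0, 6.0, 7.0])    # ladder-score-like values
y = np.array([8.5, 9.0, 9.5, 10.0])   # logged-GDP-like values

# Degree-1 least-squares fit: coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)

# poly1d turns the coefficients into a callable line
trend = np.poly1d([slope, intercept])
predicted = trend(6.5)

print(round(slope, 3), round(intercept, 3), round(predicted, 3))
```

A steeper slope means a larger difference in the y metric for each additional point of ladder score. It says nothing on its own about which variable drives the other.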

> Correlation between Social support and Happiness score

In [None]:
# Scatter plot with a linear trendline drawn by seaborn
x = world_2023["Ladder score"]
y = world_2023["Social support"]
plt.scatter(x, y, color="purple", label="Data Points")
sns.regplot(x=x, y=y, color="blue", scatter=False, label="Trendline")
plt.title("Happiness Ladder Score and Social Support", fontsize=18)
plt.xlabel("Happiness Ladder Score")
plt.ylabel("Social Support")
plt.legend()
plt.show()

A positive correlation exists between Social support scores and Happiness scores. During challenging periods, the availability of support from one's social network becomes increasingly crucial, impacting the satisfaction and overall well-being of a country's residents.

> Correlation between Life expectancy and Happiness score

In [None]:
# Scatter plot with a linear trendline drawn by seaborn
x = world_2023["Ladder score"]
y = world_2023["Healthy life expectancy"]
plt.scatter(x, y, color="purple", label="Data Points")
sns.regplot(x=x, y=y, color="blue", scatter=False, label="Trendline")
plt.title("Happiness Ladder Score and Healthy Life Expectancy", fontsize=18)
plt.xlabel("Happiness Ladder Score")
plt.ylabel("Healthy Life Expectancy")
plt.legend()
plt.show()

Life expectancy is positively correlated with the Happiness score, although, intriguingly, some countries with similar life expectancy levels have quite different ladder scores. The earlier bar graph is consistent with the overall trend: all of the top 10 happiest countries have a healthy life expectancy above 70 years.

> Correlation between Freedom to make choices and Happiness score

In [None]:
# Scatter plot with a linear trendline drawn by seaborn
x = world_2023["Ladder score"]
y = world_2023["Freedom to make life choices"]
plt.scatter(x, y, color="purple", label="Data Points")
sns.regplot(x=x, y=y, color="blue", scatter=False, label="Trendline")
plt.title("Happiness Ladder Score and Freedom to Make Life Choices", fontsize=18)
plt.xlabel("Happiness Ladder Score")
plt.ylabel("Freedom to Make Life Choices")
plt.legend()
plt.show()

There is a relationship between the Freedom to make life choices metric and the Happiness score, but the points are quite scattered and the correlation isn't very strong. Still, countries with higher Happiness scores tend to have higher Freedom to make life choices scores.

> Correlation between Generosity and Happiness score

In [None]:
# Scatter plot with a linear trendline drawn by seaborn
x = world_2023["Ladder score"]
y = world_2023["Generosity"]
plt.scatter(x, y, color="purple", label="Data Points")
sns.regplot(x=x, y=y, color="blue", scatter=False, label="Trendline")
plt.title("Happiness Ladder Score and Generosity", fontsize=18)
plt.xlabel("Happiness Ladder Score")
plt.ylabel("Generosity")
plt.legend()
plt.show()

The scatter plot doesn't show any clear correlation between Generosity and the Happiness score: the dots are widely scattered and don't indicate any trend or relationship between the 2 variables.

> Correlation of all metrics and Happiness score

In [None]:
metrics = ["Ladder score", "Logged GDP per capita", "Generosity", "Healthy life expectancy", "Freedom to make life choices", "Social support"]
selected_data = world_2023[metrics]
correlation_matrix = selected_data.corr()
plt.figure(figsize=(10, 8))
sns.set(font_scale=1.1)  
sns.set_style("whitegrid") 
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5, vmin=-1, vmax=1, cbar_kws={"shrink": 0.7})
plt.title("Correlation Between Happiness Metrics and Happiness Ladder Score", fontsize=18)
plt.show()

The heatmap shows the correlations between the selected metrics and the happiness "Ladder score". The values in the heatmap are correlation coefficients, which range from -1 (strong negative correlation) to 1 (strong positive correlation), with 0 indicating no correlation. Out of all metrics, GDP per capita and Life expectancy are the most positively correlated with the Happiness score, with correlation coefficients of 0.84 and 0.75 respectively.
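
Instead of reading the strongest correlations off the heatmap, the "Ladder score" column of the correlation matrix can be sorted directly. The sketch below uses an invented table with a subset of the metric names.

```python
import pandas as pd

# Invented values for illustration only
toy = pd.DataFrame({
    "Ladder score": [4.0, 5.0, 6.0, 7.0, 7.5],
    "Logged GDP per capita": [8.0, 9.0, 9.5, 10.5, 11.0],
    "Generosity": [0.10, -0.05, 0.20, 0.00, 0.15],
    "Social support": [0.60, 0.75, 0.80, 0.90, 0.95],
})

# Correlation of every metric with the ladder score, strongest first
corr_with_score = (
    toy.corr()["Ladder score"]
    .drop("Ladder score")          # the score's correlation with itself is always 1
    .sort_values(ascending=False)
)

print(corr_with_score.round(2))
```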

# Summary 

The project uses analysis and visualization methods to reveal the trends in the happiness score over the years, the correlations between important metrics and the overall happiness level, and the happiness trends of Finland over the decade. The visualizations show some interesting findings:
* There are positive correlations between several key metrics and the Happiness score, notably "Logged GDP per capita", "Healthy life expectancy", "Social support" and "Freedom to make life choices". The correlation between "Generosity" and the Happiness score is much weaker, suggesting that the link between generosity and happiness isn't very clear.
* The overall happiness score has increased significantly since 2017 before dropping in 2023, but we might see a clearer trend once the complete dataset is available at the end of 2023.
* Finland's happiness score rose substantially from 2016, and Finland has had the highest score since 2018, being called the happiest country for 5 years in a row.

In conclusion, this analysis project contributes to our understanding of the multifaceted nature of happiness and the factors that contribute to it. It underscores the importance of considering a range of well-being metrics when assessing and improving the quality of life for individuals and communities worldwide.