# Project 9: World Happiness Report EDA

This notebook performs an Exploratory Data Analysis (EDA) on the World Happiness Report 2021 data. The goal is to identify which factors correlate most strongly with national happiness, as measured by the `Ladder score`.

## 1. Setup and Library Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')

## 2. Data Loading and Initial Inspection

In [None]:
# Load the dataset
try:
    df = pd.read_csv('data/world-happiness-report-2021.csv')
    print("Data loaded successfully.")
except FileNotFoundError:
    print("Data file not found. Make sure 'world-happiness-report-2021.csv' is in the 'data/' directory.")
    raise  # Stop here: every later cell needs df, so continuing would only fail with a NameError

df.head()

In [None]:
df.info()

## 3. Exploring the Data

### 3.1 Top 10 Happiest Countries

In [None]:
top_10 = df.nlargest(10, 'Ladder score')

plt.figure(figsize=(10, 6))
sns.barplot(x='Ladder score', y='Country name', data=top_10, palette='viridis')
plt.title('Top 10 Happiest Countries in 2021')
plt.xlabel('Happiness Score (Ladder score)')
plt.ylabel('Country')
plt.show()

### 3.2 Distribution of Key Factors

In [None]:
factors = ['Logged GDP per capita', 'Social support', 'Healthy life expectancy', 
           'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']

df[factors].hist(figsize=(15, 10), bins=20, color='skyblue')
plt.suptitle('Distribution of Key Happiness Factors', size=16)
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

### 3.3 Correlation Analysis

In [None]:
# Scatter plots for each factor against the Ladder score
plt.figure(figsize=(18, 12))
for i, factor in enumerate(factors):
    plt.subplot(2, 3, i + 1)
    sns.scatterplot(x=df[factor], y=df['Ladder score'])
    plt.title(f'{factor} vs. Happiness Score')
plt.tight_layout()
plt.show()

In [None]:
# Correlation Heatmap
plt.figure(figsize=(10, 8))
corr_matrix = df[factors + ['Ladder score']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of Happiness Factors')
plt.show()

**Observations from the heatmap:**
*   `Logged GDP per capita`, `Social support`, and `Healthy life expectancy` have strong positive correlations with the `Ladder score`. They are also strongly correlated with one another, so their individual effects cannot be separated from this matrix alone.
*   `Freedom to make life choices` has a moderate positive correlation.
*   `Generosity` has almost no linear correlation with the `Ladder score` (a coefficient close to zero).
*   `Perceptions of corruption` has a moderate negative correlation (lower perceived corruption is associated with higher happiness).
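
The qualitative labels above can be checked against the coefficients themselves. The sketch below ranks columns by the absolute value of their Pearson correlation with a target column. The helper name `rank_correlations` is hypothetical (it is not part of the original notebook), and the small `demo` frame is synthetic data standing in for `df`.

```python
import pandas as pd


def rank_correlations(frame, target, columns):
    """Return the Pearson correlation of each column with `target`,
    ordered from strongest to weakest by absolute value."""
    corr = frame[columns].corrwith(frame[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in data (an assumption for illustration, not real report values)
demo = pd.DataFrame({
    'score': [1.0, 2.0, 3.0, 4.0, 5.0],
    'a': [2.0, 4.0, 6.0, 8.0, 10.0],   # rises exactly with score
    'b': [5.0, 4.0, 3.0, 2.0, 1.5],    # falls as score rises
    'c': [1.0, -1.0, 0.0, -1.0, 1.0],  # no linear relationship with score
})

ranked = rank_correlations(demo, 'score', ['c', 'b', 'a'])
print(ranked)
```

In this notebook the equivalent call would be `rank_correlations(df, 'Ladder score', factors)`. Sorting by absolute value keeps a strong negative factor such as `Perceptions of corruption` from being pushed to the bottom of the list.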

### 3.4 Regional Analysis

In [None]:
plt.figure(figsize=(12, 8))
regional_happiness = df.groupby('Regional indicator')['Ladder score'].mean().sort_values()
regional_happiness.plot(kind='barh', color='salmon')
plt.title('Average Happiness Score by Region')
plt.xlabel('Average Ladder Score')
plt.ylabel('Region')
plt.show()

## 4. Conclusion

This exploratory data analysis has provided several key insights into the World Happiness Report 2021 data:

1.  **Top Performers:** The happiest countries are predominantly located in Western Europe, with Nordic countries leading the way.
2.  **Strongest Correlates:** Economic prosperity (GDP per capita), strong social support systems, and high life expectancy are the factors most strongly associated with national happiness. These are correlations, so they do not by themselves show what causes higher scores.
3.  **Other Factors:** Freedom to make life choices and low perceptions of corruption show moderate associations with happiness, while generosity shows almost no linear link.
4.  **Regional Differences:** Average happiness differs clearly across world regions, with `Western Europe` and `North America and ANZ` showing the highest average scores.
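
The regional comparison rests on group means alone, and regions contain different numbers of countries. A minimal sketch of a fuller summary, reporting the count and standard deviation alongside each mean, could look like the following. The helper `summarize_by_group` is hypothetical, and `demo` is synthetic data used only for illustration.

```python
import pandas as pd


def summarize_by_group(frame, group_col, value_col):
    """Return count, mean, and standard deviation of `value_col`
    for each group, sorted by mean in descending order."""
    summary = frame.groupby(group_col)[value_col].agg(['count', 'mean', 'std'])
    return summary.sort_values('mean', ascending=False)


# Synthetic stand-in data (an assumption for illustration, not real report values)
demo = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'South'],
    'score': [7.0, 8.0, 4.0, 5.0, 6.0],
})

summary = summarize_by_group(demo, 'region', 'score')
print(summary)
```

In this notebook the equivalent call would be `summarize_by_group(df, 'Regional indicator', 'Ladder score')`. A region with few countries or a large standard deviation is a weaker basis for comparison than its mean alone suggests.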