# DSA210 Term Project: Health Spending, Well-Being and Out-of-Pocket Burden

## Motivation
In many countries, the cost of healthcare has been rising in recent years. This raises an important question: how does health spending relate to people's well-being and health outcomes across countries? This project analyzes international data on healthcare expenditure, life expectancy, and life satisfaction.

## My Hypothesis
My initial expectation is that while **total health expenditure** may correlate positively with outcomes, the **out-of-pocket financial burden** should negatively affect personal health and overall life quality.
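
One way to make these expectations testable is to write them as one-sided hypotheses on the slope of each simple regression fitted later in the notebook. The symbols below are notation introduced here for illustration; they do not come from the datasets.

$$
y_{it} = \alpha + \beta\, x_{it} + \varepsilon_{it}
$$

Here $y_{it}$ is life expectancy or life satisfaction for country $i$ in year $t$, and $x_{it}$ is one of the two spending measures:

$$
\begin{aligned}
x = \text{health expenditure per capita}: &\quad H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta > 0 \\
x = \text{out-of-pocket share}: &\quad H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta < 0
\end{aligned}
$$

The p-values printed by `summary()` are two-sided, so a directional reading also requires checking the sign of the estimated slope.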

## Data Sources
All datasets were obtained from Our World in Data (OWID) and merged using country-year observations:
* **Life Satisfaction:** `gdp-vs-happiness.csv`
* **Health Expenditure:** `annual-healthcare-expenditure-per-capita.csv`
* **Life Expectancy:** `life-expectancy.csv`
* **Out-of-Pocket Share:** `share-of-out-of-pocket-expenditure-on-healthcare.csv`
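
Because the four files are joined on country-year keys with inner merges, any key missing from one file is dropped silently. A minimal sketch of how that loss could be audited is shown below. The two small tables are hypothetical stand-ins for the OWID files, not real data.

```python
import pandas as pd

# Hypothetical toy tables standing in for two of the OWID files.
happiness = pd.DataFrame({
    "Entity": ["A", "A", "B", "C"],
    "Year": [2019, 2020, 2019, 2019],
    "Cantril ladder score": [6.1, 6.0, 5.2, 7.3],
})
oop = pd.DataFrame({
    "Entity": ["A", "B", "B", "D"],
    "Year": [2019, 2019, 2020, 2019],
    "out_of_pocket_share": [15.0, 40.0, 41.0, 22.0],
})

# An outer merge with indicator=True labels each row by where its key was found.
# validate="one_to_one" raises an error if a country-year key is duplicated
# on either side, which would otherwise multiply rows after the merge.
audit = happiness.merge(
    oop,
    on=["Entity", "Year"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)
match_counts = audit["_merge"].value_counts()
print(match_counts)

# The inner merge keeps only the keys present in both tables.
inner = happiness.merge(oop, on=["Entity", "Year"], how="inner")
print(f"Rows kept by the inner merge: {len(inner)}")
```

Running the same check on the real files before each inner merge would show how many country-year observations each dataset removes from the final sample.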

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

happiness_path = r"C:\Users\blgnd\Downloads\gdp-vs-happiness\gdp-vs-happiness.csv"
health_path    = r"C:\Users\blgnd\Downloads\annual-healthcare-expenditure-per-capita\annual-healthcare-expenditure-per-capita.csv"
life_path      = r"C:\Users\blgnd\Downloads\life-expectancy\life-expectancy.csv"
oopp_path      = r"C:\Users\blgnd\Downloads\share-of-out-of-pocket-expenditure-on-healthcare\share-of-out-of-pocket-expenditure-on-healthcare.csv"

df_happiness = pd.read_csv(happiness_path)
df_health    = pd.read_csv(health_path)
df_life      = pd.read_csv(life_path)
df_oopp      = pd.read_csv(oopp_path)

In [None]:

df_oopp = df_oopp[[
    "Entity",
    "Year",
    "Out-of-pocket expenditure (% of current health expenditure)"
]].rename(columns={
    "Out-of-pocket expenditure (% of current health expenditure)": "out_of_pocket_share"
})


# Inner merges keep only the country-year pairs present in all four datasets
merged = df_happiness.merge(df_health, on=["Entity", "Year"], how="inner")
merged = merged.merge(df_life, on=["Entity", "Year"], how="inner")
merged = merged.merge(df_oopp, on=["Entity", "Year"], how="inner")


merged = merged[[
    "Entity",
    "Year",
    "Cantril ladder score",
    "GDP per capita, PPP (constant 2021 international $)",
    "Current health expenditure per capita, PPP (current international $)",
    "Period life expectancy at birth",
    "out_of_pocket_share"
]].rename(columns={
    "Entity": "country",
    "Year": "year",
    "Cantril ladder score": "life_satisfaction",
    "GDP per capita, PPP (constant 2021 international $)": "gdp_per_capita",
    "Current health expenditure per capita, PPP (current international $)": "health_expenditure",
    "Period life expectancy at birth": "life_expectancy"
})


merged.dropna(inplace=True)

print("First five rows of the merged data (merged.head()):")
print(merged.head())
print(f"\nTotal Number of Observations (Rows): {merged.shape[0]}")

In [None]:
print("\n--- Descriptive Statistics ---")
print(merged[['life_satisfaction',
              'health_expenditure',
              'life_expectancy',
              'out_of_pocket_share']].describe())

print("\n--- Correlation Matrix ---")
correlation_matrix = merged[['life_satisfaction',
                             'health_expenditure',
                             'life_expectancy',
                             'out_of_pocket_share']].corr()
print(correlation_matrix)

In [None]:
print("--- Hypothesis test 1: Health Expenditure -> Life Expectancy ---")
model_life = smf.ols('life_expectancy ~ health_expenditure', data=merged)
results_life = model_life.fit()
print(results_life.summary())

In [None]:
print("--- Hypothesis test 2: Health Expenditure -> Life Satisfaction ---")
model_happiness = smf.ols('life_satisfaction ~ health_expenditure', data=merged)
results_happiness = model_happiness.fit()
print(results_happiness.summary())

In [None]:
print("--- Hypothesis test 3: Out-of-pocket -> Life Expectancy ---")
model_life_oopp = smf.ols('life_expectancy ~ out_of_pocket_share', data=merged)
results_life_oopp = model_life_oopp.fit()
print(results_life_oopp.summary())

In [None]:
print("--- Hypothesis test 4: Out-of-pocket -> Life Satisfaction ---")
model_happiness_oopp = smf.ols('life_satisfaction ~ out_of_pocket_share', data=merged)
results_happiness_oopp = model_happiness_oopp.fit()
print(results_happiness_oopp.summary())

In [None]:
# Health Expenditure vs. Life Expectancy
plt.figure(figsize=(10, 6))
sns.scatterplot(x='health_expenditure', y='life_expectancy', data=merged)
plt.title('Health Expenditure vs. Life Expectancy')
plt.xlabel('Health Expenditure Per Capita (PPP, current international dollars)')
plt.ylabel('Life Expectancy (Years)')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

In [None]:
# Health Expenditure vs. Life Satisfaction
plt.figure(figsize=(10, 6))
sns.scatterplot(x='health_expenditure', y='life_satisfaction', data=merged)
plt.title('Health Expenditure vs. Life Satisfaction')
plt.xlabel('Health Expenditure Per Capita (PPP, current international dollars)')
plt.ylabel('Cantril Life Satisfaction Score')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

In [None]:
# Out-of-pocket (%) vs Life Expectancy
plt.figure(figsize=(10, 6))
sns.scatterplot(x='out_of_pocket_share', y='life_expectancy', data=merged)
plt.title('Out-of-Pocket Share vs. Life Expectancy')
plt.xlabel('Out-of-Pocket Share of Health Spending (%)')
plt.ylabel('Life Expectancy (Years)')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

In [None]:
# Out-of-pocket (%) vs Life Satisfaction
plt.figure(figsize=(10, 6))
sns.scatterplot(x='out_of_pocket_share', y='life_satisfaction', data=merged)
plt.title('Out-of-Pocket Share vs. Life Satisfaction')
plt.xlabel('Out-of-Pocket Share of Health Spending (%)')
plt.ylabel('Cantril Life Satisfaction Score')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()

## Conclusion

The original hypothesis was supported, with an important distinction between the two spending measures:
* **Total Health Expenditure** showed a **strong positive association** with well-being.
* The **Out-of-Pocket financial burden** showed a **moderate negative association**.

These are bivariate associations on pooled country-year data, so they do not establish causation. They are consistent with the view that high-spending countries achieve better health **when the spending is not primarily carried by individuals**, and that well-being is lower where healthcare is financially burdensome.
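
The four regressions above each use a single predictor, so they cannot show directly whether the benefit of spending depends on the out-of-pocket share. One way to probe that is a model with both predictors and their interaction. The sketch below uses synthetic data with such a dependence built in (it is not the OWID data) and plain NumPy least squares. All variable names and coefficients are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (NOT the OWID data): spending helps less as the
# out-of-pocket share rises. All coefficients here are made up.
spend = rng.uniform(0.0, 5.0, n)   # hypothetical spending, in thousands
oop = rng.uniform(0.0, 1.0, n)     # hypothetical out-of-pocket share, 0 to 1
life = 65 + 4.0 * spend - 3.0 * spend * oop + rng.normal(0, 0.5, n)

# Design matrix columns: intercept, spend, oop, spend * oop
X = np.column_stack([np.ones(n), spend, oop, spend * oop])
coef, *_ = np.linalg.lstsq(X, life, rcond=None)
b0, b_spend, b_oop, b_inter = coef

# Implied slope of spending at a low and a high out-of-pocket share
slope_low = b_spend + b_inter * 0.1
slope_high = b_spend + b_inter * 0.9

print(f"interaction coefficient: {b_inter:.2f}")
print(f"spending slope at 10% out-of-pocket: {slope_low:.2f}")
print(f"spending slope at 90% out-of-pocket: {slope_high:.2f}")
```

A negative interaction coefficient means the spending slope shrinks as the out-of-pocket share grows. With the `statsmodels` import already used in this notebook, the same specification could be written as the formula `life_expectancy ~ health_expenditure * out_of_pocket_share`, which also reports standard errors for each term.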

## Implication

If the returns to health spending diminish as spending rises (a non-linear relationship), each additional dollar would have more impact in low-spending countries. This would argue for prioritizing global health funds for those countries.
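
The non-linearity claim can be checked by comparing a linear fit with a fit on log spending. The sketch below does this on synthetic data generated with diminishing returns (not the OWID data). The helper `r_squared` and all numbers are hypothetical and serve only to show the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Synthetic data (NOT the OWID data) with diminishing returns built in.
spend = rng.uniform(50.0, 10000.0, n)
life = 40 + 4.5 * np.log(spend) + rng.normal(0, 1.0, n)


def r_squared(x, y):
    """R^2 of the simple least-squares line y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1.0 - resid.var() / y.var()


r2_linear = r_squared(spend, life)
r2_log = r_squared(np.log(spend), life)
print(f"R^2 with spending:      {r2_linear:.3f}")
print(f"R^2 with log(spending): {r2_log:.3f}")

# Under the log model, an extra 100 units of spending adds
# b * ln((s + 100) / s) years, which shrinks as the baseline s grows.
b, a = np.polyfit(np.log(spend), life, 1)
gain_low = b * np.log((200 + 100) / 200)
gain_high = b * np.log((5000 + 100) / 5000)
print(f"gain from +100 at a baseline of 200:  {gain_low:.2f} years")
print(f"gain from +100 at a baseline of 5000: {gain_high:.2f} years")
```

If the log specification also fits the merged data clearly better than the linear one, that would support the diminishing-returns reading. If it does not, the implication above would need to be weakened.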
