# Wealth, health, and population
---

Group Name

Team members:

- Name : Ali Alsaegh
- Name : Fatima Alsayed

# 📖 Introduction

- 🌍 The relationship between **wealth, health, and population** has been one of the most important drivers of human progress.  
- 💰 **Gross National Income (GNI) per capita** provides insights into a country’s economic prosperity.  
- 🩺 **Life expectancy** reflects overall health outcomes, quality of life, and access to healthcare.  
- 👥 **Population size and growth** influence resource distribution, economic opportunities, and social development.  

---

## Why This Analysis?
- To explore how **wealth and health** are interconnected globally.  
- To identify **patterns and disparities** across countries and regions.  
- To understand how **population dynamics** impact both economic and health outcomes.  
- To provide insights into **global development trends** and their implications for the future.  

---

## Scope of the Study
- 📊 Time coverage: **1800–2100** (historical and projected data; the GNI per capita file ends at 2050).  
- 🌐 Geographic coverage: **Worldwide** (all available countries).  
- 🔎 Focus areas:
  - GNI per capita (Atlas Method, constant 2021 USD)  
  - Life Expectancy (years)  
  - Population (total number of people)  


## Problem Statement

How do wealth, health, and population size interact across countries, and what trends or disparities can we observe in global development?

# 🎯 Objectives

__Questions that will guide the analysis to solve the problem:__

- 🌍 **Wealth vs. Health**  
  - How does **GNI per capita** correlate with **life expectancy** across countries?  
  - Are wealthier nations consistently healthier, or are there exceptions?

- 👥 **Population Dynamics**  
  - How does **population size** affect wealth and health outcomes?  
  - Do larger populations experience slower or faster improvements in income and life expectancy?

- 📈 **Trends Over Time**  
  - How have **GNI, life expectancy, and population** changed globally since the 1800s?  
  - Which regions or countries show the **fastest or slowest growth**?

- ⚖️ **Inequality & Disparities**  
  - Which countries consistently remain at the **top and bottom** of each category?  
  - How wide is the gap between high-income and low-income nations in terms of health and population?

- 🚀 **Future Outlook**  
  - What can historical patterns tell us about the **future trajectory** of global health and wealth?  
  - Which nations are likely to become **emerging leaders** in the coming decades?


## Exploratory Data Analysis (EDA):

### Data Info:
__Getting the data and exploring it (includes descriptive statistics)__

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
#load data
gni_per_cap = pd.read_csv('./data/world-development-statistics/world-development-statistics/gni_per_cap_atlas_method_con2021.csv')
life_expectancy = pd.read_csv('./data/world-development-statistics/world-development-statistics/life_expectancy.csv')
population = pd.read_csv('./data/world-development-statistics/world-development-statistics/population.csv')

In [3]:
#check the shape of dataframe (rows , columns)
gni_per_cap.shape

(191, 252)

In [4]:
# Accesses the column names 
gni_per_cap.columns

Index(['country', '1800', '1801', '1802', '1803', '1804', '1805', '1806',
       '1807', '1808',
       ...
       '2041', '2042', '2043', '2044', '2045', '2046', '2047', '2048', '2049',
       '2050'],
      dtype='object', length=252)

In [5]:
#Show the data type of each column
gni_per_cap.dtypes

country     object
1800       float64
1801       float64
1802       float64
1803       float64
            ...   
2046        object
2047        object
2048        object
2049        object
2050        object
Length: 252, dtype: object

In [6]:
# generates summary statistics for all columns
gni_per_cap.describe(include='all')

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050
count,191,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,...,190,190,190,190,190,190.0,190,190,190,190.0
unique,191,,,,,,,,,,...,175,178,176,174,175,175.0,174,179,173,176.0
top,Afghanistan,,,,,,,,,,...,10.3k,10.5k,15.1k,11k,11.2k,1370.0,11.7k,16.8k,12.2k,1490.0
freq,1,,,,,,,,,,...,3,3,3,3,3,2.0,3,2,3,2.0
mean,,608.615789,608.547368,611.536842,610.689474,611.878947,611.821053,612.921053,612.284211,602.557895,...,,,,,,,,,,
std,,670.490166,669.126775,681.331746,674.917062,681.7259,677.22908,679.940924,672.112396,627.047946,...,,,,,,,,,,
min,,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,...,,,,,,,,,,
25%,,257.0,252.0,252.0,251.75,251.5,251.0,252.5,252.5,252.25,...,,,,,,,,,,
50%,,402.0,401.0,398.5,398.5,398.5,398.5,399.0,398.5,398.5,...,,,,,,,,,,
75%,,644.75,645.25,646.0,646.75,648.0,649.25,651.5,655.25,660.75,...,,,,,,,,,,


In [7]:
# Display the first 5 rows 
gni_per_cap.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050
0,Afghanistan,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,...,751,767,783,800,817,834,852,870,888,907
1,Angola,517.0,519.0,522.0,524.0,525.0,528.0,531.0,533.0,536.0,...,2770,2830,2890,2950,3010,3080,3140,3210,3280,3340
2,Albania,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,...,9610,9820,10k,10.2k,10.5k,10.7k,10.9k,11.1k,11.4k,11.6k
3,United Arab Emirates,738.0,740.0,743.0,746.0,749.0,751.0,754.0,757.0,760.0,...,47.9k,48.9k,50k,51k,52.1k,53.2k,54.3k,55.5k,56.7k,57.9k
4,Argentina,794.0,797.0,799.0,802.0,805.0,808.0,810.0,813.0,816.0,...,12.8k,13.1k,13.4k,13.6k,13.9k,14.2k,14.5k,14.8k,15.2k,15.5k
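
The `dtypes` and `head()` outputs above show that the later-year GNI columns were read as text because values such as `10.2k` carry a suffix. Before writing a converter, it can help to list which columns are affected and which suffixes actually occur. The sketch below uses a small hypothetical table (`sample`) rather than the real file, so the column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the GNI table: early years numeric, later years with 'k' suffixes
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "1800": [207.0, 207.0],
    "2043": ["783", "10k"],
    "2044": ["800", "10.2k"],
})

# Year columns that pandas did not read as numbers
text_cols = [
    col for col in sample.columns[1:]
    if not pd.api.types.is_numeric_dtype(sample[col])
]

# Distinct trailing letters found in those columns (e.g. 'k' for thousands)
suffixes = set()
for col in text_cols:
    found = sample[col].astype(str).str.extract(r"([A-Za-z]+)$")[0].dropna()
    suffixes.update(found)

print(text_cols)
print(suffixes)
```

Running the same two steps on `gni_per_cap` and `population` would show whether suffixes other than `k` and `M` need handling.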


In [8]:
#check the shape of dataframe (rows , columns)
life_expectancy.shape

(195, 302)

In [9]:
# Accesses the column names 
life_expectancy.columns

Index(['country', '1800', '1801', '1802', '1803', '1804', '1805', '1806',
       '1807', '1808',
       ...
       '2091', '2092', '2093', '2094', '2095', '2096', '2097', '2098', '2099',
       '2100'],
      dtype='object', length=302)

In [10]:
#Show the data type of each column
life_expectancy.dtypes

country     object
1800       float64
1801       float64
1802       float64
1803       float64
            ...   
2096       float64
2097       float64
2098       float64
2099       float64
2100       float64
Length: 302, dtype: object

In [11]:
# generates summary statistics for all columns
life_expectancy.describe(include='all')

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
count,195,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,...,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0
unique,195,,,,,,,,,,...,,,,,,,,,,
top,Afghanistan,,,,,,,,,,...,,,,,,,,,,
freq,1,,,,,,,,,,...,,,,,,,,,,
mean,,31.503763,31.463441,31.480108,31.385484,31.460753,31.586559,31.644086,31.598387,31.385484,...,83.361828,83.476344,83.600538,83.717742,83.838172,83.955376,84.076344,84.193548,84.312903,84.430645
std,,3.80951,3.801217,3.932344,3.955872,3.928388,4.003874,4.102694,3.974506,4.08023,...,5.803782,5.797854,5.788922,5.777904,5.770755,5.766333,5.756555,5.750616,5.743805,5.741341
min,,23.4,23.4,23.4,19.6,23.4,23.4,23.4,23.4,12.5,...,66.4,66.5,66.7,66.8,66.9,67.0,67.1,67.2,67.3,67.4
25%,,29.025,28.925,28.9,28.9,28.925,29.025,29.025,29.025,28.925,...,79.65,79.75,79.925,80.025,80.15,80.325,80.425,80.525,80.7,80.8
50%,,31.75,31.65,31.55,31.5,31.55,31.65,31.75,31.75,31.55,...,84.0,84.1,84.25,84.3,84.5,84.6,84.7,84.8,84.9,85.0
75%,,33.875,33.9,33.875,33.675,33.775,33.875,33.975,33.975,33.775,...,87.775,87.875,87.975,88.075,88.175,88.3,88.4,88.5,88.675,88.775


In [12]:
# Display the first 5 rows 
life_expectancy.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,28.2,28.2,28.2,28.2,28.2,28.2,28.1,28.1,28.1,...,75.5,75.7,75.8,76.0,76.1,76.2,76.4,76.5,76.6,76.8
1,Angola,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,...,78.8,79.0,79.1,79.2,79.3,79.5,79.6,79.7,79.9,80.0
2,Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,87.4,87.5,87.6,87.7,87.8,87.9,88.0,88.2,88.3,88.4
3,Andorra,,,,,,,,,,...,,,,,,,,,,
4,United Arab Emirates,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,...,82.4,82.5,82.6,82.7,82.8,82.9,83.0,83.1,83.2,83.3


In [13]:
#check the shape of dataframe (rows , columns)
population.shape

(197, 302)

In [14]:
# Accesses the column names 
population.columns

Index(['country', '1800', '1801', '1802', '1803', '1804', '1805', '1806',
       '1807', '1808',
       ...
       '2091', '2092', '2093', '2094', '2095', '2096', '2097', '2098', '2099',
       '2100'],
      dtype='object', length=302)

In [15]:
#Show the data type of each column
population.dtypes

country    object
1800       object
1801       object
1802       object
1803       object
            ...  
2096       object
2097       object
2098       object
2099       object
2100       object
Length: 302, dtype: object

In [16]:
# generates summary statistics for all columns
population.describe(include ='all')

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
count,197,197,197,197,197,197,197,197,197,197,...,196,196,196,196,196,196,196,196,196,196
unique,197,164,183,186,185,179,185,180,181,180,...,190,195,186,190,190,185,187,187,182,189
top,Afghanistan,2.5M,2M,1.01M,2.01M,2.02M,2.02M,2.03M,2.03M,2.04M,...,108M,108M,109M,109M,6.61M,1.68M,110M,17M,13.2M,87.1M
freq,1,6,4,3,3,4,4,3,3,5,...,2,2,2,2,3,3,2,3,3,2


In [17]:
# Display the first 5 rows 
population.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,3.28M,3.28M,3.28M,3.28M,3.28M,3.28M,3.28M,3.28M,3.28M,...,108M,108M,109M,109M,109M,110M,110M,110M,111M,111M
1,Angola,1.57M,1.57M,1.57M,1.57M,1.57M,1.57M,1.57M,1.57M,1.57M,...,125M,126M,127M,128M,129M,130M,131M,131M,132M,133M
2,Albania,400k,402k,404k,405k,407k,409k,411k,413k,414k,...,1.35M,1.32M,1.29M,1.26M,1.23M,1.21M,1.18M,1.15M,1.12M,1.1M
3,Andorra,2650,2650,2650,2650,2650,2650,2650,2650,2650,...,62.5k,62.2k,61.9k,61.7k,61.4k,61.2k,60.9k,60.7k,60.5k,60.2k
4,United Arab Emirates,40.2k,40.2k,40.2k,40.2k,40.2k,40.2k,40.2k,40.2k,40.2k,...,13.5M,13.5M,13.6M,13.7M,13.8M,13.8M,13.9M,14M,14M,14.1M


### Data Handling: 
__Cleaning, transforming, and combining data__

In [18]:
#Checks for missing value in 'gni_per_cap' by column
gni_per_cap.isnull().sum()

country    0
1800       1
1801       1
1802       1
1803       1
          ..
2046       1
2047       1
2048       1
2049       1
2050       1
Length: 252, dtype: int64

In [19]:
#Removes all rows from dataFrame 'gni_per_cap' that contain any missing value
gni_per_cap.dropna(inplace=True)

In [20]:
def convert_gni(gni_str):
    """Convert a GNI string with a 'k', 'M', or 'B' suffix into a float value."""
    if pd.isna(gni_str):  # Handle missing values
        return None

    gni_str = str(gni_str).strip().replace(',', '')  # Clean string
    multipliers = {'k': 1_000, 'm': 1_000_000, 'b': 1_000_000_000}

    try:
        # Check the suffix case-insensitively and scale accordingly
        suffix = gni_str[-1:].lower()
        if suffix in multipliers:
            return float(gni_str[:-1]) * multipliers[suffix]
        return float(gni_str)  # Regular number without suffix
    except ValueError:
        return gni_str  # Leave as is if conversion fails

# Apply to all columns except the first one ('country')
gni_per_cap.iloc[:, 1:] = gni_per_cap.iloc[:, 1:].applymap(convert_gni)

# Return the type of each column after conversion
gni_per_cap.dtypes


country     object
1800       float64
1801       float64
1802       float64
1803       float64
            ...   
2046        object
2047        object
2048        object
2049        object
2050        object
Length: 252, dtype: object

In [31]:
for col in gni_per_cap.columns[1:]:
    gni_per_cap[col] = pd.to_numeric(gni_per_cap[col], errors='coerce')

In [32]:
gni_per_cap.dtypes

country     object
1800       float64
1801       float64
1802       float64
1803       float64
            ...   
2046       float64
2047       float64
2048       float64
2049       float64
2050       float64
Length: 252, dtype: object
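
As an alternative to converting cell by cell, the suffix handling can be done one column at a time with pandas string methods. This is only a sketch under assumptions: `parse_suffixed` and `MULTIPLIERS` are hypothetical names, and the pattern assumes values look like `751`, `10.2k`, `1.5M`, or `2B`. Anything that does not match becomes NaN.

```python
import pandas as pd

# Hypothetical lookup from (lower-cased) suffix to multiplier
MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000, "b": 1_000_000_000}

def parse_suffixed(series: pd.Series) -> pd.Series:
    """Convert strings such as '10.2k' or '1.5M' to floats; unmatched values become NaN."""
    text = series.astype(str).str.strip().str.replace(",", "", regex=False)
    # Split each value into its numeric part and an optional one-letter suffix
    parts = text.str.extract(r"^(?P<number>-?\d+(?:\.\d+)?)(?P<suffix>[kKmMbB]?)$")
    number = pd.to_numeric(parts["number"], errors="coerce")
    factor = parts["suffix"].str.lower().map(MULTIPLIERS)
    return number * factor

example = pd.Series(["751", "10.2k", "1.5M", "2B", None])
converted = parse_suffixed(example)
print(converted)
```

Applied to the notebook's data, this could be called per year column, for example `gni_per_cap[col] = parse_suffixed(gni_per_cap[col])` inside a loop over `gni_per_cap.columns[1:]`.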

In [24]:
#Checks for missing values in 'life_expectancy' by column
life_expectancy.isnull().sum()

country    0
1800       9
1801       9
1802       9
1803       9
          ..
2096       9
2097       9
2098       9
2099       9
2100       9
Length: 302, dtype: int64

In [25]:
#Removes all rows from dataFrame 'life_expectancy' that contain any missing value
life_expectancy.dropna(inplace=True)

In [26]:
life_expectancy.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,28.2,28.2,28.2,28.2,28.2,28.2,28.1,28.1,28.1,...,75.5,75.7,75.8,76.0,76.1,76.2,76.4,76.5,76.6,76.8
1,Angola,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,...,78.8,79.0,79.1,79.2,79.3,79.5,79.6,79.7,79.9,80.0
2,Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,87.4,87.5,87.6,87.7,87.8,87.9,88.0,88.2,88.3,88.4
4,United Arab Emirates,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,...,82.4,82.5,82.6,82.7,82.8,82.9,83.0,83.1,83.2,83.3
5,Argentina,33.2,33.2,33.2,33.2,33.2,33.2,33.2,33.2,33.2,...,86.2,86.3,86.5,86.5,86.7,86.8,86.9,87.0,87.1,87.2
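
The `head()` output above no longer contains Andorra (index 3), because `dropna()` removes every row that has at least one missing value. It may be worth recording which countries are lost this way before dropping them. The sketch below uses a small hypothetical table (`sample`) in place of the real `life_expectancy` frame.

```python
import pandas as pd

# Hypothetical stand-in for life_expectancy with one fully missing country
sample = pd.DataFrame({
    "country": ["Afghanistan", "Andorra", "Angola"],
    "1800": [28.2, None, 27.0],
    "1801": [28.2, None, 27.0],
})

# Rows that contain at least one missing value
incomplete = sample[sample.isna().any(axis=1)]
dropped_countries = incomplete["country"].tolist()

# Same effect as dropna(inplace=True), but kept in a new variable
cleaned = sample.dropna().reset_index(drop=True)

print(dropped_countries)
print(cleaned.shape)
```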


In [27]:
#Checks for missing values in 'population' by column
population.isnull().sum()

country    0
1800       0
1801       0
1802       0
1803       0
          ..
2096       1
2097       1
2098       1
2099       1
2100       1
Length: 302, dtype: int64

In [28]:
#Removes all rows from dataFrame 'population' that contain any missing value
population.dropna(inplace=True)

In [29]:
def convert_population(pop_str):
    """Convert a population string with a 'k', 'M', or 'B' suffix to float."""
    pop_str = pop_str.strip()
    
    if 'M' in pop_str:
        return float(pop_str.replace('M', '').replace(',', '').strip()) * 1_000_000
    elif 'k' in pop_str:
        return float(pop_str.replace('k', '').replace(',', '').strip()) * 1_000
    elif 'B' in pop_str:
        return float(pop_str.replace('B', '').replace(',', '').strip()) * 1_000_000_000
    else:
        return float(pop_str.replace(',', '').strip())


# Convert all columns except the first one (country) to float
population.iloc[:, 1:] = population.iloc[:, 1:].applymap(convert_population)

# Display the converted DataFrame
print(population.head())

                country       1800       1801       1802       1803  \
0           Afghanistan  3280000.0  3280000.0  3280000.0  3280000.0   
1                Angola  1570000.0  1570000.0  1570000.0  1570000.0   
2               Albania   400000.0   402000.0   404000.0   405000.0   
3               Andorra     2650.0     2650.0     2650.0     2650.0   
4  United Arab Emirates    40200.0    40200.0    40200.0    40200.0   

        1804       1805       1806       1807       1808  ...         2091  \
0  3280000.0  3280000.0  3280000.0  3280000.0  3280000.0  ...  108000000.0   
1  1570000.0  1570000.0  1570000.0  1570000.0  1570000.0  ...  125000000.0   
2   407000.0   409000.0   411000.0   413000.0   414000.0  ...    1350000.0   
3     2650.0     2650.0     2650.0     2650.0     2650.0  ...      62500.0   
4    40200.0    40200.0    40200.0    40200.0    40200.0  ...   13500000.0   

          2092         2093         2094         2095         2096  \
0  108000000.0  109000000.0  10900


In [33]:
population.dtypes

country    object
1800       object
1801       object
1802       object
1803       object
            ...  
2096       object
2097       object
2098       object
2099       object
2100       object
Length: 302, dtype: object

In [34]:
for col in population.columns[1:]:
    population[col] = pd.to_numeric(population[col], errors='coerce')

In [35]:
#return the type of each column after conversion
population.dtypes

country     object
1800       float64
1801       float64
1802       float64
1803       float64
            ...   
2096       float64
2097       float64
2098       float64
2099       float64
2100       float64
Length: 302, dtype: object
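
`pd.to_numeric(..., errors='coerce')` turns any value it cannot parse into NaN without raising an error, so a failed conversion can go unnoticed. A simple check is to count values that were present before the call and missing after it. The example below is a sketch on a hypothetical column (`before`) that contains one unparsed string.

```python
import pandas as pd

# Hypothetical column after the suffix conversion, with one value left as text
before = pd.Series([3_280_000.0, 400_000.0, "n/a"], dtype=object)

after = pd.to_numeric(before, errors="coerce")

# Values that existed before coercion but are NaN afterwards
newly_missing = int((after.isna() & before.notna()).sum())
print(newly_missing)
```

A non-zero count would suggest inspecting the offending values before trusting the numeric columns.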

### Analysis: 
__Answering the objectives through data analysis__



In [None]:
# Melt the data into long format (country, year, value)
def melt_data(df, value_name):
    df_melted = df.melt(id_vars="country", var_name="year", value_name=value_name)
    df_melted["year"] = pd.to_numeric(df_melted["year"], errors="coerce")
    return df_melted

gni_melted = melt_data(gni_per_cap, "gni_per_cap")
life_melted = melt_data(life_expectancy, "life_expectancy")
pop_melted = melt_data(population, "population")

# Merge into one dataframe
df = gni_melted.merge(life_melted, on=["country", "year"]).merge(pop_melted, on=["country", "year"])

# Optional: filter years to make plots readable (e.g., 1960–2021)
df = df[(df["year"] >= 1960) & (df["year"] <= 2021)]
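
The three source tables have different row counts (191, 195, and 197 before cleaning), and `merge` defaults to an inner join, so any country missing from one table disappears from `df`. One way to see which countries are affected is an outer merge with `indicator=True`. The frames below are hypothetical stand-ins that contain only the `country` column.

```python
import pandas as pd

# Hypothetical country lists standing in for two of the cleaned tables
gni_countries = pd.DataFrame({"country": ["Afghanistan", "Angola", "Albania"]})
life_countries = pd.DataFrame({"country": ["Afghanistan", "Angola", "Albania", "Andorra"]})

# An outer merge keeps every country; '_merge' records which side each row came from
check = gni_countries.merge(life_countries, on="country", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["country", "_merge"]]

print(unmatched)
```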

In [None]:
# -----------------------------
# 1. Line Plots: life expectancy and GNI per capita over time for selected countries
# -----------------------------
countries = ["United States", "China", "India", "Brazil", "Germany"]

plt.figure(figsize=(12,6))
sns.lineplot(data=df[df["country"].isin(countries)], 
             x="year", y="life_expectancy", hue="country")
plt.title("Life Expectancy over Time")
plt.show()

plt.figure(figsize=(12,6))
sns.lineplot(data=df[df["country"].isin(countries)], 
             x="year", y="gni_per_cap", hue="country")
plt.title("GNI per Capita over Time")
plt.yscale("log")  # often gni is skewed, log scale helps
plt.show()

In [None]:
# -----------------------------
# 2. Scatter Plot: GNI vs Life Expectancy
# -----------------------------
plt.figure(figsize=(10,6))
sns.scatterplot(data=df[df["year"] == 2020], 
                x="gni_per_cap", y="life_expectancy", 
                size="population", hue="country", alpha=0.7, legend=False)
plt.xscale("log")  # better scaling
plt.title("Wealth vs Health (2020)")
plt.xlabel("GNI per Capita (log scale)")
plt.ylabel("Life Expectancy")
plt.show()
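
Because the x-axis of this scatter plot is logarithmic, a roughly straight cloud of points would mean that life expectancy rises by a fixed number of years each time income doubles. A minimal way to quantify that is a least-squares line on log10 of GNI. The numbers below are made up for illustration (`snapshot` is a hypothetical single-year table), so the printed result says nothing about the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical single-year snapshot: each 4x increase in GNI adds 8 years
snapshot = pd.DataFrame({
    "gni_per_cap": [500.0, 2_000.0, 8_000.0, 32_000.0],
    "life_expectancy": [58.0, 66.0, 74.0, 82.0],
})

# Fit life_expectancy = intercept + slope * log10(gni_per_cap)
slope, intercept = np.polyfit(
    np.log10(snapshot["gni_per_cap"]), snapshot["life_expectancy"], deg=1
)

# Years of life expectancy associated with a doubling of GNI per capita
gain_per_doubling = slope * np.log10(2)
print(round(gain_per_doubling, 2))
```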


In [None]:
# -----------------------------
# 3. Scatter Plot: Life Expectancy vs Population
# -----------------------------
plt.figure(figsize=(10,6))
sns.scatterplot(data=df[df["year"] == 2020], 
                x="population", y="life_expectancy", 
                hue="country", alpha=0.7, legend=False)
plt.xscale("log")
plt.title("Population vs Life Expectancy (2020)")
plt.show()

In [None]:
# -----------------------------
# 4. Scatter Plot: Population vs GNI per Capita
# -----------------------------
plt.figure(figsize=(10,6))
sns.scatterplot(data=df[df["year"] == 2020], 
                x="population", y="gni_per_cap", 
                hue="country", alpha=0.7, legend=False)
plt.xscale("log")
plt.yscale("log")
plt.title("Population vs GNI per Capita (2020)")
plt.show()

In [None]:
# -----------------------------
# 5. Heatmap of Correlations
# -----------------------------
corr = df[["gni_per_cap", "life_expectancy", "population"]].corr()
plt.figure(figsize=(6,4))
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation between Wealth, Health, and Population")
plt.show()
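
The heatmap uses Pearson correlation on raw values, which can understate a relationship that is monotonic but not linear, such as GNI against life expectancy. Two common alternatives are to log-transform GNI first or to use Spearman rank correlation. The sketch below compares the three on a hypothetical single-year table (`snapshot`); the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot with a strongly skewed income distribution
snapshot = pd.DataFrame({
    "gni_per_cap": [400.0, 1_500.0, 6_000.0, 20_000.0, 60_000.0],
    "life_expectancy": [55.0, 63.0, 71.0, 78.0, 82.0],
})

# Pearson on raw values, Pearson on log10(GNI), and Spearman rank correlation
pearson_raw = snapshot["gni_per_cap"].corr(snapshot["life_expectancy"])
pearson_log = np.log10(snapshot["gni_per_cap"]).corr(snapshot["life_expectancy"])
spearman = snapshot.corr(method="spearman").loc["gni_per_cap", "life_expectancy"]

print(round(pearson_raw, 3), round(pearson_log, 3), round(spearman, 3))
```

Restricting the real `df` to one year before computing correlations would also avoid mixing cross-country and over-time variation.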

In [None]:
def plot_country_trends(df, country_name):
    """
    Plots GNI per capita, Life Expectancy, and Population over time for a given country.
    """
    country_df = df[df["country"] == country_name]

    if country_df.empty:
        print(f"No data found for {country_name}")
        return

    # Create subplots
    fig, axes = plt.subplots(3, 1, figsize=(12, 12), sharex=True)

    # Plot 1: GNI per Capita
    sns.lineplot(data=country_df, x="year", y="gni_per_cap", ax=axes[0], color="blue")
    axes[0].set_title(f"{country_name} - GNI per Capita over Time")
    axes[0].set_ylabel("GNI per Capita")
    axes[0].set_yscale("log")  # GNI often skewed, log scale helps

    # Plot 2: Life Expectancy
    sns.lineplot(data=country_df, x="year", y="life_expectancy", ax=axes[1], color="green")
    axes[1].set_title(f"{country_name} - Life Expectancy over Time")
    axes[1].set_ylabel("Life Expectancy (Years)")

    # Plot 3: Population
    sns.lineplot(data=country_df, x="year", y="population", ax=axes[2], color="red")
    axes[2].set_title(f"{country_name} - Population over Time")
    axes[2].set_ylabel("Population")
    axes[2].set_xlabel("Year")
    axes[2].set_yscale("log")  # population ranges widely

    plt.tight_layout()
    plt.show()

# plot_country_trends(df, "India")
plot_country_trends(df, "United States")
plot_country_trends(df, "China")

In [None]:
def plot_top10_by_year_range(df, start_year, end_year):
    """
    Plots top 10 countries by GNI per capita, Life Expectancy, and Population 
    for a given year range (averaged).
    """
    # Filter data for range
    df_range = df[(df["year"] >= start_year) & (df["year"] <= end_year)]

    # Take mean over the period for each country
    df_mean = df_range.groupby("country", as_index=False)[
        ["gni_per_cap", "life_expectancy", "population"]
    ].mean()

    # Get top 10 for each metric
    top_gni = df_mean.nlargest(10, "gni_per_cap")
    top_life = df_mean.nlargest(10, "life_expectancy")
    top_pop = df_mean.nlargest(10, "population")

    # Create subplots
    fig, axes = plt.subplots(1, 3, figsize=(20, 6))

    # Plot 1: Top 10 GNI
    sns.barplot(data=top_gni, x="gni_per_cap", y="country", ax=axes[0], palette="Blues_r")
    axes[0].set_title(f"Top 10 Countries by GNI per Capita ({start_year}-{end_year})")
    axes[0].set_xlabel("GNI per Capita")

    # Plot 2: Top 10 Life Expectancy
    sns.barplot(data=top_life, x="life_expectancy", y="country", ax=axes[1], palette="Greens_r")
    axes[1].set_title(f"Top 10 Countries by Life Expectancy ({start_year}-{end_year})")
    axes[1].set_xlabel("Life Expectancy (Years)")

    # Plot 3: Top 10 Population
    sns.barplot(data=top_pop, x="population", y="country", ax=axes[2], palette="Reds_r")
    axes[2].set_title(f"Top 10 Countries by Population ({start_year}-{end_year})")
    axes[2].set_xlabel("Population (log scale)")
    axes[2].set_xscale("log")  # population is huge, log scale helps readability

    plt.tight_layout()
    plt.show()
###
# Top 10 countries from 2000 to 2020
plot_top10_by_year_range(df, 2000, 2020)

# Top 10 countries in 2020 only
plot_top10_by_year_range(df, 2020, 2020)


In [None]:
def plot_top_rate_of_change(df, start_year, end_year):
    """
    Plots top 10 countries with the highest rate of change in
    GNI per capita, Life Expectancy, and Population between two years.
    """
    # Filter start & end year data
    start_df = df[df["year"] == start_year][["country", "gni_per_cap", "life_expectancy", "population"]]
    end_df   = df[df["year"] == end_year][["country", "gni_per_cap", "life_expectancy", "population"]]

    # Merge on country
    merged = start_df.merge(end_df, on="country", suffixes=("_start", "_end"))

    # Calculate rate of change (%); a zero start value becomes NaN to avoid division by zero
    for col in ["gni_per_cap", "life_expectancy", "population"]:
        start = merged[f"{col}_start"].replace(0, float("nan"))
        merged[f"{col}_roc"] = (merged[f"{col}_end"] - start) / start * 100

    # Drop rows where any rate could not be computed
    merged = merged.dropna()

    # Get top 10 for each
    top_gni = merged.nlargest(10, "gni_per_cap_roc")
    top_life = merged.nlargest(10, "life_expectancy_roc")
    top_pop = merged.nlargest(10, "population_roc")

    # Create subplots
    fig, axes = plt.subplots(1, 3, figsize=(20, 6))

    # Plot 1: GNI rate of change
    sns.barplot(data=top_gni, x="gni_per_cap_roc", y="country", ax=axes[0], palette="Blues_r")
    axes[0].set_title(f"Top 10 GNI per Capita Growth ({start_year}-{end_year})")
    axes[0].set_xlabel("Rate of Change (%)")

    # Plot 2: Life Expectancy rate of change
    sns.barplot(data=top_life, x="life_expectancy_roc", y="country", ax=axes[1], palette="Greens_r")
    axes[1].set_title(f"Top 10 Life Expectancy Growth ({start_year}-{end_year})")
    axes[1].set_xlabel("Rate of Change (%)")

    # Plot 3: Population rate of change
    sns.barplot(data=top_pop, x="population_roc", y="country", ax=axes[2], palette="Reds_r")
    axes[2].set_title(f"Top 10 Population Growth ({start_year}-{end_year})")
    axes[2].set_xlabel("Rate of Change (%)")

    plt.tight_layout()
    plt.show()

##
# Compare rate of change from 1960 to 2020
plot_top_rate_of_change(df, 1960, 2020)

# Compare only 2000 to 2020
plot_top_rate_of_change(df, 2000, 2020)
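
The two calls above compare total percentage change over periods of different length (60 years and 20 years), so their values are not directly comparable. One option is an annualized (compound) growth rate, `(end / start) ** (1 / years) - 1`. The helper below is a sketch; `annualized_growth` is a hypothetical name and is not part of the notebook.

```python
def annualized_growth(start_value, end_value, years):
    """Compound annual growth rate in percent; NaN if it cannot be computed."""
    if start_value <= 0 or years <= 0:
        return float("nan")
    return ((end_value / start_value) ** (1 / years) - 1) * 100

# A value that doubles over 10 years grows by about 7.18% per year
doubling = annualized_growth(100, 200, 10)
print(round(doubling, 2))
```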


In [None]:
def plot_bottom10_by_year_range(df, start_year, end_year):
    """
    Plots bottom 10 countries by GNI per capita, Life Expectancy, and Population 
    for a given year range (averaged if range > 1 year).
    """
    # Filter data for range
    df_range = df[(df["year"] >= start_year) & (df["year"] <= end_year)]

    # Average across years if range > 1 year
    df_mean = df_range.groupby("country", as_index=False)[
        ["gni_per_cap", "life_expectancy", "population"]
    ].mean()

    # Get bottom 10 for each metric
    bottom_gni = df_mean.nsmallest(10, "gni_per_cap")
    bottom_life = df_mean.nsmallest(10, "life_expectancy")
    bottom_pop = df_mean.nsmallest(10, "population")

    # Create subplots
    fig, axes = plt.subplots(1, 3, figsize=(20, 6))

    # Plot 1: Bottom 10 GNI
    sns.barplot(data=bottom_gni, x="gni_per_cap", y="country", ax=axes[0], palette="Blues")
    axes[0].set_title(f"Bottom 10 Countries by GNI per Capita ({start_year}-{end_year})")
    axes[0].set_xlabel("GNI per Capita")

    # Plot 2: Bottom 10 Life Expectancy
    sns.barplot(data=bottom_life, x="life_expectancy", y="country", ax=axes[1], palette="Greens")
    axes[1].set_title(f"Bottom 10 Countries by Life Expectancy ({start_year}-{end_year})")
    axes[1].set_xlabel("Life Expectancy (Years)")

    # Plot 3: Bottom 10 Population
    sns.barplot(data=bottom_pop, x="population", y="country", ax=axes[2], palette="Reds")
    axes[2].set_title(f"Bottom 10 Countries by Population ({start_year}-{end_year})")
    axes[2].set_xlabel("Population")
    axes[2].set_xscale("log")  # makes sense for population

    plt.tight_layout()
    plt.show()
##
# Bottom 10 countries in 2020
plot_bottom10_by_year_range(df, 2020, 2020)

# Bottom 10 averaged from 2000 to 2020
plot_bottom10_by_year_range(df, 2000, 2020)
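
Averaging over a period shows who is lowest on average, but not who stays at the bottom year after year. To address the question of which countries consistently remain at the bottom, one approach is to rank countries within each year and count how often each one falls in the lowest `n`. The sketch below uses a tiny hypothetical long-format table (`long_df`) and a hypothetical helper `years_in_bottom`.

```python
import pandas as pd

# Hypothetical long-format data: three countries over three years
long_df = pd.DataFrame({
    "country": ["A", "B", "C"] * 3,
    "year": [2018] * 3 + [2019] * 3 + [2020] * 3,
    "gni_per_cap": [300, 500, 900, 320, 480, 950, 600, 450, 1000],
})

def years_in_bottom(df, column, n):
    """Count, per country, the number of years it ranks among the n lowest values."""
    ranks = df.groupby("year")[column].rank(method="first", ascending=True)
    return df.loc[ranks <= n, "country"].value_counts()

counts = years_in_bottom(long_df, "gni_per_cap", n=1)
print(counts)
```

Setting `ascending=False` in the ranking would give the matching count for the top of the distribution.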


In [None]:
def plot_top_mean_bottom(df, year):
    """
    Show Top 10, Mean (world average), and Bottom 10 countries 
    in GNI per capita, Life Expectancy, and Population for a given year.
    """
    # Filter data for the year
    df_year = df[df["year"] == year]

    if df_year.empty:
        print(f"No data found for year {year}")
        return

    results = {}
    for col in ["gni_per_cap", "life_expectancy", "population"]:
        # Top 10
        top10 = df_year.nlargest(10, col)[["country", col]]

        # Bottom 10
        bottom10 = df_year.nsmallest(10, col)[["country", col]]

        # Mean (world average)
        mean_val = df_year[col].mean()

        results[col] = {"top10": top10, "bottom10": bottom10, "mean": mean_val}

    # -----------------------------
    # Plotting
    # -----------------------------
    fig, axes = plt.subplots(3, 1, figsize=(12, 15))

    palettes = {"gni_per_cap": "Blues_r", "life_expectancy": "Greens_r", "population": "Reds_r"}

    for ax, col in zip(axes, results.keys()):
        top10 = results[col]["top10"]
        bottom10 = results[col]["bottom10"]
        mean_val = results[col]["mean"]

        # Combine bottom, mean, top into one DataFrame for plotting
        mean_df = pd.DataFrame({"country": ["World Average"], col: [mean_val]})
        plot_df = pd.concat([bottom10, mean_df, top10])

        # Plot
        sns.barplot(data=plot_df, x=col, y="country", ax=ax, palette=palettes[col])
        ax.axvline(mean_val, color="black", linestyle="--", label="World Average")
        ax.set_title(f"{col.replace('_',' ').title()} - {year}")
        ax.set_xlabel(col.replace("_", " ").title())
        ax.legend()

        # Log scale only for population and GNI
        if col in ["gni_per_cap", "population"]:
            ax.set_xscale("log")

    plt.tight_layout()
    plt.show()
##
# For year 2020
plot_top_mean_bottom(df, 2020)

# For year 2000
plot_top_mean_bottom(df, 2000)
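
The "World Average" bar above is the unweighted mean across countries, so a country with a few thousand people counts as much as one with a billion. A population-weighted mean is closer to the average experienced by a person rather than by a country. The sketch below contrasts the two on a hypothetical two-country table (`snapshot`).

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot: a small country with high life expectancy, a large one with lower
snapshot = pd.DataFrame({
    "country": ["Small", "Large"],
    "life_expectancy": [80.0, 70.0],
    "population": [1_000_000, 9_000_000],
})

# Unweighted mean across countries (what df_year[col].mean() computes)
simple_mean = snapshot["life_expectancy"].mean()

# Mean weighted by population
weighted_mean = np.average(snapshot["life_expectancy"], weights=snapshot["population"])

print(simple_mean, weighted_mean)
```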


In [None]:
import plotly.express as px

def gapminder_bubble(df, start_year=1960, end_year=2020):
    """
    Creates an animated Gapminder-style bubble chart:
    - X-axis: GNI per capita (log scale)
    - Y-axis: Life Expectancy
    - Bubble size: Population
    - Color: Country
    - Animation: Year
    """
    # Filter years
    df_range = df[(df["year"] >= start_year) & (df["year"] <= end_year)].dropna()

    # Create animated bubble chart
    fig = px.scatter(
        df_range,
        x="gni_per_cap",
        y="life_expectancy",
        size="population",
        color="country",
        animation_frame="year",
        animation_group="country",
        hover_name="country",
        size_max=60,
        log_x=True,
        range_x=[100, df_range["gni_per_cap"].max() * 1.1],
        range_y=[df_range["life_expectancy"].min() - 5, df_range["life_expectancy"].max() + 5],
        title=f"Wealth vs Health vs Population ({start_year}-{end_year})"
    )

    fig.update_layout(
        xaxis_title="GNI per Capita (log scale)",
        yaxis_title="Life Expectancy (Years)",
        legend_title="Country",
        template="plotly_white"
    )

    fig.show()
##
# Run an animation from 1960 to 2020
gapminder_bubble(df, 1960, 2020)

# Note: df was filtered to 1960–2021 above, so this call only animates 1960–2020;
# rebuild df without that filter to cover the full history from 1800
gapminder_bubble(df, 1800, 2020)


---

# 🌍 Wealth, Health & Population Analysis – Summary

## 📊 Key Insights

- **GNI per Capita (Wealth)**
  - Wealth has grown significantly worldwide since the 20th century.
  - High-income countries cluster at the top; low-income countries remain behind.
  - Emerging economies (e.g., China, India) show the **fastest growth rates**.

- **Life Expectancy (Health)**
  - Global life expectancy has steadily improved over the last 200 years.
  - Some countries still lag behind due to conflict, poverty, or health crises.
  - Clear **positive correlation** between wealth and health.

- **Population**
  - World population has grown **rapidly** since 1900.
  - Growth is concentrated in Asia & Africa.
  - Population size strongly influences global averages.

- **Relationships**
  - 💰 **Wealth vs Health** → people in richer countries tend to live longer (with diminishing returns at the top).
  - 👥 **Population vs Health** → a large population does not by itself imply better health outcomes.
  - 📈 **Population vs Wealth** → large populations with rapid economic growth (China, India) reshape global dynamics.

- **Top vs Bottom**
  - Top 10 countries dominate wealth & health indicators.
  - Bottom 10 countries highlight persistent inequalities.
  - Global **mean values** show steady improvement but mask disparities.

- **Rate of Change**
  - Fastest improvements: Asia (China, South Korea, India).
  - Slower or negative growth in some conflict-affected regions.
  - Life expectancy shows the most **consistent global gains**.

---

✅ This analysis highlights the **strong interconnection between wealth, health, and population**, and how inequalities persist despite global progress.


# ✅ Recommendations & Conclusion

## 🔎 Key Recommendations
- **Policy Focus**
  - Invest in **universal healthcare** and **education** to improve life expectancy.
  - Target **economic reforms** in low-income countries to reduce wealth gaps.
  - Promote **sustainable population management** in rapidly growing regions.

- **Global Collaboration**
  - Strengthen **international aid & partnerships** for healthcare and infrastructure.
  - Share technology and innovations to accelerate development in lagging regions.
  - Support climate resilience and food security to safeguard health outcomes.

- **Data & Monitoring**
  - Encourage **open data initiatives** to track progress more transparently.
  - Use **predictive analytics & AI** to forecast demographic and health challenges.
  - Monitor **inequality trends** to inform targeted interventions.

---

## 🎯 Conclusion
- 🌍 **Wealth, health, and population are deeply interconnected.**  
- 📈 Over the last 200 years, the world has seen **remarkable improvements** in life expectancy and income.  
- ⚖️ Yet, **inequalities remain wide**, with the bottom countries still lagging.  
- 🚀 Emerging economies (e.g., China, India, Brazil) are reshaping the global balance.  
- ✅ The future requires **inclusive, sustainable policies** to ensure progress benefits everyone.  
