Final Project 

Data: the World Bank's World Development Indicators (WDI) collection, which contains economic, social, and environmental indicators by country for 1960–2024. Coverage varies by country and indicator, so many series have gaps.



Problem Statement:

Does economic growth necessarily lead to higher carbon emissions, or have developed nations succeeded in decoupling growth from environmental impact?

Goals:

Analyze the relationship between GDP growth and CO2 emissions across countries.

Identify patterns showing whether wealthier nations emit less CO2 per unit of economic output compared to developing economies.

Predict future emission trends based on GDP growth using a machine learning model.

Provide insights that can guide sustainable economic policies for developing countries.
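The second goal compares CO2 emitted per unit of economic output. One way to express that quantity, sketched here under the assumption that it is computed as per-capita emissions divided by per-capita GDP, is shown below. The column name `CO2_kg_per_USD` and the numbers are hypothetical and chosen only to illustrate the arithmetic; they are not WDI values.

```python
import pandas as pd

# Toy values for illustration only; not taken from the WDI.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "CO2_emissions_per_capita": [10.0, 2.0],   # metric tons of CO2 per person
    "GDP_per_capita_USD": [50000.0, 4000.0],   # current US$ per person
})

# Emissions intensity: kg of CO2 per US$ of output.
# The population term cancels, so per-capita figures are enough.
toy["CO2_kg_per_USD"] = (
    toy["CO2_emissions_per_capita"] * 1000 / toy["GDP_per_capita_USD"]
)
print(toy)
```

In this toy case country A emits more per person but less per dollar of output than country B, which is the kind of pattern the analysis looks for.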

Phase 1

Use World Bank API to fetch data
Use the World Bank Data API to fetch the latest published annual data on CO2 emissions per capita, GDP per capita, energy use per capita, and total population. Combine and clean the data with Pandas and NumPy for structured analysis.

In [56]:
!pip install wbgapi pandas



In [2]:
import wbgapi as wb
import pandas as pd
import numpy as np
import warnings

In [38]:
indicators = {
    'EN.ATM.CO2E.PC': 'CO2_emissions_per_capita',  # CO2 emissions (metric tons per capita)
    'NY.GDP.PCAP.CD': 'GDP_per_capita_USD',        # GDP per capita (current US$)
    'EG.USE.PCAP.KG.OE': 'Energy_use_per_capita',  # energy use (kg of oil equivalent per capita)
    'SP.POP.TOTL': 'Population'                    # population, total
}

In [39]:
countries = 'all'
time_range = range(1960, 2025)

In [42]:
df_raw = wb.data.DataFrame(
    indicators.keys(),
    economy=countries,
    time=time_range,
    skipAggs=True,  # exclude regional/income aggregates, keep only actual countries
    labels=True     # add the 'Country' and 'Series' label columns
)

print("\nData fetched successfully.")


Data fetched successfully.


Load and Clean Data

Filter relevant indicators and countries.

Handle missing or NaN values.

Convert year columns to long format.
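The steps listed above can be sketched on a small toy frame. This is a minimal illustration, assuming the fetched data has one row per (economy, series) pair and one `YRxxxx` column per year; the frame `wide`, the names `long`, `tidy`, and `tidy_complete`, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy wide frame mimicking the fetched layout (values are made up).
wide = pd.DataFrame({
    "economy": ["AAA", "AAA", "BBB", "BBB"],
    "series": ["NY.GDP.PCAP.CD", "SP.POP.TOTL", "NY.GDP.PCAP.CD", "SP.POP.TOTL"],
    "YR2000": [100.0, 10.0, np.nan, 20.0],
    "YR2001": [110.0, 11.0, 210.0, 21.0],
})
names = {"NY.GDP.PCAP.CD": "GDP_per_capita_USD", "SP.POP.TOTL": "Population"}

# 1. Wide to long: one row per (economy, series, year).
long = wide.melt(id_vars=["economy", "series"], var_name="Year", value_name="value")
long["Year"] = long["Year"].str.replace("YR", "", regex=False).astype(int)

# 2. One column per indicator, with readable names.
tidy = (
    long.pivot(index=["economy", "Year"], columns="series", values="value")
        .rename(columns=names)
        .reset_index()
)
tidy.columns.name = None

# 3. Drop rows where the indicator needed for the analysis is missing.
tidy_complete = tidy.dropna(subset=["GDP_per_capita_USD"])
print(tidy_complete)
```

Dropping incomplete rows is only one option; interpolating within a country is another, and which is appropriate depends on how the gaps are distributed.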

In [43]:
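# Note: the indicator codes are values in the 'series' index level, not column
# labels (the columns are the labels plus YR1960 ... YR2024), so this rename
# leaves the frame unchanged.
df_raw = df_raw.rename(columns=indicators)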

In [44]:
df_cleaned = df_raw.reset_index()

# see the column names of the new df
print(df_cleaned.columns.tolist())

['economy', 'series', 'Country', 'Series', 'YR1960', 'YR1961', 'YR1962', 'YR1963', 'YR1964', 'YR1965', 'YR1966', 'YR1967', 'YR1968', 'YR1969', 'YR1970', 'YR1971', 'YR1972', 'YR1973', 'YR1974', 'YR1975', 'YR1976', 'YR1977', 'YR1978', 'YR1979', 'YR1980', 'YR1981', 'YR1982', 'YR1983', 'YR1984', 'YR1985', 'YR1986', 'YR1987', 'YR1988', 'YR1989', 'YR1990', 'YR1991', 'YR1992', 'YR1993', 'YR1994', 'YR1995', 'YR1996', 'YR1997', 'YR1998', 'YR1999', 'YR2000', 'YR2001', 'YR2002', 'YR2003', 'YR2004', 'YR2005', 'YR2006', 'YR2007', 'YR2008', 'YR2009', 'YR2010', 'YR2011', 'YR2012', 'YR2013', 'YR2014', 'YR2015', 'YR2016', 'YR2017', 'YR2018', 'YR2019', 'YR2020', 'YR2021', 'YR2022', 'YR2023', 'YR2024']


In [45]:
print("\nRaw Data first 5 Rows")
display(df_raw.head())


Raw Data first 5 Rows


economy,series,Country,Series,YR1960,YR1961,YR1962,YR1963,YR1964,YR1965,YR1966,YR1967,...,YR2015,YR2016,YR2017,YR2018,YR2019,YR2020,YR2021,YR2022,YR2023,YR2024
ZWE,NY.GDP.PCAP.CD,Zimbabwe,GDP per capita (current US$),276.419784,279.016489,275.545608,277.005701,281.744539,294.145359,278.567519,294.210571,...,1386.418559,1407.420964,3448.086991,2271.852504,1683.913136,1730.45391,1724.387271,2040.546587,2156.034093,2656.409377
ZMB,NY.GDP.PCAP.CD,Zambia,GDP per capita (current US$),221.559849,209.693206,202.281031,203.219451,229.979246,287.425476,325.025847,340.57994,...,1295.877887,1239.085279,1483.465773,1463.899979,1258.986198,951.644317,1127.160779,1447.123101,1330.727806,1235.084665
YEM,NY.GDP.PCAP.CD,"Yemen, Rep.",GDP per capita (current US$),,,,,,,,,...,1362.173812,975.359417,811.16597,633.887202,,,,,,
PSE,NY.GDP.PCAP.CD,West Bank and Gaza,GDP per capita (current US$),,,,,,,,,...,3272.154324,3527.613824,3620.360487,3562.330943,3656.858271,3233.568638,3678.635657,3799.95527,3455.028529,2592.305912
VIR,NY.GDP.PCAP.CD,Virgin Islands (U.S.),GDP per capita (current US$),,,,,,,,,...,34007.352941,35324.974887,35365.069304,36663.208755,38633.529892,39787.374165,42571.077737,44320.909186,,


In [47]:
# Only 'economy' is a column of the wide frame. 'time' and the indicator codes
# are not column labels at this point, so a single rename is sufficient.
df_cleaned = df_cleaned.rename(columns={'economy': 'Country_Code'})

In [57]:
df_countries = wb.economy.DataFrame(skipAggs=False)

In [None]:
# keep income level and region from the country metadata, then rename the
# columns so that the key matches df_cleaned
df_countries = df_countries[['incomeLevel', 'region']].reset_index()
df_countries = df_countries.rename(columns={'id': 'Country_Code', 'incomeLevel': 'Income_Level', 'region': 'Region'})

In [59]:
# merge the fetched data with the country metadata
df_final = pd.merge(df_cleaned, df_countries, on='Country_Code', how='left')
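A left merge silently leaves `NaN` in the metadata columns for any code that has no match, so it can help to check the result. The sketch below uses toy frames (`data`, `meta`, and their values are hypothetical) together with the `validate` and `indicator` options of `pd.merge` to list unmatched codes.

```python
import pandas as pd

# Toy stand-ins for df_cleaned and df_countries (values are made up).
data = pd.DataFrame({"Country_Code": ["AAA", "BBB", "XXX"], "value": [1.0, 2.0, 3.0]})
meta = pd.DataFrame({
    "Country_Code": ["AAA", "BBB"],
    "Income_Level": ["HIC", "LIC"],
    "Region": ["ECS", "SSF"],
})

checked = pd.merge(
    data, meta,
    on="Country_Code",
    how="left",
    validate="many_to_one",  # raises if a code appears more than once in meta
    indicator=True,          # adds a '_merge' column describing each row's match
)

# Codes present in the data but missing from the metadata.
unmatched = checked.loc[checked["_merge"] == "left_only", "Country_Code"].unique().tolist()
print(unmatched)
```

An empty list means every row received its income level and region.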