# Data Collection Module 2: World Bank API for GDP Data

<b> API Details: </b>

The World Bank API provides access to global economic data, including Gross Domestic Product (GDP) and GDP growth rates, through its World Development Indicators (WDI) database. The two indicators used here are nominal GDP (current US$) and GDP growth (annual %), covering over 200 countries and territories with historical data and recent estimates.

<b> Data on GDP: </b>

- Nominal GDP: "GDP (current US$)" (indicator code: `NY.GDP.MKTP.CD`)
- GDP growth: "GDP growth (annual %)" (indicator code: `NY.GDP.MKTP.KD.ZG`)

<b> Data Update Frequency: </b>

GDP data within the World Bank's World Development Indicators is updated annually, providing a comprehensive overview of global economic trends and country-specific economic health.

<b> Accessing the API: </b>

The World Bank API is open to the public and does not require an API key, so it can be called directly from analysis, visualization, or reporting code.
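
Because no key or authentication header is involved, a request is fully described by its URL. The following is a minimal sketch of how such a URL might be assembled; the helper name `build_indicator_url` and its default arguments are illustrative assumptions, not something defined by the World Bank API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"  # public endpoint; no API key needed


def build_indicator_url(indicator_code, country_code="all",
                        start_year=1960, end_year=2023, per_page=1000, page=1):
    """Build a World Bank indicator URL (hypothetical helper for illustration)."""
    query = urlencode({
        "format": "json",                    # the API answers in XML unless JSON is requested
        "date": f"{start_year}:{end_year}",  # year range, written as start:end
        "per_page": per_page,
        "page": page,
    }, safe=":")                             # keep the colon in the date range readable
    return f"{BASE_URL}/country/{country_code}/indicator/{indicator_code}?{query}"


print(build_indicator_url("NY.GDP.MKTP.CD", "WLD"))
```

Keeping the query parameters in one place makes it easier to change the year range or page size without editing a long f-string.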

<b> API Documentation: </b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process </b>: Access the World Bank API, collect the relevant data, preprocess it to fit our requirements, and store the result.
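
The collection and preprocessing steps both depend on the shape of the response: a successful JSON reply is a two-element array, with paging metadata first and the list of observations second. The sketch below flattens that structure using a hand-written stand-in payload, so the numbers are placeholders rather than real GDP figures, and `flatten_records` is a hypothetical helper name.

```python
# Hand-written stand-in for a World Bank response; the values are placeholders.
sample_response = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "country": {"id": "1W", "value": "World"},
         "countryiso3code": "WLD", "date": "2021", "value": 100.0},
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "country": {"id": "1W", "value": "World"},
         "countryiso3code": "WLD", "date": "2020", "value": None},
    ],
]


def flatten_records(response_json):
    """Turn nested records into flat dicts; return [] for any other payload shape."""
    if not (isinstance(response_json, list) and len(response_json) == 2
            and isinstance(response_json[1], list)):
        return []
    return [
        {
            "country_code": row["countryiso3code"],
            "year": int(row["date"]),
            "indicator": row["indicator"]["id"],
            "value": row["value"],  # None marks a missing observation
        }
        for row in response_json[1]
    ]


rows = flatten_records(sample_response)
print(rows[0])
```

Checking the shape before indexing into the second element avoids an exception when the reply is an error message instead of data.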


### `Goal`: Create a comprehensive, up-to-date dataset of major economic statistics.

In [1]:
# Code snippet for data collection from the API
import requests
import pandas as pd

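def fetch_gdp_data(indicator_code, country_code="all"):
    """Fetch one indicator for a country code; return a list of row dicts."""
    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}?format=json&date=1960:2023&per_page=10000"
    # Fail fast on network problems instead of hanging or parsing an error page
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    data = response.json()

    # A successful response is [metadata, records]; only the first page is read here
    if len(data) == 2 and isinstance(data[1], list):
        # Replace the nested indicator dict with its code (e.g. 'NY.GDP.MKTP.CD')
        for row in data[1]:
            row['indicator'] = row['indicator']['id']
        return data[1]
    else:
        # Return an empty list so the results can still be concatenated below
        return []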

# GDP (current US$) - Indicator Code: NY.GDP.MKTP.CD
gdp_data_all_countries = fetch_gdp_data("NY.GDP.MKTP.CD", "all")
gdp_data_world = fetch_gdp_data("NY.GDP.MKTP.CD", "WLD")

# GDP growth (annual %) - Indicator Code: NY.GDP.MKTP.KD.ZG
gdp_growth_data_all_countries = fetch_gdp_data("NY.GDP.MKTP.KD.ZG", "all")
gdp_growth_data_world = fetch_gdp_data("NY.GDP.MKTP.KD.ZG", "WLD")

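# Combine GDP and GDP growth data. The "all" responses may already include WLD;
# any duplicate rows collapse in pivot_table below, which averages identical values.
combined_data = gdp_data_all_countries + gdp_data_world + gdp_growth_data_all_countries + gdp_growth_data_world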

# Converting to DataFrame
df_gdp = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_gdp = df_gdp[['countryiso3code', 'date', 'value', 'indicator']]
df_gdp.columns = ['Country Code', 'Year', 'Value', 'Indicator']

# Pivot to wide form: one row per (Year, Indicator) pair, one column per country code.
# Both indicators share this table and are split apart with .xs() when saving.
df_gdp_pivot = df_gdp.pivot_table(index=['Year', 'Indicator'], columns='Country Code', values='Value')

# Saving the data to CSV files
csv_file_path_gdp = '/Users/hrishi/original-desktop/Data Science/Projects/Major_Project/data-collection/data/world_gdp_data.csv'
csv_file_path_gdp_growth = '/Users/hrishi/original-desktop/Data Science/Projects/Major_Project/data-collection/data/world_gdp_growth_data.csv'
df_gdp_pivot.xs('NY.GDP.MKTP.CD', level='Indicator').to_csv(csv_file_path_gdp)
df_gdp_pivot.xs('NY.GDP.MKTP.KD.ZG', level='Indicator').to_csv(csv_file_path_gdp_growth)
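
The request in `fetch_gdp_data` reads a single page. The metadata element of each response reports the total number of pages, so when the `all` query returns more rows than `per_page`, the remaining pages are silently dropped. The tables later in this notebook end at `XKX` with no `ZAF` or `ZWE` columns, which would be consistent with that kind of truncation, although this is an inference rather than something confirmed here. Below is a minimal sketch of a paging loop; `get_page` is a hypothetical callable injected so the logic can run offline, and `fake_get_page` is a stand-in for the real API.

```python
def fetch_all_pages(get_page):
    """Collect records from every page.

    `get_page(page_number)` is a hypothetical callable that returns the parsed
    JSON for one page, for example a thin wrapper around requests.get(...).json().
    """
    records = []
    page = 1
    while True:
        payload = get_page(page)
        # Stop on anything that is not the usual [metadata, records] pair
        if not (isinstance(payload, list) and len(payload) == 2
                and isinstance(payload[1], list)):
            break
        metadata, rows = payload
        records.extend(rows)
        # The metadata reports how many pages exist in total
        if page >= int(metadata.get("pages", 1)):
            break
        page += 1
    return records


# Offline stand-in for the API: three fake pages of two rows each
def fake_get_page(page):
    rows = [{"date": str(2000 + 2 * page + i), "value": float(page)} for i in range(2)]
    return [{"page": page, "pages": 3, "per_page": 2, "total": 6}, rows]


all_rows = fetch_all_pages(fake_get_page)
print(len(all_rows))
```

In real use, `get_page` could build the request URL with an added `page` parameter and call `requests.get` with a timeout; that wiring is left out here so the example stays runnable without network access.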


## Data Preprocessing

In [2]:
df1 = pd.read_csv(csv_file_path_gdp)
df2 = pd.read_csv(csv_file_path_gdp_growth)

In [3]:
df1

Unnamed: 0,Year,Unnamed: 1,ABW,AFE,AFG,AFW,AGO,ALB,AND,ARB,...,SWZ,TCD,TEA,TEC,TLA,TMN,TSA,TSS,WLD,XKX
0,1960,4.644624e+11,,1.847810e+10,5.377778e+08,1.041165e+10,,,,,...,3.507685e+07,3.135827e+08,8.091161e+10,,,,4.714778e+10,2.834930e+10,1.381135e+12,
1,1961,4.819514e+11,,1.936631e+10,5.488889e+08,1.113592e+10,,,,,...,4.302604e+07,3.339753e+08,7.136963e+10,,,,5.030731e+10,2.996049e+10,1.446356e+12,
2,1962,5.129296e+11,,2.050647e+10,5.466667e+08,1.195171e+10,,,,,...,4.592796e+07,3.576357e+08,6.517602e+10,,,,5.369542e+10,3.190378e+10,1.546369e+12,
3,1963,5.545200e+11,,2.224273e+10,7.511112e+08,1.268581e+10,,,,,...,5.412944e+07,3.717670e+08,7.051144e+10,,,,6.039689e+10,3.429368e+10,1.670666e+12,
4,1964,6.116043e+11,,2.429433e+10,8.000000e+08,1.384900e+10,,,,,...,6.498055e+07,3.922475e+08,8.138635e+10,,,,6.914588e+10,3.744907e+10,1.832616e+12,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,2018,2.156079e+13,3.276184e+09,1.012521e+12,1.805322e+10,7.681582e+11,7.945069e+10,1.515642e+10,3.218419e+09,2.845898e+12,...,4.666598e+09,1.123917e+10,1.661986e+13,4.076953e+12,5.500846e+12,1.385561e+12,3.534083e+12,1.780680e+12,8.654268e+13,7.878760e+09
59,2019,2.186863e+13,3.395799e+09,1.006191e+12,1.879944e+10,8.234056e+11,7.089796e+10,1.540183e+10,3.155150e+09,2.871655e+12,...,4.466215e+09,1.131495e+10,1.719201e+13,4.159144e+12,5.409466e+12,1.391491e+12,3.658217e+12,1.829597e+12,8.777740e+13,7.899738e+09
60,2020,2.124467e+13,2.558906e+09,9.288802e+11,1.995593e+10,7.869624e+11,4.850156e+10,1.516273e+10,2.891002e+09,2.535509e+12,...,3.982237e+09,1.071540e+10,1.746580e+13,3.900173e+12,4.561170e+12,1.269352e+12,3.489924e+12,1.715843e+12,8.527268e+13,7.717145e+09
61,2021,2.420550e+13,3.103184e+09,1.086531e+12,1.426650e+10,8.449275e+11,6.650513e+10,1.793057e+10,3.325144e+09,2.930480e+12,...,4.850843e+09,1.177998e+10,2.081316e+13,4.616362e+12,5.281323e+12,1.482950e+12,4.062850e+12,1.931458e+12,9.715318e+13,9.412034e+09


In [4]:
df2

Unnamed: 0,Year,Unnamed: 1,ABW,AFE,AFG,AFW,AGO,ALB,AND,ARB,...,SWZ,TCD,TEA,TEC,TLA,TMN,TSA,TSS,WLD,XKX
0,1961,2.565429,,0.254876,,1.857727,,,,,...,,1.397744,-13.327861,,6.092028,,4.213748,0.977380,3.772924,
1,1962,4.071774,,7.965827,,3.772943,,,,,...,,5.360116,-0.640124,,4.187734,,3.407900,6.048576,5.375360,
2,1963,5.435499,,5.148206,,7.277246,,,,,...,,-1.599454,6.439680,,1.913796,,5.291576,6.100847,5.194927,
3,1964,7.248090,,4.579317,,5.412950,,,,,...,,-2.510940,10.784184,,7.266195,,7.758975,4.956462,6.564476,
4,1965,5.527156,,5.346211,,4.084749,,,,,...,,0.606228,10.781795,,5.556513,,-0.541428,4.773029,5.577451,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,2018,3.753718,2.381730,2.491355,1.189228,2.838829,-1.316362,4.019346,1.588765,2.438225,...,2.380095,2.374038,6.524935,3.687083,1.681927,1.807796,6.334805,2.646421,3.276283,3.406631
58,2019,3.283424,-2.302837,2.040617,3.911603,3.200919,-0.702273,2.087712,2.015548,1.359776,...,2.692168,3.247182,5.781708,2.654353,0.711972,0.898186,3.924138,2.559391,2.590785,4.756801
59,2020,-1.999312,-23.982580,-2.799038,-2.351101,-0.938162,-5.638215,-3.302082,-11.183940,-5.048285,...,-1.559643,-1.600007,1.246190,-1.572027,-6.525979,-3.128088,-4.623995,-1.961830,-3.057810,-5.340275
60,2021,5.283896,27.639357,4.300441,-20.738839,3.976317,1.199211,8.908528,8.287200,3.732646,...,10.683337,-1.199991,7.601198,7.379748,7.235615,4.339013,8.254590,4.153095,6.228594,10.745657
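
The `Unnamed: 1` column in both tables most likely comes from rows whose `countryiso3code` is an empty string: the pivot turns that blank code into a column with an empty name, and `pd.read_csv` labels empty header cells by position. The sketch below reproduces the effect on a tiny placeholder frame (not real data) and shows that filtering out blank codes before pivoting avoids it; `pivot_round_trip` is a hypothetical helper.

```python
import io
import pandas as pd

# Tiny stand-in for the long-format frame; values are placeholders
long_df = pd.DataFrame({
    "Country Code": ["", "ABW", "", "ABW"],
    "Year": [2020, 2020, 2021, 2021],
    "Value": [1.0, 2.0, 3.0, 4.0],
})


def pivot_round_trip(frame):
    """Pivot to wide form, write CSV to memory, and read it back."""
    wide = frame.pivot_table(index="Year", columns="Country Code", values="Value")
    buffer = io.StringIO()
    wide.to_csv(buffer)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_blank = pivot_round_trip(long_df)
without_blank = pivot_round_trip(long_df[long_df["Country Code"] != ""])

print(list(with_blank.columns))
print(list(without_blank.columns))
```

Filtering at the source keeps the averaged values of unrelated blank-code rows out of the dataset, whereas dropping the column afterwards only hides them.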


In [5]:
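# Format float values as whole-number strings (no scientific notation);
# the integer 'Year' column and NaN values are left unchanged.
# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map.
df1 = df1.applymap(lambda x: '{:.0f}'.format(x) if isinstance(x, float) and not pd.isnull(x) else x)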

# Dropping the unnecessary column
df1.drop('Unnamed: 1', axis = 1, inplace = True)
df2.drop('Unnamed: 1', axis = 1, inplace = True)

# Now all the numeric columns in df1 are converted to strings representing full integers without 'e' notation
df1


Unnamed: 0,Year,ABW,AFE,AFG,AFW,AGO,ALB,AND,ARB,ARG,...,SWZ,TCD,TEA,TEC,TLA,TMN,TSA,TSS,WLD,XKX
0,1960,,18478095142,537777811,10411646287,,,,,,...,35076846,313582728,80911608954,,,,47147779567,28349301535,1381135479871,
1,1961,,19366314294,548888896,11135924728,,,,,,...,43026043,333975336,71369633801,,,,50307314799,29960492453,1446355951851,
2,1962,,20506467178,546666678,11951712282,,,,,,...,45927962,357635713,65176021559,,,,53695416446,31903784358,1546369168289,
3,1963,,22242734491,751111191,12685805890,,,,,,...,54129438,371767002,70511435007,,,,60396894530,34293676625,1670666024903,
4,1964,,24294329780,800000044,13848998669,,,,,,...,64980554,392247518,81386349483,,,,69145880250,37449073939,1832615520368,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,2018,3276184358,1012521425296,18053222735,768158194632,79450688232,15156424015,3218418632,2845898419157,524819892360,...,4666598024,11239167898,16619858519558,4076953150707,5500845963862,1385560841912,3534083235324,1780679619929,86542678179037,7878759715
59,2019,3395798883,1006191000190,18799444415,823405580188,70897962713,15401826127,3155150256,2871654771821,447754683615,...,4466214591,11314951092,17192007435431,4159143876282,5409465895796,1391490811737,3658216510053,1829596580378,87777403956339,7899737577
60,2020,2558906304,928880235177,19955929061,786962436538,48501561230,15162734205,2891002460,2535508549316,385740508437,...,3982236693,10715396042,17465796008752,3900173017455,4561170249972,1269352235068,3489923527068,1715842671715,85272676295874,7717145218
61,2021,3103184102,1086530704900,14266499430,844927536438,66505129989,17930565119,3325143693,2930480380711,487902572164,...,4850842572,11779981333,20813163295456,4616362221589,5281323492901,1482949612650,4062850047299,1931458241338,97153181162291,9412034299
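
Formatting the values as strings removes the scientific notation, but it also turns the numeric columns into text, so later arithmetic or plotting on `df1` would need a conversion back. One alternative, sketched here on placeholder values rather than real GDP figures, is to keep the floats and change only how they are displayed.

```python
import pandas as pd

# Placeholder values, not real GDP figures
df = pd.DataFrame({"Year": [2020, 2021], "WLD": [8.5e13, 9.7e13]})

# Render floats without exponent notation; the stored values stay numeric
with pd.option_context("display.float_format", "{:,.0f}".format):
    rendered = df.to_string(index=False)
print(rendered)

# Arithmetic still works because the column is float64, not str
change = df["WLD"].iloc[1] - df["WLD"].iloc[0]
print(change)
```

The display option affects rendering only, so the CSV written later would still contain full-precision numbers.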


In [6]:
# Converting country codes to country names for readability

# Fetch country information from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=30)
countries_data = response.json()

# Extract relevant data for mapping codes to names, add 'WLD' manually
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'

In [7]:
print(f'Mapping of Codes to Countries: \n {countries}')

Mapping of Codes to Countries: 
 {'ABW': 'Aruba', 'AFE': 'Africa Eastern and Southern', 'AFG': 'Afghanistan', 'AFR': 'Africa', 'AFW': 'Africa Western and Central', 'AGO': 'Angola', 'ALB': 'Albania', 'AND': 'Andorra', 'ARB': 'Arab World', 'ARE': 'United Arab Emirates', 'ARG': 'Argentina', 'ARM': 'Armenia', 'ASM': 'American Samoa', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'AUT': 'Austria', 'AZE': 'Azerbaijan', 'BDI': 'Burundi', 'BEA': 'East Asia & Pacific (IBRD-only countries)', 'BEC': 'Europe & Central Asia (IBRD-only countries)', 'BEL': 'Belgium', 'BEN': 'Benin', 'BFA': 'Burkina Faso', 'BGD': 'Bangladesh', 'BGR': 'Bulgaria', 'BHI': 'IBRD countries classified as high income', 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BIH': 'Bosnia and Herzegovina', 'BLA': 'Latin America & the Caribbean (IBRD-only countries)', 'BLR': 'Belarus', 'BLZ': 'Belize', 'BMN': 'Middle East & North Africa (IBRD-only countries)', 'BMU': 'Bermuda', 'BOL': 'Bolivia', 'BRA': 'Brazil', 'BRB': 'Barbados', 'BRN'
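
The mapping mixes individual countries with aggregates such as 'Arab World' and the IBRD-only groupings, which matters if totals are later computed across columns. Records from the country endpoint carry a `region` object, and to my knowledge aggregates are labelled with the region value "Aggregates"; treat that label as an assumption to check against a live response. The sketch below separates the two groups using hand-written records in that shape.

```python
# Hand-written records in the shape of the country endpoint; the
# "Aggregates" region label is an assumption to verify against a live response.
sample_countries = [
    {"id": "ABW", "name": "Aruba",
     "region": {"id": "LCN", "value": "Latin America & Caribbean"}},
    {"id": "ARB", "name": "Arab World",
     "region": {"id": "NA", "value": "Aggregates"}},
    {"id": "WLD", "name": "World",
     "region": {"id": "NA", "value": "Aggregates"}},
]


def split_countries_and_aggregates(records):
    """Return two code-to-name dicts: (countries, aggregates)."""
    countries, aggregates = {}, {}
    for rec in records:
        is_aggregate = rec.get("region", {}).get("value") == "Aggregates"
        target = aggregates if is_aggregate else countries
        target[rec["id"]] = rec["name"]
    return countries, aggregates


country_names, aggregate_names = split_countries_and_aggregates(sample_countries)
print(country_names)
print(aggregate_names)
```

Keeping the two dictionaries separate makes it possible to rename every column for readability while summing only over real countries.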

In [8]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename columns in df1 and df2
rename_columns(df1, countries)
rename_columns(df2, countries)
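
`DataFrame.rename` already leaves labels that are missing from the mapping untouched, so the same result can be had without building an intermediate dictionary or mutating the frame in place. A minimal sketch on a placeholder frame follows, with an added check for two codes that map to the same display name; the `mapping` dictionary here is a small stand-in for `countries`.

```python
import pandas as pd

mapping = {"ABW": "Aruba", "WLD": "World"}  # small stand-in for `countries`
df = pd.DataFrame({"Year": [2021], "ABW": [1.0], "WLD": [2.0], "XYZ": [3.0]})

# Labels missing from the mapping ("Year", "XYZ") are left untouched
renamed = df.rename(columns=mapping)

# Guard against two codes mapping to the same display name
duplicates = renamed.columns[renamed.columns.duplicated()].tolist()

print(list(renamed.columns))
print(duplicates)
```

Returning a new frame keeps the code-labelled original available, which helps when a later step needs to join on ISO codes again.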

In [9]:
df1

Unnamed: 0,Year,Aruba,Africa Eastern and Southern,Afghanistan,Africa Western and Central,Angola,Albania,Andorra,Arab World,Argentina,...,Eswatini,Chad,East Asia & Pacific (IDA & IBRD countries),Europe & Central Asia (IDA & IBRD countries),Latin America & the Caribbean (IDA & IBRD countries),Middle East & North Africa (IDA & IBRD countries),South Asia (IDA & IBRD),Sub-Saharan Africa (IDA & IBRD countries),World,Kosovo
0,1960,,18478095142,537777811,10411646287,,,,,,...,35076846,313582728,80911608954,,,,47147779567,28349301535,1381135479871,
1,1961,,19366314294,548888896,11135924728,,,,,,...,43026043,333975336,71369633801,,,,50307314799,29960492453,1446355951851,
2,1962,,20506467178,546666678,11951712282,,,,,,...,45927962,357635713,65176021559,,,,53695416446,31903784358,1546369168289,
3,1963,,22242734491,751111191,12685805890,,,,,,...,54129438,371767002,70511435007,,,,60396894530,34293676625,1670666024903,
4,1964,,24294329780,800000044,13848998669,,,,,,...,64980554,392247518,81386349483,,,,69145880250,37449073939,1832615520368,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,2018,3276184358,1012521425296,18053222735,768158194632,79450688232,15156424015,3218418632,2845898419157,524819892360,...,4666598024,11239167898,16619858519558,4076953150707,5500845963862,1385560841912,3534083235324,1780679619929,86542678179037,7878759715
59,2019,3395798883,1006191000190,18799444415,823405580188,70897962713,15401826127,3155150256,2871654771821,447754683615,...,4466214591,11314951092,17192007435431,4159143876282,5409465895796,1391490811737,3658216510053,1829596580378,87777403956339,7899737577
60,2020,2558906304,928880235177,19955929061,786962436538,48501561230,15162734205,2891002460,2535508549316,385740508437,...,3982236693,10715396042,17465796008752,3900173017455,4561170249972,1269352235068,3489923527068,1715842671715,85272676295874,7717145218
61,2021,3103184102,1086530704900,14266499430,844927536438,66505129989,17930565119,3325143693,2930480380711,487902572164,...,4850842572,11779981333,20813163295456,4616362221589,5281323492901,1482949612650,4062850047299,1931458241338,97153181162291,9412034299


In [10]:
df2

Unnamed: 0,Year,Aruba,Africa Eastern and Southern,Afghanistan,Africa Western and Central,Angola,Albania,Andorra,Arab World,Argentina,...,Eswatini,Chad,East Asia & Pacific (IDA & IBRD countries),Europe & Central Asia (IDA & IBRD countries),Latin America & the Caribbean (IDA & IBRD countries),Middle East & North Africa (IDA & IBRD countries),South Asia (IDA & IBRD),Sub-Saharan Africa (IDA & IBRD countries),World,Kosovo
0,1961,,0.254876,,1.857727,,,,,5.427843,...,,1.397744,-13.327861,,6.092028,,4.213748,0.977380,3.772924,
1,1962,,7.965827,,3.772943,,,,,-0.852022,...,,5.360116,-0.640124,,4.187734,,3.407900,6.048576,5.375360,
2,1963,,5.148206,,7.277246,,,,,-5.308197,...,,-1.599454,6.439680,,1.913796,,5.291576,6.100847,5.194927,
3,1964,,4.579317,,5.412950,,,,,10.130298,...,,-2.510940,10.784184,,7.266195,,7.758975,4.956462,6.564476,
4,1965,,5.346211,,4.084749,,,,,10.569433,...,,0.606228,10.781795,,5.556513,,-0.541428,4.773029,5.577451,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,2018,2.381730,2.491355,1.189228,2.838829,-1.316362,4.019346,1.588765,2.438225,-2.617396,...,2.380095,2.374038,6.524935,3.687083,1.681927,1.807796,6.334805,2.646421,3.276283,3.406631
58,2019,-2.302837,2.040617,3.911603,3.200919,-0.702273,2.087712,2.015548,1.359776,-2.000861,...,2.692168,3.247182,5.781708,2.654353,0.711972,0.898186,3.924138,2.559391,2.590785,4.756801
59,2020,-23.982580,-2.799038,-2.351101,-0.938162,-5.638215,-3.302082,-11.183940,-5.048285,-9.900485,...,-1.559643,-1.600007,1.246190,-1.572027,-6.525979,-3.128088,-4.623995,-1.961830,-3.057810,-5.340275
60,2021,27.639357,4.300441,-20.738839,3.976317,1.199211,8.908528,8.287200,3.732646,10.718010,...,10.683337,-1.199991,7.601198,7.379748,7.235615,4.339013,8.254590,4.153095,6.228594,10.745657


In [11]:
df2[['Year', 'India', 'World']]

Unnamed: 0,Year,India,World
0,1961,3.722743,3.772924
1,1962,2.931128,5.375360
2,1963,5.994353,5.194927
3,1964,7.452950,6.564476
4,1965,-2.635770,5.577451
...,...,...,...
57,2018,6.453851,3.276283
58,2019,3.871437,2.590785
59,2020,-5.831053,-3.057810
60,2021,9.050278,6.228594


### Export the data

In [12]:
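# index=False keeps the row index out of the file, so re-reading it
# does not add an extra 'Unnamed: 0' column
df1.to_csv(csv_file_path_gdp, index=False)
df2.to_csv(csv_file_path_gdp_growth, index=False)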