# Data Collection Module 6: World Bank API for Unemployment Data

<b>API Details:</b>

The World Bank API provides access to global economic and social data, including labor market indicators such as unemployment rates, through its World Development Indicators (WDI) dataset. The unemployment rate is the percentage of the labor force that is without work but available for and seeking employment. The dataset covers more than 200 countries and territories and includes both historical data and recent estimates.

<b>Data on Unemployment:</b>

Indicator for Unemployment Rate: "Unemployment, total (% of total labor force)" (Indicator Code: SL.UEM.TOTL.ZS).

<b>Data Update Frequency:</b>

Unemployment data in the World Bank's World Development Indicators is updated annually, giving a year-by-year view of labor market conditions in each country.

<b>Accessing the API:</b>

The World Bank API is publicly accessible and does not require an API key, so unemployment statistics can be pulled directly into projects that analyze, visualize, or report on global and country-specific labor market trends.
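
The code in this module relies on indicator responses being a two-element JSON list: a metadata object followed by the data rows. The sketch below shows one way to read that metadata and check whether a single request returned every row. `summarize_response` is a hypothetical helper, the sample payload is hand-written for illustration (its values are copied from the tables in this module), and the `lastupdated` field is an assumption, so it is read with `.get`.

```python
def summarize_response(payload):
    """Summarize the metadata of an indicator response (hypothetical helper)."""
    # Error responses do not follow the [metadata, rows] shape.
    if (
        not isinstance(payload, list)
        or len(payload) != 2
        or not isinstance(payload[1], list)
    ):
        raise ValueError("Unexpected response shape; the request may have failed.")

    meta, rows = payload
    return {
        "rows_in_page": len(rows),
        "total_rows": meta["total"],
        "pages": meta["pages"],
        # True when further pages must be requested to get every row
        "truncated": meta["pages"] > meta["page"],
        # Assumed field; None if the response does not include it
        "last_updated": meta.get("lastupdated"),
    }


# Hand-written sample payload for illustration only (not real API output)
sample = [
    {"page": 1, "pages": 2, "per_page": 2, "total": 3},
    [
        {"countryiso3code": "IND", "date": "2023", "value": 4.668},
        {"countryiso3code": "IND", "date": "2022", "value": 4.822},
    ],
]

summary = summarize_response(sample)
print(summary)
```

A `truncated` value of `True` means the `per_page` limit was smaller than the total number of rows, so the remaining pages would still need to be requested.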

<b>API Documentation:</b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process:</b> This module uses the World Bank API to collect unemployment rate data, preprocesses the data to fit the project's requirements, and saves the cleaned dataset for later analysis, visualization, or reporting.

<b>Goal:</b> Assemble an up-to-date and comprehensive dataset on global unemployment statistics.

In [1]:
# Code snippet for data collection from the API
import requests
import pandas as pd

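def fetch_unemployment_data(indicator_code, country_code="all"):
    """Fetch every page of unemployment rate data for a country code and indicator."""
    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}"
    params = {"format": "json", "date": "1960:2023", "per_page": 10000, "page": 1}
    rows = []

    while True:
        response = requests.get(url, params=params, timeout=60)
        response.raise_for_status()
        data = response.json()

        # A valid response is [metadata, rows]; anything else is an error payload
        if len(data) != 2 or not isinstance(data[1], list):
            break
        # Replace the nested indicator dictionary with its ID for each row
        for row in data[1]:
            row['indicator'] = row['indicator']['id']
        rows.extend(data[1])

        # One page holds at most per_page rows, so keep requesting until the last page
        if params["page"] >= data[0]["pages"]:
            break
        params["page"] += 1

    return rows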

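# Unemployment, total (% of total labor force) - Indicator Code: SL.UEM.TOTL.ZS
unemployment_data_all_countries = fetch_unemployment_data("SL.UEM.TOTL.ZS", "all")
unemployment_data_world = fetch_unemployment_data("SL.UEM.TOTL.ZS", "WLD")

# Combine unemployment data, treating a failed fetch (None) as an empty list.
# Rows present in both results are averaged by pivot_table below, so
# identical duplicates do not change the values.
combined_data = (unemployment_data_all_countries or []) + (unemployment_data_world or [])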

# Converting to DataFrame
df_unemployment = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_unemployment = df_unemployment[['countryiso3code', 'date', 'value', 'indicator']]
df_unemployment.columns = ['Country Code', 'Year', 'Unemployment Rate', 'Indicator']

# Pivot the dataset to have 'Year' as rows, 'Country Code' as columns, and 'Unemployment Rate' as values
df_unemployment_pivot = df_unemployment.pivot_table(index='Year', columns='Country Code', values='Unemployment Rate')

# Saving the data to a CSV file
csv_file_path_unemployment = '/Users/hrishi/original-desktop/Data Science/Projects/Major_Project/data-collection/data/world_unemployment_data.csv'
df_unemployment_pivot.to_csv(csv_file_path_unemployment)

print("Unemployment data has been saved.")


Unemployment data has been saved.


## Data Preprocessing

In [2]:
df = pd.read_csv(csv_file_path_unemployment)
df.tail(10)

Unnamed: 0,Year,Unnamed: 1,AFE,AFG,AFW,AGO,ALB,ARB,ARG,ARM,...,SST,SWZ,TCD,TEA,TEC,TLA,TMN,TSA,TSS,WLD
23,2014,5.983684,6.8845,7.91,3.932125,16.317,18.05,11.120267,7.27,11.989,...,9.411413,23.774,0.883,4.062431,7.567357,6.204175,11.823229,6.98891,5.784343,6.022234
24,2015,5.952401,6.983064,9.002,4.197576,16.448,17.19,11.282761,7.583,12.308,...,9.251919,23.263,0.931,4.137486,7.535756,6.732944,12.257082,7.10764,5.949869,6.055028
25,2016,5.884376,7.152157,10.092,4.16413,16.577,15.42,11.076547,8.085,12.625,...,9.234971,22.72,1.022,4.054018,7.370061,7.831663,12.396321,6.922156,6.0445,6.021302
26,2017,5.777685,7.274298,11.18,4.238482,16.639,13.62,11.45384,8.35,12.914,...,8.827796,22.65,1.111,3.931583,7.017382,8.140188,12.626251,6.957783,6.15268,5.92976
27,2018,5.590956,7.237563,11.131,4.266695,16.626,12.3,11.059249,9.22,13.21,...,8.389649,22.637,1.13,3.824603,6.574347,8.04481,12.15805,6.990164,6.139889,5.768692
28,2019,5.416838,7.426777,11.082,4.277483,16.5,11.47,10.673567,9.84,12.2,...,8.236355,22.535,1.057,3.91211,6.621041,8.124764,11.385762,6.200332,6.26435,5.591542
29,2020,6.490695,7.910291,11.71,4.737501,16.698,12.833,11.975018,11.46,12.18,...,9.309683,24.769,1.684,4.391316,7.122652,10.367272,12.027818,7.503444,6.738893,6.603279
30,2021,6.020522,8.303939,12.075,4.585014,15.8,12.59,11.684176,8.74,10.01,...,9.308139,24.372,1.591,4.140853,6.74847,9.334103,11.737017,6.393257,6.933669,6.064105
31,2022,5.228406,7.748144,14.1,3.80908,14.478,11.629,10.732883,6.805,8.588,...,8.71609,22.643,1.1,4.253778,5.597056,7.002037,10.857603,5.050291,6.293733,5.267477
32,2023,5.150237,7.740008,15.378,3.6645,14.223,11.083,10.758283,6.841,8.378,...,8.509249,22.19,0.992,4.20979,5.14342,6.27509,10.75754,4.949247,6.23516,5.123161
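
The `Unnamed: 1` column in the output above most likely comes from rows whose `countryiso3code` is an empty string: the pivot turns that empty code into a column with an empty name, and `pd.read_csv` labels an empty header `Unnamed: <position>`. The sketch below reproduces the effect on a few toy rows. The values are copied from the tables in this module, but pairing 5.150237 with an empty code is an assumption made for illustration.

```python
import io

import pandas as pd

# Toy rows shaped like the API output (illustrative; see the assumption above)
rows = [
    {"countryiso3code": "IND", "date": "2022", "value": 4.822},
    {"countryiso3code": "IND", "date": "2023", "value": 4.668},
    {"countryiso3code": "WLD", "date": "2023", "value": 5.123161},
    {"countryiso3code": "WLD", "date": "2023", "value": 5.123161},  # duplicate row
    {"countryiso3code": "", "date": "2023", "value": 5.150237},  # empty code
]
long_df = pd.DataFrame(rows)

# pivot_table averages duplicate (date, code) pairs by default
wide = long_df.pivot_table(index="date", columns="countryiso3code", values="value")

# Round-trip through CSV in memory instead of writing a file
buffer = io.StringIO()
wide.to_csv(buffer)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(list(round_trip.columns))
```

Filtering out rows with an empty `countryiso3code` before pivoting would avoid creating the column in the first place.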


In [3]:
# Drop the column that had an empty header in the CSV (read back as 'Unnamed: 1');
# it does not correspond to any country code
df.drop(columns='Unnamed: 1', inplace=True)

# Converting country codes to country names for readability

# Fetch country information from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=60)
response.raise_for_status()  # Stop early on an HTTP error
countries_data = response.json()

# Extract relevant data for mapping codes to names, add 'WLD' manually
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'

In [4]:
print(f'Mapping of Codes to Countries: \n {countries}')

Mapping of Codes to Countries: 
 {'ABW': 'Aruba', 'AFE': 'Africa Eastern and Southern', 'AFG': 'Afghanistan', 'AFR': 'Africa', 'AFW': 'Africa Western and Central', 'AGO': 'Angola', 'ALB': 'Albania', 'AND': 'Andorra', 'ARB': 'Arab World', 'ARE': 'United Arab Emirates', 'ARG': 'Argentina', 'ARM': 'Armenia', 'ASM': 'American Samoa', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'AUT': 'Austria', 'AZE': 'Azerbaijan', 'BDI': 'Burundi', 'BEA': 'East Asia & Pacific (IBRD-only countries)', 'BEC': 'Europe & Central Asia (IBRD-only countries)', 'BEL': 'Belgium', 'BEN': 'Benin', 'BFA': 'Burkina Faso', 'BGD': 'Bangladesh', 'BGR': 'Bulgaria', 'BHI': 'IBRD countries classified as high income', 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BIH': 'Bosnia and Herzegovina', 'BLA': 'Latin America & the Caribbean (IBRD-only countries)', 'BLR': 'Belarus', 'BLZ': 'Belize', 'BMN': 'Middle East & North Africa (IBRD-only countries)', 'BMU': 'Bermuda', 'BOL': 'Bolivia', 'BRA': 'Brazil', 'BRB': 'Barbados', 'BRN'
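
The mapping above mixes individual countries with aggregate groupings such as 'Arab World' and 'World'. If the two need to be kept apart, one option is to split the entries on their `region` field. This is a minimal sketch, assuming each entry from the country endpoint carries a `region` object whose `value` is `"Aggregates"` for groupings. `split_countries_and_aggregates` is a hypothetical helper and the sample entries are hand-written, trimmed-down stand-ins for the real payload.

```python
def split_countries_and_aggregates(entries):
    """Split country-endpoint entries into two code-to-name dictionaries."""
    countries_only, aggregates = {}, {}
    for entry in entries:
        # Assumption: aggregates are marked by region.value == "Aggregates"
        is_aggregate = entry.get("region", {}).get("value") == "Aggregates"
        target = aggregates if is_aggregate else countries_only
        target[entry["id"]] = entry["name"]
    return countries_only, aggregates


# Hand-written sample entries for illustration only (not real API output)
sample_entries = [
    {"id": "IND", "name": "India", "region": {"value": "South Asia"}},
    {"id": "ARB", "name": "Arab World", "region": {"value": "Aggregates"}},
    {"id": "WLD", "name": "World", "region": {"value": "Aggregates"}},
]

countries_only, aggregates = split_countries_and_aggregates(sample_entries)
print(countries_only)
print(aggregates)
```

With the real payload, the call would be `split_countries_and_aggregates(countries_data[1])`.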

In [5]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename the country-code columns in df
rename_columns(df, countries)
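
Because `rename_columns` falls back to the original name when a code is missing from the mapping, unmapped columns pass through silently. The sketch below is one way to list them before renaming. `unmapped_columns` is a hypothetical helper, `XYZ` is a made-up placeholder code, and the toy values are copied from the tables in this module.

```python
import pandas as pd


def unmapped_columns(df, mapping, ignore=("Year",)):
    """Return the columns that have no entry in the mapping (hypothetical helper)."""
    return [col for col in df.columns if col not in mapping and col not in ignore]


# Toy frame for illustration; 'XYZ' is a placeholder code with no data
toy = pd.DataFrame(
    {"Year": [2023], "IND": [4.668], "WLD": [5.123161], "XYZ": [float("nan")]}
)
toy_mapping = {"IND": "India", "WLD": "World"}

missing = unmapped_columns(toy, toy_mapping)
print(missing)
```

An empty list means every column other than `Year` will receive a readable name.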

In [6]:
df.tail(10)

Unnamed: 0,Year,Africa Eastern and Southern,Afghanistan,Africa Western and Central,Angola,Albania,Arab World,Argentina,Armenia,Australia,...,Small states,Eswatini,Chad,East Asia & Pacific (IDA & IBRD countries),Europe & Central Asia (IDA & IBRD countries),Latin America & the Caribbean (IDA & IBRD countries),Middle East & North Africa (IDA & IBRD countries),South Asia (IDA & IBRD),Sub-Saharan Africa (IDA & IBRD countries),World
23,2014,6.8845,7.91,3.932125,16.317,18.05,11.120267,7.27,11.989,6.08,...,9.411413,23.774,0.883,4.062431,7.567357,6.204175,11.823229,6.98891,5.784343,6.022234
24,2015,6.983064,9.002,4.197576,16.448,17.19,11.282761,7.583,12.308,6.06,...,9.251919,23.263,0.931,4.137486,7.535756,6.732944,12.257082,7.10764,5.949869,6.055028
25,2016,7.152157,10.092,4.16413,16.577,15.42,11.076547,8.085,12.625,5.71,...,9.234971,22.72,1.022,4.054018,7.370061,7.831663,12.396321,6.922156,6.0445,6.021302
26,2017,7.274298,11.18,4.238482,16.639,13.62,11.45384,8.35,12.914,5.59,...,8.827796,22.65,1.111,3.931583,7.017382,8.140188,12.626251,6.957783,6.15268,5.92976
27,2018,7.237563,11.131,4.266695,16.626,12.3,11.059249,9.22,13.21,5.3,...,8.389649,22.637,1.13,3.824603,6.574347,8.04481,12.15805,6.990164,6.139889,5.768692
28,2019,7.426777,11.082,4.277483,16.5,11.47,10.673567,9.84,12.2,5.16,...,8.236355,22.535,1.057,3.91211,6.621041,8.124764,11.385762,6.200332,6.26435,5.591542
29,2020,7.910291,11.71,4.737501,16.698,12.833,11.975018,11.46,12.18,6.46,...,9.309683,24.769,1.684,4.391316,7.122652,10.367272,12.027818,7.503444,6.738893,6.603279
30,2021,8.303939,12.075,4.585014,15.8,12.59,11.684176,8.74,10.01,5.12,...,9.308139,24.372,1.591,4.140853,6.74847,9.334103,11.737017,6.393257,6.933669,6.064105
31,2022,7.748144,14.1,3.80908,14.478,11.629,10.732883,6.805,8.588,3.7,...,8.71609,22.643,1.1,4.253778,5.597056,7.002037,10.857603,5.050291,6.293733,5.267477
32,2023,7.740008,15.378,3.6645,14.223,11.083,10.758283,6.841,8.378,3.618,...,8.509249,22.19,0.992,4.20979,5.14342,6.27509,10.75754,4.949247,6.23516,5.123161


In [7]:
df[['Year', 'India', 'World']].tail(10)

Unnamed: 0,Year,India,World
23,2014,7.976,6.022234
24,2015,7.891,6.055028
25,2016,7.808,6.021302
26,2017,7.728,5.92976
27,2018,7.65,5.768692
28,2019,6.51,5.591542
29,2020,7.86,6.603279
30,2021,6.38,6.064105
31,2022,4.822,5.267477
32,2023,4.668,5.123161


### Exporting the data

In [8]:
# index=False keeps the row index from being written as an extra unnamed column
df.to_csv(csv_file_path_unemployment, index=False)