# Data Collection Module 7: World Bank API for Poverty Headcount Ratio Data

<b>API Details:</b>

The World Bank API provides programmatic access to a broad range of global development data, including poverty indicators such as the Poverty Headcount Ratio. The World Development Indicators dataset exposed through the API includes the Poverty Headcount Ratio at both national and international poverty lines, that is, the percentage of the population living below the relevant poverty threshold, for over 200 countries and territories. The data cover historical records as well as the most recent estimates, which makes it possible to analyze poverty trends over time.

<b>Data on Poverty Headcount Ratio:</b>

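Indicator for Poverty Headcount Ratio at National Poverty Lines: "Poverty headcount ratio at national poverty lines (% of population)" (Indicator Code: SI.POV.NAHC).

Indicator for Poverty Headcount Ratio at the International Poverty Line: "Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)" (Indicator Code: SI.POV.DDAY). The World Bank has since re-based the international poverty line (for example, to $2.15 a day at 2017 PPP), so the label returned for SI.POV.DDAY may differ from the one quoted here; the `indicator` field of each API record carries the current name.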


<b>Data Update Frequency:</b>

Poverty data within the World Development Indicators is updated periodically as new household surveys and statistical information become available. Because the estimates depend on surveys, coverage is uneven: many countries have observations only for some years, and the latest years may have few or no values (the 2023 row in the output below is entirely empty).
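
One way to check how recent the data is would be to read the metadata that accompanies each response. The sketch below assumes the first element of the JSON payload is a metadata dictionary and that it carries a `lastupdated` key; that key name and the sample payload are assumptions made for illustration, and `last_updated` is a hypothetical helper, not part of the module.

```python
from typing import Any, Optional

def last_updated(payload: Any) -> Optional[str]:
    """Return the 'lastupdated' string from a decoded API payload, if present."""
    # A successful indicator response is expected to be a list: [metadata, records].
    if not isinstance(payload, list) or not payload:
        return None
    metadata = payload[0]
    if not isinstance(metadata, dict):
        return None
    # .get() returns None when the (assumed) key is missing.
    return metadata.get("lastupdated")

# Made-up payload shaped like an API response; the values are not real data.
sample = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 1, "lastupdated": "2024-01-01"},
    [{"countryiso3code": "WLD", "date": "2019", "value": None}],
]
print(last_updated(sample))
```

Because the helper returns `None` instead of raising when the field is absent, it can be called on any response without extra checks.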

<b>Accessing the API:</b>

The World Bank API is publicly accessible and requires no API key, so poverty statistics can be pulled directly into projects that analyze, visualize, or report on poverty worldwide.
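
As a rough illustration of what a keyless request looks like, the sketch below assembles an indicator URL from its parts. `build_indicator_url` and `BASE_URL` are hypothetical names introduced here; the path layout and the `format`, `date`, and `per_page` parameters follow the request used later in this module.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"  # assumed base; no API key is appended

def build_indicator_url(indicator_code, country_code="all", **params):
    """Build a request URL for one indicator and one country code (or 'all')."""
    query = {"format": "json", **params}
    # safe=":" keeps the year range readable (2000:2020 rather than 2000%3A2020).
    return (
        f"{BASE_URL}/country/{country_code}/indicator/{indicator_code}"
        f"?{urlencode(query, safe=':')}"
    )

# National poverty line indicator for a single country over a fixed year range.
url = build_indicator_url("SI.POV.NAHC", "IND", date="2000:2020", per_page=100)
print(url)
```

The resulting string can be passed to `requests.get` as-is; the helper only formats the URL and performs no network access.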

<b>API Documentation:</b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process:</b> This module uses the World Bank API to collect data on the Poverty Headcount Ratio, preprocesses the data to fit project needs, and organizes the refined dataset for subsequent analysis, visualization, or reporting.

### `Goal`: Assemble an up-to-date and comprehensive dataset on global poverty statistics.

In [60]:
# Code snippet for data collection from the API
import requests
import pandas as pd

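def fetch_poverty_data(indicator_code, country_code="all"):
    """Fetch Poverty Headcount Ratio data for a given country code and indicator."""
    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}?format=json&date=1960:2023&per_page=10000"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors
    data = response.json()

    # A successful response is a two-element list: [paging metadata, records].
    # Note: only the first page (up to per_page rows) is read here.
    if isinstance(data, list) and len(data) == 2 and isinstance(data[1], list):
        # Replace the nested indicator dictionary with its code (e.g. "SI.POV.DDAY")
        for row in data[1]:
            row['indicator'] = row['indicator']['id']
        return data[1]
    # Error payloads and empty results have a different shape
    return None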

# Poverty Headcount Ratio at $1.90 a day (2011 PPP) (% of population) - Indicator Code: SI.POV.DDAY
poverty_data_all_countries = fetch_poverty_data("SI.POV.DDAY", "all")
poverty_data_world = fetch_poverty_data("SI.POV.DDAY", "WLD")

# Combine Poverty Headcount Ratio data; treat a failed fetch (None) as an empty list
combined_data = (poverty_data_all_countries or []) + (poverty_data_world or [])

# Converting to DataFrame
df_poverty = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_poverty = df_poverty[['countryiso3code', 'date', 'value', 'indicator']]
df_poverty.columns = ['Country Code', 'Year', 'Poverty Headcount Ratio', 'Indicator']

# Pivot the dataset to have 'Year' as rows, 'Country Code' as columns, and 'Poverty Headcount Ratio' as values.
# pivot_table averages duplicate (Year, Country Code) pairs (default aggfunc is the mean) and drops all-empty columns.
df_poverty_pivot = df_poverty.pivot_table(index='Year', columns='Country Code', values='Poverty Headcount Ratio')

# Saving the data to a CSV file
csv_file_path_poverty = 'world_poverty_headcount_ratio_data.csv'
df_poverty_pivot.to_csv(csv_file_path_poverty)

print("Poverty headcount ratio data has been saved.")

Poverty headcount ratio data has been saved.
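
The request above reads a single page of up to 10,000 rows. If a query matches more rows than fit on one page, the remaining pages would be missed. A minimal sketch of following the `pages` count in the response metadata is shown below; `fetch_all_pages` and `get_json` are hypothetical names, and the page fetcher is passed in as a function so the loop can be exercised offline with made-up pages.

```python
from typing import Any, Callable, Dict, List

def fetch_all_pages(get_json: Callable[[int], Any]) -> List[Dict[str, Any]]:
    """Collect records from every page; get_json(page) returns that page's decoded JSON."""
    records: List[Dict[str, Any]] = []
    page = 1
    while True:
        payload = get_json(page)
        # Error or empty responses are not [metadata, records], so stop on them.
        if not (isinstance(payload, list) and len(payload) == 2
                and isinstance(payload[1], list)):
            break
        metadata, rows = payload
        records.extend(rows)
        # Stop once the last page reported by the metadata has been read.
        if page >= int(metadata.get("pages", 1)):
            break
        page += 1
    return records

# Offline stand-in for the API: two made-up pages.
fake_pages = {
    1: [{"page": 1, "pages": 2}, [{"date": "2020"}, {"date": "2019"}]],
    2: [{"page": 2, "pages": 2}, [{"date": "2018"}]],
}
all_rows = fetch_all_pages(lambda page: fake_pages[page])
print(len(all_rows))
```

Against the live API, `get_json` could be something like `lambda page: requests.get(url, params={"page": page}, timeout=30).json()`, where `url` is the request URL without a `page` parameter.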


## Data Preprocessing

In [61]:
df = pd.read_csv(csv_file_path_poverty)
df.tail(10)

Unnamed: 0,Year,Unnamed: 1,AGO,ALB,ARG,ARM,AUS,AUT,AZE,BDI,...,LCN,LMY,MEA,SAS,SLV,SSF,SWZ,TCD,WLD,XKX
41,2014,16.1,,1.0,0.6,1.4,0.5,0.2,,,...,4.3,13.1,2.8,17.9,3.4,38.1,,,11.1,1.1
42,2015,16.05,,0.1,,1.1,,0.7,,,...,4.1,12.6,3.7,16.6,2.3,38.2,,,10.6,0.8
43,2016,15.85,,0.1,0.7,1.1,0.5,0.7,,,...,4.3,12.3,4.5,15.8,2.6,38.0,36.1,,10.4,0.8
44,2017,15.25,,0.4,0.6,0.8,,0.3,,,...,4.3,11.3,4.7,12.6,2.2,37.5,,,9.6,0.4
45,2018,4.866667,31.1,0.0,1.0,1.3,0.5,0.6,,,...,4.2,10.4,4.7,10.1,1.8,36.9,,30.9,8.8,
46,2019,4.833333,,0.0,1.1,1.0,,0.6,,,...,4.2,10.5,,10.6,1.4,36.7,,,8.9,
47,2020,5.333333,,0.0,1.2,0.4,,0.7,,62.1,...,3.8,11.5,,13.0,,,,,9.7,
48,2021,5.033333,,,0.9,0.5,,0.5,,,...,4.5,11.2,,11.5,3.6,,,,9.5,
49,2022,4.566667,,,0.6,0.8,,,,,...,3.5,10.6,,9.7,3.4,,,30.8,9.0,
50,2023,,,,,,,,,,...,,,,,,,,,,
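
The `Unnamed: 1` column in the output above most likely comes from records whose country code is an empty string: the pivot groups them under a blank column label (averaging their values), and `read_csv` then names that blank header `Unnamed: 1`. This is an inference from the output rather than something the API response was checked for. The sketch below reproduces the effect with made-up rows.

```python
import io
import pandas as pd

# Made-up rows: two records with a blank country code and one regular country.
rows = pd.DataFrame({
    "Country Code": ["", "", "AGO"],
    "Year": ["2018", "2018", "2018"],
    "Poverty Headcount Ratio": [4.0, 6.0, 31.1],
})

# The blank code becomes its own column; its two values are averaged.
pivot = rows.pivot_table(index="Year", columns="Country Code",
                         values="Poverty Headcount Ratio")
print(pivot.columns.tolist())

# Round-trip through CSV, as the notebook does via a file on disk.
buffer = io.StringIO()
pivot.to_csv(buffer)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.columns.tolist())
```

Filtering out rows with a blank `Country Code` before pivoting would avoid creating the column in the first place.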


In [62]:
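# Dropping 'Unnamed: 1': it has no country code in its header (a blank column label in the CSV),
# so its values cannot be attributed to a country or region
df.drop('Unnamed: 1', axis = 1, inplace = True)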

# Converting country codes to country names for readability

# Fetch country information from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors
countries_data = response.json()

# Extract relevant data for mapping codes to names, add 'WLD' manually
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'

In [67]:
countries

{'ABW': 'Aruba',
 'AFE': 'Africa Eastern and Southern',
 'AFG': 'Afghanistan',
 'AFR': 'Africa',
 'AFW': 'Africa Western and Central',
 'AGO': 'Angola',
 'ALB': 'Albania',
 'AND': 'Andorra',
 'ARB': 'Arab World',
 'ARE': 'United Arab Emirates',
 'ARG': 'Argentina',
 'ARM': 'Armenia',
 'ASM': 'American Samoa',
 'ATG': 'Antigua and Barbuda',
 'AUS': 'Australia',
 'AUT': 'Austria',
 'AZE': 'Azerbaijan',
 'BDI': 'Burundi',
 'BEA': 'East Asia & Pacific (IBRD-only countries)',
 'BEC': 'Europe & Central Asia (IBRD-only countries)',
 'BEL': 'Belgium',
 'BEN': 'Benin',
 'BFA': 'Burkina Faso',
 'BGD': 'Bangladesh',
 'BGR': 'Bulgaria',
 'BHI': 'IBRD countries classified as high income',
 'BHR': 'Bahrain',
 'BHS': 'Bahamas, The',
 'BIH': 'Bosnia and Herzegovina',
 'BLA': 'Latin America & the Caribbean (IBRD-only countries)',
 'BLR': 'Belarus',
 'BLZ': 'Belize',
 'BMN': 'Middle East & North Africa (IBRD-only countries)',
 'BMU': 'Bermuda',
 'BOL': 'Bolivia',
 'BRA': 'Brazil',
 'BRB': 'Barbados',
 ...

In [63]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename the columns of df in place (codes without a mapping, such as 'Year', are left unchanged)
rename_columns(df, countries)
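
Because `mapping.get(col, col)` falls back to the original label, any code missing from the mapping stays as a bare code without warning. A small check like the one below, run before renaming, would list such columns; `unmapped_columns` is a hypothetical helper and the inputs are made up for illustration.

```python
def unmapped_columns(columns, mapping, ignore=("Year",)):
    """Return column labels that have no entry in the code-to-name mapping."""
    return [col for col in columns if col not in mapping and col not in ignore]

# Made-up inputs: 'XYZ' has no entry in the mapping.
example_mapping = {"AGO": "Angola", "WLD": "World"}
example_columns = ["Year", "AGO", "XYZ", "WLD"]
print(unmapped_columns(example_columns, example_mapping))
```

In the notebook this could be called as `unmapped_columns(df.columns, countries)` before `rename_columns`.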

In [64]:
df.tail()

Unnamed: 0,Year,Angola,Albania,Argentina,Armenia,Australia,Austria,Azerbaijan,Burundi,Belgium,...,Latin America & Caribbean,Low & middle income,Middle East & North Africa,South Asia,El Salvador,Sub-Saharan Africa,Eswatini,Chad,World,Kosovo
46,2019,,0.0,1.1,1.0,,0.6,,,0.1,...,4.2,10.5,,10.6,1.4,36.7,,,8.9,
47,2020,,0.0,1.2,0.4,,0.7,,62.1,0.1,...,3.8,11.5,,13.0,,,,,9.7,
48,2021,,,0.9,0.5,,0.5,,,0.0,...,4.5,11.2,,11.5,3.6,,,,9.5,
49,2022,,,0.6,0.8,,,,,,...,3.5,10.6,,9.7,3.4,,,30.8,9.0,
50,2023,,,,,,,,,,...,,,,,,,,,,


In [65]:
df[['Year', 'India', 'World']]

Unnamed: 0,Year,India,World
0,1970,,
1,1971,,
2,1973,,
3,1975,,
4,1977,63.5,
5,1978,,
6,1979,,
7,1980,,
8,1981,,43.8
9,1982,,43.2
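
Most rows in this view are empty for both series, so it can help to keep only the years where at least one of them has a value. The sketch below does this with `dropna` on a small frame that copies a few rows from the output above; `df_example` and `observed` are illustrative names.

```python
import pandas as pd

# A few rows copied from the output above (None marks a missing value).
df_example = pd.DataFrame({
    "Year": [1977, 1978, 1981, 1982],
    "India": [63.5, None, None, None],
    "World": [None, None, 43.8, 43.2],
})

# how="all" drops a row only when every column listed in subset is missing.
observed = df_example.dropna(subset=["India", "World"], how="all")
print(observed["Year"].tolist())
```

The same call on the full frame would be `df[['Year', 'India', 'World']].dropna(subset=['India', 'World'], how='all')`.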


In [66]:
# index=False avoids writing the row index as an extra unnamed column
df.to_csv(csv_file_path_poverty, index=False)