# Data Collection Module 5: World Bank API for Mortality Rate Data

<b>API Details:</b>

The World Bank API provides programmatic access to global development data, including health indicators such as the Under-5 Mortality Rate. The indicator is part of the World Development Indicators (WDI) dataset and covers more than 200 countries and territories. It gives the probability, expressed per 1,000 live births, that a newborn will die before reaching age five, assuming that the age-specific mortality rates of the given year remain constant. Source: World Development Indicators (WDI) database.
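
To make the unit concrete, the short sketch below converts a rate per 1,000 live births into a probability and a "1 in N" figure. The helper names `rate_to_probability` and `one_in_n` are hypothetical and exist only for illustration; the input 38.1 is the World value for 2021 that appears in the output tables of this notebook.

```python
def rate_to_probability(rate_per_1000: float) -> float:
    """Convert a rate per 1,000 live births into a probability between 0 and 1."""
    return rate_per_1000 / 1000.0


def one_in_n(rate_per_1000: float) -> float:
    """Express the same rate as 'one death per N live births'."""
    return 1000.0 / rate_per_1000


world_2021 = 38.1  # World value for 2021, per 1,000 live births
print(f"Probability of dying before age five: {rate_to_probability(world_2021):.2%}")
print(f"Roughly 1 in {one_in_n(world_2021):.0f} live births")
```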

<b>Data on Under-5 Mortality Rate:</b>

Indicator for Under-5 Mortality Rate: "Mortality rate, under-5 (per 1,000 live births)" (Indicator Code: SH.DYN.MORT).

<b>Data Update Frequency:</b>

The Under-5 Mortality Rate series in the World Bank's World Development Indicators is revised periodically as new health surveys and updated statistical models become available, so values for past years can change between downloads.
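
One way to check how recent a download is: the first element of a World Bank API v2 JSON response carries metadata, which for indicator queries includes a `lastupdated` field. The sketch below parses a hand-written sample payload instead of calling the API. The sample values, the assumed `YYYY-MM-DD` date format, and the helper `parse_last_updated` are illustrative assumptions, not real API output.

```python
import json
from datetime import date, datetime
from typing import Optional

# Hand-written sample mimicking the [metadata, records] layout; values are illustrative.
SAMPLE_RESPONSE = json.loads("""
[
  {"page": 1, "pages": 1, "per_page": 50, "total": 1, "lastupdated": "2024-01-15"},
  [{"countryiso3code": "WLD", "date": "2021", "value": 38.1}]
]
""")


def parse_last_updated(payload) -> Optional[date]:
    """Return the 'lastupdated' date from the metadata element, or None if absent."""
    metadata = payload[0] if payload else {}
    raw = metadata.get("lastupdated")
    if raw is None:
        return None
    # Assumes the date is formatted as YYYY-MM-DD.
    return datetime.strptime(raw, "%Y-%m-%d").date()


print(parse_last_updated(SAMPLE_RESPONSE))
```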

<b>Accessing the API:</b>

Access to the World Bank API is unrestricted and does not require an API key, facilitating easy integration of vital child health statistics into analytical, visualization, or reporting projects focusing on global health issues.
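
Because no key is needed, a request is fully described by its URL. The sketch below assembles an indicator query from its parts without sending it. `build_indicator_url` and its default values are hypothetical choices for illustration; the path layout and the `format`, `date`, and `per_page` parameters mirror the request used in the collection cell of this notebook, and `page` is added on the assumption that results may span several pages.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(indicator_code, country_code="all",
                        start_year=1960, end_year=2023, per_page=1000, page=1):
    """Build an indicator query URL; no API key or auth parameter is involved."""
    query = urlencode(
        {
            "format": "json",
            "date": f"{start_year}:{end_year}",
            "per_page": per_page,
            "page": page,
        },
        safe=":",  # keep the colon in the date range readable
    )
    return f"{BASE_URL}/country/{country_code}/indicator/{indicator_code}?{query}"


print(build_indicator_url("SH.DYN.MORT", "WLD"))
```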

<b>API Documentation:</b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process:</b> This module fetches Under-5 Mortality Rate data from the World Bank API, preprocesses it to fit the project's requirements, and stores the processed dataset for later analysis or visualization.
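
The three steps can be sketched offline on a handful of hand-written records. The function names `collect`, `preprocess`, and `store` are hypothetical, the records only imitate the shape of the API output, and the CSV is written to an in-memory buffer rather than to disk. The numbers are the India and World values for 2019 and 2020 shown in the outputs of this notebook.

```python
import io

import pandas as pd

# Hand-written records in the shape returned by the indicator endpoint.
sample_records = [
    {"countryiso3code": "IND", "date": "2019", "value": 34.3, "indicator": {"id": "SH.DYN.MORT"}},
    {"countryiso3code": "IND", "date": "2020", "value": 32.4, "indicator": {"id": "SH.DYN.MORT"}},
    {"countryiso3code": "WLD", "date": "2019", "value": 39.3, "indicator": {"id": "SH.DYN.MORT"}},
    {"countryiso3code": "WLD", "date": "2020", "value": 38.7, "indicator": {"id": "SH.DYN.MORT"}},
]


def collect(records):
    """Step 1: flatten raw records into a tidy (long) DataFrame."""
    df = pd.DataFrame(records)
    df["indicator"] = df["indicator"].map(lambda d: d["id"])
    return df[["countryiso3code", "date", "value", "indicator"]]


def preprocess(df):
    """Step 2: reshape to one row per year and one column per country code."""
    df = df.assign(date=df["date"].astype(int))
    return df.pivot_table(index="date", columns="countryiso3code", values="value")


def store(wide, target):
    """Step 3: write the wide table as CSV to a path or file-like object."""
    wide.to_csv(target)


buffer = io.StringIO()
wide = preprocess(collect(sample_records))
store(wide, buffer)
print(buffer.getvalue())
```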

### `Goal`: Compile a contemporary and dynamic dataset on global Under-5 Mortality Rate statistics.

In [1]:
# Code snippet for data collection from the API
import requests
import pandas as pd

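def fetch_mortality_data(indicator_code, country_code="all"):
    """Fetch mortality rate records for a given indicator and country code.

    Returns a list of records, or an empty list if the response is not in the
    expected [metadata, records] form.
    """
    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}?format=json&date=1960:2023&per_page=10000"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    data = response.json()

    # Note: only the first page (at most per_page records) is read here
    if len(data) == 2 and isinstance(data[1], list):
        # Replace the nested indicator dictionary with its code for each row
        for row in data[1]:
            row['indicator'] = row['indicator']['id']
        return data[1]
    else:
        # An empty list keeps the list concatenation below from failing
        return []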

# Mortality rate, under-5 (per 1,000 live births) - Indicator Code: SH.DYN.MORT
mortality_data_all_countries = fetch_mortality_data("SH.DYN.MORT", "all")
mortality_data_world = fetch_mortality_data("SH.DYN.MORT", "WLD")

# Combine mortality data
combined_data = mortality_data_all_countries + mortality_data_world

# Converting to DataFrame
df_mortality = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_mortality = df_mortality[['countryiso3code', 'date', 'value', 'indicator']]
df_mortality.columns = ['Country Code', 'Year', 'Mortality Rate', 'Indicator']

# Pivot the dataset to have 'Year' as rows, 'Country Code' as columns, and 'Mortality Rate' as values
df_mortality_pivot = df_mortality.pivot_table(index='Year', columns='Country Code', values='Mortality Rate')

# Saving the pivoted data to a CSV file
csv_file_path_mortality = '/Users/hrishi/original-desktop/Data Science/Projects/Major_Project/data-collection/data/world_mortality_rate_data.csv'
df_mortality_pivot.to_csv(csv_file_path_mortality)

print("Mortality rate data has been saved.")

Mortality rate data has been saved.
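
The request in the cell above asks for up to 10,000 records in a single page, and a query covering all countries from 1960 to 2023 may exceed that. The metadata element of a v2 response reports `page`, `pages`, `per_page`, and `total`, so a loop can keep requesting pages until `pages` is reached. The sketch below shows that loop against a stand-in fetcher so it runs without network access. `fetch_all_pages` and `fake_fetch_page` are hypothetical names, and a real `fetch_page` would issue the HTTP request with a `page` parameter.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[Dict, List[Dict]]  # (metadata, records)


def fetch_all_pages(fetch_page: Callable[[int], Page]) -> List[Dict]:
    """Call fetch_page(1), fetch_page(2), ... until metadata['pages'] is reached."""
    records: List[Dict] = []
    page = 1
    while True:
        metadata, rows = fetch_page(page)
        records.extend(rows or [])
        if page >= int(metadata.get("pages", 1)):
            break
        page += 1
    return records


# Stand-in for a real HTTP call: 5 fake records served 2 per page.
FAKE_ROWS = [{"date": str(2017 + i), "value": float(i)} for i in range(5)]


def fake_fetch_page(page: int, per_page: int = 2) -> Page:
    start = (page - 1) * per_page
    pages = -(-len(FAKE_ROWS) // per_page)  # ceiling division
    metadata = {"page": page, "pages": pages, "per_page": per_page, "total": len(FAKE_ROWS)}
    return metadata, FAKE_ROWS[start:start + per_page]


all_rows = fetch_all_pages(fake_fetch_page)
print(len(all_rows))
```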


## Data Preprocessing

In [2]:
df = pd.read_csv(csv_file_path_mortality)
df.tail(10)

Unnamed: 0,Year,Unnamed: 1,AFE,AFG,AFW,AGO,ALB,AND,ARB,ARG,...,TCD,TEA,TEC,TLA,TMN,TSA,TSS,VGB,WLD,XKX
52,2012,42.475,78.787727,80.3,116.249469,104.9,11.2,4.1,41.258526,13.3,...,140.6,20.292886,15.433234,20.115184,29.381246,56.982971,94.802483,13.8,47.6,17.5
53,2013,41.0,75.44269,76.8,113.66425,98.3,10.5,3.9,40.578429,12.7,...,136.9,19.492319,14.709529,19.514214,28.988933,54.325497,91.771793,13.4,46.0,16.3
54,2014,39.625,72.492384,73.4,111.432027,92.9,9.9,3.7,39.672627,12.2,...,133.3,18.458659,13.933376,18.993822,28.120076,51.7313,89.107486,13.1,44.4,15.3
55,2015,38.325,69.869186,70.2,109.185389,88.3,9.6,3.5,38.761946,11.6,...,129.5,17.849763,13.219249,18.516114,27.271205,49.258457,86.636507,12.7,43.2,14.4
56,2016,37.05,67.363929,67.2,106.891019,84.4,9.4,3.4,38.183071,11.0,...,125.8,16.812946,12.512951,18.688866,26.594517,46.787929,84.212437,12.3,41.9,13.5
57,2017,35.825,65.091053,64.6,104.627006,81.1,9.3,3.2,36.905068,10.3,...,121.9,16.007518,11.976023,17.932663,25.411192,44.690035,81.901471,11.9,40.6,12.6
58,2018,34.825,62.857189,62.2,102.014144,78.0,9.3,3.1,36.404985,9.5,...,118.0,16.026165,11.553728,17.470102,25.183852,42.505482,79.471672,11.5,40.0,11.9
59,2019,33.8,60.855917,59.9,99.55196,75.0,9.4,3.0,35.725503,8.6,...,114.3,15.608078,11.212613,16.9779,24.880702,40.58937,77.238969,11.2,39.3,11.2
60,2020,32.825,59.087309,57.8,96.928307,72.1,9.4,2.8,34.809462,7.7,...,110.5,15.661156,10.816419,16.505269,24.264014,38.846494,75.113403,10.9,38.7,10.5
61,2021,31.95,57.284727,55.7,94.372235,69.4,9.5,2.8,34.454511,6.9,...,107.1,15.536157,10.440716,16.022407,24.062561,37.05196,72.994734,10.5,38.1,10.0
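
The extra `Unnamed: 1` column in the output above is most likely produced by records whose country code is an empty string: the pivot keeps the empty string as a column label, the CSV header then has an empty field, and `read_csv` names that field by its position. The sketch below reproduces the effect on a tiny hand-built table, using an in-memory buffer instead of a file. The values are the 2020 and 2021 entries of the `Unnamed: 1` and `AFG` columns shown above.

```python
import io

import pandas as pd

long_df = pd.DataFrame({
    "Country Code": ["", "", "AFG", "AFG"],  # empty code, as assumed for some records
    "Year": ["2020", "2021", "2020", "2021"],
    "Mortality Rate": [32.825, 31.95, 57.8, 55.7],
})
wide = long_df.pivot_table(index="Year", columns="Country Code", values="Mortality Rate")

# Round-trip through CSV, as the notebook does via a file on disk.
buffer = io.StringIO()
wide.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))
```

Filtering out rows with an empty country code before pivoting would avoid the column altogether; dropping it after reloading, as the next cell does, leads to the same table.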


In [3]:
# Dropping the unnecessary 'Unnamed: 1' column
# (it likely comes from records with an empty country code)
df.drop('Unnamed: 1', axis = 1, inplace = True)

# Converting country codes to country names for readability

# Fetch country information from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
countries_data = response.json()

# Extract relevant data for mapping codes to names, add 'WLD' manually
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'

In [4]:
print(f'Mapping of Codes to Countries: \n {countries}')

Mapping of Codes to Countries: 
 {'ABW': 'Aruba', 'AFE': 'Africa Eastern and Southern', 'AFG': 'Afghanistan', 'AFR': 'Africa', 'AFW': 'Africa Western and Central', 'AGO': 'Angola', 'ALB': 'Albania', 'AND': 'Andorra', 'ARB': 'Arab World', 'ARE': 'United Arab Emirates', 'ARG': 'Argentina', 'ARM': 'Armenia', 'ASM': 'American Samoa', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'AUT': 'Austria', 'AZE': 'Azerbaijan', 'BDI': 'Burundi', 'BEA': 'East Asia & Pacific (IBRD-only countries)', 'BEC': 'Europe & Central Asia (IBRD-only countries)', 'BEL': 'Belgium', 'BEN': 'Benin', 'BFA': 'Burkina Faso', 'BGD': 'Bangladesh', 'BGR': 'Bulgaria', 'BHI': 'IBRD countries classified as high income', 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BIH': 'Bosnia and Herzegovina', 'BLA': 'Latin America & the Caribbean (IBRD-only countries)', 'BLR': 'Belarus', 'BLZ': 'Belize', 'BMN': 'Middle East & North Africa (IBRD-only countries)', 'BMU': 'Bermuda', 'BOL': 'Bolivia', 'BRA': 'Brazil', 'BRB': 'Barbados', 'BRN'
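
The mapping mixes individual countries with aggregates such as 'Africa Eastern and Southern' and 'Arab World'. If the two need to be told apart, one possible approach is to look at the `region` field of each entry. The sketch below works on a hand-written sample trimmed to the fields it uses. It assumes that aggregate entries carry a region whose `value` is "Aggregates", which should be verified against a live response, and `split_codes` is a hypothetical helper.

```python
# Hand-written sample in the shape of the /v2/country response (trimmed to the fields used).
sample_countries_payload = [
    {"page": 1, "pages": 1, "per_page": "300", "total": 3},
    [
        {"id": "AFG", "name": "Afghanistan", "region": {"id": "SAS", "value": "South Asia"}},
        {"id": "ARB", "name": "Arab World", "region": {"id": "NA", "value": "Aggregates"}},
        {"id": "WLD", "name": "World", "region": {"id": "NA", "value": "Aggregates"}},
    ],
]


def split_codes(payload):
    """Return (countries, aggregates) as two code -> name dictionaries."""
    countries, aggregates = {}, {}
    for entry in payload[1]:
        # Assumption: aggregates are marked by region value "Aggregates".
        is_aggregate = entry.get("region", {}).get("value") == "Aggregates"
        target = aggregates if is_aggregate else countries
        target[entry["id"]] = entry["name"]
    return countries, aggregates


country_names, aggregate_names = split_codes(sample_countries_payload)
print(country_names)
print(aggregate_names)
```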

In [5]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename the columns of df using the code-to-name mapping
rename_columns(df, countries)

In [6]:
df.tail(10)

Unnamed: 0,Year,Africa Eastern and Southern,Afghanistan,Africa Western and Central,Angola,Albania,Andorra,Arab World,Argentina,Armenia,...,Chad,East Asia & Pacific (IDA & IBRD countries),Europe & Central Asia (IDA & IBRD countries),Latin America & the Caribbean (IDA & IBRD countries),Middle East & North Africa (IDA & IBRD countries),South Asia (IDA & IBRD),Sub-Saharan Africa (IDA & IBRD countries),British Virgin Islands,World,Kosovo
52,2012,78.787727,80.3,116.249469,104.9,11.2,4.1,41.258526,13.3,16.8,...,140.6,20.292886,15.433234,20.115184,29.381246,56.982971,94.802483,13.8,47.6,17.5
53,2013,75.44269,76.8,113.66425,98.3,10.5,3.9,40.578429,12.7,15.9,...,136.9,19.492319,14.709529,19.514214,28.988933,54.325497,91.771793,13.4,46.0,16.3
54,2014,72.492384,73.4,111.432027,92.9,9.9,3.7,39.672627,12.2,15.1,...,133.3,18.458659,13.933376,18.993822,28.120076,51.7313,89.107486,13.1,44.4,15.3
55,2015,69.869186,70.2,109.185389,88.3,9.6,3.5,38.761946,11.6,14.4,...,129.5,17.849763,13.219249,18.516114,27.271205,49.258457,86.636507,12.7,43.2,14.4
56,2016,67.363929,67.2,106.891019,84.4,9.4,3.4,38.183071,11.0,13.7,...,125.8,16.812946,12.512951,18.688866,26.594517,46.787929,84.212437,12.3,41.9,13.5
57,2017,65.091053,64.6,104.627006,81.1,9.3,3.2,36.905068,10.3,13.0,...,121.9,16.007518,11.976023,17.932663,25.411192,44.690035,81.901471,11.9,40.6,12.6
58,2018,62.857189,62.2,102.014144,78.0,9.3,3.1,36.404985,9.5,12.4,...,118.0,16.026165,11.553728,17.470102,25.183852,42.505482,79.471672,11.5,40.0,11.9
59,2019,60.855917,59.9,99.55196,75.0,9.4,3.0,35.725503,8.6,11.8,...,114.3,15.608078,11.212613,16.9779,24.880702,40.58937,77.238969,11.2,39.3,11.2
60,2020,59.087309,57.8,96.928307,72.1,9.4,2.8,34.809462,7.7,11.3,...,110.5,15.661156,10.816419,16.505269,24.264014,38.846494,75.113403,10.9,38.7,10.5
61,2021,57.284727,55.7,94.372235,69.4,9.5,2.8,34.454511,6.9,10.7,...,107.1,15.536157,10.440716,16.022407,24.062561,37.05196,72.994734,10.5,38.1,10.0
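
As a possible alternative to the helper above, `DataFrame.rename` can take the mapping directly: keys that match no column are ignored by default and unmapped columns keep their names, so no intermediate dictionary is needed. The sketch below uses a tiny hand-built frame and returns a new DataFrame instead of modifying in place. The column `XYZ` is a made-up code used to show an unmapped column.

```python
import pandas as pd

df_codes = pd.DataFrame({"Year": [2020, 2021], "AFG": [57.8, 55.7], "XYZ": [1.0, 2.0]})
mapping = {"AFG": "Afghanistan", "WLD": "World"}

# Keys absent from the columns ("WLD") are ignored;
# columns without a mapping ("Year", "XYZ") keep their names.
df_named = df_codes.rename(columns=mapping)

print(list(df_named.columns))
```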


In [7]:
df[['Year', 'Sub-Saharan Africa (IDA & IBRD countries)', 'India', 'World']]

Unnamed: 0,Year,Sub-Saharan Africa (IDA & IBRD countries),India,World
0,1960,,242.8,
1,1961,,239.3,
2,1962,,236.1,
3,1963,,233.2,
4,1964,,230.4,
...,...,...,...,...
57,2017,81.901471,38.7,40.6
58,2018,79.471672,36.4,40.0
59,2019,77.238969,34.3,39.3
60,2020,75.113403,32.4,38.7
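
The table above shows that some series, such as World, have no values for the earliest years. A quick way to summarize this is to look up the first year with data and count the missing entries per column. The sketch below does so on a hand-built excerpt that reuses the India and World values shown in this notebook; the frame `sample` is only a stand-in for `df`.

```python
import pandas as pd

sample = pd.DataFrame({
    "Year": [1960, 1961, 2019, 2020],
    "India": [242.8, 239.3, 34.3, 32.4],
    "World": [None, None, 39.3, 38.7],
}).set_index("Year")

# First year with a non-missing value, per column.
first_year = sample.apply(lambda col: col.first_valid_index())
# Number of missing values, per column.
missing = sample.isna().sum()

print(first_year)
print(missing)
```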


### Exporting the data

In [8]:
# index=False keeps the row index from being written as an extra unnamed column
df.to_csv(csv_file_path_mortality, index=False)
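
A small illustration of how the `index` argument of `to_csv` affects the exported header: with the default, the row index is written as an unnamed first column, which appears as `Unnamed: 0` when the file is read back. The sketch below writes to in-memory buffers, and `df_demo` is a hand-built stand-in for the real table.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"Year": [2020, 2021], "World": [38.7, 38.1]})

with_index = io.StringIO()
df_demo.to_csv(with_index)  # default: index=True

without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with))
print(repr(header_without))
```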