# Data Collection Module 3: World Bank API for Life Expectancy Data

<b>API Details:</b>

The World Bank API exposes the World Development Indicators (WDI) database, a large collection of global development data that includes life expectancy statistics. The series used here is life expectancy at birth for both sexes combined: the average number of years a newborn would be expected to live if the mortality rates prevailing at the time of birth continued to apply.

<b>Data on Life Expectancy:</b>

Indicator name: "Life expectancy at birth, total (years)" (Indicator Code: SP.DYN.LE00.IN).

<b>Data Update Frequency:</b>

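Life expectancy in the World Bank's World Development Indicators is an annual series: one value per country (or regional aggregate) per year. The most recent years can lag behind the calendar. The request in this module asks for 1960 to 2023, but the saved table ends at 2020, which suggests the later years had no values yet when the data was fetched.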

<b>Accessing the API:</b>

The World Bank API is publicly accessible and does not require an API key, enabling straightforward integration for projects aimed at analyzing, visualizing, or reporting on global life expectancy trends.
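As a minimal sketch of what a keyless request looks like, the helper below assembles an indicator URL from its parts. The function name `build_indicator_url` and its argument names are illustrative and not part of the World Bank API; the path layout and the `format`, `date`, and `per_page` query parameters are the ones this module uses, and `page` is the API's paging parameter.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"  # public endpoint, no API key


def build_indicator_url(indicator_code, country_code="all",
                        start_year=1960, end_year=2023,
                        per_page=1000, page=1):
    """Assemble a World Bank indicator URL (hypothetical helper)."""
    params = {
        "format": "json",                      # ask for JSON output
        "date": f"{start_year}:{end_year}",    # inclusive year range
        "per_page": per_page,                  # rows per page
        "page": page,                          # 1-based page number
    }
    # safe=":" keeps the year range readable instead of encoding it as %3A
    query = urlencode(params, safe=":")
    return f"{BASE_URL}/country/{country_code}/indicator/{indicator_code}?{query}"


url = build_indicator_url("SP.DYN.LE00.IN", "IND")
print(url)
```

Building the URL in one place keeps the indicator code, country code, and date range easy to change without editing a long f-string.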

<b>API Documentation:</b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process:</b> The workflow involves accessing the World Bank API to collect relevant life expectancy data, preprocessing this data to align with project requirements, and storing the refined dataset for further analysis or visualization.

### `Goal`: Develop a dynamic and comprehensive dataset on global life expectancy statistics.

This module collects life expectancy data that will later be used to draw a world map of life expectancy across countries and regions. The aim is a clear, accessible view of life expectancy trends worldwide and of what they indicate about global health and development outcomes.

In [1]:
# Code snippet for data collection from the API
import requests
import pandas as pd

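def fetch_life_expectancy_data(indicator_code, country_code="all"):
    """Fetch life expectancy data for a given country code and indicator.

    Returns a list of row dicts, or None if the response carries no data.
    Only the first page (up to per_page rows) is returned.
    """
    url = (
        f"https://api.worldbank.org/v2/country/{country_code}"
        f"/indicator/{indicator_code}?format=json&date=1960:2023&per_page=10000"
    )
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.json()

    # A usable response is [metadata, rows]; anything else is treated as no data
    if len(data) == 2 and isinstance(data[1], list):
        # Replace the nested indicator dict with just its id on each row
        for row in data[1]:
            row['indicator'] = row['indicator']['id']
        return data[1]
    return None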

# Life Expectancy at Birth, Total (years) - Indicator Code: SP.DYN.LE00.IN
life_expectancy_data_all_countries = fetch_life_expectancy_data("SP.DYN.LE00.IN", "all")
life_expectancy_data_world = fetch_life_expectancy_data("SP.DYN.LE00.IN", "WLD")

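# Combine life expectancy data. Any WLD rows that appear in both lists are
# averaged back to the same value by pivot_table below.
# `or []` guards against a fetch that returned None.
combined_data = (life_expectancy_data_all_countries or []) + (life_expectancy_data_world or [])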

# Converting to DataFrame
df_life_expectancy = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_life_expectancy = df_life_expectancy[['countryiso3code', 'date', 'value', 'indicator']]
df_life_expectancy.columns = ['Country Code', 'Year', 'Life Expectancy', 'Indicator']

# Pivot the dataset to have 'Year' as rows, 'Country Code' as columns, and 'Life Expectancy' as values
df_life_expectancy_pivot = df_life_expectancy.pivot_table(index='Year', columns='Country Code', values='Life Expectancy')

# Saving the data to a CSV file
csv_file_path_life_expectancy = '/Users/hrishi/original-desktop/Data Science/Projects/Major_Project/data-collection/data/world_life_expectancy_data.csv'
df_life_expectancy_pivot.to_csv(csv_file_path_life_expectancy)

print("Life expectancy data has been saved.")


Life expectancy data has been saved.
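If a query matches more rows than `per_page`, a single request returns only the first page. The sketch below shows one way to walk every page using the `pages` count in the response metadata. It is not the method used in the cell above: `fetch_all_pages` and its `get_json` hook are hypothetical names introduced here so the loop can be exercised without network access.

```python
def fetch_all_pages(indicator_code, country_code="all", per_page=1000, get_json=None):
    """Collect rows from every page of a World Bank indicator query.

    `get_json` is a hypothetical hook: any callable taking (url, params) and
    returning the decoded JSON. It defaults to a requests-based fetcher.
    """
    if get_json is None:
        import requests  # imported lazily so the loop can be tested offline

        def get_json(url, params):
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()

    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}"
    rows, page, pages = [], 1, 1
    while page <= pages:
        params = {"format": "json", "date": "1960:2023",
                  "per_page": per_page, "page": page}
        payload = get_json(url, params)
        # A usable response is [metadata, rows]; anything else means no data
        if len(payload) != 2 or not isinstance(payload[1], list):
            break
        metadata, page_rows = payload
        rows.extend(page_rows)
        pages = int(metadata.get("pages", 1))  # total page count reported by the API
        page += 1
    return rows
```

Calling `fetch_all_pages("SP.DYN.LE00.IN")` would then return the rows of all pages as one list, which can be passed to `pd.DataFrame` in the same way as `combined_data`.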


## Data Preprocessing

In [2]:
df = pd.read_csv(csv_file_path_life_expectancy)
df

Unnamed: 0,Year,Unnamed: 1,ABW,AFE,AFG,AFW,AGO,ALB,ARB,ARG,...,TCD,TEA,TEC,TLA,TMN,TSA,TSS,VGB,WLD,XKX
0,1960,50.002124,64.152,44.085552,32.535,37.845152,38.211,54.439,44.972899,63.978,...,38.374,37.539508,64.850633,54.659327,44.621149,45.102367,41.422546,59.564,50.894180,61.485
1,1961,51.364416,64.537,44.386697,33.068,38.164950,37.267,55.634,45.676401,64.360,...,38.631,43.208667,65.076216,55.217428,45.336940,45.418649,41.739739,60.219,52.846336,61.836
2,1962,52.980222,64.752,44.752182,33.547,38.735102,37.539,56.671,46.122576,64.244,...,38.835,51.075176,64.916197,55.670493,45.610731,45.802340,42.200721,61.601,55.208684,62.134
3,1963,53.297682,65.132,44.913159,34.016,39.063715,37.824,57.844,46.972472,64.449,...,39.072,51.577645,65.483302,56.113680,46.771595,46.114925,42.440620,63.533,55.542341,62.440
4,1964,53.845625,65.294,45.479043,34.494,39.335360,38.131,58.983,47.895758,64.363,...,39.333,52.307700,65.966297,56.612891,47.443876,46.505592,42.890838,64.647,56.034875,62.734
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,2017,71.838447,75.903,62.922390,63.016,56.888446,61.680,79.047,71.429596,76.833,...,52.308,75.415733,73.876536,74.655350,72.743229,69.979289,60.477138,74.785,72.541695,78.783
58,2018,72.105622,76.072,63.365863,63.081,57.189139,62.144,79.184,71.633017,76.999,...,52.825,75.835693,74.042479,74.776488,72.963389,70.253479,60.863028,75.831,72.783210,78.696
59,2019,72.370969,76.248,63.755678,63.565,57.555796,62.448,79.282,71.844626,77.284,...,53.259,76.036657,74.337936,74.971824,73.119390,70.472755,61.244291,75.863,72.979143,79.022
60,2020,71.681576,75.723,63.313860,62.575,57.226373,62.261,76.989,70.923360,75.892,...,52.777,75.968890,72.696082,72.927955,72.146376,69.748331,60.848894,75.849,72.243466,76.567


In [3]:
# Drop the column that has no header: it holds rows whose country code was
# empty, so it cannot be mapped to a country name
df.drop(columns='Unnamed: 1', inplace=True)

# Converting country codes to country names for readability

# Fetch country information from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=30)
response.raise_for_status()
countries_data = response.json()

# Build the code -> name mapping from the second response element;
# set 'WLD' explicitly so the world aggregate is always present
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'
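The mapping built above mixes individual countries with regional and income aggregates such as "Arab World". For a world map it may help to keep the two apart. The sketch below assumes each record from the country endpoint carries a `region` object whose `value` is `"Aggregates"` for aggregate entries; that field layout is an assumption to verify against a live response, and `split_codes` is a hypothetical helper.

```python
def split_codes(records):
    """Split country-endpoint records into (countries, aggregates) code -> name dicts."""
    country_map, aggregate_map = {}, {}
    for record in records:
        # Assumed shape: record["region"]["value"] == "Aggregates" for aggregates
        region = (record.get("region") or {}).get("value", "")
        target = aggregate_map if region == "Aggregates" else country_map
        target[record["id"]] = record["name"]
    return country_map, aggregate_map


# Hand-written sample in the assumed shape (not live API output)
sample_records = [
    {"id": "ARB", "name": "Arab World", "region": {"id": "NA", "value": "Aggregates"}},
    {"id": "IND", "name": "India", "region": {"id": "SAS", "value": "South Asia"}},
]
country_names, aggregate_names = split_codes(sample_records)
print(country_names)    # {'IND': 'India'}
print(aggregate_names)  # {'ARB': 'Arab World'}
```

With the real response, `split_codes(countries_data[1])` would give one mapping for map layers and another for summary lines such as the world average.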

In [4]:
print(f'Mapping of Codes to Countries: \n {countries}')

Mapping of Codes to Countries: 
 {'ABW': 'Aruba', 'AFE': 'Africa Eastern and Southern', 'AFG': 'Afghanistan', 'AFR': 'Africa', 'AFW': 'Africa Western and Central', 'AGO': 'Angola', 'ALB': 'Albania', 'AND': 'Andorra', 'ARB': 'Arab World', 'ARE': 'United Arab Emirates', 'ARG': 'Argentina', 'ARM': 'Armenia', 'ASM': 'American Samoa', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'AUT': 'Austria', 'AZE': 'Azerbaijan', 'BDI': 'Burundi', 'BEA': 'East Asia & Pacific (IBRD-only countries)', 'BEC': 'Europe & Central Asia (IBRD-only countries)', 'BEL': 'Belgium', 'BEN': 'Benin', 'BFA': 'Burkina Faso', 'BGD': 'Bangladesh', 'BGR': 'Bulgaria', 'BHI': 'IBRD countries classified as high income', 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BIH': 'Bosnia and Herzegovina', 'BLA': 'Latin America & the Caribbean (IBRD-only countries)', 'BLR': 'Belarus', 'BLZ': 'Belize', 'BMN': 'Middle East & North Africa (IBRD-only countries)', 'BMU': 'Bermuda', 'BOL': 'Bolivia', 'BRA': 'Brazil', 'BRB': 'Barbados', 'BRN'

In [5]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename the columns of df in place
rename_columns(df, countries)
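A small illustration of the renaming step, using a two-column sample taken from the India and World values shown later in this notebook. `DataFrame.rename` leaves any column without an entry in the mapping unchanged, so `Year` survives as is. The sketch returns a new DataFrame instead of renaming in place, which is a design choice and not what the cell above does.

```python
import pandas as pd

# Illustrative subset of the code -> name mapping
mapping = {"IND": "India", "WLD": "World"}

sample = pd.DataFrame({
    "Year": [1960, 1961],
    "IND": [45.218, 45.398],
    "WLD": [50.894180, 52.846336],
})

# rename() returns a new DataFrame; 'Year' has no entry in the mapping and is kept
renamed = sample.rename(columns=mapping)

print(list(renamed.columns))  # ['Year', 'India', 'World']
print(list(sample.columns))   # ['Year', 'IND', 'WLD']
```

Keeping the original frame untouched makes it easy to rerun the cell without renaming twice.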

In [6]:
df

Unnamed: 0,Year,Aruba,Africa Eastern and Southern,Afghanistan,Africa Western and Central,Angola,Albania,Arab World,Argentina,Armenia,...,Chad,East Asia & Pacific (IDA & IBRD countries),Europe & Central Asia (IDA & IBRD countries),Latin America & the Caribbean (IDA & IBRD countries),Middle East & North Africa (IDA & IBRD countries),South Asia (IDA & IBRD),Sub-Saharan Africa (IDA & IBRD countries),British Virgin Islands,World,Kosovo
0,1960,64.152,44.085552,32.535,37.845152,38.211,54.439,44.972899,63.978,61.431,...,38.374,37.539508,64.850633,54.659327,44.621149,45.102367,41.422546,59.564,50.894180,61.485
1,1961,64.537,44.386697,33.068,38.164950,37.267,55.634,45.676401,64.360,61.803,...,38.631,43.208667,65.076216,55.217428,45.336940,45.418649,41.739739,60.219,52.846336,61.836
2,1962,64.752,44.752182,33.547,38.735102,37.539,56.671,46.122576,64.244,62.125,...,38.835,51.075176,64.916197,55.670493,45.610731,45.802340,42.200721,61.601,55.208684,62.134
3,1963,65.132,44.913159,34.016,39.063715,37.824,57.844,46.972472,64.449,62.223,...,39.072,51.577645,65.483302,56.113680,46.771595,46.114925,42.440620,63.533,55.542341,62.440
4,1964,65.294,45.479043,34.494,39.335360,38.131,58.983,47.895758,64.363,62.418,...,39.333,52.307700,65.966297,56.612891,47.443876,46.505592,42.890838,64.647,56.034875,62.734
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,2017,75.903,62.922390,63.016,56.888446,61.680,79.047,71.429596,76.833,74.906,...,52.308,75.415733,73.876536,74.655350,72.743229,69.979289,60.477138,74.785,72.541695,78.783
58,2018,76.072,63.365863,63.081,57.189139,62.144,79.184,71.633017,76.999,75.064,...,52.825,75.835693,74.042479,74.776488,72.963389,70.253479,60.863028,75.831,72.783210,78.696
59,2019,76.248,63.755678,63.565,57.555796,62.448,79.282,71.844626,77.284,75.439,...,53.259,76.036657,74.337936,74.971824,73.119390,70.472755,61.244291,75.863,72.979143,79.022
60,2020,75.723,63.313860,62.575,57.226373,62.261,76.989,70.923360,75.892,72.173,...,52.777,75.968890,72.696082,72.927955,72.146376,69.748331,60.848894,75.849,72.243466,76.567


In [7]:
df[['Year', 'India', 'World']]

Unnamed: 0,Year,India,World
0,1960,45.218,50.894180
1,1961,45.398,52.846336
2,1962,45.659,55.208684
3,1963,45.936,55.542341
4,1964,46.184,56.034875
...,...,...,...
57,2017,70.467,72.541695
58,2018,70.710,72.783210
59,2019,70.910,72.979143
60,2020,70.150,72.243466


### Exporting the data


In [8]:
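# index=False keeps the row index out of the file, so reloading it
# does not add an extra 'Unnamed: 0' column
df.to_csv(csv_file_path_life_expectancy, index=False)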
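Whether `to_csv` writes the row index changes what the file looks like when it is read back. The sketch below round-trips a tiny sample through an in-memory buffer to show the difference; the sample values are the first two India rows from the table above, and the buffer stands in for the real file path.

```python
import io

import pandas as pd

sample = pd.DataFrame({"Year": [1960, 1961], "India": [45.218, 45.398]})

# Default behaviour: the row index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)
print(columns_with_index)     # ['Unnamed: 0', 'Year', 'India']

# index=False: only the data columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)
print(columns_without_index)  # ['Year', 'India']
```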