# Data Collection Module 4: World Bank API for Literacy Rate Data

<b>API Details:</b>

The World Bank API is a rich source of global development data, offering access to a wide array of indicators, including education indicators such as literacy rates. Through its World Development Indicators (WDI) database, it provides adult and youth literacy rates for over 200 countries and territories. The literacy rate is the percentage of people who can, with understanding, both read and write a short, simple statement about their everyday life.

<b>Data on Literacy Rate:</b>

- Adult total literacy rate: "Literacy rate, adult total (% of people ages 15 and above)", indicator code `SE.ADT.LITR.ZS`. This is the indicator collected in this module.
- Youth total literacy rate: "Literacy rate, youth total (% of people ages 15-24)", indicator code `SE.ADT.1524.LT.ZS`.
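Both indicators use the same endpoint pattern, so switching between adult and youth literacy only changes the indicator code in the URL. The sketch below builds such URLs with the standard library; `indicator_url` and `INDICATORS` are hypothetical names, and the query parameters (`format`, `date`, `per_page`, `page`) mirror the ones used in this notebook's request rather than listing everything the API accepts.

```python
from urllib.parse import urlencode

# Indicator codes listed above
INDICATORS = {
    "adult": "SE.ADT.LITR.ZS",     # % of people ages 15 and above
    "youth": "SE.ADT.1524.LT.ZS",  # % of people ages 15-24
}

def indicator_url(indicator_code, country_code="all", date_range="1960:2023", per_page=1000, page=1):
    """Build a World Bank API v2 URL for one indicator (hypothetical helper)."""
    query = urlencode(
        {"format": "json", "date": date_range, "per_page": per_page, "page": page},
        safe=":",  # keep '1960:2023' readable instead of '1960%3A2023'
    )
    return f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}?{query}"

for name, code in INDICATORS.items():
    print(name, indicator_url(code, "WLD"))
```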

<b>Data Update Frequency:</b>

Literacy rate data in the World Development Indicators are updated as new estimates become available from the UNESCO Institute for Statistics and other sources, typically on an annual basis. Because literacy is measured through censuses and household surveys, many countries report only in some years, so the collected series contain gaps (visible as empty cells in the tables below).
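Since a country may report literacy only every few years, a "current" figure usually means each country's most recent non-null observation. A minimal sketch of that selection, assuming observation records shaped like the API's JSON rows (`countryiso3code`, `date`, `value`); the helper name `latest_observations` is hypothetical, and the sample values are copied from this notebook's output.

```python
def latest_observations(rows):
    """Return {country_code: (year, value)} for each country's most recent non-null value."""
    latest = {}
    for row in rows:
        code, value = row["countryiso3code"], row["value"]
        if not code or value is None:
            continue  # skip records without an ISO3 code and years with no data
        year = int(row["date"])
        if code not in latest or year > latest[code][0]:
            latest[code] = (year, value)
    return latest

# Records in the API's row shape; values taken from the AGO and TCD columns below
rows = [
    {"countryiso3code": "AGO", "date": "2014", "value": 66.030113},
    {"countryiso3code": "AGO", "date": "2018", "value": None},
    {"countryiso3code": "AGO", "date": "2022", "value": 72.400002},
    {"countryiso3code": "TCD", "date": "2016", "value": 22.31155},
    {"countryiso3code": "TCD", "date": "2022", "value": 27.280001},
]
print(latest_observations(rows))  # {'AGO': (2022, 72.400002), 'TCD': (2022, 27.280001)}
```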

<b>Accessing the API:</b>

The World Bank API is publicly accessible and requires no API key, which simplifies integrating its data into projects that analyze, visualize, or report on global literacy rates.
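Because no key or authentication header is needed, a plain HTTP GET is enough. The `len(data) == 2` check in `fetch_data` relies on a successful response being a two-element list of page metadata and observations; an invalid request is assumed to return a single element holding a `message` entry. A hedged sketch of handling both shapes with only the standard library; `fetch_json` and `split_response` are hypothetical helper names, and the response shapes are assumptions worth checking against the API documentation.

```python
import json
from urllib.request import urlopen

def fetch_json(url, timeout=30):
    """GET a public World Bank API URL; no API key or auth header is sent."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def split_response(payload):
    """Split a World Bank API v2 JSON payload into (metadata, rows).

    Assumes success payloads look like [metadata, rows] (rows may be null when
    nothing matches) and error payloads like [{"message": [...]}].
    """
    if isinstance(payload, list) and len(payload) == 2:
        metadata, rows = payload
        return metadata, rows or []
    raise ValueError(f"Unexpected API response: {payload!r}")

# Example payload in the assumed success shape (value from this notebook's World column)
ok = [{"page": 1, "pages": 1, "total": 1},
      [{"countryiso3code": "WLD", "date": "2022", "value": 87.011749}]]
meta, rows = split_response(ok)
print(meta["pages"], len(rows))  # 1 1
```

In practice the two are chained, e.g. `split_response(fetch_json(url))`.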

<b>API Documentation:</b> https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-documentation

<b>Process:</b> This module accesses the World Bank API to collect literacy rate data, preprocesses the data to suit project-specific needs, and stores the result as a CSV file.

### `Goal`: Develop an up-to-date and dynamic dataset on global literacy rate statistics.

In [1]:
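# Code snippet for data collection from the API
from pathlib import Path

import requests
import pandas as pd

def fetch_data(indicator_code, country_code="all"):
    """Fetch all observations of an indicator for a country code (or "all").

    The API paginates its results, so every page is requested and the rows are
    concatenated. Returns a list of observation dicts, or None if there is no data.
    """
    url = f"https://api.worldbank.org/v2/country/{country_code}/indicator/{indicator_code}"
    params = {"format": "json", "date": "1960:2023", "per_page": 1000, "page": 1}
    rows = []
    while True:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()

        # A valid response is [metadata, observations]; errors and empty results are not
        if len(data) != 2 or not isinstance(data[1], list):
            break
        for row in data[1]:
            # Replace the nested indicator dict with its code, e.g. 'SE.ADT.LITR.ZS'
            row['indicator'] = row['indicator']['id']
        rows.extend(data[1])

        # Stop once the last page reported in the metadata has been read
        if params["page"] >= int(data[0]["pages"]):
            break
        params["page"] += 1
    return rows or None

# Literacy rate, adult total (% of people ages 15 and above) - Indicator Code: SE.ADT.LITR.ZS
literacy_rate_data_all_countries = fetch_data("SE.ADT.LITR.ZS", "all")
literacy_rate_data_world = fetch_data("SE.ADT.LITR.ZS", "WLD")

# Combine literacy rate data ("or []" guards against a request that returned no data).
# Identical rows returned by both requests average to the same value in pivot_table below.
combined_data = (literacy_rate_data_all_countries or []) + (literacy_rate_data_world or [])

# Converting to DataFrame
df_literacy_rate = pd.DataFrame(combined_data)

# Filter for necessary columns and rename them
df_literacy_rate = df_literacy_rate[['countryiso3code', 'date', 'value', 'indicator']]
df_literacy_rate.columns = ['Country Code', 'Year', 'Literacy Rate', 'Indicator']

# Pivot the dataset to have 'Year' as rows, 'Country Code' as columns, and 'Literacy Rate' as values
# (pivot_table averages duplicate Year/Country pairs and drops columns that are entirely empty)
df_literacy_rate_pivot = df_literacy_rate.pivot_table(index='Year', columns='Country Code', values='Literacy Rate')

# Saving the data to a CSV file in the data/ folder next to this notebook
csv_file_path_literacy_rate = Path('data') / 'world_literacy_rate_data.csv'
csv_file_path_literacy_rate.parent.mkdir(parents=True, exist_ok=True)
df_literacy_rate_pivot.to_csv(csv_file_path_literacy_rate)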
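Because the values are percentages, a cheap guard before relying on the saved file is to confirm that every reported value lies in [0, 100] and that no year appears twice. A sketch assuming the pivot layout produced above (a `Year` index and one column per country code); `check_pivot` is a hypothetical helper, and the sample values come from the `WLD` and `TCD` columns of the output below.

```python
import pandas as pd

def check_pivot(pivot):
    """Return a list of problems found in a Year x Country literacy-rate pivot."""
    problems = []
    if not pivot.index.is_unique:
        problems.append("duplicate years in index")
    # NaN compares as False, so missing country-years are ignored here
    n_bad = int((pivot.lt(0) | pivot.gt(100)).to_numpy().sum())
    if n_bad:
        problems.append(f"{n_bad} values outside [0, 100]")
    return problems

# Sample pivot using values from the WLD and TCD columns of the output below
pivot = pd.DataFrame(
    {"WLD": [86.852753, 87.011749], "TCD": [None, 27.280001]},
    index=pd.Index([2021, 2022], name="Year"),
)
print(check_pivot(pivot))  # []
```

Applied to the collected data, this would be `check_pivot(df_literacy_rate_pivot)`.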

## Data Preprocessing

In [2]:
df = pd.read_csv(csv_file_path_literacy_rate)
df.tail(10)

| | Year | Unnamed: 1 | ABW | AFE | AFG | AFW | AGO | ALB | ARB | ARG | ... | SST | SWZ | TCD | TEA | TEC | TLA | TMN | TSA | TSS | WLD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | 2013 | 75.189668 | | 69.383102 | | 53.002781 | | | 73.276482 | | ... | 84.960602 | | | 94.432419 | 98.883049 | 92.282288 | 74.810059 | 68.054527 | 62.830429 | 84.860138 |
| 43 | 2014 | 75.810644 | | 69.83477 | | 54.00676 | 66.030113 | | 77.577873 | | ... | 85.230263 | | | 94.892281 | 98.91011 | 92.741333 | 79.083733 | 68.793167 | 63.513378 | 85.433708 |
| 44 | 2015 | 76.152826 | | 70.313423 | | 54.818321 | | | 74.981209 | | ... | 85.797142 | | 26.002991 | 95.098907 | 98.932503 | 92.909889 | 76.516083 | 69.757156 | 64.132278 | 85.60183 |
| 45 | 2016 | 76.822341 | | 71.0952 | | 55.43792 | | | 76.386726 | | ... | 86.150352 | | 22.31155 | 95.262619 | 99.044769 | 93.403793 | 77.691162 | 71.104927 | 64.861397 | 86.061157 |
| 46 | 2017 | 77.375983 | | 71.009071 | | 56.485538 | | | 77.170303 | | ... | 86.305801 | | | 95.429558 | 99.046692 | 93.473358 | 78.762627 | 71.790359 | 65.223244 | 86.288231 |
| 47 | 2018 | 77.430127 | | 71.392616 | | 59.568459 | | | 74.286133 | | ... | 86.471458 | | | 95.529839 | 99.203148 | 93.610336 | 75.668541 | 72.081512 | 66.68364 | 86.33905 |
| 48 | 2019 | 77.668287 | | 72.634972 | | 59.511719 | | | 74.603661 | | ... | 86.671227 | | | 95.912888 | 99.221291 | 94.071571 | 76.050842 | 71.942787 | 67.410858 | 86.489601 |
| 49 | 2020 | 77.947858 | 97.989998 | 72.785622 | | 59.617512 | | | 75.022881 | | ... | 86.849854 | 89.279999 | | 96.224907 | 99.29792 | 94.098381 | 76.412781 | 72.658607 | 67.54763 | 86.71151 |
| 50 | 2021 | 78.278093 | | 72.581161 | 37.266041 | 60.034611 | | | 75.231178 | | ... | 86.919403 | | | 96.251106 | 99.328598 | 94.363098 | 76.78054 | 73.42247 | 67.591927 | 86.852753 |
| 51 | 2022 | 78.540741 | | 72.600403 | | 60.312698 | 72.400002 | 98.5 | 75.171532 | | ... | 87.019661 | | 27.280001 | | 99.349823 | 94.512901 | 77.176361 | 74.187759 | 67.715012 | 87.011749 |


In [3]:
# Drop the column for observations whose 'countryiso3code' was empty in the API
# response: its header was blank in the CSV, so pandas read it as 'Unnamed: 1'
df = df.drop(columns='Unnamed: 1')

# Converting country codes to country names for readability

# Fetch country information (including regional and income aggregates) from the World Bank API
url = "https://api.worldbank.org/v2/country?per_page=300&format=json"
response = requests.get(url, timeout=30)
response.raise_for_status()
countries_data = response.json()

# Map codes to names, and make sure 'WLD' maps to 'World'
countries = {country['id']: country['name'] for country in countries_data[1]}
countries['WLD'] = 'World'
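The code-to-name mapping mixes individual economies with regional and income aggregates (for example `AFE`, `ARB` and `WLD` in the tables here). In the `/v2/country` response, aggregate records appear to carry a `region` whose `value` is `"Aggregates"`. Assuming that holds, the sketch below splits the mapping so aggregates can be analyzed separately from countries; `split_countries` is a hypothetical name and the sample records are simplified.

```python
def split_countries(countries_json):
    """Split /v2/country records into (economies, aggregates) code -> name dicts.

    Assumes aggregate records have region['value'] == 'Aggregates'.
    """
    economies, aggregates = {}, {}
    for record in countries_json:
        # Records without a region are treated as economies
        region = (record.get("region") or {}).get("value", "").strip()
        target = aggregates if region == "Aggregates" else economies
        target[record["id"]] = record["name"]
    return economies, aggregates

sample = [
    {"id": "ABW", "name": "Aruba", "region": {"value": "Latin America & Caribbean"}},
    {"id": "WLD", "name": "World", "region": {"value": "Aggregates"}},
]
economies, aggregates = split_countries(sample)
print(economies)   # {'ABW': 'Aruba'}
print(aggregates)  # {'WLD': 'World'}
```

With the response fetched above, `split_countries(countries_data[1])` would return the two dictionaries.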

In [4]:
print(f'Mapping of Codes to Countries: \n {countries}')

Mapping of Codes to Countries: 
 {'ABW': 'Aruba', 'AFE': 'Africa Eastern and Southern', 'AFG': 'Afghanistan', 'AFR': 'Africa', 'AFW': 'Africa Western and Central', 'AGO': 'Angola', 'ALB': 'Albania', 'AND': 'Andorra', 'ARB': 'Arab World', 'ARE': 'United Arab Emirates', 'ARG': 'Argentina', 'ARM': 'Armenia', 'ASM': 'American Samoa', 'ATG': 'Antigua and Barbuda', 'AUS': 'Australia', 'AUT': 'Austria', 'AZE': 'Azerbaijan', 'BDI': 'Burundi', 'BEA': 'East Asia & Pacific (IBRD-only countries)', 'BEC': 'Europe & Central Asia (IBRD-only countries)', 'BEL': 'Belgium', 'BEN': 'Benin', 'BFA': 'Burkina Faso', 'BGD': 'Bangladesh', 'BGR': 'Bulgaria', 'BHI': 'IBRD countries classified as high income', 'BHR': 'Bahrain', 'BHS': 'Bahamas, The', 'BIH': 'Bosnia and Herzegovina', 'BLA': 'Latin America & the Caribbean (IBRD-only countries)', 'BLR': 'Belarus', 'BLZ': 'Belize', 'BMN': 'Middle East & North Africa (IBRD-only countries)', 'BMU': 'Bermuda', 'BOL': 'Bolivia', 'BRA': 'Brazil', 'BRB': 'Barbados', 'BRN'

In [5]:
# Function to rename columns based on the mapping
def rename_columns(df, mapping):
    # Create a new mapping for the existing columns in the DataFrame
    new_columns = {col: mapping.get(col, col) for col in df.columns}
    # Rename the columns using the new mapping
    df.rename(columns=new_columns, inplace=True)

# Rename the country-code columns in df to country names
rename_columns(df, countries)
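`DataFrame.rename` already leaves columns that are absent from the mapping untouched (its default is `errors='ignore'`), so `rename_columns(df, countries)` behaves like the one-liner `df.rename(columns=countries, inplace=True)`. A small check on a toy frame, where `XYZ` is a hypothetical code not in the mapping:

```python
import pandas as pd

mapping = {"ABW": "Aruba", "WLD": "World"}
# 'XYZ' is a hypothetical code that is not in the mapping
df_demo = pd.DataFrame(columns=["Year", "ABW", "XYZ", "WLD"])

# Columns missing from the mapping ('Year', 'XYZ') are left unchanged
renamed = df_demo.rename(columns=mapping)
print(list(renamed.columns))  # ['Year', 'Aruba', 'XYZ', 'World']
```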

In [6]:
df.tail(10)

| | Year | Aruba | Africa Eastern and Southern | Afghanistan | Africa Western and Central | Angola | Albania | Arab World | Argentina | Armenia | ... | Small states | Eswatini | Chad | East Asia & Pacific (IDA & IBRD countries) | Europe & Central Asia (IDA & IBRD countries) | Latin America & the Caribbean (IDA & IBRD countries) | Middle East & North Africa (IDA & IBRD countries) | South Asia (IDA & IBRD) | Sub-Saharan Africa (IDA & IBRD countries) | World |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | 2013 | | 69.383102 | | 53.002781 | | | 73.276482 | | | ... | 84.960602 | | | 94.432419 | 98.883049 | 92.282288 | 74.810059 | 68.054527 | 62.830429 | 84.860138 |
| 43 | 2014 | | 69.83477 | | 54.00676 | 66.030113 | | 77.577873 | | | ... | 85.230263 | | | 94.892281 | 98.91011 | 92.741333 | 79.083733 | 68.793167 | 63.513378 | 85.433708 |
| 44 | 2015 | | 70.313423 | | 54.818321 | | | 74.981209 | | | ... | 85.797142 | | 26.002991 | 95.098907 | 98.932503 | 92.909889 | 76.516083 | 69.757156 | 64.132278 | 85.60183 |
| 45 | 2016 | | 71.0952 | | 55.43792 | | | 76.386726 | | 99.744408 | ... | 86.150352 | | 22.31155 | 95.262619 | 99.044769 | 93.403793 | 77.691162 | 71.104927 | 64.861397 | 86.061157 |
| 46 | 2017 | | 71.009071 | | 56.485538 | | | 77.170303 | | 99.736069 | ... | 86.305801 | | | 95.429558 | 99.046692 | 93.473358 | 78.762627 | 71.790359 | 65.223244 | 86.288231 |
| 47 | 2018 | | 71.392616 | | 59.568459 | | | 74.286133 | | | ... | 86.471458 | | | 95.529839 | 99.203148 | 93.610336 | 75.668541 | 72.081512 | 66.68364 | 86.33905 |
| 48 | 2019 | | 72.634972 | | 59.511719 | | | 74.603661 | | | ... | 86.671227 | | | 95.912888 | 99.221291 | 94.071571 | 76.050842 | 71.942787 | 67.410858 | 86.489601 |
| 49 | 2020 | 97.989998 | 72.785622 | | 59.617512 | | | 75.022881 | | 99.788612 | ... | 86.849854 | 89.279999 | | 96.224907 | 99.29792 | 94.098381 | 76.412781 | 72.658607 | 67.54763 | 86.71151 |
| 50 | 2021 | | 72.581161 | 37.266041 | 60.034611 | | | 75.231178 | | | ... | 86.919403 | | | 96.251106 | 99.328598 | 94.363098 | 76.78054 | 73.42247 | 67.591927 | 86.852753 |
| 51 | 2022 | | 72.600403 | | 60.312698 | 72.400002 | 98.5 | 75.171532 | | | ... | 87.019661 | | 27.280001 | | 99.349823 | 94.512901 | 77.176361 | 74.187759 | 67.715012 | 87.011749 |


In [7]:
df[['Year', 'World']]

| | Year | World |
|---|---|---|
| 0 | 1970 | |
| 1 | 1972 | |
| 2 | 1973 | |
| 3 | 1974 | |
| 4 | 1975 | |
| 5 | 1976 | 65.586548 |
| 6 | 1977 | 65.878326 |
| 7 | 1978 | 66.501038 |
| 8 | 1979 | 67.155006 |
| 9 | 1980 | 67.583382 |
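The `World` series has no values before 1976, and most country columns are far sparser. Counting the reported years per column, along with the first year that has data, can help decide which series are usable for trend analysis. A sketch with a hypothetical `coverage` helper, using sample values taken from the `World` and `Chad` columns above:

```python
import pandas as pd

def coverage(df, year_col="Year"):
    """Years with a reported value and first reported year, per column."""
    data = df.set_index(year_col)
    return pd.DataFrame({
        "years_reported": data.notna().sum(),
        "first_year": data.apply(lambda s: s.first_valid_index()),
    }).sort_values("years_reported", ascending=False)

# Values copied from the World and Chad columns for 2014-2016
demo = pd.DataFrame({
    "Year": [2014, 2015, 2016],
    "World": [85.433708, 85.60183, 86.061157],
    "Chad": [None, 26.002991, 22.31155],
})
print(coverage(demo))
```

On the full frame, `coverage(df)` ranks every country and aggregate column by how many years it covers.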


### Export the data

In [8]:
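# Overwrite the CSV with the cleaned data; index=False keeps the 0..n row index
# from being written as an extra column (read back as 'Unnamed: 0')
df.to_csv(csv_file_path_literacy_rate, index=False)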
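Writing with `index=False` keeps the positional row index out of the file. Without it, each save/load cycle adds another `Unnamed: 0` column. A quick round-trip check in a temporary directory, where the file name is illustrative and the values come from the `World` column above:

```python
import os
import tempfile

import pandas as pd

df_demo = pd.DataFrame({"Year": [2021, 2022], "World": [86.852753, 87.011749]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "literacy_demo.csv")  # hypothetical file name
    df_demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(list(reloaded.columns))  # ['Year', 'World']
```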