# Namibia's Master Facility List: Obtaining the data via API

A master facility list (MFL) contains the full complement of a country's health facilities, in both the public and private sectors. The list should include the geographic location of each facility along with other attributes such as the type of facility (e.g. hospital, clinic, pharmacy, or other), services rendered, and infrastructure available. The World Health Organisation has published a [guide for the development and strengthening of MFLs](https://www.who.int/healthinfo/country_monitoring_evaluation/mfl/en/).

A number of African countries have started to develop master health facility lists. Namibia, amongst others, has taken the extra step of making the data available online via an application programming interface (API).

As part of work around MFLs in Africa, we accessed the [Namibian MFL](https://mfl.mhss.gov.na/) API using the Python code below. The data will be used in further comparisons with other African MFLs.

In [3]:
# Import packages to access the data
import json
import requests
import pandas as pd
# Needed to ensure we don't access the API too often
from time import sleep

## API access via Python

### SSL trouble

Although the API can be accessed directly from a web browser without errors, the same direct route failed when we tried to download the data in Python or R. The code below, suggested by [Peter van Heusden](https://twitter.com/pvanheus?lang=en), works around the troublesome SSL configuration on the side of the API by skipping certificate verification.
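The notebook's own workaround is the `verify=False` argument passed to `requests.get`. To illustrate what that switch amounts to, here is a minimal sketch of the same idea using only Python's standard library. It is a swapped-in alternative, not the code used in this notebook, and the helper names `make_unverified_context` and `fetch_json_unverified` are hypothetical. Skipping verification means the server's identity is not checked, which is tolerable here only because the data is public.

```python
import json
import ssl
import urllib.request


def make_unverified_context():
    """Return an SSL context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # check_hostname must be switched off before verify_mode can be CERT_NONE
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_json_unverified(url, timeout=30):
    """Hypothetical helper: download `url` and parse the body as JSON."""
    context = make_unverified_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_json_unverified` with the `facilities.json` URL would be the standard-library counterpart of the `requests` call, assuming the SSL problem is limited to certificate verification.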

### Data structure

Two calls to the API are needed to access detailed information about each facility listed in the MFL. The first call gets `facilities.json`, which contains a list of facilities and their locations and, most importantly, gives us the ID of each facility. In a subsequent API call, we use the facility ID to get a JSON file containing detailed information about that facility, including the services and infrastructure available.

The code below downloads a JSON file and parses it into a Python dictionary. The JSON contains several layers of nesting. A single key `locations` contains a list of dictionaries with keys `id`, `parent_id`, `name`, `long_name`, `contact_person`, `phone_number`, `alt_phone_number`, `address`, `catchment_population`, `point_x`, `point_y`, `location_type`, `location_ownership`, `children`. At this level the `id` refers to the ID of the region.

Each entry in `locations` has a `children` key holding a list of dictionaries, and each of those dictionaries in turn has its own `children` list whose entries contain the keys `id`, `parent_id`, `long_name`, `contact_person`, `phone_number`, `alt_phone_number`, `address`, `catchment_population`, `point_x`, `point_y`, `location_type`, `location_ownership`, `children`.

In the first API call we aim to get a list of `locations.children.children.id`, since this is the actual ID of the facility itself.
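To make the nesting concrete, the sketch below builds a tiny stand-in for `facilities.json` and walks it with plain loops. The sample names and IDs are invented for illustration, and the reading of the three levels as region, district and facility is an assumption based on the description above.

```python
# Hypothetical miniature of facilities.json: region -> district -> facility.
# All names and IDs are made up for illustration.
sample = {
    "locations": [
        {
            "id": 1,
            "name": "Region A",
            "children": [
                {
                    "id": 10,
                    "name": "District A1",
                    "children": [
                        {"id": 100, "name": "Clinic X", "children": []},
                        {"id": 101, "name": "Health Centre Y", "children": []},
                    ],
                },
                # A district without facilities has an empty children list
                {"id": 11, "name": "District A2", "children": []},
            ],
        }
    ]
}

# Walk locations -> children -> children and collect the innermost IDs
facility_ids_sample = []
for region in sample["locations"]:
    for district in region.get("children", []):
        for facility in district.get("children", []):
            facility_ids_sample.append(facility["id"])

print(facility_ids_sample)  # [100, 101]
```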

In [4]:
# Code created by Peter van Heusden (helped to overcome SSL problem)

# This URL gives access to rudimentary information about each facility including the ID
url = 'https://mfl.mhss.gov.na/api/facilities.json'
response = requests.get(url, verify=False)
# Data is a python dictionary
data = response.json()



In [5]:
# This function is needed to gain access to the third layer dictionary where we find the 
# facility IDs as explained in the text above
# The code was developed by Brett Mullins and kindly shared on the Internet in June 2019
# https://bcmullins.github.io/parsing-json-python/


def extract_element_from_json(obj, path):
    '''
    Extracts an element from a nested dictionary or
    a list of nested dictionaries along a specified path.
    If the input is a dictionary, a list is returned.
    If the input is a list of dictionary, a list of lists is returned.
    obj - list or dict - input dictionary or list of dictionaries
    path - list - list of strings that form the path to the desired element
    '''
    def extract(obj, path, ind, arr):
        '''
            Extracts an element from a nested dictionary
            along a specified path and returns a list.
            obj - dict - input dictionary
            path - list - list of strings that form the JSON path
            ind - int - starting index
            arr - list - output list
        '''
        key = path[ind]
        if ind + 1 < len(path):
            if isinstance(obj, dict):
                if key in obj.keys():
                    extract(obj.get(key), path, ind + 1, arr)
                else:
                    arr.append(None)
            elif isinstance(obj, list):
                if not obj:
                    arr.append(None)
                else:
                    for item in obj:
                        extract(item, path, ind, arr)
            else:
                arr.append(None)
        if ind + 1 == len(path):
            if isinstance(obj, list):
                if not obj:
                    arr.append(None)
                else:
                    for item in obj:
                        arr.append(item.get(key, None))
            elif isinstance(obj, dict):
                arr.append(obj.get(key, None))
            else:
                arr.append(None)
        return arr
    if isinstance(obj, dict):
        return extract(obj, path, 0, [])
    elif isinstance(obj, list):
        outer_arr = []
        for item in obj:
            outer_arr.append(extract(item, path, 0, []))
        return outer_arr

In [6]:
# Create a list with all facility IDs

facility_ids = extract_element_from_json(data, ["locations", "children", "children", "id"])
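The helper above appends `None` whenever it meets an empty list or a missing key, so the resulting list may contain placeholders that are not real IDs. A minimal sketch of a clean-up step, assuming a flat list as input; the function name `clean_ids` and the sample values are hypothetical.

```python
def clean_ids(raw_ids):
    """Drop None placeholders and duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for value in raw_ids:
        if value is None or value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


raw = [10131, None, 10130, 10131, None, 10129]  # example values only
print(clean_ids(raw))  # [10131, 10130, 10129]
```

Running the IDs through such a filter before the second round of API calls would avoid requesting a URL built from `None`.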

### Getting the actual facility details

Now that we have a list of facility IDs, we can make a second API call for each facility, which returns the following information: `id`, `name`, `long_name`, `contact_person`, `phone_number`, `alt_phone_number`, `address`, `catchment_population`, `point_x`, `point_y`, along with these nested dictionaries and lists:

- `parent_location`, a dictionary with keys `id`, `name`
- `location_type`, a dictionary with key `name`
- `location_ownership`, a dictionary with key `name`
- `infrastructures`, a list of dictionaries, each with keys `id`, `name`
- `services`, a list of dictionaries, each with keys `id`, `name`

`infrastructures` can include Ambulances, Beds, Electricity, Running Water, Toilets, Computers, Vehicles, Enrolled Nurses, Registered Nurses, Doctors, etc.

`services` can include HIV Testing Services, General Clinical Service, Expanded Programme on Immunizations, Preventing Mother to Child Transmission Services, Viral Load Testing, Sexual Transmitted Infections, Anti Retroviral Therapy, Ante Natal Clinic Services, Family Planning Services, and more.
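Because several of these fields are nested, it can help to flatten a record before tabulating it. The sketch below shows one possible way, assuming a record shaped like the description above. The function `flatten_facility` and the sample record are hypothetical and are not taken from an actual API response.

```python
def flatten_facility(record):
    """Flatten one facility record (a dict) into a single-level dict."""
    # Keep every scalar field as-is
    flat = {
        key: value
        for key, value in record.items()
        if not isinstance(value, (dict, list))
    }
    # Pull the interesting values out of the nested dictionaries
    parent = record.get("parent_location") or {}
    flat["parent_location_id"] = parent.get("id")
    flat["parent_location_name"] = parent.get("name")
    flat["location_type"] = (record.get("location_type") or {}).get("name")
    flat["location_ownership"] = (record.get("location_ownership") or {}).get("name")
    # Collapse each list of {id, name} dictionaries into one string of names
    for key in ("infrastructures", "services"):
        flat[key] = "; ".join(item["name"] for item in record.get(key) or [])
    return flat


# Illustrative record only
sample_record = {
    "id": 99999,
    "name": "Example Clinic",
    "catchment_population": 4000,
    "parent_location": {"id": 10598, "name": "Katima Mulilo District"},
    "location_type": {"name": "Facility"},
    "location_ownership": {"name": "Public_MoHSS"},
    "infrastructures": [{"id": 1, "name": "Ambulances"}, {"id": 5, "name": "Electricity"}],
    "services": [{"id": 1, "name": "HIV Testing Services"}],
}

flat = flatten_facility(sample_record)
print(flat["infrastructures"])  # Ambulances; Electricity
```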

In [7]:
# Create empty list to append individual dataframes - 1 dataframe per facility
facility_list = []

# Iterate over all facility IDs identified in previous step
# (named facility_id to avoid shadowing Python's built-in id function)
for facility_id in facility_ids:
    # Construct API call from facility ID identified in earlier step
    facility_url = "https://mfl.mhss.gov.na/api/facilities/id/" + str(facility_id) + ".json"
    
    # Request data from API
    facility_response = requests.get(facility_url, verify=False)
    
    # Convert to dict object
    facility_data = facility_response.json()
    
    # Convert contents of the 'facility_details' key to a dataframe; otherwise each dataframe consists of a single column
    fac_df = pd.DataFrame(facility_data["facility_details"])
    
    # Add new dataframe to list of dataframes
    facility_list.append(fac_df)
    
    # Wait 1 second between requests to API (there are no guidelines about using the API)
    sleep(1)

    
# Convert list of dataframes to single dataframe
all_facilities = pd.concat(facility_list)
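With one request per facility, a single network hiccup would abort the whole loop. A minimal sketch of a retry wrapper follows, written so that it runs offline: `fetch_with_retries` and `flaky_fetch` are hypothetical names, and `flaky_fetch` merely simulates a request that fails twice before succeeding.

```python
import time


def fetch_with_retries(fetch, facility_id, retries=3, delay=1.0):
    """Call fetch(facility_id); on failure wait `delay` seconds and retry.

    Returns the result, or None if every attempt raised an exception.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(facility_id)
        except Exception as error:  # broad on purpose: network, HTTP and JSON errors
            print(f"Attempt {attempt} for facility {facility_id} failed: {error}")
            if attempt < retries:
                time.sleep(delay)
    return None


# Hypothetical stand-in for the real request so the sketch runs offline
calls = {"count": 0}


def flaky_fetch(facility_id):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated timeout")
    return {"id": facility_id}


result = fetch_with_retries(flaky_fetch, 10131, retries=3, delay=0)
print(result)  # {'id': 10131}
```

In the loop above, `fetch` would be a small function that wraps the `requests.get(...)` call and returns `facility_response.json()`; facilities for which `None` comes back could be logged and retried later.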

In [8]:
# Check that we actually have a row for each facility
len(all_facilities) == len(facility_ids)

True

In [9]:
# Check the data formatting - make sure this is something sensible
all_facilities.head()

| | address | alt_phone_number | catchment_population | contact_person | id | infrastructures | location_ownership | location_type | long_name | name | parent_location | phone_number | point_x | point_y | services |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 0 | | | 11981 | [] | {'name': 'Public_MoHSS'} | {'name': 'Facility'} | | Zambezi Regional Health Office | {'id': 10598, 'name': 'Katima Mulilo District'} | | -17.4994 | 24.2788 | [] |
| 0 | | 0 | 4111.0 | | 10131 | [{'id': 1, 'name': 'Ambulances'}, {'id': 2, 'n... | {'name': 'Public_MoHSS'} | {'name': 'Facility'} | | Sibbinda Health Centre | {'id': 10598, 'name': 'Katima Mulilo District'} | | -17.7851 | 23.8212 | [{'id': 1, 'name': 'HIV Testing Services'}, {'... |
| 0 | | 0 | 5156.0 | | 10130 | [{'id': 5, 'name': 'Electricity'}, {'id': 6, '... | {'name': 'Public_MoHSS'} | {'name': 'Facility'} | | Sesheke Clinic | {'id': 10598, 'name': 'Katima Mulilo District'} | | -17.7518 | 23.3975 | [{'id': 1, 'name': 'HIV Testing Services'}, {'... |
| 0 | | 0 | 1844.0 | | 10129 | [{'id': 4, 'name': 'Mortuary'}, {'id': 5, 'nam... | {'name': 'Public_MoHSS'} | {'name': 'Facility'} | | Schuckmansburg Clinic | {'id': 10598, 'name': 'Katima Mulilo District'} | | -17.5487 | 24.815 | [{'id': 1, 'name': 'HIV Testing Services'}, {'... |
| 0 | | 0 | 4791.0 | | 10128 | [{'id': 1, 'name': 'Ambulances'}, {'id': 2, 'n... | {'name': 'Public_MoHSS'} | {'name': 'Facility'} | | Sangwali Health Centre | {'id': 10598, 'name': 'Katima Mulilo District'} | | -18.2649 | 23.6375 | [{'id': 1, 'name': 'HIV Testing Services'}, {'... |


In [10]:
# Write df to CSV for further analysis alongside other African MFLs in the R script

all_facilities.to_csv('../data/raw_data/namibia.csv', index=False, encoding='utf-8')
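One thing to keep in mind for later analysis: the nested columns (`infrastructures`, `services`, `location_type` and so on) are written to the CSV as the text form of Python lists and dictionaries. The sketch below shows how such cells could be turned back into Python objects with `ast.literal_eval`, should the file be re-read in Python. The CSV text is a two-line illustration in the style of the output above, not the real file.

```python
import ast
import csv
import io

# A small CSV in the style of the file written above (values are illustrative)
csv_text = """id,location_type,services
10131,{'name': 'Facility'},"[{'id': 1, 'name': 'HIV Testing Services'}]"
"""

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # literal_eval safely parses Python literals such as dicts and lists
    row["location_type"] = ast.literal_eval(row["location_type"])
    row["services"] = ast.literal_eval(row["services"])
    rows.append(row)

print(rows[0]["services"][0]["name"])  # HIV Testing Services
```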