# Notebook for the analysis of tHFA data

## Version control

Changes made:

- Version 2.11
    - Vacancy: Added breakdowns for all the different categories
- Version 2.10
    - CLM: Added the skip logic expected in data cleaning to the function, based on availability of data enumerators and HW awareness of CLM
- Version 2.9
    - LAB: Change the calculation to assign a value of 0 to facilities without labs
    - LAB: Include columns to show the percentage of facilities in the three categories (with labs, testing w/o labs, no testing)
- Version 2.8
    - CLM: Changed code to collect more data for CLM on each of the six statements
    - testing capacity: Changed code to get results for all the 28 tests
    - CHW: Added an output showing CHWs with any part of a contract
    - Oxy: Included breakdown for all the different components
- Version 2.7
    - Changed the number of records reviewed back to being based on the number of records, as it is not possible to have blank records. This should only be applied to CHW interviews and patient interviews
- Version 2.6
    - Changed all record reviews and interviews to loop over all 5 slots rather than the number of recorded sessions
    - Revised vacancy scores to be nan when both the numerator and denominator are 0
- Version 2.5
    - patient-centeredness: removed the begin criteria so that all 5 patient interviews are checked for their values
    - Updated data for UGA and LBR
- Version 2.4
    - facility composite: Changed TB treatment to treat not documented and client still on treatment as np.nan.
- Version 2.3
    - ISS: Removed places where supervision occurred more than 3 months ago.
    - Gestational age: Changed so that 98 (not recorded) returns 0
    - Guidelines: Added an AND condition for TB guidelines so that both the child and adult guidelines must be present
    - Composite score/DPT: Capped DPT score at 100
    - Included location rows to the pivot table output
- Version 2.2
    - Patient centeredness: Changed the no-problem function to be applied whenever the response contains "Other_(Specify)", not only when it is equal to it.
    - Patient centeredness: Included question averaging in the output
    - Patient centeredness: Improved the treatment of nan/'' for questions 'D004' and 'D011'
    - Updated data to remove tertiary and secondary facilities with testing without labs
- Version 2.1
    - Guidelines: Separated the guide_hiv sub-domain to guide-hiv-test and guide_ART
    - Vacancy rate: Capped the score at 100 for places where the unfilled is larger than the funded posts
    - System-CHW: Included number of CHWs per facility
- Version 2.0
    - After verification of data with TGF
- Version 1.4
    - High Quality Services:
        - Patient-centeredness: changed the way that challenges were processed to include the list of responses that are not challenges
- Version 1.3 (Aug 9, 2024)
    - High Quality Services:
        - Patient-centeredness: Averaged the responses per client first and then averaged client scores for facilities
    - ISS indicator:
        - technical content: Fixed an error in this indicator caused by a faulty estimate of whether facilities conduct HTM; the estimate used the sum of ART, HIV-testing, TB, and malaria.
        - Summary Stats: Treated "DNK" as NaN instead of 0
    - CHW indicator:
        - ISS: Specified that a session counts as ISS if HTM is covered in it; if the CHW does not support HTM, a session covering at least 2 diseases counts as ISS.
        - contract and tools: "DNK" are treated as "No" as an interim step from TGF (As per conversation with Pankhuri on Aug 9, 2024)
    - Oxygen services: Indicator now counts only "Observed" as 100 (removed "Reported, Not seen")
    - Lab testing: Indicator now counts "DNK" as "No"
- Version 1.2 (Aug 5, 2024)
    - Fixed the ISS score for CHWs always being 100. The formula was wrong and did not actually compare expected service areas with ISS service areas per month
    - Fixed the implementation of any(). This fix also caused small changes in the group-PS dimension of the ISS indicator.
- Version 1.1
    - Updated the outputs to include the underlying attributes
    - Used the QCed excel files provided by Devashish
    - Included a function to make the necessary pivot tables


- Version 1.0
    - Version used to start writing the report for tHFA in Malawi
    - Version shared with TGF for further comparison on Wed July 31
- Version 0.9
    - Updated oxygen function to correct formulas based on the tHFA Annex 1
    - Updated guidelines function to correct formula (there was a problem with using "value == target1 or target2")
    - Minor changes to facility score indicator
    - Changed high quality header name to "Quality_Score"
- Version 0.8
    - Included testing capacity indicator
- Version 0.7
    - Included all indicators except testing capacity
    - Sent to TGF for comparison


In [1]:
# input cells
code = 'IDN'
version = '2.11'

## Data loading

In [2]:
# file creation
countries = {
    'GHA': 'ghana',
    'GMB': 'gambia',
    'IDN': 'indonesia',
    'LBR': 'liberia',
    'MDG': 'madagascar',
    'MWI': 'malawi',
    'NGA': 'nigeria',
    'UGA': 'uganda',
    'COG': 'congo',
    'MOZ': 'mozambique'
}

country = countries[code]

import os
os.makedirs(f'./output/{code}/',exist_ok=True)

file_path = f"./Updated tHFA data/tHFA_{code}.xlsx"

pivot_out = f"./output/{code}/{country}_pivot_v{version}.csv"
indicators_out = f'./output/{code}/tHFA-{country}-analysis_v{version}.csv'
map_out = f"./output/{code}/{code}-facility-points.csv"

In [3]:
# package import and data loading
import pandas as pd
import numpy as np
import math

# Step 1: Load the Excel file into a pandas DataFrame (the header row is promoted below)

df = pd.read_excel(file_path, header=None)
metadata = dict(zip(df.iloc[1], df.iloc[0]))
df.columns = df.iloc[0]
df = df[1:]
df.set_index('Q100', inplace=True)
survey_details_df = df.iloc[:, :11]
facility_details_df = pd.concat([df.iloc[:, 11:23], df.iloc[:, 29:37]], axis=1)
facility_details_df.to_csv(map_out)
facility_details_df = facility_details_df.drop(columns=['FI_2','FI_3',"Q100a",'Q106_a','Q109-Latitude','Q109-Longitude','Q109-Altitude', 'Q109-Accuracy'])


test = df[:5]
test_fac_df = facility_details_df[:5]
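
The loading cell reads the sheet with `header=None`, promotes the first row to column names, and indexes the data on `Q100`. A minimal sketch of the same pattern on a small in-memory frame is shown here; the facility IDs and values are hypothetical and stand in for the Excel file.

```python
import pandas as pd

# Hypothetical raw sheet as returned by read_excel(..., header=None):
# row 0 holds the column codes, the remaining rows hold facility data
raw = pd.DataFrame([
    ["Q100", "R1810", "R2400"],
    ["FAC_001", "YES", "NO"],
    ["FAC_002", "NO", "YES"],
])

tidy = raw.iloc[1:].copy()      # drop the header row from the data
tidy.columns = raw.iloc[0]      # use the first row as column names
tidy = tidy.set_index("Q100")   # index facilities by their ID column

print(tidy)
```

Working on a copy avoids modifying the raw frame in place, which keeps the original rows available if metadata needs to be derived from them later.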


## Indicator and service functions

### KPI Indicators

In [4]:
# Processing of no problems
file_name = "./reasons/no-problem-reasons_all.txt"
with open(file_name, 'r') as file:
    no_problems = file.read().splitlines()

def problem_check(reason):
    """
    Function to check if a given reason is in the no_problems list.
    
    Parameters:
    reason (str): The reason to check.
    
    Returns:
    int: 100 if the reason is in no_problems, 0 otherwise.
    """
    # Ensure the input reason is treated as a string
    see = str(reason)
    
    # Convert the string to lowercase
    see = see.lower()
    
    # Check if the lowercase string is in the no_problems list
    answer = 100 if see in no_problems else 0
    return answer
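
`problem_check` lowercases the incoming reason but compares it with the file lines exactly as they were read, so an entry containing capital letters or trailing whitespace would never match. Whether the reasons file contains such entries is not known here; the sketch below only illustrates one way to make the comparison robust. The list contents and the name `problem_check_normalised` are hypothetical.

```python
# Hypothetical stand-in for the lines read from the reasons file
no_problem_reasons = ["No problem", " none ", "nothing"]

# Normalise once: strip whitespace and lowercase, then store in a set
no_problem_set = {r.strip().lower() for r in no_problem_reasons}

def problem_check_normalised(reason):
    """Return 100 if the normalised reason is a known 'no problem' answer, else 0."""
    return 100 if str(reason).strip().lower() in no_problem_set else 0

print(problem_check_normalised("NO PROBLEM "))
print(problem_check_normalised("long waiting time"))
```

A set also makes each lookup constant-time, although with a short list the difference is negligible.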



In [5]:
# Function for high quality service
def calculate_service_quality(row):
    
    """

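    High quality services score for one facility (one row).

    Averages two components:
    - HW competence: record-review scores for ANC, ART and TB clients,
      averaged per client first and then across clients.
    - Patient-centeredness: scores from the 5 patient interview slots.

    Returns a pandas Series with the overall Quality_Score, the component
    scores, and per-question patient-centeredness averages.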
    """

    def calculate_column_score(value, column):
        if column in ['tHFA_D004']:
            if (value == 'YES' and row[f'tHFA_D005_{i}'] != "DON’T KNOW"):
                return 100  
            elif value == "NO":
                return 0
            elif (value == 'YES' and row[f'tHFA_D005_{i}'] == "DON’T KNOW"):
                return 0
            elif value == "DON’T KNOW":
                return np.nan
            else:
                # Missing or unrecognised responses score as NaN; calling
                # float() on an arbitrary string here would raise ValueError
                return np.nan
        elif column == 'tHFA_D006':
            if value == 'YES':  return 0 
            elif value == 'NO': return 100
            else: return np.nan
        elif column == 'tHFA_D011':
            value = str(value)
            if 'OTHERS_(SPECIFY)' in value:
                return problem_check(row[f'tHFA_D011_1_{i}'])
            elif value == "DON'T_KNOW" or value == 'nan':
                return np.nan
            else: return 0
        elif column in ['PREQ132', 'PREQ122', 'PREQ125', 'PREQ119', 'PREQ129']:
            scale = {'NEVER': 0, 'RARELY': 25, 'SOMETIMES': 50, 'OFTEN': 75, 'ALWAYS': 100}
            return scale.get(value, np.nan)
        else:
            if value == 'YES':
                return 100 
            elif value == 'NO':
                return 0
            else: return np.nan

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_ANC = 1 if row['R1810'] == 'YES' else 0



    # Calculate HW competence scores
    # ANC scores
    hw_scores = []
    anc_scores = []
    tb_scores = []
    art_scores = []

    if provides_ANC > 0:
        anc_count = sum (1 for col in [f'RR_ANC_Begin_{a}' for a in range(1,6)] if row[col] == 'YES')
        
        for i in range(1, anc_count + 1):
            client_anc_scores = []
            # Attribute 1: HIV testing and counselling
            if row[f'Q13009_1_{i}'] == 'YES':
                client_anc_scores.append(np.nan)
            elif row[f'tHFA_F003_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'tHFA_F003_{i}'] == 'NO':
                client_anc_scores.append(0)
            else:
                client_anc_scores.append(np.nan)

            # Attribute 2: Access to ART
            if row[f'tHFA_F001_{i}'] == 'NO':
                client_anc_scores.append(np.nan)
            elif row[f'tHFA_F001_{i}'] == 'YES' and row[f'tHFA_F004_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'tHFA_F001_{i}'] == 'YES' and row[f'tHFA_F004_{i}'] == 'NO':
                client_anc_scores.append(0)
            else:
                client_anc_scores.append(np.nan)

            # Attribute 3: Access to iptp
            if row[f'Q13007_3_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'Q13007_3_{i}'] == 'NO':
                client_anc_scores.append(0)
            else: 
                client_anc_scores.append(np.nan)

            # Attribute 4: Access to TB screening
            tb_values = [row[f'TBSF002_A_{i}'],
                         row[f'TBSF002_B_{i}'], 
                         row[f'TBSF002_C_{i}'],
                         row[f'TBSF002_D_{i}']]
            if all(val == 'YES' for val in tb_values):
                client_anc_scores.append(100)
            else:
                client_anc_scores.append(0)
            
            # Attributes 5, 7, 8, 9, 10, 11
            # Integrated ANC and SRH, Blood pressure, Folic acid, Danger signs
            # Intestinal worms, and hemoglobin
            q = ['13005_1',
                    '13004_8',
                    '13004_10',
                    '13004_11',
                    '13005_2',
                    '13005_5']
            
            for attr in q: 
                if row[f'Q{attr}_{i}'] == 'YES':
                    client_anc_scores.append(100)
                elif row[f'Q{attr}_{i}'] == 'NO':
                    client_anc_scores.append(0)
                else: 
                    client_anc_scores.append(np.nan)

            # Attribute 6: Gestational age (98 = not recorded)
            gest_age = pd.to_numeric(row[f'Q13004_1_{i}'], errors='coerce')
            if pd.isna(gest_age):
                # int(NaN) raises ValueError, so handle missing values first
                client_anc_scores.append(np.nan)
            elif int(gest_age) == 98 or int(gest_age) < 32:
                client_anc_scores.append(0)
            else:
                client_anc_scores.append(100)
            
            client_anc_score = np.nanmean(client_anc_scores)
            anc_scores.append(client_anc_score)

    if provides_ART > 0:
        art_count = sum (1 for col in [f'RR_ART_Begin_{a}' for a in range(1,6)] if row[col] == 'YES')
        
        for i in range(1, art_count+1):
            client_art_scores = []
            # Attribute 1: TB screening
            tb_values = [row[f'Q13408_01_{i}'],
                        row[f'Q13408_02_{i}'], 
                        row[f'Q13408_03_{i}'],
                        row[f'tHFAG003_{i}']]
            if all(val == 'YES' for val in tb_values):
                client_art_scores.append(100)
            else:
                client_art_scores.append(0)
            
            # Attribute 2: TB treatment for ART patients
            if row[f'Q13408_08_{i}'] == 'ACTIVE TB':
                client_art_scores.append(100 if row[f'Q13408_09_{i}'] == 'YES' else 0)
            else:
                client_art_scores.append(np.nan)

            # Attribute 3: BP measured
            bp = 'tHFAG001'
            if row[f'{bp}_{i}'] == 'YES':
                client_art_scores.append(100)
            elif row[f'{bp}_{i}'] == 'NO':
                client_art_scores.append(0)
            else: 
                client_art_scores.append(np.nan)
            
            client_art_score = np.nanmean(client_art_scores)
            art_scores.append(client_art_score)
        
    if provides_TB >0 :
        tb_count = sum (1 for col in [f'RR_TB_Begin_{a}' for a in range(1,6)] if row[col] == 'YES')

        for i in range(1,tb_count+1):
            client_tb_scores=[]

            # Attribute 1 and 2:
            # HIV test and symptoms
            tb = ['Q13507_01',
                'Q13510_01']
        
            for attr in tb: 
                if row[f'{attr}_{i}'] == 'YES':
                    client_tb_scores.append(100)
                elif row[f'{attr}_{i}'] == 'NO':
                    client_tb_scores.append(0)
                else: 
                    client_tb_scores.append(np.nan)
            
            client_tb_score = np.nanmean(client_tb_scores)
            tb_scores.append(client_tb_score)
    
    
    anc_score = np.nanmean(anc_scores) if len(anc_scores) > 0 else np.nan
    art_score = np.nanmean(art_scores) if len(art_scores) > 0 else np.nan
    tb_score = np.nanmean(tb_scores) if len(tb_scores) > 0 else np.nan
    hw_scores = anc_scores + art_scores + tb_scores
    hw_score = np.nanmean(hw_scores) if len(hw_scores) > 0 else np.nan
   

    # Calculate Patient centeredness scores
    patient_centeredness_columns = [
        'PREQ132', # Communication
        'PREQ122', # Respect
        'PREQ125', # Autonomy
        'PREQ119', # Confidentiality
        'PREQ129', # Social support
        'tHFA_D003', # HW told
        'tHFA_D004', # Danger signs
        'tHFA_D006', # Rudeness
        'tHFA_D011' # Challenges
    ]
    patient_count = sum(1 for col in [f'PE_Begin_{i}' for i in range(1,6)] if row[col] == 'YES')
    pc_scores = []
    pc_scores_attr = []
    
    # for i in range(1, patient_count + 1):
    for i in range(1, 6):
        pc_i_scores = []
        for col in patient_centeredness_columns:
            interview_col = f'{col}_{i}'
            if interview_col in row:
                attr_score = calculate_column_score(row[interview_col], col)
                pc_i_scores.append(attr_score)
                pc_scores_attr.append(attr_score)

    
        pc_i_score = np.nanmean(pc_i_scores) if len(pc_i_scores) > 0 else np.nan
        pc_scores.append(pc_i_score)
    
    pc_score = np.nanmean(pc_scores) if len(pc_scores) > 0 else np.nan
    
    # Calculate facility score
    facility_score = np.nanmean([hw_score, pc_score])

    pc_attr_avg = {}
    ques = [
            'Communication',
            'Respect',
            'Autonomy',
            'Confidentiality' ,
            'Social support',
            'HW told',
            'Danger signs',
            'Rudeness',
            'Challenges',
    ]

    for i in range(9):
        indexes =  [j for j in range(i,len(pc_scores_attr),9)]
        values = [pc_scores_attr[idx] for idx in indexes]
        pc_attr_avg[f"PC_avg_{ques[i]}"] = np.nanmean(values)

    result = {
        'Quality_Score': facility_score,
        'HW_Competence_Score': hw_score,
        'HW_Check': hw_scores,
        'ANC_score': anc_score,
        'ART_score': art_score,
        'TB_score': tb_score,
        'Patient_Centeredness_Score': pc_score,
        'PC_Individuals': pc_scores,
        'PC_attributes': pc_scores_attr        
    }

    result.update(pc_attr_avg)

    return pd.Series(result)

    
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
# test = pd.concat([test, df.loc[['COG_000100',
# 'COG_000121',
# 'COG_000268']]])
# see = test.apply(calculate_service_quality, axis=1)
# see
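
The per-question averages in `calculate_service_quality` are built by taking every 9th entry of the flat `pc_scores_attr` list. The sketch below shows the same idea with slice notation on a shortened, made-up example with 3 attributes instead of 9. It assumes every interview contributes exactly one entry per attribute; if the `if interview_col in row` guard ever skipped a column, the positions would no longer line up.

```python
import numpy as np

n_attrs = 3  # the notebook uses 9 attributes; 3 keeps the example short
labels = ["Communication", "Respect", "Autonomy"]

# Flat list in interview order:
# [int1_attr1, int1_attr2, int1_attr3, int2_attr1, int2_attr2, int2_attr3]
flat_scores = [100, 50, np.nan,
               0, 50, 100]

# Every n_attrs-th element starting at position k belongs to attribute k
attr_avg = {
    f"PC_avg_{labels[k]}": float(np.nanmean(flat_scores[k::n_attrs]))
    for k in range(n_attrs)
}

print(attr_avg)
```

Collecting scores in a dictionary keyed by attribute name would remove the dependence on list position altogether, at the cost of changing the shape of the `PC_attributes` output.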


In [6]:
# Function for calculation of integrated HTM for ANC clients 

def calculate_HTM_ANC(row):
    
    """

    Pregnant women receive the following services in an integrated fashion: 
    HIV counseling and testing, access to ARVs for HIV positive women, IPTp, 
    and TB screening.


    """

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_malaria = 1 if row['R1400'] == 'YES' else 0
    provides_hiv_test = 1 if row['R2300'] == 'YES' else 0
    provides_anc = 1 if row['R1810'] == 'YES' else 0
    provides_testing = 1 if row['R2900'] == 'YES' else 0
    provides_oxygen = 1 if row["R1323"] == 'YES' else 0

       
    htm_anc_scores = []
    hiv_testing_scores = []
    ART_access_scores = []
    iptp_access_scores = []
    TB_screening_scores = []
    if provides_anc > 0:
        anc_count = sum(1 for col in ['RR_ANC_Begin_1', 'RR_ANC_Begin_2', 'RR_ANC_Begin_3', 'RR_ANC_Begin_4', 'RR_ANC_Begin_5'] if row[col] == 'YES')
        for i in range(1, anc_count+1):            
            client_anc_scores = []
            # Attribute 1: HIV testing and counselling
            if row[f'Q13009_1_{i}'] == 'YES':
                client_anc_scores.append(np.nan)
            elif row[f'tHFA_F003_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'tHFA_F003_{i}'] == 'NO':
                client_anc_scores.append(0)
            else:
                client_anc_scores.append(np.nan)
            
            hiv_testing_scores.append(client_anc_scores[-1])

            # Attribute 2: Access to ART
            if row[f'tHFA_F001_{i}'] == 'NO':
                client_anc_scores.append(np.nan)
            elif row[f'tHFA_F001_{i}'] == 'YES' and row[f'tHFA_F004_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'tHFA_F001_{i}'] == 'YES' and row[f'tHFA_F004_{i}'] == 'NO':
                client_anc_scores.append(0)
            else:
                client_anc_scores.append(np.nan)
            ART_access_scores.append(client_anc_scores[-1])

            # Attribute 3: Access to iptp
            if row[f'Q13007_3_{i}'] == 'YES':
                client_anc_scores.append(100)
            elif row[f'Q13007_3_{i}'] == 'NO':
                client_anc_scores.append(0)
            else: 
                client_anc_scores.append(np.nan)
            iptp_access_scores.append(client_anc_scores[-1])

            # Attribute 4: Access to TB screening
            tb_values = [row[f'TBSF002_A_{i}'],
                         row[f'TBSF002_B_{i}'], 
                         row[f'TBSF002_C_{i}'],
                         row[f'TBSF002_D_{i}']]
            if all(val == 'YES' for val in tb_values):
                client_anc_scores.append(100)
            else:
                client_anc_scores.append(0)
            TB_screening_scores.append(client_anc_scores[-1])

            client_anc_score = np.nanmean(client_anc_scores)
            htm_anc_scores.append(client_anc_score)
        

        htm_anc_score = np.nanmean(htm_anc_scores)

    else:
        htm_anc_score = np.nan

    hiv_testing_score = np.nanmean(hiv_testing_scores) if len(hiv_testing_scores)>0 else np.nan 
    ART_access_score = np.nanmean(ART_access_scores) if len(ART_access_scores)>0 else np.nan 
    iptp_access_score = np.nanmean(iptp_access_scores) if len(iptp_access_scores)>0 else np.nan 
    TB_screening_score = np.nanmean(TB_screening_scores) if len(TB_screening_scores)>0 else np.nan 

    return pd.Series({
        'HTM-ANC': htm_anc_score,
        # 'htm-anc-check': htm_anc_scores,
        'hiv testing': hiv_testing_score,
        'ART access': ART_access_score,
        'iptp scores': iptp_access_score,
        'TB screening': TB_screening_score
        })

# see = test.apply(calculate_HTM_ANC, axis=1)
# see
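
To make the per-client logic of `calculate_HTM_ANC` easier to follow, here is a condensed sketch for a single client. The record is invented for illustration and the helper `yes_no_score` is not part of the notebook; the column names and skip conditions follow the function above.

```python
import numpy as np

def yes_no_score(value):
    """Map 'YES' to 100, 'NO' to 0, and anything else to NaN."""
    return {"YES": 100, "NO": 0}.get(value, np.nan)

# Hypothetical record for ANC client 1 (not real survey data)
record = {
    "Q13009_1_1": "NO", "tHFA_F003_1": "YES",
    "tHFA_F001_1": "NO", "tHFA_F004_1": None,
    "Q13007_3_1": "NO",
    "TBSF002_A_1": "YES", "TBSF002_B_1": "YES",
    "TBSF002_C_1": "YES", "TBSF002_D_1": "NO",
}

# Attribute 1: skipped (NaN) when Q13009_1 is 'YES'
hiv = np.nan if record["Q13009_1_1"] == "YES" else yes_no_score(record["tHFA_F003_1"])
# Attribute 2: only scored when tHFA_F001 is 'YES'
art = yes_no_score(record["tHFA_F004_1"]) if record["tHFA_F001_1"] == "YES" else np.nan
# Attribute 3: plain yes/no
iptp = yes_no_score(record["Q13007_3_1"])
# Attribute 4: all four TB screening items must be 'YES'
tb = 100 if all(record[f"TBSF002_{c}_1"] == "YES" for c in "ABCD") else 0

client_score = np.nanmean([hiv, art, iptp, tb])
print(client_score)
```

Note that the TB screening attribute never yields NaN: a missing answer on any of the four items scores 0, unlike the other three attributes.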

In [7]:
# Function to calculate system readiness

def calculate_system_CHW(row):
    
    """

   KPI S5: “Systems readiness for CHWs”


    """

    chw_count = sum(1 for col in ['CHW_begin_1', 'CHW_begin_2', 'CHW_begin_3', 'CHW_begin_4', 'CHW_begin_5'] if row[col] == 'YES')
    
    # add consent E002_1
    paid_columns = ['tHFA_E020','tHFA_E021', 'tHFA_E022', 'tHFA_E023']

    chw_scores = []

    chw_atttributes = {
            'fac_supvsn_scores': [],
            'fac_contract_scores': [],
            'fac_contract_any_scores': [],
            'fac_paid_scores': [],
            'fac_jobtools_scores': [],
    }

    for i in range(1, 6):
        chw_i_score = np.nan
        chw_i_scores = []

        if row[f'tHFA_E002_{i}'] != 'YES':
            continue

        # Attribute 1 (ISS)
        expected = set([f'{b}' for b in ['A','B', 'C', 'D','E'] if row[f'tHFA_E003_{b}_{i}'] == 'YES'])
        month_1 = set([f'{b}' for b in ['A','B', 'C', 'D','E'] if row[f'tHFA_E005_{b}_{i}'] == 'YES'])
        month_2 = set([f'{b}' for b in ['A','B', 'C', 'D','E'] if row[f'tHFA_E007_{b}_{i}'] == 'YES'])
        month_3 = set([f'{b}' for b in ['A','B', 'C', 'D','E'] if row[f'tHFA_E009_{b}_{i}'] == 'YES'])
        htm = set(['A','B','C'])
        if htm.issubset(expected):
            chw_iss = 100 if any([htm.issubset(month_1),htm.issubset(month_2),htm.issubset(month_3)]) else 0
        else:
            chw_iss = 100 if any([len(expected.intersection(month_1))>= 2, len(expected.intersection(month_2))>= 2, len(expected.intersection(month_3))>= 2 ]) else 0
        chw_i_scores.append(chw_iss)
        chw_atttributes['fac_supvsn_scores'].append(chw_iss)

        # Attribute 2 (Presence of contract)
        contract = [f'tHFA_E0{b}_{i}' for b in range(10,20)]
        contract_scores = []
        for question in contract:
            if row[question]=='YES': contract_scores.append(100)
            # elif row[question]=='NO': contract_scores.append(0) # To treat 'DNK' as 0 
            else: contract_scores.append(0) # To treat 'DNK' as 0
        chw_contract = 100 if np.nanmean(contract_scores) == 100 else 0
        chw_contract_any = 100 if np.nanmean(contract_scores) > 0 else 0
        chw_i_scores.append(chw_contract)
        chw_atttributes['fac_contract_scores'].append(chw_contract)
        chw_atttributes['fac_contract_any_scores'].append(chw_contract_any)
        
        # Attribute 3 (Payment according to contract)
        paid_score = 0 
        for col in paid_columns:
            interview_col = f'{col}_{i}'
            if row[interview_col] == 'YES': 
                paid_score = paid_score + 1
        chw_paid = 100 if paid_score == 4 else 0
        chw_i_scores.append(chw_paid)
        chw_atttributes['fac_paid_scores'].append(chw_paid)

        # Attribute 4 (Job tools)
        tools = [f'tHFA_E0{b}_{i}' for b in [25,27,29]]
        tools_scores = []
        for question in tools:
            if row[question]=='YES': tools_scores.append(100)
            # elif row[question]=='NO': tools_scores.append(0) # To treat 'DNK' as 0 
            else: tools_scores.append(0) # To treat 'DNK' as 0 
        chw_tools = 100 if np.nanmean(tools_scores) == 100 else 0
        chw_i_scores.append(chw_tools)
        chw_atttributes['fac_jobtools_scores'].append(chw_tools)

        chw_i_score = np.nanmean(chw_i_scores)
        chw_scores.append(chw_i_score)

    chw_score = np.nanmean(chw_scores)
    
    chw_means = {}

    # Loop through the dictionary and calculate the mean for each score list
    for key, value in chw_atttributes.items():
        mean_key = key.replace('_scores', '_score')  # Create a new key for the mean
        chw_means[mean_key] = np.nanmean(value)


    return pd.Series({
        'system_CHW_score': chw_score,
        'chw_count': chw_count,
        'chw_show': chw_scores,
        'chw_iss': chw_means['fac_supvsn_score'],
        'chw_contract': chw_means['fac_contract_score'],
        'chw_contract_any': chw_means['fac_contract_any_score'],
        'chw_paid': chw_means['fac_paid_score'],
        'chw_tools': chw_means['fac_jobtools_score'],
        })

# see = test.apply(calculate_system_CHW, axis=1)
# see
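
The ISS attribute for CHWs rests on two set comparisons. The sketch below isolates that rule in a small standalone function so it can be checked against hand-worked cases; `chw_iss_score` and its arguments are illustrative names, not part of the notebook.

```python
def chw_iss_score(expected, months):
    """Score 100 if any monthly session covered the required service areas.

    expected: set of service-area codes the CHW is expected to cover
    months:   list of sets, one per supervision month
    """
    htm = {"A", "B", "C"}
    if htm <= expected:
        # All three HTM areas expected: one session must cover all of them
        met = any(htm <= m for m in months)
    else:
        # Otherwise one session must cover at least two expected areas
        met = any(len(expected & m) >= 2 for m in months)
    return 100 if met else 0

print(chw_iss_score({"A", "B", "C", "D"}, [{"A", "B"}, {"A", "B", "C"}, set()]))
print(chw_iss_score({"A"}, [{"A"}, set(), set()]))
```

One consequence visible in the second call: a CHW expected to cover a single service area can never reach the two-area threshold and therefore always scores 0 on this attribute.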

In [8]:
# Function for integrated supportive supervision

def calculate_ISS(row):
    
    """

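    Integrated supportive supervision (ISS) score for one facility (one row).

    Attribute 1 checks whether supervision took place in the past 3 months.
    If it did, four further attributes are scored and averaged: technical
    content, summary statistics discussed, group problem solving, and data
    on community activities.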
    """

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_immu = 1 if row['R2100'] == 'YES' else 0
    provides_imci = 1 if row['R2000'] == 'YES' else 0
    provides_malaria = 1 if row['R1400'] == 'YES' else 0
    provides_hiv_test = 1 if row['R2300'] == 'YES' else 0
    provides_anc = 1 if row['R1810'] == 'YES' else 0
    provides_testing = 1 if row['R2900'] == 'YES' else 0
    provides_oxygen = 1 if row["R1323"] == 'YES' else 0
    provides_hiv = 1 if provides_hiv_test + provides_ART > 0 else 0

    htm = [provides_hiv,provides_TB,provides_malaria]
    # Named all_services to avoid shadowing the built-in all()
    all_services = [provides_hiv,provides_TB,provides_malaria,provides_anc,provides_imci,provides_immu]
    fac_tech =[]
    fac_stats = []
    fac_group_PS = []
    fac_comms_data = []
    iss_scores = []
    
    # Attribute 1 (Supervision in the past 3 months)
    if row['M610'] == 'NO':
        iss_score = 0
        fac_iss = 0

    elif row['M610'] == 'YES' and row['M611'] in ['4-12 MONTHS AGO', 'MORE THAN 12 MONTHS AGO']:
        iss_score = 0
        fac_iss = 0

    elif row['M610'] == 'YES' and row['M611'] in ['DON’T KNOW', 'nan']:
        iss_score = np.nan
        fac_iss = np.nan

    elif row['M610'] == 'YES' and row['M611'] in ['WITHIN THE PAST MONTH', '2-3 MONTHS AGO']:
        fac_iss = 100
        
        # Attribute 2 (Technical content)
        if sum(htm) == 3:
            iss_done = sum(1 if any([row[f'tHFA_C00{i}_{j}'] == 'YES' for i in range(1, 7)]) else 0 for j in ['A', 'B', 'C'])
            iss_scores.append(100 if iss_done == 3 else 0)
        elif sum(all_services) >= 3:
            iss_done = sum(1 if any([row[f'tHFA_C00{i}_{j}'] == 'YES' for i in range(1, 7)]) else 0 for j in ['A', 'B', 'C','D','E','F'])
            iss_scores.append(100 if iss_done >= 3 else 0)
        else:
            iss_scores.append(np.nan)
        fac_tech.append(iss_scores[-1])
        

        # Attribute 3 (Summary stats discussed)
        if row['tHFA_C007'] == 'YES':
            iss_scores.append(100)
        elif row['tHFA_C007'] == 'DON’T KNOW':
            iss_scores.append(np.nan) 
        else: iss_scores.append(0)
        fac_stats.append(iss_scores[-1])
    
        # Attribute 4 (Group problem solving)
        if row['tHFA_C008'] == 'YES' and row['tHFA_C009'] == 'YES':
            iss_scores.append(100)
        elif any([row['tHFA_C008'] == "DON’T KNOW", row['tHFA_C009'] == "DON’T KNOW"]):
            iss_scores.append(np.nan)
        else: 
            iss_scores.append(0)

        fac_group_PS.append(iss_scores[-1])

        # Attribute 5 (Data on community activities)
        if row['tHFA_C010'] == 'NO':
            iss_scores.append(np.nan)
        elif row['tHFA_C010'] == 'YES' and row['tHFA_C011'] == 'DON’T KNOW':
            iss_scores.append(np.nan)
        elif row['tHFA_C010'] == 'YES' and row['tHFA_C011'] == 'NO':
            iss_scores.append(0)
        elif row['tHFA_C010'] == 'YES' and row['tHFA_C011'] == 'YES':
            iss_scores.append(100)
        else: 
            iss_scores.append(np.nan)
        
        fac_comms_data.append(iss_scores[-1])

        iss_score = np.nanmean(iss_scores)
    else: 
        iss_score = iss_scores = fac_iss = np.nan 
        fac_tech = np.nan
        fac_stats = np.nan
        fac_comms_data = np.nan
        fac_group_PS = np.nan
        

    return pd.Series({
        'iss_score': iss_score,
        'iss_check': iss_scores,
        'fac_iss': fac_iss,
        'fac_tech': np.nanmean(fac_tech),
        'fac_stats': np.nanmean(fac_stats),
        'fac_group_PS': np.nanmean(fac_group_PS),
        'fac_comms_data': np.nanmean(fac_comms_data),
        
        })

# test = pd.concat([test, df.loc[['MWI_000491']]])
# see = test.apply(calculate_ISS, axis=1)
# see
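
The technical-content attribute counts how many service areas were covered by at least one of the six `tHFA_C001` to `tHFA_C006` questions. A small sketch of that count, using an invented row and a hypothetical helper `areas_covered`, may help when checking individual facilities by hand.

```python
def areas_covered(row, areas, n_questions=6):
    """Count service areas with a 'YES' on at least one tHFA_C00x question."""
    count = 0
    for area in areas:
        answers = (row.get(f"tHFA_C00{q}_{area}") for q in range(1, n_questions + 1))
        if any(a == "YES" for a in answers):
            count += 1
    return count

# Hypothetical facility row: areas A and B covered, area C not covered
row = {"tHFA_C001_A": "YES", "tHFA_C003_B": "YES", "tHFA_C002_C": "NO"}

htm_covered = areas_covered(row, ["A", "B", "C"])
# Facilities providing all three HTM services need all three areas covered
tech_score = 100 if htm_covered == 3 else 0

print(htm_covered, tech_score)
```

The helper uses `.get`, which works for both dictionaries and pandas Series, so missing columns count as not covered rather than raising a `KeyError`. That is a design choice of this sketch; the notebook function indexes the row directly.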

### Non-KPI indicators

In [9]:
# Function for calculation of facility composite indicator 

def calculate_facility_composite(row):
    
    """

    RSSH/PP HRH-6: Percentage of facilities providing effective services

    Composite facility level indicator with seven components: 
    1)	% of facilities observed to provide integrated services at ANC (TB, malaria, HIV) at the time of visit; 
    2)	Provider availability (absence rate on day of visit);
    3)	Provider caseload (number of outpatient visits per clinician per day);
    4)	ANC dropout rate; 
    5)	DPT dropout rate; 
    6)	Treatment completion rate for new TB cases;
    7)	Twelve-month retention on ART


    """

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_immu = 1 if row['R2100'] == 'YES' else 0
    provides_imci = 1 if row['R2000'] == 'YES' else 0
    provides_malaria = 1 if row['R1400'] == 'YES' else 0
    provides_hiv_test = 1 if row['R2300'] == 'YES' else 0
    provides_anc = 1 if row['R1810'] == 'YES' else 0
    provides_testing = 1 if row['R2900'] == 'YES' else 0
    provides_oxygen = 1 if row["R1323"] == 'YES' else 0

    outpatients_ystrdy = float(row['tHFA_B034'])
    clincians_present = float(row['tHFA_B026'])
    clinicians_today = float(row['tHFA_B025'])

    # Attribute 1 (Integrated HTM with ANC)
    if provides_anc > 0:
        htm_anc_fxn = calculate_HTM_ANC(row)
        htm_anc_score = htm_anc_fxn['HTM-ANC']
    else:
        htm_anc_score = np.nan


    # Attribute 2 (Clinician presence) - Annex 1 definition is wrong (it is not the absence rate that is asked)
    if clinicians_today > 0 and clinicians_today != 999 and clincians_present != 999:
        presence = (clincians_present/clinicians_today)*100
    else: presence = np.nan 


    # Attribute 3 (Caseload)
    if clincians_present > 0 and outpatients_ystrdy != 999 and clincians_present != 999:
        caseload = (outpatients_ystrdy/clincians_present)*5
        if caseload >100:
            caseload = 100 
    else: caseload = np.nan

    

    # Attribute 4 (ANC dropout rate)
    anc_scores = []
    if provides_anc > 0:
        anc_count = sum(1 for col in ['RR_ANC_Begin_1', 'RR_ANC_Begin_2', 'RR_ANC_Begin_3', 'RR_ANC_Begin_4', 'RR_ANC_Begin_5'] if row[col] == 'YES')
        for i in range(1, anc_count+1):
            if math.isnan(row[f'Q13004_7_{i}']):
                anc_scores.append(np.nan)
            elif int(row[f'Q13004_7_{i}']) == 99 or int(row[f'Q13004_7_{i}']) == 98 :
                anc_scores.append(np.nan)
            elif int(row[f'Q13004_7_{i}']) > 3 :
                anc_scores.append(100) 
            else: anc_scores.append(0)

        anc_score = np.nanmean(anc_scores)

    else:
        anc_score = np.nan


    # Attribute 5 (DPT dropout rate)
    DPT_dropout = np.nan
    if provides_immu > 0:
        if float(row['tHFA_B032']) != 999 and float(row['tHFA_B032']) != 0 and float(row['tHFA_B033']) != 999:
            DPT_dropout = (float(row['tHFA_B032']) - float(row['tHFA_B033']))/float(row['tHFA_B032'])
            # Cap the score at 100 (the dropout rate can be negative)
            DPT_score = min((1-DPT_dropout) * 100, 100)

        else:
            DPT_score = np.nan
    
    else: 
        DPT_score = np.nan


    # Attribute 6 (treatment completion rate for new TB clients)
    if provides_TB > 0:
        tb_count = sum(1 for col in ['RR_TB_Begin_1', 'RR_TB_Begin_2', 'RR_TB_Begin_3', 'RR_TB_Begin_4', 'RR_TB_Begin_5'] if row[col] == 'YES')
        tb_trtmnt_scores = []
        
        for i in range(1, tb_count+1):
            tb_trtmnt = np.nan
            if row[f'Q13506_04_{i}'] == 'YES, CLIENT WAS CURED OR COMPLETED TREATMENT':
                tb_trtmnt = 100
            elif row[f'Q13506_04_{i}'] in ['CLIENT STILL ON TREATMENT', 'NOT DOCUMENTED',] :
                tb_trtmnt = np.nan
            else: 
                tb_trtmnt = 0

            tb_trtmnt_scores.append(tb_trtmnt)
        
        tb_trtmnt_score = np.nanmean(tb_trtmnt_scores)
    
    else: 
        tb_trtmnt_score = np.nan
        tb_trtmnt_scores = []

    # Attribute for ART 12 month retention
    if provides_ART > 0:   
        art_count = sum(1 for col in ['RR_ART_Begin_1', 'RR_ART_Begin_2', 'RR_ART_Begin_3', 'RR_ART_Begin_4', 'RR_ART_Begin_5'] if row[col] == 'YES')
        art_retnt_scores = []
        for i in range(1, art_count+1):
            art_retnt = np.nan
            if row[f'tHFAG002_{i}'] == 'NO': 
                art_retnt = np.nan
            elif row[f'tHFAG002_{i}'] == 'YES' and row[f'tHFAG005_{i}'] == 'YES':
                art_retnt = 100
            else:
                art_retnt = 0
            
            art_retnt_scores.append(art_retnt)
        art_retnt_score = np.nanmean(art_retnt_scores)
    else:
        art_retnt_score = np.nan
        art_retnt_scores = []
    

    # Aggregation of scores
    attribute_scores = [htm_anc_score,presence,caseload,anc_score,DPT_score,tb_trtmnt_score,art_retnt_score]
    comp_score = np.nanmean(attribute_scores)
    
    
    return pd.Series({
        'facility_composite': comp_score,
        'HTM-ANC': htm_anc_score,
        'presence_score': presence,
        'caseload': caseload,
        'ANC_score': anc_score,
        'DPT_score': DPT_score,
        'TB_treatment_completion': tb_trtmnt_score,
        'ART_retention': art_retnt_score,
        
        })


# see = test.apply(calculate_facility_composite, axis=1)
# see
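
The composite is a NaN-aware mean, so attributes that do not apply to a facility drop out of the denominator instead of counting as zero. A minimal sketch of that behaviour, using made-up attribute values (the names and numbers below are illustrative and not taken from the dataset):

```python
import numpy as np

# Hypothetical facility with no TB or ART services: those attributes are NaN
scores = {'presence': 80.0, 'caseload': 100.0, 'ANC': 60.0, 'TB': np.nan, 'ART': np.nan}
values = list(scores.values())

composite = np.nanmean(values)                          # averages the three applicable attributes
zero_filled = np.mean(np.nan_to_num(values, nan=0.0))   # what treating NaN as 0 would give

print(composite)    # 80.0
print(zero_filled)  # 48.0
```

The gap between the two numbers is why services a facility does not offer are coded as `np.nan` rather than 0.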

In [10]:
# Function for calculation of CHW paid on time and in full

def calculate_CHW_paid(row):
    
    """

    RSSH/PP HRH-3: Percentage of community health workers remunerated on time and in-full
    
    A CHW scores 100 only when all four payment questions ('tHFA_E020', 'tHFA_E021', 'tHFA_E022', 'tHFA_E023') are answered 'YES'; otherwise the CHW scores 0

    """
    chw_count = sum(1 for col in ['CHW_begin_1', 'CHW_begin_2', 'CHW_begin_3', 'CHW_begin_4', 'CHW_begin_5'] if row[col] == 'YES')

    chw_paid_scores = []

    paid_columns = ['tHFA_E020','tHFA_E021', 'tHFA_E022', 'tHFA_E023']

    for i in range(1, 6):
        chw_i_score = 0
        
        if row[f'tHFA_E002_{i}'] != 'YES':
            continue
        
        for col in paid_columns:
            interview_col = f'{col}_{i}'
            if row[interview_col] == 'YES': 
                chw_i_score = chw_i_score + 1
        chw_paid = 100 if chw_i_score == 4 else 0 
        chw_paid_scores.append(chw_paid)

    chw_paid_score = np.nanmean(chw_paid_scores)
    return pd.Series({
        'CHW_count': chw_count,
        'CHW_paid_full': chw_paid_score,
        'CHW_scores': chw_paid_scores  
        })

# see = df.apply(calculate_CHW_paid, axis=1)
# see
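
The all-or-nothing rule for a single CHW interview can be sketched on a toy row. The dictionary below stands in for a DataFrame row and its answers are invented; `chw_interview_score` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

PAID_COLUMNS = ['tHFA_E020', 'tHFA_E021', 'tHFA_E022', 'tHFA_E023']

def chw_interview_score(row, i):
    """Return 100 only when all four payment questions for interview i are 'YES'."""
    return 100 if all(row.get(f'{col}_{i}') == 'YES' for col in PAID_COLUMNS) else 0

# Invented answers: interview 1 is fully paid, interview 2 misses one item
row = {f'{col}_{i}': 'YES' for col in PAID_COLUMNS for i in (1, 2)}
row['tHFA_E022_2'] = 'NO'

scores = [chw_interview_score(row, i) for i in (1, 2)]
print(scores)           # [100, 0]
print(np.mean(scores))  # 50.0
```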

In [11]:
# Function for calculation of oxygen therapy

def calculate_oxygen(row):
    """
    RSSH/PP RCS-1: Percentage of health facilities able to provide oxygen therapy related services
    
    Oxygen counts as accessible when it is currently available in this unit or can be brought to this unit from a different unit

    """
    provides_oxygen = 'YES' if row["R1323"] == 'YES' else "NO"
    obs_options = ['OBSERVED']
    
    tools = {
        'flowmeter': 'R1325_04',
        'humidifier': 'R1325_05',
        'oxygen_delivery': 'R1325_06',
        'pulse_oximeter': 'R1322_09',
    }

    tools_resp = {}
    
    if provides_oxygen == 'YES':
        # Oxygen source: available in this unit, or can be brought from another unit
        has_source = row['R1324'] == 'YES' or row['R1326'] == 'YES'
        # Flowmeter (04), humidifier (05) and oxygen delivery (06): all observed and all functional
        equip_observed = all(row[f'R1325_0{i}A'] in obs_options for i in ['4', '5', '6'])
        equip_functional = all(row[f'R1325_0{i}B'] == 'YES' for i in ['4', '5', '6'])
        avl = 100 if has_source and equip_observed and equip_functional else 0
        # Pulse oximeter: observed and functional
        avo = 100 if row['R1322_09A'] in obs_options and row['R1322_09B'] == 'YES' else 0
        tools_resp['oxy_available'] = 100 if row['R1324'] == 'YES' else 0
        tools_resp['oxy_deliverable'] = 100 if row['R1326'] == 'YES' else 0
        for key, value in tools.items():
            tools_resp[f'{key}_avl'] = 100 if row[f'{value}A'] in obs_options else 0
            tools_resp[f'{key}_func'] = 100 if row[f'{value}B'] == 'YES' else 0
    else:
        avl = np.nan
        avo = np.nan
    
    score = np.nanmean([avl,avo])

    output = {
        'oxygen_score': score,
        'oxygen_avl': avl,
        'oxygen_avo': avo,
    }

    output.update(tools_resp)

    return pd.Series(output)

# see = test.apply(calculate_oxygen, axis=1)
# see


In [12]:
# Function for the calculation of vacancy rate

def calculate_vacancy(row):
    """RSSH/PP HRH-1: Vacancy rate"""
    funded_posts = float(row['tHFA_B046'])
    unfilled_posts = float(row['tHFA_B047'])
    if funded_posts == 0:
        # No funded posts: the rate is undefined (this also covers the 0/0 case)
        score = np.nan
    elif funded_posts != 999 and unfilled_posts != 999:
        score = (unfilled_posts / funded_posts) * 100 if unfilled_posts < funded_posts else np.nan
    else:
        score = np.nan

    numbers = [float(row[f'tHFA_B0{i}']) for i in range(48,58)]
    
    return   pd.Series({
        'vacancy_score': score,
        'all_funded_posts': funded_posts,
        'all_unfilled_posts': unfilled_posts,
        'doctor_posts':numbers[0],
        'doctor_vacancies':numbers[1],
        'nurses_posts':numbers[2],
        'nurses_vacancies':numbers[3],
        'lab-tech_posts':numbers[4],
        'lab-tech_vacancies':numbers[5],
        'pharmacists_posts':numbers[6],
        'pharmacists_vacancies':numbers[7],
        'CHW_posts':numbers[8],
        'CHW_vacancies':numbers[9],
    })

# see = test.apply(calculate_vacancy, axis=1)
# see
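
The branches of the vacancy calculation are easier to check on plain numbers. The sketch below mirrors the rules used in `calculate_vacancy` (999 treated as "not known", zero funded posts undefined, unfilled posts at or above funded posts rejected); `vacancy_rate` is a hypothetical standalone helper and the inputs are invented:

```python
import numpy as np

def vacancy_rate(funded, unfilled, missing_code=999):
    """Vacancy rate in percent, or NaN when it cannot be computed."""
    if funded == 0 or missing_code in (funded, unfilled):
        return np.nan
    if unfilled >= funded:
        # Mirrors the notebook: such rows are treated as unusable
        return np.nan
    return unfilled / funded * 100

print(vacancy_rate(8, 2))    # 25.0
print(vacancy_rate(0, 0))    # nan
print(vacancy_rate(999, 3))  # nan
```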

In [13]:
# Function for evaluating testing capacity
def calculate_testing_capacity(row):
    
    """

    Description
    RSSH/PP LAB-5: “Percentage of health facilities that have an appropriate set of diagnostics 
    for their healthcare facility level, based on adapted WHO model list of essential in vitro diagnostics (EDL 3)”


    This needs data for the EDL tests from the 

    """

    # Responses for facilities with labs



    all_tests = ['Antibodies to HIV 1/2 (anti HIV Ab)',
    'Qualitative (EID) or quantitative (viral load) HIV nucleic acid test (NAT)',
    'CD4 cell enumeration',
    'Influenza A and B antigen',
    'Plasmodium spp. antigens; species-specific (e.g. HRP2) and/or pan species-specific (e.g. pan-pLDH)',
    'Antibodies to Treponema pallidum OR Combined antibodies to T. pallidum and HIV 1/2',
    'Lipoarabinomannan (LAM) antigen',
    'Staining procedures',
    'Culture OR Culture and Genus and species identification of bacteria and fungi',
    'Blood culture',
    'Antimicrobial susceptibility testing (AST)',
    'Liver profile/Liver Function test',
    'Renal function test',
    'Blood gas',
    'C-reactive protein (CRP)',
    'Glucose',
    'Complete blood count (CBC), automated',
    'Prothrombin time and international normalized ratio (PT/INR)',
    'SARS-CoV-2 test',
    'Hepatitis B surface antigen (HBsAg)',
    'Influenza A and B diagnostic',
    'Qualitative dengue virus nucleic acid test, Dengue virus IgM antibody or Dengue virus antigen (NS1)',
    'Qualitative nucleic acid test (NAT) for Chlamydia trachomatis (CT) and Neisseria gonorrhoeae (NG) infections',
    'Mycobacterium tuberculosis bacteria',
    'M. tuberculosis DNA',
    'Drug susceptibility testing of M. tuberculosis culture',
    'Measles nucleic acid test or Measle serology',
    'Vibrio cholerae antigen']

    ordered_tests = [
        #hiv
        'Antibodies to HIV 1/2 (anti HIV Ab)',
        'Qualitative (EID) or quantitative (viral load) HIV nucleic acid test (NAT)',
        'CD4 cell enumeration',
        'Antibodies to Treponema pallidum OR Combined antibodies to T. pallidum and HIV 1/2',

        # TB
        'Lipoarabinomannan (LAM) antigen',
        'Mycobacterium tuberculosis bacteria',
        'M. tuberculosis DNA',
        'Drug susceptibility testing of M. tuberculosis culture',

        # Malaria
        'Plasmodium spp. antigens; species-specific (e.g. HRP2) and/or pan species-specific (e.g. pan-pLDH)',

        # Bacterial and other infections
        'Staining procedures',
        'Culture OR Culture and Genus and species identification of bacteria and fungi',
        'Antimicrobial susceptibility testing (AST)',
        'C-reactive protein (CRP)',

        # Blood tests and general health
        'Complete blood count (CBC), automated',
        'Prothrombin time and international normalized ratio (PT/INR)',
        'Blood culture',
        'Blood gas',
        'Glucose',
        'Liver profile/Liver Function test',
        'Renal function test',

        # Others
        'SARS-CoV-2 test',
        'Hepatitis B surface antigen (HBsAg)',
        #flu
        'Influenza A and B antigen',
        'Influenza A and B diagnostic',

        'Qualitative dengue virus nucleic acid test, Dengue virus IgM antibody or Dengue virus antigen (NS1)',
        'Qualitative nucleic acid test (NAT) for Chlamydia trachomatis (CT) and Neisseria gonorrhoeae (NG) infections',
        'Measles nucleic acid test or Measle serology',
        'Vibrio cholerae antigen'
        ]

    # Responses for facilities without labs 
    all_facility_questions = [f'EDL_RESPONSE_0{i}' for i in range(1,10)] + [f'EDL_RESPONSE_{i}' for i in range(10,29)]
    remove_no_lab = [i for i in range(7,18)] + [i for i in range(21,27)]
    remove_lab = [27]

    test_capac_scores = []
    lab_score = np.nan
    no_lab_score = np.nan
    has_lab = 100 if row['R2900'] == 'YES, WITH LABORATORY IN THE FACILITY' else 0
    test_no_lab = 100 if row['R2900'] == 'YES, WITHOUT LABORATORY IN THE FACILITY' else 0 
    no_test = 100 if row['R2900'] == 'NO LABORATORY TESTS PERFORMED' else 0

    for i in range(len(all_tests)):
        if row[all_facility_questions[i]] == 'YES': test_capac_scores.append(100)
        # elif row[question] == 'NO': test_capac_scores.append(0)
        else: test_capac_scores.append(0)

    if row['R2900'] == 'YES, WITH LABORATORY IN THE FACILITY':
        for idx in remove_lab:
            test_capac_scores[idx] = np.nan
        lab_score = np.nanmean(test_capac_scores)

    else:
        for idx in remove_no_lab:
            test_capac_scores[idx] = np.nan
        no_lab_score = np.nanmean(test_capac_scores)

    # Commenting out the parts to treat facilities with testing w/o labs and facilities w/o testing differently

    # elif row['R2900'] == 'YES, WITHOUT LABORATORY IN THE FACILITY':
    #     for idx in remove_no_lab:
    #         test_capac_scores[idx] = np.nan
    #     no_lab_score = np.nanmean(test_capac_scores) 
        
    # elif row['R2900'] == 'NO LABORATORY TESTS PERFORMED':
    #     for idx in range(len(all_tests)):
    #         test_capac_scores[idx] = np.nan


    test_capac_score = np.nanmean(test_capac_scores)

    score_check_rand = {}

    for i in range(len(all_tests)):
        score_check_rand[all_tests[i]] = test_capac_scores[i]


    score_check = {key: score_check_rand[key] for key in ordered_tests }



    result = {
        'test-capac': test_capac_score,
        'fac_has_lab': has_lab,
        'fac_test_no_lab': test_no_lab,
        'fac_no_test': no_test,
        'score with lab': lab_score,
        'score without lab': no_lab_score,
        }
    
    result.update(score_check)


    return pd.Series(result)

# # add = list(df[df['R2900']!= "YES, WITH LABORATORY IN THE FACILITY"]['R2900'].index)
# # test = pd.concat([test,df.loc[add]])
# see = test.apply(calculate_testing_capacity, axis=1)
# see
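
Tests that do not apply to a facility type are excluded by overwriting their positions with `np.nan` before averaging. A small sketch of that masking step with five invented results (the real function works on 28 tests; the positions here are zero-based list indices, as in `remove_lab` and `remove_no_lab`):

```python
import numpy as np

scores = [100, 0, 100, 0, 100]   # hypothetical results for five tests
not_applicable = [1, 3]          # zero-based positions excluded for this facility type

masked = [np.nan if i in not_applicable else s for i, s in enumerate(scores)]

print(np.mean(scores))     # 60.0, every test counted
print(np.nanmean(masked))  # 100.0, only applicable tests counted
```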

In [14]:
# Function for guideline availability

def calculate_guideline_availability(row):
    
    """

    “Percentage of facilities with written and updated clinical guidelines for HIV, TB, malaria and/or 
    PHC (based on the services provided) developed by the national or sub-national government 
    (as appropriate by the country context)”

    Returns a guideline availability score: the np.nanmean of the scores for the services the facility provides
    """

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_immu = 1 if row['R2100'] == 'YES' else 0
    provides_imci = 1 if row['R2000'] == 'YES' else 0
    provides_malaria = 1 if row['R1400'] == 'YES' else 0
    provides_hiv_test = 1 if row['R2300'] == 'YES' else 0
    provides_anc = 1 if row['R1810'] == 'YES' else 0
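    # R2900 answers are 'YES, WITH LABORATORY ...', 'YES, WITHOUT LABORATORY ...' or 'NO LABORATORY TESTS PERFORMED'
    provides_testing = 1 if str(row['R2900']).startswith('YES') else 0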
    provides_oxygen = 1 if row["R1323"] == 'YES' else 0

    positive_responses = ['YES, OBSERVED', 'YES, REPORTED, NOT SEEN' ]

    guidelines = [ provides_TB,
                provides_hiv_test, 
                  provides_ART,
                  provides_malaria,
                  provides_anc,
                  provides_immu,
                  provides_imci]
    
    guideline_cols = [ ['R2419_01','R2419_03'],
                        'R2308_01',
                      'R2315B_01',
                      
                      'R1406',
                      'R1817_01',
                      'R2119_01',
                      'R2005_01']
    guideline_scores = []
    if guidelines[0] > 0:
        if row[guideline_cols[0][0]] in positive_responses and row[guideline_cols[0][1]] in positive_responses:
            guideline_scores.append(100)
        else: guideline_scores.append(0)
    else: guideline_scores.append(np.nan)

    for i in range(1,len(guidelines)): 
        if guidelines[i] > 0 :
            if row[guideline_cols[i]] in positive_responses :
                guideline_scores.append(100)
            else: 
                guideline_scores.append(0)
        else: 
            guideline_scores.append(np.nan)
    
    
    guideline_score = np.nanmean(guideline_scores)

    return pd.Series({
        'guideline_score': guideline_score,
        # 'guideline_check': guideline_scores,
        'guide_HIV_test': guideline_scores[1],
        'guide_ART': guideline_scores[2],
        'guide_TB': guideline_scores[0],
        'guide_malaria': guideline_scores[3],
        'guide_anc': guideline_scores[4],
        'guide_immu': guideline_scores[5],
        'guide_imci': guideline_scores[6],
         })

# see = test.apply(calculate_guideline_availability, axis=1)
# see


In [15]:
# Function for checking the community-led monitoring (CLM) mechanism
# Unclear how "don't know" responses should be treated here
def calculate_CLM(row):
    
    """

    Indicator for CSS-3: “Percentage of health service delivery 
    sites with a community-led monitoring mechanism in place”

    Returns:
        0 - 100: mean of the six statement scores (100 for 'strongly agree' or 'agree', else 0) after skip logic

    """

    questions = ['tHFA_B021', 'tHFA_B022', 'tHFA_B023', 'tHFA_B024', 'tHFA_B035', 'tHFA_B045']
    # agree_count = sum(1 for q in questions if row[q] in ['STRONGLY AGREE', 'AGREE'])
    score_breakdown = []
    for i in range(len(questions)):
        agree_answer = 100 if row[questions[i]] in ['STRONGLY AGREE', 'AGREE'] else 0
        score_breakdown.append(agree_answer)
    
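    # Skip logic: without community data collectors, the dependent statements cannot hold
    if score_breakdown[0] != 100:
        score_breakdown[1] = score_breakdown[2] = score_breakdown[3] = score_breakdown[5] = 0
    # Skip logic: without HW awareness of CLM, CLM promotion cannot hold
    if score_breakdown[2] != 100:
        score_breakdown[3] = 0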
    score = np.nanmean(score_breakdown)


    return pd.Series({
        'CLM_score': score,
        'comm_data_collectors': score_breakdown[0],
        'quarterly_feedback': score_breakdown[1],
        'comm_data_analysis': score_breakdown[5],
        'CLM_aware': score_breakdown[2],
        'CLM_promotion': score_breakdown[3],
        'client_feedback': score_breakdown[4]
        })

# # test_gmb = df.loc[['GMB_000119', 'GMB_000117', 'GMB_000051', 'GMB_000092']] 
see = test.apply(calculate_CLM, axis=1)
see

Q100,CLM_score,comm_data_collectors,quarterly_feedback,comm_data_analysis,CLM_aware,CLM_promotion,client_feedback
IDN_000335,100.0,100.0,100.0,100.0,100.0,100.0,100.0
IDN_000367,100.0,100.0,100.0,100.0,100.0,100.0,100.0
IDN_000010,100.0,100.0,100.0,100.0,100.0,100.0,100.0
IDN_000348,100.0,100.0,100.0,100.0,100.0,100.0,100.0
IDN_000168,100.0,100.0,100.0,100.0,100.0,100.0,100.0
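
The CLM skip logic can be traced on a toy answer vector. The sketch assumes the same statement order as `questions` in `calculate_CLM` (index 0 = community data collectors, 2 = CLM awareness, 3 = CLM promotion); `apply_skip_logic` is a hypothetical helper and the answers are invented:

```python
import numpy as np

def apply_skip_logic(agree_flags):
    """Turn six agree/disagree flags into 0/100 scores and apply the skip rules."""
    scores = [100 if flag else 0 for flag in agree_flags]
    if scores[0] != 100:
        # No community data collectors: dependent statements are zeroed
        for idx in (1, 2, 3, 5):
            scores[idx] = 0
    if scores[2] != 100:
        # No awareness of CLM: promotion is zeroed
        scores[3] = 0
    return scores

no_collectors = apply_skip_logic([False, True, True, True, True, True])
not_aware = apply_skip_logic([True, True, False, True, True, True])

print(no_collectors)  # [0, 0, 0, 0, 100, 0]
print(not_aware)      # [100, 100, 0, 0, 100, 100]
print(np.mean(not_aware))
```

Only the client feedback statement (index 4) is independent of the other answers.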


### Services offered by facilities

In [16]:
# Function to show the services provided in each facility

def see_services(row):
    
    """

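    Flags which services each facility provides (100 = provided, 0 = not provided)
    for HIV (testing or ART), TB, malaria, ANC and immunization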


    """

    provides_ART = 1 if row['R2312'] == 'YES' else 0
    provides_TB = 1 if row['R2400'] == 'YES' else 0
    provides_immu = 1 if row['R2100'] == 'YES' else 0
    provides_imci = 1 if row['R2000'] == 'YES' else 0
    provides_malaria = 1 if row['R1400'] == 'YES' else 0
    provides_hiv_test = 1 if row['R2300'] == 'YES' else 0
    provides_anc = 1 if row['R1810'] == 'YES' else 0
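    # R2900 answers are 'YES, WITH LABORATORY ...', 'YES, WITHOUT LABORATORY ...' or 'NO LABORATORY TESTS PERFORMED'
    provides_testing = 1 if str(row['R2900']).startswith('YES') else 0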
    provides_HIV = 1 if provides_hiv_test + provides_ART > 0 else 0

    return pd.Series({
        'HIV': provides_HIV*100,
        'TB': provides_TB*100,
        'Malaria': provides_malaria*100,
        'ANC': provides_anc*100,
        'Immu': provides_immu*100,
    
        })

# services_df = facility_details_df

# services = df.apply(see_services, axis=1)
# services_df = pd.concat([services_df, services], axis=1)
# services_df

# services_df.to_csv('./tHFA-malawi-services_v1.0.csv')

## Function Implementation and Aggregation

In [17]:
functions = [
    see_services,
    calculate_service_quality,
    calculate_ISS,
    calculate_HTM_ANC,
    calculate_system_CHW,
    calculate_facility_composite,
    calculate_CHW_paid,
    calculate_CLM,
    calculate_guideline_availability,
    calculate_oxygen,
    calculate_testing_capacity,
    calculate_vacancy,    
    ]

In [18]:
# Test run: apply each calculation to every row of the test sample

test_indicators_df = test_fac_df
for func in functions: 
    see = test.apply(func, axis=1)
    test_indicators_df = pd.concat([test_indicators_df, see], axis=1)



In [19]:
# FULL Apply the calculation to each row
indicators_df = facility_details_df

for func in functions: 
    see = df.apply(func, axis=1)
    indicators_df = pd.concat([indicators_df, see], axis=1)

indicators_df


Q100,Q102,Q105,Q106,Q105_a,Q113,Q113_A,Q116,Q116_A,Q117,Q118,...,doctor_posts,doctor_vacancies,nurses_posts,nurses_vacancies,lab-tech_posts,lab-tech_vacancies,pharmacists_posts,pharmacists_vacancies,CHW_posts,CHW_vacancies
IDN_000335,PUSKESMAS SIDAREJA,Central Java,Kabupaten Cilacap,PRIMARY,HEALTH CENTRE,,LOCAL GOVERNMENT,,RURAL,BOTH OUT AND INPATIENT,...,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
IDN_000367,RS Umum Daerah Cilacap,Central Java,Kabupaten Cilacap,SECONDARY,DISTRICT HOSPITAL,,LOCAL GOVERNMENT,,URBAN,BOTH OUT AND INPATIENT,...,24.0,0.0,999.0,999.0,23.0,0.0,16.0,0.0,0.0,0.0
IDN_000010,PUSKESMAS MUMBULSARI,East Java,Kabupaten Jember,PRIMARY,HEALTH CENTRE,,LOCAL GOVERNMENT,,RURAL,BOTH OUT AND INPATIENT,...,8.0,2.0,37.0,1.0,2.0,0.0,2.0,0.0,999.0,999.0
IDN_000348,PUSKESMAS NUSAWUNGU II,Central Java,Kabupaten Cilacap,PRIMARY,HEALTH CENTRE,,LOCAL GOVERNMENT,,RURAL,BOTH OUT AND INPATIENT,...,4.0,0.0,35.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0
IDN_000168,RS Umum Daerah Cibinong,West Java,Kabupaten Bogor,SECONDARY,REGIONAL/PROVINCIAL REFERRAL HOSPITAL,,LOCAL GOVERNMENT,,PERIURBAN,BOTH OUT AND INPATIENT,...,999.0,999.0,999.0,999.0,999.0,999.0,999.0,999.0,999.0,999.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
IDN_000320,RS Umum Daerah Koja,Jakarta,Kota Adm. Jakarta Utara,SECONDARY,DISTRICT HOSPITAL,,LOCAL GOVERNMENT,,URBAN,BOTH OUT AND INPATIENT,...,87.0,0.0,132.0,0.0,87.0,0.0,57.0,12.0,0.0,0.0
IDN_000213,PUSKESMAS PARAKAN SALAK,West Java,Kabupaten Sukabumi,PRIMARY,HEALTH CENTRE,,MINISTRY OF HEALTH,,URBAN,OUTPATIENT ONLY,...,2.0,1.0,24.0,0.0,1.0,1.0,1.0,1.0,143.0,43.0
IDN_000435,RS Umum Daerah Lawang,East Java,Kabupaten Malang,SECONDARY,DISTRICT HOSPITAL,,LOCAL GOVERNMENT,,RURAL,BOTH OUT AND INPATIENT,...,35.0,10.0,264.0,0.0,15.0,4.0,7.0,3.0,0.0,0.0
IDN_000434,RS Umum Daerah Kanjuruhan Kepanjen Kab. Malang,East Java,Kabupaten Malang,SECONDARY,DISTRICT HOSPITAL,,LOCAL GOVERNMENT,,RURAL,BOTH OUT AND INPATIENT,...,77.0,30.0,359.0,50.0,27.0,0.0,10.0,2.0,0.0,0.0
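
Several scoring functions call `np.nanmean` on lists that can be empty or entirely NaN, for example when a facility has no reviewed records. NumPy returns NaN in that case but also emits a `RuntimeWarning`. One possible way to keep the result and avoid the warning is a small wrapper; `quiet_nanmean` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def quiet_nanmean(values):
    """NaN-aware mean that returns NaN silently for empty or all-NaN input."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0 or np.isnan(arr).all():
        return np.nan
    return float(np.nanmean(arr))

print(quiet_nanmean([100, np.nan, 0]))  # 50.0
print(quiet_nanmean([]))                # nan
print(quiet_nanmean([np.nan, np.nan]))  # nan
```

Some functions already guard with `if len(scores) > 0`; that check covers the empty list but not the all-NaN list.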


### Aggregation

In [20]:
def aggregate_indicators(df, group_by_columns):
    # Select only numeric columns for the mean aggregation
    numeric_cols = df.select_dtypes(include=['number']).columns
    size_df = df.groupby(group_by_columns).size().rename('Facility count')
    mean_df = df.groupby(group_by_columns)[numeric_cols].mean()
    out = pd.concat([size_df, mean_df], axis=1)
    # out = out.reset_index()
    return  out

aggregated_by_type = aggregate_indicators(indicators_df, ['tHFA_A001','Q117'])
aggregated_by_location = aggregate_indicators(indicators_df, ['Q117'])
aggregated_by_ty = aggregate_indicators(indicators_df, ['tHFA_A001'])
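
The staffing columns carry 999 where a count was not known, and the scoring functions treat 999 as missing. A plain mean over those columns therefore mixes the code into the average. The sketch below shows the effect on made-up data and one possible way to mask the code before aggregating. Whether 999 means "not known" in every numeric column is an assumption to check against the questionnaire before applying anything like this to `indicators_df`:

```python
import numpy as np
import pandas as pd

# Invented facilities; 999.0 stands for a count that was not known
toy = pd.DataFrame({
    'Q117': ['URBAN', 'URBAN', 'RURAL'],
    'doctor_posts': [24.0, 999.0, 8.0],
})

raw_mean = toy.groupby('Q117')['doctor_posts'].mean()

masked = toy.copy()
masked['doctor_posts'] = masked['doctor_posts'].replace(999.0, np.nan)
clean_mean = masked.groupby('Q117')['doctor_posts'].mean()

print(raw_mean['URBAN'])    # 511.5, inflated by the 999 code
print(clean_mean['URBAN'])  # 24.0
```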

In [21]:
# Append the per-level totals, labelled "General" for location
for idx in aggregated_by_ty.index:
    see = aggregated_by_ty.loc[[idx]].copy().reset_index()
    see['Q117'] = "General"
    see = see.set_index(['tHFA_A001','Q117'])
    aggregated_by_type = pd.concat([aggregated_by_type, see])

# Append the per-location totals, labelled "General" for level
for idx in aggregated_by_location.index:
    see = aggregated_by_location.loc[[idx]].copy().reset_index()
    see['tHFA_A001'] = "General"
    see = see.set_index(['tHFA_A001','Q117'])
    aggregated_by_type = pd.concat([aggregated_by_type, see])

aggregated_by_type

tHFA_A001,Q117,Facility count,HIV,TB,Malaria,ANC,Immu,Quality_Score,HW_Competence_Score,ANC_score,ANC_score,...,doctor_posts,doctor_vacancies,nurses_posts,nurses_vacancies,lab-tech_posts,lab-tech_vacancies,pharmacists_posts,pharmacists_vacancies,CHW_posts,CHW_vacancies
PRIMARY,PERIURBAN,3,100.0,100.0,66.666667,100.0,100.0,93.422138,88.464646,72.060606,100.0,...,2.666667,0.0,17.0,2.0,1.333333,0.333333,1.0,0.333333,61.666667,7.666667
PRIMARY,RURAL,33,100.0,100.0,81.818182,100.0,93.939394,87.314254,82.97289,69.5,66.923077,...,3.757576,91.242424,63.060606,62.272727,31.848485,60.818182,31.454545,91.0,205.939394,158.424242
PRIMARY,URBAN,29,100.0,100.0,62.068966,100.0,96.551724,89.528893,84.890756,76.744108,89.52381,...,42.275862,173.0,94.448276,177.172414,37.482759,172.896552,37.241379,138.241379,178.206897,244.137931
SECONDARY,PERIURBAN,3,100.0,100.0,33.333333,100.0,33.333333,90.071128,86.484848,76.121212,100.0,...,386.0,333.0,501.333333,333.0,350.333333,333.0,343.333333,333.0,333.0,333.0
SECONDARY,RURAL,11,100.0,100.0,63.636364,100.0,45.454545,87.340278,82.794444,73.011111,55.0,...,137.909091,95.727273,239.090909,97.818182,107.0,91.909091,98.0,91.909091,91.363636,91.363636
SECONDARY,URBAN,38,100.0,100.0,73.684211,100.0,65.789474,90.131959,86.645668,71.324387,73.529412,...,134.394737,214.894737,300.368421,251.763158,95.210526,211.815789,92.789474,212.026316,289.605263,289.421053
TERTIARY,URBAN,3,100.0,100.0,100.0,100.0,100.0,87.740741,84.740741,64.222222,100.0,...,331.666667,721.0,1070.333333,773.0,410.333333,999.0,389.0,674.666667,333.0,333.0
PRIMARY,General,65,100.0,100.0,72.307692,100.0,95.384615,88.569466,84.042887,72.576956,78.4,...,20.892308,123.507692,74.938462,110.753846,32.953846,108.030769,32.630769,107.892308,186.907692,189.707692
SECONDARY,General,52,100.0,100.0,69.230769,100.0,59.615385,89.537902,85.833278,71.975589,72.0,...,149.653846,196.5,299.0,223.884615,112.423077,193.442308,108.346154,193.596154,250.173077,250.038462
TERTIARY,General,3,100.0,100.0,100.0,100.0,100.0,87.740741,84.740741,64.222222,100.0,...,331.666667,721.0,1070.333333,773.0,410.333333,999.0,389.0,674.666667,333.0,333.0


In [22]:
# Country-level average across all facilities
mean = pd.Series(dtype=float)
mean.at['Facility count'] = len(indicators_df)
numeric_cols_full = indicators_df.select_dtypes(include=['number']).columns
# Every country code currently uses the same unweighted mean
sharp = indicators_df[numeric_cols_full].mean()
mean = pd.concat([mean, sharp])
mean.name = ('COUNTRY', 'AVERAGE')
mean

Facility count           120.000000
HIV                      100.000000
TB                       100.000000
Malaria                   71.666667
ANC                      100.000000
                            ...    
lab-tech_vacancies       167.316667
pharmacists_posts         74.350000
pharmacists_vacancies    159.200000
CHW_posts                217.975000
CHW_vacancies            219.433333
Name: (COUNTRY, AVERAGE), Length: 128, dtype: float64

In [23]:
aggregated_by_type = pd.concat([aggregated_by_type, mean.to_frame().T])
aggregated_by_type= aggregated_by_type.sort_index()
aggregated_by_type

tHFA_A001,Q117,Facility count,HIV,TB,Malaria,ANC,Immu,Quality_Score,HW_Competence_Score,ANC_score,ANC_score.1,...,doctor_posts,doctor_vacancies,nurses_posts,nurses_vacancies,lab-tech_posts,lab-tech_vacancies,pharmacists_posts,pharmacists_vacancies,CHW_posts,CHW_vacancies
COUNTRY,AVERAGE,120.0,100.0,100.0,71.666667,100.0,80.0,88.971756,84.821934,72.086685,76.122449,...,84.458333,170.075,196.916667,176.333333,76.825,167.316667,74.35,159.2,217.975,219.433333
General,PERIURBAN,6.0,100.0,100.0,50.0,100.0,66.666667,91.746633,87.474747,74.090909,100.0,...,194.333333,166.5,259.166667,167.5,175.833333,166.666667,172.166667,166.666667,197.333333,170.333333
General,RURAL,44.0,100.0,100.0,77.272727,100.0,81.818182,87.32076,82.931391,70.335979,64.117647,...,37.295455,92.363636,107.068182,71.159091,50.636364,68.590909,48.090909,91.227273,177.295455,141.659091
General,URBAN,70.0,100.0,100.0,70.0,100.0,80.0,89.783271,85.843441,73.07869,80.689655,...,104.685714,219.228571,248.057143,243.2,84.8,229.428571,82.471429,201.285714,245.314286,272.528571
PRIMARY,General,65.0,100.0,100.0,72.307692,100.0,95.384615,88.569466,84.042887,72.576956,78.4,...,20.892308,123.507692,74.938462,110.753846,32.953846,108.030769,32.630769,107.892308,186.907692,189.707692
PRIMARY,PERIURBAN,3.0,100.0,100.0,66.666667,100.0,100.0,93.422138,88.464646,72.060606,100.0,...,2.666667,0.0,17.0,2.0,1.333333,0.333333,1.0,0.333333,61.666667,7.666667
PRIMARY,RURAL,33.0,100.0,100.0,81.818182,100.0,93.939394,87.314254,82.97289,69.5,66.923077,...,3.757576,91.242424,63.060606,62.272727,31.848485,60.818182,31.454545,91.0,205.939394,158.424242
PRIMARY,URBAN,29.0,100.0,100.0,62.068966,100.0,96.551724,89.528893,84.890756,76.744108,89.52381,...,42.275862,173.0,94.448276,177.172414,37.482759,172.896552,37.241379,138.241379,178.206897,244.137931
SECONDARY,General,52.0,100.0,100.0,69.230769,100.0,59.615385,89.537902,85.833278,71.975589,72.0,...,149.653846,196.5,299.0,223.884615,112.423077,193.442308,108.346154,193.596154,250.173077,250.038462
SECONDARY,PERIURBAN,3.0,100.0,100.0,33.333333,100.0,33.333333,90.071128,86.484848,76.121212,100.0,...,386.0,333.0,501.333333,333.0,350.333333,333.0,343.333333,333.0,333.0,333.0


## Saving 

In [24]:
aggregated_by_type.to_csv(pivot_out)
indicators_df.to_csv(indicators_out)

print(f"Data processing completed and files saved to {pivot_out} and {indicators_out}")

Data processing completed and files saved to ./output/IDN/indonesia_pivot_v2.11.csv and ./output/IDN/tHFA-indonesia-analysis_v2.11.csv
