# Optimizing Malaria Care in Kenya: An Unsupervised Learning Approach for Segmenting Health Facility Preparedness
***


## BUSINESS UNDERSTANDING
***
### Business Overview
Malaria remains a major public health challenge in many countries, significantly impacting morbidity and mortality rates, particularly among vulnerable populations. According to a 2022 study, nearly 3.42 million malaria cases were confirmed in Kenya, along with roughly 219 deaths. The National Malaria Control Program (NMCP) is committed to reducing malaria-related deaths in Kenya through improved healthcare delivery, effective treatment protocols, and strengthened health facility preparedness. This project leverages nationwide survey data to assess the quality of malaria care, focusing on facility readiness, health worker competencies, and patient outcomes. By employing advanced analytics and deep learning techniques, the project aims to provide actionable insights that can drive targeted interventions and resource allocation.


### Problem Statement
The National Malaria Control Program (NMCP), the primary stakeholder in this initiative, is tasked with reducing malaria morbidity and mortality by 75% relative to 2016 levels by 2029. However, recent funding pauses from major donors such as USAID and WHO have limited the NMCP’s capacity to expand interventions, intensifying the need for targeted, cost-effective ones. Existing monitoring systems, which rely on high-dimensional survey data, lack the predictive precision and granularity needed to pinpoint specific deficiencies and inform improvements. NMCP decision-makers require a precise, data-driven method to:
•	Identify which health facilities are underperforming in terms of preparedness and case management.
•	Prioritize limited resources and design interventions that address specific weaknesses.
•	Monitor performance improvements over time and adjust strategies rapidly.


### Proposed Solution
#### Methodology Overview:
The project will use a two-step modeling approach:
•	Step 1: Autoencoder Development
A neural network-based autoencoder will be constructed and trained on the multi-dimensional survey data. This model will compress the data into a latent representation that captures essential, non-linear relationships among key indicators (infrastructure, supply chain, health worker competencies). Success will be measured by a low reconstruction loss (target MSE ≤ 0.015 on normalized data).
•	Step 2: Clustering in the Latent Space
The latent features obtained from the autoencoder will serve as input to a clustering algorithm (e.g., K-Means). Clusters will be evaluated using metrics such as the silhouette score (target ≥ 0.55) and Davies-Bouldin index (target < 1.0), ensuring well-separated and compact groups.
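
The two steps above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the project's implementation: it uses NumPy only, substitutes a tiny one-hidden-layer autoencoder trained by plain gradient descent for a full deep learning framework, and relies on hand-rolled K-Means and silhouette helpers. The data, layer sizes, learning rate, and function names (`train_autoencoder`, `kmeans`, `silhouette`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for normalized survey indicators (NOT the NMCP data):
# two groups of 60 facilities, 6 features each, values in [0, 1].
X = np.vstack([rng.normal(0.25, 0.05, (60, 6)),
               rng.normal(0.75, 0.05, (60, 6))]).clip(0.0, 1.0)


def train_autoencoder(X, latent_dim=2, lr=0.3, epochs=2000, seed=0):
    """Tanh encoder + linear decoder, trained by full-batch gradient descent."""
    r = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = r.normal(0, 0.1, (d, latent_dim)), np.zeros(latent_dim)
    W2, b2 = r.normal(0, 0.1, (latent_dim, d)), np.zeros(d)
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)                 # encode
        err = (Z @ W2 + b2) - X                  # reconstruction error
        g_out = 2.0 * err / err.size             # d(MSE)/d(reconstruction)
        g_hid = (g_out @ W2.T) * (1.0 - Z ** 2)  # backprop through tanh
        W2 -= lr * (Z.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    Z = np.tanh(X @ W1 + b1)
    mse = float(np.mean((Z @ W2 + b2 - X) ** 2))
    return Z, mse


def kmeans(Z, k=2, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of Z."""
    r = np.random.default_rng(seed)
    centers = Z[r.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels


def silhouette(Z, labels):
    """Mean silhouette coefficient from pairwise Euclidean distances."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    scores = []
    for i, lab in enumerate(labels):
        same = labels == lab
        same[i] = False                          # exclude the point itself
        if not same.any():                       # singleton cluster scores 0
            scores.append(0.0)
            continue
        a = D[i, same].mean()                    # mean intra-cluster distance
        b = min(D[i, labels == c].mean()         # nearest other cluster
                for c in np.unique(labels) if c != lab)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


Z, mse = train_autoencoder(X)        # Step 1: learn latent features
labels = kmeans(Z, k=2)              # Step 2: cluster in the latent space
sil = silhouette(Z, labels)
print(f"reconstruction MSE: {mse:.4f} (target <= 0.015)")
print(f"silhouette score:   {sil:.3f} (target >= 0.55)")
```

In a real pipeline, a library such as scikit-learn could replace the helpers here (it provides `KMeans`, `silhouette_score`, and `davies_bouldin_score`), and the number of clusters would be chosen by comparing these metrics across several values of `k`.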


### Main Objective
Develop an unsupervised learning pipeline that leverages a neural network–based autoencoder to learn a compact, latent representation of the survey data. Subsequently, apply a clustering algorithm (e.g., K-Means) on these latent features to identify distinct groups. This model aims to capture the complex non-linear relationships in the data and produce actionable segments for targeted interventions.


### Success Criteria
Autoencoder Reconstruction (MSE):
•	Aim for an MSE of 0.015 or lower on normalized validation data, meaning the autoencoder accurately rebuilds the input.
Clustering Quality (Silhouette Score):
•	Target an average silhouette score of 0.55 or higher (ideally around 0.60) to ensure clusters are well separated.
Cluster Stability:
•	Achieve a cluster assignment consistency (e.g., measured by Jaccard similarity) of 0.8 or higher across different runs.
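
The stability criterion does not fix a specific formula, so the sketch below assumes one common reading: treat each clustering as the set of facility pairs that share a cluster, and take the Jaccard similarity of those pair sets between two runs. This makes the score insensitive to how cluster labels happen to be numbered. The function names and the toy label lists are hypothetical.

```python
from itertools import combinations


def comembership_pairs(labels):
    """Set of index pairs (i, j), i < j, that fall in the same cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def jaccard_stability(labels_a, labels_b):
    """Jaccard similarity between the co-membership pairs of two runs."""
    pairs_a = comembership_pairs(labels_a)
    pairs_b = comembership_pairs(labels_b)
    union = pairs_a | pairs_b
    # Two runs with no co-clustered pairs at all are trivially identical.
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


run_1 = [0, 0, 0, 1, 1, 1]
run_2 = [1, 1, 1, 0, 0, 0]   # same partition, cluster IDs swapped
run_3 = [0, 0, 1, 1, 1, 1]   # one facility moved to the other cluster

same_partition = jaccard_stability(run_1, run_2)
one_moved = jaccard_stability(run_1, run_3)
print(same_partition, one_moved)
```

Averaging this score over many pairs of runs (different seeds or bootstrap resamples) gives a single number to compare against the 0.8 target.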



### Specific Objectives
1.	Assess Facility Preparedness:
Evaluate the readiness of health facilities by integrating data on infrastructure (e.g., electricity, water, equipment availability), medication stocks, laboratory stocks, and training indicators.
2.	Uncover Latent Factors:
Investigate which latent (hidden) factors underlie the observed variability in facility performance and might be overlooked by traditional linear models.
3.	Evaluate Health Worker Competence:
Analyze survey responses on training, treatment knowledge, and experience to score health workers and identify areas where further training is needed.
4.	Analyze Patient Outcomes and Satisfaction:
Utilize exit survey data to determine patient treatment outcomes and satisfaction levels. 
5.	Identify Regional Patterns and Key Drivers:
Examine how facility preparedness and health worker performance vary by region or facility type, highlighting the main factors that influence these differences to support targeted interventions.


## DATA UNDERSTANDING
The data originates from a National Annual Quality of Care Survey conducted by the National Malaria Control Program (NMCP) in Kenya. This survey is administered annually to assess various aspects of malaria care quality across the country.
The survey collects comprehensive information from multiple perspectives, including facility preparedness, health worker knowledge, and patient experiences. 
The datasets provided include:
1.	Health Facility Questionnaires (hf1.xlsx, hf2.xlsx, hf3.xlsx):
These three files represent different sections of a comprehensive survey on health facility preparedness for malaria care. They include details on infrastructure (electricity, water, equipment), medication stocks, laboratory capacities, logistics, and adherence to treatment protocols.
Data Types:
Categorical/Binary: Many responses (e.g., yes/no for equipment functionality, presence of guidelines)
Ordinal/Rating: Some indicators are provided as ratings or levels (e.g., facility level, staff qualifications)
Continuous/Numerical: Counts (e.g., number of medication packs, patient load) and dates (e.g., last supervisory visit).
A unique facility identifier (originally noted as P_HF) appears in all three files.
2.	Health Worker Questionnaire (hw.xlsx):
This dataset contains information on individual health workers, including demographics, training records, and a knowledge assessment related to malaria treatment protocols.
Data Types:
Numerical: Knowledge assessment scores, years of experience
Categorical: Cadre, type of training received, gender, medication to be administered.
It provides context on the human factors that can influence facility performance.
3.	Exit Survey Data (exit.xlsx):
This dataset captures patient-level information such as demographics, treatment received, and satisfaction levels. It offers critical insights into patient outcomes and service quality.
Data Types:
Categorical: Patient sex, diagnosis, treatment outcome.
Numerical: Age and, in some cases, quantitative satisfaction ratings.



In [None]:
import polars as pl
import numpy as np


In [3]:
# Raw string so the backslash in the Windows path is not treated as an escape character
outpatient_hf = pl.read_excel(r"fwdmalariahealthfacilityassessmentdatasubmittedasat\Outpatient-Form-1-Health-Facility-Assessment.xlsx")
outpatient_hf.head()

Could not determine dtype for column 2, falling back to string
Could not determine dtype for column 3, falling back to string
Could not determine dtype for column 16, falling back to string
Could not determine dtype for column 21, falling back to string
Could not determine dtype for column 27, falling back to string
Could not determine dtype for column 31, falling back to string
Could not determine dtype for column 37, falling back to string
Could not determine dtype for column 38, falling back to string
Could not determine dtype for column 60, falling back to string
Could not determine dtype for column 61, falling back to string
Could not determine dtype for column 74, falling back to string
Could not determine dtype for column 75, falling back to string
Could not determine dtype for column 88, falling back to string
Could not determine dtype for column 94, falling back to string
Could not determine dtype for column 104, falling back to string
Could not determine dtype for column 113,

SubmissionDate,password,hf_info-opd_cm,hf_info-opd_hfa,hf_info-datetim,hf_info-team,hf_info-team_supervisor,hf_info-team_member_name,hf_info-hf_info_county,hf_info-hf_info_sub_county,hf_info-hf_name,hf_info-hf_id,hf_info-hf_type,hf_info-hf_replaced,hf_info-hf_replaced_reason,hf_info-hf_replaced_name,hf_info-data_collector,hf_info-gps_coord-Latitude,hf_info-gps_coord-Longitude,hf_info-gps_coord-Altitude,hf_info-gps_coord-Accuracy,hf_infrstrctr-hf_infrstrctr_title,hf_infrstrctr-hf_infrstrctr_elec,hf_infrstrctr-hf_infrstrctr_wtr,hf_infrstrctr-hf_infrstrctr_wgh_scal,hf_infrstrctr-hf_infrstrctr_func_thmtr,hf_infrstrctr-hf_infrstrctr_ntwrk_phne,hf_guid_chrts-hf_guid_chrts_title,hf_guid_chrts-hf_guid_chrts_guidln,hf_guid_chrts-hf_guid_chrts_imci,hf_guid_chrts-hf_guid_chrts_mal_mngt_buk,hf_guid_chrts-wall_chrt_expsd,hf_guid_chrts-hf_guid_chrts_alg_tx_chld,hf_guid_chrts-hf_guid_chrts_al_dos_schdl,hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt,hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt_chld_new,hf_guid_chrts-hf_guid_chrts_artsnt_iv_im_poster,…,stck_out_sp_tab-sp_tab_all_oct,stck_out_sp_tab-sp_tab_oct,stck_out_sp_tab-sp_tab_all_nov,stck_out_sp_tab-sp_tab_nov,stck_out_sp_tab-sp_tab_all_dec,stck_out_sp_tab-sp_tab_dec,stck_out_log_qn_tab-qn_tab_all_oct,stck_out_log_qn_tab-qn_tab_oct,stck_out_log_qn_tab-qn_tab_all_nov,stck_out_log_qn_tab-qn_tab_nov,stck_out_log_qn_tab-qn_tab_all_dec,stck_out_log_qn_tab-qn_tab_dec,stck_out_log_qn_inj-qn_inj_all_oct,stck_out_log_qn_inj-qn_inj_oct,stck_out_log_qn_inj-qn_inj_all_nov,stck_out_log_qn_inj-qn_inj_nov,stck_out_log_qn_inj-qn_inj_all_dec,stck_out_log_qn_inj-qn_inj_dec,stck_out_log_art_inj-artsn_inj_all_oct,stck_out_log_art_inj-artsn_inj_oct,stck_out_log_art_inj-artsn_inj_all_nov,stck_out_log_art_inj-artsn_inj_nov,stck_out_log_art_inj-artsn_inj_all_dec,stck_out_log_art_inj-artsn_inj_dec,end_,end_fin,meta-instanceID,meta-instanceName,KEY,SubmitterID,SubmitterName,AttachmentsPresent,AttachmentsExpected,Status,ReviewState,DeviceID,Edits
str,str,str,str,date,str,str,str,str,str,str,str,str,i64,str,str,str,f64,f64,f64,f64,str,i64,i64,i64,i64,i64,str,i64,i64,i64,str,i64,i64,i64,i64,i64,…,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,str,str,str,str,str,i64,str,i64,i64,str,str,str,i64
"""2024-04-21T16:32:49.270Z""","""HFA2024""",,,2024-04-08,"""team_3""","""Nicholas""","""Nicholas Lagat""","""kajiado""","""kajiado_west""","""olkiramatian_disp""","""3_16""","""D""",1,"""Facility inaccessible""","""Oltepesi dispensary""",,-1.55872,36.47675,991.0,3.9,,1,1,1,1,1,,1,1,1,,2,2,2,2,1,…,2,,2,,2,,2,,2,,2,,2,,2,,2,,1,,1,,1,,,,"""uuid:1e18ccd8-f4a8-48fe-9d00-d…","""team_3 olkiramatian_disp""","""uuid:1e18ccd8-f4a8-48fe-9d00-d…",260,"""Team 3 - Lower Eastern""",0,0,,,"""collect:KAniqqDVb7jD298O""",0
"""2024-04-21T16:26:55.495Z""","""HFA2024""",,,2024-04-17,"""team_3""","""Nicholas""","""Nicholas Lagat""","""nairobi""","""mathare""","""upendo_disp""","""8_06""","""D""",2,,,,-1.263232,36.858389,1607.300049,4.783,,1,1,1,1,1,,1,1,1,,2,2,2,2,2,…,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,,,"""uuid:23699268-3e9a-4aed-87bb-2…","""team_3 upendo_disp""","""uuid:23699268-3e9a-4aed-87bb-2…",260,"""Team 3 - Lower Eastern""",0,0,,,"""collect:KAniqqDVb7jD298O""",0
"""2024-04-18T09:28:53.259Z""","""HFA2024""",,,2024-04-16,"""team_1""","""Hassanur""","""Hassannur Adan""","""meru_1""","""tigania_east""","""charuru_disp""","""5_28""","""D""",2,,,,0.1844416,37.839385,1684.0,4.9,,1,1,1,1,1,,1,2,2,,2,1,2,1,2,…,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,,,"""uuid:619730e5-bff8-49ea-ab6b-e…","""team_1 charuru_disp""","""uuid:619730e5-bff8-49ea-ab6b-e…",258,"""Team 1 - North Eastern""",0,0,,,"""collect:5K3B4vfDBW4G2H1A""",0
"""2024-04-17T19:24:58.390Z""","""HFA2024""",,,2024-04-17,"""team_5""","""Fridah""","""Fridah Kaitany""","""kiambu""","""githunguri""","""miguta_cmnty_disp""","""1_04""","""D""",2,,,,-1.070658,36.830314,1773.3,4.45,,1,1,1,1,1,,1,1,2,,2,2,2,2,2,…,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,,,"""uuid:ea8cc8ed-847d-4698-a583-f…","""team_5 miguta_cmnty_disp""","""uuid:ea8cc8ed-847d-4698-a583-f…",262,"""Team 5 - Central""",0,0,,,"""collect:KAN28nhI9AhY9Mby""",0
"""2024-04-17T19:19:38.055Z""","""HFA2024""",,,2024-03-24,"""team_5""","""Fridah""","""Fridah Kaitany""","""nyandarua""","""kinangop""","""bamboo_hc""","""1_12""","""HC""",2,,,,-0.870094,36.569176,0.0,20.1,,1,1,1,1,1,,2,1,2,,2,2,2,2,2,…,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,2,,,,"""uuid:acf0e322-6efb-4824-aa14-6…","""team_5 bamboo_hc""","""uuid:acf0e322-6efb-4824-aa14-6…",262,"""Team 5 - Central""",0,0,,,"""collect:KAN28nhI9AhY9Mby""",0


In [11]:
print(outpatient_hf)

shape: (104, 528)
┌─────────────┬──────────┬─────────────┬────────────┬───┬────────┬────────────┬────────────┬───────┐
│ SubmissionD ┆ password ┆ title_main- ┆ title_main ┆ … ┆ Status ┆ ReviewStat ┆ DeviceID   ┆ Edits │
│ ate         ┆ ---      ┆ assessment_ ┆ -form_titl ┆   ┆ ---    ┆ e          ┆ ---        ┆ ---   │
│ ---         ┆ str      ┆ op_ip       ┆ e          ┆   ┆ str    ┆ ---        ┆ str        ┆ i64   │
│ str         ┆          ┆ ---         ┆ ---        ┆   ┆        ┆ str        ┆            ┆       │
│             ┆          ┆ str         ┆ str        ┆   ┆        ┆            ┆            ┆       │
╞═════════════╪══════════╪═════════════╪════════════╪═══╪════════╪════════════╪════════════╪═══════╡
│ 2024-04-19T ┆ HFA2024  ┆ null        ┆ null       ┆ … ┆ null   ┆ null       ┆ collect:KA ┆ 0     │
│ 03:58:51.01 ┆          ┆             ┆            ┆   ┆        ┆            ┆ N28nhI9AhY ┆       │
│ 5Z          ┆          ┆             ┆            ┆   ┆        ┆       