# Optimizing Malaria Care in Kenya: An Unsupervised Learning Approach for Segmenting Health Facility Preparedness
***


## BUSINESS UNDERSTANDING
***
### Business Overview
Malaria remains a major public health challenge in many countries, driving morbidity and mortality, particularly among vulnerable populations. According to a 2022 study, nearly 3.42 million malaria cases were confirmed in Kenya, along with some 219 deaths. The National Malaria Control Program (NMCP) is committed to reducing malaria-related deaths in Kenya through improved healthcare delivery, effective treatment protocols, and strengthened health facility preparedness. This project uses nationwide survey data to assess the quality of malaria care, focusing on facility readiness, health worker competencies, and patient outcomes. By applying advanced analytics and deep learning techniques, the project aims to provide actionable insights that can guide targeted interventions and resource allocation.


### Problem Statement
The National Malaria Control Program (NMCP), the primary stakeholder in this initiative, is tasked with reducing malaria morbidity and mortality by 75% relative to 2016 levels by 2029. Recent funding pauses from major donors such as USAID and WHO have limited the NMCP's capacity to expand interventions, which makes targeted, cost-effective action more urgent. Existing monitoring systems, which rely on high-dimensional survey data, lack the precision and granularity needed to pinpoint specific deficiencies and inform such improvements. NMCP decision-makers require a precise, data-driven method to:
•	Identify which health facilities are underperforming in terms of preparedness and case management.
•	Prioritize limited resources and design interventions that address specific weaknesses.
•	Monitor performance improvements over time and adjust strategies rapidly.


### Proposed Solution
#### Methodology Overview:
The project will use a two-step modeling approach:
•	Step 1: Autoencoder Development
A neural network-based autoencoder will be constructed and trained on the multi-dimensional survey data. This model will compress the data into a latent representation that captures essential, non-linear relationships among key indicators (infrastructure, supply chain, health worker competencies). Success will be measured by a low reconstruction loss (target MSE ≤ 0.015 on normalized data).
•	Step 2: Clustering in the Latent Space
The latent features obtained from the autoencoder will serve as input to a clustering algorithm (e.g., K-Means). Clusters will be evaluated using metrics such as the silhouette score (target ≥ 0.55) and Davies-Bouldin index (target < 1.0), ensuring well-separated and compact groups.
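Step 1 can be illustrated with a minimal sketch, assuming a single hidden layer, a tanh encoder, and plain gradient descent written in NumPy. The function names (`train_autoencoder`, `encode`), the latent size, the learning rate, and the synthetic data are all assumptions made for illustration. The project itself would more likely use a deep learning framework with a deeper architecture, and nothing here shows that the 0.015 MSE target is reachable on the survey data.

```python
import numpy as np

def train_autoencoder(X, latent_dim=2, epochs=2000, lr=0.5, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent.

    Encoder: z = tanh(X @ W1 + b1). Decoder: X_hat = z @ W2 + b2.
    Returns the parameters and the MSE recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_features, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, size=(latent_dim, n_features))
    b2 = np.zeros(n_features)
    history = []
    for _ in range(epochs):
        z = np.tanh(X @ W1 + b1)          # latent representation
        err = (z @ W2 + b2) - X           # reconstruction error
        history.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared-error loss.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W2.T) * (1.0 - z ** 2)
        W2 -= lr * (z.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, history

def encode(X, params):
    """Project rows of X into the learned latent space."""
    return np.tanh(X @ params["W1"] + params["b1"])

# Synthetic stand-in for the survey matrix: 200 facilities, 6 indicators
# driven by 2 hidden factors, min-max scaled to [0, 1].
rng = np.random.default_rng(42)
factors = rng.uniform(size=(200, 2))
mixing = rng.uniform(size=(2, 6))
X_raw = factors @ mixing + rng.normal(0.0, 0.02, size=(200, 6))
X = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

params, history = train_autoencoder(X)
latent = encode(X, params)
print(f"MSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The `latent` array is what Step 2 would pass to the clustering algorithm in place of the raw indicators.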


### Main Objective
Develop an unsupervised learning pipeline that leverages a neural network–based autoencoder to learn a compact, latent representation of the survey data. Subsequently, apply a clustering algorithm (e.g., K-Means) on these latent features to identify distinct groups. This model aims to capture the complex non-linear relationships in the data and produce actionable segments for targeted interventions.


### Success Criteria
Autoencoder Reconstruction (MSE):
•	Aim for an MSE of 0.015 or lower on normalized validation data, meaning the autoencoder accurately rebuilds the input.
Clustering Quality (Silhouette Score):
•	Target an average silhouette score of 0.55 or higher (ideally around 0.60) to ensure clusters are well separated.
Cluster Stability:
•	Achieve a cluster assignment consistency (e.g., measured by Jaccard similarity) of 0.8 or higher across different runs.
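The clustering criteria above can be made concrete with a small NumPy sketch. It assumes Euclidean distance for the silhouette score and a pair-counting definition of Jaccard similarity (the share of facility pairs grouped together in both runs among pairs grouped together in at least one). The function names are hypothetical, and other definitions of cluster stability are equally valid.

```python
import numpy as np

def silhouette_score_np(X, labels):
    """Mean silhouette coefficient (Euclidean). Needs at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # silhouette is taken as 0 for singleton clusters
        a = dist[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def pairwise_jaccard(labels_a, labels_b):
    """Jaccard similarity of co-clustered pairs between two labelings."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    upper = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[upper]
    same_b = (b[:, None] == b[None, :])[upper]
    union = np.logical_or(same_a, same_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(same_a, same_b).sum() / union)

# Toy check: two tight, well-separated groups of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
run_1 = np.array([0, 0, 1, 1])
run_2 = np.array([1, 1, 0, 0])  # same partition, labels swapped

sil = silhouette_score_np(points, run_1)
stability = pairwise_jaccard(run_1, run_2)
print(f"silhouette={sil:.3f}, jaccard={stability:.1f}")
```

Because the Jaccard measure compares pairs rather than label values, relabelled but identical partitions from different runs still score 1.0.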



### Specific Objectives
1.	Assess Facility Preparedness:
Evaluate the readiness of health facilities by integrating data on infrastructure (e.g., electricity, water, equipment availability), medication stocks, laboratory stocks, and training indicators.
2.	Investigate Latent Factors:
Identify the latent (hidden) factors underlying the observed variability in facility performance that traditional linear models might overlook.
3.	Evaluate Health Worker Competence:
Analyze survey responses on training, treatment knowledge, and experience to score health workers and identify areas where further training is needed.
4.	Analyze Patient Outcomes and Satisfaction:
Utilize exit survey data to determine patient treatment outcomes and satisfaction levels. 
5.	Identify Regional Patterns and Key Drivers:
Examine how facility preparedness and health worker performance vary by region or facility type, highlighting the main factors that influence these differences to support targeted interventions.


## DATA UNDERSTANDING
The data originates from a National Annual Quality of Care Survey conducted by the National Malaria Control Program (NMCP) in Kenya. This survey is administered annually to assess various aspects of malaria care quality across the country.
The survey collects comprehensive information from multiple perspectives, including facility preparedness, health worker knowledge, and patient experiences. 
The datasets provided include:
1.	Health Facility Questionnaires (hf1.xlsx, hf2.xlsx, hf3.xlsx):
These three files represent different sections of a comprehensive survey on health facility preparedness for malaria care. They include details on infrastructure (electricity, water, equipment), medication stocks, laboratory capacities, logistics, and adherence to treatment protocols.
Data Types:
Categorical/Binary: Many responses (e.g., yes/no for equipment functionality, presence of guidelines)
Ordinal/Rating: Some indicators are provided as ratings or levels (e.g., facility level, staff qualifications)
Continuous/Numerical: Counts (e.g., number of medication packs, patient load) and dates (e.g., last supervisory visit).
A unique facility identifier (originally noted as P_HF) appears in all three files.
2.	Health Worker Questionnaire (hw.xlsx):
This dataset contains information on individual health workers, including demographics, training records, and a knowledge assessment related to malaria treatment protocols.
Data Types:
Numerical: Knowledge assessment scores, years of experience
Categorical: Cadre, type of training received, gender, medication to be administered.
It provides context on the human factors that can influence facility performance.
3.	Exit Survey Data (exit.xlsx):
This dataset captures patient-level information such as demographics, treatment received, and satisfaction levels. It offers critical insights into patient outcomes and service quality.
Data Types:
Categorical: Patient sex, diagnosis, treatment outcome.
Numerical: Age and, where recorded, quantitative satisfaction ratings.
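Since the three facility questionnaires share one facility identifier, they can be combined into a single facility-level table. The sketch below uses tiny made-up frames in place of hf1.xlsx, hf2.xlsx and hf3.xlsx, and assumes the identifier column is named `P_HF` in each. The other column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the three questionnaire sections.
hf1 = pd.DataFrame({"P_HF": ["1_04", "1_12", "3_16"], "electricity": [1, 1, 2]})
hf2 = pd.DataFrame({"P_HF": ["1_04", "1_12", "3_16"], "al_6_stock": [3, 0, 10]})
hf3 = pd.DataFrame({"P_HF": ["1_04", "3_16"], "supervised_last_3m": [1, 2]})

# validate="one_to_one" raises if a facility ID is duplicated in either frame;
# indicator=True records which facilities are missing from the last section.
facility = (
    hf1.merge(hf2, on="P_HF", how="outer", validate="one_to_one")
       .merge(hf3, on="P_HF", how="outer", validate="one_to_one", indicator=True)
)

missing_section_3 = facility.loc[facility["_merge"] == "left_only", "P_HF"].tolist()
print(facility)
print("Facilities without section 3:", missing_section_3)
```

An outer join keeps facilities that are absent from one section, so gaps show up as missing values instead of silently dropping rows.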



In [1]:
import pandas as pd
import numpy as np

In [2]:
outpatient_hf = pd.read_excel(r"fwdmalariahealthfacilityassessmentdatasubmittedasat\Outpatient-Form-1-Health-Facility-Assessment.xlsx")  # raw string keeps the backslash literal
outpatient_hf.head()

Unnamed: 0,SubmissionDate,password,hf_info-opd_cm,hf_info-opd_hfa,hf_info-datetim,hf_info-team,hf_info-team_supervisor,hf_info-team_member_name,hf_info-hf_info_county,hf_info-hf_info_sub_county,...,meta-instanceName,KEY,SubmitterID,SubmitterName,AttachmentsPresent,AttachmentsExpected,Status,ReviewState,DeviceID,Edits
0,2024-04-21T16:32:49.270Z,HFA2024,,,2024-04-08,team_3,Nicholas,Nicholas Lagat,kajiado,kajiado_west,...,team_3 olkiramatian_disp,uuid:1e18ccd8-f4a8-48fe-9d00-d4aa30c52b52,260,Team 3 - Lower Eastern,0,0,,,collect:KAniqqDVb7jD298O,0
1,2024-04-21T16:26:55.495Z,HFA2024,,,2024-04-17,team_3,Nicholas,Nicholas Lagat,nairobi,mathare,...,team_3 upendo_disp,uuid:23699268-3e9a-4aed-87bb-2a81e97f80bb,260,Team 3 - Lower Eastern,0,0,,,collect:KAniqqDVb7jD298O,0
2,2024-04-18T09:28:53.259Z,HFA2024,,,2024-04-16,team_1,Hassanur,Hassannur Adan,meru_1,tigania_east,...,team_1 charuru_disp,uuid:619730e5-bff8-49ea-ab6b-ecc24143ff01,258,Team 1 - North Eastern,0,0,,,collect:5K3B4vfDBW4G2H1A,0
3,2024-04-17T19:24:58.390Z,HFA2024,,,2024-04-17,team_5,Fridah,Fridah Kaitany,kiambu,githunguri,...,team_5 miguta_cmnty_disp,uuid:ea8cc8ed-847d-4698-a583-fb353a9df8b6,262,Team 5 - Central,0,0,,,collect:KAN28nhI9AhY9Mby,0
4,2024-04-17T19:19:38.055Z,HFA2024,,,2024-03-24,team_5,Fridah,Fridah Kaitany,nyandarua,kinangop,...,team_5 bamboo_hc,uuid:acf0e322-6efb-4824-aa14-6dbc84aa654b,262,Team 5 - Central,0,0,,,collect:KAN28nhI9AhY9Mby,0


In [3]:
outpatient_hf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Columns: 374 entries, SubmissionDate to Edits
dtypes: datetime64[ns](4), float64(179), int64(132), object(59)
memory usage: 567.0+ KB


In [4]:
# List of all 374 columns in the dataset
pd.set_option("display.max_columns", None)
outpatient_hf_columns = outpatient_hf.columns.to_list()
outpatient_hf_columns

['SubmissionDate',
 'password',
 'hf_info-opd_cm',
 'hf_info-opd_hfa',
 'hf_info-datetim',
 'hf_info-team',
 'hf_info-team_supervisor',
 'hf_info-team_member_name',
 'hf_info-hf_info_county',
 'hf_info-hf_info_sub_county',
 'hf_info-hf_name',
 'hf_info-hf_id',
 'hf_info-hf_type',
 'hf_info-hf_replaced',
 'hf_info-hf_replaced_reason',
 'hf_info-hf_replaced_name',
 'hf_info-data_collector',
 'hf_info-gps_coord-Latitude',
 'hf_info-gps_coord-Longitude',
 'hf_info-gps_coord-Altitude',
 'hf_info-gps_coord-Accuracy',
 'hf_infrstrctr-hf_infrstrctr_title',
 'hf_infrstrctr-hf_infrstrctr_elec',
 'hf_infrstrctr-hf_infrstrctr_wtr',
 'hf_infrstrctr-hf_infrstrctr_wgh_scal',
 'hf_infrstrctr-hf_infrstrctr_func_thmtr',
 'hf_infrstrctr-hf_infrstrctr_ntwrk_phne',
 'hf_guid_chrts-hf_guid_chrts_title',
 'hf_guid_chrts-hf_guid_chrts_guidln',
 'hf_guid_chrts-hf_guid_chrts_imci',
 'hf_guid_chrts-hf_guid_chrts_mal_mngt_buk',
 'hf_guid_chrts-wall_chrt_expsd',
 'hf_guid_chrts-hf_guid_chrts_alg_tx_chld',
 'hf_gui

In [5]:
print("Percentage of Nulls Per Column\n")
outpatient_hf_nulls_dict = {}
for col in outpatient_hf_columns:
    # Share of missing values in this column, as a percentage of all rows
    null_pct = outpatient_hf[col].isna().sum() / outpatient_hf.shape[0] * 100
    outpatient_hf_nulls_dict[col] = null_pct
    print(col, "=", null_pct)

Percentage of Nulls Per Column

SubmissionDate = 0.0
password = 0.0
hf_info-opd_cm = 100.0
hf_info-opd_hfa = 100.0
hf_info-datetim = 0.0
hf_info-team = 0.0
hf_info-team_supervisor = 14.432989690721648
hf_info-team_member_name = 0.0
hf_info-hf_info_county = 0.0
hf_info-hf_info_sub_county = 0.0
hf_info-hf_name = 0.0
hf_info-hf_id = 0.0
hf_info-hf_type = 0.0
hf_info-hf_replaced = 0.0
hf_info-hf_replaced_reason = 88.14432989690721
hf_info-hf_replaced_name = 88.14432989690721
hf_info-data_collector = 100.0
hf_info-gps_coord-Latitude = 0.0
hf_info-gps_coord-Longitude = 0.0
hf_info-gps_coord-Altitude = 0.0
hf_info-gps_coord-Accuracy = 0.0
hf_infrstrctr-hf_infrstrctr_title = 100.0
hf_infrstrctr-hf_infrstrctr_elec = 0.0
hf_infrstrctr-hf_infrstrctr_wtr = 0.0
hf_infrstrctr-hf_infrstrctr_wgh_scal = 0.0
hf_infrstrctr-hf_infrstrctr_func_thmtr = 0.0
hf_infrstrctr-hf_infrstrctr_ntwrk_phne = 0.0
hf_guid_chrts-hf_guid_chrts_title = 100.0
hf_guid_chrts-hf_guid_chrts_guidln = 0.0
hf_guid_chrts-hf_guid_chr

In [6]:
# Keep only the columns with fewer than 10% missing values
less_than_10perc_nulls = [col for col, pct in outpatient_hf_nulls_dict.items() if pct < 10]
print(len(less_than_10perc_nulls))
less_than_10perc_nulls

162


['SubmissionDate',
 'password',
 'hf_info-datetim',
 'hf_info-team',
 'hf_info-team_member_name',
 'hf_info-hf_info_county',
 'hf_info-hf_info_sub_county',
 'hf_info-hf_name',
 'hf_info-hf_id',
 'hf_info-hf_type',
 'hf_info-hf_replaced',
 'hf_info-gps_coord-Latitude',
 'hf_info-gps_coord-Longitude',
 'hf_info-gps_coord-Altitude',
 'hf_info-gps_coord-Accuracy',
 'hf_infrstrctr-hf_infrstrctr_elec',
 'hf_infrstrctr-hf_infrstrctr_wtr',
 'hf_infrstrctr-hf_infrstrctr_wgh_scal',
 'hf_infrstrctr-hf_infrstrctr_func_thmtr',
 'hf_infrstrctr-hf_infrstrctr_ntwrk_phne',
 'hf_guid_chrts-hf_guid_chrts_guidln',
 'hf_guid_chrts-hf_guid_chrts_imci',
 'hf_guid_chrts-hf_guid_chrts_mal_mngt_buk',
 'hf_guid_chrts-hf_guid_chrts_alg_tx_chld',
 'hf_guid_chrts-hf_guid_chrts_al_dos_schdl',
 'hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt',
 'hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt_chld_new',
 'hf_guid_chrts-hf_guid_chrts_artsnt_iv_im_poster',
 'staff_cas_mngt_trng-doc_tot_num',
 'co_cas_mngt-co_tot_num',
 'nurs_
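The per-column loop and the threshold filter can also be written as one vectorized expression. This is a minimal sketch on a made-up frame (`demo` and its columns are hypothetical). On the survey data, the same two lines would apply to `outpatient_hf`.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame: "a" is complete, "b" is entirely missing,
# and "c" has a single missing value (5%).
demo = pd.DataFrame({
    "a": np.arange(20),
    "b": [np.nan] * 20,
    "c": [np.nan] + [1.0] * 19,
})

# isna() yields booleans, so the column mean is the fraction of missing rows.
null_pct = demo.isna().mean().mul(100)
keep_cols = null_pct[null_pct < 10].index.tolist()

print(null_pct)
print(keep_cols)
```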

In [7]:
new_outpatient_hf = outpatient_hf[less_than_10perc_nulls].copy()  # explicit copy avoids SettingWithCopyWarning
new_outpatient_hf.drop(['hmis_tools-kmhfl_code',
 'hmis_tools-his_-dar_incl_artsnt',
 'hmis_tools-his_-dar_excl_artsnt',
 'hmis_tools-his_-mal_commodity_form',
 'hmis_tools-his_-moh_643',
 'hmis_tools-his_-moh_204a',
 'hmis_tools-his_-moh_705a',
 'hmis_tools-his_-moh_204b',
 'hmis_tools-his_-moh_705b',
 'hmis_tools-his_-moh_240',
 'hmis_tools-his_-moh_706',
 'hmis_tools-his_-moh_405',
 'hmis_tools-his_-moh_511',
 'hmis_tools-his_-moh_711',
 'hmis_tools-his_-moh_505',
 'hmis_tools-his_tool_use-al6_blis_regis',
 'hmis_tools-his_tool_use-al6_blis_summ',
 'hmis_tools-his_tool_use-al6_blis_soh',
 'hmis_tools-al12_qnty-al12_blis_regis',
 'hmis_tools-al12_qnty-al12_blis_summ',
 'hmis_tools-al12_qnty-al12_blis_soh',
 'hmis_tools-al18_qnty-al18_blis_regis',
 'hmis_tools-al18_qnty-al18_blis_summ',
 'hmis_tools-al18_qnty-al18_blis_soh',
 'hmis_tools-al24_qnty-al24_blis_regis',
 'hmis_tools-al24_qnty-al24_blis_summ',
 'hmis_tools-al24_qnty-al24_blis_soh',
 'hmis_tools-artsnt_qnty-artesun_inj_amp_regis',
 'hmis_tools-artsnt_qnty-artesun_inj_amp_summ',
 'hmis_tools-artsnt_qnty-artesun_inj_amp_soh',
 'hmis_tools-mrdt_qnty-mrdt_regis',
 'hmis_tools-mrdt_qnty-mrdt_summ',
 'hmis_tools-mrdt_qnty-mrdt_soh',
 'hmis_tools-sp_qnty-sp_regis',
 'hmis_tools-sp_qnty-sp_regis_calc',
 'hmis_tools-sp_qnty-sp_summ',
 'hmis_tools-sp_qnty-sp_soh',
 'hmis_tools-llins_qnty-llin_regis_anc',
 'hmis_tools-llins_qnty-llin_regis_cwc',
 'hmis_tools-llins_qnty-llin_summ',
 'hmis_tools-llins_qnty-llin_summ1',
 'hmis_tools-llins_qnty-llin_summ2',
 'hmis_tools-llins_qnty-llin_soh',
 'meta-instanceID',
 'meta-instanceName',
 'KEY',
 'SubmitterID',
 'SubmitterName',
 'AttachmentsPresent',
 'AttachmentsExpected',
 'DeviceID',
 'Edits'], axis = 1, inplace=True)
new_outpatient_hf.head()



Unnamed: 0,SubmissionDate,password,hf_info-datetim,hf_info-team,hf_info-team_member_name,hf_info-hf_info_county,hf_info-hf_info_sub_county,hf_info-hf_name,hf_info-hf_id,hf_info-hf_type,hf_info-hf_replaced,hf_info-gps_coord-Latitude,hf_info-gps_coord-Longitude,hf_info-gps_coord-Altitude,hf_info-gps_coord-Accuracy,hf_infrstrctr-hf_infrstrctr_elec,hf_infrstrctr-hf_infrstrctr_wtr,hf_infrstrctr-hf_infrstrctr_wgh_scal,hf_infrstrctr-hf_infrstrctr_func_thmtr,hf_infrstrctr-hf_infrstrctr_ntwrk_phne,hf_guid_chrts-hf_guid_chrts_guidln,hf_guid_chrts-hf_guid_chrts_imci,hf_guid_chrts-hf_guid_chrts_mal_mngt_buk,hf_guid_chrts-hf_guid_chrts_alg_tx_chld,hf_guid_chrts-hf_guid_chrts_al_dos_schdl,hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt,hf_guid_chrts-hf_guid_chrts_mal_op_alg_adlt_chld_new,hf_guid_chrts-hf_guid_chrts_artsnt_iv_im_poster,staff_cas_mngt_trng-doc_tot_num,co_cas_mngt-co_tot_num,nurs_cas_mngt-nurs_tot_num,nurs_cas_mngt-nurs_num_mal_cm,nurs_cas_mngt-nurs_num_mal_rdt,nurs_cas_mngt-nurs_num_imci,chew_cas_mngt-chew_tot_num,oth_cas_mngt-othrs_tot_num,staff_lab_mngt_trng-lab_techlgst_prfrm_micrscpy,staff_lab_mngt_trng-lab_tech_prfrm_micrscpy,staff_lab_mngt_trng-oth_cadre_prfrm_micrscpy,drug_dispns_trng-pharm_tech_num,drug_dispns_trng-pharm_num,drug_dispns_trng-nurse_num,drug_dispns_trng-chews_num,drug_dispns_trng-other_spec_num,drug_dispns_trng-cadr_dispns_tday,supvsn-supvsn_lst_3mnth,supvsn-qc_mal_mcrscpy,supvsn-supvsn_mrdt,supvsn-supvsn_drg_mngt,avail_mal_dx-mal_mcrscpy_rutin,mrdt_tday-stck_bin_crd_rdt,avail_al_med_tday1-phsy_count_non_exp-al_6_disp_physc_count_stor,avail_al_med_tday1-phsy_count_non_exp-al_12_disp_physc_count_stor,avail_al_med_tday1-phsy_count_non_exp-al_18_physc_count_stor,avail_al_med_tday1-phsy_count_non_exp-al_24_physc_count_stor,avail_al_med_tday1-phsy_count_non_exp-llin_physc_count_stor,avail_al_med_tday1-record_count_non_exp_bincard-stck_bin_crd_al6_disp,avail_al_med_tday1-record_count_non_exp_bincard-stck_bin_crd_al12_disp,avail_al_med_tday1-record_count_non_exp_bincard-stck_bin_crd_al18,avail_al_med_tday1-record_count_non_exp_bincard-stck_bin_crd_al24,avail_al_med_tday1-record_count_non_exp_bincard-stck_bin_crd_llin,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_6_disp_physc_count_dsp_ar,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_12_disp_physc_count_dsp_ar,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_18_physc_count_dsp_ar,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_24_physc_count_dsp_ar,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_6_disp_tot_phys,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_12_disp_tot_phys,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_18_tot_phys,avail_al_med_tday1-phsy_count_non_exp_dispens_area-al_24_tot_phys,avail_al_med_tday1-record_count_non_exp_dar-al_6_disp_recrd_count_dar,avail_al_med_tday1-record_count_non_exp_dar-al_12_disp_recrd_count_dar,avail_al_med_tday1-record_count_non_exp_dar-al_18_recrd_count_dar,avail_al_med_tday1-record_count_non_exp_dar-al_24_recrd_count_dar,avail_al_med_tday1-record_count_non_exp_dar-llins_anc_reg,avail_al_med_tday1-record_count_non_exp_dar-llins_cwc_reg,avail_al_med_tday1-avail_exp_qnty1-avail_exp_qnty_disp_al6,avail_al_med_tday1-avail_exp_qnty1-avail_exp_qnty_disp_al12,avail_al_med_tday1-avail_exp_qnty1-avail_exp_qnty_al18,avail_al_med_tday1-avail_exp_qnty1-avail_exp_qnty_al24,avail_al_med_tday1-avail_exp_qnty1-avail_exp_qnty_llins,tools_frms_avail-dar_avail,tools_frms_avail-msf_avail,oth_antmal_avail-chlrqn_tabs_nonexp,oth_antmal_avail-chlrqn_syrp_nonexp,oth_antmal_avail-chlrqn_inj_nonexp,oth_antmal_avail-sp_tabs_nonexp,oth_antmal_avail-sp_syrp_nonexp,oth_antmal_avail-amdqn_tabs_nonexp,oth_antmal_avail-amdqn_syrp_nonexp,oth_antmal_avail-quinn_tabs_nonexp,oth_antmal_avail-quinn_inj_nonexp,oth_antmal_avail-artsnt_inj_nonexp,oth_antmal_avail-oth_am_nonexp,oth_antmal_avail_exp-chlrqn_tabs_exp,oth_antmal_avail_exp-chlrqn_syrp_exp,oth_antmal_avail_exp-chlrqn_inj_exp,oth_antmal_avail_exp-sp_tabs_exp,oth_antmal_avail_exp-sp_syrp_exp,oth_antmal_avail_exp-amdqn_tabs_exp,oth_antmal_avail_exp-amdqn_syrp_exp,oth_antmal_avail_exp-quinn_tabs_exp,oth_antmal_avail_exp-quinn_inj_exp,oth_antmal_avail_exp-artsnt_inj_exp,oth_antmal_avail_exp-oth_am_exp,qnty_al_ord_recvd-al_spplr_kmsa,qnty_al_ord_recvd-al_spplr_meds,qnty_al_ord_recvd-al_spplr_oth,qnty_al_ord_recvd-date_lst_dlvry,al_pull_systm_title-al_pull_systm,al_push_systm_title-al_push_systm
0,2024-04-21T16:32:49.270Z,HFA2024,2024-04-08,team_3,Nicholas Lagat,kajiado,kajiado_west,olkiramatian_disp,3_16,D,1,-1.55872,36.47675,991.0,3.9,1,1,1,1,1,1,1,1,2,2,2,2,1,0,2,2,0.0,0.0,0.0,0,0,1,0,0,0,0,2,0,1,Nurse,1,1,1,1,1,1,10,10,10,0,400,1,1,1,1,1,0,0,0,0,10,10,10,0,2,1,3,0,14,10,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,2024-03-04,1,2.0
1,2024-04-21T16:26:55.495Z,HFA2024,2024-04-17,team_3,Nicholas Lagat,nairobi,mathare,upendo_disp,8_06,D,2,-1.263232,36.858389,1607.300049,4.783,1,1,1,1,1,1,1,1,2,2,2,2,2,0,0,1,1.0,0.0,1.0,1,0,0,0,0,0,0,1,0,0,Nurse,2,2,2,2,2,1,0,0,0,5,0,2,2,2,1,2,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,1,2024-02-16,1,2.0
2,2024-04-18T09:28:53.259Z,HFA2024,2024-04-16,team_1,Hassannur Adan,meru_1,tigania_east,charuru_disp,5_28,D,2,0.184442,37.839385,1684.0,4.9,1,1,1,1,1,1,2,2,2,1,2,1,2,0,0,1,1.0,1.0,1.0,0,0,0,0,0,0,0,1,0,0,Nurse,1,2,2,1,2,2,0,0,0,0,33,2,2,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,2019-01-28,1,2.0
3,2024-04-17T19:24:58.390Z,HFA2024,2024-04-17,team_5,Fridah Kaitany,kiambu,githunguri,miguta_cmnty_disp,1_04,D,2,-1.070658,36.830314,1773.3,4.45,1,1,1,1,1,1,1,2,2,2,2,2,2,0,0,2,0.0,0.0,1.0,5,1,1,0,0,1,0,2,0,0,Nurse,1,1,1,1,1,1,3,0,0,3,0,1,2,2,1,2,0,0,0,0,3,0,0,3,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,1,2023-12-11,1,2.0
4,2024-04-17T19:19:38.055Z,HFA2024,2024-03-24,team_5,Fridah Kaitany,nyandarua,kinangop,bamboo_hc,1_12,HC,2,-0.870094,36.569176,0.0,20.1,1,1,1,1,1,2,1,2,2,2,2,2,2,0,1,10,0.0,0.0,0.0,1,11,2,0,0,1,0,1,6,1,Nurse\nSupport staff,1,1,2,1,1,2,0,0,0,2,0,2,2,2,1,2,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,1,2023-04-06,1,2.0


In [8]:
new_outpatient_hf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Columns: 110 entries, SubmissionDate to al_push_systm_title-al_push_systm
dtypes: datetime64[ns](2), float64(8), int64(90), object(10)
memory usage: 166.8+ KB


In [9]:
for col in new_outpatient_hf.columns:
    b = new_outpatient_hf[col].isna().sum()/new_outpatient_hf.shape[0]*100
    if b > 0:
        print(col, "=", b)
   

nurs_cas_mngt-nurs_num_mal_cm = 1.5463917525773196
nurs_cas_mngt-nurs_num_mal_rdt = 1.5463917525773196
nurs_cas_mngt-nurs_num_imci = 1.5463917525773196
drug_dispns_trng-cadr_dispns_tday = 0.5154639175257731
qnty_al_ord_recvd-date_lst_dlvry = 6.185567010309279
al_push_systm_title-al_push_systm = 0.5154639175257731


In [10]:
new_outpatient_hf["qnty_al_ord_recvd-date_lst_dlvry"].unique()

<DatetimeArray>
['2024-03-04 00:00:00', '2024-02-16 00:00:00', '2019-01-28 00:00:00',
 '2023-12-11 00:00:00', '2023-04-06 00:00:00', '2023-04-30 00:00:00',
 '2000-01-01 00:00:00', '2024-04-03 00:00:00', '2023-05-09 00:00:00',
 '2024-02-26 00:00:00',
 ...
 '2024-01-08 00:00:00', '2023-12-12 00:00:00', '2024-02-13 00:00:00',
 '2024-02-12 00:00:00', '2023-12-13 00:00:00', '2023-10-13 00:00:00',
 '2023-11-19 00:00:00', '2024-01-10 00:00:00', '2020-03-09 00:00:00',
 '2024-01-13 00:00:00']
Length: 131, dtype: datetime64[ns]
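The delivery dates listed above include missing entries and values such as 2000-01-01 that may be data-entry placeholders. One option, sketched here under that assumption, is to leave such dates as `NaT`, add an explicit missing flag, and derive a numeric "days since last delivery" feature relative to the visit date in `hf_info-datetim`. The toy frame and the two new column names (`delivery_date_missing`, `days_since_delivery`) are hypothetical.

```python
import pandas as pd

# Toy rows: a real delivery date, a missing one, and a suspected placeholder.
visits = pd.DataFrame({
    "hf_info-datetim": pd.to_datetime(["2024-04-08", "2024-04-17", "2024-04-16"]),
    "qnty_al_ord_recvd-date_lst_dlvry": pd.to_datetime(["2024-03-04", None, "2000-01-01"]),
})

delivery = visits["qnty_al_ord_recvd-date_lst_dlvry"]
# Assumption: 2000-01-01 is a placeholder, not a genuine delivery date.
delivery = delivery.mask(delivery == pd.Timestamp("2000-01-01"))

visits["delivery_date_missing"] = delivery.isna()
# Subtracting datetimes gives timedeltas; NaT propagates to NaN in .dt.days.
visits["days_since_delivery"] = (visits["hf_info-datetim"] - delivery).dt.days

print(visits[["delivery_date_missing", "days_since_delivery"]])
```

Keeping the column as a true datetime, with gaps flagged separately, avoids mixing sentinel strings into a date column.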

In [11]:
# Assignment form, as pandas recommends, instead of chained inplace fillna. The sentinel is not a valid timestamp, so the column may stop being datetime64.
new_outpatient_hf["qnty_al_ord_recvd-date_lst_dlvry"] = new_outpatient_hf["qnty_al_ord_recvd-date_lst_dlvry"].fillna('0000-00-00 00:00:00')


In [12]:
new_outpatient_hf["nurs_cas_mngt-nurs_num_imci"].value_counts()

nurs_cas_mngt-nurs_num_imci
0.0     94
1.0     51
2.0     24
3.0      8
5.0      5
4.0      3
8.0      2
6.0      1
9.0      1
16.0     1
10.0     1
Name: count, dtype: int64

In [13]:
new_outpatient_hf["nurs_cas_mngt-nurs_num_mal_cm"].value_counts()

nurs_cas_mngt-nurs_num_mal_cm
0.0     112
1.0      33
2.0      24
5.0       6
3.0       5
6.0       3
4.0       3
9.0       2
10.0      2
15.0      1
Name: count, dtype: int64

In [14]:
new_outpatient_hf["nurs_cas_mngt-nurs_num_mal_rdt"].value_counts()

nurs_cas_mngt-nurs_num_mal_rdt
0.0     107
1.0      36
2.0      25
3.0       8
5.0       7
4.0       3
6.0       1
9.0       1
7.0       1
10.0      1
13.0      1
Name: count, dtype: int64
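The three nurse-training counts are heavily skewed towards zero and are each missing in about 1.5% of rows. A minimal sketch of one possible treatment is per-column median imputation. The choice of the median is an assumption, not something the survey prescribes, and the toy values below are made up.

```python
import numpy as np
import pandas as pd

count_cols = [
    "nurs_cas_mngt-nurs_num_mal_cm",
    "nurs_cas_mngt-nurs_num_mal_rdt",
    "nurs_cas_mngt-nurs_num_imci",
]

# Hypothetical rows with the same kind of skew as the real counts.
toy = pd.DataFrame({
    "nurs_cas_mngt-nurs_num_mal_cm": [0.0, 0.0, np.nan, 2.0, 0.0],
    "nurs_cas_mngt-nurs_num_mal_rdt": [0.0, 1.0, np.nan, 1.0, 5.0],
    "nurs_cas_mngt-nurs_num_imci": [0.0, 0.0, np.nan, 1.0, 0.0],
})

# fillna with a Series fills each column with the value at its own label.
medians = toy[count_cols].median()
toy[count_cols] = toy[count_cols].fillna(medians)

print(medians)
print(toy)
```

The median is less sensitive than the mean to the few facilities reporting ten or more trained nurses.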

In [15]:
new_outpatient_hf["drug_dispns_trng-cadr_dispns_tday"].value_counts()

drug_dispns_trng-cadr_dispns_tday
Nurse                                100
CHEW                                   6
Pharmacy technologist                  6
0                                      6
Pharmacist                             5
NURSE                                  5
Clinical officer                       5
Pharmacist                             4
Pharmacy Technologist                  3
Clinical Officer                       3
Nurses                                 3
Pharm tech                             3
Pharmaceutical Technologist            2
PHARMACY TECHNOLOGIST                  2
Pharm technologist                     2
Pharmaceutical technologist            2
1                                      2
Pharmacy technician                    2
nurses                                 1
Nurse                                  1
Nurse Aid                              1
Pharmacist and pharm technologist      1
Medical officer\nClinician             1
Nurse , CHEW           
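The counts above show the same cadre entered with different casing, plurals, stray whitespace, and abbreviations. A minimal normalization sketch follows. It assumes that bare numeric entries such as "0" and "1" are not cadre names, and it uses a hypothetical, deliberately small mapping (`CANONICAL`). Multi-cadre answers and unmapped labels are left as cleaned text for manual review.

```python
import pandas as pd

# Sample of raw labels taken from the value counts above.
raw = pd.Series([
    "Nurse", "NURSE", "Nurses", "nurses", "Nurse ",
    "Pharm tech", "Pharmacy Technologist", "Pharmaceutical technologist",
    "Nurse Aid", "0",
])

# Hypothetical mapping from cleaned variants to one canonical label.
CANONICAL = {
    "nurses": "nurse",
    "pharm tech": "pharmacy technologist",
    "pharm technologist": "pharmacy technologist",
    "pharmaceutical technologist": "pharmacy technologist",
}

# Trim, lowercase, and collapse internal whitespace before mapping.
cleaned = raw.str.strip().str.lower().str.replace(r"\s+", " ", regex=True)
cleaned = cleaned.replace(CANONICAL)
# Assumption: numeric codes are treated as missing instead of as a cadre.
cleaned = cleaned.mask(cleaned.isin(["0", "1"]))

print(cleaned.value_counts(dropna=False))
```

Extending `CANONICAL` is a judgment call for someone who knows the cadres. For example, whether a pharmacy technician should be merged with a pharmacy technologist is not something the data alone can settle.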

In [16]:
new_outpatient_hf["al_push_systm_title-al_push_systm"].value_counts()

al_push_systm_title-al_push_systm
2.0    166
1.0     27
Name: count, dtype: int64