# Predicting Family Planning Demand and Optimizing Service Delivery in Kenya

## Project Overview
This project aims to enhance family planning (FP) service delivery efficiency by leveraging data-driven approaches, specifically machine learning (ML) using the CRISP-DM methodology. The goal is to improve maternal and child health outcomes, reduce unmet need for family planning, and contribute to achieving national and global development goals related to reproductive health.

## Problem Statement
Kenya faces significant strides in increasing access to family planning services, yet a substantial unmet need for family planning remains. According to the 2022 Kenya Demographic and Health Survey (KDHS), the total unmet need for family planning is 15%, with 10% for spacing and 5% for limiting births. This indicates that a significant portion of the population desires to space or limit births but is not using any contraceptive method. Traditional methods often show imbalances, with a heavy reliance on short-acting methods, leading to unstable uptake of long-acting reversible contraceptives (LARCs) and permanent methods. This disparity can lead to higher discontinuation rates and continued unmet need. Supply chain inefficiencies, commodity stock-outs, inadequate healthcare worker training, and uneven distribution of resources exacerbate these issues, hindering effective service delivery.

## Stakeholders
* **Government of Kenya:** Ministry of Health (MoH), especially the National Family Planning Coordinated Implementation Plan (NFPICIP) and Kenya Health Information System (KHIS) initiatives.
* **Policymakers and Donors:** Need evidence-based advocacy for resource mobilization and investment.
* **Healthcare Providers:** Frontline health workers providing FP services.
* **Women of Reproductive Age:** Direct beneficiaries of improved FP services.
* **Local Communities:** Impacted by and involved in FP service delivery.

## Key Statistics
* **Total Unmet Need for Family Planning (2022 KDHS):** 15%
    * **Unmet Need for Spacing:** 10%
    * **Unmet Need for Limiting Births:** 5%
* **Married Women Aged 15-49 Rising to 57% in 2022 (DRS 2022):** This likely refers to modern contraceptive prevalence rate (mCPR).
* **Number of new clients for FP method band and the continuation rate:** Key data points for predicting future demand.

## Key Analytics Questions
* How many new clients are expected for injectables in a County next quarter?
* What is the projected demand for different family planning methods (injectables, pills, condoms, implants, IUD, sterilization) at various geographical (e.g., county) and temporal (e.g., quarterly, annual) granularities?
* How can we optimize resource allocation (commodities, equipment, staffing) to meet projected demand and minimize wastage?
* How can predictive analytics identify potential stock-outs or oversupply of specific FP commodities in different locations?
* Which regions or demographics are most underserved in terms of family planning access and uptake?

## Objectives
* **Quantitatively forecast the demand for specific family planning methods:** This includes predicting the continuation rates of users for each method at defined geographical and temporal scales.
* **Enable proactive resource allocation:** This involves optimizing the distribution of commodities, equipment, and staffing to reduce stock-outs, minimize wastage, and improve targeted interventions.
* **Improve method continuation:** By understanding demand and improving service delivery, the project aims to reduce discontinuation rates and increase sustained use of FP methods.
* **Provide evidence-based insights:** Support policymakers and donors in making informed decisions regarding resource mobilization and investment in family planning.

## Metrics of Importance to Focus On
* **Accuracy of Demand Forecasts:** Measured by comparing predicted demand with actual uptake for various FP methods at different geographical and temporal levels (e.g., Mean Absolute Error, Root Mean Squared Error).
* **Commodity Stock-out Rates:** Reduction in the number or duration of stock-outs for essential family planning commodities.
* **Resource Utilization Efficiency:** Metrics related to optimal allocation and reduced wastage of commodities, equipment, and human resources.
* **Method Continuation Rates:** Increase in the percentage of users who continue using a specific family planning method over a defined period (e.g., 12-month continuation rate).
* **Unmet Need for Family Planning:** Contribution to the reduction of the national unmet need for family planning.
* **Client Satisfaction:** Indirectly improved through better access and availability of preferred methods.
* **Healthcare Worker Productivity:** Optimized allocation of staff to meet demand efficiently.

### Data Loading

In [77]:
#Import necessary libraries for data manipulation, visualization, and machine learning
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Cutting across (general data manipulation, visualization, and utility)
from sklearn.model_selection import train_test_split, cross_val_score, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer


# Classification Model Libraries (common and versatile)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


# Regression Model Libraries (common and versatile)
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor


In [78]:
# Data loading
# Assuming the 'data' folder is in the same directory as the script
try:
    population_data_path = "data/ke_fp_population_data.csv"
    service_data_path = "data/ke_fp_service_data.csv"
    benchmarks_data_path = "data/ke_fp_benchmarks_data.csv"
    commodity_data_path = "data/ke_fp_commodity_data.csv"


    # Attempt to read with 'latin1' encoding, a common alternative for non-UTF-8 files
    df_population = pd.read_csv(population_data_path, encoding='latin1')
    df_service = pd.read_csv(service_data_path, encoding='latin1')
    df_benchmarks = pd.read_csv(benchmarks_data_path, encoding='latin1')
    df_commodity = pd.read_csv(commodity_data_path, encoding='latin1')

    print("Datasets loaded successfully:")
    print(f"ke_fp_population_data.csv shape: {df_population.shape}")
    print(f"ke_fp_service_data.csv shape: {df_service.shape}")
    print(f"ke_fp_benchmarks_data.csv shape: {df_benchmarks.shape}")
    print(f"ke_fp_commodity_data.csv shape: {df_commodity.shape}")

except FileNotFoundError as e:
    print(f"Error: One or both of the CSV files were not found.")
    print(f"Please ensure 'ke_fp_population_data.csv' and 'ke_fp_service_data.csv' are in a folder named 'data' in the same directory as this script.")
    print(e)
except Exception as e:
    print(f"An unexpected error occurred while loading the datasets: {e}")




Datasets loaded successfully:
ke_fp_population_data.csv shape: (47, 38)
ke_fp_service_data.csv shape: (6204, 60)
ke_fp_benchmarks_data.csv shape: (48, 18)
ke_fp_commodity_data.csv shape: (2480, 74)


### Data Cleaning

This involved;
* Standardization of the column names
* Renaming the columns
* Dropping empty and unwanted columns
* Handling missing values, duplicates and outliers

1. ke_fp_service_data.csv

In [79]:
# Make a copy of the data
df_service1=df_service.copy()

In [80]:
# Preview the data
df_service1.head()

Unnamed: 0,periodid,periodname,periodcode,perioddescription,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,...,MOH 711 Rev 2020_Post parturm FP 4weeks to 6weeks Re-visits,MOH 711 Rev 2020_Post parturm FP within 48 Hours New clients,MOH 711 Rev 2020_Post parturm FP within 48 Hours Re-visits,MOH 711 Rev 2020_Voluntary Surgical Contraception Vasectomy Ist Time Insertion,MOH 711 Rev 2020_Voluntary Surgical Contraception Vasectomy Re-insertion,MOH 711 Rev 2020_Voluntary surgical contraception BTL Ist Time Insertion,MOH 711 Rev 2020_Voluntary surgical contraception BTL Re-insertion,Population Growth Rate,Total Population,Women of childbearing age (15â49yrs)
0,201404,Apr-14,201404,,Kenya,Turkana County,kphDeKClFch,Turkana County,KE_County_23,,...,,,,,,,,3.16,999367.0,225176.0
1,201404,Apr-14,201404,,Kenya,Nandi County,t0J75eHKxz5,Nandi County,KE_County_29,,...,,,,,,,,3.02,938866.0,223505.0
2,201404,Apr-14,201404,,Kenya,West Pokot County,XWALbfAPa6n,West Pokot County,KE_County_24,,...,,,,,,,,3.1,597313.0,144549.0
3,201404,Apr-14,201404,,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,...,,,,,,,,1568.96,834381.7,237820.0
4,201404,Apr-14,201404,,Kenya,Nairobi County,jkG3zaihdSs,Nairobi County,KE_County_47,,...,,,,,,,,4.02,3894186.0,981191.0


In [81]:
# Explore the data
df_service1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6204 entries, 0 to 6203
Data columns (total 60 columns):
 #   Column                                                                          Non-Null Count  Dtype  
---  ------                                                                          --------------  -----  
 0   periodid                                                                        6204 non-null   int64  
 1   periodname                                                                      6204 non-null   object 
 2   periodcode                                                                      6204 non-null   int64  
 3   perioddescription                                                               0 non-null      float64
 4   orgunitlevel1                                                                   6204 non-null   object 
 5   orgunitlevel2                                                                   6204 non-null   object 
 6   organisationunit

In [82]:
# Standardize the column names

def standardize_col_labels(df):
    def clean_column(col):
        # Remove redundant prefixes
        col = col.replace('MOH 711 Rev ', '')
        col = col.replace('MOH 711 ', '')
        
        # Formatting
        col = col.strip().lower()          # Convert to lowercase
        col = col.replace(' ', '_')     # Replace spaces with underscores
        col = col.replace('-', '_')  # Replace hyphen with underscores
        col = col.replace('â€“', '_')    # en-dash
        return col

    df.columns = [clean_column(col) for col in df.columns]
    return df

df_service1 = standardize_col_labels(df_service1)
df_service1.columns

Index(['periodid', 'periodname', 'periodcode', 'perioddescription',
       'orgunitlevel1', 'orgunitlevel2', 'organisationunitid',
       'organisationunitname', 'organisationunitcode',
       'organisationunitdescription', 'estimated_number_of_pregnant_women',
       'fp_attendance_new_clients', 'fp_attendance_re_visits',
       'adolescent_10_14_yrs_receiving_fp_services_new_clients',
       'adolescent_10_14_yrs_receiving_fp_services_re_visits',
       'adolescent_15_19_yrs_receiving_fp_services_new_clients',
       'adolescent_15_19_yrs_receiving_fp_services_re_visits',
       'adolescent_20_24_yrs_receiving_fp_services_new_clients',
       'adolescent_20_24_yrs_receiving_fp_services_re_visits',
       'client_receiving_male_condoms_new_clients',
       'client_receiving_male_condoms_re_visits',
       'clients_counselled_natural_family_planning_new_clients',
       'clients_counselled_natural_family_planning_re_visits',
       'clients_receiving_female_condoms_new_clients',
      

In [83]:
# Rename column names

name_map = {
    'periodcode': 'quarter',
    'orgunitlevel1': 'country',
    'orgunitlevel2': 'county',
    'organisationunitid': 'uid',
    'organisationunitcode':'uidcode',
    'county_cou':'uidcode'
}
df_service1 = df_service1.rename(columns=name_map)
df_service1

Unnamed: 0,periodid,periodname,quarter,perioddescription,country,county,uid,organisationunitname,uidcode,organisationunitdescription,...,2020_post_parturm_fp_4weeks_to_6weeks_re_visits,2020_post_parturm_fp_within_48_hours_new_clients,2020_post_parturm_fp_within_48_hours_re_visits,2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion,2020_voluntary_surgical_contraception_vasectomy_re_insertion,2020_voluntary_surgical_contraception_btl_ist_time_insertion,2020_voluntary_surgical_contraception_btl_re_insertion,population_growth_rate,total_population,women_of_childbearing_age_(15â49yrs)
0,201404,Apr-14,201404,,Kenya,Turkana County,kphDeKClFch,Turkana County,KE_County_23,,...,,,,,,,,3.16,999367.00,225176.00
1,201404,Apr-14,201404,,Kenya,Nandi County,t0J75eHKxz5,Nandi County,KE_County_29,,...,,,,,,,,3.02,938866.00,223505.00
2,201404,Apr-14,201404,,Kenya,West Pokot County,XWALbfAPa6n,West Pokot County,KE_County_24,,...,,,,,,,,3.10,597313.00,144549.00
3,201404,Apr-14,201404,,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,...,,,,,,,,1568.96,834381.70,237820.00
4,201404,Apr-14,201404,,Kenya,Nairobi County,jkG3zaihdSs,Nairobi County,KE_County_47,,...,,,,,,,,4.02,3894186.00,981191.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6199,202409,Sep-24,202409,,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,,...,53.0,17.0,,,,1.0,,2.32,296179.23,67863.42
6200,202409,Sep-24,202409,,Kenya,Trans Nzoia County,mThvosEflAU,Trans Nzoia County,KE_County_26,,...,3.0,75.0,54.0,,,1.0,,2.94,1153812.49,277655.62
6201,202409,Sep-24,202409,,Kenya,Nakuru County,ob6SxuRcqU4,Nakuru County,KE_County_32,,...,41.0,191.0,34.0,3.0,,15.0,,2.90,2480137.33,651943.27
6202,202409,Sep-24,202409,,Kenya,Tharaka Nithi County,T4urHM47nlm,Tharaka Nithi County,KE_County_13,,...,,,,,,,,2.67,451552.75,113879.15


In [84]:
# Drop columns where all values are null 
df_service1=df_service1.dropna(axis=1, how='all')
df_service1

Unnamed: 0,periodid,periodname,quarter,country,county,uid,organisationunitname,uidcode,estimated_number_of_pregnant_women,fp_attendance_new_clients,...,2020_post_parturm_fp_4weeks_to_6weeks_re_visits,2020_post_parturm_fp_within_48_hours_new_clients,2020_post_parturm_fp_within_48_hours_re_visits,2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion,2020_voluntary_surgical_contraception_vasectomy_re_insertion,2020_voluntary_surgical_contraception_btl_ist_time_insertion,2020_voluntary_surgical_contraception_btl_re_insertion,population_growth_rate,total_population,women_of_childbearing_age_(15â49yrs)
0,201404,Apr-14,201404,Kenya,Turkana County,kphDeKClFch,Turkana County,KE_County_23,24517.00,440.0,...,,,,,,,,3.16,999367.00,225176.00
1,201404,Apr-14,201404,Kenya,Nandi County,t0J75eHKxz5,Nandi County,KE_County_29,43784.00,3572.0,...,,,,,,,,3.02,938866.00,223505.00
2,201404,Apr-14,201404,Kenya,West Pokot County,XWALbfAPa6n,West Pokot County,KE_County_24,21198.00,745.0,...,,,,,,,,3.10,597313.00,144549.00
3,201404,Apr-14,201404,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,39762.00,2521.0,...,,,,,,,,1568.96,834381.70,237820.00
4,201404,Apr-14,201404,Kenya,Nairobi County,jkG3zaihdSs,Nairobi County,KE_County_47,158875.00,11356.0,...,,,,,,,,4.02,3894186.00,981191.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6199,202409,Sep-24,202409,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,7422.28,240.0,...,53.0,17.0,,,,1.0,,2.32,296179.23,67863.42
6200,202409,Sep-24,202409,Kenya,Trans Nzoia County,mThvosEflAU,Trans Nzoia County,KE_County_26,35675.47,2805.0,...,3.0,75.0,54.0,,,1.0,,2.94,1153812.49,277655.62
6201,202409,Sep-24,202409,Kenya,Nakuru County,ob6SxuRcqU4,Nakuru County,KE_County_32,78676.96,10801.0,...,41.0,191.0,34.0,3.0,,15.0,,2.90,2480137.33,651943.27
6202,202409,Sep-24,202409,Kenya,Tharaka Nithi County,T4urHM47nlm,Tharaka Nithi County,KE_County_13,13004.53,1061.0,...,,,,,,,,2.67,451552.75,113879.15


In [85]:
# Drop unwanted columns
df_service1=df_service1.drop(columns=['periodid','organisationunitname', 'periodname','population_growth_rate',
                                       'total_population','women_of_childbearing_age_(15â49yrs)'
                                       ], axis=1)


In [86]:
# Check for missing values
df_service1.isna().sum().sort_values(ascending=False)

2020_voluntary_surgical_contraception_vasectomy_re_insertion          6199
2020_voluntary_surgical_contraception_btl_re_insertion                6172
2020_clients_given_cycle_beads_re_visits                              6120
2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion    5904
2020_clients_receiving_post_abortion_fp_re_visits                     5779
2020_clients_given_cycle_beads_new_clients                            5016
2020_post_parturm_fp_within_48_hours_re_visits                        5005
2020_post_parturm_fp_4weeks_to_6weeks_re_visits                       4776
2020_iucd_insertion_hormonal_re_insertion                             4643
2020_voluntary_surgical_contraception_btl_ist_time_insertion          4619
2020_clients_receiving_post_abortion_fp_new_clients                   4197
2020_iucd_insertion_hormonal_ist_time_insertion                       4189
2020_iucd_insertion_non_hormonal_re_insertion                         4110
2020_fp_injections_dmpa__

Missing values were interpreted as 'no service was provided or dataset missing for the organization unit' and filled with 0

In [87]:
# Fill the missing values with zeros
df_service1 = df_service1.fillna(0)

In [88]:
# Check for duplicates
df_service1.duplicated().sum()

0

2. ke_fp_population_data

In [89]:
# Make a copy of the data
df_population1 =df_population.copy()

In [90]:
# Explore the data
df_population1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 38 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   orgunitlevel1                              47 non-null     object 
 1   orgunitlevel2                              47 non-null     object 
 2   organisationunitid                         47 non-null     object 
 3   organisationunitname                       47 non-null     object 
 4   organisationunitcode                       47 non-null     object 
 5   organisationunitdescription                0 non-null      float64
 6   Estimated Number of Pregnant Women 2020    47 non-null     int64  
 7   Estimated Number of Pregnant Women 2018    46 non-null     float64
 8   Estimated Number of Pregnant Women 2024    47 non-null     int64  
 9   Estimated Number of Pregnant Women 2021    47 non-null     int64  
 10  Estimated Number of Pregnant

In [91]:
# Standardize column names
df_population1 = standardize_col_labels(df_population1)

# Preview the data
df_population1.head()

Unnamed: 0,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,22791,28403.0,25859,23508,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,27782,35596.0,31224,28589,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,,59820,65489.0,68382,61766,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,,28685,22589.0,32280,29517,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,,16561,19301.0,18939,17111,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616


In [92]:
# Rename column names
df_population1 = df_population1.rename(columns=name_map)
df_population1

Unnamed: 0,country,county,uid,organisationunitname,uidcode,organisationunitdescription,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,22791,28403.0,25859,23508,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,27782,35596.0,31224,28589,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,,59820,65489.0,68382,61766,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,,28685,22589.0,32280,29517,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,,16561,19301.0,18939,17111,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,Embu County,KE_County_14,,15561,15013.0,17131,15929,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,Garissa County,KE_County_7,,43114,22261.0,46476,44334,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,Homa Bay County,KE_County_43,,39907,47462.0,45521,41191,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,,24448,7168.0,7402,7923,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,Kajiado County,KE_County_34,,38371,35467.0,43539,39573,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [93]:
# Drop columns where all values are null 
df_population1=df_population1.dropna(axis=1, how='all')
df_population1

Unnamed: 0,country,county,uid,organisationunitname,uidcode,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,estimated_number_of_pregnant_women_2023,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,22791,28403.0,25859,23508,25054,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,27782,35596.0,31224,28589,30324,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,59820,65489.0,68382,61766,66192,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,28685,22589.0,32280,29517,31336,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,16561,19301.0,18939,17111,18312,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,Embu County,KE_County_14,15561,15013.0,17131,15929,16723,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,Garissa County,KE_County_7,43114,22261.0,46476,44334,47112,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,Homa Bay County,KE_County_43,39907,47462.0,45521,41191,44060,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,24448,7168.0,7402,7923,7215,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,Kajiado County,KE_County_34,38371,35467.0,43539,39573,42177,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [94]:
# Drop unwanted columns
df_population1.drop(columns=['organisationunitname'], axis=1) # This was dropped because it is the same as county

Unnamed: 0,country,county,uid,uidcode,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,estimated_number_of_pregnant_women_2023,estimated_number_of_pregnant_women_2019,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,KE_County_30,22791,28403.0,25859,23508,25054,29126,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,KE_County_36,27782,35596.0,31224,28589,30324,31522,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,KE_County_39,59820,65489.0,68382,61766,66192,58642,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,KE_County_40,28685,22589.0,32280,29517,31336,33360,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,KE_County_28,16561,19301.0,18939,17111,18312,19812,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,KE_County_14,15561,15013.0,17131,15929,16723,15416,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,KE_County_7,43114,22261.0,46476,44334,47112,27640,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,KE_County_43,39907,47462.0,45521,41191,44060,48214,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,KE_County_11,24448,7168.0,7402,7923,7215,7485,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,KE_County_34,38371,35467.0,43539,39573,42177,36731,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [95]:
# Check for missing values
df_population1.isna().sum().sort_values(ascending=False)

population_growth_rate_2024                  17
population_growth_rate_2022                  14
population_growth_rate_2021                  13
population_growth_rate_2023                  12
population_growth_rate_2025                  12
population_growth_rate_2020                  10
population_growth_rate_2019                   3
estimated_number_of_pregnant_women_2018       1
population_growth_rate_2018                   1
estimated_number_of_pregnant_women_2024       0
estimated_number_of_pregnant_women_2021       0
estimated_number_of_pregnant_women_2020       0
estimated_number_of_pregnant_women_2019       0
estimated_number_of_pregnant_women_2022       0
estimated_number_of_pregnant_women_2025       0
uidcode                                       0
organisationunitname                          0
uid                                           0
county                                        0
estimated_number_of_pregnant_women_2023       0
women_of_childbearing_age_(1549yrs)_202

In [96]:
# Dealing with the missing values

# Select population growth rate columns
pop_growth_cols = sorted([col for col in df_population1.columns if "population_growth_rate" in col])

# Backfill and forward fill
df_population1[pop_growth_cols] = ( df_population1[pop_growth_cols] .bfill(axis=1).ffill(axis=1)
)

# Fill estimated_number_of_pregnant_women_2018 using median
median = df_population1['estimated_number_of_pregnant_women_2018'].median()
df_population1['estimated_number_of_pregnant_women_2018'].fillna(median, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().fillna(


3. ke_fp_commodity_data.csv

In [97]:
# Make a copy of the data
df_commodity1 = df_commodity.copy()

In [98]:
# Preview the data
df_commodity1.head()

Unnamed: 0,periodid,periodname,periodcode,perioddescription,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,...,Male Condom Quantity Needed/Requested,Male Condom Stock Received,Progestin only Pills Beginning Balance,Progestin only Pills Ending Balanc,Progestin only Pills Issued/Dispensed,Progestin only Pills Losses,Progestin only Pills Negative Adjustment (Issued to Other HF),Progestin only Pills Positive Adjustment (Receipt from other HF),Progestin only Pills Quantity Needed/Requested,Progestin only Pills Stock Received
0,201404,Apr-14,201404,,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,,...,3281.0,801.0,939.0,809.0,250.0,,,8.0,2957.0,114.0
1,201404,Apr-14,201404,,Kenya,Kisii County,sPkRcDvhGWA,Kisii County,KE_County_45,,...,,,,,,,,,,
2,201404,Apr-14,201404,,Kenya,Kisumu County,tAbBVBbueqD,Kisumu County,KE_County_42,,...,11300.0,,1516.0,1335.0,53.0,,,,110.0,
3,201404,Apr-14,201404,,Kenya,Makueni County,BoDytkJQ4Qi,Makueni County,KE_County_17,,...,2000.0,4000.0,1150.0,1136.0,14.0,,,,400.0,
4,201404,Apr-14,201404,,Kenya,Kwale County,N7YETT3A9r1,Kwale County,KE_County_2,,...,13300.0,,2747.0,2432.0,65.0,,15.0,,535.0,


In [99]:
# Explore the data
df_commodity1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2480 entries, 0 to 2479
Data columns (total 74 columns):
 #   Column                                                                         Non-Null Count  Dtype  
---  ------                                                                         --------------  -----  
 0   periodid                                                                       2480 non-null   int64  
 1   periodname                                                                     2480 non-null   object 
 2   periodcode                                                                     2480 non-null   int64  
 3   perioddescription                                                              0 non-null      float64
 4   orgunitlevel1                                                                  2480 non-null   object 
 5   orgunitlevel2                                                                  2480 non-null   object 
 6   organisationunitid      

In [100]:
# Standardize the column names
df_commodity1 = standardize_col_labels(df_commodity1)

df_commodity1.head()

Unnamed: 0,periodid,periodname,periodcode,perioddescription,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,...,male_condom__quantity_needed/requested,male_condom__stock_received,progestin_only_pills_beginning_balance,progestin_only_pills_ending_balanc,progestin_only_pills_issued/dispensed,progestin_only_pills_losses,progestin_only_pills_negative_adjustment_(issued_to_other_hf),progestin_only_pills_positive_adjustment_(receipt_from_other_hf),progestin_only_pills_quantity_needed/requested,progestin_only_pills_stock_received
0,201404,Apr-14,201404,,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,,...,3281.0,801.0,939.0,809.0,250.0,,,8.0,2957.0,114.0
1,201404,Apr-14,201404,,Kenya,Kisii County,sPkRcDvhGWA,Kisii County,KE_County_45,,...,,,,,,,,,,
2,201404,Apr-14,201404,,Kenya,Kisumu County,tAbBVBbueqD,Kisumu County,KE_County_42,,...,11300.0,,1516.0,1335.0,53.0,,,,110.0,
3,201404,Apr-14,201404,,Kenya,Makueni County,BoDytkJQ4Qi,Makueni County,KE_County_17,,...,2000.0,4000.0,1150.0,1136.0,14.0,,,,400.0,
4,201404,Apr-14,201404,,Kenya,Kwale County,N7YETT3A9r1,Kwale County,KE_County_2,,...,13300.0,,2747.0,2432.0,65.0,,15.0,,535.0,


In [101]:
# Rename column names
df_commodity1 = df_commodity1.rename(columns=name_map)
df_commodity1

Unnamed: 0,periodid,periodname,quarter,perioddescription,country,county,uid,organisationunitname,uidcode,organisationunitdescription,...,male_condom__quantity_needed/requested,male_condom__stock_received,progestin_only_pills_beginning_balance,progestin_only_pills_ending_balanc,progestin_only_pills_issued/dispensed,progestin_only_pills_losses,progestin_only_pills_negative_adjustment_(issued_to_other_hf),progestin_only_pills_positive_adjustment_(receipt_from_other_hf),progestin_only_pills_quantity_needed/requested,progestin_only_pills_stock_received
0,201404,Apr-14,201404,,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,,...,3281.0,801.0,939.0,809.0,250.0,,,8.0,2957.0,114.0
1,201404,Apr-14,201404,,Kenya,Kisii County,sPkRcDvhGWA,Kisii County,KE_County_45,,...,,,,,,,,,,
2,201404,Apr-14,201404,,Kenya,Kisumu County,tAbBVBbueqD,Kisumu County,KE_County_42,,...,11300.0,,1516.0,1335.0,53.0,,,,110.0,
3,201404,Apr-14,201404,,Kenya,Makueni County,BoDytkJQ4Qi,Makueni County,KE_County_17,,...,2000.0,4000.0,1150.0,1136.0,14.0,,,,400.0,
4,201404,Apr-14,201404,,Kenya,Kwale County,N7YETT3A9r1,Kwale County,KE_County_2,,...,13300.0,,2747.0,2432.0,65.0,,15.0,,535.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2475,202209,Sep-22,202209,,Kenya,Kericho County,ihZsJ8alvtb,Kericho County,KE_County_35,,...,5000.0,,19.0,19.0,,,,,200.0,
2476,202209,Sep-22,202209,,Kenya,Nandi County,t0J75eHKxz5,Nandi County,KE_County_29,,...,,137.0,,,,,,,,
2477,202209,Sep-22,202209,,Kenya,Machakos County,yhCUgGcCcOo,Machakos County,KE_County_16,,...,,43200.0,1415.0,1412.0,603.0,,,,,600.0
2478,202309,Sep-23,202309,,Kenya,Nairobi County,jkG3zaihdSs,Nairobi County,KE_County_47,,...,,,89.0,77.0,12.0,,,,,


In [102]:
# Drop columns where all values are null 
df_commodity1=df_commodity1.dropna(axis=1, how='all')
df_commodity1

Unnamed: 0,periodid,periodname,quarter,country,county,uid,organisationunitname,uidcode,combined_oral_contraceptive_pills_beginning_balance,combined_oral_contraceptive_pills_ending_balanc,...,male_condom__quantity_needed/requested,male_condom__stock_received,progestin_only_pills_beginning_balance,progestin_only_pills_ending_balanc,progestin_only_pills_issued/dispensed,progestin_only_pills_losses,progestin_only_pills_negative_adjustment_(issued_to_other_hf),progestin_only_pills_positive_adjustment_(receipt_from_other_hf),progestin_only_pills_quantity_needed/requested,progestin_only_pills_stock_received
0,201404,Apr-14,201404,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,35056.0,46036.0,...,3281.0,801.0,939.0,809.0,250.0,,,8.0,2957.0,114.0
1,201404,Apr-14,201404,Kenya,Kisii County,sPkRcDvhGWA,Kisii County,KE_County_45,66.0,58.0,...,,,,,,,,,,
2,201404,Apr-14,201404,Kenya,Kisumu County,tAbBVBbueqD,Kisumu County,KE_County_42,3800.0,3542.0,...,11300.0,,1516.0,1335.0,53.0,,,,110.0,
3,201404,Apr-14,201404,Kenya,Makueni County,BoDytkJQ4Qi,Makueni County,KE_County_17,2863.0,2588.0,...,2000.0,4000.0,1150.0,1136.0,14.0,,,,400.0,
4,201404,Apr-14,201404,Kenya,Kwale County,N7YETT3A9r1,Kwale County,KE_County_2,10011.0,9289.0,...,13300.0,,2747.0,2432.0,65.0,,15.0,,535.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2475,202209,Sep-22,202209,Kenya,Kericho County,ihZsJ8alvtb,Kericho County,KE_County_35,,,...,5000.0,,19.0,19.0,,,,,200.0,
2476,202209,Sep-22,202209,Kenya,Nandi County,t0J75eHKxz5,Nandi County,KE_County_29,68.0,43.0,...,,137.0,,,,,,,,
2477,202209,Sep-22,202209,Kenya,Machakos County,yhCUgGcCcOo,Machakos County,KE_County_16,49627.0,46174.0,...,,43200.0,1415.0,1412.0,603.0,,,,,600.0
2478,202309,Sep-23,202309,Kenya,Nairobi County,jkG3zaihdSs,Nairobi County,KE_County_47,60.0,60.0,...,,,89.0,77.0,12.0,,,,,


In [103]:
# Drop unwanted columns
df_commodity1.drop(columns='organisationunitname', inplace=True)
df_commodity1

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


Unnamed: 0,periodid,periodname,quarter,country,county,uid,uidcode,combined_oral_contraceptive_pills_beginning_balance,combined_oral_contraceptive_pills_ending_balanc,combined_oral_contraceptive_pills_issued/dispensed,...,male_condom__quantity_needed/requested,male_condom__stock_received,progestin_only_pills_beginning_balance,progestin_only_pills_ending_balanc,progestin_only_pills_issued/dispensed,progestin_only_pills_losses,progestin_only_pills_negative_adjustment_(issued_to_other_hf),progestin_only_pills_positive_adjustment_(receipt_from_other_hf),progestin_only_pills_quantity_needed/requested,progestin_only_pills_stock_received
0,201404,Apr-14,201404,Kenya,Kirinyaga County,Ulj33KBau7V,KE_County_20,35056.0,46036.0,5728.0,...,3281.0,801.0,939.0,809.0,250.0,,,8.0,2957.0,114.0
1,201404,Apr-14,201404,Kenya,Kisii County,sPkRcDvhGWA,KE_County_45,66.0,58.0,8.0,...,,,,,,,,,,
2,201404,Apr-14,201404,Kenya,Kisumu County,tAbBVBbueqD,KE_County_42,3800.0,3542.0,241.0,...,11300.0,,1516.0,1335.0,53.0,,,,110.0,
3,201404,Apr-14,201404,Kenya,Makueni County,BoDytkJQ4Qi,KE_County_17,2863.0,2588.0,149.0,...,2000.0,4000.0,1150.0,1136.0,14.0,,,,400.0,
4,201404,Apr-14,201404,Kenya,Kwale County,N7YETT3A9r1,KE_County_2,10011.0,9289.0,693.0,...,13300.0,,2747.0,2432.0,65.0,,15.0,,535.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2475,202209,Sep-22,202209,Kenya,Kericho County,ihZsJ8alvtb,KE_County_35,,,,...,5000.0,,19.0,19.0,,,,,200.0,
2476,202209,Sep-22,202209,Kenya,Nandi County,t0J75eHKxz5,KE_County_29,68.0,43.0,420.0,...,,137.0,,,,,,,,
2477,202209,Sep-22,202209,Kenya,Machakos County,yhCUgGcCcOo,KE_County_16,49627.0,46174.0,3653.0,...,,43200.0,1415.0,1412.0,603.0,,,,,600.0
2478,202309,Sep-23,202309,Kenya,Nairobi County,jkG3zaihdSs,KE_County_47,60.0,60.0,,...,,,89.0,77.0,12.0,,,,,


In [104]:
# Check for missing values
df_commodity1.isna().sum().sort_values(ascending=False)

emergency_pill_losses                                      2451
female_condom__losses                                      2428
implants_(1_rod)_losses                                    2425
male_condom__losses                                        2408
emergency_pill_negative_adjustment_(issued_to_other_hf)    2401
                                                           ... 
county                                                        0
country                                                       0
quarter                                                       0
periodname                                                    0
periodid                                                      0
Length: 71, dtype: int64

In [None]:
# Dealing with the missing values



In [105]:
# Check for duplicates
df_commodity1.duplicated().sum()

0

4. ke_fp_benchmarks_data.csv

In [106]:
# Make a copy of the data
df_benchmarks1 = df_benchmarks.copy()

In [107]:
# Preview the data
df_benchmarks1.head()

Unnamed: 0,county,uid,county_cou,total_number_of_facilities,general_service_readiness_index_(%),basic_equipment_mean_score_(%),essential_medicines_mean_score_(%),"core_health_workforce_per_10,000population",Total Population (2019),Female Population 15-49 (2019),Population Density (per sq. km),Urban Population (%),Avg. Household Size,"mCPR (Married Women, %)","Total Unmet Need (Married Women, %)",Demand Satisfied by Modern Methods (%),Total Fertility Rate (TFR),"Teenage Pregnancy Rate (15-19, %)"
0,baringo,vvOK1BxTbet,KE_County_30,169,52,65,29,10.6,666763,157639,60,12.9,4.7,44.8,16.6,62.2,4.4,19.3
1,bomet,HMNARUV2CW4,KE_County_36,111,62,81,35,8.9,875689,216493,392,14.8,4.5,56.9,16.7,71.1,4.3,19.0
2,bungoma,KGHhQ5GLd4k,KE_County_39,141,56,72,35,8.5,1670570,409928,552,22.0,4.6,62.9,14.6,79.0,4.5,18.4
3,busia,Tvf1zgVZ0K4,KE_County_40,80,58,72,37,11.9,893681,219897,527,20.3,4.5,56.0,18.6,73.4,4.7,17.6
4,elgeyo_marakwet,MqnLxQBigG0,KE_County_28,116,51,57,30,9.9,454480,108131,150,14.6,4.5,57.3,13.5,67.7,4.1,10.8


In [108]:
# Rename the columns
df_benchmarks1 = df_benchmarks1.rename(columns=name_map)
df_benchmarks1

Unnamed: 0,county,uid,uidcode,total_number_of_facilities,general_service_readiness_index_(%),basic_equipment_mean_score_(%),essential_medicines_mean_score_(%),"core_health_workforce_per_10,000population",Total Population (2019),Female Population 15-49 (2019),Population Density (per sq. km),Urban Population (%),Avg. Household Size,"mCPR (Married Women, %)","Total Unmet Need (Married Women, %)",Demand Satisfied by Modern Methods (%),Total Fertility Rate (TFR),"Teenage Pregnancy Rate (15-19, %)"
0,baringo,vvOK1BxTbet,KE_County_30,169,52,65,29,10.6,666763,157639,60,12.9,4.7,44.8,16.6,62.2,4.4,19.3
1,bomet,HMNARUV2CW4,KE_County_36,111,62,81,35,8.9,875689,216493,392,14.8,4.5,56.9,16.7,71.1,4.3,19.0
2,bungoma,KGHhQ5GLd4k,KE_County_39,141,56,72,35,8.5,1670570,409928,552,22.0,4.6,62.9,14.6,79.0,4.5,18.4
3,busia,Tvf1zgVZ0K4,KE_County_40,80,58,72,37,11.9,893681,219897,527,20.3,4.5,56.0,18.6,73.4,4.7,17.6
4,elgeyo_marakwet,MqnLxQBigG0,KE_County_28,116,51,57,30,9.9,454480,108131,150,14.6,4.5,57.3,13.5,67.7,4.1,10.8
5,embu,PFu8alU2KWG,KE_County_14,149,64,77,46,21.2,608599,159850,215,28.1,3.2,74.9,2.2,89.3,2.7,6.2
6,garissa,uyOrcHZBpW0,KE_County_7,129,58,64,47,10.0,841353,179064,19,33.7,6.1,12.6,10.8,48.4,6.1,13.9
7,homa_bay,nK0A12Q7MvS,KE_County_43,180,60,77,41,14.2,1131950,288582,359,14.4,4.2,63.3,17.0,77.3,4.3,22.7
8,isiolo,bzOfj0iwfDH,KE_County_11,41,64,90,44,22.0,268002,64813,11,42.9,4.4,31.0,27.3,51.5,4.3,21.8
9,kajiado,Hsk1YV8kHkT,KE_County_34,224,62,65,52,24.6,1117840,316279,51,46.5,3.5,54.8,12.5,75.7,4.0,19.7


### Feature Engineering

In [109]:
# Group elements by FP method

def group_by_fp_method(df, methods):
    methods = [ 'female_condoms', 'male_condoms', 'implants_1_rod', 'implants_2_rod','pills_combined_oral_contraceptives',
               'pills_progestin_only_contraceptives','iucd_hormonal', 'iucd_non_hormonal', 'emergency_pill',
                'injections', 'surgical']

    grouped_columns = {method: [] for method in methods}

    for col in df.columns:
        for method in methods:
            if method in col:  
                grouped_columns[method].append(col)
                break  # assign column to the first matched method only

    return grouped_columns

grouped_columns = group_by_fp_method(df_service1, methods=None)

In [110]:
# Aggregate by method
for method, cols in grouped_columns.items():
   df_service1[f"{method}_total"] = df_service1[cols].sum(axis=1)

# Print
print(df_service1[[f"{method}_total" for method in grouped_columns if grouped_columns[method]]])

      female_condoms_total  male_condoms_total  injections_total  \
0                      0.0              2018.0               0.0   
1                      0.0              3499.0               0.0   
2                      0.0              1906.0               0.0   
3                      0.0              1624.0               0.0   
4                      0.0             22320.0               0.0   
...                    ...                 ...               ...   
6199                   0.0                 9.0             754.0   
6200                  12.0               553.0            7400.0   
6201                  59.0               893.0           16929.0   
6202                   1.0               164.0            4064.0   
6203                  20.0               735.0            3161.0   

      surgical_total  
0                0.0  
1                0.0  
2                0.0  
3                0.0  
4                0.0  
...              ...  
6199             1.0  

Couple Years of Protection(CYP)-CYP measures the estimated protection provided by FP based on the volume of contraceptive method distribution to clients to help monitor health system performance and track trends and progress over time.

In [111]:
# Couple Years of Protection(CYP)

# CYP conversion factors
cyp_factors = {
    'condoms': 0.0083,
    'pills_combined_oral_contraceptives': 0.0067,
    'pills_progestin_only_contraceptives': 0.0833,
    'injections': 0.25,
    'implants_1_rod': 2.5,
    'implants_2_rod': 3.8,
    'iucd_hormonal': 4.8,
    'iucd_non_hormonal': 4.6,
    'surgical': 10.0
}

In [112]:
# Calculate CYP per method
for method, factor in cyp_factors.items():
    method_col = f"{method}_total" # Create a new column
    if method_col in df_service1.columns:
        df_service1[f"{method}_cyp"] = df_service1[method_col] * factor

# Print
cyp_cols = [f"{method}_cyp" for method in cyp_factors if f"{method}_cyp" in df_service1.columns]
print(df_service1[cyp_cols])

      pills_combined_oral_contraceptives_cyp  \
0                                        0.0   
1                                        0.0   
2                                        0.0   
3                                        0.0   
4                                        0.0   
...                                      ...   
6199                                     0.0   
6200                                     0.0   
6201                                     0.0   
6202                                     0.0   
6203                                     0.0   

      pills_progestin_only_contraceptives_cyp  injections_cyp  \
0                                         0.0            0.00   
1                                         0.0            0.00   
2                                         0.0            0.00   
3                                         0.0            0.00   
4                                         0.0            0.00   
...                              

In [113]:
# Total CYP per row
df_service1['cyp_total'] = df_service1[cyp_cols].sum(axis=1) #create new column 'cyp_total' and populate with total cyp for each method

# Print
print(df_service1[['cyp_total'] + cyp_cols])

      cyp_total  pills_combined_oral_contraceptives_cyp  \
0          0.00                                     0.0   
1          0.00                                     0.0   
2          0.00                                     0.0   
3          0.00                                     0.0   
4          0.00                                     0.0   
...         ...                                     ...   
6199     198.50                                     0.0   
6200    1860.00                                     0.0   
6201    4412.25                                     0.0   
6202    1016.00                                     0.0   
6203     890.25                                     0.0   

      pills_progestin_only_contraceptives_cyp  injections_cyp  \
0                                         0.0            0.00   
1                                         0.0            0.00   
2                                         0.0            0.00   
3                              

 Stock at hand: This will inform whether a particular facility is ready to offer services or not

In [114]:
fp_methods = {
    "female_condoms": "female_condom",
    "male_condoms": "male_condom",
    "pills_combined_oral_contraceptives":"combined_oral_contraceptives",
    "pills_progestin_only_contraceptives": "progestin_only_pills",
    "injections": "injectables",
    "implants_1_rod": "implants_(1_rod)",
    "implants_2_rod": "implants_(2_rod)",
    "iucd_non_hormonal": "iucd_copper_t",
    }