# Predicting Family Planning Demand and Optimizing Service Delivery in Kenya

## Project Overview
This project aims to enhance family planning (FP) service delivery efficiency by leveraging data-driven approaches, specifically machine learning (ML) using the CRISP-DM methodology. The goal is to improve maternal and child health outcomes, reduce unmet need for family planning, and contribute to achieving national and global development goals related to reproductive health.

## Problem Statement
Kenya has made significant strides in increasing access to family planning services, yet a substantial unmet need remains. According to the 2022 Kenya Demographic and Health Survey (KDHS), the total unmet need for family planning is 15%: 10% for spacing and 5% for limiting births. In other words, a significant share of people who want to space or limit births are not using any contraceptive method. The method mix is also imbalanced. There is heavy reliance on short-acting methods and inconsistent uptake of long-acting reversible contraceptives (LARCs) and permanent methods, which can lead to higher discontinuation rates and continued unmet need. Supply chain inefficiencies, commodity stock-outs, inadequate healthcare worker training, and uneven distribution of resources make these issues worse and hinder effective service delivery.

## Stakeholders
* **Government of Kenya:** Ministry of Health (MoH), especially the National Family Planning Coordinated Implementation Plan (NFPICIP) and Kenya Health Information System (KHIS) initiatives.
* **Policymakers and Donors:** Need evidence to guide advocacy, resource mobilization, and investment.
* **Healthcare Providers:** Frontline health workers providing FP services.
* **Women of Reproductive Age:** Direct beneficiaries of improved FP services.
* **Local Communities:** Impacted by and involved in FP service delivery.

## Key Statistics
* **Total Unmet Need for Family Planning (2022 KDHS):** 15%
    * **Unmet Need for Spacing:** 10%
    * **Unmet Need for Limiting Births:** 5%
* **Contraceptive use among married women aged 15-49:** rose to 57% in 2022 (DRS 2022). This figure most likely refers to the modern contraceptive prevalence rate (mCPR).
* **Number of new clients and continuation rate by FP method:** key data points for predicting future demand.

## Key Analytics Questions
* How many new clients are expected for injectables in a County next quarter?
* What is the projected demand for different family planning methods (injectables, pills, condoms, implants, IUD, sterilization) at various geographical (e.g., county) and temporal (e.g., quarterly, annual) granularities?
* How can we optimize resource allocation (commodities, equipment, staffing) to meet projected demand and minimize wastage?
* How can predictive analytics identify potential stock-outs or oversupply of specific FP commodities in different locations?
* Which regions or demographics are most underserved in terms of family planning access and uptake?
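For the first question, a useful starting point is a naive baseline that any ML model should beat. It predicts that next quarter's new injectable clients in a county will equal the count for the last observed quarter. The sketch below assumes a tidy table with hypothetical columns `county`, `quarter` (strings such as `2023Q4`, the same format as the service data's period codes) and `injections_new`. All numbers are made up.

```python
import pandas as pd

# Made-up quarterly counts of new injectable clients; column names are assumptions
df = pd.DataFrame({
    "county": ["Baringo", "Baringo", "Baringo", "Wajir", "Wajir", "Wajir"],
    "quarter": ["2023Q2", "2023Q3", "2023Q4", "2023Q2", "2023Q3", "2023Q4"],
    "injections_new": [410, 395, 430, 120, 135, 128],
})

# Parse 'YYYYQn' strings into quarterly Periods so sorting is chronological
df["quarter"] = pd.PeriodIndex(df["quarter"], freq="Q")
df = df.sort_values(["county", "quarter"])


def naive_next_quarter_forecast(data, target):
    """Naive baseline: next quarter's value equals the last observed value, per county."""
    last = data.groupby("county").tail(1).copy()
    last["forecast_quarter"] = last["quarter"] + 1  # +1 shifts a quarterly Period by one quarter
    last["forecast"] = last[target]
    return last[["county", "forecast_quarter", "forecast"]].reset_index(drop=True)


forecast = naive_next_quarter_forecast(df, "injections_new")
print(forecast)
```

Models trained later can then be judged by how much they improve on this baseline's error.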

## Objectives
* **Quantitatively forecast the demand for specific family planning methods:** This includes predicting the continuation rates of users for each method at defined geographical and temporal scales.
* **Enable proactive resource allocation:** This involves optimizing the distribution of commodities, equipment, and staffing to reduce stock-outs, minimize wastage, and improve targeted interventions.
* **Improve method continuation:** By understanding demand and improving service delivery, the project aims to reduce discontinuation rates and increase sustained use of FP methods.
* **Provide evidence-based insights:** Support policymakers and donors in making informed decisions regarding resource mobilization and investment in family planning.
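A simple way to turn a demand forecast into stock-out and oversupply signals is a months-of-stock check: stock on hand divided by forecast monthly demand. The sketch below is illustrative only. The column names, the toy figures, and the 3-month and 6-month min/max thresholds are assumptions, not values from this project or from national guidelines.

```python
import pandas as pd

# Illustrative min/max months-of-stock thresholds (assumptions, not official values)
MIN_MONTHS, MAX_MONTHS = 3, 6

# Made-up projected demand and stock levels
plan = pd.DataFrame({
    "county": ["Baringo", "Wajir", "Lamu"],
    "commodity": ["injectables", "injectables", "implants"],
    "forecast_quarterly_demand": [1200, 300, 90],
    "stock_on_hand": [2000, 3000, 60],
})


def flag_stock_risk(df, min_months=MIN_MONTHS, max_months=MAX_MONTHS):
    """Label each row by months of stock = stock on hand / forecast monthly demand."""
    out = df.copy()
    monthly_demand = out["forecast_quarterly_demand"] / 3
    # Zero demand with positive stock gives inf (flagged oversupply); 0/0 gives NaN and
    # stays "ok", so handle zero-demand rows explicitly in real use
    out["months_of_stock"] = out["stock_on_hand"] / monthly_demand
    out["status"] = "ok"
    out.loc[out["months_of_stock"] < min_months, "status"] = "stock-out risk"
    out.loc[out["months_of_stock"] > max_months, "status"] = "oversupply"
    return out


print(flag_stock_risk(plan)[["county", "commodity", "months_of_stock", "status"]])
```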

## Metrics of Importance to Focus On
* **Accuracy of Demand Forecasts:** Measured by comparing predicted demand with actual uptake for various FP methods at different geographical and temporal levels (e.g., Mean Absolute Error, Root Mean Squared Error).
* **Commodity Stock-out Rates:** Reduction in the number or duration of stock-outs for essential family planning commodities.
* **Resource Utilization Efficiency:** Metrics related to optimal allocation and reduced wastage of commodities, equipment, and human resources.
* **Method Continuation Rates:** Increase in the percentage of users who continue using a specific family planning method over a defined period (e.g., 12-month continuation rate).
* **Unmet Need for Family Planning:** Contribution to the reduction of the national unmet need for family planning.
* **Client Satisfaction:** Indirectly improved through better access and availability of preferred methods.
* **Healthcare Worker Productivity:** Optimized allocation of staff to meet demand efficiently.
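The forecast-accuracy metrics can be computed with scikit-learn, which this notebook already imports. RMSE is taken here as the square root of `mean_squared_error`. This avoids the `squared=False` argument, which has been deprecated in recent scikit-learn releases. The actual and predicted values are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual vs. predicted new clients for four county-quarters
actual = np.array([420, 130, 75, 310])
predicted = np.array([400, 140, 70, 330])

mae = mean_absolute_error(actual, predicted)            # average absolute error
rmse = np.sqrt(mean_squared_error(actual, predicted))   # penalizes large misses more heavily

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Here the absolute errors are 20, 10, 5 and 20, so MAE is 13.75. RMSE comes out slightly higher because the two 20-client misses are weighted more heavily.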

In [1]:
#Import necessary libraries for data manipulation, visualization, and machine learning
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Shared utilities: model selection, preprocessing, metrics, pipelines and imputation
from sklearn.model_selection import train_test_split, cross_val_score, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer


# Classification Model Libraries (common and versatile)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


# Regression Model Libraries (common and versatile)
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Data loading
# Assuming the 'data' folder is in the same directory as the script
try:
    population_data_path = "data/ke_fp_population_data.csv"
    service_data_path = "data/ke_fp_service_data.csv"

    # Attempt to read with 'latin1' encoding, a common alternative for non-UTF-8 files
    df_population = pd.read_csv(population_data_path, encoding='latin1')
    df_service = pd.read_csv(service_data_path, encoding='latin1')

    print("Datasets loaded successfully:")
    print(f"ke_fp_population_data.csv shape: {df_population.shape}")
    print(f"ke_fp_service_data.csv shape: {df_service.shape}")

except FileNotFoundError as e:
    print(f"Error: One or both of the CSV files were not found.")
    print(f"Please ensure 'ke_fp_population_data.csv' and 'ke_fp_service_data.csv' are in a folder named 'data' in the same directory as this script.")
    print(e)
except Exception as e:
    print(f"An unexpected error occurred while loading the datasets: {e}")

Datasets loaded successfully:
ke_fp_population_data.csv shape: (47, 38)
ke_fp_service_data.csv shape: (1128, 60)
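The loading cell reads both files with `latin1`. Because `latin1` maps every byte to some character, it never raises a decoding error, but it can silently mis-decode UTF-8 text such as dashes or apostrophes. A slightly safer pattern tries UTF-8 first and falls back to `latin1` only if decoding fails. It is sketched below with a hypothetical helper, `read_csv_with_fallback`.

```python
import io
import pandas as pd


def read_csv_with_fallback(source, encodings=("utf-8", "latin1")):
    """Read a CSV, trying each encoding in order; returns (DataFrame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            if hasattr(source, "seek"):
                source.seek(0)  # rewind file-like objects after a failed attempt
            return pd.read_csv(source, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # try the next encoding
    raise last_error


# Usage with in-memory bytes; a path such as "data/ke_fp_service_data.csv" works the same way
utf8_bytes = io.BytesIO("county,value\nBaringo County,1\n".encode("utf-8"))
cp1252_bytes = io.BytesIO(b"county,value\nMurang\x92a County,1\n")  # 0x92 is not valid UTF-8

df_a, enc_a = read_csv_with_fallback(utf8_bytes)
df_b, enc_b = read_csv_with_fallback(cp1252_bytes)
print(enc_a, enc_b)
```

If a file turns out to be Windows-encoded, `cp1252` is usually a more accurate fallback than `latin1` for characters such as curly apostrophes.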


### Data Cleaning

This involved:
* Standardization of the column names
* Renaming the columns
* Dropping empty and unwanted columns
* Handling missing values, duplicates and outliers

1. ke_fp_service_data.csv

In [35]:
# Make a copy of the data
df_service1 = df_service.copy()

In [36]:
# Preview the data
df_service1.head()

Unnamed: 0,periodid,periodname,periodcode,perioddescription,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,...,MOH 711 Rev 2020_Post parturm FP 4weeks to 6weeks Re-visits,MOH 711 Rev 2020_Post parturm FP within 48 Hours New clients,MOH 711 Rev 2020_Post parturm FP within 48 Hours Re-visits,MOH 711 Rev 2020_Voluntary Surgical Contraception Vasectomy Ist Time Insertion,MOH 711 Rev 2020_Voluntary Surgical Contraception Vasectomy Re-insertion,MOH 711 Rev 2020_Voluntary surgical contraception BTL Ist Time Insertion,MOH 711 Rev 2020_Voluntary surgical contraception BTL Re-insertion,Population Growth Rate,Total Population,Women of childbearing age (1549yrs)
0,2019Q2,April - June 2019,2019Q2,,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,...,,,,,,,,3.21,772181.43,187683.8
1,2019Q2,April - June 2019,2019Q2,,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,,...,,,,,,,,,325069.4,74575.68
2,2020Q2,April - June 2020,2020Q2,,Kenya,Lamu County,NjWSbQTwys4,Lamu County,KE_County_5,,...,,,,,,,,,148523.0,33620.0
3,2020Q2,April - June 2020,2020Q2,,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,,...,,,,,,,,1.99,622651.0,165992.0
4,2020Q2,April - June 2020,2020Q2,,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,,...,,,,,,,,,475473.0,104014.0


In [37]:
# Explore the data
df_service1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1128 entries, 0 to 1127
Data columns (total 60 columns):
 #   Column                                                                          Non-Null Count  Dtype  
---  ------                                                                          --------------  -----  
 0   periodid                                                                        1128 non-null   object 
 1   periodname                                                                      1128 non-null   object 
 2   periodcode                                                                      1128 non-null   object 
 3   perioddescription                                                               0 non-null      float64
 4   orgunitlevel1                                                                   1128 non-null   object 
 5   orgunitlevel2                                                                   1128 non-null   object 
 6   organisationunit

In [39]:
# Standardize the column names

def standardize_col_labels(df):
    """Return a copy of df with cleaned, snake_case column labels."""
    def clean_column(col):
        # Remove redundant MOH 711 report prefixes (the '2020_' revision year is kept)
        col = col.replace('MOH 711 Rev ', '')
        col = col.replace('MOH 711 ', '')

        # Formatting
        col = col.strip().lower()    # trim whitespace and convert to lowercase
        col = col.replace(' ', '_')  # spaces -> underscores
        col = col.replace('-', '_')  # hyphens -> underscores
        col = col.replace('–', '_')  # en-dashes -> underscores
        return col

    return df.rename(columns=clean_column)

df_service1 = standardize_col_labels(df_service1)
df_service1.columns

Index(['periodid', 'periodname', 'periodcode', 'perioddescription',
       'orgunitlevel1', 'orgunitlevel2', 'organisationunitid',
       'organisationunitname', 'organisationunitcode',
       'organisationunitdescription', 'estimated_number_of_pregnant_women',
       'fp_attendance_new_clients', 'fp_attendance_re_visits',
       'adolescent_10_14_yrs_receiving_fp_services_new_clients',
       'adolescent_10_14_yrs_receiving_fp_services_re_visits',
       '15_19_yrs_receiving_fp_services_new_clients',
       'adolescent_15_19_yrs_receiving_fp_services_re_visits',
       'adolescent_20_24_yrs_receiving_fp_services_new_clients',
       'adolescent_20_24_yrs_receiving_fp_services_re_visits',
       'client_receiving_male_condoms_new_clients',
       'client_receiving_male_condoms_re_visits',
       'clients_counselled_natural_family_planning_new_clients',
       'clients_counselled_natural_family_planning_re_visits',
       'clients_receiving_female_condoms_new_clients',
       'clients_r
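Cleaning labels can map two different raw columns to the same name. When that happens, pandas silently keeps duplicate column labels, and `df[name]` then returns a DataFrame instead of a Series. The quick check below lists any cleaned name produced by more than one raw column; its helper names are hypothetical and mirror `clean_column`. As the output above shows, the cleaning keeps the `2020_` revision-year prefix, so those columns stay distinct from their unprefixed counterparts.

```python
import pandas as pd


def clean_column(col):
    """Mirror of the notebook's cleaning rules (prefix removal, lowercase, separators to '_')."""
    col = col.replace("MOH 711 Rev ", "").replace("MOH 711 ", "")
    col = col.strip().lower()
    for sep in (" ", "-", "–"):  # space, hyphen, en-dash
        col = col.replace(sep, "_")
    return col


def find_name_collisions(columns):
    """Return cleaned names that more than one raw column maps to."""
    cleaned = pd.Series([clean_column(c) for c in columns])
    counts = cleaned.value_counts()
    return sorted(counts[counts > 1].index)


raw = [
    "FP Attendance New clients",
    "FP Attendance New-clients",
    "MOH 711 Rev 2020_Post parturm FP within 48 Hours New clients",
]
print(find_name_collisions(raw))
```

Running `find_name_collisions(df_service.columns)` on the raw frame would apply the same check to the real data.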

In [40]:
# Rename column names

name_map = {
    'periodcode': 'quarter',
    'orgunitlevel1': 'country',
    'orgunitlevel2': 'county',
    'organisationunitid': 'uid',
    'organisationunitcode':'uidcode'
}
df_service1 = df_service1.rename(columns=name_map)
df_service1

Unnamed: 0,periodid,periodname,quarter,perioddescription,country,county,uid,organisationunitname,uidcode,organisationunitdescription,...,2020_post_parturm_fp_4weeks_to_6weeks_re_visits,2020_post_parturm_fp_within_48_hours_new_clients,2020_post_parturm_fp_within_48_hours_re_visits,2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion,2020_voluntary_surgical_contraception_vasectomy_re_insertion,2020_voluntary_surgical_contraception_btl_ist_time_insertion,2020_voluntary_surgical_contraception_btl_re_insertion,population_growth_rate,total_population,women_of_childbearing_age_(1549yrs)
0,2019Q2,April - June 2019,2019Q2,,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,...,,,,,,,,3.21,772181.43,187683.80
1,2019Q2,April - June 2019,2019Q2,,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,,...,,,,,,,,,325069.40,74575.68
2,2020Q2,April - June 2020,2020Q2,,Kenya,Lamu County,NjWSbQTwys4,Lamu County,KE_County_5,,...,,,,,,,,,148523.00,33620.00
3,2020Q2,April - June 2020,2020Q2,,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,,...,,,,,,,,1.99,622651.00,165992.00
4,2020Q2,April - June 2020,2020Q2,,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,,...,,,,,,,,,475473.00,104014.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1123,2020Q4,October - December 2020,2020Q4,,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,,...,1.0,18.0,,,,,,,475473.00,104014.00
1124,2023Q4,October - December 2023,2023Q4,,Kenya,Wajir County,CeLsrJOH0g9,Wajir County,KE_County_8,,...,,70.0,,,,,,2.46,863278.85,179017.54
1125,2023Q4,October - December 2023,2023Q4,,Kenya,Mandera County,R6f9znhg37c,Mandera County,KE_County_9,,...,,10.0,,,,,,3.00,988993.43,195635.01
1126,2024Q4,October - December 2024,2024Q4,,Kenya,Wajir County,CeLsrJOH0g9,Wajir County,KE_County_8,,...,5.0,71.0,,,,,,2.20,886747.00,184023.00


In [41]:
# Drop columns where all values are null 
df_service1 = df_service1.dropna(axis=1, how='all')
df_service1

Unnamed: 0,periodid,periodname,quarter,country,county,uid,organisationunitname,uidcode,estimated_number_of_pregnant_women,fp_attendance_new_clients,...,2020_post_parturm_fp_4weeks_to_6weeks_re_visits,2020_post_parturm_fp_within_48_hours_new_clients,2020_post_parturm_fp_within_48_hours_re_visits,2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion,2020_voluntary_surgical_contraception_vasectomy_re_insertion,2020_voluntary_surgical_contraception_btl_ist_time_insertion,2020_voluntary_surgical_contraception_btl_re_insertion,population_growth_rate,total_population,women_of_childbearing_age_(1549yrs)
0,2019Q2,April - June 2019,2019Q2,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,29046.42,2402,...,,,,,,,,3.21,772181.43,187683.80
1,2019Q2,April - June 2019,2019Q2,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,13732.38,2075,...,,,,,,,,,325069.40,74575.68
2,2020Q2,April - June 2020,2020Q2,Kenya,Lamu County,NjWSbQTwys4,Lamu County,KE_County_5,4847.00,1636,...,,,,,,,,,148523.00,33620.00
3,2020Q2,April - June 2020,2020Q2,Kenya,Kirinyaga County,Ulj33KBau7V,Kirinyaga County,KE_County_20,13640.00,4488,...,,,,,,,,1.99,622651.00,165992.00
4,2020Q2,April - June 2020,2020Q2,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,15757.00,1959,...,,,,,,,,,475473.00,104014.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1123,2020Q4,October - December 2020,2020Q4,Kenya,Marsabit County,Eey8fT4Im3y,Marsabit County,KE_County_10,15757.00,1805,...,1.0,18.0,,,,,,,475473.00,104014.00
1124,2023Q4,October - December 2023,2023Q4,Kenya,Wajir County,CeLsrJOH0g9,Wajir County,KE_County_8,25822.25,1433,...,,70.0,,,,,,2.46,863278.85,179017.54
1125,2023Q4,October - December 2023,2023Q4,Kenya,Mandera County,R6f9znhg37c,Mandera County,KE_County_9,33823.33,1729,...,,10.0,,,,,,3.00,988993.43,195635.01
1126,2024Q4,October - December 2024,2024Q4,Kenya,Wajir County,CeLsrJOH0g9,Wajir County,KE_County_8,28489.00,2315,...,5.0,71.0,,,,,,2.20,886747.00,184023.00


In [42]:
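# Drop unwanted columns: identifiers duplicated by 'quarter'/'county', plus the
# population fields, which are also available in the population dataset
df_service1 = df_service1.drop(columns=['periodid', 'organisationunitname', 'periodname',
                                        'population_growth_rate', 'total_population',
                                        'women_of_childbearing_age_(1549yrs)'])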
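Before forecasting, it helps to confirm that each county has an unbroken quarterly series. The `quarter` values (e.g. `2019Q2`) parse directly into pandas quarterly `Period`s. Gaps can then be found by comparing each county's reported quarters with a full `period_range`. The helper name and toy rows below are illustrative.

```python
import pandas as pd


def find_missing_quarters(df, group_col="county", period_col="quarter"):
    """List quarters absent between each county's first and last reported quarter."""
    frame = df.assign(_period=pd.PeriodIndex(df[period_col], freq="Q"))
    missing = []
    for county, grp in frame.groupby(group_col):
        reported = pd.PeriodIndex(grp["_period"])
        expected = pd.period_range(reported.min(), reported.max(), freq="Q")
        missing.extend((county, str(p)) for p in expected.difference(reported))
    return pd.DataFrame(missing, columns=[group_col, period_col])


# Toy panel with a gap in Baringo (2019Q4 not reported)
panel = pd.DataFrame({
    "county": ["Baringo County", "Baringo County", "Baringo County", "Wajir County", "Wajir County"],
    "quarter": ["2019Q2", "2019Q3", "2020Q1", "2023Q4", "2024Q1"],
})
print(find_missing_quarters(panel))
```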


In [51]:
# Check for missing values
df_service1.isna().sum().sort_values(ascending=False)

2020_voluntary_surgical_contraception_vasectomy_re_insertion          1123
2020_voluntary_surgical_contraception_btl_re_insertion                1100
2020_clients_given_cycle_beads_re_visits                              1055
2020_voluntary_surgical_contraception_vasectomy_ist_time_insertion     920
2020_clients_receiving_post_abortion_fp_re_visits                      838
2020_clients_given_cycle_beads_new_clients                             530
2020_post_parturm_fp_within_48_hours_re_visits                         517
2020_voluntary_surgical_contraception_btl_ist_time_insertion           496
2020_post_parturm_fp_4weeks_to_6weeks_re_visits                        486
2020_iucd_insertion_hormonal_re_insertion                              457
clients_counselled_natural_family_planning_re_visits                   454
2020_clients_receiving_post_abortion_fp_new_clients                    371
2020_iucd_insertion_non_hormonal_re_insertion                          371
2020_iucd_insertion_hormo

Missing values were interpreted as "no clients served" and filled with 0.

In [53]:
# Fill the missing values with zeros
df_service1 = df_service1.fillna(0)
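Filling with 0 treats every blank cell as a genuine zero. In routine reporting data, though, a blank may also mean the indicator was not reported, possibly because it did not exist on the reporting form for that period. The minimal variant below assumes you know which columns hold counts. Before filling, it records how many fields were blank in each row, so true zeros and reporting gaps can still be told apart later. Unlike `fillna(0)` on the whole frame, it also leaves non-count columns untouched. The helper name and toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_counts_with_flags(df, count_cols):
    """Fill missing counts with 0 after recording how many were blank in each row."""
    out = df.copy()
    out["n_unreported_fields"] = out[count_cols].isna().sum(axis=1)
    out[count_cols] = out[count_cols].fillna(0)
    return out


# Toy rows; column names follow the cleaned naming above, values are made up
toy = pd.DataFrame({
    "county": ["Lamu County", "Wajir County"],
    "2020_iucd_insertion_hormonal_re_insertion": [3, np.nan],
    "2020_clients_given_cycle_beads_re_visits": [np.nan, np.nan],
})
count_cols = [c for c in toy.columns if c != "county"]
filled = fill_counts_with_flags(toy, count_cols)
print(filled)
```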

In [52]:
# Check for duplicates
df_service1.duplicated().sum()

0
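`duplicated()` with no arguments flags only rows that are identical in every column. For a county-quarter panel, the more relevant check is whether any county appears twice in the same quarter with different values. Here is a sketch, assuming the `county` and `quarter` columns produced by the renaming step (toy values are made up):

```python
import pandas as pd


def duplicate_keys(df, keys=("county", "quarter")):
    """Return all rows whose key columns occur more than once."""
    mask = df.duplicated(subset=list(keys), keep=False)
    return df.loc[mask].sort_values(list(keys))


toy = pd.DataFrame({
    "county": ["Baringo County", "Baringo County", "Wajir County"],
    "quarter": ["2019Q2", "2019Q2", "2019Q2"],
    "fp_attendance_new_clients": [100, 120, 50],  # made-up values
})
print(toy.duplicated().sum())  # full-row check finds nothing
print(duplicate_keys(toy))     # key-level check finds the conflicting Baringo rows
```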

2. ke_fp_population_data.csv

In [43]:
# Make a copy of the data
df_population1 = df_population.copy()

In [44]:
# Explore the data
df_population1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 38 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   orgunitlevel1                              47 non-null     object 
 1   orgunitlevel2                              47 non-null     object 
 2   organisationunitid                         47 non-null     object 
 3   organisationunitname                       47 non-null     object 
 4   organisationunitcode                       47 non-null     object 
 5   organisationunitdescription                0 non-null      float64
 6   Estimated Number of Pregnant Women 2020    47 non-null     int64  
 7   Estimated Number of Pregnant Women 2018    46 non-null     float64
 8   Estimated Number of Pregnant Women 2024    47 non-null     int64  
 9   Estimated Number of Pregnant Women 2021    47 non-null     int64  
 10  Estimated Number of Pregnant

In [45]:
# Standardize column names
df_population1 = standardize_col_labels(df_population1)

# Preview the data
df_population1.head()

Unnamed: 0,orgunitlevel1,orgunitlevel2,organisationunitid,organisationunitname,organisationunitcode,organisationunitdescription,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,22791,28403.0,25859,23508,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,27782,35596.0,31224,28589,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,,59820,65489.0,68382,61766,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,,28685,22589.0,32280,29517,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,,16561,19301.0,18939,17111,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616


In [46]:
# Rename column names
df_population1 = df_population1.rename(columns=name_map)
df_population1

Unnamed: 0,country,county,uid,organisationunitname,uidcode,organisationunitdescription,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,,22791,28403.0,25859,23508,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,,27782,35596.0,31224,28589,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,,59820,65489.0,68382,61766,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,,28685,22589.0,32280,29517,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,,16561,19301.0,18939,17111,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,Embu County,KE_County_14,,15561,15013.0,17131,15929,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,Garissa County,KE_County_7,,43114,22261.0,46476,44334,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,Homa Bay County,KE_County_43,,39907,47462.0,45521,41191,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,,24448,7168.0,7402,7923,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,Kajiado County,KE_County_34,,38371,35467.0,43539,39573,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [47]:
# Drop columns where all values are null 
df_population1 = df_population1.dropna(axis=1, how='all')
df_population1

Unnamed: 0,country,county,uid,organisationunitname,uidcode,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,estimated_number_of_pregnant_women_2023,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,Baringo County,KE_County_30,22791,28403.0,25859,23508,25054,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,Bomet County,KE_County_36,27782,35596.0,31224,28589,30324,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,Bungoma County,KE_County_39,59820,65489.0,68382,61766,66192,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,Busia County,KE_County_40,28685,22589.0,32280,29517,31336,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,Elgeyo Marakwet County,KE_County_28,16561,19301.0,18939,17111,18312,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,Embu County,KE_County_14,15561,15013.0,17131,15929,16723,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,Garissa County,KE_County_7,43114,22261.0,46476,44334,47112,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,Homa Bay County,KE_County_43,39907,47462.0,45521,41191,44060,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,Isiolo County,KE_County_11,24448,7168.0,7402,7923,7215,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,Kajiado County,KE_County_34,38371,35467.0,43539,39573,42177,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [48]:
# Drop unwanted columns
# organisationunitname duplicates county; assign the result so the drop persists
df_population1 = df_population1.drop(columns=['organisationunitname'])
df_population1

Unnamed: 0,country,county,uid,uidcode,estimated_number_of_pregnant_women_2020,estimated_number_of_pregnant_women_2018,estimated_number_of_pregnant_women_2024,estimated_number_of_pregnant_women_2021,estimated_number_of_pregnant_women_2023,estimated_number_of_pregnant_women_2019,...,total_population_2020,total_population_2018,women_of_childbearing_age_(1549yrs)_2019,women_of_childbearing_age_(1549yrs)_2022,women_of_childbearing_age_(1549yrs)_2025,women_of_childbearing_age_(1549yrs)_2020,women_of_childbearing_age_(1549yrs)_2018,women_of_childbearing_age_(1549yrs)_2024,women_of_childbearing_age_(1549yrs)_2021,women_of_childbearing_age_(1549yrs)_2023
0,Kenya,Baringo County,vvOK1BxTbet,KE_County_30,22791,28403.0,25859,23508,25054,29126,...,689090,754013,188198,166778,367820.0,156691.0,175023.0,177596,161588,172072
1,Kenya,Bomet County,HMNARUV2CW4,KE_County_36,27782,35596.0,31224,28589,30324,31522,...,902047,963723,229073,236968,240219.0,223603.0,229615.0,251118,230087,244215
2,Kenya,Bungoma County,KGHhQ5GLd4k,KE_County_39,59820,65489.0,68382,61766,66192,58642,...,1731240,1637392,397598,435684,481027.0,408267.0,394095.0,454863,421554,450225
3,Kenya,Busia County,Tvf1zgVZ0K4,KE_County_40,28685,22589.0,32280,29517,31336,33360,...,922352,869978,201282,237629,259940.0,224145.0,198285.0,252295,230672,244885
4,Kenya,Elgeyo Marakwet County,MqnLxQBigG0,KE_County_28,16561,19301.0,18939,17111,18312,19812,...,470812,502411,123822,118538,131140.0,110930.0,120579.0,126811,114604,122616
5,Kenya,Embu County,PFu8alU2KWG,KE_County_14,15561,15013.0,17131,15929,16723,15416,...,622993,571413,147630,167305,179861.0,159579.0,143763.0,175593,163334,171428
6,Kenya,Garissa County,uyOrcHZBpW0,KE_County_7,43114,22261.0,46476,44334,47112,27640,...,1216738,725589,170865,281124,328852.0,273913.0,168130.0,309841,281989,299366
7,Kenya,Homa Bay County,nK0A12Q7MvS,KE_County_43,39907,47462.0,45521,41191,44060,48214,...,1167024,1176010,293099,296631,326624.0,278143.0,289424.0,316159,287099,305970
8,Kenya,Isiolo County,bzOfj0iwfDH,KE_County_11,24448,7168.0,7402,7923,7215,7485,...,267641,183045,51341,64295,33080.0,69253.0,43931.0,67678,64544,65984
9,Kenya,Kajiado County,Hsk1YV8kHkT,KE_County_34,38371,35467.0,43539,39573,42177,36731,...,1152398,933224,231990,332902,366318.0,312674.0,224004.0,354857,322477,343753


In [54]:
# Check for missing values
df_population1.isna().sum().sort_values(ascending=False)

population_growth_rate_2024                  17
population_growth_rate_2022                  14
population_growth_rate_2021                  13
population_growth_rate_2023                  12
population_growth_rate_2025                  12
population_growth_rate_2020                  10
population_growth_rate_2019                   3
estimated_number_of_pregnant_women_2018       1
population_growth_rate_2018                   1
estimated_number_of_pregnant_women_2024       0
estimated_number_of_pregnant_women_2021       0
estimated_number_of_pregnant_women_2020       0
estimated_number_of_pregnant_women_2019       0
estimated_number_of_pregnant_women_2022       0
estimated_number_of_pregnant_women_2025       0
uidcode                                       0
organisationunitname                          0
uid                                           0
county                                        0
estimated_number_of_pregnant_women_2023       0
women_of_childbearing_age_(1549yrs)_202

### Feature Engineering

In [84]:
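# Group service columns by FP method using substring matching on the column name
DEFAULT_FP_METHODS = ['condoms', 'implants', 'pills', 'iucd', 'injections', 'surgical']

def group_by_fp_method(df, methods=None):
    """Map each FP method keyword to the list of columns whose name contains it."""
    if methods is None:
        methods = DEFAULT_FP_METHODS

    grouped_columns = {method: [] for method in methods}

    for col in df.columns:
        if col.endswith('_total'):
            continue  # skip aggregate columns, so re-running the cells does not double count
        for method in methods:
            if method in col:
                grouped_columns[method].append(col)
                break  # assign column to the first matched method only

    return grouped_columns

grouped_columns = group_by_fp_method(df_service1)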
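`group_by_fp_method` buckets columns by method name alone, so each `{method}_total` mixes first-time clients with re-visits. For IUCDs and sterilization, it likewise mixes first insertions with re-insertions. The first analytics question asks about *new* clients, so it may help to keep the two apart. The sketch below assumes that first-time columns can be recognized by the substrings `new_clients` or `ist_time_insertion`, both of which appear in the cleaned names above. That rule, the helper name and the toy values are all assumptions.

```python
import pandas as pd

METHODS = ["condoms", "implants", "pills", "iucd", "injections", "surgical"]
# Assumed markers for first-time clients; verify against the real column names before use
NEW_MARKERS = ("new_clients", "ist_time_insertion")


def method_totals_by_client_type(df, methods=METHODS):
    """Sum each method's columns separately for new clients and for re-visits."""
    assigned = {m: [] for m in methods}
    for col in df.columns:
        if col.endswith("_total"):
            continue  # ignore previously created aggregate columns
        for m in methods:
            if m in col:
                assigned[m].append(col)
                break  # first matched method only, as in group_by_fp_method
    totals = {}
    for m, cols in assigned.items():
        new_cols = [c for c in cols if any(k in c for k in NEW_MARKERS)]
        revisit_cols = [c for c in cols if c not in new_cols]
        totals[f"{m}_new"] = df[new_cols].sum(axis=1)
        totals[f"{m}_revisit"] = df[revisit_cols].sum(axis=1)
    return pd.DataFrame(totals, index=df.index)


toy = pd.DataFrame({
    "client_receiving_male_condoms_new_clients": [10, 4],
    "client_receiving_male_condoms_re_visits": [20, 6],
    "clients_receiving_female_condoms_new_clients": [1, 0],
    "2020_voluntary_surgical_contraception_btl_ist_time_insertion": [2, 0],
    "2020_voluntary_surgical_contraception_btl_re_insertion": [0, 1],
})
print(method_totals_by_client_type(toy)[["condoms_new", "condoms_revisit",
                                         "surgical_new", "surgical_revisit"]])
```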

In [89]:
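# Aggregate by method: sum all matched columns (new clients and re-visits) per row
for method, cols in grouped_columns.items():
    df_service1[f"{method}_total"] = df_service1[cols].sum(axis=1)

# Display the totals for methods that matched at least one column
print(df_service1[[f"{method}_total" for method in grouped_columns if grouped_columns[method]]])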

      condoms_total  implants_total  pills_total  iucd_total  \
0            4318.0             0.0       1502.0         0.0   
1            1382.0             0.0        442.0         0.0   
2             647.0             0.0       1023.0         0.0   
3             582.0             0.0      10485.0         0.0   
4             842.0            32.0        376.0         5.0   
...             ...             ...          ...         ...   
1123          358.0           367.0        508.0        10.0   
1124          656.0           206.0        203.0         5.0   
1125          913.0           363.0        846.0         2.0   
1126         1880.0           335.0        630.0         6.0   
1127          832.0           510.0        707.0         2.0   

      injections_total  surgical_total  
0                  0.0             0.0  
1                  0.0             0.0  
2                  0.0             0.0  
3                  0.0             0.0  
4                 63.0    