# Predictive Health Assessment: Leveraging DHS Data for Targeted Interventions in Kenya


**Authors**: [Alpha Guya](mailto:alpha.guya@student.moringaschool.com), [Ben Ochoro](mailto:ben.ochoro@student.moringaschool.com), [Caleb Ochieng](mailto:caleb.ochieng@student.moringaschool.com), [Christine Mukiri](mailto:christine.mukiri@student.moringaschool.com), [Dominic Muli](mailto:dominic.muli@student.moringaschool.com), [Frank Mandele](mailto:frank.mandele@student.moringaschool.com), [Jacquiline Tulinye](mailto:jacquiline.tulinye@student.moringaschool.com) and [Lesley Wanjiku](mailto:lesley.wanjiku@student.moringaschool.com)

## 1.0) Project Overview

Our project uses machine learning and data from the Demographic and Health Surveys (DHS) program to build predictive models that evaluate individual and household health risks in Kenya. By analyzing a varied set of demographic, socio-economic, and health-related indicators, we aim to develop reliable models that estimate the likelihood of malnutrition, disease prevalence, and other health risks within specific communities. The goal is to give users such as public health officials targeted insights that enable more effective allocation of resources and interventions. This proactive approach is intended to optimize the impact of health initiatives by allowing interventions to be prioritized and customized for at-risk populations, ultimately contributing to improved health outcomes in Kenya.

## 1.1) Business Problem

Despite existing health interventions, Kenya has difficulty targeting resources and interventions effectively at individual and household health risks, including malnutrition, diseases, and other health concerns. This gap highlights the need for a predictive and targeted approach to allocating resources and interventions. Using machine learning models built on Demographic and Health Surveys (DHS) data, the project aims to assess the likelihood of malnutrition, disease prevalence, and health risks based on individual and household characteristics. By accurately identifying at-risk populations, this solution seeks to empower decision-makers and public health officials to allocate resources according to need, ultimately increasing the impact of health interventions and improving overall health outcomes in Kenya.

## 1.2) Objectives

* Predictive Model Development:

Develop machine learning models to predict health risks (e.g., malnutrition, disease prevalence) based on individual and household characteristics derived from DHS data.

* Feature Engineering and Selection:

Conduct comprehensive feature engineering to extract relevant features from DHS data, considering demographic, socio-economic, and health-related variables.

* Model Interpretability and Explainability:

Enhance model interpretability to provide actionable insights for decision-makers by employing techniques such as SHAP values or feature importance analysis.

* Targeted Intervention Recommendations:

Utilize model predictions to generate targeted recommendations for health interventions and resource allocation in specific Kenyan communities.

* API Deployment and Usability:

Deploy an accessible API interface for stakeholders to input data and receive health risk predictions based on the developed models.

* Impact Assessment and Validation:

Assess the real-world impact of model-guided interventions by monitoring and evaluating changes in health outcomes in targeted Kenyan populations.

## 1.3) Metrics of Success

* Achieve a predictive accuracy of at least 90% on unseen validation data.
* Identify and utilize the top 10 most influential features contributing to the models' predictive power.
* Generate clear and interpretable explanations for at least 70% of model predictions.
* Create a prioritized list of actionable recommendations based on identified health risks for at least 100 communities.
* Ensure an API uptime of at least 90% and gather feedback on usability for further improvements.
* Measure the effectiveness of interventions by observing changes in health indicators, aiming for improvements in at least 80% of targeted communities.
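The accuracy target in the first metric can be checked with a few lines of code. The following is a minimal sketch, not the project's evaluation code: the function names `accuracy` and `meets_target` and the example labels are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(y_true, y_pred, target=0.90):
    """Check the score against the 90% accuracy target stated above."""
    return accuracy(y_true, y_pred) >= target


# Hypothetical validation labels (1 = at risk, 0 = not at risk).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

score = accuracy(y_true, y_pred)
target_met = meets_target(y_true, y_pred)
```

Accuracy alone can look good when one class dominates the data, so it may be worth reporting precision and recall for the at-risk class alongside it.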

## 1.4) Data Relevance and Validation

## 2.0) Understanding the Data

## 2.1) Reading the Data

### 2.1.1) Installations

In [6]:
# installations
# %pip install requests

### 2.1.2) Importing Relevant Libraries

In [9]:
# importing necessary libraries
import json

import pandas as pd
import requests


### 2.1.3) Reading the Data

In [48]:
# A function to fetch JSON data from an API endpoint
def fetch_dhs_data(api_endpoint, timeout=30):
    """Return the parsed JSON response from api_endpoint, or None on failure."""
    try:
        # A timeout keeps the call from hanging indefinitely
        response = requests.get(api_endpoint, timeout=timeout)

        if response.status_code == 200:
            return response.json()

        print(f"Request failed with status code {response.status_code}")
        return None

    except requests.RequestException as e:
        print(f"Request Exception: {e}")
        return None


In [32]:
# Accessing DHS program KE data indicators using API
api_endpoint = 'https://api.dhsprogram.com/rest/dhs/data?breakdown=national&countryIds=KE&lang=en&f=json'

dhs_ke_data_json = fetch_dhs_data(api_endpoint)
if dhs_ke_data_json:
    print("Data retrieved successfully:")
else:
    print("Failed to retrieve data from the API.")

Data retrieved successfully:
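The endpoint above is written as one long string. A minimal sketch of building it from named parameters, and of walking through paginated results, is shown below. The helper names `build_dhs_url` and `fetch_all_pages` are hypothetical, and the `page` query parameter and `TotalPages` response field are assumptions about the DHS API that should be checked against its documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.dhsprogram.com/rest/dhs/data"


def build_dhs_url(base_url=BASE_URL, **params):
    """Return base_url with params appended as a query string."""
    return f"{base_url}?{urlencode(params)}"


def fetch_all_pages(fetch_page, params, max_pages=100):
    """Collect the 'Data' records from every page of a paginated response.

    fetch_page is any callable that takes a URL and returns the parsed
    JSON as a dict, or None on failure (for example fetch_dhs_data).
    """
    records = []
    page = 1
    while page <= max_pages:
        # Assumption: the API accepts a 'page' query parameter.
        payload = fetch_page(build_dhs_url(**params, page=page))
        if not payload:
            break
        records.extend(payload.get("Data", []))
        # Assumption: the response reports 'TotalPages'; default to one page.
        if page >= int(payload.get("TotalPages", 1)):
            break
        page += 1
    return records


# Rebuilds the endpoint string used in the cell above.
url = build_dhs_url(breakdown="national", countryIds="KE", lang="en", f="json")
```

Calling `fetch_all_pages(fetch_dhs_data, {"breakdown": "national", "countryIds": "KE", "lang": "en", "f": "json"})` would then return a single list of records, provided the paging assumptions hold.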


In [47]:
# converting the json response into a pandas DataFrame
dhs_data_list = dhs_ke_data_json.get('Data', [])
ke_data = pd.json_normalize(dhs_data_list)
# Setting the maximum number of columns to display in the DataFrame
pd.set_option('display.max_columns', 28)
ke_data

Unnamed: 0,DataId,SurveyId,Indicator,IsPreferred,Value,SDRID,Precision,RegionId,SurveyYearLabel,SurveyType,SurveyYear,IndicatorOrder,DHS_CountryCode,CILow,CountryName,IndicatorType,CharacteristicId,CharacteristicCategory,IndicatorId,CharacteristicOrder,CharacteristicLabel,ByVariableLabel,DenominatorUnweighted,DenominatorWeighted,CIHigh,IsTotal,ByVariableId,LevelRank
0,92840,KE1989DHS,Age specific fertility rate: 10-14,1,2.0,FEFRTRWA10,0,,1989,DHS,1989,11763005,KE,,Kenya,I,1000,Total,FE_FRTR_W_A10,0,Total,,,,,1,0,
1,92842,KE1989DHS,Age specific fertility rate: 15-19,1,153.0,FEFRTRWA15,0,,1989,DHS,1989,11763010,KE,,Kenya,I,1000,Total,FE_FRTR_W_A15,0,Total,,,,,1,0,
2,92844,KE1989DHS,Age specific fertility rate: 20-24,1,324.0,FEFRTRWA20,0,,1989,DHS,1989,11763020,KE,,Kenya,I,1000,Total,FE_FRTR_W_A20,0,Total,,,,,1,0,
3,92858,KE1989DHS,Age specific fertility rate: 25-29,1,301.0,FEFRTRWA25,0,,1989,DHS,1989,11763030,KE,,Kenya,I,1000,Total,FE_FRTR_W_A25,0,Total,,,,,1,0,
4,92859,KE1989DHS,Age specific fertility rate: 30-34,1,243.0,FEFRTRWA30,0,,1989,DHS,1989,11763040,KE,,Kenya,I,1000,Total,FE_FRTR_W_A30,0,Total,,,,,1,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995,407586,KE1998DHS,DPT 1 vaccination received,0,95.6,CHVACSCDP1,1,,1998,DHS,1998,93836020,KE,,Kenya,I,268002,Source of vaccination information,CH_VACS_C_DP1,268002,Either source,24-35,1040,1021,,1,258002,
3996,610386,KE1998DHS,DPT 2 vaccination received,1,90.0,CHVACSCDP2,1,,1998,DHS,1998,93836030,KE,,Kenya,I,268002,Source of vaccination information,CH_VACS_C_DP2,268002,Either source,12-23,1127,1097,,1,258001,
3997,407587,KE1998DHS,DPT 2 vaccination received,0,90.2,CHVACSCDP2,1,,1998,DHS,1998,93836030,KE,,Kenya,I,268002,Source of vaccination information,CH_VACS_C_DP2,268002,Either source,24-35,1040,1021,,1,258002,
3998,610315,KE1998DHS,DPT 3 vaccination received,1,79.2,CHVACSCDP3,1,,1998,DHS,1998,93836040,KE,,Kenya,I,268002,Source of vaccination information,CH_VACS_C_DP3,268002,Either source,12-23,1127,1097,,1,258001,
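The output above contains rows for the same indicator with different `IsPreferred` values. A minimal sketch of keeping only the flagged rows is shown below, on the assumption that `IsPreferred == 1` marks the estimate the DHS program recommends for an indicator. The helper name `preferred_only` is hypothetical; the two sample records reuse values from the rows shown above.

```python
def preferred_only(records):
    """Keep only the records whose 'IsPreferred' field equals 1."""
    return [r for r in records if r.get("IsPreferred") == 1]


# Two rows taken from the output above (remaining fields omitted).
sample = [
    {"DataId": 610386, "Indicator": "DPT 2 vaccination received",
     "IsPreferred": 1, "Value": 90.0},
    {"DataId": 407587, "Indicator": "DPT 2 vaccination received",
     "IsPreferred": 0, "Value": 90.2},
]

preferred = preferred_only(sample)
```

On the DataFrame itself, the equivalent filter would be `ke_data[ke_data["IsPreferred"] == 1]`.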


Observation: we will work with

In [49]:
# Reading downloaded relevant data
df = pd.read_csv('./data/KEHR8BFL.csv')
df


Unnamed: 0,HHID,HV000,HV001,HV002,HV003,HV004,HV005,HV006,HV007,HV008,HV008A,HV009,HV010,HV011,...,SH305D$09,SH305D$10,SH305D$11,SH305E$01,SH305E$02,SH305E$03,SH305E$04,SH305E$05,SH305E$06,SH305E$07,SH305E$08,SH305E$09,SH305E$10,SH305E$11
0,1 4,KE8,1,4,2,1,1306431,4,2022,1468,44676,6,1,1,...,,,,,,,,,,,,,,
1,1 7,KE8,1,7,2,1,1306431,4,2022,1468,44676,3,1,0,...,,,,,,,,,,,,,,
2,1 10,KE8,1,10,1,1,1306431,4,2022,1468,44677,2,1,0,...,,,,,,,,,,,,,,
3,1 13,KE8,1,13,4,1,1306431,4,2022,1468,44676,8,1,2,...,,,,,,,,,,,,,,
4,1 17,KE8,1,17,1,1,1306431,4,2022,1468,44677,3,0,0,...,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14288,650 83,KE8,650,83,1,650,1028861,6,2022,1470,44727,2,1,0,...,,,,0,,,,,,,,,,
14289,650 88,KE8,650,88,1,650,1028861,6,2022,1470,44727,1,0,0,...,,,,,,,,,,,,,,
14290,650 93,KE8,650,93,2,650,1028861,6,2022,1470,44727,3,0,0,...,,,,0,,,,,,,,,,
14291,650 97,KE8,650,97,1,650,1028861,6,2022,1470,44726,3,1,0,...,,,,,,,,,,,,,,
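The `HV005` column holds large integers such as 1306431. By DHS recode convention this is the household sample weight stored with six implied decimal places, so it is usually divided by 1,000,000 before use. The following is a minimal sketch under that assumption, which should be verified against the recode manual for this survey. The helper names are hypothetical, and `HV009` is assumed to be the number of household members.

```python
WEIGHT_SCALE = 1_000_000  # assumption: HV005 has six implied decimal places


def scale_weight(raw_weight):
    """Convert a raw HV005 value to a usable sample weight."""
    return raw_weight / WEIGHT_SCALE


def weighted_mean(values, raw_weights):
    """Return the mean of values weighted by the scaled sample weights."""
    weights = [scale_weight(w) for w in raw_weights]
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# HV009 and HV005 values copied from four of the rows shown above.
hv009 = [6, 3, 2, 1]
hv005 = [1306431, 1306431, 1028861, 1028861]

mean_household_size = weighted_mean(hv009, hv005)
```

With the full DataFrame, the same scaling would be `df["HV005"] / 1_000_000`. Unweighted summaries of survey data can misrepresent the population, which is why the weight matters for later analysis.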


## 2.2) Data Cleaning

## 2.3) EDA

## 2.4) Building Model

## 2.5) Conclusion

## 2.6) Recommendation

## 2.7) Model Deployment