## BUSINESS UNDERSTANDING


**OVERVIEW**

The United States has long been a cornerstone of Kenya’s public health funding, providing about $2.5 billion between 2020 and 2025, with 80% supporting health programs through USAID and PEPFAR. In January 2025, an executive order paused U.S. foreign aid; sweeping budget cuts followed in July 2025, abruptly freezing most U.S.-funded health programs in Kenya. The impact was immediate: antiretroviral therapy (ART) clinics closed, community HIV programs were halted, and essential prevention efforts such as pre-exposure prophylaxis (PrEP) distribution and DREAMS support for adolescent girls were suspended.

Globally, modeling suggests that sustained aid cuts could cause 10.8 million additional HIV infections and 2.9 million deaths by 2030, and UNAIDS models likewise warn of millions of new infections and deaths if donor funding stops. Regional data from Mozambique show a 15% rise in new infections and a 10% rise in HIV-related deaths following similar disruptions. Kenya, a key PEPFAR partner, is already feeling the strain through clinic closures, staff layoffs, and reduced ART coverage.

**Our goal is to quantify the impact on Kenya specifically: how much of the projected global rise in infections and deaths could occur here if aid cuts persist? We will also build predictive models that help policymakers act early.**

**BACKGROUND**

Kenya’s success in controlling HIV has been closely tied to external funding, especially through PEPFAR and USAID, which have financed ART programs, health worker salaries, and community prevention initiatives. The 2025 suspension of U.S. aid exposed the country’s heavy reliance on donor support, triggering job losses, service interruptions, and data system breakdowns.
Programs such as DREAMS, which helped keep 66,000 girls HIV-free, were paused, while ART clinics and community outreach services faced closure. These disruptions underscore a broader question of sustainability and resilience in Kenya’s health system. Understanding how changes in foreign aid affect HIV outcomes and the healthcare workforce is vital for developing adaptive, evidence-based funding strategies that can protect future public health gains.

**KEY OBJECTIVES - Quantifying Kenya’s share of the global HIV impact**

**•	Kenya’s Projected Impact**

If global modeling predicts millions of new infections and deaths, what proportion of this burden might occur in Kenya? We will use time-series analysis to track Kenya’s HIV trends (testing, ART coverage, mortality) before and after funding shifts.
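
One common way to frame this before/after comparison is an interrupted time-series (segmented) regression, which estimates both the immediate level change and the change in trend at the funding break. The sketch below is illustrative only: the monthly ART coverage values are synthetic, and `segmented_fit` and `break_idx` are hypothetical names, not part of the project code.

```python
import numpy as np

def segmented_fit(y, break_idx):
    """Least-squares fit of y = b0 + b1*t + b2*post + b3*(t - break_idx)*post."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= break_idx).astype(float)          # 1 after the funding break, else 0
    X = np.column_stack([np.ones_like(t), t, post, (t - break_idx) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, (float(c) for c in coef)))

# Synthetic (made-up) monthly ART coverage in %: steady rise for 12 months,
# then a 4-point drop and a declining trend after the break.
t = np.arange(24)
coverage = np.where(t < 12, 70 + 0.5 * t, 70 + 0.5 * t - 4.0 - 1.0 * (t - 12))

result = segmented_fit(coverage, break_idx=12)
print({name: round(value, 3) for name, value in result.items()})
```

Here `level_change` captures the immediate shift at the break and `slope_change` captures how the monthly trend differs afterwards; with real data both would carry uncertainty that this sketch does not estimate.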

**•	Aid–Outcome Relationships**

How have changes in U.S. funding levels historically correlated with HIV testing rates, ART coverage, and AIDS-related mortality in Kenya? We will use regression models (Multiple Linear, Ridge) to estimate how much HIV outcomes change per unit drop in aid funding.
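
As a minimal sketch of the "change per unit of aid" estimate, the example below fits a multiple linear regression and a Ridge regression with scikit-learn. All numbers are synthetic, and the variable names (`aid_musd`, `domestic_musd`, `art_coverage`) are assumptions made for illustration, not columns from the project dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 40
aid_musd = rng.uniform(200, 600, n)        # hypothetical annual U.S. health aid, USD millions
domestic_musd = rng.uniform(100, 300, n)   # hypothetical domestic health spending, USD millions
# Synthetic outcome: ART coverage (%) rises 0.05 points per USD 1M of aid, plus noise
art_coverage = 40 + 0.05 * aid_musd + 0.02 * domestic_musd + rng.normal(0, 1.0, n)

X = np.column_stack([aid_musd, domestic_musd])

# Multiple linear regression: coefficients are already in "points per USD 1M"
ols = LinearRegression().fit(X, art_coverage)
ols_per_musd = float(ols.coef_[0])

# Ridge is scale-sensitive, so standardize first, then convert the coefficient back
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, art_coverage)
scaler = ridge.named_steps["standardscaler"]
ridge_per_musd = float(ridge.named_steps["ridge"].coef_[0] / scaler.scale_[0])

print("OLS:   coverage points per USD 1M of aid =", round(ols_per_musd, 4))
print("Ridge: coverage points per USD 1M of aid =", round(ridge_per_musd, 4))
```

Ridge shrinks coefficients toward zero, which helps when funding variables are strongly correlated with each other; the penalty `alpha=1.0` is a placeholder that would normally be tuned by cross-validation.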

**•	County-Level Vulnerability**

Which counties or regions in Kenya are most dependent on donor funding, and therefore most vulnerable when aid is suspended? We will use clustering (K-Means) to group counties based on aid dependency, workforce reliance, and health outcome sensitivity.
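
The grouping step could look roughly like the sketch below. The county labels are placeholders and the indicator values are made up; the three columns merely stand in for the aid dependency, workforce reliance, and outcome sensitivity dimensions named above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder counties with made-up indicator values (not real data)
counties = ["County A", "County B", "County C", "County D", "County E", "County F"]
features = np.array([
    # donor share of HIV budget, donor-paid share of health workers, HIV prevalence (%)
    [0.85, 0.60, 15.0],
    [0.80, 0.55, 14.0],
    [0.82, 0.58, 16.0],
    [0.30, 0.15, 3.0],
    [0.25, 0.10, 2.5],
    [0.35, 0.20, 4.0],
])

# K-Means uses Euclidean distance, so put all indicators on a comparable scale first
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for county, label in zip(counties, labels):
    print(county, "-> cluster", int(label))
```

The cluster numbers themselves are arbitrary; which cluster is the "high vulnerability" group has to be read off the cluster means. With real data, the number of clusters would need to be chosen, for example with an elbow plot or silhouette scores.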

**•	Future Scenario Forecasting**

If foreign aid cuts persist or deepen:
1. How many new HIV infections could occur in Kenya (2025–2029)?
2. How many additional new infections per day would that be compared to current trends?
3. How many AIDS-related deaths might result?
4. How many new child infections, child deaths, and orphans could emerge?

We will use predictive modeling (Random Forest, Gradient Boosting) to simulate Kenya’s future infection and death counts under different funding scenarios.
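
A scenario comparison of this kind might be sketched as follows. This example assumes the regressor variants of Random Forest and Gradient Boosting, since infection counts are numeric targets; the training data is synthetic and the scenario values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(42)
n = 200
funding = rng.uniform(100, 600, n)    # hypothetical annual aid, USD millions
art_cov = rng.uniform(50, 95, n)      # hypothetical ART coverage, %
# Synthetic target: new infections fall as funding and coverage rise
infections = 60000 - 40 * funding - 300 * art_cov + rng.normal(0, 500, n)

X = np.column_stack([funding, art_cov])
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0).fit(X, infections),
    "gradient_boosting": GradientBoostingRegressor(random_state=0).fit(X, infections),
}

# Two funding scenarios at the same ART coverage, both inside the training range
scenarios = {"baseline": [500.0, 80.0], "aid_cut": [200.0, 80.0]}

extra_by_model = {}
for name, model in models.items():
    preds = {s: float(model.predict(np.array([x]))[0]) for s, x in scenarios.items()}
    extra_by_model[name] = preds["aid_cut"] - preds["baseline"]
    print(f"{name}: extra infections under the cut = {extra_by_model[name]:,.0f}")
```

Tree-based models cannot extrapolate beyond the range of values seen in training, so scenarios with funding far below any historical level would need a different approach or explicit caveats.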

**SUCCESS METRICS**

Success will be defined through a mix of technical, analytical, and policy outcomes:
1.	**Model Accuracy:** Achieve R² ≥ 0.8 on held-out data when forecasting HIV infections, deaths, and ART coverage under various funding scenarios.
2.	**Data Quality:** Build a clean, verified, and reproducible dataset integrating aid, workforce, and HIV outcome data.
3.	**Insight Clarity:** Produce analyses that clearly demonstrate relationships between donor funding changes and health outcomes.
4.	**Policy Relevance:** Deliver actionable recommendations for the Ministry of Health, donors, and county health systems.
5.	**Scalability:** Ensure the framework is modular and reusable, allowing integration of new data sources such as PEPFAR, World Bank, and Kenya Health Data Portal datasets.
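
For the model accuracy target, R² can be computed directly from its definition, one minus the ratio of residual to total sum of squares. The sketch below uses made-up values; `r_squared` is a hypothetical helper, and in practice the score should be computed on data the model did not see during training.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up held-out observations and model predictions
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 16.0]

score = r_squared(y_true, y_pred)
meets_target = score >= 0.8
print(round(score, 3), meets_target)
```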

**KEY STAKEHOLDERS**

1. **Kenya Ministry of Health (MOH)** – For strategic planning, resource allocation, and health workforce deployment.
2. **PEPFAR, USAID, and Global Fund** – For evaluating funding effectiveness and sustainability.
3. **County Governments** – For identifying vulnerable regions and planning localized responses.
4. **Local NGOs and Civil Society** – For evidence-based advocacy and program continuity.
5. **Data Scientists and Researchers** – For advancing models that link foreign aid dynamics to public health outcomes.

**RELEVANCE TO KENYA**

This project is vital for Kenya’s public health resilience and policy planning. By quantifying how fluctuations in donor aid influence HIV outcomes and healthcare workforce stability, the analysis will help policymakers design sustainable, data-driven funding frameworks. The findings will inform strategies to maintain critical health services, reduce dependency on external aid, and safeguard Kenya’s progress toward ending the HIV epidemic.


## DATA UNDERSTANDING

In [1]:
# ======= [Import all relevant libraries] =======

# Utilities
import warnings
warnings.filterwarnings('ignore')

# Usual Suspects
import numpy as np           # Mathematical operations
import pandas as pd          # Data manipulation

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS          # Word Cloud
from itertools import cycle

# Matplotlib 3.6 renamed the seaborn styles to 'seaborn-v0_8-*'; older releases
# only ship 'seaborn-whitegrid'. Use whichever name this installation provides.
for _style in ('seaborn-v0_8-whitegrid', 'seaborn-whitegrid'):
    if _style in plt.style.available:
        plt.style.use(_style)
        break

# String manipulation
import re

# Counting items
from collections import Counter

# NLP
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt_tab')

from nltk.tokenize import RegexpTokenizer           # Tokenization
from nltk.corpus import stopwords                   # Stopwords
from nltk.stem import WordNetLemmatizer             # Lemmatization
from nltk import ngrams                             # N-Grams Analysis
from symspellpy import SymSpell, Verbosity

# Vectorization
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Pipelines
from sklearn.pipeline import Pipeline
from imblearn.pipeline import Pipeline as ImbPipeline

# ML
from sklearn.preprocessing import LabelEncoder, label_binarize, StandardScaler          # Encoding and scaling
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.decomposition import TruncatedSVD                                          # Dimensionality reduction
from sklearn.naive_bayes import MultinomialNB                                           # Naive Bayes
from sklearn.linear_model import LogisticRegression                                     # Logistic Regression
from sklearn.tree import DecisionTreeClassifier, plot_tree                              # Decision Tree
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

# ML Model Evaluation
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    ConfusionMatrixDisplay, confusion_matrix,
    roc_curve, auc, roc_auc_score,
    classification_report
)

# Handle class imbalance
from imblearn.over_sampling import SMOTE

# Model interpretability
from lime import lime_tabular

# Set column display to maximum
pd.set_option('display.max_colwidth', None)

# Additional packages (uncomment to install if missing)
# !pip install lime
# !pip install symspellpy