## **AUTHORS**

1. Pauline Kariuki
2. Norman Mwapea
3. Angela Chesire
4. Carlton Ogolla
5. Emmanuel Chol

## 1.BUSINESS UNDERSTANDING


### **1.1 OVERVIEW**

The United States has long been a cornerstone of Kenya’s public health funding, providing about $2.5 billion between 2020 and 2025, with 80% supporting health programs through USAID and PEPFAR. In January 2025, an executive order paused U.S. foreign aid, followed by sweeping budget cuts in July 2025, abruptly freezing most U.S.-funded health programs in Kenya. The impact was immediate: ART clinics closed, community HIV programs were halted, and essential prevention efforts like PrEP distribution and DREAMS support for adolescent girls were suspended. Globally, modeling suggests that sustained aid cuts could cause 10.8 million additional HIV infections and 2.9 million deaths by 2030. Regional data from Mozambique show a 15% rise in new infections and a 10% rise in HIV-related deaths following similar disruptions. The UNAIDS models warn of millions of new infections and deaths if donor funding stops. Kenya — a key PEPFAR partner — is already feeling the strain: clinic closures, staff layoffs, and reduced ART coverage.

**Our goal is to quantify Kenya’s specific impact: How much of this projected global rise in infections and deaths could occur here if aid cuts persist — and to build predictive models that help policymakers act early.**

### **1.2 BACKGROUND**

Kenya’s success in controlling HIV has been closely tied to external funding, especially through PEPFAR and USAID, which have financed ART programs, health worker salaries, and community prevention initiatives. The 2025 suspension of U.S. aid exposed the country’s heavy reliance on donor support, triggering job losses, service interruptions, and data system breakdowns.
Programs such as DREAMS, which helped keep 66,000 girls HIV-free, were paused, while ART clinics and community outreach services faced closure. These disruptions underscore a broader question of sustainability and resilience in Kenya’s health system. Understanding how changes in foreign aid affect HIV outcomes and the healthcare workforce is vital for developing adaptive, evidence-based funding strategies that can protect future public health gains.

### **1.3 KEY OBJECTIVES - Quantifying Kenya’s share of the global HIV impact**

**•	Kenya’s Projected Impact**

If global modeling predicts millions of new infections and deaths, what proportion of this burden might occur in Kenya? We will use a Time-Series Analysis – to track Kenya’s HIV trends (testing, ART coverage, mortality) before and after funding shifts.

**•	Aid–Outcome Relationships**

How have changes in U.S. funding levels historically correlated with:HIV testing rates, ART coverage, AIDS-related mortality in Kenya? We will use regression models (Multiple Linear, Ridge) – to estimate how much HIV outcomes change per unit drop in aid funding.

**•	County-Level Vulnerability**

Which counties or regions in Kenya are most dependent on donor funding, and therefore most vulnerable when aid is suspended? We will use clustering (K-Means) – to group counties based on aid dependency, workforce reliance, and health outcome sensitivity.

**•	Future Scenario Forecasting**

If foreign aid cuts persist or deepen:
1. How many new HIV infections could occur in Kenya (2025–2029)?
2. How many additional new infections per day compared to current trends?
3. How many AIDS-related deaths might result?
4. How many new child infections, child deaths, and orphans could emerge?
We will use predictive modeling (Random Forest, Gradient Boosting) – to simulate Kenya’s future infection and death counts under different funding scenarios.

### **1.4 SUCCESS METRICS**

Success will be defined through a mix of technical, analytical, and policy outcomes:
1.	**Model Accuracy:** Achieve ≥80% predictive accuracy (R² ≥ 0.8) in forecasting HIV infections, deaths, and ART coverage under various funding scenarios.
2.	**Data Quality:** Build a clean, verified, and reproducible dataset integrating aid, workforce, and HIV outcome data.
3.	**Insight Clarity:** Produce analyses that clearly demonstrate relationships between donor funding changes and health outcomes.
4.	**Policy Relevance:** Deliver actionable recommendations for the Ministry of Health, donors, and county health systems.
5.	**Scalability:** Ensure the framework is modular and reusable, allowing integration of new data sources such as PEPFAR, World Bank, and Kenya Health Data Portal datasets.

### **1.5 KEY STAKEHOLDERS**

1. **Kenya Ministry of Health (MOH)** – For strategic planning, resource allocation, and health workforce deployment.
2. **PEPFAR, USAID, and Global Fund** – For evaluating funding effectiveness and sustainability.
3. **County Governments** – For identifying vulnerable regions and planning localized responses.
4. **Local NGOs and Civil Society** – For evidence-based advocacy and program continuity.
5. **Data Scientists and Researchers** – For advancing models that link foreign aid dynamics to public health outcomes.

### **1.6 RELEVANCE TO KENYA**

This project is vital for Kenya’s public health resilience and policy planning. By quantifying how fluctuations in donor aid influence HIV outcomes and healthcare workforce stability, the analysis will help policymakers design sustainable, data-driven funding frameworks. The findings will inform strategies to maintain critical health services, reduce dependency on external aid, and safeguard Kenya’s progress toward ending the HIV epidemic.


## 2. DATA UNDERSTANDING

In [316]:
# ------- [Import all relevant libraries] -------

# Utilities
import warnings
warnings.filterwarnings('ignore')

# Usual Suspects
import numpy as np           # Mathematical operations
import pandas as pd          # Data manipulation

# Visualization
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')
import seaborn as sns

# String manipulation
import re

# Counting items
from collections import Counter

# Pipelines
from sklearn.pipeline import Pipeline
from imblearn.pipeline import Pipeline as ImbPipeline

# ML
from sklearn.preprocessing import LabelEncoder, label_binarize , StandardScaler         # Encoding and scaling
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.decomposition import TruncatedSVD                                          # Dimensionality reduction
from sklearn.naive_bayes import MultinomialNB                                           # Naive Bayes
from sklearn.linear_model import LogisticRegression                                     # Logistic Regression
from sklearn.tree import DecisionTreeClassifier, plot_tree                              # Decision Tree
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

# ML Model Evaluation
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, 
    ConfusionMatrixDisplay, confusion_matrix,
    roc_curve, auc, roc_auc_score,
    classification_report
)

# Handle class imbalance
from imblearn.over_sampling import SMOTE

# Model interpretability
from lime import lime_tabular

# Set column display to maximum
pd.set_option('display.max_colwidth', None)

In [317]:
# Loading the data set into a data frame
df = pd.read_csv("../Raw Data/usaid_kenya.csv")

# Displaying first 5 rows of the data
df

Unnamed: 0,Country ID,Country Code,Country Name,Region ID,Region Name,Income Group ID,Income Group Name,Income Group Acronym,Managing Agency ID,Managing Agency Acronym,...,Transaction Type ID,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_id,aid_type_name,activity_budget_amount,submission_activity_id
0,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,1,USAID,...,2,Obligations,2005,30SEP2005,28000,42057,8,Project-type interventions - not Investment Related,.,26757
1,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,1,USAID,...,3,Disbursements,2005,30SEP2005,17875,26849,8,Project-type interventions - not Investment Related,.,26757
2,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,1,USAID,...,3,Disbursements,2006,01FEB2006,3469,5047,8,Project-type interventions - not Investment Related,.,26757
3,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,1,USAID,...,3,Disbursements,2006,01APR2006,1138,1655,8,Project-type interventions - not Investment Related,.,26757
4,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,1,USAID,...,3,Disbursements,2006,01MAY2006,394,573,8,Project-type interventions - not Investment Related,.,26757
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80067,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,29,EPA,...,2,Obligations,2024,19AUG2024,64250,62579,13,Technical Cooperation - Other,614819,1121
80068,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,38,DFC,...,2,Obligations,2020,25SEP2020,1146864,1327370,13,Technical Cooperation - Other,1146864,276
80069,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,38,DFC,...,2,Obligations,2021,28SEP2021,282104,315641,13,Technical Cooperation - Other,282104,267
80070,404,KEN,Kenya,5,Sub-Saharan Africa,2.0,Lower Middle Income Country,LMIC,38,DFC,...,2,Obligations,2024,02JAN2024,500000,486994,13,Technical Cooperation - Other,500000,235


##### *Observation:* From top to bottom, the data is uniform.

Our goal is to check how foreign aid cuts will affect Kenya from the time of this analysis - 2025, to the future. As such, we will check how recent and old our data is.

In [318]:
# Check for the shape of our data
print(f"The data has {df.shape[0]} entries and {df.shape[1]} features")

The data has 80072 entries and 56 features


In [319]:
# Check column names 
df.columns

Index(['Country ID', 'Country Code', 'Country Name', 'Region ID',
       'Region Name', 'Income Group ID', 'Income Group Name',
       'Income Group Acronym', 'Managing Agency ID', 'Managing Agency Acronym',
       'Managing Agency Name', 'Managing Sub-agency or Bureau ID',
       'Managing Sub-agency or Bureau Acronym',
       'Managing Sub-agency or Bureau Name',
       'Implementing Partner Category ID',
       'Implementing Partner Category Name',
       'Implementing Partner Sub-category ID',
       'Implementing Partner Sub-category Name', 'Implementing Partner ID',
       'Implementing Partner Name', 'International Category ID',
       'International Category Name', 'International Sector Code',
       'International Sector Name', 'International Purpose Code',
       'International Purpose Name', 'US Category ID', 'US Category Name',
       'US Sector ID', 'US Sector Name', 'Funding Account ID',
       'Funding Account Name', 'Funding Agency ID', 'Funding Agency Name',
       'Fu

##### *Observation:* The column names as inconsistent. They contain a mix of snake and camel case. There is need for standardization.

In [320]:
# Standardize column names so they are more intuitive
df.columns = (
    df.columns
    .str.strip()                     # remove leading/trailing spaces
    .str.lower()                     # make all lowercase
    .str.replace(' ', '_')           # replace spaces with underscores
    .str.replace('[^0-9a-zA-Z_]', '', regex=True)  # remove special characters
)

df.columns

Index(['country_id', 'country_code', 'country_name', 'region_id',
       'region_name', 'income_group_id', 'income_group_name',
       'income_group_acronym', 'managing_agency_id', 'managing_agency_acronym',
       'managing_agency_name', 'managing_subagency_or_bureau_id',
       'managing_subagency_or_bureau_acronym',
       'managing_subagency_or_bureau_name', 'implementing_partner_category_id',
       'implementing_partner_category_name',
       'implementing_partner_subcategory_id',
       'implementing_partner_subcategory_name', 'implementing_partner_id',
       'implementing_partner_name', 'international_category_id',
       'international_category_name', 'international_sector_code',
       'international_sector_name', 'international_purpose_code',
       'international_purpose_name', 'us_category_id', 'us_category_name',
       'us_sector_id', 'us_sector_name', 'funding_account_id',
       'funding_account_name', 'funding_agency_id', 'funding_agency_name',
       'funding_agency

In [321]:
# Checking for the data types and metadata
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80072 entries, 0 to 80071
Data columns (total 56 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   country_id                             80072 non-null  int64  
 1   country_code                           80072 non-null  object 
 2   country_name                           80072 non-null  object 
 3   region_id                              80072 non-null  int64  
 4   region_name                            80072 non-null  object 
 5   income_group_id                        80072 non-null  float64
 6   income_group_name                      80072 non-null  object 
 7   income_group_acronym                   80072 non-null  object 
 8   managing_agency_id                     80072 non-null  int64  
 9   managing_agency_acronym                80072 non-null  object 
 10  managing_agency_name                   80072 non-null  object 
 11  ma

##### *Observations:* 
1. The data is mostly categorical.  
2. It has some notable missing values.
3. Some columns will need typecasting such as:
    - fiscal_year to integer
    - transaction_date, activity_start_date, activity_end_date to datetime
    - activity_budget_amount to float
    - current_dollar_amount, constant_dollar_amount to float
    - international_sector_code, international_purpose_code to string
    - activity_project_number to string
4. There exist a number of unique identifiers. These offer little analytical value and will thus be dropped.

In [322]:
# ------- [Type Casting] -------
# Convert columns to their correct data types for analysis

# Fiscal year -> convert from object to integer
df['fiscal_year'] = pd.to_numeric(df['fiscal_year'], errors='coerce').astype('Int64')

# Date columns -> convert to datetime format
date_cols = ['transaction_date', 'activity_start_date', 'activity_end_date']
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Numeric conversion for budget amount
df['activity_budget_amount'] = pd.to_numeric(df['activity_budget_amount'], errors='coerce')

# Financial columns -> cast to float for consistency in modeling
df['current_dollar_amount'] = df['current_dollar_amount'].astype(float)
df['constant_dollar_amount'] = df['constant_dollar_amount'].astype(float)

In [323]:
# Summary statistics for numeric columns
df.describe().T

Unnamed: 0,count,mean,min,25%,50%,75%,max,std
country_id,80072.0,404.0,404.0,404.0,404.0,404.0,404.0,0.0
region_id,80072.0,5.0,5.0,5.0,5.0,5.0,5.0,0.0
income_group_id,80072.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0
managing_agency_id,80072.0,2.106417,1.0,1.0,1.0,1.0,38.0,3.373374
managing_subagency_or_bureau_id,80072.0,63.601009,2.0,19.0,19.0,21.0,999.0,181.821798
implementing_partner_category_id,80072.0,4.949258,1.0,3.0,4.0,8.0,8.0,2.931494
implementing_partner_subcategory_id,80072.0,11.125662,1.0,5.0,8.0,19.0,20.0,7.817254
implementing_partner_id,80072.0,3449361.432848,1000001.0,3990008.0,4000011.0,4001154.0,4021430.0,1144963.638087
international_category_id,80072.0,5.056574,1.0,2.0,3.0,9.0,10.0,3.282771
international_sector_code,80072.0,373.995342,111.0,134.0,152.0,720.0,998.0,306.316465


In [324]:
# Checking statistical summary for categorical variables
df.describe(include='object').T

Unnamed: 0,count,unique,top,freq
country_code,80072,1,KEN,80072
country_name,80072,1,Kenya,80072
region_name,80072,1,Sub-Saharan Africa,80072
income_group_name,80072,1,Lower Middle Income Country,80072
income_group_acronym,80072,1,LMIC,80072
managing_agency_acronym,80072,20,USAID,63735
managing_agency_name,80072,20,U.S. Agency for International Development,63735
managing_subagency_or_bureau_acronym,76894,63,AFR,53601
managing_subagency_or_bureau_name,80072,67,Bureau for Africa,52879
implementing_partner_category_name,80072,8,Enterprises,34906


In [325]:
# Print unique value counts for each column
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")

country_id: 1 unique values
country_code: 1 unique values
country_name: 1 unique values
region_id: 1 unique values
region_name: 1 unique values
income_group_id: 1 unique values
income_group_name: 1 unique values
income_group_acronym: 1 unique values
managing_agency_id: 20 unique values
managing_agency_acronym: 20 unique values
managing_agency_name: 20 unique values
managing_subagency_or_bureau_id: 67 unique values
managing_subagency_or_bureau_acronym: 63 unique values
managing_subagency_or_bureau_name: 67 unique values
implementing_partner_category_id: 8 unique values
implementing_partner_category_name: 8 unique values
implementing_partner_subcategory_id: 17 unique values
implementing_partner_subcategory_name: 17 unique values
implementing_partner_id: 888 unique values
implementing_partner_name: 888 unique values
international_category_id: 10 unique values
international_category_name: 10 unique values
international_sector_code: 30 unique values
international_sector_name: 30 unique valu

In [326]:
# Print the unique values themselves
for col in df.columns:
    print(f"\n{col}: {df[col].unique()}")


country_id: [404]

country_code: ['KEN']

country_name: ['Kenya']

region_id: [5]

region_name: ['Sub-Saharan Africa']

income_group_id: [2.]

income_group_name: ['Lower Middle Income Country']

income_group_acronym: ['LMIC']

managing_agency_id: [ 1  2  3  4  5  6  7  9 10 11 12 13 14 15 16 17 19 20 29 38]

managing_agency_acronym: ['USAID' 'STATE' 'MCC' 'TREAS' 'AGR' 'HHS' 'DOD' 'DOI' 'DOJ' 'DOL' 'DOC'
 'DOE' 'DHS' 'DOT' 'PC' 'TDA' 'ADF' 'FTC' 'EPA' 'DFC']

managing_agency_name: ['U.S. Agency for International Development' 'Department of State'
 'Millennium Challenge Corporation' 'Department of the Treasury'
 'Department of Agriculture' 'Department of Health and Human Services'
 'Department of Defense' 'Department of the Interior'
 'Department of Justice' 'Department of Labor' 'Department of Commerce'
 'Department of Energy' 'Department of Homeland Security'
 'Department of Transportation' 'Peace Corps'
 'Trade and Development Agency' 'African Development Foundation'
 'Federal Trade

## 3. DATA PREPARATION 

Data preparation will majorly entail two parts:
1. Data wrangling which will entail checking for and removing missing values and duplicates and
2. Feature engineering.

Before that, however, we will check how old and recent our data is and filter to only have data from the present (2025) to 15 years in the past.

In [327]:
# Check the data recency
fiscal_min, fiscal_max = df['fiscal_year'].agg(['min', 'max'])
print(f"Data covers fiscal years from {fiscal_min} to {fiscal_max}.")

Data covers fiscal years from 1954 to 2025.


We have 71 years worth of data! We don't as much so we will first create a copy of the data to avoid modifying the original then filter to have our target data.

In [328]:
# Create a copy of the data
data = df.copy(deep=True)

# Filter the data to include only years from 2010 to 2025
data = data[(data['fiscal_year'] >= 2010) & (data['fiscal_year'] <= 2025)]

# Sanity check
fiscal_min, fiscal_max = data['fiscal_year'].agg(['min', 'max'])
print(f"Filtered data covers fiscal years from {fiscal_min} to {fiscal_max}.")

Filtered data covers fiscal years from 2010 to 2025.


Perfect. Now we begin the tedious work that is cleaning.

### 3.1 Data Cleaning

#### 3.1.1 Handling Duplicate Values

We will start by confirming the number of duplicates.

In [329]:
# Checking for duplicate values
print("Duplicate records:", data.duplicated().sum())

Duplicate records: 6


There are 6 duplicates. We will drop them and preview our changes.

In [330]:
# Drop duplicates
data.drop_duplicates(inplace=True)

# Sanity check
print("Duplicates after cleaning:", data.duplicated().sum())

Duplicates after cleaning: 0


#### 3.1.2 Handling Missing Values

In [331]:
# Checking for missing values
data.isna().sum()

country_id                                   0
country_code                                 0
country_name                                 0
region_id                                    0
region_name                                  0
income_group_id                              0
income_group_name                            0
income_group_acronym                         0
managing_agency_id                           0
managing_agency_acronym                      0
managing_agency_name                         0
managing_subagency_or_bureau_id              0
managing_subagency_or_bureau_acronym      2434
managing_subagency_or_bureau_name            0
implementing_partner_category_id             0
implementing_partner_category_name           0
implementing_partner_subcategory_id          0
implementing_partner_subcategory_name        0
implementing_partner_id                      0
implementing_partner_name                    0
international_category_id                    0
international

In [332]:
# Drop unique identifier columns
data = data.loc[:, ~data.columns.str.contains('id', case=False)]

# Drop columns that offer little analytical value
redundant_cols = [
    'country_code', 'region_name', 'income_group_name', 'income_group_acronym',
    'international_sector_code', 'international_purpose_code',
    'activity_project_number', 'activity_name', 'activity_description',
    'funding_account_name', 'managing_agency_acronym', 'funding_agency_acronym',
    'international_sector_name', 'international_purpose_name', 
    'implementing_partner_subcategory_name', 'international_category_name',
]

data.drop(columns=redundant_cols, inplace=True)

Since we already have the managing subagency or bureau name, the managing subagency or bureau acronym is redundant. We will drop this column.

In [333]:
# Drop managing_subagency_or_bureau_acronym
data.drop(columns='managing_subagency_or_bureau_acronym',inplace=True)

In [334]:
data.isna().sum()

country_name                              0
managing_agency_name                      0
managing_subagency_or_bureau_name         0
implementing_partner_category_name        0
implementing_partner_name                 0
us_category_name                          0
us_sector_name                            0
funding_agency_name                       0
foreign_assistance_objective_name         0
activity_start_date                   48808
activity_end_date                     38114
transaction_type_name                     0
fiscal_year                               0
transaction_date                       2044
current_dollar_amount                     0
constant_dollar_amount                    0
activity_budget_amount                40390
dtype: int64

From our Data Understanding we found out that the count for highest unique value was a (.), we will fill this with Nan.

In [335]:
# Fill (.) in activity_budget_amount with null
data['activity_budget_amount'] = data['activity_budget_amount'].replace('.', np.nan)

# Sanity check
print("Nulls in activity_budget_amount:", data['activity_budget_amount'].isna().sum())

Nulls in activity_budget_amount: 40390


Dealing with missing values in activity_budget_amount.

In [99]:
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# def hybrid_budget_imputation(data):
#     data = data.copy()
    
#     # Step 1: Group Median Imputation
#     group_cols = ['funding_agency_name', 'us_sector_name']
#     df['activity_budget_amount'] = (
#         df.groupby(group_cols)['activity_budget_amount']
#           .transform(lambda x: x.fillna(x.median()))
#     )

#     # Identify remaining missing values
#     remaining_missing = df['activity_budget_amount'].isna().sum()
#     print(f"Missing after group imputation: {remaining_missing}")

# data = hybrid_budget_imputation(data)

def hybrid_budget_imputation(data):
    data = data.copy()
    
    # Step 1: Group Median Imputation
    group_cols = ['funding_agency_name', 'us_sector_name']
    data['activity_budget_amount'] = (
        data.groupby(group_cols)['activity_budget_amount']
            .transform(lambda x: x.fillna(x.median()))
    )

    # Identify remaining missing values
    remaining_missing = data['activity_budget_amount'].isna().sum()
    print(f"Missing after group imputation: {remaining_missing}")

    return data

data = hybrid_budget_imputation(data)

Missing after group imputation: 6964


In [None]:
    # Step 2: Prepare data for KNN Imputation
    cat_cols = ['funding_agency_name', 'us_sector_name', 
                'foreign_assistance_objective_name', 'implementing_partner_category_name']
    num_cols = ['activity_budget_amount', 'current_dollar_amount', 
                'constant_dollar_amount', 'fiscal_year']

    # Encode categorical columns
    encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
    df[cat_cols] = encoder.fit_transform(df[cat_cols])

    # Apply KNN Imputer
    imputer = KNNImputer(n_neighbors=5)
    df[num_cols] = imputer.fit_transform(df[num_cols])

    print("✅ KNN imputation completed.")
    
    return df, imputer, encoder


def evaluate_knn_imputer(df, imputer, encoder, mask_fraction=0.1):
    """
    Randomly mask a fraction of known values and compare KNN-imputed estimates
    to true values using Mean Absolute Error (MAE).
    """
    df = df.copy()

    # Prepare encoded + numeric data for evaluation
    cat_cols = ['funding_agency_name', 'us_sector_name', 
                'foreign_assistance_objective_name', 'implementing_partner_category_name']
    num_cols = ['activity_budget_amount', 'current_dollar_amount', 
                'constant_dollar_amount', 'fiscal_year']

    df[cat_cols] = encoder.transform(df[cat_cols])
    X = df[num_cols].values

    # Mask a fraction of activity_budget_amount values
    rng = np.random.default_rng(42)
    mask = rng.choice([True, False], size=len(X), p=[mask_fraction, 1 - mask_fraction])

    X_masked = X.copy()
    X_masked[mask, 0] = np.nan  # mask only the target column

    # Impute
    X_imputed = imputer.transform(X_masked)

    # Compute MAE for imputed values
    mae = mean_absolute_error(X[mask, 0], X_imputed[mask, 0])
    print(f"📊 KNN Imputer MAE on simulated missing values: {mae:.2f}")

    return mae

The number of missing values is too high. Imputation strategies such as mean or median will introduce bias and we will data integrity. 

##### From our above prompt to check for missing values, we also noticed that there were columns that also had too many missing values 

##### They included:'Activity Start Date', 'Activity End Date','Activity Project Number'. These columns had more than half of their data missing. So we will drop this colummns.

In [15]:
cols_to_drop = ['Activity Start Date', 'Activity End Date', 'Activity Project Number','activity_budget_amount']
data.drop(columns = cols_to_drop, inplace = True)

##### With this columns dropped, now we can deal with other columns that had missing values but were too little to make us drop them and or may be too important to drop.

##### For this, we have the column 'Transaction Date'.It has 5474 rows missing but contains dates which may be critical when conducting our Time Series Model. For our data we will fill these date with the Fiscal year but we will keep in mind that the dates are not accurate.

In [17]:
# Clean the Fiscal Year column first
data['Fiscal Year'] = data['Fiscal Year'].astype(str).str.extract(r'(\d{4})', expand=False)

# Coverting Transaction Date to datetime
data['Transaction Date'] = pd.to_datetime(data['Transaction Date'], errors='coerce')

# Appending Fiscal Year with 01-01
data['Fiscal Year'] = pd.to_datetime(data['Fiscal Year'] + '-01-01', errors='coerce')

# Filling missing Transaction Date  missing values
data['Transaction Date'] = data['Transaction Date'].fillna(data['Fiscal Year'])

In [18]:
# Checking if there are any missing values left in Transaction Date
data['Transaction Date'].isna().sum()

0

In [19]:
# Converting back to the year
data['Fiscal Year'] = pd.to_datetime(data['Fiscal Year'])
data['Fiscal Year']=data['Fiscal Year'].dt.year

In [20]:
# Checking for missing values
data.isna().sum()

Country ID                                0
Country Code                              0
Country Name                              0
Region ID                                 0
Region Name                               0
Income Group ID                           0
Income Group Name                         0
Income Group Acronym                      0
Managing Agency ID                        0
Managing Agency Acronym                   0
Managing Agency Name                      0
Managing Sub-agency or Bureau ID          0
Managing Sub-agency or Bureau Name        0
Implementing Partner Category ID          0
Implementing Partner Category Name        0
Implementing Partner Sub-category ID      0
Implementing Partner Sub-category Name    0
Implementing Partner ID                   0
Implementing Partner Name                 0
International Category ID                 0
International Category Name               0
International Sector Code                 0
International Sector Name       

##### We may also remove columns that may be redundant for our analysis. These are columns that give no information differentiating other columns. A starting point may be columns with only one unique value for example we may not need a column for 'Country Name' since our data contains information about Kenya only.

##### We will also get rid of columns that may contain similar information like Managing Agency Name and Managing Sub-agency or Bureau Name

In [24]:
# Save the file as a csv document
data1 = pd.DataFrame(data)
data1.to_csv('cleaned_data',index=False)

### 3.2 Feature Engineering

#### We could convert the Current Dollar Amount into Categorical data which may be simpler to use during modelling

In [25]:
data['Transaction_Size']=pd.qcut(data['Current Dollar Amount'],\
                                 q=3, labels=['Low','Medium','High'] )

In [26]:
data.head()

Unnamed: 0,Managing Agency Name,Implementing Partner Category Name,Implementing Partner Name,US Category Name,US Sector Name,Funding Agency Name,Foreign Assistance Objective Name,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size
0,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Basic Education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium
1,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Basic Education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium
2,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Basic Education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low
3,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Basic Education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low
4,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Basic Education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low


In [27]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 80066 entries, 0 to 80071
Data columns (total 18 columns):
 #   Column                              Non-Null Count  Dtype         
---  ------                              --------------  -----         
 0   Managing Agency Name                80066 non-null  object        
 1   Implementing Partner Category Name  80066 non-null  object        
 2   Implementing Partner Name           80066 non-null  object        
 3   US Category Name                    80066 non-null  object        
 4   US Sector Name                      80066 non-null  object        
 5   Funding Agency Name                 80066 non-null  object        
 6   Foreign Assistance Objective Name   80066 non-null  object        
 7   Aid Type Group Name                 80066 non-null  object        
 8   Activity ID                         80066 non-null  int64         
 9   Activity Name                       80066 non-null  object        
 10  Activity Description       

In [28]:
data['Foreign Assistance Objective Name'].unique()

array(['Economic', 'Military'], dtype=object)

##### We could change the name of columns like 'Foreign Assistance Objective Name' to objective in short for it to be simpler to deal with.

In [29]:
# Renaming the column
data = data.rename(columns={'Foreign Assistance Objective Name': 'Objective'})

In [30]:
data['US Sector Name'].unique()

array(['Basic Education', 'Other Public Health Threats',
       'Pandemic Influenza and Other Emerging Threats (PIOET)',
       'Direct Administrative Costs', 'Malaria',
       'Maternal and Child Health',
       'Family Planning and Reproductive Health', 'HIV/AIDS',
       'Water Supply and Sanitation', 'Good Governance', 'Civil Society',
       'Rule of Law and Human Rights',
       'Political Competition and Consensus-Building',
       'Counter-Terrorism', 'Conflict Mitigation and Reconciliation',
       'Policies, Regulations, and Systems', 'Social Assistance',
       'Financial Sector', 'Economic Opportunity',
       'Private Sector Competitiveness', 'Trade and Investment',
       'Agriculture', 'Natural Resources and Biodiversity',
       'Clean Productive Environment',
       'Protection, Assistance and Solutions',
       'Monitoring and Evaluation', 'Macroeconomic Foundation for Growth',
       'Tuberculosis', 'Nutrition', 'Higher Education',
       'Disaster Readiness', 'Socia

In [31]:
data['US Sector Name'] = data['US Sector Name'].str.strip().str.lower()
mapping1 = {
    'Health': [
        'Other Public Health Threats', 'Pandemic Influenza and Other Emerging Threats (PIOET)',
        'Malaria', 'Maternal and Child Health', 'Family Planning and Reproductive Health',
        'HIV/AIDS', 'Social Assistance' , 'Water Supply and Sanitation', 'Tuberculosis', 'Nutrition', 'Health - General'
    ],
    'Education': [
        'Basic Education', 'Higher Education', 'Education and Social Services - General'
    ],
    'Security': [
        'Counter-Terrorism', 'Conflict Mitigation and Reconciliation', 'Transnational Crime',
        'Stabilization Operations and Security Sector Reform', 'Peace and Security - General',
        'Counter-Narcotics', 'Combating Weapons of Mass Destruction (WMD)'
    ],
    'Politics': [
        'Good Governance', 'Civil Society', 'Political Competition and Consensus-Building',
        'Democracy, Human Rights, and Governance - General'
    ],
    'Human Rights': [
        'Rule of Law and Human Rights', 'Protection, Assistance and Solutions',
        'Migration Management'
    ],
    'Environment': [
        'Natural Resources and Biodiversity', 'Clean Productive Environment',
        'Environment - General', 'Environment', 'Mining and Natural Resources'
    ],
    'Agriculture': [
        'Agriculture'
    ],
    'Economy': [
        'Economic Opportunity'
    ],
    'Development': [
        'Infrastructure'
    ]
}
# This is a code that categorises every value with the category given in the map
def assign_category(text_entry):

    # Handle empty or non-string data
    if not isinstance(text_entry, str):
        return 'Other/Unspecified'

    text_lower = text_entry.lower()

    # Iterate through the main categories and their associated phrases
    for category, phrases in mapping1.items():
        for phrase in phrases:
            # Check if any phrase is present in the text entry
            if phrase.lower() in text_lower:
                return category  # Return the high-level category and stop searching

    # If no match is found after checking all categories
    return 'Other/Unspecified'

data["US Sector"] = data["US Sector Name"].apply(assign_category)
data.head()

Unnamed: 0,Managing Agency Name,Implementing Partner Category Name,Implementing Partner Name,US Category Name,US Sector Name,Funding Agency Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector
0,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,basic education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education
1,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,basic education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education
2,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,basic education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education
3,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,basic education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education
4,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,basic education,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education


In [32]:
data.drop(columns=['US Sector Name'], inplace=True)

In [33]:
data['Managing Agency Name'].nunique()

20

In [34]:
data['Funding Agency Name'].nunique()

21

#### We can combine the columns 'Managing Agency Name' and 'Funding Agency Name' by making the entries in the combined column a statement.

In [None]:
data['Funded by and Managed by']=(data['Funding Agency Name'].astype(str).fillna('')+ \
                                  ' managed by ' + data['Managing Agency Name'].astype(str).fillna('')
                                  )

In [36]:
data.head()

Unnamed: 0,Managing Agency Name,Implementing Partner Category Name,Implementing Partner Name,US Category Name,Funding Agency Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector,Funded by and Managed by
0,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
1,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
2,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
3,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
4,U.S. Agency for International Development,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,U.S. Agency for International Development,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development


In [38]:
data.drop(columns=['Funding Agency Name','Managing Agency Name'],inplace=True)

In [39]:
data.head()

Unnamed: 0,Implementing Partner Category Name,Implementing Partner Name,US Category Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector,Funded by and Managed by
0,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
1,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
2,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
3,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development
4,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development


#### We can also combine the 'Implementing Partner Category Name' with the 'Implementing Partner Name' 

In [46]:
data['Implemented by and assisted by']=(data['Implementing Partner Name'].astype(str).fillna('') + \
                                  ' and assisted by ' + data['Implementing Partner Category Name'].astype(str).fillna('')
                                  )

In [47]:
data.head()

Unnamed: 0,Implementing Partner Category Name,Implementing Partner Name,US Category Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector,Funded by and Managed by,Implemented by and assisted by
0,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
1,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
2,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
3,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
4,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government


In [48]:
data.drop(columns=['Implementing Partner Name','Implementing Partner Category Name'])

Unnamed: 0,US Category Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector,Funded by and Managed by,Implemented by and assisted by
0,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
1,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
2,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
3,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
4,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80067,Environment,Economic,Technical Assistance,252655,"Environmental Protection Agency, Office of the Chief Financial Officer, EPA Peace Corps Partnership - Kenya","Establish a pilot project to invest in a pipeline of future EPA staff through capacity building trainings for Peace Corps Volunteers (PCVs) and local partners in support of both organizations missions through in-country trainings to PCVs and Peace Corps local partners, including Project Design and Managment Training culminating in a report of lessons learned and recommendations to further the EPA-Peace Corps strategic partnership.",Obligations,2024,2024-08-19,64250,62579,Technical Cooperation - Other,Medium,Environment,Environmental Protection Agencymanaged byEnvironmental Protection Agency,U.S. Government - Peace Corps and assisted by Government
80068,Health,Economic,Technical Assistance,297016,"U.S. International Development Finance Corporation: Technical Development project with Sanergy, Inc.","Technical assistance to support (i) feasibility study analyzing expansion of sanitation and food security company's operations into new markets and (ii) increasing impact of DFC loan to company. In most cases, grants for feasibility studies and technical assistance will be designed to increase the developmental impact or improve the commercial sustainability of a project that has received, or may receive, DFC financing or insurance support. The program complements and does not duplicate work funded by other agencies or financiers. DFC determines the technical assistance, feasibility study, or training work to be provided, and the grant recipient selects an entity with relevant expertise and experience that will perform that work. In addition, the program provides technical assistance for certain development credit activities requested by other agencies by utilizing a competitively selected pool of contractors.",Obligations,2020,2020-09-25,1146864,1327370,Technical Cooperation - Other,High,Health,U.S. International Development Finance Corporationmanaged byU.S. International Development Finance Corporation,"Sanergy, Inc. and assisted by Enterprises"
80069,Economic Development,Economic,Technical Assistance,297007,U.S. International Development Finance Corporation: Technical Development Assistance for Lending for Education in Africa Partnership (LEAP) Technical Assistance,"This Project is expected to have a highly developmental impact by expanding the financing options available to students in Kenya, many from low-income families. LEAP's target beneficiaries are highachieving students who do not possess the means to finance their own education, lack the collateral necessary to receive a commercial loan, and are unable to secure scholarships or government loans. The Projects Kenyan operations are women- owned and women-led, and LEAP aims to increase the availability of higher education financing for female students. LEAP seeks to demonstrate that investors can profit by offering affordable loans to students in Kenya and East Africa including low-income students and still generate a profit from utilizing scale",Obligations,2021,2021-09-28,282104,315641,Technical Cooperation - Other,High,Other/Unspecified,U.S. International Development Finance Corporationmanaged byU.S. International Development Finance Corporation,Lending for Education in Africa Partnership and assisted by Enterprises
80070,Economic Development,Economic,Technical Assistance,296975,Technical Development Assistance for Pezesha Africa Limited,TA to build a proprietary credit scoring model to improve Pezesha's underwriting capabilities for MSME clients.,Obligations,2024,2024-01-02,500000,486994,Technical Cooperation - Other,High,Other/Unspecified,U.S. International Development Finance Corporationmanaged byU.S. International Development Finance Corporation,Pezesha Africa Limited and assisted by Enterprises


In [None]:
new_order_cols=['Activity ID','Activity Name','Activity Description','Fiscal Year','Transaction Date',\
                'Transaction Type Name','Transaction_Size','US Category Name','US Sector','Aid Type Group Name',\
                'Objective','US Sector','Funded by and Managed by','Implemented by and assisted by'  ]

In [49]:
data.head()

Unnamed: 0,Implementing Partner Category Name,Implementing Partner Name,US Category Name,Objective,Aid Type Group Name,Activity ID,Activity Name,Activity Description,Transaction Type Name,Fiscal Year,Transaction Date,Current Dollar Amount,Constant Dollar Amount,aid_type_name,Transaction_Size,US Sector,Funded by and Managed by,Implemented by and assisted by
0,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Obligations,2005,2005-09-30,28000,42057,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
1,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2005,2005-09-30,17875,26849,Project-type interventions - not Investment Related,Medium,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
2,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-02-01,3469,5047,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
3,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-04-01,1138,1655,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
4,Government,U.S. Government - U.S. Agency for International Development,Education and Social Services,Economic,Project-Type,171397,Education Support,Education Support,Disbursements,2006,2006-05-01,394,573,Project-type interventions - not Investment Related,Low,Education,U.S. Agency for International Developmentmanaged byU.S. Agency for International Development,U.S. Government - U.S. Agency for International Development and assisted by Government
