# Task 1: Data Exploration and Enrichment

**Objective:** Understand the starter dataset and enrich it with additional observations, events, and impact links useful for forecasting financial inclusion in Ethiopia.

**Steps:**

1. Load datasets
2. Explore the schema and data
3. Identify areas for enrichment
4. Add new observations, events, and impact links
5. Save enriched dataset
6. Document enrichment in `data_enrichment_log.md`


### 1. Import Libraries

In [1]:
import pandas as pd
from pathlib import Path

### 2. Load Datasets

In [None]:
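# Join path parts with "/" so the paths work on any OS.
# In a plain string such as "data\raw", the "\r" is read as a carriage return.
data_path = Path("data") / "raw"
processed_path = Path("data") / "processed"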

data = pd.read_csv(data_path / "ethiopia_fi_unified_data.csv")
reference = pd.read_csv(data_path / "reference_codes.csv")

print("Data shape:", data.shape)
print("Reference shape:", reference.shape)
data.head()


Data shape: (43, 34)
Reference shape: (71, 4)


Unnamed: 0,record_id,record_type,category,pillar,indicator,indicator_code,indicator_direction,value_numeric,value_text,value_type,...,impact_direction,impact_magnitude,impact_estimate,lag_months,evidence_basis,comparable_country,collected_by,collection_date,original_text,notes
0,REC_0001,observation,,ACCESS,Account Ownership Rate,ACC_OWNERSHIP,higher_better,22.0,,percentage,...,,,,,,Example_Trainee,2025-01-20,,Baseline year,
1,REC_0002,observation,,ACCESS,Account Ownership Rate,ACC_OWNERSHIP,higher_better,35.0,,percentage,...,,,,,,Example_Trainee,2025-01-20,,,
2,REC_0003,observation,,ACCESS,Account Ownership Rate,ACC_OWNERSHIP,higher_better,46.0,,percentage,...,,,,,,Example_Trainee,2025-01-20,,,
3,REC_0004,observation,,ACCESS,Account Ownership Rate,ACC_OWNERSHIP,higher_better,56.0,,percentage,...,,,,,,Example_Trainee,2025-01-20,,Gender disaggregated,
4,REC_0005,observation,,ACCESS,Account Ownership Rate,ACC_OWNERSHIP,higher_better,36.0,,percentage,...,,,,,,Example_Trainee,2025-01-20,,Gender disaggregated,
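
Later cells index columns such as `record_type`, `pillar`, and `confidence` directly, so it can help to confirm they exist right after loading. The following is a minimal sketch, not part of the original notebook: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the toy frame only stands in for the loaded CSV.

```python
import pandas as pd

# Hypothetical subset of columns that the later cells rely on.
REQUIRED_COLUMNS = [
    "record_id",
    "record_type",
    "pillar",
    "indicator_code",
    "observation_date",
    "confidence",
]


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Toy frame standing in for the loaded dataset (not the real data).
toy = pd.DataFrame(
    {
        "record_id": ["REC_0001"],
        "record_type": ["observation"],
        "pillar": ["ACCESS"],
    }
)

absent = missing_columns(toy, REQUIRED_COLUMNS)
print("Missing columns:", absent)
```

In the notebook itself, the same call would be `missing_columns(data, REQUIRED_COLUMNS)`; an empty list means the later cells should not fail with a `KeyError` on those names.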


### 3. Understand the Schema

In [3]:
print("=== Record Types ===")
display(data["record_type"].value_counts())

print("=== Pillars ===")
display(data["pillar"].value_counts(dropna=False))

print("=== Categories ===")
display(data["category"].value_counts(dropna=False))

# Extract impact links, events, observations
impact_links = data[data["record_type"] == "impact_link"]
events = data[data["record_type"] == "event"]
obs = data[data["record_type"] == "observation"]

=== Record Types ===


record_type
observation    30
event          10
target          3
Name: count, dtype: int64

=== Pillars ===


pillar
ACCESS           16
USAGE            11
NaN              10
GENDER            5
AFFORDABILITY     1
Name: count, dtype: int64

=== Categories ===


category
NaN               33
product_launch     2
infrastructure     2
policy             2
market_entry       1
milestone          1
partnership        1
pricing            1
Name: count, dtype: int64
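
The record-type counts above list `observation`, `event`, and `target`, but no `impact_link`, so the `impact_links` frame extracted in this cell is empty. `value_counts()` silently omits absent categories, which makes gaps like this easy to miss. A minimal sketch of one way to surface them, assuming a hypothetical `EXPECTED_TYPES` list and a toy frame in place of the real data:

```python
import pandas as pd

# Assumption: the record types the unified schema is expected to contain.
EXPECTED_TYPES = ["observation", "event", "impact_link", "target"]

# Toy frame standing in for the starter dataset.
toy = pd.DataFrame(
    {"record_type": ["observation", "observation", "event", "target"]}
)

# Reindex against the expected types so absent ones appear with a count of 0.
type_counts = (
    toy["record_type"]
    .value_counts()
    .reindex(EXPECTED_TYPES, fill_value=0)
)
print(type_counts)

# Record types with no rows are candidates for enrichment.
missing_types = type_counts[type_counts == 0].index.tolist()
print("Record types with no rows:", missing_types)
```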

### 4. Explore the Data (EDA)

In [4]:
# Counts by record type and pillar
print("Record type counts:")
display(data.groupby("record_type").size())

print("Record type x pillar counts:")
display(data.groupby(["record_type", "pillar"]).size())

# Confidence levels
print("Confidence counts:")
display(data["confidence"].value_counts())

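# Temporal range of the observation_date column across all record types
# (observations, events, and targets), since this uses `data` rather than `obs`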
obs_dates = pd.to_datetime(data["observation_date"], errors="coerce")
print("Observation dates summary:")
display(obs_dates.describe())

# List all indicators
indicators = obs["indicator_code"].unique()
print("Unique indicators:")
display(indicators)

print("Observation counts per indicator:")
display(obs.groupby("indicator_code")["observation_date"].count())

# Events table
print("Events overview:")
display(events[["record_id","indicator","category","observation_date"]].sort_values("observation_date"))

# Impact links overview
print("Impact links overview:")
display(impact_links[["record_id","related_indicator","pillar","impact_direction","lag_months"]])


Record type counts:


record_type
event          10
observation    30
target          3
dtype: int64

Record type x pillar counts:


record_type  pillar       
observation  ACCESS           14
             AFFORDABILITY     1
             GENDER            4
             USAGE            11
target       ACCESS            2
             GENDER            1
dtype: int64

Confidence counts:


confidence
high      40
medium     3
Name: count, dtype: int64

Observation dates summary:


count                               43
mean     2024-05-09 02:13:57.209302272
min                2014-12-31 00:00:00
25%                2023-07-16 00:00:00
50%                2024-12-31 00:00:00
75%                2025-07-07 00:00:00
max                2030-12-31 00:00:00
Name: observation_date, dtype: object

Unique indicators:


array(['ACC_OWNERSHIP', 'ACC_MM_ACCOUNT', 'ACC_4G_COV', 'ACC_MOBILE_PEN',
       'ACC_FAYDA', 'USG_P2P_COUNT', 'USG_P2P_VALUE', 'USG_ATM_COUNT',
       'USG_ATM_VALUE', 'USG_CROSSOVER', 'USG_TELEBIRR_USERS',
       'USG_TELEBIRR_VALUE', 'USG_MPESA_USERS', 'USG_MPESA_ACTIVE',
       'USG_ACTIVE_RATE', 'AFF_DATA_INCOME', 'GEN_GAP_ACC',
       'GEN_MM_SHARE', 'GEN_GAP_MOBILE'], dtype=object)

Observation counts per indicator:


indicator_code
ACC_4G_COV            2
ACC_FAYDA             3
ACC_MM_ACCOUNT        2
ACC_MOBILE_PEN        1
ACC_OWNERSHIP         6
AFF_DATA_INCOME       1
GEN_GAP_ACC           2
GEN_GAP_MOBILE        1
GEN_MM_SHARE          1
USG_ACTIVE_RATE       1
USG_ATM_COUNT         1
USG_ATM_VALUE         1
USG_CROSSOVER         1
USG_MPESA_ACTIVE      1
USG_MPESA_USERS       1
USG_P2P_COUNT         2
USG_P2P_VALUE         1
USG_TELEBIRR_USERS    1
USG_TELEBIRR_VALUE    1
Name: observation_date, dtype: int64

Events overview:


Unnamed: 0,record_id,indicator,category,observation_date
33,EVT_0001,Telebirr Launch,product_launch,2021-05-17
41,EVT_0009,NFIS-II Strategy Launch,policy,2021-09-01
34,EVT_0002,Safaricom Ethiopia Commercial Launch,market_entry,2022-08-01
35,EVT_0003,M-Pesa Ethiopia Launch,product_launch,2023-08-01
36,EVT_0004,Fayda Digital ID Program Rollout,infrastructure,2024-01-01
37,EVT_0005,Foreign Exchange Liberalization,policy,2024-07-29
38,EVT_0006,P2P Transaction Count Surpasses ATM,milestone,2024-10-01
39,EVT_0007,M-Pesa EthSwitch Integration,partnership,2025-10-27
42,EVT_0010,Safaricom Ethiopia Price Increase,pricing,2025-12-15
40,EVT_0008,EthioPay Instant Payment System Launch,infrastructure,2025-12-18


Impact links overview:


Unnamed: 0,record_id,related_indicator,pillar,impact_direction,lag_months
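
The per-indicator counts above show that most indicators have only one observation, which is thin for forecasting. One way to turn that table into a list of enrichment candidates is to filter on a minimum number of data points. This is a sketch under stated assumptions: `MIN_POINTS` is a hypothetical threshold, and the toy frame uses placeholder dates rather than the real records.

```python
import pandas as pd

# Hypothetical threshold: indicators with fewer points are flagged as sparse.
MIN_POINTS = 2

# Toy observations with placeholder dates (not taken from the dataset).
toy_obs = pd.DataFrame(
    {
        "indicator_code": [
            "ACC_OWNERSHIP",
            "ACC_OWNERSHIP",
            "ACC_OWNERSHIP",
            "USG_P2P_COUNT",
            "USG_P2P_COUNT",
            "ACC_MOBILE_PEN",
        ],
        "observation_date": [
            "2014-12-31",
            "2017-12-31",
            "2021-12-31",
            "2024-06-30",
            "2025-06-30",
            "2025-06-30",
        ],
    }
)

# Count dated observations per indicator, then keep the sparse ones.
counts = toy_obs.groupby("indicator_code")["observation_date"].count()
sparse = counts[counts < MIN_POINTS]
print("Indicators below the threshold:")
print(sparse)
```

Applied to `obs` in this notebook, the same two lines would list the indicators that most need additional observations.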


### 5. Data Enrichment

In [5]:
new_rows = []

# --- Telebirr Users (Observation) ---
new_rows.append({
    "record_id": "REC_NEW_001",
    "record_type": "observation",
    "category": "",
    "pillar": "USAGE",
    "indicator": "Telebirr active users",
    "indicator_code": "TELEBIRR_USERS",
    "indicator_direction": "higher_better",
    "value_numeric": 54,
    "value_text": "",
    "value_type": "count_millions",
    "unit": "million",
    "observation_date": "2024-12-31",
    "period_start": "",
    "period_end": "",
    "fiscal_year": 2024,
    "gender": "all",
    "location": "national",
    "region": "",
    "source_name": "Ethio Telecom",
    "source_type": "operator_report",
    "source_url": "https://www.ethiotelecom.et/telebirr/",
    "confidence": "medium",
    "related_indicator": "",
    "relationship_type": "",
    "impact_direction": "",
    "impact_magnitude": "",
    "impact_estimate": "",
    "lag_months": "",
    "evidence_basis": "",
    "comparable_country": "",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "Telebirr has surpassed 54 million users",
    "notes": "Core driver of digital payment usage"
})

In [6]:
# --- Smartphone Penetration (Observation) ---
new_rows.append({
    "record_id": "REC_NEW_002",
    "record_type": "observation",
    "category": "",
    "pillar": "USAGE",
    "indicator": "Smartphone penetration",
    "indicator_code": "SMARTPHONE_PEN",
    "indicator_direction": "higher_better",
    "value_numeric": 28,
    "value_text": "",
    "value_type": "percentage",
    "unit": "%",
    "observation_date": "2023-12-31",
    "period_start": "",
    "period_end": "",
    "fiscal_year": 2023,
    "gender": "all",
    "location": "national",
    "region": "",
    "source_name": "GSMA",
    "source_type": "industry_report",
    "source_url": "https://www.gsma.com/mobileeconomy/",
    "confidence": "low",
    "related_indicator": "",
    "relationship_type": "",
    "impact_direction": "",
    "impact_magnitude": "",
    "impact_estimate": "",
    "lag_months": "",
    "evidence_basis": "",
    "comparable_country": "",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "About 28% smartphone adoption",
    "notes": "Upper bound for mobile app usage"
})

In [7]:
# --- Telebirr Launch Event ---
new_rows.append({
    "record_id": "EVT_NEW_001",
    "record_type": "event",
    "category": "product_launch",
    "pillar": "",
    "indicator": "Telebirr Launch",
    "indicator_code": "",
    "indicator_direction": "",
    "value_numeric": "",
    "value_text": "",
    "value_type": "",
    "unit": "",
    "observation_date": "2021-05-01",
    "period_start": "",
    "period_end": "",
    "fiscal_year": 2021,
    "gender": "",
    "location": "national",
    "region": "",
    "source_name": "Ethio Telecom",
    "source_type": "press_release",
    "source_url": "https://www.ethiotelecom.et/",
    "confidence": "high",
    "related_indicator": "",
    "relationship_type": "",
    "impact_direction": "",
    "impact_magnitude": "",
    "impact_estimate": "",
    "lag_months": "",
    "evidence_basis": "",
    "comparable_country": "",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "Telebirr launched nationwide in 2021",
    "notes": "Major digital finance milestone"
})


In [8]:
# --- Impact Link for Telebirr ---
new_rows.append({
    "record_id": "LNK_NEW_001",
    "record_type": "impact_link",
    "category": "",
    "pillar": "USAGE",
    "indicator": "",
    "indicator_code": "",
    "indicator_direction": "",
    "value_numeric": "",
    "value_text": "",
    "value_type": "",
    "unit": "",
    "observation_date": "",
    "period_start": "",
    "period_end": "",
    "fiscal_year": "",
    "gender": "",
    "location": "",
    "region": "",
    "source_name": "",
    "source_type": "",
    "source_url": "",
    "confidence": "medium",
    "related_indicator": "DIGITAL_PAYMENTS",
    "relationship_type": "causal",
    "impact_direction": "positive",
    "impact_magnitude": "high",
    "impact_estimate": "",
    "lag_months": 6,
    "evidence_basis": "Observed growth after launch",
    "comparable_country": "Kenya",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "",
    "notes": "Telebirr strongly increased usage"
})


In [9]:
# --- M-Pesa Launch Event ---
new_rows.append({
    "record_id": "EVT_NEW_002",
    "record_type": "event",
    "category": "product_launch",
    "pillar": "",
    "indicator": "M-Pesa Launch",
    "indicator_code": "",
    "indicator_direction": "",
    "value_numeric": "",
    "value_text": "",
    "value_type": "",
    "unit": "",
    "observation_date": "2023-03-01",
    "period_start": "",
    "period_end": "",
    "fiscal_year": 2023,
    "gender": "",
    "location": "national",
    "region": "",
    "source_name": "Safaricom & Vodacom",
    "source_type": "press_release",
    "source_url": "https://www.safaricom.co.ke/mpesa",
    "confidence": "high",
    "related_indicator": "",
    "relationship_type": "",
    "impact_direction": "",
    "impact_magnitude": "",
    "impact_estimate": "",
    "lag_months": "",
    "evidence_basis": "",
    "comparable_country": "",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "M-Pesa entered Ethiopian market in March 2023",
    "notes": "New mobile money service affecting USAGE"
})


In [10]:
# --- M-Pesa Users (Observation) ---
new_rows.append({
    "record_id": "REC_NEW_003",
    "record_type": "observation",
    "category": "",
    "pillar": "USAGE",
    "indicator": "M-Pesa active users",
    "indicator_code": "MPESA_USERS",
    "indicator_direction": "higher_better",
    "value_numeric": 10,
    "value_text": "",
    "value_type": "count_millions",
    "unit": "million",
    "observation_date": "2024-12-31",
    "period_start": "",
    "period_end": "",
    "fiscal_year": 2024,
    "gender": "all",
    "location": "national",
    "region": "",
    "source_name": "Safaricom & Vodacom",
    "source_type": "operator_report",
    "source_url": "https://www.safaricom.co.ke/mpesa",
    "confidence": "medium",
    "related_indicator": "",
    "relationship_type": "",
    "impact_direction": "",
    "impact_magnitude": "",
    "impact_estimate": "",
    "lag_months": "",
    "evidence_basis": "",
    "comparable_country": "",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "M-Pesa reached 10 million users by 2024",
    "notes": "Supports USAGE increase"
})

In [16]:
# EthSwitch impact link
new_rows.append({
    "record_id": "LNK_NEW_002",
    "record_type": "impact_link",
    "category": "",
    "pillar": "USAGE",
    "indicator": "",
    "indicator_code": "",
    "indicator_direction": "",
    "value_numeric": "",
    "value_text": "",
    "value_type": "",
    "unit": "",
    "observation_date": "",
    "period_start": "",
    "period_end": "",
    "fiscal_year": "",
    "gender": "",
    "location": "",
    "region": "",
    "source_name": "",
    "source_type": "",
    "source_url": "",
    "confidence": "medium",
    "related_indicator": "DIGITAL_PAYMENTS",
    "relationship_type": "causal",
    "impact_direction": "positive",
    "impact_magnitude": "medium",
    "impact_estimate": "",
    "lag_months": 3,
    "evidence_basis": "Observed uptake after interoperability",
    "comparable_country": "Kenya",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "",
    "notes": "Interoperability boosts USAGE"
})

In [17]:
# Fayda impact link
new_rows.append({
    "record_id": "LNK_NEW_003",
    "record_type": "impact_link",
    "category": "",
    "pillar": "ACCESS",
    "indicator": "",
    "indicator_code": "",
    "indicator_direction": "",
    "value_numeric": "",
    "value_text": "",
    "value_type": "",
    "unit": "",
    "observation_date": "",
    "period_start": "",
    "period_end": "",
    "fiscal_year": "",
    "gender": "",
    "location": "",
    "region": "",
    "source_name": "",
    "source_type": "",
    "source_url": "",
    "confidence": "medium",
    "related_indicator": "ACCOUNT_OWNERSHIP",
    "relationship_type": "causal",
    "impact_direction": "positive",
    "impact_magnitude": "medium",
    "impact_estimate": "",
    "lag_months": 6,
    "evidence_basis": "Observed improvement in account ownership",
    "comparable_country": "Kenya",
    "collected_by": "Firehiwet Zerihun",
    "collection_date": "2026-01-29",
    "original_text": "",
    "notes": "Digital ID improves ACCESS"
})
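
Each record above repeats all 34 keys, mostly with empty strings. A small helper can fill the defaults and reject misspelled field names. The sketch below is an illustration only: `make_record` and `FIELDS` are hypothetical names, and the empty-string default simply mirrors the cells above (using `None` for numeric fields such as `value_numeric` may be preferable if column dtypes matter).

```python
# Field names copied from the record dictionaries in the cells above.
FIELDS = [
    "record_id", "record_type", "category", "pillar", "indicator",
    "indicator_code", "indicator_direction", "value_numeric", "value_text",
    "value_type", "unit", "observation_date", "period_start", "period_end",
    "fiscal_year", "gender", "location", "region", "source_name",
    "source_type", "source_url", "confidence", "related_indicator",
    "relationship_type", "impact_direction", "impact_magnitude",
    "impact_estimate", "lag_months", "evidence_basis", "comparable_country",
    "collected_by", "collection_date", "original_text", "notes",
]


def make_record(**overrides):
    """Build a full record dict, filling unspecified fields with ''."""
    unknown = set(overrides) - set(FIELDS)
    if unknown:
        # Catch typos early instead of silently creating a new column.
        raise KeyError(f"Unknown field(s): {sorted(unknown)}")
    record = {field: "" for field in FIELDS}
    record.update(overrides)
    return record


# Hypothetical example record; the values are placeholders.
example = make_record(
    record_id="REC_EXAMPLE",
    record_type="observation",
    pillar="USAGE",
)
print(len(example), example["pillar"])
```

With this helper, each `new_rows.append({...})` call would only need to list the fields that actually carry a value.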


### 6. Append Enrichment & Save

In [18]:
# Append new rows
enriched = pd.concat([data, pd.DataFrame(new_rows)], ignore_index=True)

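# Save enriched dataset (create the output folder first if it does not exist)
processed_path.mkdir(parents=True, exist_ok=True)
enriched.to_csv(processed_path / "ethiopia_fi_unified_data_enriched.csv", index=False)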

# Quick check
print("Old shape:", data.shape)
print("New shape:", enriched.shape)
display(enriched["record_type"].value_counts())
display(enriched[enriched["record_type"]=="event"].tail())


Old shape: (43, 34)
New shape: (51, 34)


record_type
observation    33
event          12
target          3
impact_link     3
Name: count, dtype: int64

Unnamed: 0,record_id,record_type,category,pillar,indicator,indicator_code,indicator_direction,value_numeric,value_text,value_type,...,impact_direction,impact_magnitude,impact_estimate,lag_months,evidence_basis,comparable_country,collected_by,collection_date,original_text,notes
40,EVT_0008,event,infrastructure,,EthioPay Instant Payment System Launch,EVT_ETHIOPAY,,,Launched,categorical,...,,,,,,Example_Trainee,2025-01-20,,National real-time payment system,
41,EVT_0009,event,policy,,NFIS-II Strategy Launch,EVT_NFIS2,,,Launched,categorical,...,,,,,,Example_Trainee,2025-01-20,,5-year national financial inclusion strategy,
42,EVT_0010,event,pricing,,Safaricom Ethiopia Price Increase,EVT_SAFCOM_PRICE,,,Implemented,categorical,...,,,,,,Example_Trainee,2025-01-20,,Data and voice prices increased 20-82%,
45,EVT_NEW_001,event,product_launch,,Telebirr Launch,,,,,,...,,,,,,,Firehiwet Zerihun,2026-01-29,Telebirr launched nationwide in 2021,Major digital finance milestone
47,EVT_NEW_002,event,product_launch,,M-Pesa Launch,,,,,,...,,,,,,,Firehiwet Zerihun,2026-01-29,M-Pesa entered Ethiopian market in March 2023,New mobile money service affecting USAGE
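
The events overview earlier already lists "Telebirr Launch" (`EVT_0001`), and `EVT_NEW_001` uses the same event name, so the enriched file now carries that event twice. A check before appending can flag such overlaps. This is a minimal sketch that assumes exact name matching after trimming and lowercasing; `name_key` is a hypothetical helper and the frames are toy stand-ins.

```python
import pandas as pd

# Toy stand-ins for the existing events and the new candidate events.
existing = pd.DataFrame(
    {
        "record_id": ["EVT_0001", "EVT_0003"],
        "record_type": ["event", "event"],
        "indicator": ["Telebirr Launch", "M-Pesa Ethiopia Launch"],
    }
)
candidates = pd.DataFrame(
    {
        "record_id": ["EVT_NEW_001", "EVT_NEW_002"],
        "record_type": ["event", "event"],
        "indicator": ["Telebirr Launch", "M-Pesa Launch"],
    }
)


def name_key(names: pd.Series) -> pd.Series:
    """Normalize event names so case and stray spaces do not hide a match."""
    return names.str.strip().str.lower()


# Candidates whose normalized name already exists among current events.
is_duplicate = name_key(candidates["indicator"]).isin(name_key(existing["indicator"]))
overlap = candidates[is_duplicate]
print(overlap[["record_id", "indicator"]])
```

Exact matching does not catch near-duplicates such as "M-Pesa Launch" versus "M-Pesa Ethiopia Launch", so those still need a manual review of names and dates.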


### 7. Generate Data Enrichment Log

In [19]:
log_rows = []
for row in new_rows:
    log_rows.append({
        "record_id": row["record_id"],
        "record_type": row["record_type"],
        "indicator": row.get("indicator",""),
        "indicator_code": row.get("indicator_code",""),
        "source": row.get("source_url",""),
        "confidence": row.get("confidence",""),
        "notes": row.get("notes","")
    })

enrichment_log = pd.DataFrame(log_rows)
enrichment_log.to_csv(processed_path / "data_enrichment_log.md", index=False, sep="|")
print("Enrichment log saved.")

display(enrichment_log)

Enrichment log saved.


Unnamed: 0,record_id,record_type,indicator,indicator_code,source,confidence,notes
0,REC_NEW_001,observation,Telebirr active users,TELEBIRR_USERS,https://www.ethiotelecom.et/telebirr/,medium,Core driver of digital payment usage
1,REC_NEW_002,observation,Smartphone penetration,SMARTPHONE_PEN,https://www.gsma.com/mobileeconomy/,low,Upper bound for mobile app usage
2,EVT_NEW_001,event,Telebirr Launch,,https://www.ethiotelecom.et/,high,Major digital finance milestone
3,LNK_NEW_001,impact_link,,,,medium,Telebirr strongly increased usage
4,EVT_NEW_002,event,M-Pesa Launch,,https://www.safaricom.co.ke/mpesa,high,New mobile money service affecting USAGE
5,REC_NEW_003,observation,M-Pesa active users,MPESA_USERS,https://www.safaricom.co.ke/mpesa,medium,Supports USAGE increase
6,LNK_NEW_002,impact_link,,,,medium,Interoperability boosts USAGE
7,LNK_NEW_003,impact_link,,,,medium,Digital ID improves ACCESS
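
The cell above writes pipe-separated text into a `.md` file. Markdown renderers generally expect a divider row under the header before they treat such lines as a table. A minimal sketch of building that table by hand follows; `to_markdown_table` is a hypothetical helper, and the sample rows reuse two record IDs from the log. pandas also offers `DataFrame.to_markdown()`, which depends on the optional `tabulate` package.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a pipe-delimited Markdown table string."""

    def cell(value):
        # Escape pipes so a value cannot split into extra columns.
        return str(value).replace("|", "\\|")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(cell(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


# Sample rows mirroring two entries of the enrichment log.
sample_rows = [
    {"record_id": "REC_NEW_001", "record_type": "observation", "confidence": "medium"},
    {"record_id": "EVT_NEW_001", "record_type": "event", "confidence": "high"},
]

table = to_markdown_table(sample_rows, ["record_id", "record_type", "confidence"])
print(table)
```

In the notebook, `to_markdown_table(log_rows, list(log_rows[0]))` would produce the table text, which could then be written with `Path.write_text`.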


In [20]:
# Check last few rows to ensure new events and observations are appended
display(enriched.tail(10))

# Verify record_type counts
display(enriched["record_type"].value_counts())


Unnamed: 0,record_id,record_type,category,pillar,indicator,indicator_code,indicator_direction,value_numeric,value_text,value_type,...,impact_direction,impact_magnitude,impact_estimate,lag_months,evidence_basis,comparable_country,collected_by,collection_date,original_text,notes
41,EVT_0009,event,policy,,NFIS-II Strategy Launch,EVT_NFIS2,,,Launched,categorical,...,,,,,,Example_Trainee,2025-01-20,,5-year national financial inclusion strategy,
42,EVT_0010,event,pricing,,Safaricom Ethiopia Price Increase,EVT_SAFCOM_PRICE,,,Implemented,categorical,...,,,,,,Example_Trainee,2025-01-20,,Data and voice prices increased 20-82%,
43,REC_NEW_001,observation,,USAGE,Telebirr active users,TELEBIRR_USERS,higher_better,54.0,,count_millions,...,,,,,,,Firehiwet Zerihun,2026-01-29,Telebirr has surpassed 54 million users,Core driver of digital payment usage
44,REC_NEW_002,observation,,USAGE,Smartphone penetration,SMARTPHONE_PEN,higher_better,28.0,,percentage,...,,,,,,,Firehiwet Zerihun,2026-01-29,About 28% smartphone adoption,Upper bound for mobile app usage
45,EVT_NEW_001,event,product_launch,,Telebirr Launch,,,,,,...,,,,,,,Firehiwet Zerihun,2026-01-29,Telebirr launched nationwide in 2021,Major digital finance milestone
46,LNK_NEW_001,impact_link,,USAGE,,,,,,,...,positive,high,,6.0,Observed growth after launch,Kenya,Firehiwet Zerihun,2026-01-29,,Telebirr strongly increased usage
47,EVT_NEW_002,event,product_launch,,M-Pesa Launch,,,,,,...,,,,,,,Firehiwet Zerihun,2026-01-29,M-Pesa entered Ethiopian market in March 2023,New mobile money service affecting USAGE
48,REC_NEW_003,observation,,USAGE,M-Pesa active users,MPESA_USERS,higher_better,10.0,,count_millions,...,,,,,,,Firehiwet Zerihun,2026-01-29,M-Pesa reached 10 million users by 2024,Supports USAGE increase
49,LNK_NEW_002,impact_link,,USAGE,,,,,,,...,positive,medium,,3.0,Observed uptake after interoperability,Kenya,Firehiwet Zerihun,2026-01-29,,Interoperability boosts USAGE
50,LNK_NEW_003,impact_link,,ACCESS,,,,,,,...,positive,medium,,6.0,Observed improvement in account ownership,Kenya,Firehiwet Zerihun,2026-01-29,,Digital ID improves ACCESS


record_type
observation    33
event          12
target          3
impact_link     3
Name: count, dtype: int64

In [21]:
enriched[enriched["record_type"]=="impact_link"][["record_id","pillar","related_indicator","impact_direction","lag_months"]]


Unnamed: 0,record_id,pillar,related_indicator,impact_direction,lag_months
46,LNK_NEW_001,USAGE,DIGITAL_PAYMENTS,positive,6
49,LNK_NEW_002,USAGE,DIGITAL_PAYMENTS,positive,3
50,LNK_NEW_003,ACCESS,ACCOUNT_OWNERSHIP,positive,6
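
The `related_indicator` values shown above (`DIGITAL_PAYMENTS`, `ACCOUNT_OWNERSHIP`) do not appear in the list of indicator codes printed during exploration, which uses codes such as `ACC_OWNERSHIP`. A membership check makes such mismatches visible before the links are used in a model. The sketch below assumes a small hypothetical set of known codes and includes one made-up link (`LNK_EXAMPLE`) to show a passing case.

```python
import pandas as pd

# Assumption: a few of the indicator codes present in the observations.
known_codes = {"ACC_OWNERSHIP", "USG_P2P_COUNT", "USG_TELEBIRR_USERS"}

# Two links from this notebook plus one hypothetical link that matches.
links = pd.DataFrame(
    {
        "record_id": ["LNK_NEW_001", "LNK_NEW_003", "LNK_EXAMPLE"],
        "related_indicator": [
            "DIGITAL_PAYMENTS",
            "ACCOUNT_OWNERSHIP",
            "ACC_OWNERSHIP",
        ],
    }
)

# Links whose target is not a known indicator code.
unmatched = links[~links["related_indicator"].isin(known_codes)]
print(unmatched)
```

In the notebook, `known_codes` could be built from `set(obs["indicator_code"].dropna())` so the check follows the data rather than a hard-coded list.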
