# Task 3 — Event Impact Modeling (Rubric-Compliant)

**Objective:** Model how events (policies, product launches, infrastructure investments) affect financial inclusion indicators.

## Rubric Deliverables
1. **Impact Modeling Notebook**: analyze impact links and model event effects on indicators
2. **Event–Indicator Association Matrix**: table/heatmap of which events affect which indicators and by how much
3. **Historical Validation**: Telebirr launch vs observed mobile money accounts change (4.7% → 9.45%)
4. **Methodology Documentation**: assumptions, functional forms, sources, limitations

## Production Parity (Hard Requirement)
This notebook **must** successfully run `scripts/run_task3.py` to be considered complete.


## 0) Setup
Run from the repo root. If the notebook is started inside `notebooks/`, the project root is auto-detected as the parent directory.

In [None]:
from __future__ import annotations

import sys
from pathlib import Path
import pandas as pd
import numpy as np

PROJECT_ROOT = Path.cwd()
if PROJECT_ROOT.name == "notebooks":
    PROJECT_ROOT = PROJECT_ROOT.parent

SRC = PROJECT_ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

DATA_PROCESSED = PROJECT_ROOT / "data" / "processed" / "eda_enriched"
OUTPUTS = PROJECT_ROOT / "outputs" / "task_3"
OUTPUTS.mkdir(parents=True, exist_ok=True)

assert (PROJECT_ROOT / "scripts" / "run_task3.py").exists(), "Missing scripts/run_task3.py"
assert (PROJECT_ROOT / "src" / "fi" / "event_effects.py").exists(), "Missing src/fi/event_effects.py"

PROJECT_ROOT

## 1) Understand the Impact Data (load + join)

**Requirement:**
- Load impact links
- Join with events to get event details
- Summarize which events affect which indicators and by how much

**Implementation note (current pipeline):**
- The exported `outputs/task_3/impact_links_summary.csv` already includes event metadata (`event_name`, `event_date`, `event_category`, etc.).
- The join with events is therefore already reflected in the audited summary file, and the cells below read event details from it directly.


In [None]:
obs_path = DATA_PROCESSED / "observations.csv"
events_path = DATA_PROCESSED / "events.csv"
links_summary_path = OUTPUTS / "impact_links_summary.csv"

obs = pd.read_csv(obs_path)
events = pd.read_csv(events_path)
links_summary = pd.read_csv(links_summary_path)

print("obs:", obs.shape)
print("events:", events.shape)
print("links_summary:", links_summary.shape)
links_summary.head()

In [None]:
# Link summary: which events affect which indicators and by how much
view_cols = [
    "link_record_id",
    "event_record_id",
    "event_name",
    "event_category",
    "event_date",
    "indicator_code",
    "impact_direction",
    "impact_magnitude_pp",
    "lag_months",
    "effect_shape",
    "ramp_years",
    "evidence_basis",
    "confidence_link",
    "impact_magnitude_source",
]
view_cols = [c for c in view_cols if c in links_summary.columns]
links_summary[view_cols].sort_values(["event_name", "indicator_code"]).head(30)

## 2) Impact model: effect over time + combining events

### Functional form
- Each link provides a signed total magnitude (`impact_magnitude_pp`) in percentage points.
- Timing is governed by `lag_months` and an effect shape:
  - **step**: full effect after lag
  - **ramp**: linearly accumulates from 0 → full effect over `ramp_years` (default 3 years)
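
The two shapes can be sketched as a small function. This is a minimal illustration under the assumptions stated above, not the actual `effects_tidy()` implementation: `realized_effect_pp` and its parameter names are hypothetical, and the boundary treatment (a step is fully realized exactly at the end of the lag) is an assumption.

```python
def realized_effect_pp(magnitude_pp, months_since_event, lag_months=0,
                       shape="ramp", ramp_years=3.0):
    """Illustrative realized effect (pp) a given number of months after an event.

    Hypothetical helper for exposition; the project's effects_tidy() may differ.
    """
    months_active = months_since_event - lag_months
    if months_active < 0:
        # The lag has not elapsed yet, so nothing is realized.
        return 0.0
    if shape == "step":
        # Full signed magnitude as soon as the lag has elapsed.
        return float(magnitude_pp)
    if shape == "ramp":
        ramp_months = ramp_years * 12.0
        if ramp_months <= 0:
            # Degenerate ramp: treat it as a step.
            return float(magnitude_pp)
        # Linear accumulation from 0 to 1, capped at full effect.
        fraction = min(months_active / ramp_months, 1.0)
        return float(magnitude_pp) * fraction
    raise ValueError(f"unknown effect shape: {shape!r}")


# A 6 pp ramp link with no lag, evaluated 18 months after the event.
half_way = realized_effect_pp(6.0, 18)
print(half_way)
```

Under these assumptions, a 6 pp ramp link with no lag and the default 3-year ramp is half realized (3 pp) 18 months after the event, and the sign of `impact_magnitude_pp` carries through unchanged.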

### Month-aware alignment
We treat FINDEX snapshot years as observed at **December** of each year. This makes lags and ramps align correctly for events that happen mid-year.
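
As a rough illustration of the December convention, the elapsed months between an event and a snapshot can be computed from years and months alone. The helper name `months_from_event_to_snapshot` is hypothetical, and the day of month in the example date is a placeholder, since only year and month enter the arithmetic.

```python
from datetime import date

SNAPSHOT_MONTH = 12  # FINDEX snapshots are treated as observed in December


def months_from_event_to_snapshot(event_date, snapshot_year,
                                  snapshot_month=SNAPSHOT_MONTH):
    """Whole months from the event month to the snapshot month.

    Hypothetical helper; a negative result means the event is after the snapshot.
    """
    return (snapshot_year - event_date.year) * 12 + (snapshot_month - event_date.month)


# Telebirr launched in May 2021 (the day component is a placeholder).
telebirr_launch = date(2021, 5, 1)
months_2021 = months_from_event_to_snapshot(telebirr_launch, 2021)
months_2024 = months_from_event_to_snapshot(telebirr_launch, 2024)
print(months_2021, months_2024)
```

With this convention a May 2021 event has 7 months of exposure at the 2021 snapshot and 43 months at the 2024 snapshot, so a mid-year launch already contributes a partial ramp in its launch year.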

### Combining multiple events
We assume **additivity in pp space**:
- total effect (indicator, year) = sum of all realized effects for that indicator-year across events
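
The additivity assumption amounts to a grouped sum over events. The sketch below uses plain Python and invented toy values (the event names and magnitudes are hypothetical); it is not the project's `sum_effects_over_events()` code.

```python
from collections import defaultdict

# Toy realized effects: (event_name, indicator_code, year, effect_pp).
# All values are made up for illustration.
toy_effects = [
    ("event_a", "ACC_MM_ACCOUNT", 2021, 0.5),
    ("event_b", "ACC_MM_ACCOUNT", 2021, 0.0),
    ("event_a", "ACC_MM_ACCOUNT", 2024, 2.0),
    ("event_b", "ACC_MM_ACCOUNT", 2024, -0.5),
]

# Sum signed effects across events for each (indicator, year) pair.
total_effect_pp = defaultdict(float)
for _event_name, indicator_code, year, effect_pp in toy_effects:
    total_effect_pp[(indicator_code, year)] += effect_pp

for key in sorted(total_effect_pp):
    print(key, total_effect_pp[key])
```

Because effects are signed, a negative link partially offsets a positive one in the same indicator-year. On a tidy DataFrame with these columns, the equivalent pandas aggregation is `df.groupby(["indicator_code", "year"])["effect_pp"].sum()`.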


## 3) Event–Indicator Association Matrix (Rubric)

- Rows: events
- Columns: indicators
- Values: **signed total magnitude** (pp)


In [None]:
assoc_matrix = links_summary.pivot_table(
    index="event_name",
    columns="indicator_code",
    values="impact_magnitude_pp",
    aggfunc="sum",
    fill_value=0.0,
)

assoc_matrix.head(20)

In [None]:
import matplotlib.pyplot as plt

# Read the exported matrix from the repo's outputs folder (OUTPUTS is defined in Setup)
# instead of a machine-specific absolute path.
assoc_path = OUTPUTS / "event_indicator_association_matrix.csv"

mat_raw = pd.read_csv(assoc_path)

meta_cols = ["event_record_id", "event_name", "event_category", "event_date"]
indicator_cols = [c for c in mat_raw.columns if c not in meta_cols]

# index for plotting
idx_col = "event_name" if "event_name" in mat_raw.columns else "event_record_id"
assoc_matrix = mat_raw.set_index(idx_col)[indicator_cols]

# 1) Current/basic heatmap (as-is)
plt.figure(
    figsize=(min(18, 1.2 * assoc_matrix.shape[1] + 4),
             min(12, 0.6 * assoc_matrix.shape[0] + 4))
)
plt.imshow(assoc_matrix.values, aspect="auto")
plt.colorbar(label="Estimated effect (pp)")
plt.yticks(range(len(assoc_matrix.index)), assoc_matrix.index, fontsize=8)
plt.xticks(range(len(assoc_matrix.columns)), assoc_matrix.columns, rotation=45, ha="right", fontsize=8)
plt.title("Event–Indicator Association Matrix (pp effects)")
plt.tight_layout()
plt.show()

# IMPORTANT: restart kernel after overwriting association_matrix.py (or use autoreload)
from fi.association_matrix import plot_heatmap_signed_annotated_inline

# 2) Improved signed heatmap: full matrix
plot_heatmap_signed_annotated_inline(
    assoc_matrix,
    title="Event–Indicator Association (signed, diverging centered at 0, annotated)",
)
plt.show()

# 3) Improved signed heatmap: key indicators only
key_indicators = [
    "ACC_OWNERSHIP",
    "ACC_MM_ACCOUNT",
    "USG_DIGITAL_PAYMENT",
    "GAP_ACC_OWNERSHIP_MALE_MINUS_FEMALE_PP",
]
key_indicators = [c for c in key_indicators if c in assoc_matrix.columns]

plot_heatmap_signed_annotated_inline(
    assoc_matrix[key_indicators],
    title="Event–Key Indicators (signed, diverging @0, annotated)",
)
plt.show()


## 4) Build realized effects over time (Lag + Ramp)

This step uses `effects_tidy()` from the current `src/fi/event_effects.py` implementation.


In [None]:
from fi.event_effects import effects_tidy, sum_effects_over_events, FINDEX_YEAR_GRID

event_effects_tidy = effects_tidy(
    df_summary=links_summary,
    indicators=None,
    years=FINDEX_YEAR_GRID,
    default_shape="ramp",
    default_ramp_years=3.0,
)

print("event_effects_tidy:", event_effects_tidy.shape)
event_effects_tidy.head(20)

In [None]:
event_effects_total = sum_effects_over_events(event_effects_tidy)
event_effects_total.head(20)

## 5) Historical Validation (Telebirr) — Rubric

Observed:
- Telebirr launched May 2021
- Mobile money accounts (ACC_MM_ACCOUNT) went from **4.7% (2021)** to **9.45% (2024)**
- Observed delta: **+4.75pp**

We compare this with the model-implied Telebirr realized contribution from 2021→2024.


In [None]:
OBSERVED_DELTA_PP = 9.45 - 4.7

tele_effects = event_effects_tidy[
    (event_effects_tidy["event_name"].astype(str).str.contains("telebirr", case=False, na=False))
    & (event_effects_tidy["indicator_code"] == "ACC_MM_ACCOUNT")
].copy()

tele_pivot = tele_effects.pivot_table(
    index=["event_record_id", "event_name"],
    columns="year",
    values="effect_pp",
    aggfunc="sum",
).fillna(0.0)

tele_pivot["predicted_delta_2024_minus_2021_pp"] = tele_pivot.get(2024, 0.0) - tele_pivot.get(2021, 0.0)
predicted_delta_pp = float(tele_pivot["predicted_delta_2024_minus_2021_pp"].sum())

telebirr_validation = pd.DataFrame([
    {
        "observed_delta_pp": OBSERVED_DELTA_PP,
        "predicted_delta_pp": predicted_delta_pp,
        "residual_pp": predicted_delta_pp - OBSERVED_DELTA_PP,
    }
])

tele_pivot

In [None]:
telebirr_validation

### If mismatch occurs (required interpretation)

If the predicted and observed deltas differ materially, plausible explanations include:
- The adoption curve may be S-shaped rather than linear
- Other concurrent events also affected mobile money adoption
- Survey timing or indicator definitions changed between snapshots
- The lag or ramp parameters are mis-specified
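
To make the first point concrete, the sketch below contrasts the linear ramp fraction with a logistic (S-shaped) alternative. The logistic form and its `steepness` parameter are illustrative assumptions, not part of the current pipeline, and both function names are hypothetical.

```python
import math


def linear_fraction(months_active, ramp_months):
    """Share of the total effect realized under a linear ramp."""
    if months_active <= 0:
        return 0.0
    return min(months_active / ramp_months, 1.0)


def logistic_fraction(months_active, ramp_months, steepness=10.0):
    """Share realized under an S-curve centred at the ramp midpoint.

    `steepness` is a hypothetical tuning parameter. The curve only
    approaches 0 and 1, so it is slightly off at the ramp endpoints.
    """
    if months_active <= 0:
        return 0.0
    x = months_active / ramp_months
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))


# Compare both shapes over a 36-month ramp.
for months in (9, 18, 27):
    print(months, linear_fraction(months, 36), round(logistic_fraction(months, 36), 3))
```

Both shapes agree at the ramp midpoint, but the S-curve realizes less of the effect early and more of it late. If the true adoption path is S-shaped, a linear ramp would therefore overstate the contribution at early snapshots and understate it at later ones.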


## 6) Comparable-country evidence + confidence registry (Rubric)

For links where Ethiopian pre/post data is insufficient, document comparable-country evidence and mark uncertainty.

Here we build an evidence registry directly from the audited summary columns.


In [None]:
evidence_registry = links_summary[[
    "link_record_id",
    "event_record_id",
    "event_name",
    "indicator_code",
    "impact_magnitude_pp",
    "lag_months",
    "effect_shape",
    "ramp_years",
    "evidence_basis",
    "confidence_link",
    "event_source_name",
    "event_confidence",
    "impact_magnitude_source",
]].copy()

evidence_registry.head(20)

## 7) HARD REQUIREMENT: Run canonical pipeline script

The final assertion in the cell below fails the notebook if `scripts/run_task3.py` exits with a non-zero return code.


In [None]:
import subprocess

script = PROJECT_ROOT / "scripts" / "run_task3.py"
proc = subprocess.run(
    [sys.executable, str(script)],
    cwd=str(PROJECT_ROOT),
    capture_output=True,
    text=True,
)
print(proc.stdout)
if proc.stderr:
    print("--- STDERR ---")
    print(proc.stderr)
assert proc.returncode == 0, f"run_task3.py failed with return code {proc.returncode}"

## 8) Export Task 3 artifacts

Writes rubric outputs to `outputs/task_3/`.


In [None]:
# Write with to_csv directly (index kept as the first column), consistent with the other artifacts
assoc_matrix.to_csv(OUTPUTS / "event_indicator_association_matrix.csv")
event_effects_tidy.to_csv(OUTPUTS / "event_effects_tidy.csv", index=False)
telebirr_validation.to_csv(OUTPUTS / "telebirr_mm_validation.csv", index=False)
evidence_registry.to_csv(OUTPUTS / "impact_evidence_registry.csv", index=False)

print("wrote:", OUTPUTS / "event_indicator_association_matrix.csv")
print("wrote:", OUTPUTS / "event_effects_tidy.csv")
print("wrote:", OUTPUTS / "telebirr_mm_validation.csv")
print("wrote:", OUTPUTS / "impact_evidence_registry.csv")

## 9) Methodology Documentation (Rubric)

### Assumptions
- **Additivity**: multiple event effects sum in percentage points
- **Lag + (Step/Ramp)**: effects may start after a lag and accumulate linearly over a ramp period
- **Snapshot timing**: FINDEX points treated as December of each year for month-aware lag arithmetic

### Limitations
- No causal identification: concurrent events can confound the estimated effects
- A linear ramp may not match true adoption curves, which are often S-shaped
- Sparse survey snapshots limit within-year validation
- Evidence transferred from comparable countries may not generalize to Ethiopia


## Rubric checklist
- [x] Impact Modeling Notebook
- [x] Event–Indicator Association Matrix
- [x] Historical Validation (Telebirr)
- [x] Methodology Documentation
- [x] Script parity: `scripts/run_task3.py`
