# Task 3: Event Impact Modeling

## Objective
Model how events (policies, product launches, infrastructure investments) affect financial inclusion indicators in Ethiopia.

## Key Deliverables
1. **Event-Indicator Association Matrix**: Quantifying the magnitude and lag of event impacts.
2. **Temporal Model**: Representing how effects build over time (Immediate vs. Gradual).
3. **Historical Validation**: Testing against the Telebirr launch effect (2021-2024).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Set paths
DATA_PATH = '../data/raw/ethiopia_fi_unified_data.csv'
IMPACT_PATH = '../data/raw/impact_links.csv'
REF_PATH = '../data/raw/reference_codes.csv'
FIGURE_DIR = '../reports/figures/'

os.makedirs(FIGURE_DIR, exist_ok=True)  # create the figure folder if it does not exist

# Load data
df_unified = pd.read_csv(DATA_PATH)
df_impact = pd.read_csv(IMPACT_PATH)
df_ref = pd.read_csv(REF_PATH)

print(f"Loaded {len(df_unified)} records from unified data")
print(f"Loaded {len(df_impact)} impact links")

## 1. Understand the Impact Data
Each impact link points to its parent event through `parent_id`. Joining the links to the event records shows which events drive which indicator changes.

In [None]:
# Filter for events in unified data
events = df_unified[df_unified['record_type'] == 'event'].copy()

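# Join each impact link to its parent event (inner join: links whose parent_id
# has no matching event record are dropped)
impact_full = pd.merge(
    df_impact,
    events[['record_id', 'indicator', 'observation_date', 'notes']],
    left_on='parent_id',
    right_on='record_id',
    suffixes=('', '_event')  # overlapping event columns get an '_event' suffix, e.g. 'indicator_event'
)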

print(f"Joined impact records: {len(impact_full)}")
impact_full[['parent_id', 'indicator_event', 'related_indicator', 'impact_magnitude', 'lag_months']].head()
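The join above is an inner join that relies on `suffixes` to rename overlapping columns. A minimal sketch on toy frames (all IDs and values are hypothetical) shows two optional `pd.merge` safeguards: `validate="many_to_one"` raises an error if an event `record_id` is duplicated, and `how="left"` with `indicator=True` exposes impact links whose `parent_id` has no matching event, which an inner join drops without warning.

```python
import pandas as pd

# Toy stand-ins for the event records and impact links (hypothetical values)
events = pd.DataFrame({
    "record_id": ["EVT_A", "EVT_B"],
    "indicator": ["Product launch A", "Policy B"],
    "observation_date": ["2021-05-17", "2022-01-01"],
})
links = pd.DataFrame({
    "record_id": ["IMP_1", "IMP_2", "IMP_3", "IMP_4"],
    "parent_id": ["EVT_A", "EVT_A", "EVT_B", "EVT_X"],  # EVT_X has no event record
    "indicator": ["link 1", "link 2", "link 3", "link 4"],
    "related_indicator": ["ACC_OWNERSHIP", "ACC_MM_ACCOUNT", "ACC_OWNERSHIP", "ACC_OWNERSHIP"],
    "impact_estimate": [15.0, 5.0, 2.0, 1.0],
})

joined = pd.merge(
    links,
    events,
    left_on="parent_id",
    right_on="record_id",
    how="left",               # keep every link, even if its parent event is missing
    suffixes=("", "_event"),  # only overlapping names are renamed: record_id, indicator
    validate="many_to_one",   # raises MergeError if an event record_id appears twice
    indicator=True,           # adds a _merge column: 'both' or 'left_only'
)

# Links that the notebook's inner join would silently drop
orphans = joined.loc[joined["_merge"] == "left_only", "record_id"].tolist()
print(joined[["parent_id", "indicator_event", "related_indicator", "_merge"]])
print("Orphan links:", orphans)
```

If `orphans` is non-empty on the real data, those links have no event date or name and cannot enter the matrix.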

## 2. Build the Event-Indicator Matrix
This matrix summarizes “which events affect which indicators and by how much.”

In [None]:
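# Build the event x indicator matrix. aggfunc='first' keeps one estimate per
# (event, indicator) pair, so duplicate links are dropped silently.
matrix_data = impact_full.pivot_table(
    index='indicator_event',
    columns='related_indicator',
    values='impact_estimate',
    aggfunc='first'
).fillna(0)  # pairs with no link are shown as 0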

plt.figure(figsize=(14, 10))
sns.heatmap(matrix_data, annot=True, cmap='RdYlGn', center=0, fmt='.1f')
plt.title("Event-Indicator Association Matrix (impact estimates, pp or %)")
plt.tight_layout()
plt.savefig(os.path.join(FIGURE_DIR, 'event_impact_matrix.png'), dpi=150)
plt.show()
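`aggfunc='first'` keeps whichever estimate appears first when an event has more than one link to the same indicator. A toy check (hypothetical events and values) shows how much that choice can matter. Whether `sum` or `mean` is more appropriate depends on whether duplicate links describe separate channels or alternative estimates of the same effect.

```python
import pandas as pd

# Hypothetical links: event "E1" has two estimates for the same indicator
links = pd.DataFrame({
    "indicator_event": ["E1", "E1", "E2"],
    "related_indicator": ["ACC_OWNERSHIP", "ACC_OWNERSHIP", "ACC_MM_ACCOUNT"],
    "impact_estimate": [10.0, 4.0, 5.0],
})

# Count rows that share an (event, indicator) pair before choosing an aggfunc
dupes = links.duplicated(subset=["indicator_event", "related_indicator"], keep=False)
print(f"{dupes.sum()} rows share an (event, indicator) pair")

pivot_args = dict(index="indicator_event", columns="related_indicator", values="impact_estimate")
first = links.pivot_table(aggfunc="first", **pivot_args).fillna(0)  # keeps 10.0, drops 4.0
total = links.pivot_table(aggfunc="sum", **pivot_args).fillna(0)    # adds both: 14.0

print(first)
print(total)
```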

## 3. Model Event Effects Over Time
Effects are rarely immediate. We represent an event's effect as a **Linear Build-up**: it rises linearly over the specified `lag_months` and then stays flat at the full impact magnitude.
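Written out, the rule implemented by `calculate_impact` in the next cell is as follows, for an event on date $t_0$ with magnitude $M$ and lag $L$ months, and a target date $t$ with calendar year $y_t$ and month $m_t$ (the day of the month is ignored):

$$
\Delta m(t) = 12\,(y_t - y_{t_0}) + (m_t - m_{t_0}), \qquad
I(t) =
\begin{cases}
0 & t < t_0,\\
M & t \ge t_0,\ L \le 0,\\
M \cdot \min\!\left(\dfrac{\Delta m(t)}{L},\, 1\right) & t \ge t_0,\ L > 0.
\end{cases}
$$

For example, with $M = 15$, $L = 12$ and a launch on 17 May 2021, a target date of 30 November 2021 gives $\Delta m = 6$ and $I = 15 \cdot 6/12 = 7.5$pp.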

In [None]:
def calculate_impact(event_date, target_date, magnitude, lag_months):
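    """
    Return the cumulative impact of an event at `target_date`.

    The effect is 0 before `event_date`, rises linearly over `lag_months`
    (counted in whole calendar months; the day of the month is ignored) and
    then plateaus at `magnitude`. A lag of 0 or less applies the full
    magnitude from the event date onward.
    """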
    event_dt = pd.to_datetime(event_date)
    target_dt = pd.to_datetime(target_date)
    
    if target_dt < event_dt:
        return 0.0
    
    # Calculate months difference
    months_diff = (target_dt.year - event_dt.year) * 12 + (target_dt.month - event_dt.month)
    
    if lag_months <= 0:
        return magnitude
    
    # Linear build-up until lag is reached
    fraction = min(months_diff / lag_months, 1.0)
    return magnitude * fraction

# Visualize the build-up (example: Telebirr adding 15pp to ACC_OWNERSHIP over 12 months)
# 'ME' is the month-end frequency alias (pandas >= 2.2; use 'M' on older versions)
dates = pd.date_range(start='2021-01-01', end='2024-12-31', freq='ME')
telebirr_impact = [calculate_impact('2021-05-17', d, 15.0, 12) for d in dates]

plt.figure(figsize=(10, 6))
plt.plot(dates, telebirr_impact, label='Modeled Telebirr Impact (ACC_OWNERSHIP)', color='#2ecc71', lw=2)
plt.axvline(pd.to_datetime('2021-05-17'), color='#e74c3c', linestyle='--', label='Launch (May 2021)')
plt.axhline(15.0, color='gray', linestyle=':', label='Max Impact (15pp)')
plt.title("Temporal Build-up of Event Impact (Telebirr Example)")
plt.ylabel("Cumulative Impact (Percentage Points)")
plt.legend()
plt.grid(alpha=0.3)
plt.show()
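`calculate_impact` is called once per date. A vectorized sketch of the same linear-to-plateau rule (the function name `build_up_series` is hypothetical) computes a whole monthly series in one call. It uses the `'MS'` (month-start) alias, which works across pandas versions.

```python
import numpy as np
import pandas as pd

def build_up_series(event_date, dates, magnitude, lag_months):
    """Vectorized linear-to-plateau build-up mirroring calculate_impact.

    Months are counted as whole calendar months; the day of the month is ignored.
    """
    event = pd.Timestamp(event_date)
    dates = pd.DatetimeIndex(dates)
    months = (dates.year - event.year) * 12 + (dates.month - event.month)
    months = np.asarray(months, dtype=float)
    if lag_months <= 0:
        fraction = np.ones_like(months)  # full effect immediately
    else:
        fraction = np.clip(months / lag_months, 0.0, 1.0)  # linear ramp, then plateau
    impact = magnitude * fraction
    impact[dates < event] = 0.0  # no effect before the event date
    return pd.Series(impact, index=dates)

dates = pd.date_range("2021-01-01", "2024-12-01", freq="MS")
s = build_up_series("2021-05-17", dates, 15.0, 12)
print(s.loc[pd.Timestamp("2021-11-01")])
```

Six calendar months after the May 2021 launch the series is halfway to the 15pp plateau, matching the scalar function.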

## 4. Test Model Against Historical Data (Telebirr Case)

**Objective:** Check whether the modeled Telebirr impact is consistent with the observed doubling of the mobile money (MM) account rate.

**Data points:**
- May 2021 (Start): **4.7%** (MM Account Rate)
- Nov 2024 (End): **9.45%** (MM Account Rate)
- Observed Growth: **+4.75pp**
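The "doubling" in the objective and the +4.75pp figure describe the same change on two scales. The impact estimates in this section are additive percentage points, so the validation compares against the absolute change, not the relative one. A quick check of the arithmetic:

```python
start_rate = 4.7   # MM account rate (%), May 2021
end_rate = 9.45    # MM account rate (%), Nov 2024

pp_change = end_rate - start_rate                       # absolute change, percentage points
relative_change = (end_rate - start_rate) / start_rate  # proportional change
growth_multiple = end_rate / start_rate                 # "doubling" means a multiple near 2

print(f"Absolute change: {pp_change:.2f}pp")
print(f"Relative change: {relative_change:.1%}")
print(f"Multiple: {growth_multiple:.2f}x")
```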

In [None]:
# From impact_links.csv, IMP_0007 estimates that M-Pesa adds 5pp to MM accounts.
# Assume a similar impact for Telebirr; its first-mover advantage could justify a slightly higher value.
telebirr_mm_impact_estimate = 5.0  # percentage points added
lag = 12  # months

modeled_contribution = calculate_impact('2021-05-17', '2024-11-29', telebirr_mm_impact_estimate, lag)
actual_growth = 9.45 - 4.7

print(f"Modeled growth contribution from Telebirr: {modeled_contribution:.2f}pp")
print(f"Actual total growth (May 2021 to Nov 2024): {actual_growth:.2f}pp")
print(f"Share of observed growth explained: {modeled_contribution/actual_growth:.1%}")

# Tolerance check: is the modeled contribution within 1pp of the observed change?
if abs(modeled_contribution - actual_growth) < 1.0:
    print("\n✅ CONSISTENT: The modeled Telebirr impact is within 1pp of the observed growth.")
else:
    print("\n⚠️ DISCREPANCY: Other factors may be influencing growth (e.g., policy, COVID-19 push).")
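Nov 2024 falls 42 calendar months after the launch, so any lag of up to 42 months has already reached the 5pp plateau by then. A small sensitivity sweep makes this visible: the single-endpoint check constrains the magnitude but says almost nothing about the lag. The function is copied from the notebook so the block runs on its own; the lag values are arbitrary choices for illustration.

```python
import pandas as pd

def calculate_impact(event_date, target_date, magnitude, lag_months):
    """Copy of the notebook's linear build-up: 0 before the event, a linear
    ramp over lag_months (whole calendar months), then a plateau."""
    event_dt = pd.to_datetime(event_date)
    target_dt = pd.to_datetime(target_date)
    if target_dt < event_dt:
        return 0.0
    months_diff = (target_dt.year - event_dt.year) * 12 + (target_dt.month - event_dt.month)
    if lag_months <= 0:
        return magnitude
    return magnitude * min(months_diff / lag_months, 1.0)

actual_growth = 9.45 - 4.7
results = {lag: calculate_impact("2021-05-17", "2024-11-29", 5.0, lag)
           for lag in (6, 12, 24, 36, 48)}

for lag, modeled in results.items():
    print(f"lag={lag:>2} months -> modeled {modeled:.2f}pp, "
          f"share of observed {modeled / actual_growth:.0%}")
```

Only the 48-month lag, which has not yet plateaued by Nov 2024, changes the result; distinguishing shorter lags would need intermediate observations.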

## 5. Methodology and Assumptions

### Modeling Approach
1. **Identification:** Events are identified via `record_type='event'`.
2. **Quantification:** Impact estimates are sourced from `impact_links.csv`, which uses a mix of empirical Ethiopian data and literature from similar markets (Kenya, India).
3. **Dynamics:** Impact is modeled using a **Linear-to-Plateau** function based on defined lag periods.

### Assumptions
- **Independent Additionality:** We assume event impacts are additive (though in reality, they may be multiplicative or redundant).
- **Standard Lags:** We assume a 12-month lag for infrastructure/product launches unless specified otherwise.
- **Causality:** We attribute observed growth to the linked events. No separate baseline trend is subtracted, so modeled contributions may overstate event effects.
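A minimal sketch of the first two assumptions on hypothetical events: a missing lag falls back to the 12-month default, and the combined effect on one indicator is the plain sum of each event's linear-to-plateau curve. `DEFAULT_LAG_MONTHS`, `build_up` and the event rows are illustrative and not taken from `impact_links.csv`.

```python
import numpy as np
import pandas as pd

DEFAULT_LAG_MONTHS = 12  # assumed default lag for launches without a specified lag

# Hypothetical links from two events to the same indicator
links = pd.DataFrame({
    "event": ["Event A", "Event B"],
    "event_date": pd.to_datetime(["2021-06-01", "2023-09-01"]),
    "impact_estimate": [5.0, 3.0],
    "lag_months": [np.nan, 6],
})
links["lag_months"] = links["lag_months"].fillna(DEFAULT_LAG_MONTHS)

dates = pd.date_range("2021-01-01", "2024-12-01", freq="MS")

def build_up(dates, event_date, magnitude, lag):
    """Linear ramp over `lag` calendar months, then a plateau; 0 before the event."""
    months = (dates.year - event_date.year) * 12 + (dates.month - event_date.month)
    months = np.asarray(months, dtype=float)
    fraction = np.ones_like(months) if lag <= 0 else np.clip(months / lag, 0.0, 1.0)
    return np.where(dates < event_date, 0.0, magnitude * fraction)

# Independent additionality: total effect = sum of the per-event curves
per_event = pd.DataFrame(
    {row.event: build_up(dates, row.event_date, row.impact_estimate, row.lag_months)
     for row in links.itertuples()},
    index=dates,
)
per_event["total"] = per_event.sum(axis=1)
print(per_event.loc[pd.to_datetime(["2022-06-01", "2023-12-01", "2024-12-01"])])
```

Because the curves are simply added, overlapping or redundant events are double-counted; that is the cost of the additivity assumption.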

### Limitations
- **Data Frequency:** Financial inclusion surveys (Findex) are infrequent, making precise lag identification difficult.
- **Attribution Noise:** It is hard to separate the effect of Telebirr from the simultaneous expansion of 4G coverage without a more complex regression model.
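The data-frequency limitation can be made concrete. Take only the two observations used in the Telebirr check (4.7% in May 2021 and 9.45% in Nov 2024, 42 months apart). A pure linear trend and a flat baseline plus a 12-month build-up both reproduce these points exactly, yet they differ by more than 3pp a year after launch. The two candidate models are a hypothetical illustration, not models fitted in this notebook.

```python
import numpy as np

# Two observations only: months since May 2021 and MM account rate (%)
t_obs = np.array([0.0, 42.0])
y_obs = np.array([4.7, 9.45])

# Candidate 1: linear baseline trend through both points, no event effect
slope = (y_obs[1] - y_obs[0]) / (t_obs[1] - t_obs[0])

def trend_only(t):
    return y_obs[0] + slope * t

# Candidate 2: flat baseline plus a 12-month linear build-up that then plateaus
magnitude = y_obs[1] - y_obs[0]

def event_only(t, lag=12.0):
    return y_obs[0] + magnitude * np.clip(t / lag, 0.0, 1.0)

for name, model in [("trend only", trend_only), ("event only", event_only)]:
    print(f"{name}: fitted {np.round(model(t_obs), 2)}, month 12: {float(model(12.0)):.2f}")
```

Both candidates match the observed points, so only an intermediate measurement (for example around month 12) could tell them apart.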