# Task 3: Financial Inclusion Impact Modeling

## Objective
Model how events (policies, product launches, infrastructure investments) affect Ethiopia's financial inclusion indicators, producing an **Event-Indicator Association Matrix** for forecasting.

## Methodology
- **Functional Form**: Ramped Step Function (Permanent structural changes with adoption lag).
- **Impact Logic**: `Effect = Direction * Magnitude * AdoptionCurve(t - Lag)`.
- **Aggregation**: Additive combination of concurrent events.
- **Goal**: Structured impact reasoning, not causal proof.
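
Written out, the impact logic and the additive aggregation above could be expressed as follows. The subscripts ($e$ for an event, $i$ for an indicator), the ramp function $R$, and the ramp length $\rho$ are notation introduced here for illustration, and the clipped-linear form of $R$ assumes the default linear ramp rather than a logistic one.

$$
\text{Effect}_{e,i}(t) = \text{Direction}_{e,i} \times \text{Magnitude}_{e,i} \times R\left(t - T_{event} - Lag_{e,i}\right),
\qquad
R(\tau) = \min\left(1,\ \max\left(0,\ \frac{\tau}{\rho}\right)\right)
$$

$$
\text{Total}_i(t) = \sum_{e} \text{Effect}_{e,i}(t)
$$

Under this reading, each effect is zero before the lagged start, grows over the ramp, and then stays at $\text{Direction} \times \text{Magnitude}$, so the total for an indicator is a sum of such plateaus.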

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta, datetime

# Set style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

## Step 1: Load Data
Load the enriched unified dataset, which contains observations, events, and impact links (each impact link references its event via `parent_id`).

In [2]:
# Load data
df = pd.read_csv('../data/raw/ethiopia_fi_unified_data.csv')

# Parse dates
date_cols = ['observation_date', 'period_start', 'period_end']
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

# Separate record types
events = df[df['record_type'] == 'event'].copy()
impact_links = df[df['record_type'] == 'impact_link'].copy()
observations = df[df['record_type'] == 'observation'].copy()

print(f"Events: {len(events)}")
print(f"Impact Links: {len(impact_links)}")
print(f"Observations: {len(observations)}")

Events: 12
Impact Links: 2
Observations: 35


## Step 2: Define Impact Functions

We model most events as **Ramped Step Functions**:
1.  **Shock**: Event occurs at $T_{event}$.
2.  **Lag**: Impact starts at $T_{start} = T_{event} + Lag$.
3.  **Ramp**: Impact grows linearly from 0% to 100% over `ramp_months` (default 6 months). A logistic curve is a possible alternative, but only the linear form is implemented here.
4.  **Plateau**: Impact stays at 100% of the estimated magnitude (structural shift).

In [3]:
def calculate_ramp_factor(current_date, start_date, ramp_months=6):
    """
    Calculates a linear ramp factor (0.0 to 1.0) based on time elapsed since start_date.
    """
    if current_date < start_date:
        return 0.0
    
    days_elapsed = (current_date - start_date).days
    ramp_days = ramp_months * 30
    
    if days_elapsed >= ramp_days:
        return 1.0
    
    return days_elapsed / ramp_days

def get_magnitude_numeric(magnitude_str, default_high=0.2, default_med=0.1, default_low=0.05):
    """
    Converts qualitative magnitude labels to numeric factors (percentage change assumption).
    These defaults are placeholders and should be overridden by 'impact_estimate' if available.
    """
    mapping = {
        'high': default_high,      # e.g., 20% impact
        'medium': default_med,     # e.g., 10% impact
        'low': default_low,        # e.g., 5% impact
        'negligible': 0.01
    }
    return mapping.get(str(magnitude_str).lower(), 0.0)
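
Step 2 mentions a logistic ramp as an alternative to the linear one, but only the linear version is defined in this notebook. A minimal sketch of an S-shaped variant might look like the following. The function name `logistic_ramp_factor` and the `steepness` parameter are assumptions made for illustration; neither comes from the data or from the original code.

```python
import math
from datetime import datetime


def logistic_ramp_factor(current_date, start_date, ramp_months=6, steepness=10.0):
    """Hypothetical S-shaped alternative to the linear ramp (returns 0.0 to 1.0).

    `steepness` is an assumed tuning parameter, not derived from the data.
    """
    if current_date < start_date:
        return 0.0

    ramp_days = ramp_months * 30  # same 30-day month approximation as the linear ramp
    progress = (current_date - start_date).days / ramp_days
    if progress >= 1.0:
        return 1.0

    def raw(x):
        # Logistic curve centred on the midpoint of the ramp window
        return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

    lo, hi = raw(0.0), raw(1.0)
    # Rescale so the curve is exactly 0 at the start and 1 at the end of the ramp
    return (raw(progress) - lo) / (hi - lo)


start = datetime(2024, 1, 1)
for probe in (datetime(2023, 12, 1), datetime(2024, 3, 31), datetime(2024, 9, 1)):
    print(probe.date(), round(logistic_ramp_factor(probe, start), 3))
```

Compared with the linear ramp, this shape assumes slow early adoption, faster growth mid-ramp, and saturation near the end. Whether that better matches a given event is a modeling judgment, not something the data here establishes.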

## Step 3: Map Events to Impacts

Process each `impact_link` to generate an effect series.

**Logic:**
- Iterate through `impact_links`.
- Find the corresponding `event` via the link's `parent_id`. Per the `reference_codes` schema, `parent_id` should match the event's `record_id`. In the CSV, however, the two impact links carry `EVT_...` identifiers in `parent_id`, which appear to correspond to the event's `indicator_code`, so the code falls back to matching on `indicator_code` when no `record_id` matches.
- Determine Magnitude:
    - Use `impact_estimate` (numeric) if valid.
    - Else fall back to `impact_magnitude` (categorical) converted to numeric.
- Determine Direction:
    - `increase` (+1), `decrease` (-1).
- Calculate Effect Time Series (Monthly from 2020 to 2030).
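
The per-link calculation described in the list above can also be sketched in vectorized form, which avoids looping over every date. All parameter values below are hypothetical and chosen only for illustration, and the timeline uses month-start dates (`freq="MS"`) as an assumption rather than the month-end dates used in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical link parameters, chosen only for illustration
event_date = pd.Timestamp("2024-07-01")
lag_months = 3
magnitude = 0.10   # e.g. a value that would come from `impact_estimate`
direction = 1      # +1 for increase, -1 for decrease
ramp_months = 6

timeline = pd.date_range(start="2024-01-01", end="2026-12-31", freq="MS")
start_date = event_date + pd.Timedelta(days=lag_months * 30)

# Days since the lagged start; negative values mean the effect has not begun
days_elapsed = np.asarray((timeline - start_date).days, dtype=float)

# Linear ramp clipped to [0, 1]: 0 before the start, 1 once the ramp is complete
ramp = np.clip(days_elapsed / (ramp_months * 30), 0.0, 1.0)

effect = pd.Series(direction * magnitude * ramp, index=timeline, name="effect")
print(effect.loc["2024-09-01":"2025-06-01"].round(3))
```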

In [4]:
# Generate a monthly timeline for modeling
timeline = pd.date_range(start='2020-01-01', end='2030-12-31', freq='ME')
modeling_df = pd.DataFrame({'date': timeline})

impact_effects = []

for idx, link in impact_links.iterrows():
    # 1. Get Event Info
    event_id = link['parent_id']
    event_row = events[events['record_id'] == event_id]
    
    if event_row.empty:
        # The link's parent_id did not match any event record_id.
        # In this dataset the impact links appear to use the human-readable alias
        # stored in the event's indicator_code (e.g. EVT_FX_LIBERAL, EVT_MELA_LAUNCH),
        # so fall back to matching on indicator_code.
        event_row = events[events['indicator_code'] == event_id]

            
    if event_row.empty:
        print(f"Warning: Event {event_id} not found for Link {link['record_id']}")
        continue
        
    event = event_row.iloc[0]
    event_date = event['observation_date']
    if pd.isna(event_date):
        continue

    # 2. Determine Parameters
    indicator_target = link['indicator'] # The code, e.g., ACC_OWNERSHIP
    
    # Direction
    direction_map = {'increase': 1, 'decrease': -1, 'stabilize': 0, 'mixed': 0}
    direction = direction_map.get(link.get('impact_direction', 'increase'), 1)
    
    # Magnitude (Use direct estimate if available, else heuristic map)
    if pd.notna(link.get('impact_estimate')):
        magnitude = float(link['impact_estimate'])
    else:
        magnitude = get_magnitude_numeric(link.get('impact_magnitude'))

    # Lag
    lag_months = float(link['lag_months']) if pd.notna(link.get('lag_months')) else 0
    start_date = event_date + timedelta(days=lag_months*30)
    
    # 3. Calculate Series
    # We want a series aligned with 'timeline'
    # Effect = Direction * Magnitude * RampFactor
    
    series_name = f"{event['indicator_code']}_on_{indicator_target}"
    
    col_values = []
    for t in timeline:
        factor = calculate_ramp_factor(t, start_date)
        val = direction * magnitude * factor
        col_values.append(val)
        
    modeling_df[series_name] = col_values
    
    impact_effects.append({
        'event_name': event.get('indicator', 'Unknown Event'),
        'event_code': event['indicator_code'],
        'target_indicator': indicator_target,
        'magnitude': magnitude,
        'direction': direction,
        'lag_months': lag_months,
        'series_name': series_name
    })

print("Modeled effects for:")
pd.DataFrame(impact_effects)[['event_code', 'target_indicator', 'magnitude', 'direction']]

Modeled effects for:


|   | event_code | target_indicator | magnitude | direction |
|---|---|---|---|---|
| 0 | EVT_FX_LIBERAL |  | 0.1 | 1 |
| 1 | EVT_MELA_LAUNCH |  | 0.2 | 1 |
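
The blank `target_indicator` column in the output above suggests that `link['indicator']` is missing for both links. Missing values never compare equal to each other, so such links cannot be grouped under any indicator in later steps. A minimal sketch of a check for this condition follows. It uses a toy stand-in for `impact_links` with made-up IDs, since the actual column that holds the target code in the CSV is not confirmed here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `impact_links`; values are illustrative, not from the CSV
toy_links = pd.DataFrame({
    "record_id": ["LNK_A", "LNK_B", "LNK_C"],
    "parent_id": ["EVT_X", "EVT_Y", "EVT_Z"],
    "indicator": [np.nan, "ACC_OWNERSHIP", np.nan],
})

# Links whose target indicator is missing cannot be aggregated per indicator
missing_target = toy_links[toy_links["indicator"].isna()]
print(f"{len(missing_target)} of {len(toy_links)} links lack a target indicator")
print(missing_target[["record_id", "parent_id"]])

# Fully populated columns are candidates for where the target code is stored
populated = [c for c in toy_links.columns if toy_links[c].notna().all()]
print("Fully populated columns:", populated)
```

Running the same two checks against the real `impact_links` frame would show whether the target code lives in a different column than `indicator`.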


## Step 4: Aggregation and Matrix Construction

We now sum the effects per indicator to see the **Total Event-Driven Impact**.

In [5]:
# Group effects by target indicator
indicators = list(set([x['target_indicator'] for x in impact_effects]))

composite_effects = pd.DataFrame({'date': timeline})

for ind in indicators:
    # Find all columns for this indicator
    relevant_cols = [x['series_name'] for x in impact_effects if x['target_indicator'] == ind]
    if relevant_cols:
        composite_effects[f"Total_Effect_{ind}"] = modeling_df[relevant_cols].sum(axis=1)

# Preview
composite_effects.tail()

|   | date |
|---|---|
| 127 | 2030-08-31 |
| 128 | 2030-09-30 |
| 129 | 2030-10-31 |
| 130 | 2030-11-30 |
| 131 | 2030-12-31 |
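
The preview above contains only the `date` column, so no per-indicator totals were produced in this run. To show what the additive aggregation is meant to yield when target indicators are present, here is a small sketch on hypothetical effect series. The event codes `EVT_A` and `EVT_B` and all numbers are invented for illustration.

```python
import pandas as pd

timeline = pd.date_range("2025-01-01", periods=4, freq="MS")

# Hypothetical per-link effect series, keyed by (event, indicator)
effects = {
    ("EVT_A", "ACC_OWNERSHIP"): [0.00, 0.05, 0.10, 0.10],
    ("EVT_B", "ACC_OWNERSHIP"): [0.00, 0.00, 0.02, 0.04],
    ("EVT_B", "USG_DIGITAL_PAY"): [0.00, -0.01, -0.02, -0.02],
}
wide = pd.DataFrame(effects, index=timeline)
wide.columns.names = ["event", "indicator"]

# Additive aggregation: sum the columns that share the same indicator
totals = wide.T.groupby(level="indicator").sum().T
totals = totals.add_prefix("Total_Effect_")
print(totals)
```

Keeping the event and indicator as separate column levels, rather than encoding them in a string such as `EVT_A_on_ACC_OWNERSHIP`, is one design option that makes the grouping explicit.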


## Step 5: Visualization

Let's visualize the rollout of these impacts over time.

In [6]:
def plot_effects(indicator_code):
    col_name = f"Total_Effect_{indicator_code}"
    if col_name not in composite_effects.columns:
        print(f"No effects modeled for {indicator_code}")
        return
        
    plt.figure(figsize=(10, 5))
    plt.plot(composite_effects['date'], composite_effects[col_name], label=f"Modeled Impact: {indicator_code}", linewidth=2.5)
    plt.title(f"Cumulative Event Impacts on {indicator_code}")
    plt.ylabel("Impact Magnitude (additive)")
    plt.legend()
    plt.grid(True)
    plt.show()

# Plot for Account Ownership and Digital Usage if available
plot_effects('ACC_OWNERSHIP')
plot_effects('USG_DIGITAL_PAY')

No effects modeled for ACC_OWNERSHIP
No effects modeled for USG_DIGITAL_PAY


## Step 6: Create Association Matrix Artifact

Build a static Event (rows) x Indicator (columns) matrix summarizing the **Net Magnitude** (direction x magnitude) of each link.

In [7]:
# Pivot impact_effects to create the matrix
effect_summary = pd.DataFrame(impact_effects)

if not effect_summary.empty:
    # We calculate 'Net Impact' as Direction * Magnitude
    effect_summary['net_impact'] = effect_summary['direction'] * effect_summary['magnitude']
    
    matrix = effect_summary.pivot_table(
        index='event_name', 
        columns='target_indicator', 
        values='net_impact', 
        aggfunc='sum'
    ).fillna(0)
    
    # Display the matrix
    print(matrix)
    
    # Save
    matrix.to_csv('../data/processed/event_indicator_matrix.csv')
    print("\nMatrix saved to ../data/processed/event_indicator_matrix.csv")
else:
    print("No effects to populate matrix.")

Empty DataFrame
Columns: []
Index: []

Matrix saved to ../data/processed/event_indicator_matrix.csv
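
The empty matrix above is consistent with how `pivot_table` handles missing keys: rows whose index or column key is missing are dropped during grouping. The sketch below reproduces this on hypothetical records and adds a guard so that an empty matrix is reported instead of saved. The helper name `build_matrix` and the sample events are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def build_matrix(effect_rows):
    """Pivot effect records into an Event x Indicator matrix of net impacts."""
    summary = pd.DataFrame(effect_rows)
    summary["net_impact"] = summary["direction"] * summary["magnitude"]
    return summary.pivot_table(
        index="event_name",
        columns="target_indicator",
        values="net_impact",
        aggfunc="sum",
    ).fillna(0)


# Hypothetical records with target indicators present
complete = [
    {"event_name": "Event A", "target_indicator": "ACC_OWNERSHIP", "direction": 1, "magnitude": 0.10},
    {"event_name": "Event B", "target_indicator": "ACC_OWNERSHIP", "direction": 1, "magnitude": 0.20},
    {"event_name": "Event B", "target_indicator": "USG_DIGITAL_PAY", "direction": -1, "magnitude": 0.05},
]
# The same records with the target indicator missing
incomplete = [dict(row, target_indicator=np.nan) for row in complete]

for label, rows in (("complete", complete), ("incomplete", incomplete)):
    matrix = build_matrix(rows)
    if matrix.empty:
        print(f"{label}: matrix is empty, skipping save")
    else:
        print(f"{label}:\n{matrix}")
```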
