# Task 3: Event Impact Modeling (Enhanced)

## Objectives
- Model how events (policies, product launches, infrastructure investments) affect financial inclusion indicators
- Build event-indicator association matrix with uncertainty bounds
- Validate model against historical data with counterfactual analysis
- Create event interaction model and scenario framework
- Document methodology and expert validation
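The cells that follow do not implement the interaction model or the scenario framework listed as objectives. Below is a minimal sketch of both. It assumes that concurrent events compound multiplicatively, on the same relative-uplift scale that `apply_enhanced_event_effect` uses later in this notebook. It also assumes that a scenario just rescales every event's magnitude. The helper `combine_event_effects`, the two example events, and the scenario multipliers are all hypothetical.

```python
import numpy as np

def logistic(t, k, m, a):
    """Logistic S-curve with saturation k, inflection m (months) and steepness a."""
    return k / (1 + np.exp(-a * (t - m)))

def combine_event_effects(base_value, months_since_events, event_specs, scale=1.0):
    """Compound several relative uplifts: base * prod(1 + uplift_i).

    months_since_events: months elapsed since each event (same order as event_specs).
    event_specs: list of dicts with hypothetical keys 'magnitude', 'lag', 'm', 'a'.
    scale: scenario multiplier applied to every magnitude (assumption).
    """
    value = base_value
    for t, spec in zip(months_since_events, event_specs):
        effective = t - spec['lag']
        if effective < 0:
            continue  # this event's effect has not started yet
        value *= 1 + logistic(effective, scale * spec['magnitude'], spec['m'], spec['a'])
    return value

# Hypothetical events (product-like and policy-like parameters) and scenarios
example_events = [
    {'magnitude': 0.15, 'lag': 6, 'm': 6, 'a': 0.5},
    {'magnitude': 0.10, 'lag': 12, 'm': 12, 'a': 0.3},
]
scenarios = {'pessimistic': 0.5, 'base': 1.0, 'optimistic': 1.5}

for name, scale in scenarios.items():
    projected = combine_event_effects(0.46, [36, 24], example_events, scale)
    print(f"{name:>11}: {projected:.3f}")
```

Compounding is only one choice. An additive combination (`base * (1 + sum of uplifts)`) gives slightly smaller joint effects when all uplifts are positive, and it is just as easy to swap in.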

## Enhanced Methodology
- **Temporal Dynamics**: S-curve adoption patterns for new technologies
- **Event Classification**: Policy, Product, Infrastructure, and Market categories
- **Effect Duration**: Short-term (0-12m), Medium-term (1-3y), Long-term (3+y)
- **Uncertainty Quantification**: Monte Carlo simulation for parameter uncertainty
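The notebook names Monte Carlo simulation here but does not implement it later. Below is a minimal sketch. It assumes the impact magnitude is Normal(mean, sd) clipped at zero and the lag is a uniformly drawn integer number of months. The distributions, parameter values, and helper names `logistic_uplift` and `simulate_effect` are illustrative assumptions, not the notebook's calibration. Percentiles of the simulated values serve as uncertainty bounds.

```python
import numpy as np

def logistic_uplift(t, k, m=6.0, a=0.5):
    """Relative uplift from a logistic S-curve (product-type defaults m=6, a=0.5)."""
    return k / (1 + np.exp(-a * (t - m)))

def simulate_effect(base_value, months_since_event, mag_mean, mag_sd,
                    lag_low, lag_high, n_draws=10_000, seed=0):
    """Monte Carlo distribution of the adjusted indicator value.

    Draws magnitude ~ Normal(mag_mean, mag_sd) clipped at 0 and
    lag ~ uniform integer in [lag_low, lag_high] (assumed distributions).
    """
    rng = np.random.default_rng(seed)
    magnitudes = np.clip(rng.normal(mag_mean, mag_sd, n_draws), 0, None)
    lags = rng.integers(lag_low, lag_high + 1, n_draws)  # upper bound is exclusive
    effective = months_since_event - lags
    # No uplift for draws whose lag has not yet elapsed
    uplift = np.where(effective >= 0, logistic_uplift(effective, magnitudes), 0.0)
    return base_value * (1 + uplift)

draws = simulate_effect(0.46, months_since_event=24, mag_mean=0.15, mag_sd=0.05,
                        lag_low=3, lag_high=9)
p5, p50, p95 = np.percentile(draws, [5, 50, 95])
print(f"median {p50:.3f}, 90% interval [{p5:.3f}, {p95:.3f}]")
```

Fixing `seed` makes the draws reproducible, which helps when comparing scenarios side by side.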

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pathlib import Path
import warnings
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

print("Enhanced libraries loaded successfully")
print("✅ Statistical modeling capabilities")
print("✅ Uncertainty quantification tools")
print("✅ Machine learning for impact estimation")

Enhanced libraries loaded successfully
✅ Statistical modeling capabilities
✅ Uncertainty quantification tools
✅ Machine learning for impact estimation


In [3]:
# Load and prepare enhanced dataset
df = pd.read_csv('../data/processed/ethiopia_fi_enriched_data.csv')
print(f"Dataset shape: {df.shape}")
print(f"Record types:")
print(df['record_type'].value_counts())

# Extract different record types
observations = df[df['record_type'] == 'observation'].copy()
events = df[df['record_type'] == 'event'].copy()
impact_links = df[df['record_type'] == 'impact_link'].copy()
targets = df[df['record_type'] == 'target'].copy()

# Enhanced data preparation
# Safely parse dates: coerce unparsable strings to NaT and keep raw copies for debugging
events['observation_date_raw'] = events['observation_date']
events['observation_date'] = pd.to_datetime(events['observation_date'], errors='coerce')
observations['observation_date_raw'] = observations['observation_date']
observations['observation_date'] = pd.to_datetime(observations['observation_date'], errors='coerce')
# Report a few unparsable samples (if any) to help debugging
unparsable_events = events[events['observation_date'].isna()]['observation_date_raw'].dropna().unique()
if len(unparsable_events):
    print('Unparsable event observation_date samples:', unparsable_events[:10])
unparsable_obs = observations[observations['observation_date'].isna()]['observation_date_raw'].dropna().unique()
if len(unparsable_obs):
    print('Unparsable observation observation_date samples:', unparsable_obs[:10])

# Clean impact data
impact_links['impact_magnitude'] = pd.to_numeric(impact_links['impact_magnitude'], errors='coerce')
impact_links['lag_months'] = pd.to_numeric(impact_links['lag_months'], errors='coerce')
events['record_id_num'] = pd.to_numeric(events['record_id'], errors='coerce')
impact_links['parent_id_num'] = pd.to_numeric(impact_links['parent_id'], errors='coerce')

print(f"Data preparation complete:")
print(f"Observations: {len(observations)}")
print(f"Events: {len(events)}")
print(f"Impact links: {len(impact_links)}")
print(f"Targets: {len(targets)}")

Dataset shape: (29, 21)
Record types:
record_type
observation    13
event           7
impact_link     7
target          2
Name: count, dtype: int64
Unparsable event observation_date samples: <StringArray>
[            'Ethio Telecom',                 'Safaricom',
        'Safaricom Ethiopia', 'National Bank of Ethiopia',
                 'EthSwitch']
Length: 5, dtype: str
Data preparation complete:
Observations: 13
Events: 7
Impact links: 7
Targets: 2
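Before impact links are joined to events, it can help to list the links whose `parent_id` matches no event `record_id`. Otherwise those rows pass through the later left merge with empty event columns. Below is a minimal sketch using `DataFrame.merge(..., indicator=True)`. One detail matters: pandas matches null keys to each other in `merge`, so events whose id failed numeric parsing are dropped from the right side first. The toy frames are hypothetical stand-ins for `events` and `impact_links`.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's events / impact_links frames
toy_events = pd.DataFrame({'record_id': ['1', '2', 'EVT-3'],
                           'indicator': ['Product Launch', 'Policy', 'Market Entry']})
toy_links = pd.DataFrame({'parent_id': ['1', '2', '9', None],
                          'related_indicator': ['usage_mm_account', 'access_account_ownership',
                                                'usage_digital_payment', 'usage_mm_account']})

# Same numeric coercion as above: non-numeric ids become NaN
toy_events['record_id_num'] = pd.to_numeric(toy_events['record_id'], errors='coerce')
toy_links['parent_id_num'] = pd.to_numeric(toy_links['parent_id'], errors='coerce')

# Drop events with NaN ids so NaN parent_ids cannot "match" them
valid_events = toy_events.dropna(subset=['record_id_num'])[['record_id_num', 'indicator']]

# indicator=True adds a _merge column; 'left_only' marks links with no matching event
checked = toy_links.merge(valid_events, left_on='parent_id_num', right_on='record_id_num',
                          how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(f"{len(unmatched)} of {len(toy_links)} impact links have no matching event")
print(unmatched[['parent_id', 'related_indicator']])
```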


## Enhanced Event Classification System

### Event Categories:
- **Policy Events**: Regulatory changes, central bank decisions
- **Product Events**: Service launches, new product introductions
- **Infrastructure Events**: Network deployments, technology upgrades
- **Market Events**: Competition entry, partnerships

### Impact Duration Classification:
- **Short-term**: 0-12 months (immediate market reaction)
- **Medium-term**: 1-3 years (adoption and maturation)
- **Long-term**: 3+ years (structural changes)
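The same duration classes can also be applied to elapsed time instead of to event type. Below is a minimal sketch. It assumes half-open intervals, so an elapsed time of exactly 12 months counts as Medium-term and exactly 36 months as Long-term. The helper name `duration_bucket` is hypothetical.

```python
def duration_bucket(months_since_event: float) -> str:
    """Map elapsed months to the Short/Medium/Long-term classes above.

    Assumes half-open intervals: [0, 12) Short-term, [12, 36) Medium-term, [36, inf) Long-term.
    """
    if months_since_event < 0:
        raise ValueError("event has not happened yet")
    if months_since_event < 12:
        return 'Short-term'
    if months_since_event < 36:
        return 'Medium-term'
    return 'Long-term'

for months in [3, 12, 30, 48]:
    print(months, duration_bucket(months))
```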

In [4]:
# Classify events by type and duration
def classify_event_type(event_name):
    """Classify an event by keyword; the first matching category wins.

    Infrastructure keywords are checked before Product keywords so that names such as
    '4G Network Launch' are not captured by the generic 'launch' keyword.
    """
    event_name_lower = str(event_name).lower()
    if any(keyword in event_name_lower for keyword in ['regulation', 'policy', 'nfis', 'central bank']):
        return 'Policy'
    elif any(keyword in event_name_lower for keyword in ['network', '4g', 'infrastructure', 'coverage']):
        return 'Infrastructure'
    elif any(keyword in event_name_lower for keyword in ['launch', 'product', 'service', 'telebirr', 'm-pesa']):
        return 'Product'
    elif any(keyword in event_name_lower for keyword in ['entry', 'market', 'safaricom', 'competition']):
        return 'Market'
    else:
        return 'Other'

def estimate_impact_duration(event_type, magnitude=None):
    """Return the default impact horizon for an event type.

    `magnitude` is accepted for future use but does not currently change the result.
    """
    if event_type in ('Policy', 'Infrastructure'):
        return 'Long-term'  # 3+ years
    return 'Medium-term'  # 1-3 years (Product, Market, Other)

# Apply classifications
events['event_type'] = events['indicator'].apply(classify_event_type)
events['impact_duration'] = events['event_type'].apply(estimate_impact_duration)

print("Event Classification:")
print(events[['indicator', 'event_type', 'impact_duration', 'observation_date']].to_string(index=False))

Event Classification:
              indicator     event_type impact_duration observation_date
         Product Launch        Product     Medium-term              NaT
           Market Entry         Market     Medium-term              NaT
           Market Entry         Market     Medium-term              NaT
                 Policy         Policy       Long-term              NaT
         Infrastructure Infrastructure       Long-term              NaT
Mobile Money Regulation         Policy       Long-term       2022-03-15
      4G Network Launch Infrastructure       Long-term       2023-01-20

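`classify_event_type` returns the first matching category, so the order of its keyword checks decides ambiguous names. '4G Network Launch' is one such name: it contains a Product keyword ('launch') and Infrastructure keywords ('4g', 'network'). Below is a table-driven sketch that makes the priority explicit in one ordered list. The rule order shown (Policy, Infrastructure, Product, Market) and the names `EVENT_TYPE_RULES` and `classify_by_rules` are assumptions for illustration.

```python
# Ordered (category, keywords) rules; the first category with a matching keyword wins.
# Assumed priority: Infrastructure is checked before Product.
EVENT_TYPE_RULES = [
    ('Policy', ['regulation', 'policy', 'nfis', 'central bank']),
    ('Infrastructure', ['network', '4g', 'infrastructure', 'coverage']),
    ('Product', ['launch', 'product', 'service', 'telebirr', 'm-pesa']),
    ('Market', ['entry', 'market', 'safaricom', 'competition']),
]

def classify_by_rules(event_name, rules=EVENT_TYPE_RULES, default='Other'):
    """Return the first category whose keywords appear in the lower-cased event name."""
    name = str(event_name).lower()
    for category, keywords in rules:
        if any(keyword in name for keyword in keywords):
            return category
    return default

for name in ['4G Network Launch', 'Mobile Money Regulation', 'Product Launch', 'Market Entry', None]:
    print(f"{name!s:<24} -> {classify_by_rules(name)}")
```

If the Product rule is moved ahead of the Infrastructure rule, '4G Network Launch' comes back as Product. An explicit rule list makes that kind of silent change easy to review.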

In [5]:
# Build Enhanced Event-Indicator Association Matrix
print("=== BUILDING ENHANCED EVENT-INDICATOR MATRIX ===")

# Define key indicators for analysis
key_indicators = [
    'access_account_ownership',
    'access_account_male',
    'access_account_female',
    'usage_mm_account',
    'usage_digital_payment',
    'usage_wages_account',
    'infra_mobile_penetration',
    'infra_4g_coverage',
    'infra_smartphone_penetration'
]

# Join impact links with events
impact_with_events = impact_links.merge(
    events[['record_id_num', 'indicator', 'observation_date', 'event_type', 'impact_duration']],
    left_on='parent_id_num',
    right_on='record_id_num',
    how='left',
    suffixes=('_link', '_event')
)

# Create the association matrix plus a parallel matrix of heuristic confidence scores.
# Note: links whose parent_id matches no event get a NaN event name and are pooled into a
# single NaN row; if several links hit the same cell, the last one processed wins.
event_names = impact_with_events['indicator_event'].unique()
association_matrix = pd.DataFrame(0.0, index=event_names, columns=key_indicators)
confidence_matrix = pd.DataFrame(0.0, index=event_names, columns=key_indicators)

# Fill matrices with signed impact magnitudes and confidence scores
for _, row in impact_with_events.iterrows():
    event = row['indicator_event']
    indicator = row['related_indicator']
    magnitude = row['impact_magnitude']
    direction = str(row['impact_direction']).lower()
    
    # Heuristic confidence: Global Findex source 0.9, Policy 0.8, Product 0.75, otherwise 0.7
    if 'Global Findex' in str(row['source_name']):
        confidence = 0.9
    elif row['event_type'] == 'Policy':
        confidence = 0.8
    elif row['event_type'] == 'Product':
        confidence = 0.75
    else:
        confidence = 0.7
    
    if event in association_matrix.index and indicator in association_matrix.columns:
        # Negative links flip the sign; positive or unspecified directions keep it
        signed_magnitude = -magnitude if direction == 'negative' else magnitude
        
        association_matrix.loc[event, indicator] = signed_magnitude
        confidence_matrix.loc[event, indicator] = confidence

print("Event-Indicator Association Matrix:")
print(association_matrix.round(3))
print("Confidence Matrix:")
print(confidence_matrix.round(2))

=== BUILDING ENHANCED EVENT-INDICATOR MATRIX ===
Event-Indicator Association Matrix:
                   access_account_ownership  access_account_male  \
NaN                                     0.0                  0.0   
4G Network Launch                       0.0                  0.0   

                   access_account_female  usage_mm_account  \
NaN                                  0.0               0.2   
4G Network Launch                    0.0               0.2   

                   usage_digital_payment  usage_wages_account  \
NaN                                 0.12                  0.0   
4G Network Launch                   0.00                  0.0   

                   infra_mobile_penetration  infra_4g_coverage  \
NaN                                     0.0                0.0   
4G Network Launch                       0.0                0.0   

                   infra_smartphone_penetration  
NaN                                         0.0  
4G Network Launch           
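The loop above writes one cell at a time. As a result, duplicate event-indicator links overwrite each other, and links without a matched event share a NaN row. Below is a sketch of an alternative construction with `pivot_table`. It assumes duplicate links should be averaged and unmatched links dropped. The toy `links` frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical merged links (event name, target indicator, magnitude, direction)
links = pd.DataFrame({
    'indicator_event': ['4G Network Launch', '4G Network Launch', 'Policy', np.nan],
    'related_indicator': ['usage_mm_account', 'usage_mm_account',
                          'access_account_ownership', 'usage_mm_account'],
    'impact_magnitude': [0.2, 0.1, 0.05, 0.3],
    'impact_direction': ['positive', 'positive', 'negative', 'positive'],
})

# Signed magnitude: negative links flip the sign, anything else keeps it
sign = np.where(links['impact_direction'].str.lower() == 'negative', -1.0, 1.0)
links = links.assign(signed_magnitude=links['impact_magnitude'] * sign)

# Drop links without a matched event, then average duplicate links per cell
matrix = (links.dropna(subset=['indicator_event'])
               .pivot_table(index='indicator_event', columns='related_indicator',
                            values='signed_magnitude', aggfunc='mean', fill_value=0.0))
print(matrix.round(3))
```

To align the result with the notebook's `key_indicators` list, `matrix.reindex(columns=key_indicators, fill_value=0.0)` adds the indicators that no link touches.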

In [6]:
# Enhanced Event Impact Modeling with S-Curve Adoption
print("=== ENHANCED EVENT IMPACT MODELING ===")

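def s_curve_adoption(t, k=1, m=0, a=1, b=1):
    """
    Logistic S-curve: k / (1 + exp(-a * (t - m)))
    t: time since the effect starts (months)
    k: maximum (saturation) impact
    m: inflection point, i.e. the time of fastest growth (months)
    a: growth-rate (steepness) parameter
    b: asymmetry parameter; currently unused, so the curve is symmetric around m
    """
    return k / (1 + np.exp(-a * (t - m)))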

def apply_enhanced_event_effect(base_value, magnitude, lag_months, current_date, event_date, event_type):
    """
    Apply enhanced event effect with S-curve adoption
    """
    # Convert dates
    if isinstance(current_date, str):
        current_date = pd.to_datetime(current_date)
    if isinstance(event_date, str):
        event_date = pd.to_datetime(event_date)
    
    # Calculate months since event
    months_since_event = (current_date - event_date).days / 30.44
    
    # No effect until the event date plus the lag has passed
    if months_since_event < lag_months:
        return base_value
    
    # Effective months after lag
    effective_months = months_since_event - lag_months
    
    # S-curve parameters by event type
    if event_type == 'Product':
        # Fast adoption for new products
        k, m, a, b = magnitude, 6, 0.5, 1
    elif event_type == 'Policy':
        # Slower, sustained adoption for policies
        k, m, a, b = magnitude, 12, 0.3, 1
    elif event_type == 'Infrastructure':
        # Gradual adoption for infrastructure
        k, m, a, b = magnitude, 18, 0.25, 1
    else:
        # Default parameters
        k, m, a, b = magnitude, 9, 0.4, 1
    
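    # Calculate adoption factor (the relative uplift at this point on the curve).
    # The logistic is already k / (1 + exp(a * m)) > 0 at effective_months = 0,
    # so the effect starts with a small jump once the lag has elapsed.
    adoption_factor = s_curve_adoption(effective_months, k, m, a, b)
    
    # Apply the effect multiplicatively: magnitude is a relative uplift
    # (0.15 -> +15% of base_value at saturation), not a percentage-point change
    adjusted_value = base_value * (1 + adoption_factor)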
    
    return adjusted_value

# Test enhanced model
base_val = 0.46
mag = 0.15
lag = 6
event_dt = pd.to_datetime('2021-05-01')

test_dates = ['2021-04-01', '2021-08-01', '2021-12-01', '2022-05-01', '2023-05-01']
print("Enhanced event effect test (Product launch):")
for date in test_dates:
    adjusted = apply_enhanced_event_effect(base_val, mag, lag, date, event_dt, 'Product')
    print(f"{date}: {base_val:.3f} -> {adjusted:.3f} (change: {adjusted-base_val:+.3f})")

=== ENHANCED EVENT IMPACT MODELING ===
Enhanced event effect test (Product launch):
2021-04-01: 0.460 -> 0.460 (change: +0.000)
2021-08-01: 0.460 -> 0.460 (change: +0.000)
2021-12-01: 0.460 -> 0.465 (change: +0.005)
2022-05-01: 0.460 -> 0.494 (change: +0.034)
2023-05-01: 0.460 -> 0.529 (change: +0.069)
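`apply_enhanced_event_effect` evaluates one date at a time. Its logistic is also already slightly above zero when the lag ends: for the Product parameters, the uplift is 0.15 / (1 + e^3) ≈ 0.007. Below is a vectorized sketch over a monthly timeline that rescales the curve to start at exactly zero while still saturating at `magnitude`. This rescaling is a variant introduced here, not the notebook's model, and `event_effect_path` is a hypothetical name.

```python
import numpy as np

def event_effect_path(base_value, months_since_event, magnitude, lag_months, m, a):
    """Vectorized event effect with a logistic rescaled to start at 0 when the lag ends.

    uplift(t) = magnitude * (s(t) - s(0)) / (1 - s(0)), with s(t) = 1 / (1 + exp(-a * (t - m)))
    and t = months since the lag ended. Before the lag, the value stays at base_value.
    """
    def s(x):
        return 1.0 / (1.0 + np.exp(-a * (x - m)))

    t = np.asarray(months_since_event, dtype=float) - lag_months
    s0 = s(0.0)
    uplift = magnitude * (s(np.maximum(t, 0.0)) - s0) / (1.0 - s0)
    return base_value * (1.0 + np.where(t >= 0, uplift, 0.0))

months = np.arange(0, 37, 6)  # 0, 6, ..., 36 months after the event
path = event_effect_path(0.46, months, magnitude=0.15, lag_months=6, m=6, a=0.5)
for mth, val in zip(months, path):
    print(f"month {mth:2d}: {val:.3f}")
```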


In [12]:
# Telebirr Launch Validation with Counterfactual Analysis
print("=== TELEBIRR LAUNCH VALIDATION WITH COUNTERFACTUAL ===")

# Get Telebirr event details
telebirr_event = events[events['indicator'].str.contains('Telebirr', case=False, na=False)]

if not telebirr_event.empty:
    # Safely parse telebirr date (coerce unparsable values to NaT)
    telebirr_date = pd.to_datetime(telebirr_event.iloc[0]['observation_date'], errors='coerce')
    telebirr_type = telebirr_event.iloc[0]['event_type'] if 'event_type' in telebirr_event.columns else 'Unknown'
    if pd.isna(telebirr_date):
        print('Telebirr event found but observation_date is unparsable; skipping validation')
    else:
        print(f"Telebirr launch: {telebirr_date.date()} ({telebirr_type} event)")
        
        # Get actual mobile money data
        mm_data = observations[observations['indicator_code'] == 'usage_mm_account'].copy()
        mm_data['observation_date'] = pd.to_datetime(mm_data['observation_date'], errors='coerce')
        mm_data = mm_data.sort_values('observation_date')
        # Report any unparsable mm_data dates (raw strings exist only if the loading cell kept them)
        if 'observation_date_raw' in mm_data.columns:
            unparsable_mm = mm_data.loc[mm_data['observation_date'].isna(), 'observation_date_raw'].dropna().unique()
        else:
            unparsable_mm = []  # NaT values carry no raw string to report
        if len(unparsable_mm):
            print('Unparsable mm_data observation_date samples:', unparsable_mm[:10])
        
        print("Actual mobile money adoption:")
        for _, row in mm_data.dropna(subset=['observation_date']).iterrows():
            print(f"{row['observation_date'].year}: {row['value_numeric']:.3f} ({row['value_numeric']*100:.1f}%)")
        
        # Get pre-event observations for the counterfactual trend
        pre_event_data = mm_data[mm_data['observation_date'] < telebirr_date].dropna(subset=['value_numeric'])
        
        if len(pre_event_data) >= 2:
            # Pre-event growth rate: straight line through the first and last pre-event points
            pre_event_data = pre_event_data.sort_values('observation_date')
            pre_values = pre_event_data['value_numeric'].values
            pre_years = pre_event_data['observation_date'].dt.year.values
            year_span = pre_years[-1] - pre_years[0]
            # Two observations in the same year give no usable slope; fall back to a flat trend
            growth_rate = (pre_values[-1] - pre_values[0]) / year_span if year_span else 0.0
            
            print(f"Pre-event growth rate: {growth_rate:.4f} per year")
            
            # Project the counterfactual (no Telebirr) from the last pre-event observation to 2024
            baseline_year = int(pre_years[-1])
            baseline_value = pre_values[-1]
            counterfactual_2024 = baseline_value + growth_rate * (2024 - baseline_year)
            actual_2024_series = mm_data[mm_data['observation_date'].dt.year == 2024]['value_numeric'].dropna()
            if actual_2024_series.empty:
                print('No actual 2024 mobile-money data available; cannot compute impact')
            else:
                actual_2024 = actual_2024_series.iloc[0]
                # Attributable impact = actual minus counterfactual
                actual_impact = actual_2024 - counterfactual_2024
                impact_percentage = (actual_impact / counterfactual_2024) * 100 if counterfactual_2024 != 0 else float('nan')
                
                print("Counterfactual Analysis:")
                print(f"{baseline_year} baseline: {baseline_value:.3f} ({baseline_value*100:.1f}%)")
                print(f"Counterfactual 2024: {counterfactual_2024:.3f} ({counterfactual_2024*100:.1f}%)")
                print(f"Actual 2024: {actual_2024:.3f} ({actual_2024*100:.1f}%)")
                print(f"Attributable impact: {actual_impact:+.3f} ({impact_percentage:+.1f}%)")
                
                # Compare with the model prediction for the Telebirr -> usage_mm_account link
                telebirr_impacts = impact_with_events[impact_with_events['indicator_event'].str.contains('Telebirr', case=False, na=False)]
                mm_impact = telebirr_impacts[telebirr_impacts['related_indicator'] == 'usage_mm_account']
                
                if not mm_impact.empty:
                    model_magnitude = mm_impact.iloc[0]['impact_magnitude']
                    model_lag = mm_impact.iloc[0]['lag_months']
                    model_lag = 0 if pd.isna(model_lag) else model_lag  # treat a missing lag as immediate
                    
                    predicted_2024 = apply_enhanced_event_effect(
                        counterfactual_2024, model_magnitude, model_lag,
                        '2024-12-31', telebirr_date, telebirr_type
                    )
                    
                    predicted_impact = predicted_2024 - counterfactual_2024
                    # Accuracy = 1 - relative error of the predicted vs. attributable impact
                    model_accuracy = np.nan if actual_impact == 0 else 1 - abs(predicted_impact - actual_impact) / abs(actual_impact)
                    accuracy_text = 'n/a' if np.isnan(model_accuracy) else f"{model_accuracy:.1%}"
                    
                    print("Model Validation:")
                    print(f"Model prediction: {predicted_2024:.3f} ({predicted_2024*100:.1f}%)")
                    print(f"Predicted impact: {predicted_impact:+.3f} ({predicted_impact/counterfactual_2024*100:+.1f}%)")
                    print(f"Model accuracy: {accuracy_text}")
                    
                    # Visualization: first pre-event point, last pre-event point, and 2024
                    years = [int(pre_years[0]), baseline_year, 2024]
                    actual_values = [pre_values[0], baseline_value, actual_2024]
                    counterfactual_values = [pre_values[0], baseline_value, counterfactual_2024]
                    predicted_values = [pre_values[0], baseline_value, predicted_2024]
                    
                    plt.figure(figsize=(10, 6))
                    plt.plot(years, actual_values, 'o-', label='Actual', linewidth=2, markersize=8)
                    plt.plot(years, counterfactual_values, '--', label='Counterfactual', linewidth=2)
                    plt.plot(years, predicted_values, ':', label='Model Prediction', linewidth=2)
                    plt.axvline(x=telebirr_date.year, color='red', linestyle='-', alpha=0.3, label='Telebirr Launch')
                    plt.xlabel('Year')
                    plt.ylabel('Mobile Money Account Ownership')
                    plt.title('Telebirr Launch Impact Validation')
                    plt.legend()
                    plt.grid(True, alpha=0.3)
                    plt.show()
        else:
            print("Insufficient pre-event data for counterfactual analysis")
else:
    print("Telebirr event not found in dataset")

=== TELEBIRR LAUNCH VALIDATION WITH COUNTERFACTUAL ===
Telebirr event not found in dataset
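The counterfactual above draws a straight line through only the first and last pre-event observations. Below is a sketch that instead fits an ordinary least-squares trend to all pre-event points with `np.polyfit` and extrapolates it to a target year. The example years and values are illustrative placeholders, not survey figures, and `linear_counterfactual` is a hypothetical helper.

```python
import numpy as np

def linear_counterfactual(years, values, target_year):
    """Extrapolate an OLS linear trend fitted to pre-event (year, value) points.

    Returns (projected value at target_year, fitted slope per year).
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(np.unique(years)) < 2:
        raise ValueError("need observations from at least two distinct years")
    slope, intercept = np.polyfit(years, values, deg=1)
    return slope * target_year + intercept, slope

# Illustrative pre-event series (placeholder numbers)
example_years = [2014, 2017, 2020]
example_values = [0.10, 0.14, 0.16]
projected, slope = linear_counterfactual(example_years, example_values, target_year=2024)
print(f"slope {slope:.4f}/yr, counterfactual 2024: {projected:.3f}")
```

With only two pre-event points, the least-squares line passes through both of them and matches the two-point slope used above. The two approaches therefore differ only when more observations are available.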
