# Task 4: Forecasting Access and Usage (2025–2027)

**Objective**:
- Forecast two core indicators:
  1. **Access** = Account Ownership Rate (ACC_OWNERSHIP, % adults 15+)
  2. **Usage** = Digital Payment Adoption Rate (USG_DIGITAL_PAYMENT, % adults who made or received a digital payment in the past year)
- Use trend continuation + event impacts from Task 3
- Generate Baseline, Optimistic, Pessimistic scenarios
- Quantify uncertainty (wide bands due to sparse data)

**Approach**:
- Linear (OLS) regression on historical survey points, with step dummies for key events
- Scenarios based on NDPS 2026–2030 targets and account-activation challenges
- Wide uncertainty bands (±8–15 pp), because the historical series has very few points
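As a minimal illustration of the trend-plus-step-dummy idea, the sketch below fits an intercept, a linear year trend and one event dummy by least squares with NumPy. The survey years and the true coefficients (10, +2 pp/yr, +5 pp jump) are synthetic values chosen so the recovered estimates can be checked; they are not Ethiopian data.

```python
import numpy as np

# Synthetic, noise-free toy series (NOT real survey values): baseline 10%,
# +2 pp per year, and a +5 pp step after a hypothetical event in 2021.
years = np.array([2011.0, 2014.0, 2017.0, 2021.0, 2024.0])
step = (years >= 2021.0).astype(float)            # step dummy: 0 before, 1 on/after
y = 10.0 + 2.0 * (years - 2011.0) + 5.0 * step

# Design matrix: intercept, years since 2011, event dummy
X = np.column_stack([np.ones_like(years), years - 2011.0, step])
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, jump = coef

print(f"intercept={intercept:.2f}, slope={slope:.2f} pp/yr, jump={jump:.2f} pp, rank={rank}")
print("residual degrees of freedom:", len(y) - rank)
```

With five points and three parameters only two residual degrees of freedom remain, which is why every extra event dummy weakens the fit so quickly.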

Date: February 3, 2026

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm  # not installed in the original run: `pip install statsmodels`
from pathlib import Path

pd.set_option('display.max_columns', 60)
sns.set_style("whitegrid")

# ────────────────────────────────────────────────
# LOAD ENRICHED DATA (from Task 1/2)
# ────────────────────────────────────────────────
# Works whether the notebook runs from the project root or from a subfolder (e.g. notebooks/)
PROJECT_ROOT = Path.cwd()
if not (PROJECT_ROOT / "data").exists():
    PROJECT_ROOT = PROJECT_ROOT.parent
ENRICHED_PATH = PROJECT_ROOT / "data" / "processed" / "ethiopia_fi_unified_enriched_20260131.csv"

df = pd.read_csv(ENRICHED_PATH, parse_dates=['observation_date'])
print("Loaded shape:", df.shape)

# Keep observation records only
obs = df[df['record_type'] == 'observation'].copy()
obs['year'] = obs['observation_date'].dt.year.astype(float)  # numeric trend regressor

# Keep national ("all") rows so gender-disaggregated points don't distort the trend
obs_national = obs[obs['gender'].fillna('all') == 'all'].copy()

# Access and usage modelling frames used by later cells
access = obs_national[obs_national['indicator_code'] == 'ACC_OWNERSHIP'][['observation_date', 'year', 'value_numeric']].copy()
access = access.rename(columns={'value_numeric': 'access_pct'}).sort_values('observation_date')

usage = obs_national[obs_national['indicator_code'] == 'USG_DIGITAL_PAYMENT'][['observation_date', 'year', 'value_numeric']].copy()
usage = usage.rename(columns={'value_numeric': 'usage_pct'}).sort_values('observation_date')

# Event dates for step dummies: 0 before the event, 1 on or after it
events = {
    'telebirr': '2021-05-17',
    'mpesa': '2023-08-01',
    'fayda': '2024-01-01',
    'ndps': '2025-12-08',
}

for name, date in events.items():
    ts = pd.Timestamp(date)
    access[f'dummy_{name}'] = (access['observation_date'] >= ts).astype(int)
    usage[f'dummy_{name}'] = (usage['observation_date'] >= ts).astype(int)

print('Access points:', len(access), '| Usage points:', len(usage))
```
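Because the event dates sit close together relative to the survey rounds, some step dummies can end up constant over the sample (an event after the last observation) or identical to another dummy; neither can be estimated separately. The sketch below reuses the event dates above and flags such columns with the standard library. The year-end observation dates and the helper `non_identifiable` are hypothetical; the real dates come from the enriched CSV.

```python
from datetime import date

# Hypothetical year-end observation dates, one per survey round (illustrative only)
obs_dates = [date(2011, 12, 31), date(2014, 12, 31), date(2017, 12, 31),
             date(2021, 12, 31), date(2024, 12, 31)]
events = {
    'telebirr': date(2021, 5, 17),
    'mpesa': date(2023, 8, 1),
    'fayda': date(2024, 1, 1),
    'ndps': date(2025, 12, 8),
}

# Step dummy: 1 for observations on/after the event date, else 0
dummies = {name: tuple(int(d >= ev) for d in obs_dates) for name, ev in events.items()}

def non_identifiable(dummies):
    """Hypothetical helper: flag dummies that are constant or duplicate an earlier one."""
    problems, seen = {}, {}
    for name, col in dummies.items():
        if len(set(col)) == 1:
            problems[name] = 'constant in sample'
        elif col in seen:
            problems[name] = f'identical to {seen[col]}'
        else:
            seen[col] = name
    return problems

print(dummies)
print(non_identifiable(dummies))
```

Under these assumed dates, `fayda` duplicates `mpesa` (both switch on only at the 2024 round) and `ndps` is zero throughout, so only the telebirr step is separately identifiable.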

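```python
# Access model: linear year trend + event step dummies
dummy_cols = [col for col in access.columns if col.startswith('dummy_')]
X_access = sm.add_constant(access[['year'] + dummy_cols], has_constant='add')
y_access = access['access_pct']

# With so few survey points, dummies that never switch on, or that duplicate another
# dummy, cannot be estimated separately; warn instead of silently trusting them
rank = np.linalg.matrix_rank(X_access.to_numpy(dtype=float))
if rank < X_access.shape[1]:
    print(f"Warning: design matrix rank {rank} < {X_access.shape[1]} columns; "
          "some event effects are not separately identifiable.")

model_access = sm.OLS(y_access, X_access).fit()
print(model_access.summary().tables[1])  # coefficient table

# In-sample fitted values
access['fitted'] = model_access.predict(X_access)
```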
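statsmodels `OLS.fit()` uses a pseudoinverse by default (`method='pinv'`), so a rank-deficient design does not raise an error. The sketch below, on synthetic noise-free data, shows what happens instead: with two identical dummies, the minimum-norm solution splits the jump evenly between them. The variable names mirror the event names only for readability; the numbers are not survey values.

```python
import numpy as np

# Synthetic, noise-free toy data (NOT real survey values)
t = np.array([0.0, 3.0, 6.0, 10.0, 13.0])        # years since the first survey round
d_mpesa = np.array([0.0, 0.0, 0.0, 0.0, 1.0])     # switches on only at the last round
d_fayda = d_mpesa.copy()                          # identical column: perfectly collinear
y = 10.0 + 2.0 * t + 6.0 * d_mpesa                # true combined jump = 6 pp

X = np.column_stack([np.ones_like(t), t, d_mpesa, d_fayda])
rank = np.linalg.matrix_rank(X, tol=1e-10)
beta = np.linalg.pinv(X, rcond=1e-10) @ y         # minimum-norm least-squares solution

print("rank:", rank, "of", X.shape[1], "columns")
print("coefficients (const, trend, mpesa, fayda):", np.round(beta, 3))
# The 6 pp jump is split 3 + 3: the data cannot say which event caused it.
```

The individual event coefficients in such a fit therefore reflect the pseudoinverse's tie-breaking rule rather than evidence about either event.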

```python
# Forecast horizon: quarter-end dates 2025Q1–2027Q4 (quarterly for smoother plots; annual values reported)
# The 'QE-DEC' alias requires pandas >= 2.2 (the old 'Q' alias is no longer accepted)
future_dates = pd.date_range(start='2025-01-01', end='2027-12-31', freq='QE-DEC')
future_df = pd.DataFrame({'observation_date': future_dates})
# Mid-quarter decimal year, e.g. 2025Q1 -> 2025.125
future_df['year'] = (future_df['observation_date'].dt.year
                     + future_df['observation_date'].dt.quarter / 4 - 0.125)

# Event dummies for the horizon, using the same on/after-date rule as the history
for name, date in events.items():
    future_df[f'dummy_{name}'] = (future_df['observation_date'] >= pd.Timestamp(date)).astype(int)

# has_constant='add' is required: the telebirr/M-Pesa/Fayda dummies equal 1 over the whole
# horizon, and the default 'skip' would treat them as an existing constant and omit 'const'
dummy_cols = [col for col in future_df.columns if col.startswith('dummy_')]
X_future = sm.add_constant(future_df[['year'] + dummy_cols], has_constant='add')

# Baseline = model prediction, with columns aligned to the fitted model
future_df['baseline_access'] = model_access.predict(X_future[model_access.params.index])

# Scenarios: baseline +/- an offset (pp) that widens linearly over the horizon
n = len(future_df)
future_df['optimistic_access'] = future_df['baseline_access'] + np.linspace(3, 12, n)   # strong NDPS effect
future_df['pessimistic_access'] = future_df['baseline_access'] - np.linspace(1, 6, n)   # activation stays low

# Rough usage forecast (less data -> simpler linear model, bigger NDPS upside)
if not usage.empty:
    usage_dummy_cols = [col for col in usage.columns if col.startswith('dummy_')]
    X_usage = sm.add_constant(usage[['year'] + usage_dummy_cols], has_constant='add')
    model_usage = sm.OLS(usage['usage_pct'], X_usage).fit()
    future_df['baseline_usage'] = model_usage.predict(X_future[model_usage.params.index])
    future_df['optimistic_usage'] = future_df['baseline_usage'] + np.linspace(4, 15, n)
    future_df['pessimistic_usage'] = future_df['baseline_usage'] - np.linspace(2, 8, n)
else:
    # Fallback if there are no usage points: start from 21% in 2024 and add 3 pp/year
    future_df['baseline_usage'] = 21 + (future_df['year'] - 2024) * 3
    future_df['optimistic_usage'] = future_df['baseline_usage'] + 5
    future_df['pessimistic_usage'] = future_df['baseline_usage'] - 3

# Annual summary: Q4 (end-of-year) rows
annual_forecast = future_df[future_df['observation_date'].dt.month == 12].copy()
annual_forecast['year_int'] = annual_forecast['observation_date'].dt.year
print("Annual Forecast Summary:\n",
      annual_forecast[['year_int', 'baseline_access', 'optimistic_access', 'pessimistic_access',
                       'baseline_usage', 'optimistic_usage', 'pessimistic_usage']])
```
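The horizon arithmetic above can be checked by hand. The standard-library sketch below reproduces the mid-quarter decimal years and the linearly widening scenario offsets (an inclusive `linspace` like NumPy's default) for the 12 quarters 2025Q1 to 2027Q4. It then prints the offsets carried by the Q4 rows that make up the annual summary. The `linspace` function here is a hypothetical stand-in for `np.linspace`.

```python
def mid_quarter_year(year: int, quarter: int) -> float:
    """Decimal year at the middle of a quarter: year + quarter/4 - 0.125."""
    return year + quarter / 4 - 0.125

def linspace(start: float, stop: float, n: int) -> list:
    """Evenly spaced values from start to stop inclusive (like numpy.linspace)."""
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

quarters = [(y, q) for y in (2025, 2026, 2027) for q in (1, 2, 3, 4)]
decimal_years = [mid_quarter_year(y, q) for y, q in quarters]
opt_offsets = linspace(3, 12, len(quarters))   # optimistic access uplift, pp
pes_offsets = linspace(1, 6, len(quarters))    # pessimistic access shortfall, pp

for (y, q), opt, pes in zip(quarters, opt_offsets, pes_offsets):
    if q == 4:  # end-of-year rows used in the annual summary
        print(f"{y}Q4: optimistic +{opt:.2f} pp, pessimistic -{pes:.2f} pp")
```

Because the offsets grow by 9/11 pp and 5/11 pp per quarter, the end-of-year rows carry optimistic uplifts of about 5.45, 8.73 and 12.00 pp and pessimistic shortfalls of about 2.36, 4.18 and 6.00 pp for 2025, 2026 and 2027.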

```python
# Resolve the figures folder from PROJECT_ROOT so the path works from any working directory
fig_dir = PROJECT_ROOT / "reports" / "figures"
fig_dir.mkdir(parents=True, exist_ok=True)

plt.figure(figsize=(12, 7))

# Historical survey points
plt.plot(access['observation_date'], access['access_pct'], 'o-', color='navy', label='Historical Access', linewidth=2.5)

# Scenario forecasts
plt.plot(future_df['observation_date'], future_df['baseline_access'], '--', color='blue', label='Baseline')
plt.plot(future_df['observation_date'], future_df['optimistic_access'], color='green', label='Optimistic (+NDPS/Fayda boost)')
plt.plot(future_df['observation_date'], future_df['pessimistic_access'], color='red', label='Pessimistic (low activation)')

# Heuristic uncertainty band: fixed ±10 pp around the baseline (not a statistical interval)
plt.fill_between(future_df['observation_date'],
                 future_df['baseline_access'] - 10,
                 future_df['baseline_access'] + 10,
                 color='blue', alpha=0.12, label='Wide uncertainty (±10 pp)')

plt.title("Ethiopia Financial Inclusion Forecast: Access Rate 2025–2027")
plt.ylabel("% Adults (15+) with Account")
# Mark the NDPS launch date used for the event dummy
plt.axvline(pd.Timestamp(events['ndps']), color='purple', ls=':', label='NDPS launch (effect starts)')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(fig_dir / "task4_access_forecast_scenarios.png", dpi=150)
plt.show()

# A similar plot for Usage can reuse the *_usage columns
```
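The fixed ±10 pp band does not widen with forecast distance. One standard alternative, sketched below for a trend-only model (no event dummies), is the OLS prediction interval `ŷ ± t·s·√(1 + 1/n + (x₀ − x̄)²/Sxx)`, where `s` is the residual standard error. It assumes independent, normally distributed errors, which is a strong assumption with so few points. The function name is hypothetical, and the Student-t critical value is passed in to keep the sketch dependency-free (for a 95% two-sided interval, 4.303 with 2 degrees of freedom, 3.182 with 3).

```python
import math

def trend_prediction_interval(xs, ys, x0, t_crit):
    """Hypothetical helper: fit y = a + b*x by OLS; return (y_hat, lower, upper) at x0.

    t_crit is the two-sided Student-t critical value for n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))        # residual standard error
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y0 = a + b * x0
    return y0, y0 - t_crit * se_pred, y0 + t_crit * se_pred

# Toy usage on synthetic numbers (not survey data): n = 4, so df = 2 and t_crit = 4.303
xs, ys = [0, 1, 2, 3], [1.0, 3.5, 4.5, 7.0]
print(trend_prediction_interval(xs, ys, 1.5, t_crit=4.303))   # near the data: narrower
print(trend_prediction_interval(xs, ys, 6, t_crit=4.303))     # extrapolated: wider
```

The `(x₀ − x̄)²/Sxx` term is what makes the band fan out as the horizon moves away from the observed years, which the fixed ±10 pp band cannot capture.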

### Forecast Interpretation (2026–2027)

- **Baseline**: Slow continuation (~1–2 pp/year growth) → Access ~54–58% by 2027
- **Optimistic**: NDPS + Fayda activation push → could reach 62–68% (closer to the legacy NFIS 70% target)
- **Pessimistic**: Dormant accounts persist → Access stalls at ~51–54%
- **Usage**: Even more uncertain; the optimistic scenario approaches 30–40% under strong policy support

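**Limitations**: With only a handful of survey points, the regressions have very few residual degrees of freedom, so results are highly sensitive to which event dummies are included and to the scenario offsets. The ±10 pp band is a rough heuristic rather than a statistical interval; wide bands are essential.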

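Save forecasts:
```python
# Resolve the output folder from PROJECT_ROOT so the path works from any working directory
out_dir = PROJECT_ROOT / "reports" / "figures"
out_dir.mkdir(parents=True, exist_ok=True)
annual_forecast.to_csv(out_dir / "task4_annual_forecast_2025_2027.csv", index=False)
```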