# Task 4: Forecasting Access and Usage (2025–2027)

**Objective:** Forecast **Account Ownership (Access)** and **Digital Payment Usage** for 2025–2027 with trend regression, event-augmented adjustments, scenario analysis, and quantified uncertainty.

**Deliverables:**
- Define targets (Access, Usage)
- Approach: trend regression, event-augmented model, scenario analysis (given sparse Findex data)
- Forecasts: baseline (trend), with events, scenarios (optimistic, base, pessimistic)
- Uncertainty: confidence intervals, scenario ranges, limitations
- Interpretation: predictions, largest event impacts, key uncertainties

---
## 1. Define Targets

- **Account Ownership Rate (Access):** % of adults (15+) with an account at a financial institution or using mobile money (past 12 months). *Indicator: `ACC_OWNERSHIP`.*
- **Digital Payment Usage:** % of adults who made or received a digital payment (past 12 months). *Indicator: `USG_DIGITAL_PAY`.*

In [None]:
import sys
from pathlib import Path
ROOT = Path.cwd().parent if "notebooks" in str(Path.cwd()) else Path.cwd()
sys.path.insert(0, str(ROOT))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from src.data import load_unified_data, enrich_unified_data
from src.analysis import get_access_series, get_usage_series, get_events_timeline
from src.models import (
    build_impact_matrix,
    build_event_indicator_association_matrix,
    forecast_access_usage,
    forecast_access_usage_with_uncertainty,
)

pd.set_option("display.max_columns", 24)
sns.set_theme(style="whitegrid")

---
## 2. Load Data and Inspect Sparse Series

We have **5 Findex-style points** over ~13 years for Access (2011, 2014, 2017, 2021, 2024) and fewer for Usage (2021, 2024). This sparsity motivates:
- **Trend regression** (linear)
- **Event-augmented model** (trend + lagged event effects)
- **Scenario analysis** to span uncertainty

In [None]:
df = load_unified_data()
df = enrich_unified_data(df)

access_hist = get_access_series(df, indicator_code="ACC_OWNERSHIP")
usage_hist = get_usage_series(df, indicator_code="USG_DIGITAL_PAY")

print("Access (ACC_OWNERSHIP) — by year:")
acc_by_year = access_hist.copy()
acc_by_year.index = pd.to_datetime(acc_by_year.index).year
print(acc_by_year.groupby(level=0).mean().to_string())
print("\nUsage (USG_DIGITAL_PAY) — by year:")
usage_by_year = usage_hist.copy()
usage_by_year.index = pd.to_datetime(usage_by_year.index).year
print(usage_by_year.groupby(level=0).mean().to_string())
print("\nSparse data: 5 Findex points for Access over 13 years; 2 for Usage. CIs will be wide.")

---
## 3. Approach: Trend, Event-Augmented, and Scenarios

1. **Baseline:** Linear trend regression on historical years; extrapolate 2025–2027.
2. **With events:** Baseline + cumulative event effects from `impact_link` (lagged by `lag_months`).
3. **Scenarios:** Same trend; event effects scaled — **optimistic** (1.2×), **base** (1×), **pessimistic** (0.6×).
4. **Uncertainty:** 95% prediction intervals from trend regression; scenario ranges; explicit limitations.
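
As a rough illustration of items 1 and 4, the sketch below fits an ordinary least-squares line and computes a textbook 95% prediction interval for each forecast year. It is not the implementation inside `src.models`, which is not shown in this notebook. The function name `linear_trend_interval`, the hard-coded t-table, and the input values are all assumptions made for the example; the values are invented, not Findex data.

```python
import math

# Two-sided 95% Student-t critical values by degrees of freedom (standard table values).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def linear_trend_interval(years, values, forecast_years):
    """Hypothetical helper: OLS line plus a 95% prediction interval per forecast year."""
    n = len(years)
    dof = n - 2  # two parameters (slope, intercept) are estimated
    if dof not in T_CRIT_95:
        raise ValueError("this sketch only covers 3 to 7 observations")
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error of the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    s = math.sqrt(sse / dof)
    rows = []
    for x0 in forecast_years:
        point = intercept + slope * x0
        # The interval widens as x0 moves away from the mean of the observed years
        half = T_CRIT_95[dof] * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
        rows.append((x0, point, point - half, point + half))
    return rows

# Invented values on the Access survey years, for illustration only
years = [2011, 2014, 2017, 2021, 2024]
values = [10.0, 17.0, 21.0, 31.0, 36.0]
rows = linear_trend_interval(years, values, [2025, 2026, 2027])
for year, point, lower, upper in rows:
    print(f"{year}: {point:.1f} [{lower:.1f}, {upper:.1f}]")
```

With five points the interval uses three degrees of freedom, so the critical value (3.182) is much larger than the familiar 1.96. With only two points, as for Usage, there are zero residual degrees of freedom and this formula gives no interval at all; the sketch raises a `ValueError` in that case.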

In [None]:
FORECAST_YEARS = [2025, 2026, 2027]
confidence = 0.95

access_f, usage_f = forecast_access_usage_with_uncertainty(
    df,
    forecast_years=FORECAST_YEARS,
    apply_events=True,
    trend_method="linear",
    confidence=confidence,
    scenario_optimistic_mult=1.2,
    scenario_pessimistic_mult=0.6,
)

print("Forecasts with uncertainty (confidence=%.0f%%)" % (confidence * 100))
print("Access (Account Ownership):")
display(access_f)
print("Usage (Digital Payment Usage):")
display(usage_f)

---
## 4. Forecast Table with Confidence Intervals

Summary table: year, baseline (trend only), event-adjusted (base), 95% CI (lower, upper), and scenario range (optimistic / pessimistic).

In [None]:
def format_forecast_table(forecast_df: pd.DataFrame, title: str) -> pd.DataFrame:
    """Build a display table: year, baseline, adjusted, CI, scenarios."""
    out = forecast_df[
        [
            "year",
            "value_baseline",
            "value_adjusted",
            "ci_lower",
            "ci_upper",
            "scenario_optimistic",
            "scenario_base",
            "scenario_pessimistic",
        ]
    ].copy()
    out = out.rename(columns={
        "value_baseline": "Baseline (trend)",
        "value_adjusted": "With events (base)",
        "ci_lower": "95% CI lower",
        "ci_upper": "95% CI upper",
        "scenario_optimistic": "Scenario: Optimistic",
        "scenario_base": "Scenario: Base",
        "scenario_pessimistic": "Scenario: Pessimistic",
    })
    for c in out.select_dtypes(include=[np.floating]).columns:
        out[c] = out[c].round(2)
    return out

access_table = format_forecast_table(access_f, "Account Ownership (Access)")
usage_table = format_forecast_table(usage_f, "Digital Payment Usage")

print("=== Forecast table: Account Ownership (Access) — % ===")
display(access_table)
print("\n=== Forecast table: Digital Payment Usage — % ===")
display(usage_table)

---
## 5. Scenario Visualization

Plot historical series + baseline trend, event-adjusted (base), and scenario bands (optimistic / pessimistic) for 2025–2027.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

def plot_forecast_with_scenarios(ax, hist_series, forecast_df, title, color="C0"):
    # Collapse to one value per year when the index is datetime-like;
    # otherwise assume the index already holds years.
    if hasattr(hist_series.index, "year"):
        by_year = hist_series.groupby(hist_series.index.year).mean()
    else:
        by_year = hist_series
    ax.scatter(by_year.index, by_year.values, color=color, s=60, zorder=3, label="Historical")
    y = forecast_df["year"]
    ax.plot(y, forecast_df["value_baseline"], "--", color="gray", label="Baseline (trend)")
    ax.plot(y, forecast_df["value_adjusted"], "-", color=color, linewidth=2, label="With events (base)")
    ax.fill_between(y, forecast_df["ci_lower"], forecast_df["ci_upper"], alpha=0.2, color=color, label="95% CI (trend)")
    ax.fill_between(y, forecast_df["scenario_pessimistic"], forecast_df["scenario_optimistic"], alpha=0.15, color=color, label="Scenario range")
    ax.set_title(title)
    ax.set_xlabel("Year")
    ax.set_ylabel("%")
    ax.legend(loc="best", fontsize=8)
    ax.set_ylim(0, None)
    ax.grid(True, alpha=0.3)

plot_forecast_with_scenarios(axes[0], access_hist, access_f, "Account Ownership (Access)", color="#1f77b4")
plot_forecast_with_scenarios(axes[1], usage_hist, usage_f, "Digital Payment Usage", color="#2ca02c")
plt.tight_layout()
plt.suptitle("Forecasts 2025–2027: Baseline, event-adjusted, 95% CI, scenario range", y=1.02)
plt.show()

---
## 6. Event Impact Summary (Largest Potential Impacts)

Which events have the largest estimated impact on Access and Usage in the forecast window?
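
Whether an event counts toward the window depends on its date plus its lag, not on the event date alone. The lag arithmetic can be checked by hand with the minimal sketch below. The helper name `shift_months` and the event dates and lags are hypothetical and chosen only to show one case before, one inside, and one after the window.

```python
from datetime import date

def shift_months(d, months):
    """Return the (year, month) reached by moving `months` calendar months from d."""
    total = d.year * 12 + (d.month - 1) + months
    return total // 12, total % 12 + 1

FORECAST_WINDOW = range(2025, 2028)  # 2025, 2026, 2027

# Hypothetical (event_date, lag_months) pairs, for illustration only
examples = [
    (date(2023, 3, 1), 12),   # effect lands before the window
    (date(2024, 11, 1), 6),   # event precedes the window, effect lands inside it
    (date(2026, 10, 1), 18),  # event is inside the window, effect lands after it
]

results = []
for event_date, lag in examples:
    year, month = shift_months(event_date, lag)
    results.append((year, month, year in FORECAST_WINDOW))
    print(event_date, f"+{lag} months ->", (year, month), "in window:", year in FORECAST_WINDOW)
```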

In [None]:
impact_matrix = build_impact_matrix(df)
events = df[df["record_type"] == "event"].copy()
events["observation_date"] = pd.to_datetime(events["observation_date"])
impact_matrix["event_date"] = pd.to_datetime(impact_matrix["event_date"])

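def lagged_year(event_date, lag_months):
    """Calendar year in which an effect lands, `lag_months` after the event date."""
    return event_date.year + (event_date.month - 1 + lag_months) // 12

def lag_or_zero(value):
    """Treat a missing lag (None or NaN) as zero months."""
    return 0 if pd.isna(value) else int(value)

impact_matrix["effect_year"] = impact_matrix.apply(
    lambda r: lagged_year(r["event_date"], lag_or_zero(r.get("lag_months"))), axis=1
)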
effect_pp = impact_matrix["impact_estimate"].astype(float)
effect_pp = effect_pp.where(impact_matrix["impact_direction"].str.lower() != "decrease", -effect_pp)
impact_matrix["effect_pp"] = effect_pp

in_window = impact_matrix[impact_matrix["effect_year"].between(2025, 2027)]
summary = in_window.groupby(["event_id", "related_indicator"]).agg({"effect_pp": "sum"}).reset_index()
summary = summary.merge(events[["record_id", "indicator"]].rename(columns={"record_id": "event_id"}), on="event_id", how="left")
summary["indicator_name"] = summary["indicator"].fillna(summary["event_id"])
summary = summary.sort_values("effect_pp", key=abs, ascending=False)

print("Event effects in forecast window (2025–2027), by event and indicator (percentage points):")
display(summary[["event_id", "indicator_name", "related_indicator", "effect_pp"]])

---
## 7. Written Interpretation

### What does the model predict?
- **Access (Account Ownership):** The **baseline (trend only)** extrapolates past growth; **with events** adds the cumulative impact of product launches and policy changes recorded in `impact_link`. The resulting **base forecast** continues the historical upward path, and its level depends on how many event effects fall in 2025–2027 (e.g. Telebirr, interoperability, Fayda).
- **Usage (Digital Payment):** Likewise, **base** = trend + event effects. Usage has only two historical points (2021, 2024), so trend uncertainty is higher than for Access; event effects (e.g. NBE interoperability, digital ID) can add several percentage points in the forecast window.

### What events have the largest potential impact?
- Events that affect **ACC_OWNERSHIP** in 2025–2027 (e.g. Telebirr, Fayda digital ID) have the largest impact on **Access**.
- Events that affect **USG_DIGITAL_PAY** (e.g. NBE Interoperability, Fayda) have the largest impact on **Usage**.
- The table above lists effect sizes (percentage points) by event and indicator; the ones with the largest |effect_pp| in the window dominate.

### What are the key uncertainties?
- **Sparse data:** Only 5 Findex points over 13 years (and 2 for Usage). Trend estimates and **confidence intervals are wide**; extrapolation uncertainty is high.
- **Event estimates:** Impact magnitudes and lags are from literature/comparable countries, not estimated from Ethiopian data; **magnitude and timing** are uncertain.
- **Additivity:** Effects are summed; **overlap and saturation** are not modeled (e.g. ceiling at 100%, interaction between events).
- **Scenarios:** The optimistic and pessimistic scenarios scale **event effects** only; trend uncertainty is reflected in the **CI**, not in the scenario band.
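
The last point can be made concrete with a small sketch. It assumes each scenario is computed as baseline plus a multiplier times the cumulative event effect, which matches the description in Section 3 but is not taken from the `src.models` code. The function name `scenario_paths` and all numbers are hypothetical.

```python
def scenario_paths(baseline, event_effect, optimistic_mult=1.2, pessimistic_mult=0.6):
    """Hypothetical helper: scale only the event effect; the trend is shared by all scenarios."""
    return {
        "optimistic": [b + optimistic_mult * e for b, e in zip(baseline, event_effect)],
        "base": [b + e for b, e in zip(baseline, event_effect)],
        "pessimistic": [b + pessimistic_mult * e for b, e in zip(baseline, event_effect)],
    }

# Invented numbers, for illustration only
baseline = [40.0, 42.0, 44.0]    # trend-only values (%), 2025 to 2027
event_effect = [0.0, 2.0, 5.0]   # cumulative event effects (percentage points)

paths = scenario_paths(baseline, event_effect)
band_width = [o - p for o, p in zip(paths["optimistic"], paths["pessimistic"])]
for year, width in zip([2025, 2026, 2027], band_width):
    print(year, f"scenario band width: {width:.1f} pp")
```

In a year with no event effect the band collapses to zero width, however uncertain the trend itself is. That is why the scenario range and the trend CI should be read together rather than as substitutes.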

### Limitations (explicit acknowledgment)
- Forecasts are **illustrative** and not official NBE or World Bank projections.
- No formal econometric estimation of event effects (e.g. DiD); estimates are expert/literature-driven.
- Confidence intervals are **trend-only** (prediction interval from linear regression); they do not include uncertainty in event impacts.
- Scenario ranges reflect **alternative realizations of event impact strength**, not full parameter uncertainty.

In [None]:
# Optional: export forecast table to CSV for reporting
out_dir = ROOT / "data" / "processed"
out_dir.mkdir(parents=True, exist_ok=True)
access_table.to_csv(out_dir / "forecast_access_2025_2027.csv", index=False)
usage_table.to_csv(out_dir / "forecast_usage_2025_2027.csv", index=False)
print("Forecast tables saved to data/processed/forecast_access_2025_2027.csv and forecast_usage_2025_2027.csv")