# Synthetic Control (donor-mode) vs regression-based counterfactual

This notebook shows **why donor-based synthetic control is a different tool** from an impact-like, regression-style counterfactual.

We construct a synthetic example where:
- **Donor series** (untreated controls) are stable and track the treated unit in the pre-period.
- A **confounded covariate** changes at the intervention date (a common failure mode for regression-based causal time-series methods).

We then compare:
- **Impact-like** (regression on covariates + optional time features)
- **Synthetic Control (lite)** in **donor mode** (ridge-weighted donor combination)
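
One way to read the comparison: both estimators fit a model on the pre-period and extrapolate it into the post-period, and they differ only in what goes into the regressor vector. The notation below is an assumption made for illustration (it presumes both are ridge fits, as the shared `ridge_alpha` setting later in the notebook suggests) and is not taken from `tecore` itself:

$$
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta} \sum_{t < T_0} \left(y_t - \beta_0 - x_t^\top \beta\right)^2 + \alpha \lVert \beta \rVert_2^2,
\qquad
\hat y_t = \hat\beta_0 + x_t^\top \hat\beta,
\qquad
\hat\tau_t = y_t - \hat y_t \quad (t \ge T_0).
$$

For the impact-like estimator, $x_t$ holds covariates and time features; in donor mode, $x_t$ holds the donor outcomes at time $t$. The estimated effect $\hat\tau_t$ is only trustworthy if $x_t$ itself is unaffected by the intervention.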

In [None]:
import sys
from pathlib import Path

# Assumption for repo runs: notebook executed with CWD = repo root
repo_root = Path.cwd()
sys.path.insert(0, str(repo_root / "src"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tecore.causal import DataSpec, ImpactConfig, ImpactMethod, run_impact

## 1) Generate a donor-style synthetic dataset

- `y` is the treated unit outcome.
- `donor_1..donor_k` are untreated controls (not affected by the intervention).
- `marketing_spend` is **confounded**: it jumps at the intervention date (as if the campaign caused both spend and outcome).

In the real world, **marketing_spend** (and sometimes sessions/DAU proxies) often *moves because of the intervention*,
so using it as a covariate can bias counterfactual estimates.
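
A minimal, self-contained sketch of this failure mode, using plain NumPy rather than `tecore`. All names and numbers here (`x_organic`, the 0.5 slope, the +10 jump) are illustrative assumptions and are unrelated to the dataset generated below:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t0 = 200, 120
true_effect = 6.0

# Hypothetical covariate level that would have occurred without the intervention
x_organic = 100 + rng.normal(0, 5.0, size=n)

# Outcome: depends on the organic covariate, plus a level shift from t0 onward
y = 2.0 + 0.5 * x_organic + rng.normal(0, 0.5, size=n)
y[t0:] += true_effect

# Observed covariate: also jumps at t0 because the intervention moves it
x_obs = x_organic.copy()
x_obs[t0:] += 10.0

# Fit a straight line on the pre-period only (np.polyfit returns slope first)
slope, intercept = np.polyfit(x_obs[:t0], y[:t0], 1)

# Counterfactual built from the post-treatment covariate absorbs the jump
y_hat_bad = intercept + slope * x_obs
est_bad = float((y[t0:] - y_hat_bad[t0:]).mean())

# Counterfactual built from the (normally unobservable) organic covariate
y_hat_clean = intercept + slope * x_organic
est_clean = float((y[t0:] - y_hat_clean[t0:]).mean())

print(f"true effect: {true_effect:.2f}")
print(f"estimate with post-treatment covariate: {est_bad:.2f}")
print(f"estimate with organic covariate:        {est_clean:.2f}")
```

In this toy setup the covariate jump is fed straight into the counterfactual, so most of the true effect is explained away. In practice the organic covariate is not observed, which is why a covariate that reacts to the intervention is a poor control.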

In [None]:
rng = np.random.default_rng(42)

n_days = 200
start_date = pd.Timestamp("2025-01-01")
intervention_date = pd.Timestamp("2025-05-01")
dates = pd.date_range(start_date, periods=n_days, freq="D")
t0 = int((intervention_date - start_date).days)

t = np.arange(n_days)

# Latent common factor (trend + weekly seasonality)
trend = 0.03 * t
weekly = 2.0 * np.sin(2.0 * np.pi * t / 7.0)
latent = trend + weekly

# Donor series: noisy views of the latent factor (NOT treated)
n_donors = 5
donor_weights_true = rng.normal(loc=1.0, scale=0.2, size=n_donors)
donors = {}
for i in range(n_donors):
    donors[f"donor_{i+1}"] = 50 + donor_weights_true[i] * latent + rng.normal(0, 1.5, size=n_days)

# Treated baseline: a weighted combination of donors (plus some idiosyncratic noise)
W = rng.normal(loc=0.3, scale=0.15, size=n_donors)
W = np.clip(W, 0.05, None)
W = W / W.sum()

donor_mat = np.column_stack([donors[f"donor_{i+1}"] for i in range(n_donors)])
y_base = donor_mat @ W + rng.normal(0, 1.0, size=n_days)

# True intervention effect: level shift on y from index t0 onward
true_level_shift = 6.0
effect = np.zeros(n_days)
effect[t0:] = true_level_shift
y = y_base + effect

# Confounded covariate: also shifts at T0 (bad control)
marketing_spend = 100 + 0.2 * t + 10 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 3.0, size=n_days)
marketing_spend[t0:] += 25  # confounding jump

# Additional "activity" proxies. In real data these can also be affected by the
# intervention; here they only track the latent factor and are not shifted.
sessions = 1000 + 8 * latent + rng.normal(0, 15.0, size=n_days)
active_users = 300 + 3.5 * latent + rng.normal(0, 6.0, size=n_days)

df = pd.DataFrame(
    {
        "date": dates,
        "y": y,
        "marketing_spend": marketing_spend,
        "sessions": sessions,
        "active_users": active_users,
        **donors,
    }
)

df.head()

In [None]:
plt.figure(figsize=(11, 4))
plt.plot(df["date"], df["y"], label="treated y")
plt.axvline(intervention_date, linestyle="--", label="intervention")
plt.title("Treated series with a true level shift at intervention")
plt.legend()
plt.tight_layout()
plt.show()

plt.figure(figsize=(11, 3.5))
plt.plot(df["date"], df["marketing_spend"], label="marketing_spend (confounded)")
plt.axvline(intervention_date, linestyle="--")
plt.title("Confounded covariate shifts at intervention (bad control)")
plt.legend()
plt.tight_layout()
plt.show()

## 2) Fit two estimators

**A) Impact-like regression counterfactual (covariates)**
- Uses `sessions`, `active_users`, `marketing_spend`.
- Adds minimal time features (trend + day-of-week).

**B) Synthetic Control (donor mode)**
- Uses only donor series `donor_*` as controls.
- Disables extra time features (trend, day-of-week) so that the fit relies on donor matching alone.
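
The donor-mode idea can be sketched without `tecore`: fit a ridge regression of the treated series on the donors using pre-period rows only, then extrapolate. The helper `ridge_donor_counterfactual` and the toy data are hypothetical and written for illustration; the library's actual implementation may differ (for example in scaling or intercept handling).

```python
import numpy as np

def ridge_donor_counterfactual(y, donors, t0, alpha=1.0):
    """Fit ridge on rows < t0 and predict the counterfactual for all rows.

    Illustrative helper, not part of tecore. The intercept is left
    unpenalized by centering on pre-period means.
    """
    X_pre, y_pre = donors[:t0], y[:t0]
    x_mean, y_mean = X_pre.mean(axis=0), y_pre.mean()
    Xc = X_pre - x_mean
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y on centered data
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y_pre - y_mean))
    y_hat = y_mean + (donors - x_mean) @ coef
    return y_hat, coef

# Toy data: three donors load on a shared latent factor; none is treated
rng = np.random.default_rng(0)
n, t0 = 200, 120
t = np.arange(n)
latent = 0.03 * t + 2.0 * np.sin(2.0 * np.pi * t / 7.0)
loadings = np.array([0.8, 1.0, 1.2])
donors = 50 + latent[:, None] * loadings + rng.normal(0, 0.5, size=(n, 3))

# Treated unit: average of the donors plus noise, with a level shift from t0
y = donors.mean(axis=1) + rng.normal(0, 0.5, size=n)
y[t0:] += 6.0

y_hat, coef = ridge_donor_counterfactual(y, donors, t0)
est_effect = float((y[t0:] - y_hat[t0:]).mean())
print("ridge coefficients:", np.round(coef, 3))
print(f"estimated average post-period effect: {est_effect:.2f}")
```

Because the donors are untreated, the trend and weekly pattern they carry into the post-period are not contaminated by the intervention, so the gap between `y` and `y_hat` isolates the level shift.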

In [None]:
# Shared config parts
cfg_common = dict(
    intervention_date=str(intervention_date.date()),
    ridge_alpha=1.0,
    bootstrap_iters=200,
    block_size=7,
    alpha=0.05,
    run_placebo=True,
    n_placebos=25,
    random_state=42,
    verbose=False,
)

# A) Impact-like (covariate regression)
spec_reg = DataSpec(
    date_col="date",
    y_col="y",
    x_cols=["sessions", "active_users", "marketing_spend"],
    freq="D",
    add_time_trend=True,
    add_day_of_week=True,
)
cfg_reg = ImpactConfig(method=ImpactMethod.CAUSAL_IMPACT_LIKE, **cfg_common)

# B) Synthetic control (donor mode)
donor_cols = [c for c in df.columns if c.startswith("donor_")]
spec_sc = DataSpec(
    date_col="date",
    y_col="y",
    x_cols=donor_cols,
    freq="D",
    add_time_trend=False,
    add_day_of_week=False,
)
cfg_sc = ImpactConfig(method=ImpactMethod.SYNTHETIC_CONTROL, **cfg_common)

res_reg = run_impact(df, spec_reg, cfg_reg)
res_sc  = run_impact(df, spec_sc, cfg_sc)

print("True daily level shift:", true_level_shift)
print("\nImpact-like (regression) summary:")
print(res_reg.summary())
print("\nSynthetic control (donor-mode) summary:")
print(res_sc.summary())

## 3) Compare counterfactuals and effects

In [None]:
eff_reg = res_reg.effect_series.copy()
eff_sc = res_sc.effect_series.copy()

for eff in (eff_reg, eff_sc):
    eff["date"] = pd.to_datetime(eff["date"])

plt.figure(figsize=(12, 4.2))
plt.plot(eff_reg["date"], eff_reg["y"], label="observed y", linewidth=2)
plt.plot(eff_reg["date"], eff_reg["y_hat"], label="impact-like y_hat")
plt.plot(eff_sc["date"], eff_sc["y_hat"], label="synthetic control y_hat")
plt.axvline(intervention_date, linestyle="--")
plt.title("Observed vs counterfactual (two estimators)")
plt.legend()
plt.tight_layout()
plt.show()

# point effect
plt.figure(figsize=(12, 3.8))
plt.plot(eff_reg["date"], eff_reg["effect"], label="impact-like effect")
plt.plot(eff_sc["date"], eff_sc["effect"], label="synthetic control effect")
plt.axhline(0.0, linestyle="--")
plt.axvline(intervention_date, linestyle="--")
plt.title("Point effect over time")
plt.legend()
plt.tight_layout()
plt.show()

## 4) Donor weights (interpretability)

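In donor mode, a key artifact is the set of donor coefficients used to match the treated unit in the pre-period. Since they come from a ridge fit, they need not be non-negative or sum to one, so read them as regression coefficients rather than as classical synthetic control weights.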
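
How far to trust the donor coefficients depends on how closely the fitted series tracks the treated unit before the intervention. A small, hypothetical helper for that check is sketched below. It assumes only a frame with `date`, `y`, and `y_hat` columns (the columns this notebook reads from `effect_series`), and it is not a `tecore` function.

```python
import numpy as np
import pandas as pd

def fit_diagnostics(eff, intervention_date):
    """Pre-period RMSE and mean post-period gap between y and y_hat."""
    eff = eff.copy()
    eff["date"] = pd.to_datetime(eff["date"])
    resid = eff["y"] - eff["y_hat"]
    pre = eff["date"] < pd.Timestamp(intervention_date)
    return {
        "pre_rmse": float(np.sqrt((resid[pre] ** 2).mean())),
        "post_mean_effect": float(resid[~pre].mean()),
    }

# Tiny hand-made frame to show the expected input shape
toy = pd.DataFrame(
    {
        "date": pd.date_range("2025-01-01", periods=6, freq="D"),
        "y": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
        "y_hat": [1.0, 2.0, 4.0, 4.0, 5.0, 6.0],
    }
)
diag = fit_diagnostics(toy, "2025-01-04")
print(diag)
```

Applied to this notebook, the same function could be called as `fit_diagnostics(res_sc.effect_series, intervention_date)`. A pre-period RMSE that is large relative to the estimated effect suggests the donor pool does not reproduce the treated unit well.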

In [None]:
from IPython.display import display  # explicit import; only implicit inside IPython/Jupyter

# `res_sc.model` is a dict produced by the underlying ridge regressor.
# We try to extract coefficients by feature name (donor columns).
model_sc = res_sc.model or {}

coef = model_sc.get("coef", None)
feature_names = model_sc.get("feature_names", donor_cols)

if coef is not None:
    w = pd.Series(coef, index=feature_names).sort_values(key=lambda s: s.abs(), ascending=False)
    display(w.head(10).to_frame("ridge_coef"))

    plt.figure(figsize=(10, 3.5))
    w.head(10).plot(kind="bar")
    plt.title("Top donor coefficients (ridge)")
    plt.tight_layout()
    plt.show()
else:
    print("No coefficients found in res_sc.model. Available keys:", list(model_sc.keys()))

## 5) Results summary (auto-generated)

In [None]:
from IPython.display import Markdown, display

def fmt(x):
    try:
        return f"{float(x):.3f}"
    except Exception:
        return str(x)

display(Markdown(f"""## Results summary

- **True effect (daily level shift):** `{true_level_shift:.2f}`
- **Impact-like (covariate regression):** point `{fmt(res_reg.point_effect)}`, cumulative `{fmt(res_reg.cum_effect)}`, relative `{fmt(res_reg.rel_effect)}`
- **Synthetic control (donor-mode):** point `{fmt(res_sc.point_effect)}`, cumulative `{fmt(res_sc.cum_effect)}`, relative `{fmt(res_sc.rel_effect)}`

**Interpretation**
- If the covariate `marketing_spend` shifts at the intervention date (confounding), the regression counterfactual moves with it, so the covariate shift leaks into `y_hat` and biases the estimated treatment effect.
- Donor-based synthetic control relies on untreated series to track common shocks and seasonality. When donors are valid and the pre-period fit is strong, it is often more robust to covariate confounding.

**Next checks**
- Inspect `res_sc.warnings` and placebo results for both estimators.
- Run a sensitivity check: remove the top-weight donor and re-fit (leave-one-donor-out).
"""))

## 6) Save artifacts (optional)

In [None]:
out_dir = repo_root / "out" / "notebook_10_synth_control_donor_mode"
out_dir.mkdir(parents=True, exist_ok=True)

eff_reg.to_csv(out_dir / "effect_series_impact_like.csv", index=False)
eff_sc.to_csv(out_dir / "effect_series_synth_control_donor.csv", index=False)

print("Saved to:", out_dir)