# Tutorial: Causal Inference 01 - Counterfactuals and ATE

Audience:
- Students who have seen the causal inference lecture and want executable intuition.

Prerequisites:
- Basic Python/pandas.
- Familiarity with treatment/control language.

Learning goals:
- Compute and interpret empirical ATE.
- Connect ATE to counterfactual reasoning.
- See why subgroup effects motivate CATE/ITE methods.


## Outline

1. Load and inspect the marketing A/B data.
2. Compute empirical ATE.
3. Explore subgroup-level ATE proxies.
4. Exercise + common pitfall + extension.


In [None]:
from pathlib import Path
import sys

project_root = Path.cwd().resolve()
if not (project_root / "src").exists():
    project_root = project_root.parent

sys.path.insert(0, str(project_root / "src"))

import numpy as np
import pandas as pd

from causal_showcase.data import load_marketing_ab_data
from causal_showcase.evaluation import estimate_empirical_ate

data_path = project_root / "data" / "raw" / "marketing_ab.csv"
prepared = load_marketing_ab_data(data_path)
df = pd.read_csv(data_path)

print(f"Rows: {len(df):,}")
print(f"Treatment share (ad): {prepared.treatment.mean():.3f}")
print(f"Overall conversion rate: {prepared.outcome.mean():.3f}")


## Step 1 - Treatment and control summary

This table mirrors the lecture narrative: the treatment group saw the ad campaign, and the control group saw a public service announcement (PSA).


In [None]:
group_summary = (
    df.assign(test_group=df["test group"].str.lower())
    .groupby("test_group")["converted"]
    # Named aggregation yields the final column names directly.
    .agg(count="count", conversion_rate="mean")
    .reset_index()
)
group_summary


## Step 2 - Empirical ATE (difference in means)

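The ATE is the average difference in outcomes between a world where everyone is treated and a world where no one is: $\text{ATE} = E[Y(1) - Y(0)]$, where $Y(1)$ and $Y(0)$ are a user's potential outcomes with and without the ad.
Only one of the two potential outcomes is ever observed for a given user, so the ATE has to be estimated.
In randomized A/B data, assignment is independent of the potential outcomes, so a simple difference in means is an unbiased estimator of the ATE.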
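To make the counterfactual reading concrete, here is a minimal simulation sketch. It does not use the marketing data: the conversion probabilities (0.02 and 0.03), the sample size, and the 50/50 assignment are all assumptions chosen for illustration. Because the data is simulated, both potential outcomes are visible, which is never the case in a real experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes for every user (illustrative probabilities).
y0 = rng.binomial(1, 0.02, size=n)  # would convert without the ad
y1 = rng.binomial(1, 0.03, size=n)  # would convert with the ad

# The "true" ATE needs both potential outcomes, so it is only
# computable in a simulation like this one.
true_ate = float((y1 - y0).mean())

# Randomized assignment reveals exactly one potential outcome per user.
treated = rng.random(n) < 0.5
observed = np.where(treated, y1, y0)

# Difference in means on the observed data only.
diff_in_means = float(observed[treated].mean() - observed[~treated].mean())

print(f"True ATE (simulation only): {true_ate:.4f}")
print(f"Difference in means:        {diff_in_means:.4f}")
```

The two numbers should agree up to sampling noise, which is the sense in which randomization lets observed group means stand in for the unobserved counterfactuals.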


In [None]:
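rng = np.random.default_rng(42)  # fixed seed so the interval is reproducible
ate = estimate_empirical_ate(prepared.outcome, prepared.treatment)

# Percentile bootstrap: resample rows with replacement, keeping each outcome
# paired with its treatment flag, and re-estimate the ATE on every resample.
n_rows = len(prepared.outcome)
bootstrap = []
for _ in range(400):
    idx = rng.integers(0, n_rows, n_rows)
    bootstrap.append(estimate_empirical_ate(prepared.outcome[idx], prepared.treatment[idx]))

# The 2.5% and 97.5% quantiles of the resampled estimates bound the interval.
ci_low, ci_high = np.quantile(bootstrap, [0.025, 0.975])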
print(f"Empirical ATE: {ate:.4f}")
print(f"Approx. bootstrap 95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
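As a cross-check on the bootstrap interval, a normal-approximation (Wald) interval for a difference in means can be computed in closed form. The sketch below is not part of `causal_showcase`: the helper `diff_in_means_wald_ci` is a hypothetical name, and the synthetic data (sample size, treatment share, conversion probabilities) is an arbitrary stand-in for the real columns.

```python
import numpy as np


def diff_in_means_wald_ci(outcome, treatment, z=1.96):
    """Difference in means with a normal-approximation interval.

    Hypothetical helper for illustration; returns (ate, low, high).
    """
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment).astype(bool)
    y1, y0 = outcome[treatment], outcome[~treatment]
    ate = y1.mean() - y0.mean()
    # Unpooled standard error of a difference of two independent sample means.
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return float(ate), float(ate - z * se), float(ate + z * se)


# Synthetic stand-in data (all numbers here are assumptions).
rng = np.random.default_rng(1)
treatment = rng.random(50_000) < 0.9
outcome = rng.binomial(1, np.where(treatment, 0.026, 0.018))

ate, low, high = diff_in_means_wald_ci(outcome, treatment)
print(f"ATE: {ate:.4f}, approx. 95% CI: [{low:.4f}, {high:.4f}]")
```

With large samples the Wald and percentile-bootstrap intervals are usually close; a large gap between them is a hint to look more carefully at the data or the resampling code.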


## Step 3 - Subgroup ATE proxy by day

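Slicing by the day on which a user saw the most ads gives one difference in means per subgroup. This is not a formal CATE estimator, and estimates from small subgroups are noisy, but it builds intuition that treatment effects can vary across users.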


In [None]:
df_by_day = df.copy()
df_by_day["test_group"] = df_by_day["test group"].str.lower()

rows = []
for day, g in df_by_day.groupby("most ads day"):
    treated = g.loc[g["test_group"] == "ad", "converted"]
    control = g.loc[g["test_group"] == "psa", "converted"]
    ate_proxy = float(treated.mean() - control.mean()) if len(treated) and len(control) else float("nan")
    rows.append({"most_ads_day": day, "n": len(g), "ate_proxy": ate_proxy})

pd.DataFrame(rows).sort_values("ate_proxy", ascending=False)
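Sorting subgroups by their point estimate can overstate differences that are mostly noise, so it helps to put a standard error next to each proxy. The following is a sketch under assumptions: `subgroup_ate_table` is a hypothetical helper, and the demo frame is synthetic data that only reuses the column names from the marketing CSV.

```python
import numpy as np
import pandas as pd


def subgroup_ate_table(frame, by, group_col="test group", outcome_col="converted"):
    """Per-subgroup difference in means with an unpooled standard error.

    Hypothetical helper; expects arms labelled "ad" and "psa" (any casing).
    """
    stats = (
        frame.assign(
            _arm=frame[group_col].str.lower(),
            _y=frame[outcome_col].astype(float),
        )
        .groupby([by, "_arm"])["_y"]
        .agg(["mean", "var", "count"])
        .unstack("_arm")  # columns become (statistic, arm) pairs
    )
    ate = stats[("mean", "ad")] - stats[("mean", "psa")]
    se = np.sqrt(
        stats[("var", "ad")] / stats[("count", "ad")]
        + stats[("var", "psa")] / stats[("count", "psa")]
    )
    n = stats["count"].sum(axis=1)
    return pd.DataFrame({"ate_proxy": ate, "se": se, "n": n}).reset_index()


# Synthetic demo data (probabilities and sizes are assumptions).
rng = np.random.default_rng(2)
n_rows = 20_000
demo = pd.DataFrame(
    {
        "test group": rng.choice(["ad", "psa"], size=n_rows),
        "most ads day": rng.choice(["Monday", "Tuesday", "Wednesday"], size=n_rows),
    }
)
demo["converted"] = rng.binomial(1, np.where(demo["test group"] == "ad", 0.03, 0.02))

table = subgroup_ate_table(demo, "most ads day")
print(table.sort_values("ate_proxy", ascending=False))
```

In this synthetic frame the true effect is identical on every day, so any spread in `ate_proxy` is pure sampling noise; comparing that spread with the `se` column shows how much apparent heterogeneity can arise by chance.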


## Exercises, pitfalls, and extension

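- Exercise: Compare the daytime (9-17) and evening (18-23) ATE proxies.
- Pitfall: Confusing correlation with causation. The difference in means estimates a causal effect here only because assignment was randomized; in observational data the same calculation can be biased by confounding.
- Extension: Replace subgroup slicing with a model-based CATE estimator in Notebook 2.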
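The pitfall can be illustrated with a small simulation of observational data. Everything in this sketch is assumed for illustration: a single binary confounder (`engaged`), the exposure and conversion probabilities, and a true ad effect of 0.01. It does not describe the marketing dataset, which is randomized.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Hypothetical confounder: engaged users both see more ads and convert more.
engaged = rng.random(n) < 0.3

# Non-random exposure: engaged users are far more likely to be treated.
treated = rng.random(n) < np.where(engaged, 0.8, 0.2)

# Conversion probability: baseline 0.02, +0.05 if engaged, +0.01 if treated.
p_convert = 0.02 + 0.05 * engaged + 0.01 * treated
converted = rng.binomial(1, p_convert)

# Naive difference in means mixes the ad effect with the engagement effect.
naive = float(converted[treated].mean() - converted[~treated].mean())

# Stratify on the confounder, then weight each stratum by its share of users.
adjusted = 0.0
for level in (False, True):
    stratum = engaged == level
    effect = converted[stratum & treated].mean() - converted[stratum & ~treated].mean()
    adjusted += stratum.mean() * effect
adjusted = float(adjusted)

print("True effect:      0.0100")
print(f"Naive estimate:   {naive:.4f}")
print(f"Stratified (adj): {adjusted:.4f}")
```

The naive estimate lands well above the true effect because treated users are disproportionately engaged, while the stratified estimate recovers it only because the single confounder is known and measured here, which is rarely guaranteed in practice.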


In [None]:
def subgroup_ate_by_hour(frame: pd.DataFrame, start_hour: int, end_hour: int) -> float:
    subset = frame[(frame["most ads hour"] >= start_hour) & (frame["most ads hour"] <= end_hour)].copy()
    subset["test_group"] = subset["test group"].str.lower()
    treated = subset.loc[subset["test_group"] == "ad", "converted"]
    control = subset.loc[subset["test_group"] == "psa", "converted"]
    if len(treated) == 0 or len(control) == 0:
        return float("nan")
    return float(treated.mean() - control.mean())

print("Evening ATE proxy (18-23):", round(subgroup_ate_by_hour(df, 18, 23), 4))
print("Daytime ATE proxy (9-17):", round(subgroup_ate_by_hour(df, 9, 17), 4))
