# Tutorial: Causal Inference 02 - Meta-Learners (S, T, X, R)

Audience:
- Students ready to move from the average treatment effect (ATE) to heterogeneous treatment effects.

Prerequisites:
- Notebook 01.
- Understanding of train/test splits.

Learning goals:
- Train S/T/X/R learners with CausalML.
- Compare learners using policy-facing metrics, such as the uplift captured among the top-ranked users.
- Interpret why learner rankings can differ across metrics.
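The notebook relies on CausalML for the four learners without defining them. For reference, the standard definitions are summarized below in our own notation: $X$ are features, $W \in \{0, 1\}$ is the treatment indicator, $Y$ is the outcome, $\hat\mu$ are fitted outcome models, $\hat e(x) = P(W = 1 \mid X = x)$ is the propensity score, and $\hat m(x) = \mathbb{E}[Y \mid X = x]$. CausalML's implementations build on the same ideas but may differ in details, for example in how the X-learner weight $g(x)$ is chosen (the propensity score is a common choice).

```latex
\begin{aligned}
\text{S-learner:}\quad & \hat\tau_S(x) = \hat\mu(x, 1) - \hat\mu(x, 0),
  \quad \hat\mu(x, w) \approx \mathbb{E}[Y \mid X = x, W = w] \\
\text{T-learner:}\quad & \hat\tau_T(x) = \hat\mu_1(x) - \hat\mu_0(x),
  \quad \hat\mu_w \text{ fit only on units with } W = w \\
\text{X-learner:}\quad & \tilde D_i^1 = Y_i - \hat\mu_0(X_i) \ \ (W_i = 1), \qquad
  \tilde D_i^0 = \hat\mu_1(X_i) - Y_i \ \ (W_i = 0), \\
& \hat\tau_X(x) = g(x)\,\hat\tau_0(x) + \bigl(1 - g(x)\bigr)\,\hat\tau_1(x),
  \quad \hat\tau_w \text{ regresses } \tilde D^w \text{ on } X,\ g(x) \in [0, 1] \\
\text{R-learner:}\quad & \hat\tau_R = \arg\min_{\tau} \sum_i
  \Bigl[\bigl(Y_i - \hat m(X_i)\bigr) - \bigl(W_i - \hat e(X_i)\bigr)\,\tau(X_i)\Bigr]^2 + \Lambda(\tau)
\end{aligned}
```

Here $\Lambda(\tau)$ is a regularization penalty on the effect model.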


## Outline

1. Data split and model training.
2. Compare ATE estimates and uplift@30%.
3. Inspect uplift score distributions.
4. Exercises, pitfalls, and extension.


In [None]:
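from pathlib import Path
import sys

# Locate the project root: use the current directory if it contains `src/`,
# otherwise fall back to its parent (e.g. when running from a notebooks folder).
project_root = Path.cwd().resolve()
if not (project_root / "src").exists():
    project_root = project_root.parent

# Make the local `causal_showcase` package importable.
sys.path.insert(0, str(project_root / "src"))

import pandas as pd

from causal_showcase.data import load_marketing_ab_data, train_test_split_prepared
from causal_showcase.evaluation import uplift_at_k
from causal_showcase.modeling import fit_meta_learners

# Load the marketing A/B dataset and hold out a test split for evaluation.
data_path = project_root / "data" / "raw" / "marketing_ab.csv"
prepared = load_marketing_ab_data(data_path)
train_data, test_data = train_test_split_prepared(prepared)

# Fit each meta-learner on the training split; uplift scores are produced for the test split.
results = fit_meta_learners(train_data, test_data)
print("Learners:", list(results.keys()))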
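`fit_meta_learners` hides the model fitting. To make the S- and T-learner ideas concrete, here is a minimal NumPy sketch on synthetic data with a known effect τ(x) = 1 + 2x, using plain linear regression as the base model. This is not how `causal_showcase` or CausalML fit these learners; the synthetic data, the base model, and the helper names `fit_linear` and `predict_linear` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustration only): one feature x, randomized treatment t,
# and a true treatment effect tau(x) = 1 + 2x that grows with x.
n = 2_000
x = rng.uniform(0, 1, size=n)
t = rng.integers(0, 2, size=n)
y = 0.5 * x + t * (1 + 2 * x) + rng.normal(0, 0.1, size=n)


def fit_linear(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Hypothetical helper: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: predictions from `fit_linear` coefficients."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# T-learner: one outcome model per arm; CATE = mu_1(x) - mu_0(x).
coef_treated = fit_linear(x[t == 1], y[t == 1])
coef_control = fit_linear(x[t == 0], y[t == 0])
cate_t = predict_linear(coef_treated, x) - predict_linear(coef_control, x)

# S-learner: a single model with treatment as a feature (plus an x*t interaction),
# then predict every user once with t=1 and once with t=0.
coef_s = fit_linear(np.column_stack([x, t, x * t]), y)
ones, zeros = np.ones(n), np.zeros(n)
cate_s = predict_linear(coef_s, np.column_stack([x, ones, x])) - predict_linear(
    coef_s, np.column_stack([x, zeros, zeros])
)

print("T-learner CATE at x=0 and x=1 (approx.):", cate_t[np.argmin(x)], cate_t[np.argmax(x)])
```

With the `x * t` interaction, the linear S-learner fits the same two lines as the T-learner, so both recover roughly 1 + 2x. Dropping the interaction would force this S-learner to a single constant effect, which is the usual weakness of S-learners whose base model cannot express treatment-feature interactions.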


## Step 1 - Metric table across learners

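We compare the learners on:
- `estimated_ate` (global effect estimate), with its confidence bounds `ate_ci_low` and `ate_ci_high`;
- `uplift_at_30pct` (targeting quality: the observed uplift among the 30% of test users with the highest predicted uplift).

The table is sorted by `uplift_at_30pct`, best learner first.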
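The project's `uplift_at_k` is not shown in this notebook. A common definition, sketched below under that assumption, ranks users by predicted uplift, keeps the top fraction, and takes the treated-minus-control difference in mean outcome within that segment. `uplift_at_k_sketch` is a hypothetical re-implementation for illustration; the real function may handle rounding, ties, or empty arms differently.

```python
import numpy as np


def uplift_at_k_sketch(outcome, treatment, scores, top_fraction=0.30):
    """Hypothetical uplift@k: treated-minus-control mean outcome in the top-scored fraction."""
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment).astype(bool)
    scores = np.asarray(scores, dtype=float)

    # Keep the users with the highest predicted uplift (at least one user).
    n_top = max(1, int(round(top_fraction * len(scores))))
    top = np.argsort(-scores, kind="stable")[:n_top]

    treated, control = top[treatment[top]], top[~treatment[top]]
    if len(treated) == 0 or len(control) == 0:
        return float("nan")  # undefined unless both arms appear in the segment
    return float(outcome[treated].mean() - outcome[control].mean())


# Toy example: 10 users, alternating treatment, scores sorted from high to low.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
treatment = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
outcome = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]
print(uplift_at_k_sketch(outcome, treatment, scores, top_fraction=0.4))  # 1.0
print(uplift_at_k_sketch(outcome, treatment, scores, top_fraction=1.0))  # 0.0
```

In the toy data the treatment helps only the highest-scored users, so uplift in the top 40% is 1.0 even though the population-wide difference is 0.0. This is why a targeting metric can rank learners differently from the ATE.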


In [None]:
rows = []
for name, result in results.items():
    rows.append(
        {
            "model": name,
            "estimated_ate": result.ate,
            "ate_ci_low": result.ate_ci_low,
            "ate_ci_high": result.ate_ci_high,
            "uplift_at_30pct": uplift_at_k(
                test_data.outcome,
                test_data.treatment,
                result.uplift_scores,
                top_fraction=0.30,
            ),
        }
    )

metrics_df = pd.DataFrame(rows).sort_values("uplift_at_30pct", ascending=False)
metrics_df


## Step 2 - Uplift score quantiles

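The 10th, 50th, and 90th percentiles of each learner's predicted uplift scores show how widely it spreads users. A wide gap between `q10` and `q90` suggests the learner separates users into distinct low- and high-uplift groups; a narrow gap means it assigns nearly everyone the same effect. A wide spread alone does not mean the ranking is correct, so read these quantiles together with the uplift metric from Step 1.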
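One way to summarize the spread in a single number is the width of the central 80% band, `q90 - q10`. The sketch below computes it for two hypothetical learners on synthetic scores (not results from this notebook): "flat" barely separates users, while "spread" assigns clearly different scores. The column name `q90_minus_q10` is our own choice.

```python
import numpy as np
import pandas as pd

# Synthetic uplift scores for two hypothetical learners (illustration only).
rng = np.random.default_rng(42)
scores_by_model = {
    "flat": rng.normal(0.02, 0.002, size=5_000),
    "spread": rng.normal(0.02, 0.05, size=5_000),
}

rows = []
for name, scores in scores_by_model.items():
    q = pd.Series(scores).quantile([0.1, 0.5, 0.9]).to_dict()
    rows.append(
        {
            "model": name,
            "q10": q[0.1],
            "q50": q[0.5],
            "q90": q[0.9],
            # Width of the central 80% band of predicted uplift.
            "q90_minus_q10": q[0.9] - q[0.1],
        }
    )

spread_df = pd.DataFrame(rows).sort_values("q90_minus_q10", ascending=False)
print(spread_df)
```

Both hypothetical learners have the same median score, so `q50` alone would not distinguish them; the band width does.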


In [None]:
quantile_rows = []
for name, result in results.items():
    q = pd.Series(result.uplift_scores).quantile([0.1, 0.5, 0.9]).to_dict()
    quantile_rows.append(
        {
            "model": name,
            "q10": q[0.1],
            "q50": q[0.5],
            "q90": q[0.9],
        }
    )

pd.DataFrame(quantile_rows).sort_values("q90", ascending=False)


## Exercises, pitfalls, and extension

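- Exercise: Compute uplift at 20% with `ranking_at_fraction(0.20)` and note which learners change rank relative to the 30% table.
- Pitfall: Treating the ATE alone as a measure of policy quality. Two learners can produce similar ATE estimates yet rank individual users very differently, and the ranking is what decides who gets targeted.
- Extension: Swap the base learner used by the meta-learners and check whether the rankings are robust.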
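The pitfall can be demonstrated with a small simulation where the true individual effect is known, which is never the case with real data. Below, two hypothetical learners produce exactly the same mean prediction (the same ATE estimate), but one ranks users correctly and the other ranks them in reverse. The population, the scores, and the helper `mean_true_effect_in_top` are assumptions for illustration.

```python
import numpy as np

# Synthetic population with a known individual effect tau (illustration only).
rng = np.random.default_rng(7)
tau = rng.uniform(-0.05, 0.15, size=10_000)

# Two hypothetical learners with the same mean prediction (same ATE estimate).
good_scores = tau.copy()  # ranks users correctly
bad_scores = 2 * tau.mean() - tau  # mirrored around the mean: same mean, reversed order


def mean_true_effect_in_top(scores, true_effect, top_fraction=0.30):
    """Average true effect among the users a learner would target first."""
    n_top = int(round(top_fraction * len(scores)))
    top = np.argsort(-scores, kind="stable")[:n_top]
    return true_effect[top].mean()


for name, scores in [("good", good_scores), ("bad", bad_scores)]:
    print(
        f"{name}: ATE estimate={scores.mean():.4f}, "
        f"true effect in top 30%={mean_true_effect_in_top(scores, tau):.4f}"
    )
```

Both learners report the same ATE, yet targeting with the "bad" scores selects users whose true effect is negative on average.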


In [None]:
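def ranking_at_fraction(fraction: float) -> pd.DataFrame:
    """Rank learners by uplift in the top `fraction` of test users (uses `results` and `test_data`)."""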
    rows = []
    for name, result in results.items():
        rows.append(
            {
                "model": name,
                "uplift_at_fraction": uplift_at_k(
                    test_data.outcome,
                    test_data.treatment,
                    result.uplift_scores,
                    top_fraction=fraction,
                ),
            }
        )
    return pd.DataFrame(rows).sort_values("uplift_at_fraction", ascending=False)

ranking_at_fraction(0.20)
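To finish the exercise, the two tables can be joined on `model` and their ranks compared. `rank_changes` below is a hypothetical helper, and the numbers in the usage example are made up to show the mechanics, not results from this notebook; the model names are placeholders for whatever keys `results` contains.

```python
import pandas as pd


def rank_changes(metrics_a: pd.DataFrame, metrics_b: pd.DataFrame,
                 col_a: str, col_b: str) -> pd.DataFrame:
    """Compare model ranks (1 = best) between two metric tables keyed by "model"."""
    ranked_a = metrics_a.assign(rank_a=metrics_a[col_a].rank(ascending=False, method="min"))
    ranked_b = metrics_b.assign(rank_b=metrics_b[col_b].rank(ascending=False, method="min"))
    merged = ranked_a[["model", "rank_a"]].merge(ranked_b[["model", "rank_b"]], on="model")
    # Positive shift: the model ranks better in table b than in table a.
    merged["rank_shift"] = merged["rank_a"] - merged["rank_b"]
    return merged.sort_values("rank_b")


# Made-up metric values for four placeholder learners.
at_30 = pd.DataFrame({"model": ["S", "T", "X", "R"],
                      "uplift_at_30pct": [0.040, 0.050, 0.045, 0.030]})
at_20 = pd.DataFrame({"model": ["S", "T", "X", "R"],
                      "uplift_at_fraction": [0.060, 0.055, 0.065, 0.035]})
print(rank_changes(at_30, at_20, "uplift_at_30pct", "uplift_at_fraction"))
```

In this notebook the same helper could be called as `rank_changes(metrics_df, ranking_at_fraction(0.20), "uplift_at_30pct", "uplift_at_fraction")`.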
