## Intro

We'll be going through the three meta-learning frameworks covered in [Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning](https://arxiv.org/abs/1706.03461).

The dataset we're using is the famous "Adult" dataset, which has basic demographic and employment information for 50k-ish adults from the 1994 census. The basic task is to predict which of them make more than $50,000 a year. Since this is about conditional treatment effect estimation, we'll estimate the effect of having a graduate degree (the `grad-degree` column) on the probability of making more than $50,000 a year.

(This dataset is kind of the equivalent of MNIST for tabular machine learning, both in terms of ubiquity and of being a little bit of a meme.)

(If you're having issues running this notebook, note that it requires Python 3.11.)

In [None]:
import polars as pl
import yaml

from train import setup_data, train_model

x_train, x_valid = setup_data("./adult.csv")

## S-Learner

The S-learner uses a single model, where the treatment is indicated by a predictor in the model.
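
As a rough sketch in the notation of the linked paper (covariates $x$, treatment indicator $w \in \{0, 1\}$, outcome $Y$), the single model and the resulting estimate can be written as below. Here the treatment is `grad-degree` and, since the model is a classifier, $\hat{\mu}$ is a predicted probability of high income.

$$
\hat{\mu}(x, w) \approx \mathbb{E}[Y \mid X = x, W = w],
\qquad
\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)
$$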

In [None]:
with open("cfg/s_learner.yaml", "r") as f:
    config = yaml.safe_load(f)

s_learner = train_model(x_train, x_valid, config)

# Now score everyone twice: once without a graduate degree and once with one
x_valid_ng = x_valid.with_columns(**{"grad-degree": pl.lit(0)})
no_grad_p = s_learner.predict_proba(x_valid_ng)[:, 1]
x_valid_g = x_valid.with_columns(**{"grad-degree": pl.lit(1)})
grad_p = s_learner.predict_proba(x_valid_g)[:, 1]

s_results = x_valid.with_columns(
    grad_p=grad_p,
    no_grad_p=no_grad_p,
    grad_cate=grad_p - no_grad_p,
)

Now that we have a model, let's look at how it behaves on the validation set. We'll score each individual twice:

  - Once with the treatment variable `grad-degree` set to 1.
  - Once with it set to 0.

Then we can look at the difference between the two predictions, which is our estimated treatment effect.
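
To make the "score twice" idea concrete, here is a minimal, self-contained sketch on made-up toy data. `CellMeanModel` and the `age_band` covariate are hypothetical stand-ins invented for illustration. They are not the notebook's `train_model` or the real Adult columns, and the toy model simply memorizes the observed rate in each cell.

```python
from collections import defaultdict

# Made-up toy rows for illustration: (age_band, grad_degree, high_income).
rows = [
    ("young", 0, 0), ("young", 0, 0), ("young", 0, 1), ("young", 0, 0),
    ("young", 1, 1), ("young", 1, 0),
    ("older", 0, 0), ("older", 0, 1),
    ("older", 1, 1), ("older", 1, 1), ("older", 1, 1), ("older", 1, 1),
]


class CellMeanModel:
    """Hypothetical stand-in for the trained classifier.

    It predicts P(high_income) as the observed rate in each
    (age_band, grad_degree) cell.
    """

    def fit(self, features, labels):
        totals = defaultdict(lambda: [0, 0])  # key -> [sum of labels, row count]
        for key, label in zip(features, labels):
            totals[key][0] += label
            totals[key][1] += 1
        self.rates_ = {key: s / n for key, (s, n) in totals.items()}
        return self

    def predict(self, features):
        return [self.rates_[key] for key in features]


# S-learner: a single model, with the treatment as just another feature.
model = CellMeanModel().fit(
    [(age, grad) for age, grad, _ in rows],
    [income for _, _, income in rows],
)

# Score every row twice, overriding the treatment column each time.
ages = [age for age, _, _ in rows]
grad_p = model.predict([(age, 1) for age in ages])
no_grad_p = model.predict([(age, 0) for age in ages])

# The per-row difference is the estimated treatment effect.
grad_cate = [g - n for g, n in zip(grad_p, no_grad_p)]

for age in ("young", "older"):
    i = ages.index(age)
    print(age, no_grad_p[i], grad_p[i], grad_cate[i])
```

In this toy data the estimated effect differs by `age_band`, which is the kind of heterogeneity the per-row difference is meant to surface.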

Let's see what this looks like.

In [None]:
s_results.select(
    config["categorical"]
    + config["numeric"]
    + ["grad_p", "no_grad_p", "grad_cate"]
    + [config["target"]]
)

In [None]:
# Show the average treatment effect for specific groups.
# We'll look at gender because it's easy.
display(
    s_results.group_by(pl.col("gender")).agg(
        base_rate=pl.col("income").mean(), ate=pl.col("grad_cate").mean()
    )
)
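# Does this look right? Check on people who actually hold a graduate degree:
# by construction, average_untreated + average_cte is the mean treated
# prediction, which should sit close to average_target if the model is
# reasonably calibrated.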
# First, for women
display(
    s_results.filter(
        pl.col("gender") == "Female",
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)
# Then for men
display(
    s_results.filter(
        pl.col("gender") == "Male",
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)

# Overall
display(
    s_results.filter(
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)

## T-Learner

The T-learner uses two models, one trained on the treated group and one on the control group, and takes the difference between their scores as the estimated treatment effect.
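
As a rough sketch in the notation of the linked paper (covariates $x$, potential outcomes $Y(0)$ and $Y(1)$), the two models and the resulting estimate can be written as below. Here $\hat{\mu}_0$ is fit only on rows without a graduate degree and $\hat{\mu}_1$ only on rows with one.

$$
\hat{\mu}_0(x) \approx \mathbb{E}[Y(0) \mid X = x],
\qquad
\hat{\mu}_1(x) \approx \mathbb{E}[Y(1) \mid X = x],
\qquad
\hat{\tau}_T(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)
$$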

In [None]:
with open("cfg/t_learner.yaml", "r") as f:
    config = yaml.safe_load(f)

# First we train the no-treatment (control) model
ng_x_train = x_train.filter(pl.col(config["treatment"]) == 0)
ng_x_valid = x_valid.filter(pl.col(config["treatment"]) == 0)

t_learner_ng = train_model(ng_x_train, ng_x_valid, config)

# Then the treatment model
g_x_train = x_train.filter(pl.col(config["treatment"]) == 1)
g_x_valid = x_valid.filter(pl.col(config["treatment"]) == 1)

t_learner_g = train_model(g_x_train, g_x_valid, config)

# Now score everyone with both models: without and with a graduate degree
no_grad_p = t_learner_ng.predict_proba(x_valid)[:, 1]
grad_p = t_learner_g.predict_proba(x_valid)[:, 1]

t_results = x_valid.with_columns(
    grad_p=grad_p,
    no_grad_p=no_grad_p,
    grad_cate=grad_p - no_grad_p,
)

This time we have two models. We score everyone in the validation dataset with both models and then look at the score differences. This will look similar to the S-Learner.
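
The two-model flow can be sketched on made-up toy data as follows. `fit_rates` and the `hours_band` covariate are hypothetical stand-ins invented for illustration. They are not the notebook's `train_model` or the real Adult columns, and each toy "model" just memorizes the observed income rate per band within its own treatment arm.

```python
from collections import defaultdict

# Made-up toy rows for illustration: (hours_band, grad_degree, high_income).
rows = [
    ("part-time", 0, 0), ("part-time", 0, 0), ("part-time", 0, 1), ("part-time", 0, 0),
    ("part-time", 1, 1), ("part-time", 1, 0),
    ("full-time", 0, 0), ("full-time", 0, 1),
    ("full-time", 1, 1), ("full-time", 1, 1), ("full-time", 1, 1), ("full-time", 1, 1),
]


def fit_rates(samples):
    """Hypothetical stand-in for a trained model: hours_band -> income rate."""
    sums, counts = defaultdict(int), defaultdict(int)
    for band, income in samples:
        sums[band] += income
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sums}


# T-learner: one model per treatment arm; the treatment is not a feature.
control_model = fit_rates([(band, y) for band, grad, y in rows if grad == 0])
treated_model = fit_rates([(band, y) for band, grad, y in rows if grad == 1])

# Score every row with both models, regardless of which arm it belongs to.
t_toy = [
    {
        "hours_band": band,
        "grad": grad,
        "no_grad_p": control_model[band],
        "grad_p": treated_model[band],
        "grad_cate": treated_model[band] - control_model[band],
    }
    for band, grad, _ in rows
]

# Average estimated effect among rows that actually have a graduate degree.
treated_rows = [r for r in t_toy if r["grad"] == 1]
avg_effect_treated = sum(r["grad_cate"] for r in treated_rows) / len(treated_rows)
print(avg_effect_treated)
```

Note that each model also scores rows from the arm it never saw during training, so its predictions there rely on how well it generalizes across arms.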

In [None]:
t_results.select(
    config["categorical"]
    + [config["treatment"]]
    + config["numeric"]
    + ["grad_p", "no_grad_p", "grad_cate"]
    + [config["target"]]
)

If we look at estimated treatment effects overall and for specific subgroups, we find that the estimated effects here are larger than the S-learner's.

In [None]:
# Show the average treatment effect for specific groups.
# We'll look at gender because it's easy.
display(
    t_results.group_by(pl.col("gender")).agg(
        base_rate=pl.col("income").mean(), ate=pl.col("grad_cate").mean()
    )
)
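# Does this look right? Check on people who actually hold a graduate degree:
# by construction, average_untreated + average_cte is the mean treated
# prediction, which should sit close to average_target if the model is
# reasonably calibrated.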
# First, for women
display(
    t_results.filter(
        pl.col("gender") == "Female",
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)
# Then for men
display(
    t_results.filter(
        pl.col("gender") == "Male",
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)

# Overall
display(
    t_results.filter(
        pl.col("grad-degree") == 1,
    ).select(
        average_untreated=pl.col("no_grad_p").mean(),
        average_cte=pl.col("grad_cate").mean(),
        average_target=pl.col("income").mean(),
    )
)