## Meta Learners

### Concepts


#### S-Learner
- S stands for 'Single'
  - estimate treatment effect using a single machine learning model
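- Expression:
  - $\hat{\mu}(x,t)=E[Y|X=x,T=t]$
  - $\hat{\tau}(x)=\hat{\mu}(x,1)-\hat{\mu}(x,0)$
- Put the control group and the treatment group into the **same** model **at the same time**, with the treatment indicator as an extra feature.
- Evaluation:
  - Simple to implement, and works with both discrete and continuous treatments.
  - Struggles with <u>high-dimensional features, imbalanced datasets, and selection bias</u>: a regularized model can shrink or ignore the treatment feature, which biases the estimated effect toward zero.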
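
As a minimal sketch of the S-learner idea, the snippet below fits one ordinary-least-squares model on synthetic randomized data and takes the difference between its predictions at $T=1$ and $T=0$. The data-generating process, the OLS base learner, and the names `design` and `s_learner_cate` are assumptions made for illustration; any regressor that accepts the treatment as a feature could take their place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic randomized data (illustrative assumption): the true CATE is 1 + 2x
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 3.0 * x + t * (1.0 + 2.0 * x) + rng.normal(scale=0.1, size=n)


def design(x, t):
    """Feature matrix for the single model: intercept, x, t, and x*t."""
    return np.column_stack([np.ones_like(x), x, t, x * t])


# One model mu(x, t), fitted on treated and control units together
beta, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)


def s_learner_cate(x_new):
    """tau_hat(x) = mu_hat(x, 1) - mu_hat(x, 0)."""
    ones = np.ones_like(x_new)
    zeros = np.zeros_like(x_new)
    return design(x_new, ones) @ beta - design(x_new, zeros) @ beta


x_grid = np.array([-1.0, 0.0, 1.0])
cate_s = s_learner_cate(x_grid)
print(cate_s)
```

The explicit `x * t` interaction matters here: a linear model without it would return the same effect for every `x`, so it could not express any heterogeneity.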

#### T-Learner
- T stands for 'Two'
  - estimate the control group and the treatment group with two different machine learning models, fitted **separately**
- Expression:
  - $\hat{\mu}_0(x)=E[Y(0)|X=x]$
  - $\hat{\mu}_1(x)=E[Y(1)|X=x]$
  - $\hat{\tau}(x)=\hat{\mu}_1(x)-\hat{\mu}_0(x)$
- Restriction: the treatment has to be a <u>discrete variable</u>.
- Evaluation:
  - Neither model sees the other group's data, which can lead to *large errors* in the predicted effect.
    - Because each model is trained on less data, its performance is more easily affected by noise, especially in the smaller group.
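
The two-model recipe can be sketched as follows, again on synthetic randomized data with ordinary least squares as a stand-in base learner. The data, the helpers `fit_ols` and `predict_ols`, and the function `t_learner_cate` are hypothetical names introduced for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic randomized data (illustrative assumption): the true CATE is 1 + 2x
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 3.0 * x + t * (1.0 + 2.0 * x) + rng.normal(scale=0.1, size=n)


def fit_ols(x, y):
    """Fit y ~ intercept + x and return the two coefficients."""
    features = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(features, y, rcond=None)
    return beta


def predict_ols(beta, x):
    return beta[0] + beta[1] * x


# Two separate models: mu_0 on control units only, mu_1 on treated units only
beta_mu0 = fit_ols(x[t == 0], y[t == 0])
beta_mu1 = fit_ols(x[t == 1], y[t == 1])


def t_learner_cate(x_new):
    """tau_hat(x) = mu_hat_1(x) - mu_hat_0(x)."""
    return predict_ols(beta_mu1, x_new) - predict_ols(beta_mu0, x_new)


x_grid = np.array([-1.0, 0.0, 1.0])
cate_t = t_learner_cate(x_grid)
print(cate_t)
```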

#### X-Learner
- X is usually read as 'cross': each group's outcome model is used to impute treatment effects for the other group
- Procedures:
  - estimate the response functions with any supervised (regression) model:
    - $\hat{\mu}_0(x)=E[Y(0)|X=x]$
    - $\hat{\mu}_1(x)=E[Y(1)|X=x]$
  - impute individual treatment effects for both the treatment and control groups as the difference between the observed outcome and the estimated counterfactual outcome:
    - $\hat{D}_i^1=Y_i^1-\hat{\mu}_0(X_i^1)$
    - $\hat{D}_i^0=\hat{\mu}_1(X_i^0)-Y_i^0$
  - estimate the CATE by regressing the imputed effects on the features, then combine the two estimates:
    - $\hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)$
      - $\hat{\tau}_0(x)=E[\hat{D}^0|X=x]$
      - $\hat{\tau}_1(x)=E[\hat{D}^1|X=x]$
      - $g(x)\in [0,1]$: weighting function, chosen to reduce the variance of $\hat{\tau}$
        - sometimes the propensity score $e(x)$ is used as $g(x)$.
- Evaluation:
  - makes efficient use of unbalanced data, because each group's imputed effects draw on the model fitted to the other group
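
The three stages can be sketched on synthetic data in which only about 10% of units are treated. Everything specific here is an assumption for illustration: the data-generating process, OLS as the base learner, the helper names, and the choice of a constant propensity score (the treated share) as $g(x)$, which is reasonable only because the synthetic treatment is randomized.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, unbalanced randomized data (illustrative): true CATE is 1 + 2x
n = 5000
x = rng.normal(size=n)
t = (rng.random(n) < 0.1).astype(int)  # roughly 10% treated
y = 3.0 * x + t * (1.0 + 2.0 * x) + rng.normal(scale=0.1, size=n)


def fit_ols(x, y):
    """Fit y ~ intercept + x and return the two coefficients."""
    features = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(features, y, rcond=None)
    return beta


def predict_ols(beta, x):
    return beta[0] + beta[1] * x


x0, y0 = x[t == 0], y[t == 0]
x1, y1 = x[t == 1], y[t == 1]

# Stage 1: outcome models for each group
beta_mu0 = fit_ols(x0, y0)
beta_mu1 = fit_ols(x1, y1)

# Stage 2: imputed effects, each using the other group's outcome model
d1 = y1 - predict_ols(beta_mu0, x1)  # treated: observed minus predicted control
d0 = predict_ols(beta_mu1, x0) - y0  # control: predicted treated minus observed
beta_tau1 = fit_ols(x1, d1)
beta_tau0 = fit_ols(x0, d0)

# Stage 3: combine with g(x) = e(x); here a constant, the treated share
g = t.mean()


def x_learner_cate(x_new):
    """tau_hat(x) = g * tau_hat_0(x) + (1 - g) * tau_hat_1(x)."""
    tau0 = predict_ols(beta_tau0, x_new)
    tau1 = predict_ols(beta_tau1, x_new)
    return g * tau0 + (1.0 - g) * tau1


x_grid = np.array([-1.0, 0.0, 1.0])
cate_x = x_learner_cate(x_grid)
print(cate_x)
```

With a small treated share, `g` is small, so the combined estimate leans on $\hat{\tau}_1$, whose imputed effects come from the outcome model fitted on the large control group.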

#### R-Learner

### Evaluation Methods

#### Qini Curve & AUUC
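
One common way to draw an uplift-style curve on randomized data is a cumulative gain curve: rank units by predicted CATE, estimate the average effect among the top $k$ units, and scale it by $k/n$. The sketch below implements that variant with a difference in means as the effect estimate and a trapezoid rule for the area. It is an illustration under these assumptions, not the exact Qini definition and not a reproduction of fklearn's `relative_cumulative_gain_curve`; the synthetic data and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic randomized data (illustrative): true CATE is 1 + 2x
n = 4000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
tau = 1.0 + 2.0 * x
y = 3.0 * x + t * tau + rng.normal(scale=0.5, size=n)


def diff_in_means(y, t):
    """Average effect estimate, valid when treatment is randomized."""
    return y[t == 1].mean() - y[t == 0].mean()


def cumulative_gain(score, y, t, steps=20, min_rows=100):
    """Effect among the top-k units by score, scaled by k / n."""
    order = np.argsort(-score)  # highest predicted effect first
    y_sorted, t_sorted = y[order], t[order]
    total = len(y)
    sizes = np.linspace(min_rows, total, steps).astype(int)
    gains = np.array(
        [diff_in_means(y_sorted[:k], t_sorted[:k]) * k / total for k in sizes]
    )
    return sizes / total, gains


def area_under_curve(frac, gain):
    """Trapezoid-rule area under the gain curve."""
    return float(np.sum((gain[1:] + gain[:-1]) / 2.0 * np.diff(frac)))


# A score that ranks units by the true effect versus a purely random score
frac_good, gain_good = cumulative_gain(tau, y, t)
frac_rand, gain_rand = cumulative_gain(rng.normal(size=n), y, t)

area_good = area_under_curve(frac_good, gain_good)
area_rand = area_under_curve(frac_rand, gain_rand)
print(area_good, area_rand)
```

Both curves end at the overall average effect, since the last point uses every unit. A model that ranks units well reaches that level earlier, so its curve encloses a larger area.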


## Case Study

In [4]:
%pip install fklearn

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from fklearn.causal.validation.curves import relative_cumulative_gain_curve
from fklearn.causal.validation.auc import area_under_the_relative_cumulative_gain_curve

### Meta-Learners with Discrete Treatments

- Background:
  - We want to know which customers are sensitive to the marketing email, so we estimate the conditional average treatment effect (CATE) of the email on each customer's future purchase amount.
- Datasets:
  - observational (biased) data: historical customer behavior, large (300,000 rows)
  - randomized data: small (10,000 rows)

In [6]:
#import datasets
data_biased=pd.read_csv('/Users/macbookpro/Desktop/email_obs_data.csv')
data_random=pd.read_csv('/Users/macbookpro/Desktop/email_rnd_data.csv')

In [7]:
data_biased.head()

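| Unnamed: 0 | mkt_email | next_mnth_pv | age | tenure | ammount_spent | vehicle | food | beverage | art | baby | ... | electronics | sports | tools | games | industry | pc | jewel | books | music_books_movies | health |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 34.38 | 32.0 | 0.0 | 18.05 | 0 | 0 | 1 | 1 | 1 | ... | 3 | 0 | 1 | 0 | 1 | 2 | 2 | 0 | 1 | 1 |
| 1 | 0 | 183.14 | 23.0 | 1.0 | 182.97 | 0 | 0 | 0 | 1 | 0 | ... | 1 | 1 | 0 | 0 | 0 | 2 | 2 | 1 | 2 | 1 |
| 2 | 0 | 54.26 | 29.0 | 0.0 | 29.57 | 0 | 0 | 0 | 1 | 4 | ... | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 4 |
| 3 | 1 | 1409.71 | 44.0 | 0.0 | 142.15 | 1 | 2 | 0 | 1 | 0 | ... | 1 | 0 | 1 | 1 | 1 | 3 | 0 | 1 | 0 | 5 |
| 4 | 0 | 120.16 | 30.0 | 0.0 | 132.11 | 0 | 1 | 1 | 0 | 1 | ... | 1 | 2 | 1 | 1 | 2 | 3 | 0 | 0 | 2 | 5 |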


In [8]:
data_random.head()

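| Unnamed: 0 | mkt_email | next_mnth_pv | age | tenure | ammount_spent | vehicle | food | beverage | art | baby | ... | electronics | sports | tools | games | industry | pc | jewel | books | music_books_movies | health |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 244.26 | 61.0 | 1.0 | 21.84 | 0 | 2 | 2 | 0 | 2 | ... | 1 | 0 | 0 | 3 | 1 | 0 | 1 | 0 | 0 | 2 |
| 1 | 0 | 29.67 | 36.0 | 1.0 | 107.4 | 0 | 2 | 0 | 2 | 0 | ... | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 0 | 2 | 2 |
| 2 | 0 | 11.73 | 64.0 | 0.0 | 59.81 | 0 | 1 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 3 | 0 | 1 | 0 | 1 | 0 | 1 |
| 3 | 0 | 41.41 | 74.0 | 0.0 | 62.98 | 0 | 1 | 0 | 0 | 3 | ... | 1 | 0 | 2 | 2 | 1 | 1 | 0 | 4 | 1 | 0 |
| 4 | 0 | 447.89 | 59.0 | 0.0 | 72.56 | 0 | 1 | 1 | 3 | 2 | ... | 5 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 2 | 1 |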


In [9]:
# Check the number of rows in each dataset
print(len(data_biased), len(data_random))

300000 10000


- Treatment variable: `mkt_email`
- Outcome variable: `next_mnth_pv`
- Covariates: all other variables, which may act as confounders and drive treatment effect heterogeneity

In [11]:
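# Define the outcome, treatment, and feature columns
Y = 'next_mnth_pv'
T = 'mkt_email'
# 'Unnamed: 0' appears to be a leftover row index from the CSV, not a customer feature
X = list(data_random.drop(columns=[Y, T, 'Unnamed: 0'], errors='ignore').columns)
# Train on the large observational data; evaluate on the small randomized data
train, test = data_biased, data_random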

- T-Learner Training with LightGBM

In [22]:
from lightgbm import LGBMRegressor

OSError: Could not load shared object file: libllvmlite.dylib

### References
- Causal Inference for Brave and True. https://matheusfacure.github.io/python-causality-handbook/21-Meta-Learners.html
- CausalML. https://matheusfacure.github.io/python-causality-handbook/21-Meta-Learners.html
- EconML.
  - https://econml.azurewebsites.net/spec/estimation/metalearners.html
  - https://nbviewer.org/github/py-why/EconML/blob/main/notebooks/Metalearners%20Examples.ipynb
- Meta-learners for Estimating Treatment Effect in Causal Inference. https://towardsdatascience.com/meta-learners-for-estimating-treatment-effect-in-causal-inference-4f7071503401