Causal Forest

In [Athey2018], the authors argue that the performance of generalized random forests (GRF) can be further improved by applying the local centering technique, i.e., by first regressing out the outcome and the treatment on the covariates, in the spirit of the so-called double machine learning framework. In YLearn, we implement the class CausalForest to support this technique. We illustrate its usage in the following example.
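
The idea of local centering can be sketched directly: estimate the conditional means of the outcome and the treatment given the covariates, subtract them to obtain residuals, and grow the forest on these residuals. The snippet below is a minimal illustration of this residualization step using cross-fitted predictions from sklearn; the function residualize is our own illustrative helper, not part of YLearn's API.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def residualize(y, x, v, cv=2):
    # Local centering sketch: remove E[y|v] and E[x|v] via cross-fitting,
    # so each sample's prediction comes from a model not trained on it.
    # y and x are assumed to be one-dimensional NumPy arrays here.
    y_hat = cross_val_predict(RandomForestRegressor(), v, y, cv=cv)
    x_hat = cross_val_predict(RandomForestRegressor(), v, x, cv=cv)
    # The causal forest is then grown on these residuals.
    return y - y_hat, x - x_hat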

Example

We first build a dataset and define the names of the treatment, outcome, and covariates separately.

import numpy as np
import matplotlib.pyplot as plt

from ylearn.estimator_model import CausalForest
from ylearn.exp_dataset.exp_data import sq_data
from ylearn.utils._common import to_df


# build dataset
n = 2000
d = 10
n_x = 1
y, x, v = sq_data(n, d, n_x)
# ground-truth treatment effect, used later to evaluate the estimate
true_te = lambda X: np.hstack([X[:, [0]]**2 + 1, np.ones((X.shape[0], n_x - 1))])
data = to_df(treatment=x, outcome=y, v=v)
outcome = 'outcome'
treatment = 'treatment'
adjustment = data.columns[2:]  # all columns of v serve as covariates

# build test data
v_test = v[:min(100, n)].copy()
v_test[:, 0] = np.linspace(np.percentile(v[:, 0], 1), np.percentile(v[:, 0], 99), min(100, n))
test_data = to_df(v=v_test)

It now remains to train the CausalForest and apply it to the test data. Typically, we should first specify two models that regress out the treatment and the outcome, respectively, on the covariates. In this example, we use RandomForestRegressor from sklearn for both models. Note that if we use a regression model for the treatment, the parameter is_discrete_treatment must be set to False. For better performance, it is also recommended to set honest_subsample_num to a value other than None.

from sklearn.ensemble import RandomForestRegressor

cf = CausalForest(
    x_model=RandomForestRegressor(),
    y_model=RandomForestRegressor(),
    cf_fold=1,
    is_discrete_treatment=False,
    n_jobs=1,
    n_estimators=100,
    random_state=3,
    min_samples_split=10,
    min_samples_leaf=3,
    min_impurity_decrease=1e-10,
    max_depth=100,
    max_leaf_nodes=1000,
    sub_sample_num=0.80,
    verbose=0,
    honest_subsample_num=0.45,
)
cf.fit(data=data, outcome=outcome, treatment=treatment, adjustment=None, covariate=adjustment)
effect = cf.estimate(test_data)
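
To check the quality of the estimate, we can compare it against the ground-truth effect true_te defined when building the dataset. The following sketch assumes that estimate returns an array of effect values aligned with the rows of test_data.

# compare the estimated effect with the ground truth on the test points
plt.plot(v_test[:, 0], true_te(v_test)[:, 0], label='true effect')
plt.plot(v_test[:, 0], np.asarray(effect).squeeze(), label='estimated effect')
plt.xlabel('v_0')
plt.ylabel('treatment effect')
plt.legend()
plt.show()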

Class Structures