orthogonal_forests: Orthogonal Random Forests
Orthogonal Random Forest (ORF) is an algorithm for heterogeneous treatment effect (HTE) estimation. It combines orthogonalization, a technique that effectively removes the confounding effect in two-stage estimation, with generalized random forests (Athey et al., 2017), a flexible method for estimating treatment effect heterogeneity.
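To make the orthogonalization idea concrete, here is a minimal sketch of residual-on-residual regression on toy data. It is not the code in this repository: it assumes a constant treatment effect, a handful of controls, and plain least squares for the first stage, whereas ORF fits these steps locally with forest weights and regularized models. All names (`residualize`, `theta`, the data generating process) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 5
theta = 2.0  # true treatment effect, chosen for this toy example

W = rng.normal(size=(n, d))                          # controls (confounders)
T = W @ np.ones(d) + rng.normal(size=n)              # treatment depends on W
Y = theta * T + W @ np.ones(d) + rng.normal(size=n)  # outcome depends on T and W


def residualize(target, controls):
    """Return target minus its least-squares prediction from controls."""
    X = np.column_stack([np.ones(len(controls)), controls])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


# First stage: remove the part of T and Y explained by the controls.
T_res = residualize(T, W)
Y_res = residualize(Y, W)

# Second stage: regress the outcome residual on the treatment residual.
theta_hat = (T_res @ Y_res) / (T_res @ T_res)

# For contrast: regressing Y on T while ignoring W is confounded.
T_centered = T - T.mean()
theta_naive = (T_centered @ (Y - Y.mean())) / (T_centered @ T_centered)

print(f"orthogonalized: {theta_hat:.3f}, naive: {theta_naive:.3f}")
```

In this setup the naive estimate is biased upward because the controls drive both the treatment and the outcome, while the orthogonalized estimate should land close to `theta`.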
This repository offers an implementation of the orthogonal random forest, as well as Monte Carlo simulations that compare its performance with other methods in the literature (and their variants). The code base is a prototype and is subject to frequent changes.
ortho_forest.py: Orthogonal Random Forest (OrthoForest) algorithm and variants.
hetero_dml.py: Extensions of the double machine learning technique (Chernozhukov et al., 2017) for heterogeneous treatment effect estimation. Used mainly for comparisons with the ORF algorithm.
GRF_treatment_effects.R: Application of the Generalized Random Forest (GRF) algorithm (R Package) to the data generated by the Monte Carlo simulations. Used for comparisons with the ORF algorithm.
monte_carlo.py: Monte Carlo simulations script that takes in parameters for the data generating process (DGP) and the ORF method.
comparison_plots.py: Script that generates comparison plots from the files produced by the Monte Carlo script.
seq_map.sh: Script that sweeps over the different HTE estimation methods and DGP parameters and generates comparison plots. Compatible with Linux, but also executable from Git Bash on Windows and MacOS. Takes as input an output folder for the Monte Carlo script and a 0-3 index representing the treatment response function considered (0=piecewise linear, 1=piecewise constant, 2=piecewise polynomial, 3=2D treatment response).
The ORF algorithm requires Python 3.6, scikit-learn > 0.19, and numpy > 1.14. The Monte Carlo simulations and plotting scripts require R 3.3 or above and CRAN packages
    from ortho_forest import OrthoForest
    from residualizer import dml
    from sklearn.linear_model import Lasso, LassoCV

    model_T = Lasso(alpha=0.04)
    model_Y = Lasso(alpha=0.04)
    est = OrthoForest(n_trees=100, min_leaf_size=5, residualizer=dml,
                      max_splits=20, subsample_ratio=0.1, bootstrap=False,
                      model_T=model_T, model_Y=model_Y,
                      model_T_final=LassoCV(), model_Y_final=LassoCV())
    est.fit(W, x, T, Y)  # high-dimensional controls, features, treatments, outcomes
    est.predict(x_test)  # test features
For more information on parameter choices for the ORF algorithm, see the References section.
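The usage example above assumes that `W`, `x`, `T`, `Y` and `x_test` already exist. The sketch below shows one way to build toy inputs with a heterogeneous effect. Everything here is an assumption made for illustration: the array shapes (2-D `W` and `x`, 1-D `T` and `Y`), the sparse linear confounding, and the `treatment_effect` function are not taken from the repository's Monte Carlo script.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_controls, support_size = 1000, 50, 5


def treatment_effect(x):
    """Illustrative piecewise linear effect: rises for x < 0.5, then stays flat."""
    return np.where(x < 0.5, 1.0 + 2.0 * x, 2.0)


# High-dimensional controls, of which only the first few matter (sparse support).
W = rng.normal(size=(n, n_controls))
coef = np.zeros(n_controls)
coef[:support_size] = 1.0

# One feature that drives the heterogeneity of the effect.
x = rng.uniform(0.0, 1.0, size=(n, 1))

# Treatment and outcome are both confounded by W.
T = W @ coef + rng.normal(size=n)
Y = treatment_effect(x[:, 0]) * T + W @ coef + rng.normal(size=n)

# Grid of test features at which to query the estimated effect.
x_test = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
```

Because the true effect is known here, predictions at `x_test` can be compared against `treatment_effect(x_test[:, 0])`.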
Monte Carlo simulations
To generate comparison plots for the different methods considered, execute the following script on Linux or Git Bash on Windows/MacOS:
./seq_map.sh results/piecewise_linear 0
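To cover all four treatment response functions, the same command can be repeated with indices 0 through 3. The sketch below only builds and prints the command lines. The output folder names other than `results/piecewise_linear` are hypothetical, chosen to mirror the one used above, and `build_commands` is an illustrative helper, not part of the repository.

```python
# Labels for the four treatment response indices accepted by seq_map.sh.
# Folder names are assumptions modeled on "results/piecewise_linear".
RESPONSE_FUNCTIONS = {
    0: "piecewise_linear",
    1: "piecewise_constant",
    2: "piecewise_polynomial",
    3: "2d_treatment_response",
}


def build_commands(results_root="results"):
    """Return one seq_map.sh argument list per treatment response index."""
    return [
        ["./seq_map.sh", f"{results_root}/{label}", str(index)]
        for index, label in sorted(RESPONSE_FUNCTIONS.items())
    ]


commands = build_commands()
for command in commands:
    # Printing only; nothing is executed here.
    print(" ".join(command))
```

To launch the runs rather than print them, each argument list could be passed to `subprocess.run(command, check=True)` from the repository root.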
References
Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu. Orthogonal Random Forest for Heterogenous Treatment Effects.