orthogonal_forests: Orthogonal Random Forests

Orthogonal Random Forest (ORF) is an algorithm for heterogeneous treatment effect (HTE) estimation. It combines orthogonalization, a technique that effectively removes the confounding effect in two-stage estimation, with generalized random forests (Athey et al., 2017), a flexible method for estimating treatment effect heterogeneity.
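To make the orthogonalization step concrete, here is a minimal sketch on simulated data. It is not the code in ortho_forest.py: it assumes a constant treatment effect, swaps in ordinary least squares for the first-stage models, and skips the sample splitting and forest weighting that ORF relies on. The names residualize and orthogonalized_effect are hypothetical.

```python
import numpy as np


def residualize(target, controls):
    """Return target minus its least-squares fit on the controls (with intercept)."""
    design = np.column_stack([np.ones(len(controls)), controls])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def orthogonalized_effect(Y, T, W):
    """Regress the outcome residual on the treatment residual."""
    Y_res = residualize(Y, W)
    T_res = residualize(T, W)
    return float(T_res @ Y_res / (T_res @ T_res))


# Simulated data (an assumption for illustration): W confounds both T and Y.
rng = np.random.default_rng(0)
n, d = 2000, 5
true_effect = 2.0
W = rng.normal(size=(n, d))
T = W @ np.full(d, 0.5) + rng.normal(size=n)
Y = true_effect * T + W @ np.full(d, 1.0) + rng.normal(size=n)

naive = float(T @ Y / (T @ T))         # ignores W, so it absorbs the confounding
ortho = orthogonalized_effect(Y, T, W)  # residualizes W out of both T and Y first
print(f"naive: {naive:.2f}, orthogonalized: {ortho:.2f}, truth: {true_effect}")
```

In this setup the naive slope is biased upward because W raises both T and Y, while the residual-on-residual slope lands close to the true effect.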


This repository offers an implementation of the orthogonal random forest, as well as Monte Carlo simulations that compare its performance with other methods in the literature (and their variants). The code base is a prototype and is subject to frequent changes.

File contents:

  • ortho_forest.py: Orthogonal Random Forest (OrthoForest) algorithm and variants.
  • hetero_dml.py: Extensions of the double machine learning technique (Chernozhukov et al., 2017) for heterogeneous treatment effect estimation. Used mainly for comparisons with the ORF algorithm.
  • GRF_treatment_effects.R: Application of the Generalized Random Forest (GRF) algorithm (R Package) to the data generated by the Monte Carlo simulations. Used for comparisons with the ORF algorithm.
  • monte_carlo.py: Monte Carlo simulations script that takes in parameters for the data generating process (DGP) and the ORF method.
  • comparison_plots.py: Script that generates comparison plots from the files produced by the Monte Carlo script.
  • seq_map.sh: Script that sweeps over the different HTE estimation methods and DGP parameters and generates comparison plots. Runs on Linux, and also from Git Bash on Windows and macOS. Takes two arguments: an output folder for the Monte Carlo script, and an index from 0 to 3 selecting the treatment response function (0=piecewise linear, 1=piecewise constant, 2=piecewise polynomial, 3=2D treatment response).
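The treatment response indices accepted by seq_map.sh can be pictured with a small sketch. Only the index-to-name mapping comes from the description above. The functional forms below are hypothetical stand-ins chosen to show what each shape looks like, and are not the functions used in monte_carlo.py. The 2D case is left out because its form is not described here.

```python
import numpy as np

# Index-to-name mapping as listed for seq_map.sh.
RESPONSE_NAMES = {
    0: "piecewise linear",
    1: "piecewise constant",
    2: "piecewise polynomial",
    3: "2D treatment response",
}


def piecewise_linear(x):
    """Hypothetical example: slope changes sign at x = 0.5."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 1.0 + 2.0 * x, 3.0 - 2.0 * x)


def piecewise_constant(x):
    """Hypothetical example: a single jump at x = 0.5."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 1.0, 3.0)


def piecewise_polynomial(x):
    """Hypothetical example: two quadratic pieces joined at x = 0.5."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.5, 4.0 * x ** 2, 1.0 + (x - 0.5) ** 2)


# Illustrative lookup for the one-dimensional cases only.
RESPONSE_FUNCTIONS = {
    0: piecewise_linear,
    1: piecewise_constant,
    2: piecewise_polynomial,
}

grid = np.linspace(0.0, 1.0, 5)
for index, func in RESPONSE_FUNCTIONS.items():
    print(index, RESPONSE_NAMES[index], func(grid))
```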


The ORF algorithm requires Python 3.6, scikit-learn > 0.19, and numpy > 1.14. The Monte Carlo simulation and plotting scripts require matplotlib > 2.1, R 3.3 or above, and the CRAN packages optparse and grf.
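A quick way to check the Python-side requirements is sketched below. It is not part of the repository. The helper names (version_tuple, newer_than, check_environment) are hypothetical, and the sketch reads the ">" in the requirements literally as "strictly newer than", which is an assumption. R and the CRAN packages are not checked.

```python
import importlib
import sys

# Import name -> version that the installed package should exceed.
REQUIREMENTS = {"sklearn": "0.19", "numpy": "1.14", "matplotlib": "2.1"}


def version_tuple(version):
    """Turn '0.19.1' or '1.15.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def newer_than(installed, floor):
    """True if installed is strictly newer than floor, padding with zeros."""
    a, b = version_tuple(installed), version_tuple(floor)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


def check_environment():
    """Map each requirement to True, False, or None if it is not installed."""
    report = {"python": sys.version_info >= (3, 6)}
    for name, floor in REQUIREMENTS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = newer_than(module.__version__, floor)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "missing" if ok is None else ("ok" if ok else "too old"))
```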

Example Usage

from ortho_forest import OrthoForest
from residualizer import dml
from sklearn.linear_model import Lasso, LassoCV

# Lasso models for the treatment (T) and the outcome (Y)
model_T = Lasso(alpha=0.04)
model_Y = Lasso(alpha=0.04)
est = OrthoForest(n_trees=100, min_leaf_size=5, residualizer=dml,
                  max_splits=20, subsample_ratio=0.1, bootstrap=False,
                  model_T=model_T, model_Y=model_Y,
                  model_T_final=LassoCV(), model_Y_final=LassoCV())
# W, x, T, Y and x_test are arrays that you supply
est.fit(W, x, T, Y)  # high-dimensional controls, features, treatments, outcomes
est.predict(x_test)  # test features

For more information on parameter choices for the ORF algorithm, see the References section.
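The usage example leaves W, x, T, Y and x_test undefined. The sketch below builds synthetic arrays that could fill those roles. Everything in it is an assumption made for illustration: the sample sizes, the sparse linear nuisance functions, the linear treatment_effect function, and the array shapes (a 2D feature matrix, 1D treatments and outcomes). It is not the data generating process in monte_carlo.py, and the shapes OrthoForest expects should be checked against ortho_forest.py.

```python
import numpy as np


def treatment_effect(x):
    """Hypothetical heterogeneous effect: grows linearly in the feature x."""
    return 1.0 + 2.0 * x


rng = np.random.default_rng(123)
n_samples, n_controls, n_relevant = 1000, 100, 5

# High-dimensional controls, of which only the first few matter (sparsity).
W = rng.normal(size=(n_samples, n_controls))
coef_T = np.zeros(n_controls)
coef_Y = np.zeros(n_controls)
coef_T[:n_relevant] = rng.uniform(-1.0, 1.0, size=n_relevant)
coef_Y[:n_relevant] = rng.uniform(-1.0, 1.0, size=n_relevant)

# One feature that drives the heterogeneity of the effect.
x = rng.uniform(0.0, 1.0, size=(n_samples, 1))

# Treatment and outcome both depend on the controls, so W confounds them.
T = W @ coef_T + rng.normal(size=n_samples)
Y = treatment_effect(x[:, 0]) * T + W @ coef_Y + rng.normal(size=n_samples)

# Grid of feature values at which to query the estimated effect.
x_test = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
```

With data like this, a fitted estimator's predictions on x_test can be compared against treatment_effect(x_test[:, 0]).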

Monte Carlo simulations

To generate comparison plots for the different methods considered, execute the following script on Linux or from Git Bash on Windows/macOS:

./seq_map.sh results/piecewise_linear 0
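To run the script for every treatment response function in turn, a small driver along these lines could help. It is a sketch, not part of the repository: only results/piecewise_linear appears in the command above, so the other folder names are hypothetical, as are the helper names build_commands and run_all. By default it only prints the commands.

```python
import subprocess

# Index -> output folder name. Only index 0's folder comes from the README;
# the remaining names are hypothetical.
RESPONSE_FOLDERS = {
    0: "piecewise_linear",
    1: "piecewise_constant",
    2: "piecewise_polynomial",
    3: "2d_response",
}


def build_commands(script="./seq_map.sh", root="results"):
    """Return one argument list per treatment response index."""
    return [
        [script, "{}/{}".format(root, name), str(index)]
        for index, name in sorted(RESPONSE_FOLDERS.items())
    ]


def run_all(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sweep if one run fails.
            subprocess.run(command, check=True)


commands = build_commands()
run_all(commands, dry_run=True)
```

Calling run_all(commands, dry_run=False) from the repository root would execute the four runs one after another.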


References

Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu. Orthogonal Random Forest for Heterogenous Treatment Effects.