This repository contains replication materials and implementations of the algorithms in the AISTATS 2021 paper "Efficient Balanced Treatment Assignments for Experimentation" by David Arbour, Drew Dimmery and Anup Rao.
The folders contain step-by-step demonstrations of how a variety of methods construct a design for a given dataset.
This folder contains the implementations of all methods shown in the paper. These methods are:
- Bernoulli (simple) randomization
- Complete (fixed margins) randomization
- GreedyNeighbors (new in our paper): MAXCUT on the nearest-neighbor graph
- Kallus Heuristic (see Kallus 2017)
- Kallus PSOD (see Kallus 2017)
- Matched pairs
- OptBlock (see Greevy et al 2004)
- QuickBlock (see Higgins et al 2016)
- Rerandomization (see, e.g. Morgan et al 2012)
- SoftBlock (new in our paper): MAXCUT on the maximal spanning tree
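To make the "MAXCUT on the maximal spanning tree" idea concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name `spanning_tree_assignment` is hypothetical, and it assumes similarity is measured as negative Euclidean distance, so that the maximal spanning tree of the similarity graph is the minimum spanning tree on distances. Because a tree is bipartite, alternating labels along its edges cuts every edge.

```python
import math
import random
from collections import deque


def spanning_tree_assignment(points, seed=None):
    """Two-color a spanning tree over `points` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(points)
    if n == 0:
        return []

    # Prim's algorithm on Euclidean distance. With similarity taken as
    # negative distance (an assumption), this minimum-distance tree is the
    # maximal spanning tree of the similarity graph.
    in_tree = [False] * n
    best_dist = [math.inf] * n
    parent = [-1] * n
    best_dist[0] = 0.0
    adjacency = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adjacency[u].append(parent[u])
            adjacency[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u

    # Alternate labels along tree edges; only the root's label is random.
    assignment = [None] * n
    assignment[0] = rng.randint(0, 1)
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if assignment[v] is None:
                assignment[v] = 1 - assignment[u]
                queue.append(v)
    return assignment
```

In this sketch, units joined by a tree edge always receive opposite treatments, which is what pushes nearby units into different arms.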
The simulated data generating processes considered in our simulation analyses:
- IHDP (a common causal benchmark)
- Linear (just a linear outcome model)
- QuickBlock (the product of two uniform random variables)
- Sinusoidal (the outcome is a sinusoidal function of covariates)
- Two Circles (the covariates are distributed uniformly in two concentric circles and the outcome is a linear function of the radius and angle of the point from the origin)
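As an illustration of the Two Circles description above, the following sketch draws points on two concentric circles and builds an outcome that is linear in radius and angle. The function name `two_circles`, the radii, and the coefficients are all hypothetical choices made for this example, and sampling points on the circles (rather than inside them) is an assumption; the repository's simulation code may differ.

```python
import math
import random


def two_circles(n, radii=(1.0, 2.0), coef_radius=1.0, coef_angle=1.0, seed=None):
    """Return n rows of (x1, x2, y) from an illustrative two-circles process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = rng.choice(radii)                    # pick one of the two circles
        theta = rng.uniform(0.0, 2.0 * math.pi)  # uniform angle around the origin
        x1 = r * math.cos(theta)
        x2 = r * math.sin(theta)
        # Outcome is linear in the polar coordinates, not in (x1, x2).
        y = coef_radius * r + coef_angle * theta
        rows.append((x1, x2, y))
    return rows
```

The outcome here is linear in polar coordinates but nonlinear in the observed covariates, which is the feature of this design that the description highlights.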
The raw figures used in the paper.
For replication of each of the subfigures in Figure 1, run create_example_data.py to generate example data and then plot it using plot_example.R.
Run run_test.py. When it has completed, run analysis.R.
Once run_test.py has completed, run analysis.R.
Run run_hp_comparison.py followed by hp_analysis.R.
Additionally, we have provided an R implementation of SoftBlock and GreedyNeighbors using tidyverse semantics in r_implementation.R.
A self-contained example randomization using precinct-level elections data is available in r-demo.Rmd.
The core of the implementation is to simply call:
source("https://raw.githubusercontent.com/ddimmery/softblock/master/r_implementation.R")
df %>% assign_greedy_neighbors(c(
covariate_to_balance_1, covariate_to_balance_2
))
This will add a column named treatment to df, containing the assigned treatment.
The Python implementation resides in design/. You can perform a sparse checkout of just this directory as follows:
mkdir design-local
cd design-local
git init
git remote add origin -f https://github.com/ddimmery/softblock.git
Add the design directory to .git/info/sparse-checkout (with core.sparseCheckout enabled). Then git pull origin master will fetch the contents of design/, which may be used as follows:
import design
import numpy as np
X = np.random.rand(100,5) # numeric numpy matrix on which to balance
softblock = design.SoftBlock()
softblock.fit(X)
A = softblock.assign(X)
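One way to sanity-check an assignment is to compare covariate means across the two arms. The helper below is a sketch, not part of the design package: `covariate_mean_differences` is a hypothetical name, and it assumes A is a binary (0/1) array-like with one entry per row of X. It uses only plain Python iteration, so it accepts nested lists as well as NumPy arrays.

```python
def covariate_mean_differences(X, A):
    """Treated-minus-control mean of each covariate (hypothetical helper)."""
    treated = [row for row, a in zip(X, A) if a == 1]
    control = [row for row, a in zip(X, A) if a == 0]
    if not treated or not control:
        raise ValueError("both arms need at least one unit")

    n_covariates = len(treated[0])
    diffs = []
    for j in range(n_covariates):
        mean_treated = sum(row[j] for row in treated) / len(treated)
        mean_control = sum(row[j] for row in control) / len(control)
        diffs.append(mean_treated - mean_control)
    return diffs
```

Differences close to zero on every covariate suggest the two arms are well balanced on the columns of X.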