Randomize conducts random assignment of units to equally sized groups for experimental trials. It can check for balance on a specified list of covariates. If blocking variables are specified it will conduct the randomization within blocks. It can rerandomize within blocks a certain number of times, such as conducting 100 randomizations and choosing the randomization with the best balance across covariates. It can also rerandomize until the balance p-value exceeds a certain cut-off value (e.g. 0.2). If unequal allocation sizes are desired, multiple groups can be aggregated after the randomization.
For clustered random assignment, one will need to handle the clustering manually, such as collapsing the dataset to the cluster level or choosing one representative unit per cluster. The randomization algorithm can then be run on that dataset, and the assignments can be copied to all units in the cluster. Examples are provided below.
Factor variable syntax is not yet supported in the balance check, so categorical variables must first be converted to a series of indicator variables.
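As a minimal sketch of that conversion, the following uses Stata's built-in `tabulate, generate()` to create the indicators before randomizing. It assumes a categorical variable named `race` with three categories; the stub `race_` and the choice of which indicator to omit are illustrative only.

```stata
* Assumption: race is a categorical variable with three levels.
* tabulate with generate() creates one indicator per level: race_1, race_2, race_3.
tabulate race, generate(race_)

* Pass the indicators to the balance check. One indicator (here race_1) is left
* out as a reference category, a common choice to avoid perfect collinearity.
randomize, balance(gender age race_2 race_3)
```

With a different number of categories, the list of indicators in `balance()` would need to be adjusted accordingly.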
This module is compatible with Stata version 12 or higher.
Installing from SSC is the simplest method, although it may provide a slightly less up-to-date version than installing from GitHub.
. ssc install randomize
Stata 13 can install directly from GitHub. With Stata 12 this method may not work because the URL uses "https".
. net install randomize, from(https://raw.githubusercontent.com/ck37/randomize_ado/master/)
Download the zip file of the repository, unzip it, then add that folder to Stata's search path for ado-files. Example:
. adopath + "~/Documents/randomize_ado-master/"
You will then be able to run the command and view the help file within Stata.
. ssc install git
. git install http://github.com/ck37/randomize_ado
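Whichever installation method is used, a quick check that Stata can find the module may be useful. The sketch below relies only on the built-in `which` and `help` commands.

```stata
* Report the location of the installed ado-file; this errors if randomize is not on the ado path.
which randomize

* Open the help file for the command.
help randomize
```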
- Randomize a dataset into 2 groups, checking for balance by gender.
. randomize, balance(gender)
- Randomize a dataset into 5 equally-sized groups, blocking by state and gender.
. randomize, groups(5) block(state gender)
- Randomize a dataset into 3 groups, checking for balance on age and gender. Rerandomize up to 100 times or until the balance p-value exceeds 0.2.
. randomize, groups(3) balance(age gender) jointp(0.2) maxruns(100)
- Create 4 groups, check for covariate balance on gender, race, and age, block on state, choose the most balanced of 500 randomizations within each block, and specify the random number generator seed.
. randomize, groups(4) balance(gender race age) block(state) minruns(500) seed(1)
- Create a 10% / 20% / 70% split by randomizing into 10 equally sized groups then aggregating those assignments.
. randomize, groups(10) aggregate(1 2 7)
- Use the quiet prefix to hide all randomization output and just get the result.
. quiet randomize, balance(state) minruns(1000)
- Use the details option to show all randomization output.
. randomize, balance(state) minruns(1000) details
- Simulated dataset example - randomize 10,000 records across 4 blocks, and take the best randomization out of 500 per block.
clear
set obs 10000
set seed 2
gen covariate = uniform()
gen block_var = ceil(uniform() * 4)
randomize, block(block_var) balance(covariate) minruns(500)
- Clustered Randomization v1 - select a random record within the cluster, conduct the randomization on those records, then apply the assignment to the full cluster.
* Create a combined cluster id
egen cluster_id = group(cluster_field1 cluster_field2)
set seed 1
set sortseed 2
gen double random = runiform()
* Randomly order individuals within clusters.
bysort cluster_id (random): egen cluster_seq = seq()
* Randomize using the demographics of the first cluster member to check for balance.
randomize if cluster_seq == 1, balance(covar1 covar2) block(blockvar1 blockvar2) replace
* Expand assignment to all units in the cluster.
bysort cluster_id: egen assignment = mode(_assignment)
One could skip the last step and treat a random unit per cluster in order to measure spillover effects within treatment clusters.
- Clustered Randomization v2 - compress the dataset to the cluster level, conduct the randomization, then merge the assignment back to the full dataset.
* Create a combined cluster id
egen cluster_id = group(cluster_field1 cluster_field2)
set seed 1
set sortseed 2
* Save the uncompressed version of the dataset.
preserve
* Aggregate to the cluster level, creating summary statistics for the randomization.
collapse (mean) covar1 covar2 (max) rare_covar3 (count) cluster_size, by(cluster_id)
* Execute the randomization at the cluster level.
randomize, balance(covar1 covar2 rare_covar3) replace
* Restrict to the data that we need.
keep cluster_id _assignment
save "cluster-assignments.dta", replace
* Switch back to the full dataset.
restore
merge m:1 cluster_id using "cluster-assignments.dta"
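After merging the cluster-level assignments back, it may be worth confirming that the merge behaved as intended. The following is a sketch of such checks, not part of the module itself. It assumes the merge above has just been run, so the `_merge` variable exists, and that every unit belongs to a randomized cluster.

```stata
* Every unit should have matched a cluster-level assignment.
assert _merge == 3
drop _merge

* The assignment should be constant within each cluster.
bysort cluster_id (_assignment): assert _assignment[1] == _assignment[_N]

* Inspect the resulting group sizes at the unit level.
tabulate _assignment
```

Because clusters differ in size, the unit-level group sizes shown by `tabulate` need not be equal even though the clusters themselves were split into equally sized groups.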