Inference for High-dimensional Nested Regression
Statistical inference in high-dimensional models under endogeneity of regressors of interest.
Contents and usage
The present repository contains the software used to conduct the simulation studies reported in the paper "Inference for high-dimensional nested regression." Please note that the software is designed to run on a cluster managed by the Slurm Workload Manager. The simulation routine may be adapted for serial execution, or for parallel execution under a different workload manager. Our primary objective in hosting this repository is to present transparently the procedures used to obtain our empirical results. In its current form, the software is not intended for practical data analysis, though it may be adapted for that purpose.
The following subsections detail the installation and use of the software on a compute cluster with Slurm.
After connecting to your cluster, you should clone the present git repository with
git clone git@github.com:LedererLab/HDIV.git
Note that you may need to enter a short interactive Slurm session to use git, for instance with
srun --pty --time="60" --mem-per-cpu="100" /bin/bash
You will need the following R packages to run the simulations:
dplyr purrr MASS mvtnorm Matrix methods lpSolve glmnet
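Any missing packages can be installed from the shell. The following is a minimal sketch that is not part of the repository. It assumes that Rscript is on the PATH (some clusters require a site-specific module load first), that a writable R library is available (for example one pointed to by R_LIBS_USER), and that CRAN is reachable. The names r_vector and install_missing_r_packages are hypothetical. The methods package is left out because it ships with base R and is not installed from CRAN.

```shell
#!/bin/sh
# Hypothetical helper, not part of the HDIV repository.
# Assumes: Rscript on PATH, a writable R library (e.g. via R_LIBS_USER),
# and network access to CRAN. "methods" is omitted because it ships with base R.
REQUIRED_R_PKGS="dplyr purrr MASS mvtnorm Matrix lpSolve glmnet"

# Build an R character-vector literal, e.g. r_vector a b -> c("a","b").
r_vector() {
  local out="" p
  for p in "$@"; do
    out="${out:+$out,}\"$p\""
  done
  printf 'c(%s)' "$out"
}

# Install only the packages that requireNamespace() cannot load.
# $REQUIRED_R_PKGS is left unquoted on purpose so it splits into one word per package.
install_missing_r_packages() {
  Rscript -e "pkgs <- $(r_vector $REQUIRED_R_PKGS)
missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) install.packages(missing, repos = 'https://cloud.r-project.org')"
}
```

Calling install_missing_r_packages once, for example from within an interactive srun session, installs only the packages that requireNamespace cannot find and leaves the others untouched.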
Configuring the simulations
The simulations are conducted under a variety of parameter configurations. Each parameter configuration determines a data-generation mechanism that is used to generate random samples to which the HDIV estimation method may be applied.
To configure the simulations, navigate to the
This will launch a series of Slurm jobs, each of which generates the model's regression parameters for a given configuration and writes them to disk.
The script also creates the
res folders required to run the simulations.
Running the simulations
The present software uses the Slurm Workload Manager to run separate trials in parallel.
To run the simulations for a contiguous range of configurations, first navigate to the
Then, use the src/run.sh script with first and second command-line arguments denoting the first and last configuration numbers of the desired range.
If the second argument is omitted, the range consists solely of the configuration given by the first argument.
An example usage is
src/run.sh 1 10
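The range convention can be illustrated with a short shell function. This is a hypothetical sketch of the argument handling described above, not the contents of src/run.sh. The name config_range is illustrative, and the sketch only resolves and validates the range; it does not submit any Slurm jobs.

```shell
#!/bin/sh
# Hypothetical sketch of the documented FIRST [LAST] convention; this is NOT
# the repository's src/run.sh and it submits nothing to Slurm.
config_range() {
  if [ "$#" -lt 1 ]; then
    echo "usage: config_range FIRST [LAST]" >&2
    return 1
  fi
  local first="$1"
  local last="${2:-$1}"   # LAST omitted: the range is FIRST alone
  local n
  # Reject empty or non-numeric configuration numbers.
  for n in "$first" "$last"; do
    case "$n" in
      ''|*[!0-9]*) echo "error: '$n' is not a non-negative integer" >&2; return 1 ;;
    esac
  done
  if [ "$first" -gt "$last" ]; then
    echo "error: FIRST ($first) is greater than LAST ($last)" >&2
    return 1
  fi
  # Print one configuration number per line.
  seq "$first" "$last"
}
```

Under this convention, config_range 1 10 lists configurations 1 through 10, while config_range 7 lists configuration 7 alone.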
Authors: David Gold, Johannes Lederer, Jing Tao (University of Washington)
Cite as "Inference for high-dimensional nested regression, Gold, Lederer, and Tao, arXiv:1708.05499, 2017"