Amy X. Zhang 11/20/2022
This R package provides code and data to reproduce all examples from the paper
Zhang, A. X., Bao, L., & Daniels, M. J. (2020). Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models. arXiv preprint arXiv:2011.14238.
The paper presents a novel method for approximating
Figure 2 from the paper gives point-by-point comparisons of ground truth manual cross-validation (x-axis) against AXE approximations (panel A, y-axis). Point-by-point comparisons for other LCO methods integrated importance sampling (iIS) and ghosting (GHOST) are also included. (Other methdos are omitted to preserve the scale of the axes; results summarized in Figure 1 of paper.)
The code contains the following directories:
R
: Contains all code to fit models, run manual cross-validation, AXE, iIS, GHOST, and VEHTARI. Also contains code helpers functions to format results for the paper, as well as helper functions to save posterior fits (for use in Python).python
: Contains code to fit IJ and NS methods, as well as IJ and NS results.pretrained
: Contains files with posterior fit data, for the purpose of using in the Python code.data-raw
: Contains data in raw format, as well as scripts used to prepare data.data
: Contains all datasets and paper results.stan
: Contains all STAN model code.
The following scripts are also included:
pretrained_models.R
: Fits models to each dataset and saves posterior information inpretrained
directory.run_examples.R
: Runs all paper examples.paper_output.R
: Uses data indata
directory to get paper figures and tables.
To use, from within the project directory, load/source the appropriate files, i.e. in R do:
for (file in list.files("R")) {
source(file.path("R", file))
} # imports all functions for re-running MCV, AXE, or other LCO approximations
for (obj in list.files("data")) {
load(file.path("data", obj))
} # loads pre-obtained MCV, AXE, and LCO values
The main functions in R
folder are:
-
prep_*()
: Creates a list with data for each example-
prep_eight()
: Eight schools (LMM) -
prep_radon_full()
: Radon (LMM) -
prep_radon_simul()
: Radon subsets (LMM) -
prep_lol()
: Esports players (GLMM) -
prep_slc()
: Scottish lip cancer (GLMM, CAR) -
prep_air()
: Scottish respiratory disease (GLMM, spatio-temporal CAR)
-
-
pfit_*()
: Fits model to full data and produces LCO approximations, based on posterior samples, for iIS, Vehtari, and GHOST methods. -
axe_*()
: Uses co/variance posterior means to produce AXE estimates. -
mcv_*()
: Runs manual cross-validation and saves$E[Y_j | Y_{-j}]$ .
Example results are obtained by calling the above functions for each example in turn, e.g. the Eight schools results are generated using the following code:
eight <- prep_eight()
eight$cv_yhats <- mcv_eight()
eight$posteriors <- pfit_eight()
eight$axe_yhats <- axe_eight()
The object eight
is available in the package, listed in the data
folder. Each of the paper’s examples is saved in data
under the
following names:
-
eight
: Eight schools -
radon_1
: Radon -
radon_2
: Radon subsets -
lol
: Esports players -
slc
: Scottish lip cancer -
air
: Scottish respiratory disease