Conceptualizing Reproducibility Using Simulations and Theory

Devezer-Buzbas/CRUST

CRUST: Conceptualizing Reproducibility Using Simulations and Theory

CRUST is a model-centric meta-scientific framework in which scientific discovery progresses by confirming models proposed in idealized and replication experiments.

Software Pre-requisites

The CRUST theoretical and agent-based simulation models are written in R, so they can be run on any operating system that R supports. In addition to R v3.3.2+, the implementations use a set of libraries that must be installed before execution:

  • caTools v.1.17.1
  • cowplot v.0.9.2
  • data.table v.1.10.4-3
  • directlabels v.2017.03.31
  • dplyr v.0.7.4
  • ggExtra v.0.7
  • ggplot2 v.2.2.1
  • ggpubr v.0.1.6
  • ggthemes v.3.4
  • grid v.3.4.4
  • gridExtra v.2.3
  • Hmisc v.4.1-1
  • lattice v.0.20-35
  • matrixStats v.0.53.1
  • MCMCpack v.1.4-2
  • permute v.0.9-4
  • reshape2 v.1.4.3
  • stringi v.1.1.7
  • tibble v.1.4.2
  • tidyverse v.1.2.1
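
Before running the scripts, it can help to check which of these libraries are already available. The snippet below is a minimal sketch of one way to do that; the helper name missingPackages is hypothetical and not part of CRUST, and the check does not enforce the version numbers listed above.

```r
# Packages listed above. grid ships with R itself, so it is never installed from CRAN.
crustPackages <- c("caTools", "cowplot", "data.table", "directlabels", "dplyr",
                   "ggExtra", "ggplot2", "ggpubr", "ggthemes", "grid",
                   "gridExtra", "Hmisc", "lattice", "matrixStats", "MCMCpack",
                   "permute", "reshape2", "stringi", "tibble", "tidyverse")

# Hypothetical helper: return the names of the packages that cannot be loaded.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

toInstall <- setdiff(missingPackages(crustPackages), "grid")
print(toInstall)

# Uncomment to install whatever is missing (needs network access):
# if (length(toInstall) > 0) install.packages(toInstall)
```

Note that install.packages() installs the current CRAN release, which may be newer than the versions listed above.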

Download Repository from GitHub

The directory created by cloning the CRUST repository will henceforth be referred to as <crustDir>.

The description of the content of each directory under <crustDir> is provided in the table below.

| Directory | Description |
| --- | --- |
| data | Input and output data used and generated by the scripts |
| data/modelComparison | Stores the model comparison probability file |
| data/plot | Stores the generated plots |
| data/raw | Stores the raw data generated by the reproducibility.R script |
| data/summary | Stores the summary data generated by the summary.R script |
| src | Script source code |
| src/abm | Agent-based model scripts |
| src/functions | Auxiliary functions used by the simulation scripts |
| src/theory | Theoretical simulation scripts |
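
As a quick sanity check after cloning, the layout described above can be verified from R. This is only a sketch: the helper missingDirs is hypothetical (not part of CRUST), and the demonstration runs against a throwaway directory rather than a real clone.

```r
# Directories listed in the table above, relative to <crustDir>.
crustSubdirs <- c("data", "data/modelComparison", "data/plot", "data/raw",
                  "data/summary", "src", "src/abm", "src/functions", "src/theory")

# Hypothetical helper: return the expected subdirectories missing under baseDir.
missingDirs <- function(baseDir) {
  crustSubdirs[!dir.exists(file.path(baseDir, crustSubdirs))]
}

# Demonstration on a temporary directory that only contains data/raw.
demoDir <- file.path(tempdir(), "crustDemo")
dir.create(file.path(demoDir, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
print(missingDirs(demoDir))
```

Against a real clone, replace demoDir with the full path to <crustDir>; an empty result means every expected directory is present.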

Theoretical Simulation Model

The theoretical simulation version of the reproducibility model is a temporal stochastic process of scientific discovery in which we define scientists with diverse research strategies who search for the true model generating the data.

The scripts of the theoretical model are available in the directory <crustDir>/src/theory.

noReplicator.R

The noReplicator.R script is the main script to execute the reproducibility theoretical simulation.

Configuration

You can configure the theoretical reproducibility model by changing the values of the configuration parameters in the noReplicator.R script file as described below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| inputDir | Directory where the model comparison probability file is stored (default: <crustDir>/data/modelComparison) |
| outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |
| k | Maximum number of factors in the linear models explored by scientists |
| sigma | Data generation error variance |
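
For illustration, a configuration block using these parameters might look like the sketch below. The path and the numeric values are placeholders chosen for the example, not the defaults shipped with the script, and the input directory variable is spelled inputDir here to match the simulator() signature listed under Auxiliary Functions.

```r
# Placeholder path: replace with the full path to your clone (<crustDir>).
baseDir <- "/path/to/CRUST"

# Default locations described in the parameter table, built relative to baseDir.
inputDir <- file.path(baseDir, "data", "modelComparison")
outputDir <- file.path(baseDir, "data", "plot")

# Illustrative values only (assumptions, not the script defaults).
k <- 3        # maximum number of factors in the linear models
sigma <- 0.2  # data generation error variance

print(c(inputDir = inputDir, outputDir = outputDir))
```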

Execution

To execute the theoretical simulation:

  • Navigate to the <crustDir> folder
  • Edit the src/theory/noReplicator.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/theory/noReplicator.R --no-save

modelComparisonProbabilitiesByMonteCarlo.R

The modelComparisonProbabilitiesByMonteCarlo.R script uses the Monte Carlo method to estimate the model comparison probabilities that the noReplicator.R script reads as input.

Configuration

You can configure the script by changing the values of the configuration parameters in the modelComparisonProbabilitiesByMonteCarlo.R script file as described below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| outputDir | Directory where the model comparison probability file is stored (default: <crustDir>/data/modelComparison) |
| k | Maximum number of factors in the linear models explored by scientists |
| sigmas | List of data generation error variances |
| sampleSize | Size of the set of stochastic values generated |
| nIter | Number of independent samples on which the model comparison statistic is based |
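
To make the roles of sampleSize, nIter and the error variance concrete, here is a toy Monte Carlo estimate of the probability that one linear model is preferred over another. It is a sketch under stated assumptions, not the repository's implementation: the function name estimateSwitchProb, the true model y = 1 + x1 + x2 + error, the two candidate models and the use of AIC as the comparison statistic are all chosen for illustration.

```r
# Hypothetical helper: estimate P(candidate model has lower AIC than current model).
estimateSwitchProb <- function(nIter, sampleSize, sigma, seed = 1) {
  set.seed(seed)
  wins <- 0
  for (i in seq_len(nIter)) {
    # Draw one independent sample under the assumed true model.
    x1 <- rnorm(sampleSize)
    x2 <- rnorm(sampleSize)
    y <- 1 + x1 + x2 + rnorm(sampleSize, sd = sqrt(sigma))  # sigma is a variance

    current <- lm(y ~ x1)         # model currently held
    candidate <- lm(y ~ x1 + x2)  # model proposed as a replacement

    if (AIC(candidate) < AIC(current)) wins <- wins + 1
  }
  wins / nIter  # proportion of samples in which the candidate is preferred
}

p <- estimateSwitchProb(nIter = 200, sampleSize = 100, sigma = 0.5)
print(p)
```

Increasing nIter reduces the Monte Carlo error of the estimate, while sampleSize and sigma control how easy it is to tell the two models apart in each sample.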

Execution

To execute the script:

  • Navigate to the <crustDir> folder
  • Edit the src/theory/modelComparisonProbabilitiesByMonteCarlo.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/theory/modelComparisonProbabilitiesByMonteCarlo.R --no-save

Agent-Based Simulation Model

The agent-based (ABM) simulation version of the reproducibility model is a forward-in-time simulation-based implementation of a process at the individual level. In our ABM, each scientist is represented as an agent that updates the scientific community consensus. ABM helps us assess interesting properties of our scientific process by allowing the inclusion of replication in the system.
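
The idea of agents updating a community consensus forward in time can be illustrated with a deliberately small toy loop. Everything in it is an assumption made for the example: the four-model space, the single generic agent that proposes a random model at each step, and the AIC-based update rule. It does not reproduce the Rey, Tess, Bo or Maverick strategies implemented in CRUST.

```r
set.seed(42)

# Toy model space (assumption); the last entry is treated as the true model.
formulas <- c("y ~ 1", "y ~ x1", "y ~ x2", "y ~ x1 + x2")
trueModel <- 4
consensus <- 1          # index of the model currently held by the community

timesteps <- 50
sampleSize <- 100
sigma <- 0.5            # data generation error variance
history <- integer(timesteps)

for (step in seq_len(timesteps)) {
  # Each step, new data are generated under the true model.
  d <- data.frame(x1 = rnorm(sampleSize), x2 = rnorm(sampleSize))
  d$y <- 1 + d$x1 + d$x2 + rnorm(sampleSize, sd = sqrt(sigma))

  # A generic agent proposes a model at random (assumed strategy).
  proposal <- sample(length(formulas), 1)

  # The consensus is updated only if the proposal fits better by AIC.
  aicProposal <- AIC(lm(as.formula(formulas[proposal]), data = d))
  aicConsensus <- AIC(lm(as.formula(formulas[consensus]), data = d))
  if (aicProposal < aicConsensus) consensus <- proposal

  history[step] <- consensus
}

# Share of time steps in which the consensus sat at the true model.
print(mean(history == trueModel))
```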

The scripts of the agent-based model are available in the directory <crustDir>/src/abm.

reproducibility.R

The reproducibility.R script is the main script to execute the reproducibility agent-based simulation.

Configuration

You can configure the ABM reproducibility model by changing the values of the configuration parameters in the reproducibility.R script file as described below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| replications | Number of replications of the simulation to run |
| timesteps | Number of time steps in each simulation run |
| k | Maximum number of factors in the linear models explored by scientists |
| sigma | Data generation error variance |
| sampleSize | Size of the set of stochastic values generated under the True Model |
| trueModel | True Model |
| correlation | Correlation of the predictor values |
| nRey | Number of agents of the Rey type |
| nTess | Number of agents of the Tess type |
| nBo | Number of agents of the Bo type |
| nMave | Number of agents of the Maverick type |
| modelCompare | Statistic used for model comparison: TSTATISTICS (t statistics), RSQ (R-squared), ARSQ (adjusted R-squared), AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion, also called the Schwarz Criterion) |
| modelSelection | Type of research strategy: soft or hard |
| outputFile | Name of the file storing the simulation output data |
| paramFile | Name of the file storing the parameters of the simulation |
| verbose | Indicates whether log messages are shown during the simulation execution |
| ndec | Decimal precision of the stored values |
Execution

To execute the reproducibility simulation:

  • Navigate to the <crustDir> folder
  • Edit the src/abm/reproducibility.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/abm/reproducibility.R --no-save

summary.R

The summary.R script is used to summarize data generated by a completely randomized factorial design experiment.

Configuration

The values of the configuration parameters of the summary.R script depend on the parameters you used when executing the reproducibility.R script. Additionally, you can define the number of time steps to discard as burn-in (the skip parameter). The configuration parameters for the summary.R script are shown below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| inputDir | Directory where the raw data is stored (default: <crustDir>/data/raw) |
| outputDir | Directory where the summary data is written (default: <crustDir>/data/summary) |
| replications | Number of replications executed in each simulation |
| timesteps | Number of time steps executed in each simulation |
| k | Maximum number of factors in the linear models explored by scientists |
| m | List of model indexes |
| sigmas | List of sigma indexes |
| types | List of indexes of the combinations of scientist types |
| verbose | Indicates whether log messages are shown during the simulation execution |
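
The effect of discarding a burn-in period can be sketched as follows. The data frame, its column names (replica, timestep, atTrueModel) and the values are hypothetical stand-ins for the raw simulation output; only the idea of dropping the first skip time steps before summarizing comes from the description above.

```r
set.seed(7)

timesteps <- 20
skip <- 5  # number of initial time steps discarded as burn-in

# Hypothetical raw output: two replications, one row per time step.
raw <- data.frame(
  replica = rep(1:2, each = timesteps),
  timestep = rep(seq_len(timesteps), times = 2),
  atTrueModel = rbinom(2 * timesteps, 1, 0.6)
)

# Keep only the time steps after the burn-in period.
kept <- raw[raw$timestep > skip, ]

# Summarize each replication over the retained time steps.
summaryByReplica <- aggregate(atTrueModel ~ replica, data = kept, FUN = mean)
print(summaryByReplica)
```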

Execution

To execute the summary script:

  • Navigate to the <crustDir> folder
  • Edit the src/abm/summary.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/abm/summary.R --no-save

createABMPlotsSummary10000.R

The createABMPlotsSummary10000.R script calculates statistics and generates plots from the summary data files created through a completely randomized factorial design experiment using the summary.R script (11,000 timesteps, initial 1,000 timesteps discarded).

Configuration

The values of the configuration parameters of the createABMPlotsSummary10000.R script depend on where the summary data generated by the summary.R script is stored. There is one summary file for each combination of model comparison statistic (AIC and SC) and research strategy (hard and soft): summaryAIChard10000.csv, summaryAICsoft10000.csv, summarySCshard10000.csv and summarySCsoft10000.csv. The configuration parameters are shown below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| inputDir | Directory where the summary data is stored (default: <crustDir>/data/summary) |
| outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |

Execution

To execute the plotting script:

  • Navigate to the <crustDir> folder
  • Edit the src/abm/createABMPlotsSummary10000.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/abm/createABMPlotsSummary10000.R --no-save

createABMPlotsSummary11000.R

The createABMPlotsSummary11000.R script calculates statistics and generates plots from the summary data files created through a completely randomized factorial design experiment using the summary.R script (11,000 timesteps and no timesteps discarded).

Configuration

The values of the configuration parameters of the createABMPlotsSummary11000.R script depend on where the summary data generated by the summary.R script is stored. There is one summary file for each combination of model comparison statistic (AIC and SC) and research strategy (hard and soft): summaryAIChard11000.csv, summaryAICsoft11000.csv, summarySCshard11000.csv and summarySCsoft11000.csv. The configuration parameters are shown below.

| Parameter | Description |
| --- | --- |
| baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
| inputDir | Directory where the summary data is stored (default: <crustDir>/data/summary) |
| outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |

Execution

To execute the plotting script:

  • Navigate to the <crustDir> folder
  • Edit the src/abm/createABMPlotsSummary11000.R script file and set the parameters described in the Configuration
  • Execute: Rscript src/abm/createABMPlotsSummary11000.R --no-save

Auxiliary Functions

The Theoretical Simulation Model and the Agent-Based Simulation Model rely on several auxiliary functions, which are stored in the src/functions directory.

| Script file | Function | Description |
| --- | --- | --- |
| analysis.R | analysis(sModel, gModel, yset, xset, weights) | Calculate the statistics for the selected and the global models, assuming data generated under the True Model, randomly generated X values, and beta weights |
| calculateDet.R | calculateDet(model, xset, weights, betas) | Calculate the deterministic part of a model |
| calculateDistance.R | calculateDistance(betas1, betas2) | Calculate the distance between the betas of two models |
| compareModels.R | compareModels(model1, model2) | Check whether two models are equal |
| constants.R | | Define all the constants |
| convertBinary.R | convertBinary(v, k) | Convert a number into binary format |
| generateBetas.R | generateBetas(models) | Set the weights of all betas |
| generateModels.R | generateModels(k) | Generate all possible models for k factors |
| generateXSet.R | generateXSet(n, k, correlation) | Generate a set of n predictor values |
| generateY.R | generateY(deterministic, sigma) | Generate a set of stochastic values under the True Model |
| getBetas.R | getBetas(model, weights, sigma) | Generate a set of random betas for the True Model |
| getModelComparison.R | getModelComparison(xset, sampleSize, tModel, sigma, models, nIter, ms, msConstant) | Calculate the ProbMC of switching from model i to model j |
| getModelSelectionConstant.R | getModelSelectionConstant(models, xset) | Generate the constants used by getModelComparison.R |
| getPredictors.R | getPredictors(models) | Get a list of the predictors of all models |
| modelSimilarByInteraction.R | modelSimilarByInteraction(model, models, mode=["all", "random"], modelSelection=["hard", "soft"]) | Generate a similar model by adding an interaction |
| modelSimilarByTerm.R | modelSimilarByTerm(model, models, mode=["all", "random"], modelSelection=["hard", "soft"]) | Generate a similar model by adding or removing a term |
| modelToStr.R | modelToStr(model) | Convert a model represented as a matrix into a string format |
| searchModel.R | searchModel(model, models) | Search for the index of the model in a list of models |
| seedGenerator.R | seedGenerator(N, filename) | Load seeds from a text file or generate them randomly |
| simulator.R | simulator(replications, timesteps, models, k, tModel, nRey, nTess, nBo, nMave, weights, sampleSize, correlation, sigma, modelCompare, modelSelection, inputDir, outputDir, outputFile, paramFile, verbose, ndec, seeds) | Execute a given number of replications of the reproducibility model |
| strToModel.R | strToModel(modelStr, k) | Convert a model represented as a string into a matrix format |
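
To give a feel for what two of these helpers do, the sketch below shows one possible way to convert a number into a k-digit binary vector and to draw n rows of k correlated predictors. These are independent illustrations under stated assumptions, not the code in src/functions: the names carry a Sketch suffix to mark them as hypothetical, the binary digits are returned most significant first, and the predictors are assumed to be standard normal with a common pairwise correlation.

```r
# Hypothetical stand-in for convertBinary(v, k): k binary digits of v.
convertBinarySketch <- function(v, k) {
  # intToBits() returns 32 bits, least significant first; keep k of them, reversed.
  as.integer(intToBits(v))[k:1]
}

# Hypothetical stand-in for generateXSet(n, k, correlation).
generateXSetSketch <- function(n, k, correlation) {
  # Target covariance: unit variances, the same correlation between every pair.
  covMatrix <- matrix(correlation, nrow = k, ncol = k)
  diag(covMatrix) <- 1
  # Independent standard normal draws, then mixed with the Cholesky factor.
  z <- matrix(rnorm(n * k), nrow = n, ncol = k)
  z %*% chol(covMatrix)
}

print(convertBinarySketch(5, 3))  # 1 0 1

set.seed(3)
x <- generateXSetSketch(5000, 3, 0.5)
print(round(cor(x), 2))
```

The Cholesky approach works because chol() returns an upper triangular matrix R with t(R) %*% R equal to the target covariance, so multiplying independent draws by R yields columns with that covariance.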
