CRUST is a model-centric meta-scientific framework in which scientific discovery progresses by confirming models proposed in idealized and replication experiments.
The CRUST theoretical and agent-based simulation models are written in R, so they can be run on any R-supported operating system. In addition to R v3.3.2+, these implementations use a set of libraries that must be installed before execution:
- caTools v.1.17.1
- cowplot v.0.9.2
- data.table v.1.10.4-3
- directlabels v.2017.03.31
- dplyr v.0.7.4
- ggExtra v.0.7
- ggplot2 v.2.2.1
- ggpubr v.0.1.6
- ggthemes v.3.4
- grid v.3.4.4
- gridExtra v.2.3
- Hmisc v.4.1-1
- lattice v.0.20-35
- matrixStats v.0.53.1
- MCMCpack v.1.4-2
- permute v.0.9-4
- reshape2 v.1.4.3
- stringi v.1.1.7
- tibble v.1.4.2
- tidyverse v.1.2.1
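As a minimal sketch, the libraries listed above can be checked against the local R installation before running any script. The helper name `findMissing` is hypothetical and not part of CRUST. Note that `install.packages` fetches the current CRAN releases, not the pinned versions listed above, so the install call is left commented out.

```r
# Libraries required by the CRUST scripts (names taken from the list above).
requiredPkgs <- c("caTools", "cowplot", "data.table", "directlabels", "dplyr",
                  "ggExtra", "ggplot2", "ggpubr", "ggthemes", "grid",
                  "gridExtra", "Hmisc", "lattice", "matrixStats", "MCMCpack",
                  "permute", "reshape2", "stringi", "tibble", "tidyverse")

# Hypothetical helper: return the required packages that are not installed.
findMissing <- function(required, installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

missingPkgs <- findMissing(requiredPkgs)
if (length(missingPkgs) > 0) {
  message("Missing packages: ", paste(missingPkgs, collapse = ", "))
  # Uncomment to install the current CRAN releases of the missing packages:
  # install.packages(missingPkgs)
}
```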
- Open a terminal
- Navigate to the directory where you want to download the CRUST code
- Type: git clone https://github.com/gnardin/CRUST.git
The directory created by cloning the CRUST repository is henceforth referred to as <crustDir>.
The content of each directory under <crustDir> is described in the table below.
Directory | Description |
---|---|
data | Input and output data used and generated by the scripts |
data/modelComparison | Store the model comparison probability file |
data/plot | Store the generated plots |
data/raw | Store the raw data generated by the reproducibility.R script |
data/summary | Store the summary data generated by the summary.R script |
src | Script source-code |
src/abm | Agent-based model scripts |
src/functions | Auxiliary functions used by the simulation scripts |
src/theory | Theoretical simulation scripts |
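The layout in the table above can be verified with a short check like the following sketch. The helper `missingDirs` is hypothetical and not part of CRUST. Git does not track empty directories, so an output directory that holds no files in the repository may be absent from a fresh clone and need to be created by hand.

```r
# Directories listed in the table above, relative to <crustDir>.
expectedDirs <- c("data", "data/modelComparison", "data/plot", "data/raw",
                  "data/summary", "src", "src/abm", "src/functions",
                  "src/theory")

# Hypothetical helper: return the expected directories missing under crustDir.
missingDirs <- function(crustDir, expected = expectedDirs) {
  expected[!dir.exists(file.path(crustDir, expected))]
}

# Example (replace the placeholder path with your own <crustDir>):
# missingDirs("/path/to/CRUST")
```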
The theoretical simulation version of the reproducibility model is a temporal stochastic process of scientific discovery in which we define scientists with diverse research strategies who search for the true model generating the data.
The scripts of the theoretical model are available in the directory <crustDir>/src/theory.
The noReplicator.R script is the main script to execute the reproducibility theoretical simulation.
You can configure the theoretical reproducibility model by changing the values of the configuration parameters in the noReplicator.R script file as described below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
inputDir | Directory where the model comparison probability file is stored (default: <crustDir>/data/modelComparison) |
outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |
k | Maximum number of factors in the linear models explored by scientists |
sigma | Data generation error variance |
To execute the theoretical simulation:
- Navigate to the <crustDir> folder
- Edit the src/theory/noReplicator.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/theory/noReplicator.R --no-save
The modelComparisonProbabilitiesByMonteCarlo.R script uses the Monte Carlo method to generate the model comparison probability estimates used by the noReplicator.R script.
You can configure the script by changing the values of the configuration parameters in the modelComparisonProbabilitiesByMonteCarlo.R script file as described below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
outputDir | Directory where the model comparison probability file is stored (default: <crustDir>/data/modelComparison) |
k | Maximum number of factors in the linear models explored by scientists |
sigmas | List of data generation error variances |
sampleSize | Size of the set of stochastic values generated |
nIter | Number of independent samples on which the model comparison statistic is based |
To execute the script:
- Navigate to the <crustDir> folder
- Edit the src/theory/modelComparisonProbabilitiesByMonteCarlo.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/theory/modelComparisonProbabilitiesByMonteCarlo.R --no-save
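The general idea of estimating a model comparison probability by Monte Carlo can be sketched as follows. This is an illustration under stated assumptions, not the CRUST implementation: the true model, the competing model, the use of AIC, the parameter values, and the helper `trueModelPreferred` are all hypothetical. Only the roles of `sampleSize`, `nIter`, and `sigma` follow the parameter table above.

```r
set.seed(1)            # fixed seed so the sketch is repeatable

sampleSize <- 50       # observations per simulated data set
nIter      <- 200      # number of independent data sets
sigma      <- 0.5      # error variance, so the standard deviation is sqrt(sigma)

# Hypothetical helper: TRUE if AIC prefers the true model (y ~ x1)
# over a competing model (y ~ x2) on one simulated data set.
trueModelPreferred <- function(sampleSize, sigma) {
  x1 <- rnorm(sampleSize)
  x2 <- rnorm(sampleSize)
  y  <- 1 + 2 * x1 + rnorm(sampleSize, sd = sqrt(sigma))
  AIC(lm(y ~ x1)) < AIC(lm(y ~ x2))
}

# Monte Carlo estimate: fraction of data sets in which the true model wins.
wins <- replicate(nIter, trueModelPreferred(sampleSize, sigma))
probEstimate <- mean(wins)
probEstimate
```

Increasing `nIter` reduces the sampling error of the estimate at the cost of a proportionally longer run time.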
The agent-based (ABM) simulation version of the reproducibility model is a forward-in-time simulation-based implementation of a process at the individual level. In our ABM, each scientist is represented as an agent that updates the scientific community consensus. ABM helps us assess interesting properties of our scientific process by allowing the inclusion of replication in the system.
The scripts of the agent-based model are available in the directory <crustDir>/src/abm.
The reproducibility.R script is the main script to execute the reproducibility agent-based simulation.
You can configure the ABM reproducibility model by changing the values of the configuration parameters in the reproducibility.R script file as described below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
replications | Number of replications of the simulation to run |
timesteps | Number of time steps of each simulation run |
k | Maximum number of factors in the linear models explored by scientists |
sigma | Data generation error variance |
sampleSize | Size of the set of stochastic values generated under the True Model |
trueModel | True Model |
correlation | Correlation of the predictor values |
nRey | Number of agents of the Rey type |
nTess | Number of agents of the Tess type |
nBo | Number of agents of the Bo type |
nMave | Number of agents of the Maverick type |
modelCompare | Type of statistic used for model comparison: TSTATISTICS (t statistics), RSQ (R-Squared), ARSQ (Adjusted R-Squared), AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion, also known as the Schwarz Criterion) |
modelSelection | Type of research strategy: soft or hard |
outputFile | Name of the file storing the simulation output data |
paramFile | Name of the file storing the parameters of the simulation |
verbose | Indicates whether log messages are shown during the simulation execution |
ndec | Decimal precision of the stored values |
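For orientation, the statistics named by the modelCompare parameter can all be obtained for a linear model with base R, as in the sketch below. The simulated data and the fitted model are hypothetical, and the sketch does not show how CRUST itself computes or uses these statistics.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n, sd = 0.5)   # hypothetical data-generating model

fit <- lm(y ~ x1 + x2)
s   <- summary(fit)

stats <- list(
  TSTATISTICS = coef(s)[, "t value"],   # one t statistic per coefficient
  RSQ         = s$r.squared,            # R-Squared
  ARSQ        = s$adj.r.squared,        # Adjusted R-Squared
  AIC         = AIC(fit),               # Akaike Information Criterion
  BIC         = BIC(fit)                # Bayesian (Schwarz) Information Criterion
)
str(stats)
```

Higher values indicate a better fit for RSQ and ARSQ, whereas lower values are preferred for AIC and BIC.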
To execute the reproducibility simulation:
- Navigate to the <crustDir> folder
- Edit the src/abm/reproducibility.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/abm/reproducibility.R --no-save
The summary.R script summarizes data generated by a completely randomized factorial design experiment.
The values of the configuration parameters of the summary.R script depend on the parameters you used when executing the reproducibility.R script. Additionally, you can define the number of time steps to discard as burn-in (i.e., the skip parameter). The configuration parameters for the summary.R script are shown below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
inputDir | Directory where the raw data is stored (default: <crustDir>/data/raw) |
outputDir | Directory where the summary data is written (default: <crustDir>/data/summary) |
replications | Number of replications executed in each simulation |
timesteps | Number of time steps executed in each simulation |
k | Maximum number of factors in the linear models explored by scientists |
m | List of model indexes |
sigmas | List of sigma indexes |
types | List of indexes of the combinations of scientist types |
verbose | Indicates whether log messages are shown during the simulation execution |
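Discarding a burn-in period amounts to dropping the first skip time steps before summarizing. The sketch below illustrates this on a made-up data frame, using the 11,000 time steps and 1,000 discarded steps mentioned for the plot scripts. The column names `timestep` and `value` are assumptions, not the actual layout of the CRUST raw data files.

```r
timesteps <- 11000
skip      <- 1000   # number of initial time steps treated as burn-in

# Hypothetical raw output: one row per time step (column names are assumptions).
raw <- data.frame(timestep = seq_len(timesteps),
                  value    = rnorm(timesteps))

# Keep only the time steps after the burn-in period.
kept <- raw[raw$timestep > skip, ]
nrow(kept)   # 10000
```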
To execute the summary script:
- Navigate to the <crustDir> folder
- Edit the src/abm/summary.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/abm/summary.R --no-save
The createABMPlotsSummary10000.R script calculates statistics and generates plots from the summary data files created through a completely randomized factorial design experiment using the summary.R script (11,000 timesteps, initial 1,000 timesteps discarded).
The values of the configuration parameters of the createABMPlotsSummary10000.R script depend on the storage location of the summary data generated by the summary.R script for the AIC and SC model comparison statistics and the hard and soft research strategies, i.e., the files summaryAIChard10000.csv, summaryAICsoft10000.csv, summarySCshard10000.csv, and summarySCsoft10000.csv. The configuration parameters are shown below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
inputDir | Directory where the summary data is stored (default: <crustDir>/data/summary) |
outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |
To execute the plot script:
- Navigate to the <crustDir> folder
- Edit the src/abm/createABMPlotsSummary10000.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/abm/createABMPlotsSummary10000.R --no-save
The createABMPlotsSummary11000.R script calculates statistics and generates plots from the summary data files created through a completely randomized factorial design experiment using the summary.R script (11,000 timesteps and no timesteps discarded).
The values of the configuration parameters of the createABMPlotsSummary11000.R script depend on the storage location of the summary data generated by the summary.R script for the AIC and SC model comparison statistics and the hard and soft research strategies, i.e., the files summaryAIChard11000.csv, summaryAICsoft11000.csv, summarySCshard11000.csv, and summarySCsoft11000.csv. The configuration parameters are shown below.
Parameter | Description |
---|---|
baseDir | Full path to the base directory of CRUST (i.e., <crustDir>) |
inputDir | Directory where the summary data is stored (default: <crustDir>/data/summary) |
outputDir | Directory where the generated plots are stored (default: <crustDir>/data/plot) |
To execute the plot script:
- Navigate to the <crustDir> folder
- Edit the src/abm/createABMPlotsSummary11000.R script file and set the parameters described in the configuration table above
- Execute: Rscript src/abm/createABMPlotsSummary11000.R --no-save
The Theoretical Simulation Model and Agent-Based Simulation Model require several auxiliary functions in the src/functions directory.
Script File Name | Function | Description |
---|---|---|
analysis.R | analysis(sModel, gModel, yset, xset, weights) | Calculate the statistics for the selected and the global models, assuming data generated under the True Model, randomly generated X values, and beta weights |
calculateDet.R | calculateDet(model, xset, weights, betas) | Calculate the deterministic part of a model |
calculateDistance.R | calculateDistance(betas1, betas2) | Calculate the distance between the betas of two models |
compareModels.R | compareModels(model1, model2) | Check whether two models are equal |
constants.R | | Define all the constants |
convertBinary.R | convertBinary(v, k) | Convert a number into binary format |
generateBetas.R | generateBetas(models) | Set the weights of all betas |
generateModels.R | generateModels(k) | Generate all possible models for k factors |
generateXSet.R | generateXSet(n, k, correlation) | Generate a set of n predictor values |
generateY.R | generateY(deterministic, sigma) | Generate a set of stochastic values under the True Model |
getBetas.R | getBetas(model, weights, sigma) | Generate a set of random betas for the True Model |
getModelComparison.R | getModelComparison(xset, sampleSize, tModel, sigma, models, nIter, ms, msConstant) | Calculate the ProbMC of switching from model i to model j |
getModelSelectionConstant.R | getModelSelectionConstant(models, xset) | Generate the constants used in getModelComparison.R |
getPredictors.R | getPredictors(models) | Get a list of the predictors of all models |
modelSimilarByInteraction.R | modelSimilarByInteraction(model, models, mode=["all", "random"], modelSelection=["hard", "soft"]) | Generate a similar model by adding an interaction |
modelSimilarByTerm.R | modelSimilarByTerm(model, models, mode=["all", "random"], modelSelection=["hard", "soft"]) | Generate a similar model by adding or removing a term |
modelToStr.R | modelToStr(model) | Convert a model represented as a matrix into a string format |
searchModel.R | searchModel(model, models) | Search for the index of the model in a list of models |
seedGenerator.R | seedGenerator(N, filename) | Load seeds from a text file or generate them randomly |
simulator.R | simulator(replications, timesteps, models, k, tModel, nRey, nTess, nBo, nMave, weights, sampleSize, correlation, sigma, modelCompare, modelSelection, inputDir, outputDir, outputFile, paramFile, verbose, ndec, seeds) | Execute a certain number of replications of the reproducibility model |
strToModel.R | strToModel(modelStr, k) | Convert a model represented as a string into a matrix format |
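To give a feel for what a number-to-binary conversion like the one described for convertBinary.R can look like in R, here is a minimal sketch. The helper `toBinaryVector` is hypothetical; it is not the CRUST convertBinary() implementation, and the output format of the real function may differ.

```r
# Hypothetical helper, not the CRUST convertBinary() implementation:
# return the k lowest bits of v, most significant bit first.
toBinaryVector <- function(v, k) {
  stopifnot(v >= 0, k >= 1, k <= 31)
  bits <- as.integer(intToBits(v))   # 32 bits, least significant first
  rev(bits[seq_len(k)])
}

toBinaryVector(5, 4)   # 0 1 0 1
```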