Christopher-Peterson/copepod_worm_adapt

Data and analyses for “Local adaptation and host specificity to copepod intermediate hosts by the Schistocephalus solidus tapeworm”

This repository contains the data and scripts for the Bayesian hurdle model ensemble analysis.

Dataset metadata

The main data file is data/chapter_2_copepod_for_bayes.csv. It has the following columns:

  • number: sequential row number
  • cop.lake: copepod lake of origin (factor: lau, ech, rob, gos, boo)
  • worm.fam: worm family, nested within worm.lake (factor)
  • worm.lake: worm lake of origin (factor: boo, gos, ech)
  • plate: experimental block identifier (factor)
  • numb.worm: number of worms present in the copepod (integer)
  • native: whether cop.lake == worm.lake (logical)
  • genus: worm genus (factor: M, A)
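For a quick look at the file's structure, you can preview the header and first few rows from a Unix shell (standard tools only; nothing here is specific to this repository):

    # Show the header and first few rows, aligned by column
    head -n 5 data/chapter_2_copepod_for_bayes.csv | column -t -s,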

Re-analysis instructions

TACC setup

The hurdle models were originally run on the Texas Advanced Computing Center's (TACC) Stampede2 supercomputer. The workflow could probably be adapted to other HPC systems. A consolidated shell sketch of the setup follows the numbered steps.

  1. Create a directory in $SCRATCH for the project, then copy the R and data folders into it. Make a folder called "logs" with mkdir.
  2. Download the container image:
    a. Make the folder $WORK/singularity and go there.
    b. Open an idev session.
    c. Run ml tacc-singularity.
    d. Run singularity pull docker://crpeters/docker-stan:21.1.2. This downloads a container image that has all of the analysis packages in it.
    e. Run ls *.sif and copy the name of the .sif file.
  3. Configure the docker_stan shell script:
    a. Copy the docker_stan file into your local bin folder (e.g., $HOME/bin or $WORK/apps/bin).
    b. Edit the file with nano or another editor; replace the .sif file it references with the one you copied in the previous step, then save.
    c. Make the file executable.
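As a rough guide, the setup might look like the following shell session (a sketch, not a verified script: /path/to/repo is a placeholder for wherever you cloned this repository, and idev drops you into an interactive session where the subsequent commands are run):

    cd $SCRATCH
    mkdir -p copepod_worm_adapt/logs                # project directory plus logs folder
    cp -r /path/to/repo/R /path/to/repo/data copepod_worm_adapt/

    mkdir -p $WORK/singularity && cd $WORK/singularity
    idev                                            # interactive session on a compute node
    ml tacc-singularity
    singularity pull docker://crpeters/docker-stan:21.1.2
    ls *.sif                                        # note the image name for docker_stan

    cp /path/to/repo/docker_stan $HOME/bin/         # then edit it to point at the .sif above
    chmod +x $HOME/bin/docker_stan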

Run the hurdle models

This step creates a list of all candidate hurdle models and submits them to TACC as a batch job.

  1. Submit the hurdle model batch job with sbatch slurm/run_hurdle.slurm. You may need to edit the slurm file's parameters (cores, nodes, etc.) to suit your HPC system.

  2. If all of the tasks haven't completed, run sbatch slurm/continue_hurdle.slurm to continue the run. Repeat this until all models have been run.

For each model, there should be two output files: the posterior distribution and the LOO (leave-one-out) log predictive density.
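In practice, the submit-and-continue cycle looks something like this (squeue is standard Slurm; the rest follows the steps above):

    sbatch slurm/run_hurdle.slurm       # initial submission
    squeue -u $USER                     # monitor the job
    # Once the job ends, resubmit until every model has both output files:
    sbatch slurm/continue_hurdle.slurm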

Model Stacking

Model stacking is a three-step process, fully handled by the slurm/model_stacking.slurm script. If the job times out, the easiest option is to give it more time and re-run it.
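For example (the time value below is an illustration; set it to whatever your system and job size need):

    sbatch slurm/model_stacking.slurm
    # If it hits the time limit, raise the #SBATCH --time line in the
    # script (e.g., from 08:00:00 to 24:00:00) and resubmit.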

Post-processing

Several other scripts are useful for analyzing the stacking posterior:

  • get_ev.r provides expected values
  • get_effect_sizes.R calculates effect sizes
  • make_effect_size_plots.r visualizes the effect sizes
  • copepod_corrrelation_plot.r creates Figure S1 in the manuscript

All of these are probably best run on the HPC system used for the other analyses.
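Assuming the docker_stan wrapper forwards its arguments to a command inside the container, and that the scripts live in the repository's R folder (both assumptions; check your copy), they could be run like this:

    # Run each post-processing script inside the container
    docker_stan Rscript R/get_ev.r
    docker_stan Rscript R/get_effect_sizes.R
    docker_stan Rscript R/make_effect_size_plots.r
    docker_stan Rscript R/copepod_corrrelation_plot.r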
