sstar-analysis

Introduction

This repo contains Snakemake pipelines for replicating the analysis in our sstar paper. These pipelines were tested on Linux operating systems (CentOS 8, Oracle Linux 8, and Ubuntu 20.04).

To replicate our analysis, users should first install Anaconda or Miniconda and then use the following commands to create a conda environment for the analysis.

conda install mamba -n base -c conda-forge
mamba env create -f environment.yml
conda activate sstar-analysis

For tools that cannot be installed through conda, users can install them with the commands below.

mkdir ext && cd ext

# Download SPrime and its pipeline
mkdir SPrime && cd SPrime
wget https://faculty.washington.edu/browning/sprime.jar
git clone https://github.com/YingZhou001/sprimepipeline
chmod a+x sprimepipeline/pub.pipeline.pbs/tools/map_arch_genome/map_arch
sed 's/out<-c()/out<-data.frame()/' sprimepipeline/pub.pipeline.pbs/tools/score_summary.r > tmp
mv tmp sprimepipeline/pub.pipeline.pbs/tools/score_summary.r
cd ..

# Download SkovHMM
git clone https://github.com/LauritsSkov/Introgression-detection SkovHMM
cd SkovHMM
git checkout 3d1865a56b8fdecc2768448e6cb982a157c37c50
cd ..

# Download ArchaicSeeker2.0
git clone https://github.com/Shuhua-Group/ArchaicSeeker2.0
cd ArchaicSeeker2.0
cd libnlopt.so.0_free
chmod a+x ArchaicSeeker2
cd ../..

Users also need to download ms.tar.gz from the Hudson Lab website, decompress it under the ext folder, and compile it with the following commands.
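As a minimal sketch, assuming the commands are run from inside the ext folder, the download and extraction step could look like the following (the placeholder <ms-download-url> stands for the ms.tar.gz link on the Hudson Lab website):

# Hypothetical download step; replace <ms-download-url> with the ms.tar.gz link from the Hudson Lab website
wget <ms-download-url> -O ms.tar.gz
tar -xzf ms.tar.gz   # the archive should decompress into the msdir folder used below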

cd msdir
${CONDA_PREFIX}/bin/gcc -o ms ms.c streec.c rand1.c -lm
cd ../..

Running the pipelines

After installing the tools above, users can test the pipelines locally with a dry run.

snakemake -np

If users want to run all the pipelines, they can use the following command.

snakemake -c 1 --use-conda

-c specifies the number of threads; with the given number of threads, Snakemake runs as many jobs in parallel as possible.
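For example, assuming a machine with four available CPU cores, the following command lets Snakemake run up to four jobs at a time:

snakemake -c 4 --use-conda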

However, we recommend running each pipeline individually. Users need to run the two simulation pipelines first.

snakemake -s workflows/1src/simulation.snake -c 1
snakemake -s workflows/2src/simulation.snake -c 1

The other pipelines, including workflows/1src/sstar.snake, workflows/1src/sprime.snake, workflows/1src/skovhmm.snake, workflows/2src/sstar.snake, workflows/2src/sprime.snake, and workflows/2src/archaicseeker2.snake, can be executed in any order after the simulations have finished.
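For example, to run the sstar pipeline for the one-source scenario after the simulations have finished (the same pattern applies to the other pipelines listed above):

snakemake -s workflows/1src/sstar.snake -c 1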

For the SkovHMM pipeline, users need to add the --use-conda argument, because SkovHMM requires Python 2, which differs from the main sstar-analysis environment; a dedicated environment is therefore created for SkovHMM.

snakemake -s workflows/1src/skovhmm.snake -c 1 --use-conda

Finally, users can plot the results.

snakemake -s workflows/plot.snake -c 1

The plots and tables are in results/plots. They may differ slightly from those in our manuscript because of randomness in the simulations.

Users can also submit the pipelines to an HPC cluster. Profiles should be created according to the HPC scheduler; an example profile for SLURM can be found in config/slurm/config.yaml. To submit jobs with SLURM, users first need to create a logs_slurm folder.

mkdir logs_slurm

Then, for example, to run the simulation pipeline on the cluster:

snakemake -s workflows/1src/simulation.snake --profile config/slurm -j 200 

-j specifies the maximum number of jobs that Snakemake submits to the cluster at the same time.
