Singularity
This demo session will be used for the hands-on workshop at ISRFG '22.
Prof. Rod A. Wing, Director, Center for Desert Agriculture, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, KSA
Nagarajan Kathiresan {nagarajan.kathiresan@kaust.edu.sa} and Yong Zhou {yong.zhou@kaust.edu.sa}
For pipeline support, suggestions, and collaborations, please contact us at pipeline.cda@gmail.com
Prerequisite:
- Singularity should be installed in your cluster or High-Performance Computing environment.
- Download the demo rice genome dataset and pipeline scripts: Download link
- Untar the downloaded file:
tar -xzvf singularity.tar.gz
- Change to the rice_pipeline_demo directory. This directory will be your working directory for this demo exercise.
cd rice_pipeline_demo
- Your directory structure for the demo exercise will be:
~/Rice_pipeline/rice_pipeline_demo$ tree -L 2
├── input
│   ├── ERS467814_ERR614071_1.fastq.gz
│   ├── ERS467814_ERR614071_2.fastq.gz
│   ├── ERS467814_ERR614072_1.fastq.gz
│   ├── ERS467814_ERR614072_2.fastq.gz
│   ├── ERS467860_ERR615300_1.fastq.gz
│   ├── ERS467860_ERR615300_2.fastq.gz
│   ├── ERS467860_ERR615301_1.fastq.gz
│   └── ERS467860_ERR615301_2.fastq.gz
├── output
├── ref
│   ├── Nipponbare_chr.dict
│   ├── Nipponbare_chr.fasta
│   ├── Nipponbare_chr.fasta.amb
│   ├── Nipponbare_chr.fasta.ann
│   ├── Nipponbare_chr.fasta.bwt
│   ├── Nipponbare_chr.fasta.fai
│   ├── Nipponbare_chr.fasta.pac
│   └── Nipponbare_chr.fasta.sa
├── scripts
│   ├── Phase1
│   ├── Phase2
│   ├── Phase3
│   └── Phase4
├── tmp
├── create_singularity.sh
├── BioApps.sif
└── Singularity_submit.sh
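Before moving on, it may help to confirm that the archive unpacked into the layout shown above. The following is a minimal sketch only: `check_layout` is a hypothetical helper that is not shipped with the pipeline, and the list of directories is taken from the tree listing.

```shell
#!/bin/bash
# Sketch: verify that the unpacked demo directory contains the expected
# sub-directories. check_layout is a hypothetical helper, not part of
# the distributed pipeline scripts.
check_layout() {
    local root="$1" d missing=0
    for d in input output ref tmp \
             scripts/Phase1 scripts/Phase2 scripts/Phase3 scripts/Phase4; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            missing=1
        fi
    done
    # Return 0 when every directory exists, 1 otherwise.
    return "$missing"
}

# Usage (run from inside rice_pipeline_demo):
#   check_layout "$PWD" && echo "layout OK"
```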
We have provided a Singularity image file called BioApps.sif in the tar file (singularity.tar.gz).
Alternatively, you may build your own Singularity image using the example script below, which builds the SIF image from the Docker repository:
#!/bin/bash
module load singularity
## Export the variables so that the singularity command can see them
export SINGULARITY_CACHEDIR=$HOME/singularity/cache
export SINGULARITY_PULLFOLDER=$HOME/singularity/images
export SINGULARITY_TMPDIR=$HOME/singularity/singularity/tmp
mkdir -p ${SINGULARITY_CACHEDIR} ${SINGULARITY_PULLFOLDER} ${SINGULARITY_TMPDIR}
singularity build BioApps.sif docker://ibexcluster/bioapps:v1.0
We are providing 2 rice genome samples (ERS467814 and ERS467860). To improve the quality, each sample was resequenced twice. The files are summarized below (all of these *.fastq.gz files are in the input directory):
Sample #1
├── ERS467814_ERR614071_1.fastq.gz
├── ERS467814_ERR614071_2.fastq.gz
├── ERS467814_ERR614072_1.fastq.gz
└── ERS467814_ERR614072_2.fastq.gz
Sample #2
├── ERS467860_ERR615300_1.fastq.gz
├── ERS467860_ERR615300_2.fastq.gz
├── ERS467860_ERR615301_1.fastq.gz
└── ERS467860_ERR615301_2.fastq.gz
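The file names above appear to follow the pattern `<sample>_<run>_<read>.fastq.gz`. As an illustration only, the sketch below splits a file name into those three parts using shell parameter expansion. `parse_fastq_name` is a hypothetical helper, and the naming pattern is inferred from the listing rather than documented by the pipeline.

```shell
#!/bin/bash
# Sketch: split a FASTQ file name of the assumed form
#   <sample>_<run>_<read>.fastq.gz
# into tab-separated fields. parse_fastq_name is a hypothetical helper.
parse_fastq_name() {
    local base sample rest run mate
    base=$(basename "$1" .fastq.gz)   # e.g. ERS467814_ERR614071_1
    sample=${base%%_*}                # text before the first underscore
    rest=${base#*_}                   # text after the first underscore
    run=${rest%%_*}                   # run accession
    mate=${rest#*_}                   # 1 (forward) or 2 (reverse)
    printf '%s\t%s\t%s\n' "$sample" "$run" "$mate"
}

# Usage:
#   for f in input/*.fastq.gz; do parse_fastq_name "$f"; done
```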
We are providing the Nipponbare rice reference genome along with all the required index files. They are available in the ref directory:
├── Nipponbare_chr.dict
├── Nipponbare_chr.fasta
├── Nipponbare_chr.fasta.amb
├── Nipponbare_chr.fasta.ann
├── Nipponbare_chr.fasta.bwt
├── Nipponbare_chr.fasta.fai
├── Nipponbare_chr.fasta.pac
└── Nipponbare_chr.fasta.sa
Step 1: Prepare your input files for Phase 1, i.e., write the list of unique forward sequence files (*_1.fastq.gz) from the input/ directory into the scripts/Phase1/Phase1.txt file.
cd input/
ls -lrta *_1.fastq.gz | awk '{print $9}' > ../scripts/Phase1/Phase1.txt
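Since only the forward files are listed, a quick check that every listed file has a matching reverse file can catch incomplete downloads early. This is a sketch under the assumption that the reverse file differs from the forward file only by the `_2` suffix, as in the demo data. `check_mates` is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/bash
# Sketch: for each forward file listed in LIST, confirm that the matching
# reverse file exists in DIR. Assumes the _1/_2 naming used by the demo data.
# check_mates is a hypothetical helper.
check_mates() {
    local list="$1" dir="$2" fwd rev status=0
    while IFS= read -r fwd; do
        [ -n "$fwd" ] || continue                 # skip empty lines
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"       # swap the read suffix
        if [ ! -f "$dir/$rev" ]; then
            echo "no reverse file for $fwd (expected $rev)" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}

# Usage (from the rice_pipeline_demo directory):
#   check_mates scripts/Phase1/Phase1.txt input
```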
Step 2: Execute the MPI wrapper program for Phase 1:
Environment variables:
DOCKER_MOUNT should be a full path of rice_pipeline_demo/ directory.
For example: export DOCKER_MOUNT=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo
SIF should be the full path of the Singularity image file.
For example: export SIF=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo/BioApps.sif
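A job that starts with a wrong path usually fails only after it has waited in the queue, so it can be worth validating both variables first. The following is a minimal sketch. `check_env` is a hypothetical helper, and the checks (absolute path, existing directory, existing file) are assumptions based on the descriptions above.

```shell
#!/bin/bash
# Sketch: validate DOCKER_MOUNT and SIF before submitting a job.
# check_env is a hypothetical helper, not part of the pipeline scripts.
check_env() {
    local status=0
    case "${DOCKER_MOUNT:-}" in
        /*)
            # Absolute path given: it must also be an existing directory.
            if [ ! -d "$DOCKER_MOUNT" ]; then
                echo "DOCKER_MOUNT is not a directory: $DOCKER_MOUNT" >&2
                status=1
            fi
            ;;
        *)
            echo "DOCKER_MOUNT must be set to a full (absolute) path" >&2
            status=1
            ;;
    esac
    if [ ! -f "${SIF:-}" ]; then
        echo "SIF must point to an existing image file" >&2
        status=1
    fi
    return "$status"
}

# Usage:
#   check_env || exit 1
```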
- One core will be assigned to 1 sample (by default).
- The number of MPI processes (-np) should be equal to the number of samples listed in the Phase1.txt file.
Example job submission script as follows:
(This script uses 2 nodes, 2 cores per node, totaling 4 cores)
#!/bin/bash
#SBATCH --ntasks-per-node=2
#SBATCH -N 2
#SBATCH --mem=16GB
#SBATCH -J Singularity
#SBATCH --error=STDERR.Singularity.%J.err
#SBATCH --output=STDOUT.Singularity.%J.out
#SBATCH --time=10:00
#SBATCH -A ibex-cs
## User environment
export DOCKER_MOUNT=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo ;
export SIF=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/BioApps.sif ;
## Module file
module load mpich/3.3/gnu-6.4.0 singularity
export SINGULARITY_BIND="$DOCKER_MOUNT,$PWD,/sw"
## Create required files and directories
scontrol show hostnames > hostfile
mpicc ${DOCKER_MOUNT}/scripts/Phase1/Phase1.c -o ${DOCKER_MOUNT}/scripts/Phase1/Phase1.exe
mpiexec -np 4 -hostfile ./hostfile singularity exec $SIF ${DOCKER_MOUNT}/scripts/Phase1/Phase1.exe
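The value passed to -np in the script above is hard-coded to 4, which matches the four forward files in the demo data. As a sketch of one way to avoid editing it by hand, the value could be derived from the list file. `count_entries` is a hypothetical helper, and the Slurm allocation (-N and --ntasks-per-node) would still need to provide at least that many cores.

```shell
#!/bin/bash
# Sketch: derive the -np value from the list file instead of hard-coding it.
# count_entries is a hypothetical helper, not part of the pipeline scripts.
count_entries() {
    # Count non-empty lines; prints 0 for an empty file.
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Intended use inside the job script (assumes DOCKER_MOUNT and SIF are set):
#   NP=$(count_entries "${DOCKER_MOUNT}/scripts/Phase1/Phase1.txt")
#   mpiexec -np "$NP" -hostfile ./hostfile singularity exec $SIF ${DOCKER_MOUNT}/scripts/Phase1/Phase1.exe
```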
Step 1: Prepare for Phase 2 execution.
- List the sample names and write them into Phase2.prefix.txt:
ls -ld output/tmpBAM/* | awk -F'/' '{print $NF}' > scripts/Phase2/Phase2.prefix.txt
- List the sample directories and write them into Phase2.directory.txt:
ls -ld $PWD/output/tmpBAM/* | awk '{print $9}' > scripts/Phase2/Phase2.directory.txt
- Find the number of MPI processes (-np) for multi-core runs:
cat scripts/Phase2/Phase2.prefix.txt | wc -l
(This number should be used as the -np argument for multi-core runs)
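Both list files are generated from the same output/tmpBAM/ listing, so line N of the directory file should end in the name on line N of the prefix file. The sketch below checks that correspondence. `check_phase2_lists` is a hypothetical helper, and the line-by-line pairing is an assumption about how Phase 2 reads the two files.

```shell
#!/bin/bash
# Sketch: confirm that the two Phase 2 list files describe the same samples
# in the same order. check_phase2_lists is a hypothetical helper; the
# line-by-line pairing of the files is an assumption.
check_phase2_lists() {
    local prefixes="$1" dirs="$2"
    # Take the last path component of every directory line and compare the
    # result byte-for-byte with the prefix file.
    awk -F/ '{ print $NF }' "$dirs" | cmp -s - "$prefixes"
}

# Usage (from the rice_pipeline_demo directory):
#   check_phase2_lists scripts/Phase2/Phase2.prefix.txt \
#                      scripts/Phase2/Phase2.directory.txt || echo "lists differ"
```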
Step 2: Execute the MPI wrappers
(This script uses 2 nodes, 1 core per node, totaling 2 cores)
#!/bin/bash
#SBATCH --ntasks-per-node=1
#SBATCH -N 2
#SBATCH --mem=16GB
#SBATCH -J Singularity
#SBATCH --error=STDERR.Singularity.%J.err
#SBATCH --output=STDOUT.Singularity.%J.out
#SBATCH --time=1:00:00
#SBATCH -A ibex-cs
## User environment
export DOCKER_MOUNT=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo ;
export SIF=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/BioApps.sif ;
## Module file
module load mpich/3.3/gnu-6.4.0 singularity
export SINGULARITY_BIND="$DOCKER_MOUNT,$PWD,/sw"
## Create required files and directories
scontrol show hostnames > hostfile
mpicc ${DOCKER_MOUNT}/scripts/Phase2/Phase2.c -o ${DOCKER_MOUNT}/scripts/Phase2/Phase2.exe
mpiexec -np 2 -hostfile ./hostfile singularity exec $SIF ${DOCKER_MOUNT}/scripts/Phase2/Phase2.exe
Step 1: Execute the Phase 3 prerequisite data distribution script, passing the number of cores available in your HPC/cluster as the argument (250 in the example below, run on the Ibex cluster).
sh scripts/Phase3/Phase3.prerequisite.optimized.sh 250
This script reports the estimated optimal number of cores. Example output:
sh ./Phase3.prerequisite.optimized.sh 250
*************************************************************
Max size of Chromosome in the given reference is: 43270923
Total no. of Chromosomes in the given reference is: 12
No. of optimal CPUs will be calculated as follows:
Please use: -np 178
*************************************************************
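The chromosome count and maximum chromosome size in this output can be cross-checked against the FASTA index in the ref directory. In a .fai file each line describes one sequence, with the name in column 1 and the length in column 2. The sketch below reads only those two columns. `fai_stats` is a hypothetical helper, and whether the prerequisite script itself uses the .fai file is not stated here.

```shell
#!/bin/bash
# Sketch: report the number of sequences and the longest sequence length
# from a FASTA index (.fai). fai_stats is a hypothetical helper, not part
# of the pipeline scripts.
fai_stats() {
    awk -F'\t' '
        $2 + 0 > max { max = $2 + 0 }    # column 2 holds the sequence length
        END { printf "sequences=%d max_length=%d\n", NR, max }
    ' "$1"
}

# Usage (from the rice_pipeline_demo directory):
#   fai_stats ref/Nipponbare_chr.fasta.fai
```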
Step 2: Launch the MPI wrapper script for Phase 3.
(This script requests 23 nodes with 8 cores per node, i.e., 184 cores, of which 178 are used as MPI processes)
#!/bin/bash
#SBATCH --ntasks-per-node=8
#SBATCH -N 23
#SBATCH --mem=16GB
#SBATCH -J Singularity
#SBATCH --error=STDERR.Singularity.%J.err
#SBATCH --output=STDOUT.Singularity.%J.out
#SBATCH --time=1:00:00
#SBATCH -A ibex-cs
## User environment
export DOCKER_MOUNT=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo ;
export SIF=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/BioApps.sif ;
## Module file
module load mpich/3.3/gnu-6.4.0 singularity
export SINGULARITY_BIND="$DOCKER_MOUNT,$PWD,/sw"
## Create required files and directories
scontrol show hostnames > hostfile
mpicc ${DOCKER_MOUNT}/scripts/Phase3/Phase3.c -o ${DOCKER_MOUNT}/scripts/Phase3/Phase3.exe
mpiexec -np 178 -hostfile ./hostfile singularity exec $SIF ${DOCKER_MOUNT}/scripts/Phase3/Phase3.exe
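The node count in the script above follows from rounding up the number of MPI processes divided by the cores per node: 178 / 8 rounds up to 23 nodes, which leaves 6 of the 184 allocated cores idle. A small sketch of that calculation, where `nodes_needed` is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: number of nodes needed to host NP MPI processes when each node
# contributes PER_NODE cores (integer ceiling division).
# nodes_needed is a hypothetical helper, not part of the pipeline scripts.
nodes_needed() {
    local np="$1" per_node="$2"
    echo $(( ($np + $per_node - 1) / $per_node ))
}

nodes_needed 178 8   # prints 23
```

The result is the value to use for `#SBATCH -N` when `--ntasks-per-node` is set to the same cores-per-node figure.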
Step 1: Run the Phase 4 prerequisite script
sh scripts/Phase4/Phase4.prerequisite.sh
Step 2: Run the Variant caller program
(This script uses 1 node and 12 cores per node)
#!/bin/bash
#SBATCH --ntasks-per-node=12
#SBATCH -N 1
#SBATCH --mem=16GB
#SBATCH -J Singularity
#SBATCH --error=STDERR.Singularity.%J.err
#SBATCH --output=STDOUT.Singularity.%J.out
#SBATCH --time=1:00:00
#SBATCH -A ibex-cs
#User environment
export DOCKER_MOUNT=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/rice_pipeline_demo ;
export SIF=/ibex/scratch/projects/c2072/work/Singularity/rice_genome/BioApps.sif ;
#Module file
module load mpich/3.3/gnu-6.4.0 singularity
export SINGULARITY_BIND="$DOCKER_MOUNT,$PWD,/sw"
#Create required files and directories
scontrol show hostnames > hostfile
mpicc ${DOCKER_MOUNT}/scripts/Phase4/Phase4.c -o ${DOCKER_MOUNT}/scripts/Phase4/Phase4.exe
mpiexec -np 12 -hostfile ./hostfile singularity exec $SIF ${DOCKER_MOUNT}/scripts/Phase4/Phase4.exe