Docker
This demo session will be used for the hands-on workshop at ISRFG '22.
Prof. Rod A. Wing, Director, Center for Desert Agriculture, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, KSA
Nagarajan Kathiresan {nagarajan.kathiresan@kaust.edu.sa} and Yong Zhou {yong.zhou@kaust.edu.sa}
For pipeline support, suggestions, and collaborations, please contact us at pipeline.cda@gmail.com
Prerequisite:
- Docker installation on your laptop/workstation.
- Download the demo rice genome dataset and pipeline scripts:
Download link
- Untar the downloaded file:
tar -xzvf rice_pipeline_demo.tar.gz
- Change to the rice_pipeline_demo directory. This directory will be your working directory for this demo exercise.
cd rice_pipeline_demo
- Your directory structure for the demo exercise will be:
~/Rice_pipeline/rice_pipeline_demo$ tree -L 2
├── input
│   ├── ERS467814_ERR614071_1.fastq.gz
│   ├── ERS467814_ERR614071_2.fastq.gz
│   ├── ERS467814_ERR614072_1.fastq.gz
│   ├── ERS467814_ERR614072_2.fastq.gz
│   ├── ERS467860_ERR615300_1.fastq.gz
│   ├── ERS467860_ERR615300_2.fastq.gz
│   ├── ERS467860_ERR615301_1.fastq.gz
│   └── ERS467860_ERR615301_2.fastq.gz
├── output
├── ref
│   ├── Nipponbare_chr.dict
│   ├── Nipponbare_chr.fasta
│   ├── Nipponbare_chr.fasta.amb
│   ├── Nipponbare_chr.fasta.ann
│   ├── Nipponbare_chr.fasta.bwt
│   ├── Nipponbare_chr.fasta.fai
│   ├── Nipponbare_chr.fasta.pac
│   └── Nipponbare_chr.fasta.sa
├── scripts
│   ├── Phase1
│   ├── Phase2
│   ├── Phase3
│   └── Phase4
└── tmp
- Download the docker image:
sudo docker pull ibexcluster/biohpc:v1.0
- Ensure the docker image is available on your workstation/laptop:
sudo docker images
- Start a container from the image, bind-mounting the working directory (source $PWD, destination /demo):
sudo docker run -d -it --name biohpc --mount type=bind,source="$(pwd)",target=/demo ibexcluster/biohpc:v1.0
- Ensure the container is running on your workstation:
sudo docker ps -a
We provide two rice genome samples (ERS467814 and ERS467860). To improve quality, each sample was resequenced twice, giving two paired-end runs per sample, summarized below (all of these *.fastq.gz files are in the input directory):
Sample #1
├── ERS467814_ERR614071_1.fastq.gz
├── ERS467814_ERR614071_2.fastq.gz
├── ERS467814_ERR614072_1.fastq.gz
└── ERS467814_ERR614072_2.fastq.gz
Sample #2
├── ERS467860_ERR615300_1.fastq.gz
├── ERS467860_ERR615300_2.fastq.gz
├── ERS467860_ERR615301_1.fastq.gz
└── ERS467860_ERR615301_2.fastq.gz
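The grouping above can be recovered from the file names alone. The following is a minimal sketch, assuming every file follows the `<sample>_<run>_<read>.fastq.gz` pattern seen in the listing; the function name `runs_per_sample` is hypothetical and not part of the pipeline scripts.

```shell
# runs_per_sample DIR: print "<sample> <number of runs>" for each sample,
# counting one run per forward-read file (*_1.fastq.gz) found in DIR.
runs_per_sample() {
    for f in "$1"/*_1.fastq.gz; do
        [ -e "$f" ] || continue      # skip when the glob matches nothing
        name=${f##*/}                # strip the directory part
        echo "${name%%_*}"           # keep the text before the first underscore
    done | sort | uniq -c | awk '{print $2, $1}'
}
```

Running `runs_per_sample input` from the rice_pipeline_demo directory should list each sample once with its run count, which is a quick way to spot a missing or misnamed file.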
We provide the Nipponbare rice reference genome along with all the required index files; they are available in the ref directory.
├── Nipponbare_chr.dict
├── Nipponbare_chr.fasta
├── Nipponbare_chr.fasta.amb
├── Nipponbare_chr.fasta.ann
├── Nipponbare_chr.fasta.bwt
├── Nipponbare_chr.fasta.fai
├── Nipponbare_chr.fasta.pac
└── Nipponbare_chr.fasta.sa
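If the archive was extracted incompletely, a missing index file may only surface as an error in a later phase. A minimal sketch of a pre-flight check follows; the file set is taken from the listing above, and the function name `check_ref_index` is hypothetical.

```shell
# check_ref_index FASTA: verify the reference and its companion files exist.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_ref_index() {
    fasta=$1
    status=0
    [ -f "$fasta" ] || { echo "missing: $fasta" >&2; status=1; }
    # Index files named <fasta>.<ext>, as in the listing above.
    for ext in amb ann bwt fai pac sa; do
        [ -f "$fasta.$ext" ] || { echo "missing: $fasta.$ext" >&2; status=1; }
    done
    # The sequence dictionary replaces the .fasta suffix with .dict.
    dict=${fasta%.fasta}.dict
    [ -f "$dict" ] || { echo "missing: $dict" >&2; status=1; }
    return "$status"
}

# Example (run from rice_pipeline_demo):
#   check_ref_index ref/Nipponbare_chr.fasta && echo "reference OK"
```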
Step 1: Prepare your input files for Phase 1, i.e., write the list of unique forward-read files (*_1.fastq.gz) from the input/ directory into scripts/Phase1/Phase1.txt.
cd input/
ls -lrta *_1.fastq.gz | awk '{print $9}' > ../scripts/Phase1/Phase1.txt
cd ..
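Only forward-read files go into Phase1.txt, so it can be worth confirming that each one has its reverse-read mate before launching Phase 1. This is a sketch under the assumption that the mate of `X_1.fastq.gz` is `X_2.fastq.gz` in the same directory, as in the input listing; `check_mates` is a hypothetical helper, not part of the pipeline.

```shell
# check_mates LIST DIR: for each forward file named in LIST, confirm the
# matching *_2.fastq.gz exists in DIR. Returns non-zero if any mate is missing.
check_mates() {
    list=$1
    dir=$2
    status=0
    while IFS= read -r fwd; do
        [ -n "$fwd" ] || continue              # ignore blank lines
        rev=${fwd%_1.fastq.gz}_2.fastq.gz      # swap the read suffix
        if [ ! -f "$dir/$rev" ]; then
            echo "no mate for $fwd (expected $rev)" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}

# Example (run from the rice_pipeline_demo directory):
#   check_mates scripts/Phase1/Phase1.txt input
```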
Step 2: Execute the MPI wrapper program for Phase 1:
- By default, one core is assigned to each listed sample file.
- The number of MPI processes (-np) should equal the number of files listed in Phase1.txt (4 in this demo).
sudo docker exec -ti biohpc sh -c "mpirun --allow-run-as-root -np 4 /demo/scripts/Phase1/Phase1.exe"
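Rather than hard-coding `-np`, the value can be derived from the list file so that the two stay in sync. A minimal sketch, assuming Phase1.txt holds one file name per line; `count_entries` is a hypothetical helper name.

```shell
# count_entries FILE: print the number of non-blank lines in FILE.
# awk's NF is zero on blank lines, so those are not counted.
count_entries() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Example (run from the rice_pipeline_demo directory):
#   NP=$(count_entries scripts/Phase1/Phase1.txt)
#   sudo docker exec -ti biohpc sh -c "mpirun --allow-run-as-root -np $NP /demo/scripts/Phase1/Phase1.exe"
```

Ignoring blank lines guards against a trailing empty line inflating the process count.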
Step 1: Prepare for Phase 2 execution.
- List the sample names and write them to Phase2.prefix.txt:
ls -ld output/tmpBAM/* | awk -F'/' '{print $NF}' > scripts/Phase2/Phase2.prefix.txt
- List the sample directories and write them to Phase2.directory.txt:
ls -ld output/tmpBAM/* | awk '{print "/demo/"$9} ' > scripts/Phase2/Phase2.directory.txt
- Find the number of MPI processes (-np) for multi-core runs:
cat scripts/Phase2/Phase2.prefix.txt | wc -l
(This number should be used as an argument for multi-core runs)
Step 2: Execute the MPI wrapper for Phase 2:
sudo docker exec -ti biohpc sh -c "mpirun --allow-run-as-root -np 2 /demo/scripts/Phase2/Phase2.exe"
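The two Phase 2 list files can also be produced without parsing `ls -l` output, whose column layout is not guaranteed and which breaks on names containing spaces. The sketch below assumes that each entry under output/tmpBAM is a per-sample directory; `write_phase2_lists` and its argument order are hypothetical.

```shell
# write_phase2_lists HOST_DIR CONTAINER_DIR PREFIX_FILE DIR_FILE
# For every subdirectory of HOST_DIR, write its name to PREFIX_FILE and its
# path as seen inside the container (CONTAINER_DIR/<name>) to DIR_FILE.
write_phase2_lists() {
    : > "$3"                     # truncate both output files
    : > "$4"
    for d in "$1"/*/; do
        [ -d "$d" ] || continue  # skip when the glob matches nothing
        d=${d%/}                 # drop the trailing slash
        name=${d##*/}            # keep only the directory name
        echo "$name" >> "$3"
        echo "$2/$name" >> "$4"
    done
}

# Example (run from the rice_pipeline_demo directory):
#   write_phase2_lists output/tmpBAM /demo/output/tmpBAM \
#       scripts/Phase2/Phase2.prefix.txt scripts/Phase2/Phase2.directory.txt
```

The line count of Phase2.prefix.txt then gives the `-np` value for the Phase 2 run.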
Step 1: Execute the Phase 3 prerequisite data distribution script, passing the number of cores available on your workstation. The example below uses a workstation with 112 cores.
sudo docker exec -ti biohpc sh -c "sh /demo/scripts/Phase3/Phase3.prerequisite.optimized.sh 112"
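The core count does not need to be typed by hand. A minimal sketch that queries it on the host follows; `detect_cores` is a hypothetical helper, and it assumes the container is allowed to use all host cores (no Docker CPU limit was set in the `docker run` line above).

```shell
# detect_cores: print the number of online CPU cores on this machine.
# Uses nproc when available and falls back to getconf otherwise.
detect_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Example:
#   CORES=$(detect_cores)
#   sudo docker exec -ti biohpc sh -c "sh /demo/scripts/Phase3/Phase3.prerequisite.optimized.sh $CORES"
```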
Step 2: Launch the MPI wrapper script for Phase 3.
sudo docker exec -ti biohpc sh -c "mpirun --allow-run-as-root --oversubscribe -np 84 /demo/scripts/Phase3/Phase3.exe"
Step 1: Run the Phase 4 prerequisite script:
sudo docker exec -ti biohpc sh -c "sh /demo/scripts/Phase4/Phase4.prerequisite.sh"
Step 2: Run the Phase 4 variant caller program:
sudo docker exec -ti biohpc sh -c "mpirun --allow-run-as-root --oversubscribe -np 12 /demo/scripts/Phase4/Phase4.exe"
Results from successfully completed runs are available here:
Phase #1: Results
Phase #2: Results
Phase #3: Results
Phase #4: Results
For your comments, feedback and support, please contact us: pipeline.cda@gmail.com