
Installation & Testing

singerj edited this page Jul 6, 2018 · 16 revisions

NGS-pipe is a pipeline for the core analysis of DNA and RNA sequencing samples generated in the context of precision oncology. One of its main design goals is to provide an easy-to-use and robust toolkit for users with bioinformatics expertise. Like any other pipeline, NGS-pipe relies on underlying software that has to be installed before an analysis can be performed. This section describes the options for installing the required bioinformatics software and shows how to run the pipeline on the example data we provide.

Installation

The pipeline comprises a large number of software tools, ranging from aligners and quality control tools to variant callers. We believe there are currently two viable options for installing the tools in your environment.

Conda

Conda is a package manager that automatically installs software and encapsulates it in an environment. Because the bioconda channel provides the majority of the tools needed by NGS-pipe, we provide conda environment files for DNA and RNA analysis. We recommend using conda for the installation.

Manual installation

Installing the tools by hand is also possible, but cumbersome. You are responsible for finding each tool in the correct version and installing it on your own system. You will also need to adjust the tool paths in the Snakemake config files.

Why not Docker?

We have decided not to integrate our pipeline into Docker. Docker is a neat tool for packaging software and its dependencies into a single container. However, running Docker containers in an HPC environment raises several problems, such as privilege escalation and performance. These problems can be addressed, e.g. by "translating" the container to Singularity. Overall, though, the overhead of making a Docker container HPC-ready is similar to that of installing the tools by hand on bare metal.

Conda Installation and Examples

A large fraction of the tools required by NGS-pipe is covered by conda and the bioconda channel. The tools are installed with a single command.

RNA

Installation of Tools

All tools required for the analysis of RNASeq experiments are provided by conda. The commands below install the tools into a conda environment and activate it.

#The RNA environment (environments/rna_environment.yaml)
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - fastqc=0.11.5
  - samtools=1.2
  - star=2.5.3a
  - trimmomatic=0.36
  - subread=1.5.2
  - snakemake=3.13.3

#Install tools from environments/rna_environment.yaml
conda env create -n ngs-pipe-rna --file environments/rna_environment.yaml

#Activate environment
conda activate ngs-pipe-rna

Once the environment is activated, all tools are available on the command line and ready to be executed in the pipeline.
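
A quick way to confirm that the activated environment exposes the tools is to look each one up on the PATH. The following is a minimal sketch: `check_tools` is a hypothetical helper, not part of NGS-pipe, and the executable names (for example `STAR` for the star package and `featureCounts` for subread) are assumptions about what the bioconda packages install.

```shell
# Hypothetical helper: report every tool that cannot be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return status if at least one tool was not found
    return "$missing"
}

# Assumed executable names for the packages in environments/rna_environment.yaml
check_tools fastqc samtools STAR trimmomatic featureCounts snakemake \
    || echo "Some tools are missing; is the ngs-pipe-rna environment activated?"
```

The same function can be reused for the DNA environment by passing the executables of the tools listed in its environment file.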

Test Run of RNASeq Pipeline

We provide data and a test script so that you can become familiar with how the raw data has to be formatted and how to execute the pipeline.

#1. Go to examples folder:
cd examples/rna
#2. Download test data: We provide an additional snakemake pipeline to 
#   download test sequences, databases and adapter files:
./run_prepare_data_locally.sh
# This will download 8 test data sets, the adapters, the human reference 
# and build the STAR database index
#3. Execute the RNASeq Pipeline:
./run_analysis_locally.sh
# This will execute: RAW-->Trimmomatic-->STAR-->FeatureCounts
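
Because the analysis depends on the downloaded data, it can be convenient to chain the two steps so that the analysis only starts if the preparation succeeded. The sketch below assumes it is called from the repository root; `run_example` is a hypothetical wrapper and not part of NGS-pipe, and it relies only on the two script names shown above.

```shell
# Hypothetical wrapper: run data preparation, then the analysis, and stop
# at the first step that fails.
run_example() {
    # $1 = example type, i.e. the subfolder of examples/ (e.g. "rna")
    (
        cd "examples/$1" || exit 1
        ./run_prepare_data_locally.sh \
            || { echo "data preparation failed" >&2; exit 1; }
        ./run_analysis_locally.sh \
            || { echo "analysis failed" >&2; exit 1; }
    )
}

# Example usage (not run here):
#   run_example rna
```

The subshell keeps the `cd` from changing the caller's working directory, so the function can be called repeatedly from the same location.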

DNA

Installation of Tools

All core tools required for the analysis of DNA sequencing experiments are provided by conda. The commands below install these tools into a conda environment and activate it. However, some tools are not provided by conda and need to be installed by hand (see the list below).

#The DNA environment (environments/dna_environment.yaml).
#The commented-out dependencies are not needed for the example data and can be enabled when needed
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - snakemake=3.13.3
  - fastqc=0.11.5
  - samtools=1.4
  - trimmomatic=0.36
  - bwa=0.7.15
  - picard=2.9.2
  - gatk=3.5
  - varscan=2.4.2
  - qualimap=2.2
  - sra-tools=2.8.1
  #- bowtie2=2.3.2
  #- yara=0.9.6
  #- snpeff=4.3
  #- snpsift=4.3
  #- freebayes=1.1.0
  #- somatic-sniper=1.0.5.0
  #- pindel=0.2.5b8
  #- bioconductor-deepsnv=1.20.0
  #- vardict-java=1.4.10
  #- vardict=2017.04.18

#Install tools from environments/dna_environment.yaml
conda env create -n ngs-pipe-dna --file environments/dna_environment.yaml

#Activate environment
conda activate ngs-pipe-dna

Once the environment is activated, all tools are available on the command line and ready to be executed in the pipeline.
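
The environment file ships with several dependencies commented out. One way to enable one of them later is to uncomment its line and then update the existing environment. The following is a minimal sketch: `enable_dep` is a hypothetical helper, not part of NGS-pipe, and it assumes the `#- name=version` layout shown in the environment file above.

```shell
# Hypothetical helper: uncomment a single dependency in an environment file.
enable_dep() {
    # $1 = environment file, $2 = package name exactly as written in the file
    tmp_file="$(mktemp)" || return 1
    # Turn "  #- pkg=version" into "  - pkg=version"; other lines stay untouched.
    sed "s/^\([[:space:]]*\)#-[[:space:]]*\($2=\)/\1- \2/" "$1" > "$tmp_file" \
        && cat "$tmp_file" > "$1"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example usage (not run here):
#   enable_dep environments/dna_environment.yaml freebayes
#   conda env update -n ngs-pipe-dna --file environments/dna_environment.yaml
```

Matching on the package name followed by `=` keeps similarly named entries apart, so enabling `vardict` leaves `vardict-java` commented out.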

Test Run of DNA Pipeline

We provide data and a test script so that you can become familiar with how the raw data has to be formatted and how to execute the pipeline. However, this test script executes only a subset of the pipeline, because not all required tools can be installed via conda. The full pipeline can be executed once all required tools are installed.

#1. Go to examples folder:
cd examples/dna
#2. Download test data: We provide an additional snakemake pipeline to 
#   download test sequences, databases and adapter files:
./run_prepare_data_locally.sh
# This will download 6 test data sets, the adapters, regions file,
# the human reference and build the BWA database index
#3. Execute the DNA Pipeline:
./run_analysis_locally.sh
# This will execute: RAW --> QC(Trimmomatic) --> Mapping(BWA) --> Sort(Picard)
# --> Merge(Picard) --> Remove Secondary Alignments(Samtools) --> MarkDuplicates(Picard)
# --> RemoveDuplicates(Samtools) --> SNV Calling (VarScan2)

Tools to be installed by hand

| Tool | Optional/Mandatory | Comment | Version |
|---|---|---|---|
| GATK | Mandatory | The jar needs to be registered by conda | 3.5 |
| JointSNVMix | Optional | | 0.75 |
| JointSNVMix2 | Optional | | current |
| Seqpurge | Optional | | current |
| mutect | Optional | | current |
| dindel | Optional | | current |
| rankCombineVariants | Optional | | current |
| bicseq2 | Optional | | current |
| annovar | Optional | | current |
| facets | Optional | | current |
| somaticseq | Optional | | v2.1.2 |
| strelka | Optional | | current |
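
For the GATK entry, registering the jar could look like the sketch below. It assumes that the bioconda gatk package in the activated environment provides a `gatk-register` helper that takes the path to the manually downloaded GATK file; please check the message printed when the package is installed to confirm this. `register_gatk` and the path in the usage comment are hypothetical.

```shell
# Hypothetical wrapper around the (assumed) bioconda gatk-register helper.
register_gatk() {
    # $1 = path to the manually downloaded GATK jar or archive
    if [ ! -f "$1" ]; then
        echo "GATK file not found: $1" >&2
        return 1
    fi
    if ! command -v gatk-register >/dev/null 2>&1; then
        echo "gatk-register not found; is the ngs-pipe-dna environment activated?" >&2
        return 1
    fi
    gatk-register "$1"
}

# Example usage (not run here; the path is a placeholder):
#   register_gatk /path/to/GenomeAnalysisTK.jar
```

Checking for the file and the helper first gives a clearer error message than a failing registration call would.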