Skip to content
Switch branches/tags
Go to file

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Constructing Computational Pipelines

Russell S. Hamilton

Centre for Trophoblast Research, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge, CB2 3DY, UK

ChuangKee Ong

Open Targets, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK


Ong, C-K & Hamilton, R.S. (2018) Constructing Computational Pipelines Encyclopedia of Bioinformatics and Computational Biology, 3, 135-143 [DOI]

Worked Example Code and Documentation

Implementation of a simple 5 step RNA-Seq workflow in a selection of pipeline tools: Bash, Clusterflow and eHive

Pipeline Tools URL

The software packages for the basic workflow steps are in the table below and are required to be installed prior to running the pipeline tools.

Resource Brief Description URL
FastQC Quality assessment on Fastq files
Trim_galore Trim low quality and adapters from Fastq files
HiSat2 Performs alignment of reads to reference genome
HTSeq-counts Gene level quantification of aligned reads
QualiMap Quality assessment on alignned reads
MultiQC Aggregates results from analyses performed

Bash Script Example of Simple RNA-Seq

In this very simple bash shell script the read files and reference genome should be edited manually at the top of the file:

Change the filenames to match the names of the samples to be run


Change the filenames to correspond to the annotation (GTF) and indexed reference genome appropriate for the samples being analysed


Ensure the script has the executable permissions

$ chmod 755

Run the script from the command line

$ ./

Clusterflow Example of Simple RNA-Seq

Download and install clusterflow from the link in the table above. Clusterflow modules from each of the pipeline steps are already included in Clusterflow, with the exception of qualimap_rnaseq. This file (qualimap_rnaseq.cfmod) is provided in the Clusterflow directory and should be copied to the Clusterflow installation module directory. The simple RNA-Seq pipeline is provided as a file (SimpleRNA-Seq.config) and should be copied to the clusterflow installation pipeline directory

Once installed Clusterflow can be run on a set of sample with the following command. Replace the genome reference as appropriate (note these should be specified as part of the Clusterflow install).

$ cf --genome <YOURGENOME REF> SimpleRNA-Seq *.fq.gz

Results will be written into the directory the cf command was run from

eHive Example of Simple RNA-Seq

Pre-requisite: A Mysql instance

Clone eHive module from github repository in the table above. Setup $PERL5LIB pointing to the ehive cloned repository, next setup eHive Mysql database parameters $HIVE_HOST, $HIVE_PORT, $HIVE_USER, $HIVE_PASS, $hive_dbname.

  • Run the pipeline initialisation step using the configuration file: ehiveRNAseq::RNAseq_conf \
  -hive_host ${HIVE_HOST}   \
  -hive_port ${HIVE_PORT}   \
  -hive_user ${HIVE_USER}   \
  -hive_password ${HIVE_PASS} \
  -hive_dbname ${hive_dbname} \
  -data_dir [where your raw data resides] \
  -output_dir [where the output will be] \
  -hive_force_init 1 \
  -flag_pe 1   
  • After successfully initialised the eHive database for the pipeline, execute the beekeeper script to start creating worker to run jobs. -url mysql://[mysql_username]:[mysql_password]@[mysql_hostname]:[mysql_port]/[hive_dbname] -sync -url mysql://[mysql_username]:[mysql_password]@[mysql_hostname]:[mysql_port]/[hive_dbname] -loop


Ong, C-K & Hamilton, R.S. (2018) Constructing Computational Pipelines Encyclopedia of Bioinformatics and Computational Biology, 3, 135-143



No releases published


No packages published