Welcome to the BitSeqVB_benchmarking wiki!
These scripts can be used to replicate the simulation analysis presented in Section 3.1 (inference accuracy on synthetic data) of the BitSeqVB manuscript.
The following software is required:
- Anaconda Python Distribution
- BitSeq (version 0.7.0 or higher)
- Bowtie 2 (version 2.1.0 or higher)
- Cufflinks (version 2.1.1 or higher)
- eXpress (version 1.5.1 or higher)
- R with the following libraries: GenomicFeatures, Rsamtools, casper, parallel
- RSEM (version 1.2.15 or higher)
- Sailfish (version 0.6.3 or higher)
- Samtools (version 0.1.18 or higher)
- Spanki simulator
- Tigar 2
- TopHat (version 2.0.9 or higher)
The gcc compiler (release 4.8.2 or higher) should also be available on your machine.
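Before starting, it can help to confirm that the required command-line tools are installed. The following is a minimal Python sketch, not part of the repository scripts. The executable names in `REQUIRED_TOOLS` are assumptions about a typical installation and may differ on your system, and `version_at_least` only handles purely numeric dotted versions such as those listed above.

```python
import shutil

# Executable names are assumptions for illustration; adjust them to match
# how each tool is installed on your system.
REQUIRED_TOOLS = [
    "bowtie2",
    "tophat2",
    "samtools",
    "cufflinks",
    "express",
    "sailfish",
    "rsem-calculate-expression",
    "Rscript",
    "gcc",
]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def version_at_least(found, minimum):
    """Compare dotted numeric versions, e.g. '2.1.1' against '2.1.0'."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(found) >= as_tuple(minimum)


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The version string of each tool still has to be obtained by hand (most of them print it with a version flag) and can then be passed to `version_at_least` together with the minimum listed above.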
This analysis is based on the UCSC/hg19 reference annotation (download link, ~21 GB). After downloading the annotation, follow the instructions in the simulationScripts/README file.
The main jobscript is the commented file commands.sh, which consists of the following steps:
- Choose dataset (4 simulation scenarios)
- Generate RPK values
- Simulate FASTQ files with Spanki
- Align reads with Bowtie 2
- Align reads with TopHat
- Run BitSeqMCMC
- Run BitSeqVB
- Run Casper
- Run Cufflinks
- Run RSEM
- Run Sailfish
- Run Tigar2
- Run eXpress
- Produce graphs
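The steps above can be sketched as a job plan, which may help when deciding how to split the work. This is only an illustration in Python. The grouping into stages, the step identifiers, and the assumption that steps within a stage are independent of each other are all hypothetical; commands.sh and the README remain the authority on the actual order and dependencies.

```python
# Hypothetical grouping of the listed steps into stages. Stages are assumed
# to run in order; steps inside one stage are assumed to be independent.
STAGES = [
    ["generate_rpk"],
    ["simulate_fastq"],
    ["align_bowtie", "align_tophat"],
    ["BitSeqMCMC", "BitSeqVB", "Casper", "Cufflinks",
     "RSEM", "Sailfish", "Tigar2", "eXpress"],
    ["produce_graphs"],
]

# The four simulation scenarios, identified here simply by number.
SCENARIOS = [1, 2, 3, 4]


def plan_jobs(scenarios, stages):
    """Return one list of (scenario, step) jobs per stage.

    Jobs in the same inner list could, under the assumptions above,
    be submitted at the same time.
    """
    return [
        [(scenario, step) for scenario in scenarios for step in stage]
        for stage in stages
    ]


if __name__ == "__main__":
    for number, wave in enumerate(plan_jobs(SCENARIOS, STAGES), start=1):
        print(f"stage {number}: {len(wave)} jobs")
```

Under these assumptions the quantification stage is the widest, with one job per method and scenario, so it is the natural place to parallelise.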
To keep the computing time reasonable, the user should split the jobscript into parallel jobs according to the instructions given in file
Warning: big data files will be generated
This analysis was run under Linux on the High Performance Computing cluster (CSF) at the University of Manchester. The user has to make sure that at least 2.5 TB of free disk space is available.
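The free-space requirement can be checked before launching anything. The following is a minimal Python sketch using the standard library; it assumes the figure above means decimal terabytes, and the function name and default path are illustrative only.

```python
import shutil

# 2.5 TB, assuming decimal units (1 TB = 1000**4 bytes).
REQUIRED_BYTES = int(2.5 * 1000**4)


def has_enough_space(path, required_bytes=REQUIRED_BYTES,
                     disk_usage=shutil.disk_usage):
    """Return True if the filesystem holding `path` has enough free space."""
    return disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory that will hold
    # the generated files.
    if has_enough_space("."):
        print("Enough free disk space for the analysis.")
    else:
        print("Less than 2.5 TB free; the analysis may run out of space.")
```

On a shared cluster, user quotas may be lower than the free space reported by the filesystem, so this check is a lower bar rather than a guarantee.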
- J Hensman, P Papastamoulis, P Glaus, A Honkela, M Rattray (2014). Fast and accurate approximate inference of transcript expression from RNA-seq data. arXiv preprint arXiv:1412.5995