Dr. Armin Töpfer, armintoepfer.com
RNA viruses are present in a single host as a population of different but related strains. This population, shaped by the combination of genetic change and selection, is called a quasispecies. Genetic change is due to both point mutations and recombination events. We present a jumping hidden Markov model that describes the generation of the viral quasispecies, and a method to infer its parameters from next-generation sequencing data. We provide an implementation of the EM algorithm that finds maximum a posteriori estimates of the model parameters, and a method to estimate the distribution of viral strains in the quasispecies. The model is validated on simulated data, which shows the advantage of explicitly modelling the recombination process, as well as on experimental HIV samples.
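To make the model more concrete, here is a minimal Java sketch of the forward algorithm for a jumping HMM: it computes the probability of one aligned read, summed over all paths through the generators. It is an illustration under simplifying assumptions, not QuasiRecomb's code. We assume K generators, a per-position recombination (jump) probability `rho[j]`, a per-position jump target distribution `pi[j]`, generator-specific base distributions `mu`, a per-position error rate `eps[j]` that spreads errors uniformly over the other bases, and gap-free reads that start at position 0. All of these names and the exact parametrization are hypothetical.

```java
// Simplified, hypothetical jumping HMM; NOT the QuasiRecomb implementation.
final class JumpingHmmSketch {
    static final int ALPHABET = 4; // bases encoded as 0..3 (assumption)

    /** P(observed base x | true base v): correct with 1 - eps, otherwise uniform over the other bases. */
    static double sequencingError(int x, int v, double eps) {
        return x == v ? 1.0 - eps : eps / (ALPHABET - 1);
    }

    /** P(observed base x | generator) = sum over true bases v of mu(v) * P(x | v). */
    static double emission(int x, double[] muKj, double eps) {
        double p = 0.0;
        for (int v = 0; v < ALPHABET; v++) {
            p += muKj[v] * sequencingError(x, v, eps);
        }
        return p;
    }

    /**
     * Forward algorithm for one gap-free read covering positions 0..L-1.
     * pi[j][g]    probability of choosing generator g at position j (pi[0] = start distribution)
     * rho[j]      probability of a jump between positions j-1 and j (rho[0] unused)
     * mu[g][j][v] probability that generator g produces base v at position j
     * eps[j]      sequencing error rate at position j
     */
    static double readLikelihood(int[] read, double[][] pi, double[] rho,
                                 double[][][] mu, double[] eps) {
        int k = mu.length;
        double[] f = new double[k];
        for (int g = 0; g < k; g++) {
            f[g] = pi[0][g] * emission(read[0], mu[g][0], eps[0]);
        }
        for (int j = 1; j < read.length; j++) {
            double sum = 0.0;
            for (double x : f) {
                sum += x;
            }
            double[] next = new double[k];
            for (int g = 0; g < k; g++) {
                // either stay on generator g, or jump (from any generator) and land on g
                double prior = (1.0 - rho[j]) * f[g] + rho[j] * pi[j][g] * sum;
                next[g] = prior * emission(read[j], mu[g][j], eps[j]);
            }
            f = next;
        }
        double likelihood = 0.0;
        for (double x : f) {
            likelihood += x;
        }
        return likelihood;
    }
}
```

In an EM setting, these forward values would be combined with analogous backward values to obtain the expected counts of the E-step. A practical implementation would also rescale `f` at each position, or work in log space, to avoid numerical underflow on long reads.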
This Java command-line application is a toolbox that combines all steps necessary to infer a viral quasispecies from next-generation sequencing (NGS) data.
If you use QuasiRecomb, please cite the paper by Töpfer et al. in the Journal of Computational Biology.
Please get the latest binary at https://github.com/cbg-ethz/QuasiRecomb/releases
- First algorithm that models the recombination process
- Allows position-wise mutation events
- Infers a parametric probability distribution of the underlying viral population
- Error correction by estimating position-wise sequencing error rates
- Local, gene- and genome-wide reconstruction
- Reports SNV (single nucleotide variant) posteriors
- Incorporates paired-end information
- Uses PHRED scores to weight each base of each read
- Input may contain paired-end and single reads
- Supports reads of all current sequencing technologies (454/Roche, Illumina and PacBio)
- Suitable for amplicon and shotgun sequencing projects
- Reports reconstructed haplotypes and their relative frequencies
- Reports translated proteins in all three reading frames with their relative frequencies
- Input data can be in BAM or SAM format
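Two items in the list above, PHRED-based weighting and position-wise error rates, rest on the standard PHRED relation p = 10^(-Q/10) between a quality score Q and the probability p that a base call is wrong. The Java sketch below decodes a SAM quality string (QUAL is stored as ASCII characters with an offset of 33) and turns each score into a per-base weight. Using 1 - p as the weight is an assumption made for illustration; the exact weighting QuasiRecomb applies is not described here, and `PhredWeights` is a hypothetical name.

```java
import java.util.Arrays;

// Illustrative PHRED decoding; the weighting scheme is an assumption, not QuasiRecomb's code.
final class PhredWeights {
    /** Offset of quality characters in the SAM text format ("Phred+33"). */
    static final int PHRED_OFFSET = 33;

    /** Converts a PHRED score Q into the probability that the base call is wrong: 10^(-Q/10). */
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    /** Hypothetical weighting: each base counts with its probability of being correct. */
    static double[] baseWeights(String qualities) {
        double[] w = new double[qualities.length()];
        for (int i = 0; i < w.length; i++) {
            int q = qualities.charAt(i) - PHRED_OFFSET;
            w[i] = 1.0 - errorProbability(q);
        }
        return w;
    }

    public static void main(String[] args) {
        // '+' encodes Q10, '5' encodes Q20, '?' encodes Q30
        System.out.println(Arrays.toString(baseWeights("+5?")));
    }
}
```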
- JDK 7 (http://jdk7.java.net/)
If you are new to QuasiRecomb, please read the Beginners' guide to viral population inference
java -jar QuasiRecomb.jar -i alignment.bam
Reads need to be properly aligned.
-conservative
With this option, only major haplotypes are reconstructed.
-noGaps
Ignores all deletions. Use this option if deletions are not of interest, not expected, or only caused by technical noise.
-K 2
-K 1-8
-r 790-2292
-quality
-noRecomb
-maxDel INT
-maxPercDel DOUBLE
The value must be between 0.0 and 1.0.
-unpaired
Prevents pairing and merging of reads. Use this option if reads are single-end and read names are not unique. It is recommended for 454/Roche sequencing data, where read names are often not unique.
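When a range of generator numbers such as `-K 1-8` is given, one model has to be fitted per K and the fits have to be compared. A common generic criterion for this is the Bayesian information criterion, BIC = -2 ln L + p ln n, where L is the maximized likelihood, p the number of free parameters and n the number of observations. The Java sketch below shows only that generic criterion, assuming each fit reports its log-likelihood and parameter count; whether QuasiRecomb uses exactly this criterion, and how it counts parameters, is not stated here. `ModelSelectionSketch`, `ModelFit` and `selectK` are hypothetical names.

```java
// Generic BIC-based choice of K; an assumption, not QuasiRecomb's model selection code.
final class ModelSelectionSketch {
    /** Hypothetical summary of one fitted model with k generators. */
    static final class ModelFit {
        final int k;
        final double logLikelihood;
        final int freeParameters;

        ModelFit(int k, double logLikelihood, int freeParameters) {
            this.k = k;
            this.logLikelihood = logLikelihood;
            this.freeParameters = freeParameters;
        }
    }

    /** BIC = -2 ln L + p ln n; lower values are better. */
    static double bic(ModelFit fit, int numberOfObservations) {
        return -2.0 * fit.logLikelihood + fit.freeParameters * Math.log(numberOfObservations);
    }

    /** Returns the k with the smallest BIC; on ties the fit listed first wins. */
    static int selectK(ModelFit[] fits, int numberOfObservations) {
        int bestK = -1;
        double bestBic = Double.POSITIVE_INFINITY;
        for (ModelFit fit : fits) {
            double b = bic(fit, numberOfObservations);
            if (b < bestBic) {
                bestBic = b;
                bestK = fit.k;
            }
        }
        return bestK;
    }
}
```

The penalty grows with the number of parameters, so an additional generator is only kept if it improves the log-likelihood enough to pay for the extra parameters it introduces.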
The reconstructed DNA haplotype distribution is saved as quasispecies.fasta in the working directory.
If -protein is used, an amino acid translation of the quasispecies in all three reading frames is saved as support/quasispecies_protein_(0|1|2).fasta.
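Translating "in all three reading frames" means starting the codon grid at offset 0, 1 or 2 of the haplotype. The Java sketch below illustrates this with the standard genetic code (NCBI translation table 1). It is not QuasiRecomb's code: dropping incomplete trailing codons and emitting `X` for codons that contain gaps or ambiguous bases are assumptions, and `ThreeFrameTranslation` is a hypothetical name.

```java
// Illustrative three-frame translation with the standard genetic code; not QuasiRecomb's code.
final class ThreeFrameTranslation {
    // Amino acids for all 64 codons, ordered by first, second and third base in the order T, C, A, G.
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final String BASES = "TCAG";

    /** Translates dna starting at offset frame (0, 1 or 2); an incomplete trailing codon is dropped. */
    static String translate(String dna, int frame) {
        String s = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = frame; i + 3 <= s.length(); i += 3) {
            int b1 = BASES.indexOf(s.charAt(i));
            int b2 = BASES.indexOf(s.charAt(i + 1));
            int b3 = BASES.indexOf(s.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                protein.append('X'); // gap or ambiguous base (assumption)
            } else {
                protein.append(AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
            }
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        String haplotype = "ATGGCCTGA";
        // prints "0: MA*", "1: WP" and "2: GL" on separate lines
        for (int frame = 0; frame < 3; frame++) {
            System.out.println(frame + ": " + translate(haplotype, frame));
        }
    }
}
```

Since different DNA haplotypes can yield the same protein, the frequencies of identical translations would have to be summed to obtain protein-level frequencies.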
Summary statistics can be produced with R:
R CMD BATCH support/coverage.R
R CMD BATCH support/modelselection.R
java -XX:NewRatio=9 -jar QuasiRecomb.jar
java -XX:NewRatio=9 -Xms2G -Xmx10G -jar QuasiRecomb.jar
java -XX:+UseParallelGC -XX:NewRatio=9 -Xms2G -Xmx10G -jar QuasiRecomb.jar
java -XX:+UseParallelGC -XX:+UseNUMA -XX:NewRatio=9 -Xms2G -Xmx10G -jar QuasiRecomb.jar
function qr() { java -XX:+UseParallelGC -Xms2g -Xmx10g -XX:+UseNUMA -XX:NewRatio=9 -jar ~/QuasiRecomb.jar "$@"; }
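To check which heap limits the JVM actually applies on a given machine, a small standalone program can be started with the same flags as above. The sketch below is not part of QuasiRecomb and `JvmSettingsCheck` is a hypothetical name; it only uses the standard `java.lang.Runtime` API. The reported maximum heap may be slightly below the -Xmx value, and the committed heap roughly reflects -Xms right after startup.

```java
// Standalone check of JVM heap and CPU settings; not part of QuasiRecomb.
final class JvmSettingsCheck {
    private static final long MIB = 1024L * 1024L;

    /** Upper heap limit in MiB; reflects -Xmx, or the JVM default when -Xmx is absent. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently committed by the JVM in MiB. */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Committed (MiB): " + committedHeapMiB());
        System.out.println("Processors:      " + Runtime.getRuntime().availableProcessors());
    }
}
```

For example, compile it with `javac JvmSettingsCheck.java` and run `java -XX:NewRatio=9 -Xms2G -Xmx10G JvmSettingsCheck`.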
Further help is shown by running QuasiRecomb without additional parameters:
java -jar QuasiRecomb.jar
- Maven 3 (http://maven.apache.org/)
cd QuasiRecomb
mvn -DartifactId=samtools -DgroupId=net.sf -Dversion=1.8.9 -Dpackaging=jar -Dfile=src/main/resources/jars/sam-1.89.jar -DgeneratePom=false install:install-file
mvn clean package
java -jar QuasiRecomb/target/QuasiRecomb.jar
Armin Töpfer
armin.toepfer (at) gmail.com
http://www.armintoepfer.com
GNU GPLv3 http://www.gnu.org/licenses/gpl-3.0