The following software packages are known to be compatible with PacBio® data, in addition to PacBio's own SMRT® Analysis suite. All packages are believed to be open source or freely available for non-commercial use. See the individual project sites for up-to-date license information. A separate page lists commercial software.
Know of any other open source software for PacBio data? Email us.
- De novo assembly
- Structural Variations Detection
- Reference-based alignment
- Consensus and variant calling
- RNA analysis
- Epigenetic base modifications and methylation
- Genome Browsers
Detailed information on Large Genome Assembly with PacBio Long Reads is published here
- Falcon: An experimental diploid assembler, tested on ~100 Mb genomes
- Canu: Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing
- wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
- MHAP: A reference implementation of a probabilistic sequence overlapping algorithm, designed to efficiently detect all overlaps between noisy long-read sequences. It estimates Jaccard similarity efficiently by compressing sequences to representative fingerprints composed of min-mers (minimum k-mers).
- HGAP: hierarchical genome assembler for PacBio long reads only. Bundled in SMRT Analysis since v1.4
- HBAR-DTK: Hierarchical-Based AssembleR Development ToolKit, recommended for advanced users only
- ALLORA: A long-read assembler for PacBio long reads alone. Available only in SMRT Analysis, since v1.0.
- Celera® Assembler: Celera® Assembler 8.1 now offers a way to directly assemble subreads
- Sprai: A preassembly-based assembler that aims to generate longer contigs
- PBcR self-correction: A mode within PBcR (aka pacBioToCA) to do self-correction in the same style as HGAP. Celera® Assembler 8.2 uses the MHAP algorithm for faster overlap calculation during the self-correction phase.
- pacBioToCA + Celera® Assembler: A scalable hybrid assembly to combine PacBio long reads with Illumina®, 454, Sanger, Ion Torrent or CCS. Bundled in SMRT Analysis from v1.3.3
- ECTools: A set of tools for hybrid assembly. It uses contigs instead of short reads for correction.
- SPAdes: True hybrid assembler, PacBio with Illumina or Ion Torrent; small(er) genomes only
- Cerulean: A hybrid assembler. It starts with an assembly graph from ABySS and extends contigs by resolving bubbles in the graph using PacBio long reads. It has been run successfully on genomes <100 Mb.
- dbg2olc: A hybrid assembler that uses Illumina contigs as anchors to build an overlap graph with PacBio reads, allowing very fast performance
- ALLPATHS-LG: hybrid assembler for PacBio long reads plus Illumina mate pairs plus Illumina jumping libraries
- AHA: A hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis, since v1.0
- PBJelly 2: Gap filling and scaffolding for large genomes
- MIRA: de novo assembler
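The MHAP entry above mentions estimating Jaccard similarity from min-mer fingerprints. The following is a minimal sketch of that general idea (MinHash over k-mers), not MHAP's actual implementation. The function names, the choice of `blake2b` as the hash, and the default values of `k` and `num_hashes` are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects one hash function.

    blake2b is an illustrative choice, not what MHAP itself uses.
    """
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(seq, k=8, num_hashes=128):
    """Compress a sequence to a fingerprint of min-mers.

    For each of num_hashes hash functions, keep the smallest hash value
    seen over all k-mers of the sequence.
    """
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching min-mers."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=8):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)
```

Comparing two fixed-length sketches takes time proportional to `num_hashes` rather than to read length, which is why fingerprinting of this kind suits all-versus-all overlap detection. A larger `num_hashes` gives a less noisy estimate at the cost of more hashing.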
Sniffles: Calls all types of structural variants using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
SMRT-SV: Calls insertions, deletions, and inversions using a local assembly approach.
IsoCon: For targeted Iso-Seq only. A tool for deriving finished transcripts from Iso-Seq reads. Input is a set of full-length non-chimeric reads in FASTA format and the CCS base call values as a BAM file. The output is a set of predicted transcripts.
Cupcake: accompanying scripts for official Iso-Seq1, 2, and 3 output analysis.
SQANTI: An Iso-Seq QC and analysis tool that takes long-read output from Iso-Seq, IDP, TAPIS, etc., and combines it with short reads, a reference genome, and annotations to give a comprehensive description of the dataset. (preprint)
TAPPAS: For isoform analysis and visualization, to be used after the data has been cleaned up with SQANTI.
lncRNA Discovery Pipeline: Python scripts for using two ncRNA classifiers (CPAT and PLEK) for discovering long ncRNAs in Iso-Seq data.
ANGEL: Python library for doing both error-free and error-tolerant Open Reading Frame prediction
Cogent: Genome Reconstruction using Iso-Seq data only, without a reference genome.
SpliceMap-LSC-IDP pipeline: developed by Kin Fai Au's lab, a hybrid (long + short read) error correction and quantification software for transcriptome data.
IDP-fusion: A fusion detection tool using both long and short reads (hybrid).
- bwa-sw: Burrows-Wheeler aligner with Smith-Waterman
- GATK: Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping
- DeepVariant: DeepVariant is an analysis pipeline developed by Google that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
- LoFreq: Low-frequency variant caller. Switching off BAQ computation with -B is recommended. Calls all known mutations in the HBV amplicon dataset without false positives, starting from 1.15% AF.
- R-kinetics: R package for kinetic analysis
- MotifMaker: bundled in SMRT Analysis since v1.3.3
- motif-finding: R code for motif analysis
- kineticsTools: Python code for kinetic analysis
- IGV: Integrative Genomics Viewer from the Broad Institute
- SMRT View: PacBio's Genome Browser for SMRT Sequencing data. Explore and interact with Resequencing, De novo, Base Modification and Identification, Motif Analysis, cDNA, Single Molecule and Barcoding experiment results
- Tablet: Next Generation Sequence Assembly Visualization