Compatible Software

The following software packages are known to be compatible with PacBio® data, in addition to PacBio's own SMRT® Analysis suite. All packages are believed to be open source or freely available for non-commercial use. See the individual project sites for up-to-date license information. A separate page lists commercial software.

Know of any other open source software for PacBio data? Email us.

Software categories:

De novo assembly
Structural Variant Calling
RNA analysis
Reference-based alignment
Consensus and variant calling
Epigenetic base modifications and methylation
Genome Browsers

De novo assembly

Detailed information on Large Genome Assembly with PacBio Long Reads is published on a separate page.

Falcon: An experimental diploid assembler, tested on ~100 Mb genomes
Canu: A fork of the Celera Assembler designed for high-noise single-molecule sequencing
wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
MHAP: A reference implementation of a probabilistic sequence-overlapping algorithm, designed to efficiently detect all overlaps between noisy long-read sequences. It estimates Jaccard similarity by compressing sequences into representative fingerprints composed of min-mers (minimum k-mers).
HGAP: hierarchical genome assembler for PacBio long reads only. Bundled in SMRT Analysis since v1.4
HBAR-DTK: Hierarchical-Based AssembleR Development ToolKit, recommended for advanced users only
ALLORA: a long-read assembler for PacBio long reads alone. Available only in SMRT Analysis, since v1.0.
Celera® Assembler: Celera® Assembler 8.1 now offers a way to directly assemble subreads
Sprai: A preassembly-based assembler that aims to generate longer contigs
PBcR self-correction: A mode within PBcR (aka pacBioToCA) to do self-correction in the same style as HGAP. Celera® Assembler 8.2 uses the MHAP algorithm for faster overlap calculation during the self-correction phase.
pacBioToCA + Celera® Assembler: A scalable hybrid assembly pipeline that combines PacBio long reads with Illumina®, 454, Sanger, Ion Torrent or CCS reads. Bundled in SMRT Analysis since v1.3.3
ECTools: A set of tools for hybrid assembly. It uses contigs instead of short reads for correction.
SPAdes: A true hybrid assembler combining PacBio with Illumina or Ion Torrent reads; small(er) genomes only
Cerulean: A hybrid assembler. It starts with an assembly graph from ABySS and extends contigs by resolving bubbles in the graph using PacBio long reads. It has been run successfully on genomes <100 Mb.
dbg2olc: A hybrid assembler that uses Illumina contigs as anchors to build an overlap graph with PacBio reads, which makes it very fast
ALLPATHS-LG: hybrid assembler for PacBio long reads plus Illumina mate pairs plus Illumina jumping libraries
AHA: A hybrid assembler to scaffold existing contigs and fill gaps. Available only in SMRT Analysis, since v1.0
PBJelly 2: Gap filling and scaffolding for large genomes
MIRA: de novo assembler
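The min-mer fingerprint idea described in the MHAP entry can be illustrated with a classic MinHash sketch. The Python below is a simplified illustration under stated assumptions, not MHAP's actual implementation: the hash function (BLAKE2b with a per-function seed prefix), the helper names `kmers`, `min_mer_sketch`, `estimate_jaccard`, `exact_jaccard` and `random_dna`, and the parameters k = 16 and 256 hash functions are all hypothetical choices, and reverse-complement (strand) handling is ignored. For each hash function, a sequence's fingerprint keeps the minimum hash over all of its k-mers. Because the probability that two k-mer sets share the same minimum equals |A ∩ B| / |A ∪ B|, the fraction of agreeing fingerprint positions estimates the Jaccard similarity.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer; each seed acts as a separate hash function.
    BLAKE2b is an illustrative choice (assumption), not what MHAP uses."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def min_mer_sketch(seq: str, k: int, num_hashes: int) -> list[int]:
    """Fingerprint: for each hash function, keep the minimum hash over all k-mers."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, s) for km in ks) for s in range(num_hashes)]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree.
    Since P(min_A == min_B) = |A & B| / |A | B|, this estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(a: str, b: str, k: int) -> float:
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


def random_dna(length: int, seed: int) -> str:
    """Reproducible random DNA string for the demo (hypothetical helper)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    a = random_dna(2000, seed=0)
    b = a[500:] + random_dna(500, seed=1)  # b overlaps the last 1,500 bases of a
    k, n = 16, 256
    est = estimate_jaccard(min_mer_sketch(a, k, n), min_mer_sketch(b, k, n))
    print("estimated:", est)
    print("exact:    ", exact_jaccard(a, b, k))
```

Running the script compares the sketch-based estimate with the exact k-mer Jaccard similarity. The appeal of the approach is that two fixed-size fingerprints can be compared far more cheaply than two full k-mer sets; the standard deviation of the estimate is roughly sqrt(J(1 - J) / n) for n hash functions, so more hash functions buy accuracy at the cost of a larger fingerprint.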

Structural Variant Calling

Sniffles: Calls all types of structural variants using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
SMRT-SV: Calls insertions, deletions, and inversions using a local assembly approach.

RNA Analysis

IsoCon: for targeted Iso-Seq only. IsoCon derives finished transcripts from Iso-Seq reads. The input is a set of full-length non-chimeric reads in FASTA format plus the CCS base call values as a BAM file; the output is a set of predicted transcripts.
Cupcake: companion scripts for analyzing output from the official Iso-Seq1, Iso-Seq2, and Iso-Seq3 pipelines.
TAMA: suite of downstream analysis scripts, including collapsing and merging transcript data. See TAMA wiki for more details.
SQANTI: Iso-Seq QC and analysis software that takes long-read output from Iso-Seq, IDP, TAPIS, and similar tools and combines it with short reads, a reference genome, and annotations to give a comprehensive description of the dataset. A preprint is available.
TAPPAS: isoform analysis and visualization, to be used after the data has been cleaned up with SQANTI.
lncRNA Discovery Pipeline: Python scripts that use two ncRNA classifiers (CPAT and PLEK) to discover long ncRNAs in Iso-Seq data.
ANGEL: Python library for both error-free and error-tolerant open reading frame (ORF) prediction
Cogent: Genome Reconstruction using Iso-Seq data only, without a reference genome.
SpliceMap-LSC-IDP pipeline: a hybrid (long + short read) error correction and quantification pipeline for transcriptome data, developed by Kin Fai Au's lab.
IDP-fusion: a hybrid gene-fusion detector that uses both long and short reads.

Reference-based alignment

bwa-sw: Burrows-Wheeler aligner with Smith-Waterman

Consensus and variant calling

GATK: Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping
DeepVariant: an analysis pipeline developed by Google that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
LoFreq: low-frequency variant caller. Switching off BAQ (base alignment quality) computation with -B is recommended. On the HBV amplicon data set, it calls all known mutations without false positives at allele frequencies (AF) down to 1.15%.
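The BAQ recommendation for LoFreq can be captured in a small Python wrapper. This is a hedged sketch: the command shape `lofreq call -f REF.fa -o OUT.vcf IN.bam` follows LoFreq's documented usage as we understand it, with `-B` (long form `--no-baq`) disabling BAQ computation; the function names `lofreq_call_cmd` and `run_lofreq` and the file names `ref.fa`, `aln.bam` and `vars.vcf` are hypothetical. Check `lofreq call` help for your installed version before relying on it.

```python
import shlex
import subprocess


def lofreq_call_cmd(ref_fasta: str, bam: str, out_vcf: str, disable_baq: bool = True) -> list[str]:
    """Build a `lofreq call` command line (hypothetical helper).

    -f: indexed reference FASTA, -o: output VCF,
    -B: disable BAQ computation, as the LoFreq entry recommends.
    The BAM file is passed as the final positional argument.
    """
    cmd = ["lofreq", "call", "-f", ref_fasta, "-o", out_vcf]
    if disable_baq:
        cmd.append("-B")
    cmd.append(bam)
    return cmd


def run_lofreq(ref_fasta: str, bam: str, out_vcf: str) -> None:
    """Run LoFreq; needs `lofreq` on PATH, raises CalledProcessError on failure."""
    subprocess.run(lofreq_call_cmd(ref_fasta, bam, out_vcf), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without LoFreq installed.
    print(shlex.join(lofreq_call_cmd("ref.fa", "aln.bam", "vars.vcf")))
```

Building the argument list separately from running it keeps the flag choice testable and avoids shell quoting problems, since `subprocess.run` receives a list rather than a shell string.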

Epigenetic base modifications and methylation

R-kinetics: R package for kinetic analysis
MotifMaker: bundled in SMRT Analysis since v1.3.3
motif-finding: R code for motif analysis
kineticsTools: Python code for kinetic analysis

Genome Browsers

IGV: Integrative Genomics Viewer from the Broad Institute
SMRT View: PacBio's Genome Browser for SMRT Sequencing data. Explore and interact with Resequencing, De novo, Base Modification and Identification, Motif Analysis, cDNA, Single Molecule and Barcoding experiment results
Tablet: Next Generation Sequence Assembly Visualization

Visit the PacBio Developer's Network website for the most up-to-date links to downloads, documentation and more.