SMRT Analysis Release Notes v2.1

fripp edited this page Oct 4, 2013 · 29 revisions

Introduction

The SMRT® Analysis software suite performs assembly and variant detection analysis of sequencing data generated by the Pacific Biosciences instrument.

Installation

For installation instructions, see SMRT® Analysis Software Installation.

New Features in v2.1

Important Note:

Circular consensus sequencing (CCS) is no longer performed on the Blade Center. Instead, you must run the RS_ReadsOfInsert protocol. That protocol provides more options to determine the highest quality single molecule reads, and includes built-in DNA barcoding support. The Blade Center now processes base calls in real time, and data is available immediately after a run regardless of insert size and movie time.

BridgeMapper Protocol

  • Use the new BridgeMapper protocol to find sequences that map to multiple parts of a genome assembly, to help with assembly QC and other tasks. For information on BridgeMapper and its specialized SMRT View visualization mode, see the SMRT View online help.

SMRT® Pipe - Consensus

  • Quiver was upgraded to handle diploid variant calling. As a result, the RS_Resequencing_CCS_GATK and RS_Resequencing_GATK protocols were removed in favor of the diploid Quiver option in the RS_Resequencing and RS_ReadsOfInsert_Resequencing protocols. Note that the RS_Resequencing_GATK_Barcode protocol remains for this release, but we intend to remove it in the next version of SMRT® Analysis.

SMRT® Pipe - Mapping

A new BLASR option, -concordant, aligns all subreads of a ZMW to the location where the longest full-pass subread of that ZMW aligned. This option is now turned on by default in the following SMRT Pipe protocols:

  • RS_Resequencing
  • RS_Modification_and_motif_Analysis
  • RS_Modification_Detection
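
As a rough illustration of the idea behind -concordant (not BLASR's actual implementation), the sketch below picks the longest full-pass subread of each ZMW as the anchor whose alignment would then constrain the other subreads. The Subread record, its fields, and the pick_anchors helper are hypothetical names introduced only for this example. How BLASR treats a ZMW with no full-pass subread is not stated in these notes, so the sketch simply leaves such ZMWs out.

```python
from typing import NamedTuple


class Subread(NamedTuple):
    """Hypothetical subread record; field names are assumptions."""
    zmw: int         # ZMW (hole) the subread came from
    start: int       # start position within the polymerase read
    end: int         # end position within the polymerase read
    full_pass: bool  # True if both adapters flank the subread


def pick_anchors(subreads):
    """Return {zmw: longest full-pass subread} for ZMWs that have one."""
    anchors = {}
    for s in subreads:
        if not s.full_pass:
            continue
        best = anchors.get(s.zmw)
        if best is None or (s.end - s.start) > (best.end - best.start):
            anchors[s.zmw] = s
    return anchors


subreads = [
    Subread(7, 0, 900, False),
    Subread(7, 950, 3000, True),
    Subread(7, 3050, 4500, True),
    Subread(9, 0, 1200, False),
]
anchors = pick_anchors(subreads)
for zmw, anchor in sorted(anchors.items()):
    print(zmw, anchor.start, anchor.end)
```

In a concordant mapping, the remaining subreads of ZMW 7 would be aligned to the region where the selected anchor maps, rather than being placed independently.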

SMRT® Pipe - Assembly

  • The new HGAP 2 protocol has significantly increased performance; assembly time and the memory footprint are dramatically reduced. Cluster time is reduced up to 100-fold, and disk space use is reduced from gigabytes to tens of megabytes. Assembly output is also improved with some new chimera detection and sequencing filtering. To help you compare the differences in assembly results, we include both the original HGAP and the new HGAP 2 protocols in SMRT Portal (RS_HGAP_Assembly.1 and RS_HGAP_Assembly.2). The HGAP 2 workflow has been configured for larger genomes with potentially lower coverage, with defaults that may not make sense for high-coverage bacterial genomes. To assemble these high-coverage genomes using HGAP 2, consider changing the default Assembly->Target Coverage parameter from 30 to 15.

  • As CCS is no longer performed on the instrument, we removed options to use CCS reads for pre-assembly and alignment. To use CCS or Reads of Insert for pre-assembly, please use a FASTQ file of reads. To make sure that subreads all align to the same location, resequencing analysis now uses the alignment of the longest subread from a ZMW to constrain the mapping of all other subreads. (See the BLASR -concordant option.)

  • The Allora algorithm and RS_Allora_Assembly protocols were retired in v2.1. Please use the RS_HGAP_Assembly protocol instead.

SMRT® Pipe - Barcoding

  • Analysis jobs with barcoding (such as those using the RS_ReadsOfInsert protocol) now provide a simple barcoding report with the number of reads for each barcode in a table. To help improve this table (or any other features), please send us your feedback at PBFeedback@pacificbiosciences.com.

SMRT® Portal

  • Added a "Forgot your password?" link in the login screen to reset your password.
  • When a job is created, the SMRT Analysis version is now stored. SMRT Portal displays this version number in the View Data tab.
  • Added a new Barcoding report that is generated when running jobs that include barcoding. You can also now specify a score threshold for calling barcodes when setting up a barcoding job.
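
The barcoding report and score threshold described above can be pictured with a minimal sketch. It assumes each barcode call is a (read ID, barcode name, score) tuple. The barcode names, scores, and the barcode_report helper are made up for illustration and do not reflect the actual SMRT Portal report format.

```python
from collections import Counter


def barcode_report(calls, min_score=0):
    """Count reads per barcode, keeping only calls with score >= min_score.

    calls: iterable of (read_id, barcode, score) tuples (assumed layout).
    Returns a list of (barcode, read_count) pairs sorted by barcode name.
    """
    counts = Counter(
        barcode for _read_id, barcode, score in calls if score >= min_score
    )
    return sorted(counts.items())


# Hypothetical barcode calls, for illustration only.
calls = [
    ("read1", "F1--R1", 40),
    ("read2", "F1--R1", 12),
    ("read3", "F2--R2", 35),
    ("read4", "F2--R2", 38),
]
report = barcode_report(calls, min_score=20)
for barcode, n_reads in report:
    print(f"{barcode}\t{n_reads}")
```

Raising min_score trades yield for confidence: fewer reads are assigned to a barcode, but the assignments that remain are better supported.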

SMRT® View

  • Now displays BridgeMapper visualizations, used to find sequences that map to multiple parts of a genome assembly. This can help with assembly QC and other tasks. See the SMRT View online help for details.

Bioinformatics Tools

  • pbalign is a tool which aligns Pacific Biosciences' reads in various formats (e.g., bax.h5/plx.h5/ccs.h5/FASTA/fofn) to reference sequences, and produces alignments in SAM or CMP.H5 format. It is designed to help you align Pacific Biosciences' reads and generate alignments in convenient formats for downstream analysis without access to SMRT Portal. For example, you can use pbalign with the --forQuiver option to produce alignments in a CMP.H5 file, which has all the pulse information loaded and can be consumed by Quiver directly. pbalign is available here.
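
A minimal sketch of how such a pbalign call might be assembled from a script. The file names are placeholders, and the positional order (reads, reference, output) is an assumption that should be checked against pbalign --help for the installed version. Only the --forQuiver option is taken from the notes above.

```python
import shlex


def pbalign_command(reads, reference, output, for_quiver=True):
    """Build an argument list for pbalign.

    The positional order (reads, reference, output) is assumed here;
    verify it against `pbalign --help` before relying on it.
    """
    cmd = ["pbalign"]
    if for_quiver:
        cmd.append("--forQuiver")  # load pulse information for Quiver
    cmd += [reads, reference, output]
    return cmd


# Placeholder file names, for illustration only.
cmd = pbalign_command("input.fofn", "reference.fasta", "aligned.cmp.h5")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a machine where pbalign is installed, the list can be passed to subprocess.run(cmd, check=True) to launch the alignment.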

Base Modifications

SMRT® Analysis includes base modification support for P5-C3 (P5 polymerase with C3 sequencing chemistry). The same three types of modification are still supported in identification:

  • N6-methyladenine
  • N4-methylcytosine
  • Tet-converted 5-methylcytosine (5-carboxylcytosine)

Installation/Upgrade

  • Streamlined the directory structure: All administrative scripts are now in the $SMRT_ROOT/admin/bin directory.
  • Streamlined the directory structure: All analysis data are now in the $SMRT_ROOT/userdata directory.
  • Install and upgrade procedures now attempt to automatically detect and propagate system configurations, including SGE environment variables.
  • The software tarball is now a “.run” self-extracting executable.

New Protocols in v2.1

RS_ReadsOfInsert

  • This protocol extracts the biologically meaningful portion of the sequenced read. It replaces CCS on the instrument and produces “reads_of_insert” FASTA and FASTQ files containing reads from the insert sequence of single molecules, optionally splitting by barcode.

RS_LongAmpliconAnalysis

  • This protocol enables haplotype analysis by detecting phased variants in consensus sequences for pooled amplicon data, optionally splitting by barcode.

RS_cDNA_Mapping

  • This protocol now produces a report and summary statistics on the alignment of cDNA transcripts to a genomic DNA reference using the third-party software tool GMAP. Reads are filtered by length and quality and then mapped against the reference using GMAP to span introns.

Fixed Issues in v2.1

SMRT® Pipe - Assembly

  • The AHA algorithm was refactored to provide the pbaha.py executable, which allows use of the AHA scaffolding and gap-filling algorithms outside of smrtpipe.py. In SMRT Analysis v2.1.0, pbaha.py is the only way to execute AHA on Pacific Biosciences' reads in FASTA files. (18912) Note: AHA ("A Hybrid Assembler") is the Pacific Biosciences hybrid assembly algorithm. It is based on the open source assembly software package AMOS, with additional software components tailored to Pacific Biosciences' long reads and error profile.

SMRT® Pipe - Reference Uploader

  • Reference creation now terminates with an error if duplicate sequences exist in any of the input FASTA files. (21947)
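
To check a FASTA file before uploading it, a pre-flight test along the following lines may help. This is only a sketch: the notes do not say whether "duplicate" refers to sequence names or sequence content, so the hypothetical find_duplicates helper reports both, and it is not the Reference Uploader's own check.

```python
def find_duplicates(fasta_lines):
    """Return (duplicate_names, names_with_duplicate_sequence).

    Names are the first whitespace-delimited token of each header line.
    Sequence comparison is case-insensitive.
    """
    records = []
    name, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))

    seen_names, seen_seqs = set(), set()
    dup_names, dup_seqs = [], []
    for rec_name, seq in records:
        if rec_name in seen_names:
            dup_names.append(rec_name)
        seen_names.add(rec_name)
        if seq in seen_seqs:
            dup_seqs.append(rec_name)
        seen_seqs.add(seq)
    return dup_names, dup_seqs


# Made-up FASTA content, for illustration only.
fasta = ">chr1\nACGT\nACGT\n>chr2\nGGCC\n>chr1 copy\nTTTT\n>chr3\nggcc\n".splitlines()
dup_names, dup_seqs = find_duplicates(fasta)
if dup_names or dup_seqs:
    print("duplicate names:", dup_names)
    print("records repeating an earlier sequence:", dup_seqs)
```

To scan several input files together, concatenate their lines before calling the helper so that duplicates across files are also caught.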

SMRT® Analysis Web Services API

  • The Create Job function accepts data via an HTTP POST request. Previously, the job was always saved with the reference embedded in the protocol xml, even if a different reference was specified in the POST data. This issue was fixed so that the reference specified in the POST data takes precedence. (22797)

SMRT® Portal

  • Fixed an issue where only 10 groups were visible in the Add User dialog’s Groups select box. Now all available groups display. (23591)

Known Issues in v2.1

SMRT® Pipe - Mapping

  • BLASR does not process renamed files correctly. (21439)

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2013, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacificbiosciences.com/licenses.html. P/N 100-262-000