Barcoding with SMRT Analysis 2.3

jrharting edited this page Oct 16, 2014 · 7 revisions

Scoring Modes

The SMRT® Analysis v2.3 barcoding protocol includes two scoring modes, chosen according to study design. The mode is determined by whether the barcode sequences on the two ends of the insert are the same or different.

  • Symmetric Mode: Barcode sequences are the same on both sides of the insert.
  • Asymmetric Mode: Barcode sequences differ between the two ends of the insert (a.k.a. 'paired' mode).

Symmetric Barcodes

This model assumes the same barcode sequence is appended to both ends of an insert sequence, e.g. 0001_Forward--0001_Forward. Symmetric barcode scoring can be used with barcode-tailed PCR primers, in which the reverse primer barcode is the reverse complement of the forward primer barcode.

Symmetric barcodes may also be used with barcoded adapters which are appended to inserts during ligation.

The barcode FASTA for secondary analysis lists each barcode in the sample a single time.

>0001_Forward
TCAGACGATGCGTCAT
>0002_Forward
CTATACATGACTCTGC
>0003_Forward
TACTAGAGTAGCACTC
...
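
For barcode-tailed primers, the reverse primer carries the reverse complement of the forward barcode. A minimal Python sketch of that operation follows; `reverse_complement` is a hypothetical helper written for illustration and is not part of SMRT Analysis.

```python
# Hypothetical helper for illustration only; not part of SMRT Analysis.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


forward_barcode = "TCAGACGATGCGTCAT"  # 0001_Forward from the FASTA above
reverse_primer_barcode = reverse_complement(forward_barcode)
print(reverse_primer_barcode)
```

The barcode FASTA itself still lists only the forward sequence once per barcode.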

Figure: Symmetric Barcoding Schematic.

Asymmetric Barcodes

In this model a pair of barcodes occurs together, one on each end of the insert, each having a different sequence. Asymmetric barcodes should be listed in pairwise order in the barcode FASTA file.

>0001_Forward
TCAGACGATGCGTCAT
>0002_Reverse
CTATACATGACTCTGC
>0003_Forward
TACTAGAGTAGCACTC
>0004_Reverse
TGTGTATCAGTACATG
...

In the above case, 0001_Forward is paired with 0002_Reverse (0001_Forward--0002_Reverse), 0003_Forward is paired with 0004_Reverse (0003_Forward--0004_Reverse), and so on. The final output label depends on the highest-scoring pair of barcodes. Note that the format above, which lists both sequences in forward orientation, is correct for secondary analysis; when ordering primers, however, use the reverse complement of the second barcode in each pair. Please see the sample prep guidelines for more detailed information.
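
The pairwise ordering can be illustrated with a short Python sketch that reads a paired barcode FASTA and groups consecutive records into pair labels. The functions `read_fasta` and `pair_labels` are hypothetical names used for illustration; this is not how pbbarcode itself is implemented.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into an ordered list of (name, sequence) records."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pair_labels(records: List[Tuple[str, str]]) -> List[str]:
    """Join records 1+2, 3+4, ... into 'first--second' labels."""
    if len(records) % 2:
        raise ValueError("a paired barcode FASTA needs an even number of records")
    return [
        f"{records[i][0]}--{records[i + 1][0]}"
        for i in range(0, len(records), 2)
    ]


PAIRED_FASTA = """\
>0001_Forward
TCAGACGATGCGTCAT
>0002_Reverse
CTATACATGACTCTGC
>0003_Forward
TACTAGAGTAGCACTC
>0004_Reverse
TGTGTATCAGTACATG
"""

labels = pair_labels(read_fasta(PAIRED_FASTA))
print(labels)
```

Checking for an even record count before analysis is a cheap way to catch a barcode that was accidentally dropped from the list.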

Asymmetric Barcodes with mixed pairs

Asymmetric barcoded amplicons with mix-and-match sets of forward and reverse barcodes can be demultiplexed by generating a barcode FASTA with the expected pairs of barcodes and passing this to the paired model described above. For example, if you have ordered three sets of forward and reverse primers with barcodes as follows:

Forward:
0001_Forward,0002_Forward,0003_Forward

Reverse:
0004_Reverse,0005_Reverse,0006_Reverse

then the barcode FASTA for analysis with this set is the following:

>0001_Forward
TCAGACGATGCGTCAT
>0004_Reverse
TGTGTATCAGTACATG
>0001_Forward
TCAGACGATGCGTCAT
>0005_Reverse
ACACGCATGACACACT
>0001_Forward
TCAGACGATGCGTCAT
>0006_Reverse
GATCTCTACTATATGC
>0002_Forward
CTATACATGACTCTGC
>0004_Reverse
TGTGTATCAGTACATG
>0002_Forward
CTATACATGACTCTGC
>0005_Reverse
ACACGCATGACACACT
>0002_Forward
CTATACATGACTCTGC
>0006_Reverse
GATCTCTACTATATGC
>0003_Forward
TACTAGAGTAGCACTC
>0004_Reverse
TGTGTATCAGTACATG
>0003_Forward
TACTAGAGTAGCACTC
>0005_Reverse
ACACGCATGACACACT
>0003_Forward
TACTAGAGTAGCACTC
>0006_Reverse
GATCTCTACTATATGC

This FASTA can be generated by selecting barcodes using this HTML tool.
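
As an alternative illustration (not the HTML tool itself), the same all-against-all listing can be sketched in Python with `itertools.product`. The function name `mixed_pair_fasta` is hypothetical; the barcode names and sequences are the ones from the example above.

```python
from itertools import product
from typing import Dict

# Barcode names and sequences taken from the example above.
forward = {
    "0001_Forward": "TCAGACGATGCGTCAT",
    "0002_Forward": "CTATACATGACTCTGC",
    "0003_Forward": "TACTAGAGTAGCACTC",
}
reverse = {
    "0004_Reverse": "TGTGTATCAGTACATG",
    "0005_Reverse": "ACACGCATGACACACT",
    "0006_Reverse": "GATCTCTACTATATGC",
}


def mixed_pair_fasta(fwd: Dict[str, str], rev: Dict[str, str]) -> str:
    """Emit every forward/reverse combination as consecutive FASTA records."""
    lines = []
    for (f_name, f_seq), (r_name, r_seq) in product(fwd.items(), rev.items()):
        lines += [f">{f_name}", f_seq, f">{r_name}", r_seq]
    return "\n".join(lines) + "\n"


fasta_text = mixed_pair_fasta(forward, reverse)
print(fasta_text, end="")
```

If only some combinations were actually used in the experiment, restrict the loop to those expected pairs rather than emitting the full product.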

Figure: Asymmetric Barcoding Schematic.

Note: Sequenced molecules where only one barcode was read are still labeled with a barcode based on the single score. In the general asymmetric mode, where forward and/or reverse barcodes are re-used in different combinations, binning of such single-barcode molecules is "undefined": each is placed with the first pair in the list that contains the sequenced barcode. These molecules can be filtered in SMRT Portal by pass number (ReadsOfInsert) or directly on the command line by restricting the minimum number of adapters (see below).
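
The "first pair in the list" behavior described in this note can be illustrated with a small Python sketch. It only mimics the described outcome for a mixed-pair list; `first_pair_containing` is a hypothetical function and does not reflect pbbarcode's actual scoring code.

```python
from itertools import product
from typing import List, Optional, Tuple

# Ordered list of expected pairs, as in the mixed-pair example above.
pairs: List[Tuple[str, str]] = list(
    product(
        ["0001_Forward", "0002_Forward", "0003_Forward"],
        ["0004_Reverse", "0005_Reverse", "0006_Reverse"],
    )
)


def first_pair_containing(
    barcode: str, ordered_pairs: List[Tuple[str, str]]
) -> Optional[str]:
    """Return the label of the first listed pair that includes the barcode."""
    for fwd, rev in ordered_pairs:
        if barcode in (fwd, rev):
            return f"{fwd}--{rev}"
    return None


# A molecule where only 0005_Reverse was read lands in the first pair
# listing it, whichever forward barcode it was really paired with.
print(first_pair_containing("0005_Reverse", pairs))
```

This is why filtering on the number of adapters or passes matters when barcodes are re-used across pairs.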

Note: Uploading a FASTA file that contains repeated sequences to the reference repository will fail. Barcode FASTA files for SMRT Portal protocols can instead be located anywhere in your file system where they are visible to the SMRT Analysis install. Make sure to enter the full path to the barcode FASTA in the protocol details.

First Generation "pacbio_barcodes_paired"

We currently recommend that users switch to our new set of 384 barcodes. The new set is larger, can be used symmetrically, and has been designed with greater edit distance between barcodes. For users with older datasets and/or a stock of primers from the first-generation set of 48 barcode pairs, all secondary analyses are the same as above, using the asymmetric mode. One notable difference is the presence of a constant 5-bp padding sequence appended to the barcodes. This does not alter demultiplexing results. For clarity, a FASTA example and schematic for the older barcodes are provided below.

>F_1
GGTAGgcgctctgtgtgcagc
>R_1
agagtactacatatgaGATGG
>F2
GGTAGtcatgagtcgacacta
>R2
cgtgtgcatagatcgcGATGG
...

Figure: First-generation Paired Barcoding with Padding Schematic.

SMRT® Portal Barcode Protocols in version 2.3

SMRT Portal includes four protocols that use the barcoding module to demultiplex datasets, along with basic filtering to ensure high-fidelity binning of outputs. For finer-grained filtering, see the command-line options below.

Barcoding Protocols:

  • RS_Subreads
    • Generate barcode scoring file and export fastq per barcode.
  • RS_ReadsOfInsert
    • Generalized CCS protocol. Use this protocol for single-molecule consensus with barcoding.
  • RS_Resequencing_Barcode
    • Generate alignments labeled by barcode. Quiver is called on the whole dataset; for per-barcode variant calls using Quiver and/or MinorVariants, please see the instructions on the previous page.
  • RS_Long_Amplicon_Analysis
    • Clustering and phasing tool for getting consensuses on clusters of reads.

Enable Barcode Module

Barcoding is optional in the RS_Subreads, RS_ReadsOfInsert, and RS_Long_Amplicon_Analysis protocols. To enable barcoding, make sure to select the Barcoding.1.xml module in the Protocol Details pane:

Figure 1. Select optional barcoding module.

Set Barcode Module Parameters

Figure 2. Barcode parameters.

Barcode Structure

Select from the barcode modes described above.

Barcode FASTA file

In 2.3 the default points to the new set of 384 16 bp barcodes in the reference directory. User-defined FASTA files of barcode sequences do not need to be imported into the reference database as long as the file is stored in a location accessible to the SMRT® Portal installation. Be sure to use the full path to the barcode FASTA file.

FASTA file for the new set of 384 16bp barcodes

FASTA file for the 16bp barcode adapters

FASTA file for the 7bp barcode adapters

Minimum Barcode Score

This parameter will filter outputs (fastq and aligned reads) by the minimum average barcode score for each molecule. The maximum possible score is 2 x (length of sequence in barcode FASTA). In the EGFR-MET test dataset using 21 bp padded barcodes, 99.5% calling accuracy is achieved at a minimum barcode score of 30. The same calling accuracy with 16bp unpadded barcodes can be achieved using a minimum barcode score of 23.
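
To put the thresholds above in context, the following Python sketch computes the maximum possible score from the barcode length, using the 2 x length rule stated above, and expresses each quoted threshold as a fraction of that maximum. `max_barcode_score` is a hypothetical helper, not a pbbarcode function.

```python
def max_barcode_score(barcode_length: int) -> int:
    """Maximum score: 2 x (length of the sequence in the barcode FASTA)."""
    return 2 * barcode_length


# (barcode length in bp, minimum barcode score quoted in the text above)
examples = [(21, 30), (16, 23)]

for length, threshold in examples:
    top = max_barcode_score(length)
    fraction = threshold / top
    print(f"{length} bp barcode: max score {top}, "
          f"threshold {threshold} ({fraction:.0%} of max)")
```

Both quoted thresholds sit at roughly the same fraction of the maximum score, which may be a useful starting point when choosing a cutoff for barcodes of a different length; this is an observation from the two values above, not a documented rule.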

SMRT® Portal Barcode Outputs

  • RS_Subreads
    • Barcoded Reads: compressed archive of fastq subread files, one per barcode.
  • RS_ReadsOfInsert
    • Barcoded Reads: compressed archive of fastq ReadsOfInsert (CCS) files, one per barcode.
  • RS_Resequencing_Barcode
    • Barcoded Reads: compressed archive of fastq subread files, one per barcode.
    • Aligned Reads: single alignment file(s) with barcode labels.
  • RS_Long_Amplicon_Analysis
    • Amplicon Sequences: fasta/fastq file with consensus sequences by barcode, cluster, and phase.

Command-line pbbarcode

Users are encouraged to access demultiplexed results via SMRT Portal. However, some pbbarcode options are only available at the command line. Help for each subtool is available with:

pbbarcode <subtool> --help

Detailed information on the pbbarcode subparsers and options can be found here.

Subtools

  • pbbarcode labelZmws
    • Scores raw data against barcode FASTA.
  • pbbarcode emitFastqs
    • Generates fastq/fasta files per barcode with optional filtering.
    • Note: To produce CCS reads, use an input.fofn that lists the <movie>.ccs.h5 files from RS_ReadsOfInsert.
  • pbbarcode labelAlignments
    • Labels alignments in aligned_reads.cmp.h5 file with optional filtering.