Barcoding with SMRT Analysis 2.3
The SMRT® Analysis v2.3 barcoding protocol includes two scoring modes, chosen according to study design. The mode is determined by whether the barcode sequences on the two ends of the insert are the same or different.
- Symmetric Mode: Barcode sequences are the same on both sides of the insert.
- Asymmetric Mode: Barcode sequences differ between the two ends of the insert (a.k.a. 'paired' mode).
The symmetric model assumes the same barcode sequence is appended to both ends of an insert sequence, e.g. 0001_Forward--0001_Forward. Symmetric barcode scoring can be used with barcode-tailed PCR primers, in which case the reverse primer barcode is the reverse complement of the forward primer barcode.
Symmetric barcodes may also be used with barcoded adapters which are appended to inserts during ligation.
The barcode FASTA for secondary analysis lists each barcode in the sample a single time.
    >0001_Forward
    TCAGACGATGCGTCAT
    >0002_Forward
    CTATACATGACTCTGC
    >0003_Forward
    TACTAGAGTAGCACTC
    ...
Symmetric Barcoding Schematic:
In the asymmetric model, a pair of barcodes occurs together, one on each end of the insert, each with a different sequence. Asymmetric barcodes should be listed in pairwise order in the barcode FASTA file.
    >0001_Forward
    TCAGACGATGCGTCAT
    >0002_Reverse
    CTATACATGACTCTGC
    >0003_Forward
    TACTAGAGTAGCACTC
    >0004_Reverse
    TGTGTATCAGTACATG
    ...
In the above case, 0001_Forward is paired with 0002_Reverse (0001_Forward--0002_Reverse), 0003_Forward is paired with 0004_Reverse (0003_Forward--0004_Reverse), and so on. The final output label will depend on the highest-scoring pair of barcodes. Note that the above format, which lists both sequences in forward orientation, is correct for secondary analysis; when ordering primers, however, the reverse complement of the second barcode in each pair should be used. Please see the sample prep guidelines for more detailed information.
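The orientation note above can be illustrated with a minimal Python sketch. It assumes barcodes contain only the unambiguous bases A, C, G and T; the function name `reverse_complement` is introduced here for illustration and is not part of SMRT Analysis.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Barcodes as listed in the analysis FASTA (both in forward orientation).
pair = {
    "0001_Forward": "TCAGACGATGCGTCAT",
    "0002_Reverse": "CTATACATGACTCTGC",
}

# Sequence to use for the second barcode of the pair when ordering primers.
reverse_primer_barcode = reverse_complement(pair["0002_Reverse"])
print(reverse_primer_barcode)
```

The barcode FASTA passed to secondary analysis keeps the original forward-orientation sequence; only the primer order uses the converted one.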
Asymmetric Barcodes with mixed pairs
Asymmetric barcoded amplicons with mix-and-match sets of forward and reverse barcodes can be demultiplexed by generating a barcode FASTA with the expected pairs of barcodes and passing this to the paired model described above. For example, if you have ordered three sets of forward and reverse primers with barcodes as follows:
    Forward: 0001_Forward,0002_Forward,0003_Forward
    Reverse: 0004_Reverse,0005_Reverse,0006_Reverse
then the barcode FASTA for analysis with this set is the following:
    >0001_Forward
    TCAGACGATGCGTCAT
    >0004_Reverse
    TGTGTATCAGTACATG
    >0001_Forward
    TCAGACGATGCGTCAT
    >0005_Reverse
    ACACGCATGACACACT
    >0001_Forward
    TCAGACGATGCGTCAT
    >0006_Reverse
    GATCTCTACTATATGC
    >0002_Forward
    CTATACATGACTCTGC
    >0004_Reverse
    TGTGTATCAGTACATG
    >0002_Forward
    CTATACATGACTCTGC
    >0005_Reverse
    ACACGCATGACACACT
    >0002_Forward
    CTATACATGACTCTGC
    >0006_Reverse
    GATCTCTACTATATGC
    >0003_Forward
    TACTAGAGTAGCACTC
    >0004_Reverse
    TGTGTATCAGTACATG
    >0003_Forward
    TACTAGAGTAGCACTC
    >0005_Reverse
    ACACGCATGACACACT
    >0003_Forward
    TACTAGAGTAGCACTC
    >0006_Reverse
    GATCTCTACTATATGC
This FASTA can be generated by selecting barcodes using this HTML tool.
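As an alternative to building the list by hand, the mix-and-match FASTA could also be produced with a short script. The sketch below is not the HTML tool mentioned above; the function name `paired_barcode_fasta` is hypothetical, and it assumes every forward barcode is expected with every reverse barcode. If only some combinations are present in your samples, list only those pairs.

```python
from itertools import product

forward = {
    "0001_Forward": "TCAGACGATGCGTCAT",
    "0002_Forward": "CTATACATGACTCTGC",
    "0003_Forward": "TACTAGAGTAGCACTC",
}
reverse = {
    "0004_Reverse": "TGTGTATCAGTACATG",
    "0005_Reverse": "ACACGCATGACACACT",
    "0006_Reverse": "GATCTCTACTATATGC",
}


def paired_barcode_fasta(forward: dict, reverse: dict) -> str:
    """Build a paired-mode barcode FASTA listing every forward/reverse pair.

    Records are written in pairwise order: forward barcode, then its
    reverse partner, for each combination in turn.
    """
    lines = []
    for (f_name, f_seq), (r_name, r_seq) in product(forward.items(), reverse.items()):
        lines += [f">{f_name}", f_seq, f">{r_name}", r_seq]
    return "\n".join(lines) + "\n"


fasta_text = paired_barcode_fasta(forward, reverse)
print(fasta_text, end="")
```

Three forward and three reverse barcodes give nine pairs, so the output contains 18 records in the same order as the listing above.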
Asymmetric Barcoding Schematic:
Note: Sequenced molecules where only one barcode was read will still be labeled with a barcode based on the single score. In the general asymmetric mode, where forward and/or reverse barcodes are reused in different combinations, binning of single-barcode molecules is "undefined": each such molecule is placed with the first pair in the list that contains the sequenced barcode. Such molecules can be filtered in SMRT Portal by pass number (ReadsOfInsert) or directly by restricting the minimum number of adapters on the command line (see below).
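To see why single-barcode binning is ambiguous with mixed pairs, the following Python sketch mimics the behavior described in the note. It is only an illustration of that description, not pbbarcode's actual implementation, and the function name `first_pair_containing` is hypothetical.

```python
from itertools import product

forward_names = ["0001_Forward", "0002_Forward", "0003_Forward"]
reverse_names = ["0004_Reverse", "0005_Reverse", "0006_Reverse"]

# Pairs in the same order as they appear in the mixed-pair barcode FASTA.
pairs = list(product(forward_names, reverse_names))


def first_pair_containing(barcode: str, pairs: list):
    """Return the label of the first listed pair that includes `barcode`."""
    for fwd, rev in pairs:
        if barcode in (fwd, rev):
            return f"{fwd}--{rev}"
    return None


# A molecule where only 0005_Reverse was read gets the 0001_Forward label,
# whatever forward barcode it actually carried.
print(first_pair_containing("0005_Reverse", pairs))
```

Under this description, every molecule showing only 0005_Reverse would be binned with 0001_Forward--0005_Reverse, which is why filtering on the number of passes or adapters is suggested.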
Note: Uploading a FASTA file that contains repeated sequences to the reference repository will fail. Barcode FASTA files for SMRT Portal protocols do not need to be uploaded; they can be located anywhere in your file system that is visible to the SMRT Analysis install. Make sure to give the full path to the barcode FASTA file in the protocol details.
First Generation "pacbio_barcodes_paired"
We currently recommend that users adopt our new set of 384 barcodes. The new set is larger, can be used symmetrically, and has been designed with greater edit distance between barcodes. For users with older datasets and/or a stock of primers from the first-generation set of 48 barcode pairs, all secondary analyses are the same as above, using the asymmetric mode. One notable difference is the presence of a constant 5-bp padding sequence appended to the barcodes. This does not alter demultiplexing results. For clarity, a FASTA example and schematic for the older barcodes are provided below.
    >F_1
    GGTAGgcgctctgtgtgcagc
    >R_1
    agagtactacatatgaGATGG
    >F2
    GGTAGtcatgagtcgacacta
    >R2
    cgtgtgcatagatcgcGATGG
    ...
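In the example above, the 5-bp padding appears in upper case and the 16-bp barcode in lower case. Assuming that case convention holds for your file (it is inferred from this example only), a small Python sketch can separate the two parts for inspection. The function name `split_padding` is hypothetical.

```python
def split_padding(seq: str):
    """Split a padded barcode into (padding, barcode) using letter case.

    Assumes padding bases are upper case and barcode bases are lower case,
    as in the first-generation example FASTA.
    """
    padding = "".join(base for base in seq if base.isupper())
    barcode = "".join(base for base in seq if base.islower())
    return padding, barcode


for name, seq in [("F_1", "GGTAGgcgctctgtgtgcagc"), ("R_1", "agagtactacatatgaGATGG")]:
    padding, barcode = split_padding(seq)
    print(name, padding, len(padding), barcode, len(barcode))
```

The padding sits before the forward barcode and after the reverse barcode, and each full sequence is 21 bases long.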
First-generation Paired Barcoding with Padding Schematic:
SMRT® Portal Barcode Protocols in version 2.3
SMRT Portal includes four Protocols which make use of the barcoding module for demultiplexing datasets, as well as basic filtering functionality for ensuring high-fidelity binning of outputs. For more fine-tuned filtering, see command-line options below.
- Generate barcode scoring file and export fastq per barcode.
- Generalized CCS protocol. Use this protocol for single-molecule consensus with barcoding
- Generate alignments labeled by barcode. Quiver called on whole dataset -- for per-barcode variant calls using Quiver and/or MinorVariants, please see instructions on previous page.
- Clustering and phasing tool for getting consensuses on clusters of reads.
### Enable Barcode Module
Barcoding is optional in the RS_Subreads, RS_ReadsOfInsert, and RS_Long_Amplicon_Analysis protocols. To enable barcoding, make sure to select the Barcoding.1.xml module in the Protocol Details pane:
### Set Barcode Module Parameters
Select from the barcode modes described above.
Barcode FASTA file
The default in 2.3 is pre-set to point to the new set of 384 16bp barcodes in the reference directory. User-defined FASTA files of barcode sequences do not need to be imported to the reference database as long as the file is stored in a location accessible by the SMRT® Portal installation. Be sure to use the full path to the barcode FASTA file.
Minimum Barcode Score
This parameter filters outputs (fastq and aligned reads) by the minimum average barcode score for each molecule. The maximum possible score is 2 × (length of the sequence in the barcode FASTA). In the EGFR-MET test dataset using 21 bp padded barcodes, 99.5% calling accuracy is achieved at a minimum barcode score of 30. The same calling accuracy with 16 bp unpadded barcodes can be achieved using a minimum barcode score of 23.
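The relationship between barcode length and score threshold can be checked with a few lines of Python. This sketch only applies the stated rule (maximum score is twice the sequence length) to the two thresholds quoted above; the helper names are hypothetical and the padded sequence is taken from the first-generation example.

```python
def max_barcode_score(barcode_seq: str) -> int:
    """Maximum possible score: 2 x (length of the sequence in the barcode FASTA)."""
    return 2 * len(barcode_seq)


def threshold_fraction(min_score: int, barcode_seq: str) -> float:
    """Express a minimum barcode score as a fraction of the maximum score."""
    return min_score / max_barcode_score(barcode_seq)


examples = {
    "21 bp padded": ("GGTAGgcgctctgtgtgcagc", 30),
    "16 bp unpadded": ("TCAGACGATGCGTCAT", 23),
}

for label, (seq, min_score) in examples.items():
    print(
        f"{label}: max score {max_barcode_score(seq)}, "
        f"threshold {min_score} is {threshold_fraction(min_score, seq):.0%} of max"
    )
```

The maximum scores are 42 and 32, so both quoted thresholds sit at roughly 71 to 72% of the maximum. That may be a useful starting point when choosing a threshold for barcodes of another length, though the source only reports these two cases.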
SMRT® Portal Barcode Outputs
- Barcoded Reads: compressed archive of fastq subread files, one per barcode.
- Barcoded Reads: compressed archive of fastq ReadsOfInsert (CCS) files, one per barcode.
- Barcoded Reads: compressed archive of fastq subread files, one per barcode.
- Aligned Reads: single alignment file(s) with barcode labels.
- Amplicon Sequences: fasta/fastq file with consensus sequences by barcode, cluster, and phase.
Users are encouraged to access demultiplexed results via SMRT Portal. However, some options in pbbarcode are only available at the command line. Help for each of the tools listed below is available with:
pbbarcode <subtool> --help
Detailed information on the pbbarcode subparsers and options can be found here.
- pbbarcode labelZmws
  - Scores raw data against barcode FASTA.
- pbbarcode emitFastqs
  - Generates fastq/fasta files per barcode with optional filtering.
  - Note: Use input.fofn containing list of <movie>.ccs.h5 files from RS_ReadsOfInsert to produce CCS reads.
- pbbarcode labelAlignments
  - Labels alignments in aligned_reads.cmp.h5 file with optional filtering.