Nanopore read de-multiplexer (read demux -> readux -> readucks, innit).
This package is inspired by the demultiplexing options in porechop (by Ryan Wick) but without the adapter trimming options - it just demuxes. It uses the parasail library with its Python bindings to do pairwise alignment, which provides a considerable speed-up over the seqan library used by porechop due to its low-level use of vector processor instructions.
Additional speed-ups come from specifying exactly which barcodes are present, so that the search is limited to those (this is usually something you know, after all).
There is also more flexibility in how double barcodes are called (i.e., when a read's barcode is only called if the matching barcode sequence is found at both ends, to reduce the chance of mis-called reads). In readucks you can specify a lower identity threshold for the secondary barcode than for the primary. The primary barcode is defined as the one with the highest-identity match (either at the start or end of the read). The secondary barcode is then the one at the other end. The secondary barcode must match the primary (i.e., be from the same pair) but may have more read errors.
We are currently investigating the optimal settings for this to trade off between sensitivity and specificity for double barcoding.
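The double-barcode rule described above can be sketched in a few lines of Python. This is only an illustration, not the readucks source: the function name `call_barcode`, the dictionary inputs (barcode name mapped to percent identity at each end of the read) and the barcode names are all hypothetical. It assumes that "the secondary barcode" means the best hit at the opposite end from the primary.

```python
from typing import Dict, Optional


def call_barcode(start_hits: Dict[str, float],
                 end_hits: Dict[str, float],
                 threshold: float = 90.0,
                 secondary_threshold: float = 70.0) -> Optional[str]:
    """Return the called barcode name, or None if the read stays unassigned.

    start_hits / end_hits are hypothetical inputs for this sketch: they map a
    barcode name to the percent identity of its best alignment at that end.
    """
    if not start_hits or not end_hits:
        return None

    best_start = max(start_hits, key=start_hits.get)
    best_end = max(end_hits, key=end_hits.get)

    # The primary barcode is the higher-identity hit, whichever end it is on.
    if start_hits[best_start] >= end_hits[best_end]:
        primary, primary_identity, other_end = best_start, start_hits[best_start], end_hits
    else:
        primary, primary_identity, other_end = best_end, end_hits[best_end], start_hits

    if primary_identity < threshold:
        return None

    # The secondary barcode is the best hit at the other end. It must be the
    # same barcode as the primary, but is held to the lower threshold.
    secondary = max(other_end, key=other_end.get)
    if secondary != primary or other_end[secondary] < secondary_threshold:
        return None

    return primary
```

With the default thresholds, a read with a 95% hit at the start and a 75% hit for the same barcode at the end would be called, while one whose second hit is only 65% would be left unassigned.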
One source file, misc.py, is from porechop and is provided with its original licensing information.
git clone https://github.com/rambaut/readucks.git
pip install biopython
pip install parasail
cd readucks
python setup.py install
Command line options:
usage: readucks -i INPUT_PATH [-o OUTPUT_DIR] [-b] [-a] [-e] [-p PREFIX] [-t THREADS] [-v VERBOSITY] [--single] [--native_barcodes] [--pcr_barcodes] [--rapid_barcodes] [--limit_barcodes_to LIMIT_BARCODES_TO [LIMIT_BARCODES_TO ...]] [--threshold THRESHOLD] [--secondary_threshold SECONDARY_THRESHOLD] [--scoring_scheme SCORING_SCHEME] [-h] [--version]
-i INPUT_PATH
Provide a path to an input file or directory of input files to be processed. These should be either FASTQ or FASTA files with appropriate file extensions.
-o OUTPUT_DIR
Provide the path of a directory into which output files will be placed. If this is not specified, the output files will go into the current working directory (except for annotation CSV files, which will be placed alongside their matching read files).
-b
This option will bin reads into files according to their assigned barcodes. One file (either FASTQ or FASTA depending on the input file types) will be produced for each barcode that is called (and one for unassigned reads).
-a
This option writes a CSV file for each input file (with a corresponding file name) containing the barcode assignment for each read.
-e
When provided with the -a option, this writes much more detailed information about the barcode matches to the annotation (CSV) file.
-p PREFIX
Give an optional prefix string that will be prepended to each output file name.
-t THREADS
The number of parallel threads to use (1 to turn off multithreading) (default: automatic)
-v VERBOSITY
Specify the level of output information: 0 = none, 1 = some, 2 = lots (default: 1)
--single
Only attempts to match a single barcode at one end (default: double)
--native_barcodes
Only attempts to match the 24 native nanopore barcodes (default)
--pcr_barcodes
Only attempts to match the 96 PCR barcodes
--rapid_barcodes
Only attempts to match the 12 rapid barcodes
--limit_barcodes_to LIMIT_BARCODES_TO [LIMIT_BARCODES_TO ...]
Specify a list of barcodes to look for (numbers, indexed from 1, refer to native, PCR or rapid barcodes as specified)
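As a rough illustration of why limiting the barcode list helps, the sketch below filters a barcode set down to the requested 1-indexed numbers before any alignment is done, so fewer alignments are needed per read. The function name `select_barcodes` and the sequences are placeholders invented for this example. They are not the real barcode sequences or the readucks internals.

```python
from typing import Dict, Iterable, Optional


def select_barcodes(barcode_set: Dict[int, str],
                    limit_to: Optional[Iterable[int]] = None) -> Dict[int, str]:
    """Keep only the barcodes whose 1-indexed number appears in limit_to.

    With no limit given, the whole set is searched.
    """
    if not limit_to:
        return dict(barcode_set)

    wanted = set(limit_to)
    unknown = wanted - set(barcode_set)
    if unknown:
        raise ValueError(f"unknown barcode numbers: {sorted(unknown)}")

    return {number: seq for number, seq in barcode_set.items() if number in wanted}


# Placeholder sequences for illustration only (not real nanopore barcodes).
native = {1: "AAAACCCC", 2: "CCCCGGGG", 3: "GGGGTTTT", 4: "TTTTAAAA"}

selected = select_barcodes(native, [1, 3])
print(sorted(selected))  # [1, 3]
```

Only barcodes 1 and 3 would then be aligned against each read, rather than every barcode in the set.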
Barcode search settings:
--threshold THRESHOLD
A read must have at least this percent identity to a barcode (default: 90.0)
--secondary_threshold SECONDARY_THRESHOLD
When double barcoding, the second barcode (the one with the lower identity of the two) must have at least this percent identity (and match the first one) (default: 70.0)
--scoring_scheme SCORING_SCHEME
Scoring scheme for the pairwise alignment. A comma-delimited string of alignment scores: match, mismatch, gap open, gap extend (default: 3,-6,-5,-2)
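To make the search settings concrete, here is a small sketch of how a scoring scheme string could be parsed and how a percent identity could be computed from a gapped alignment. Both helpers are hypothetical. In particular, the identity definition used here (matching columns divided by total alignment columns) is an assumption for this example, as the exact formula readucks uses is not stated above.

```python
from typing import NamedTuple


class ScoringScheme(NamedTuple):
    match: int
    mismatch: int
    gap_open: int
    gap_extend: int


def parse_scoring_scheme(text: str = "3,-6,-5,-2") -> ScoringScheme:
    """Parse 'match,mismatch,gap open,gap extend' into four integers."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-delimited scores: "
                         "match, mismatch, gap open, gap extend")
    return ScoringScheme(*(int(part) for part in parts))


def percent_identity(aligned_read: str, aligned_barcode: str) -> float:
    """Percent of alignment columns that are identical; '-' marks a gap.

    Assumption for this sketch: the denominator is the alignment length.
    """
    if not aligned_read or len(aligned_read) != len(aligned_barcode):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_read, aligned_barcode))
    return 100.0 * matches / len(aligned_read)


scheme = parse_scoring_scheme()
identity = percent_identity("ACGT-ACGTA", "ACGTTACGTT")
print(scheme)    # ScoringScheme(match=3, mismatch=-6, gap_open=-5, gap_extend=-2)
print(identity)  # 80.0
```

Under this definition, the example alignment (one gap and one mismatch in ten columns) would fall below the default primary threshold of 90.0 but clear the default secondary threshold of 70.0.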
readucks -i my_reads.fastq -o demuxed/ -b --native_barcodes --verbosity 1
This will demux the reads in my_reads.fastq, producing bin files in a directory called demuxed (which must already exist), giving some feedback information to the screen.
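Because the output directory must already exist, a wrapper script may want to create it before calling readucks. The helper below is a hypothetical convenience, not part of the package: it creates the directory if needed and builds the argument list for the example above, using only flags listed in the usage line.

```python
import os
from typing import List


def build_readucks_command(input_path: str, output_dir: str,
                           verbosity: int = 1) -> List[str]:
    """Create output_dir if it is missing and return the readucks argument list."""
    # readucks expects the output directory to exist already.
    os.makedirs(output_dir, exist_ok=True)
    return ["readucks", "-i", input_path, "-o", output_dir,
            "-b", "--native_barcodes", "-v", str(verbosity)]


# Typical use (requires readucks to be installed and on the PATH):
#   import subprocess
#   cmd = build_readucks_command("my_reads.fastq", "demuxed")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.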