lr_lordec_contam_filter
What is LoRDEC?
LoRDEC was the first hybrid long-read correction tool to use the de Bruijn graph (DBG) approach: noisy long reads are corrected by aligning them to a path in a de Bruijn graph constructed from the k-mers of accurate short reads.
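The graph idea can be illustrated with a minimal sketch. This is not LoRDEC's implementation: the function names `build_dbg` and `is_path` are hypothetical, (k-1)-mers are used as nodes by assumption, and real tools additionally filter rare k-mers and search for alternative paths to repair erroneous regions.

```python
from collections import defaultdict


def build_dbg(short_reads, k):
    """Build a toy de Bruijn graph: each k-mer is an edge between two (k-1)-mers."""
    graph = defaultdict(set)
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def is_path(graph, seq, k):
    """Return True if every k-mer of seq is an edge of the graph."""
    return all(
        seq[i + 1:i + k] in graph.get(seq[i:i + k - 1], ())
        for i in range(len(seq) - k + 1)
    )


graph = build_dbg(["ACGTAC", "GTACGG"], k=4)
print(is_path(graph, "ACGTACGG", k=4))  # sequence spelled out by the short reads
print(is_path(graph, "ACGAAC", k=4))    # contains a k-mer absent from the short reads
```

A long-read region whose k-mers all lie on a path in the graph is supported by the short reads; regions that leave the graph are the candidates for correction.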
Whether LoRDEC is outdated depends on the data. Some of the newer tools show higher performance and accuracy for genomic long reads. However, LoRDEC can still be used to correct transcriptomic long reads, either on its own or to further polish long reads already corrected by RNA-Bloom.
In LoRDEC-corrected reads, corrected bases are written in upper case and uncorrected bases in lower case. If the short reads used for correction have been filtered first, e.g. with Kraken2, potential contamination in the long reads can be removed by discarding the long reads that remain uncorrected, i.e. those that have no short-read equivalent.
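The filtering rule described above can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the helper names `count_corrected`, `read_fasta`, and `filter_reads` are hypothetical, FASTA input is assumed, and the threshold of 21 mirrors the default of the `-m` option.

```python
def count_corrected(seq):
    """Count upper-case bases, i.e. bases that LoRDEC corrected."""
    return sum(1 for base in seq if base.isupper())


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_reads(records, min_corrected=21):
    """Keep only reads with at least min_corrected corrected (upper-case) bases."""
    for header, seq in records:
        if count_corrected(seq) >= min_corrected:
            yield header, seq


fasta = [">r1\n", "ACGTacgt\n", "ACGT\n", ">r2\n", "acgtacgt\n"]
for header, seq in filter_reads(read_fasta(fasta), min_corrected=5):
    print(f">{header}\n{seq}")
```

In this toy input, `r1` has eight upper-case bases and is kept, while `r2` is entirely lower case and is dropped as a read without short-read support.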
RNA-Bloom also follows the DBG approach, but there are three important differences:
- DBG construction uses k-mers from the long reads and, if provided, additionally from short reads
- Alignment of long reads to the DBG is expression-level-aware
- It trims uncorrected ends
RNA-Bloom is generally able to correct a larger fraction of the long reads, as it depends less on short-read coverage, and it is less likely to alter the transcript isoform of the long read. At the same time, its error rate is slightly higher because it uses long-read k-mers. Long reads with sufficient short-read coverage can be further polished by LoRDEC. Read ends that are not corrected by LoRDEC are still of decent quality, so we do not necessarily want to remove them.
usage: lr_lordec_contam_filter [-h] [-m] [-v] longreads
A simple script facilitating the filtering contaminations from long reads corrected with
LoRDEC using Kraken2-filtered short reads. (It removes long reads that were not corrected
by LoRDEC)
positional arguments:
longreads
options:
-h, --help show this help message and exit
-m M Minimum number of corrected bases in a long reads to accept [default: 21]
-v V Report progress every v processed long reads [default: 100,000]