
lr_lordec_contam_filter

Simon Hegele edited this page Aug 20, 2025 · 8 revisions

What is LoRDEC?

LoRDEC was the first hybrid long-read correction tool to use the de Bruijn graph (DBG) approach: noisy long reads are corrected by aligning them to a path in a de Bruijn graph constructed from the k-mers of accurate short reads.
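To make the DBG idea concrete, here is a toy sketch in Python. It is not LoRDEC's implementation, and the function names `build_dbg` and `is_path` are made up for illustration. It only builds a graph from short-read k-mers and checks whether a sequence can be spelled as a path through it; real correction additionally has to search for alternative paths through the noisy regions.

```python
def build_dbg(short_reads, k):
    """Toy de Bruijn graph: map each k-mer to the set of k-mers that
    directly follow it (overlap of k-1 bases) in some short read."""
    graph = {}
    for read in short_reads:
        read = read.upper()
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return graph


def is_path(seq, graph, k):
    """True if every consecutive k-mer pair of seq is an edge in the graph."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers or kmers[0] not in graph:
        return False
    return all(right in graph.get(left, ()) for left, right in zip(kmers, kmers[1:]))


graph = build_dbg(["ACGTAC", "GTACGG"], k=3)
print(is_path("ACGTACGG", graph, k=3))  # spans both short reads
print(is_path("ACGTTC", graph, k=3))    # contains k-mers absent from the short reads
```

A long read region that cannot be matched to any path stays uncorrected, which is the property the contamination filter relies on.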

Isn't LoRDEC outdated?

Yes and no. Some newer tools achieve higher performance and accuracy on genomic long reads. However, LoRDEC can still be used to correct transcriptomic long reads, either on its own or to further polish long reads already corrected by RNA-Bloom.

How can we use LoRDEC to filter contamination?

LoRDEC writes corrected bases in upper case and leaves uncorrected bases in lower case. If the short reads have first been filtered for contamination, e.g. with Kraken2, we can remove potential contamination from the long reads by discarding those that remain uncorrected, i.e. those that have no short-read equivalent.
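The filtering idea can be sketched in a few lines of Python. This is an illustration of the principle, not the script's actual source: the helper names are hypothetical, FASTA input is assumed, and the threshold of 21 simply mirrors the default of the `-m` option.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_corrected(seq):
    """Number of upper-case bases, i.e. bases LoRDEC marked as corrected."""
    return sum(1 for base in seq if base.isupper())


def filter_corrected(records, min_corrected=21):
    """Keep only reads with at least min_corrected corrected bases."""
    for header, seq in records:
        if count_corrected(seq) >= min_corrected:
            yield header, seq


fasta = [">r1", "ACGT" * 6, ">r2", "acgt" * 6, ">r3", "ACGTacgt" * 5]
for header, seq in filter_corrected(read_fasta(fasta)):
    print(f">{header}\n{seq}")  # only r1 has 21 or more upper-case bases
```

Reads are kept whole here: a read that passes the threshold retains its lower-case stretches.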

Why this script instead of just using lordec-trim?

RNA-Bloom also follows the DBG approach, but there are three important differences:

  1. DBG construction uses k-mers from the long reads and, if provided, additionally from short reads
  2. Alignment of long reads to the DBG is expression-level-aware
  3. It trims uncorrected ends

RNA-Bloom is generally able to correct a larger fraction of the long reads, as it depends less on short-read coverage, and it is less likely to alter the transcript isoform of the long read. At the same time, its error rate is slightly higher due to the use of long-read k-mers. Long reads with sufficient short-read coverage can be further polished by LoRDEC. Read ends that LoRDEC does not correct are still of decent quality, so we do not necessarily want to remove them.
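The difference between trimming and filtering can be illustrated with a small Python sketch. The function `trim_uncorrected_ends` is hypothetical and only approximates the idea of cutting lower-case ends; it is not a reimplementation of lordec-trim.

```python
def trim_uncorrected_ends(seq):
    """Strip leading and trailing lower-case (uncorrected) bases."""
    start = 0
    while start < len(seq) and seq[start].islower():
        start += 1
    end = len(seq)
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]


read = "acgtACGTacACGTtt"
print(trim_uncorrected_ends(read))  # ends removed: ACGTacACGT
print(read)                         # filtering keeps an accepted read as it is
```

Trimming would discard the first four and last two bases of this read, whereas a filter based on the number of corrected bases either keeps the full read, including its ends, or drops it entirely.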

usage: lr_lordec_contam_filter [-h] [-m] [-v] longreads

A simple script facilitating the filtering contaminations from long reads corrected with
LoRDEC using Kraken2-filtered short reads. (It removes long reads that were not corrected
by LoRDEC)

positional arguments:
  longreads

options:
  -h, --help  show this help message and exit
  -m M        Minimum number of corrected bases in a long reads to accept [default: 21]
  -v V        Report progress every v processed long reads [default: 100,000]
