
A wrapper to Heng Li's kseq/readfq, an efficient FastQ/Fasta parser

andreas-wilm/nimreadfq

Note

This repository was started before Heng Li wrote his article "Fast high-level programming languages", which includes a native Nim implementation (klib, see the benchmark below). klib is just as fast as the implementation here (depending on whether you reuse memory or not) and could simply be used instead.

nimreadfq

A Nim wrapper for Heng Li's kseq/readfq, an efficient parser for FastQ and Fasta files. nimreadfq reads FastQ and Fasta data from gzipped or uncompressed files, or from stdin (pass "-" as the filename), and is fast (see benchmark below).
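How nimreadfq dispatches between these input sources internally is handled by the wrapped C code (kseq is typically paired with zlib, which reads gzipped and plain files alike). As an illustration only, the sketch below shows the two conventions involved: "-" meaning stdin, and recognizing gzip input by its two magic bytes, 0x1f 0x8b (RFC 1952). `isStdinPath`, `looksGzipped` and `openInput` are hypothetical helpers, not part of nimreadfq's API.

```nim
## Hypothetical helpers, NOT nimreadfq's API: treat "-" as stdin and detect
## gzip input by its two magic bytes, 0x1f 0x8b (RFC 1952).

proc isStdinPath(path: string): bool =
  ## "-" is the conventional name for standard input.
  path == "-"

proc looksGzipped(path: string): bool =
  ## True if the file can be opened and starts with the gzip magic bytes.
  var f: File
  if not open(f, path, fmRead):
    return false
  defer: f.close()
  var magic: array[2, uint8]
  result = f.readBytes(magic, 0, 2) == 2 and
           magic[0] == 0x1f'u8 and magic[1] == 0x8b'u8

proc openInput(path: string): File =
  ## Returns stdin for "-", otherwise opens the file (raises IOError if missing).
  if isStdinPath(path): stdin else: open(path, fmRead)
```

Actually decompressing the data would still need a gzip library such as zlib; the sketch stops at detection.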

The main entry point is readFQ(), an iterator that yields FQRecord objects. An alternative is readFQPtr(), which yields FQRecordPtr objects. The latter uses ptr char instead of strings and is therefore potentially faster, but its memory is reused between iterations, so copy any field you need to keep beyond the current iteration.
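The readfq approach can be summarized roughly as follows: a record starts at a '>' (Fasta) or '@' (FastQ) header; sequence lines are concatenated until a '+' line or the next header; for FastQ, quality lines are then read until they are as long as the sequence, which is why a quality line beginning with '@' is not mistaken for a header. The sketch below is a simplified pure-Nim rendition of that logic, not the C code nimreadfq wraps; `Rec`, `FastxReader`, `initReader` and `readNext` are hypothetical names. It assumes the whole input fits in memory and keeps the full header (including any comment) as the name. `readNext` overwrites one caller-owned record on each call, which loosely mirrors the memory reuse of readFQPtr(): anything kept past the next call has to be copied.

```nim
## Simplified sketch of readfq-style parsing; hypothetical names, NOT nimreadfq.
import std/strutils

type
  Rec = object
    name, sequence, quality: string  # quality stays empty for Fasta records

  FastxReader = object
    lines: seq[string]  # whole input split into lines (a simplification)
    pos: int            # index of the next unread line

proc initReader(text: string): FastxReader =
  FastxReader(lines: text.splitLines(), pos: 0)

proc readNext(r: var FastxReader, rec: var Rec): bool =
  ## Overwrites `rec` with the next record; returns false at end of input.
  # Skip blank or stray lines up to the next '>' (Fasta) or '@' (FastQ) header.
  while r.pos < r.lines.len and
      (r.lines[r.pos].len == 0 or r.lines[r.pos][0] notin {'>', '@'}):
    inc r.pos
  if r.pos >= r.lines.len:
    return false
  # Clear and refill the caller's strings instead of returning a new record.
  rec.name.setLen(0)
  rec.sequence.setLen(0)
  rec.quality.setLen(0)
  rec.name.add r.lines[r.pos][1 .. ^1]  # whole header minus its marker
  inc r.pos
  # Sequence lines are concatenated until a '+' line or the next header.
  while r.pos < r.lines.len and
      (r.lines[r.pos].len == 0 or r.lines[r.pos][0] notin {'>', '@', '+'}):
    rec.sequence.add r.lines[r.pos]
    inc r.pos
  # FastQ: after '+', read quality lines until they cover the sequence length,
  # so a quality line starting with '@' is not taken for a header.
  if r.pos < r.lines.len and r.lines[r.pos][0] == '+':
    inc r.pos
    while r.pos < r.lines.len and rec.quality.len < rec.sequence.len:
      rec.quality.add r.lines[r.pos]
      inc r.pos
  result = true

when isMainModule:
  var reader = initReader(">s1\nACGT\nTT\n@s2\nGGCC\n+\n@@II\n")
  var rec: Rec
  var kept: seq[Rec]
  while reader.readNext(rec):
    kept.add rec  # copy: `rec` is overwritten by the next call
  echo kept.len, " records"
```

Because Nim strings have value semantics, `kept.add rec` stores an independent copy; with raw `ptr char` fields, as in readFQPtr(), an explicit copy such as `$` would be needed instead.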

See example.nim and tests/tester.nim for code examples.

The initial Nim integration (and hard work) was done by Haibao Tang as part of his bio-pipeline repo. Haibao generously granted full rights to his code base, after which I started this separate package called nimreadfq for integration into nimble.

Benchmark

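nimreadfq is roughly five to ten times faster than fastx and bioseq; on uncompressed input, readfqPtr is also about three times faster than klib, while plain readfq is roughly on par with it. Below are example timings for reading 5,682,010 sequences from M_abscessus_HiSeq.fq (source; see also ./benchmark/get_fq.sh), run on my 2019 MacBook Pro: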

fastq:

  • readfqPtr: 2.3s
  • klib: 7.0s
  • readfq: 7.6s
  • fastx: 39.6s
  • bioseq: 42.1s

fastq.gz:

  • readfq gz: 15.6s
  • klib gz: 15.8s
  • bioseq gz: 150.0s
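To put the timings above in perspective, the short sketch below converts them into throughput and speedup ratios. It only does arithmetic on the numbers listed (5,682,010 reads per run); `readsPerSecond` and `speedup` are illustrative helpers, not part of the benchmark code.

```nim
## Arithmetic on the published timings only; no new measurements.
import std/strformat

const totalReads = 5_682_010.0  # sequences in M_abscessus_HiSeq.fq, per the text

proc readsPerSecond(seconds: float): float =
  ## Average throughput of one full pass over the file.
  totalReads / seconds

proc speedup(slower, faster: float): float =
  ## Wall-clock ratio: how many times faster `faster` is than `slower`.
  slower / faster

when isMainModule:
  # (label, seconds) pairs copied from the fastq and fastq.gz lists above.
  let timings = [("readfqPtr", 2.3), ("klib", 7.0), ("readfq", 7.6),
                 ("fastx", 39.6), ("bioseq", 42.1), ("readfq gz", 15.6),
                 ("klib gz", 15.8), ("bioseq gz", 150.0)]
  for (label, secs) in timings:
    let mps = readsPerSecond(secs) / 1e6  # millions of reads per second
    echo fmt"{label:<10} {mps:5.2f} M reads/s"
  echo fmt"readfqPtr vs klib:      {speedup(7.0, 2.3):.1f}x"
  echo fmt"readfq gz vs bioseq gz: {speedup(150.0, 15.6):.1f}x"
```

For example, readfqPtr works out to roughly 2.5 million reads per second, about three times klib's rate on the same file.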

To reproduce the results:

cd ./benchmark
nimble build
./benchmark
