RabbitQCPlus 2.0 Features

A modern quality control tool for sequencing data.

Single-threaded performance is improved by at least a factor of 2.
Processing of gz files is much faster (more than 4x speedup compared with SOAPnuke).
The time-consuming over-representation module is 5 times more efficient.
The CARE error correction engine is integrated and optimized, achieving a 1.3x speedup.

Installation

RabbitQCPlus supports only 64-bit Linux systems.

Bioconda

RabbitQCPlus is available from Bioconda.

conda create -n rabbitqcplus rabbitqcplus

conda activate rabbitqcplus

Note that the Bioconda build relies on automatic vectorization with the SSE4.2 instruction set to enhance robustness, which may cost some performance. For optimal performance, compiling from source is recommended.

From source

Dependencies

gcc 9.4.0 or newer
zlib
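
As a quick pre-build check, the compiler requirement above can be verified from the shell. The following is a minimal sketch, not part of RabbitQCPlus: the helper name `version_ge` is hypothetical, and it assumes GNU `sort` with the `-V` (version sort) option, which is standard on Linux.

```shell
#!/bin/sh
# version_ge A B: succeeds if version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort -V to order version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed gcc against the minimum stated in this README (9.4.0).
if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion prints e.g. 9.4.0 on GCC 7+; -dumpversion is the fallback.
    found="$(gcc -dumpfullversion -dumpversion)"
    if version_ge "$found" 9.4.0; then
        echo "gcc $found: OK"
    else
        echo "gcc $found: too old, need 9.4.0 or newer"
    fi
else
    echo "gcc not found"
fi
```

The script only reports the result; it does not check for zlib, whose development package name differs between distributions.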

Compilation

git clone https://github.com/RabbitBio/RabbitQCPlus.git
cd RabbitQCPlus
make -j4

To improve robustness, RabbitQCPlus provides separate implementations for different vector instruction sets. At compile time, it automatically detects the CPU instruction set and the compiler version and selects the appropriate implementation.

You can also choose the instruction set manually by modifying InstructSet in the Makefile: -DVec512 selects the AVX-512 instruction set and -DVec256 selects the AVX2 instruction set; otherwise, the compiler chooses.
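
If you set InstructSet manually, it helps to know what the CPU supports first. The sketch below reads the CPU flags from /proc/cpuinfo and prints a suggested value. It is an illustration only: the function name `pick_vec_flag` is hypothetical, and treating the `avx512f` flag as sufficient for -DVec512 is an assumption (the project's own detection may require further AVX-512 subsets).

```shell
#!/bin/sh
# pick_vec_flag "<cpu flags>": print a suggested InstructSet value.
# Hypothetical helper; the avx512f test is a simplifying assumption.
pick_vec_flag() {
    case " $1 " in
        *" avx512f "*) echo "-DVec512" ;;  # AVX-512 foundation present
        *" avx2 "*)    echo "-DVec256" ;;  # AVX2 present
        *)             echo "" ;;          # leave the choice to the compiler
    esac
}

# Take the first "flags" line of /proc/cpuinfo (Linux only).
cpu_flags="$(grep -m 1 '^flags' /proc/cpuinfo 2>/dev/null || true)"
suggestion="$(pick_vec_flag "$cpu_flags")"

if [ -n "$suggestion" ]; then
    echo "Suggested InstructSet: $suggestion"
else
    echo "No AVX2/AVX-512 detected: let the compiler choose"
fi
```

The script only prints a suggestion; the Makefile still has to be edited by hand as described above.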

Simple usage

For next-generation sequencing data

For SE (not compressed)

./RabbitQCPlus -w 8 -i in1.fastq -o p1.fastq

For SE (gzip compressed)

./RabbitQCPlus -w 8 -i in1.fastq.gz -o p1.fastq.gz

For PE (not compressed)

./RabbitQCPlus -w 8 -i in1.fastq -I in2.fastq -o p1.fastq -O p2.fastq

For PE (gzip compressed)

./RabbitQCPlus -w 16 -i in1.fastq.gz  -I in2.fastq.gz -o p1.fastq.gz -O p2.fastq.gz

For SE (use CARE engine to correct errors)

./RabbitQCPlus -w 32 -i in1.fastq -o p1.fastq --correctWithCare --coverage 30 --pairmode SE

For PE (use CARE engine to correct errors)

./RabbitQCPlus -w 32 -i in1.fastq -I in2.fastq -o p1.fastq -O p2.fastq --correctWithCare --coverage 30 --pairmode PE
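
When several paired-end samples need the same treatment, the gzip-compressed PE command can be generated per sample. This is a minimal sketch that uses only the -w, -i, -I, -o and -O options shown above; the function name `build_pe_cmd`, the `<prefix>_1.fastq.gz` / `<prefix>_2.fastq.gz` naming scheme, and the `.qc` output suffix are assumptions made for illustration.

```shell
#!/bin/sh
# build_pe_cmd PREFIX [THREADS]: print the paired-end command for one sample.
# Assumed (hypothetical) naming: PREFIX_1.fastq.gz and PREFIX_2.fastq.gz.
build_pe_cmd() {
    prefix="$1"
    threads="${2:-8}"   # default to 8 worker threads
    printf './RabbitQCPlus -w %s -i %s_1.fastq.gz -I %s_2.fastq.gz -o %s_1.qc.fastq.gz -O %s_2.qc.fastq.gz\n' \
        "$threads" "$prefix" "$prefix" "$prefix" "$prefix"
}

# Print one command per sample found in the current directory.
for r1 in ./*_1.fastq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    build_pe_cmd "${r1%_1.fastq.gz}" 16
done
```

The loop prints the commands rather than running them, so they can be reviewed first and then piped to `sh` if they look right.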

For third-generation sequencing data

not compressed

./RabbitQCPlus -w 4 -i in.fastq --TGS

gzip compressed

./RabbitQCPlus -w 6 -i in.fastq.gz --TGS

Options

For the full list of options, run ./RabbitQCPlus -h.

Performance results

Experiments were conducted on a Linux server with two Intel Xeon Platinum 8260 CPUs, 1.5 TB of RAM, and a 2 TB SSD, running Ubuntu 20.04 with GCC 9. We compared the performance of RabbitQCPlus with RabbitQC (v0.0.1), fastp (v0.23.2), SOAPnuke (v2.1.7), Trimmomatic (v0.40), CARE (v2.0.0, CPU version) and FASTQC (v0.11.9) using 370 million Illumina sequencing reads (SRR7963242).

We used the default parameters of each application, except for the number of threads, on both plain and gzip-compressed FASTQ files. With default parameters, RabbitQCPlus performs the same or more comprehensive quality control operations than the other applications.

Plain FASTQ files

Gzip-compressed FASTQ files

With error correction module on

Visual output

We visualized the information before and after data filtering, and here is an example.

Citation

Lifeng Yan, Zekun Yin, Hao Zhang, Zhan Zhao, Mingkai Wang, André Müller, Felix Kallenborn et al. "RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data." Methods 216 (2023): 39-50.

Lifeng Yan, Zekun Yin, Hao Zhang, Zhan Zhao, Mingkai Wang, André Müller, Robin Kobus, Yanjie Wei, Beifang Niu, Bertil Schmidt, Weiguo Liu. "RabbitQCPlus: More Efficient Quality Control for Sequencing Data," 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 619-626, doi: 10.1109/BIBM55620.2022.9995332.

Zekun Yin, Hao Zhang, Meiyang Liu, Wen Zhang, Honglei Song, Haidong Lan, Yanjie Wei, Beifang Niu, Bertil Schmidt, Weiguo Liu. "RabbitQC: High-speed scalable quality control for sequencing data." Bioinformatics, btaa719, https://doi.org/10.1093/bioinformatics/btaa719
