
ALSER-Lab/fastrgz


fastrgz


fastrgz is a fast and efficient gzip compression tool for sequencing data (FASTQ).

fastrgz reads.fastq makes reads.fastq.gz, and fastrgz -d reads.fastq.gz restores reads.fastq.
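The naming rule in the previous sentence (append .gz when compressing, strip it when decompressing) can be sketched in C as below. This is a minimal illustration, not fastrgz's code: default_output_name is a hypothetical helper, and how fastrgz treats other suffixes, the -o flag, or an existing output file is not shown.

```c
#include <string.h>

/* Hypothetical helper: derive the default output path following the rule
 * described in the README (compress appends ".gz", decompress strips it).
 * Returns 0 on success, -1 if no name can be derived or it does not fit. */
static int default_output_name(const char *input, int decompress,
                               char *out, size_t out_size)
{
    size_t len = strlen(input);

    if (decompress) {
        /* Decompression expects a non-empty name ending in ".gz". */
        if (len <= 3 || strcmp(input + len - 3, ".gz") != 0)
            return -1;
        len -= 3;
        if (len + 1 > out_size)
            return -1;
        memcpy(out, input, len);
        out[len] = '\0';
        return 0;
    }

    /* Compression: append ".gz" to the input name. */
    if (len + 4 > out_size)
        return -1;
    memcpy(out, input, len);
    memcpy(out + len, ".gz", 4); /* 4 bytes: ".gz" plus the terminating NUL */
    return 0;
}
```

Returning an error for inputs without a .gz suffix mirrors what gzip does when asked to decompress a file with an unknown suffix.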

fastrgz uses FASTR (version 2.0.0) as its transform stage. Instead of deflating the raw FASTQ text, each worker thread first converts its chunk to FASTR, a compact scalar/integer representation of the bases and quality scores, and then gzip-deflates that chunk. An order-preserving writer merges the compressed blocks into one standard gzip stream. Because the FASTR transform shrinks the data before deflate runs, the resulting .gz is smaller than gzipping the FASTQ directly. It also remains readable by ordinary gunzip/zcat: those tools inflate the bytes back to FASTR, and fastrgz reconstructs the FASTQ from there.
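The order-preserving writer described above can be illustrated with a small reorder buffer: workers may finish chunks in any order, but a block is only written once every earlier block has been written. The sketch below is single-threaded and hypothetical (ordered_writer_t, writer_submit, and the 16-slot window are illustrative choices, not taken from fastrgz). In fastrgz the blocks would be gzip-deflated FASTR chunks; here they are arbitrary bytes, and a threaded version would need a mutex around writer_submit.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16 /* hypothetical window of in-flight chunks */

/* One finished block for a given chunk number. */
typedef struct {
    const unsigned char *data;
    size_t len;
    int ready;
} slot_t;

typedef struct {
    slot_t slots[MAX_PENDING];
    size_t next_seq;          /* next chunk number the output needs */
    unsigned char *out;       /* destination buffer (stands in for the file) */
    size_t out_len, out_cap;
} ordered_writer_t;

static void writer_init(ordered_writer_t *w, unsigned char *out, size_t cap)
{
    memset(w, 0, sizeof *w);
    w->out = out;
    w->out_cap = cap;
}

/* Called when a worker finishes chunk `seq`, possibly out of order.
 * Stores the block, then flushes every consecutive block that is now
 * available, so output always follows the original chunk order.
 * Returns -1 if `seq` is outside the window or the output is full. */
static int writer_submit(ordered_writer_t *w, size_t seq,
                         const unsigned char *data, size_t len)
{
    if (seq < w->next_seq || seq >= w->next_seq + MAX_PENDING)
        return -1;

    slot_t *s = &w->slots[seq % MAX_PENDING];
    s->data = data;
    s->len = len;
    s->ready = 1;

    /* Drain the contiguous run of ready blocks starting at next_seq. */
    for (;;) {
        slot_t *head = &w->slots[w->next_seq % MAX_PENDING];
        if (!head->ready)
            break;
        if (w->out_len + head->len > w->out_cap)
            return -1;
        memcpy(w->out + w->out_len, head->data, head->len);
        w->out_len += head->len;
        head->ready = 0;
        w->next_seq++;
    }
    return 0;
}
```

The window caps how many finished but not yet written blocks can be held at once, which keeps memory bounded even when one slow chunk holds up the rest.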

Highlights

  • Familiar gzip/pigz-style flags: compress and decompress with -d, -c, -k, and -f as in gzip, and set the output file name with -o.
  • Multithreaded and fast: compression is up to 10x faster than pigz with the same number of threads on many of the tested SRA FASTQ files. Speed depends on the range of quality scores in your FASTQ: the narrower the range, the faster it runs, and even with the full range the speedup is still large.
  • Smaller than plain gzip: the compressed file is always smaller than plain gzip output, by up to 24%.
  • Lossless: compression is currently 100% lossless and supports any set of up to 94 quality scores. We have also built in lossy binning algorithms for even higher speedups and space savings.
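The limit of 94 scores matches the printable Phred+33 alphabet, '!' (33) through '~' (126). For intuition about why a narrow quality range can help, the sketch below scans a quality string for its score range and number of distinct values, and computes how many bits a fixed-width code would need for them. It is a toy under that assumption (scan_quality and bits_for_symbols are hypothetical helpers), not a description of how FASTR actually encodes scores.

```c
#include <stddef.h>

/* Phred+33 quality characters run from '!' (33) to '~' (126): 94 values. */
enum {
    QUAL_MIN_CHAR = '!',
    QUAL_MAX_CHAR = '~',
    QUAL_ALPHABET = QUAL_MAX_CHAR - QUAL_MIN_CHAR + 1
};

typedef struct {
    int min_q, max_q; /* smallest and largest Phred score seen */
    int distinct;     /* number of different scores that occur */
} qual_range_t;

/* Scan one quality string and report its score range.
 * Returns -1 if a character is outside the printable Phred+33 range. */
static int scan_quality(const char *qual, size_t len, qual_range_t *r)
{
    int seen[QUAL_ALPHABET] = {0};

    r->min_q = QUAL_ALPHABET;
    r->max_q = -1;
    r->distinct = 0;
    for (size_t i = 0; i < len; i++) {
        int c = (unsigned char)qual[i];
        if (c < QUAL_MIN_CHAR || c > QUAL_MAX_CHAR)
            return -1;
        int q = c - QUAL_MIN_CHAR; /* Phred score 0..93 */
        if (!seen[q]) {
            seen[q] = 1;
            r->distinct++;
        }
        if (q < r->min_q) r->min_q = q;
        if (q > r->max_q) r->max_q = q;
    }
    return 0;
}

/* Smallest b such that 2^b >= n: bits per symbol for a fixed-width code. */
static int bits_for_symbols(int n)
{
    int b = 0;
    while ((1 << b) < n)
        b++;
    return b;
}
```

With all 94 scores present a fixed-width code needs 7 bits per score, while a file using only three distinct scores needs 2.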

Building

You need a C compiler (gcc), make, and the zlib development package (the zlib.h header and the library linked with -lz). On Debian/Ubuntu: sudo apt install build-essential zlib1g-dev.

Clone the repository and build from the project root (where the Makefile and fastrgz.c live):

git clone https://github.com/ALSER-Lab/fastrgz.git
cd fastrgz
make

Portability note. The build tunes for a modern x86-64 baseline (-march=x86-64-v3). If you run the binary on an older CPU and see Illegal instruction, rebuild on that machine or lower ARCHFLAGS in the Makefile (e.g. -march=x86-64-v2).
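To check a machine before running a prebuilt binary, a small function using the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports can report whether the CPU has AVX, AVX2, BMI1, BMI2, and FMA. These are only a subset of the x86-64-v3 requirements (which also include F16C, LZCNT, MOVBE, and others), so treat a result of 1 as "probably fine" rather than a guarantee. cpu_looks_like_x86_64_v3 is a hypothetical helper; compile it without any -march flag so the check itself runs on older CPUs.

```c
/* Returns 1 if the running CPU reports AVX, AVX2, BMI1, BMI2 and FMA,
 * a subset of the features -march=x86-64-v3 relies on; 0 otherwise.
 * This is a quick proxy, not the full x86-64-v3 feature list. */
static int cpu_looks_like_x86_64_v3(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init(); /* populate the CPU feature data */
    return __builtin_cpu_supports("avx")
        && __builtin_cpu_supports("avx2")
        && __builtin_cpu_supports("bmi")
        && __builtin_cpu_supports("bmi2")
        && __builtin_cpu_supports("fma");
#else
    return 0; /* not x86-64, or a compiler without these builtins */
#endif
}
```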


Usage

./fastrgz                                # print a short usage summary
./fastrgz [options] input.fastq          # compress   -> input.fastq.gz
./fastrgz -d [options] input.fastq.gz    # decompress -> input.fastq
./fastrgz -dc input.fastq.gz             # decompress to stdout
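Bundled short flags such as -dc behave as in gzip when the command line is parsed with getopt/getopt_long, which splits -dc into -d and -c. The following is a hypothetical sketch of such a parser for the options shown in this README; opts_t, parse_args, and the internal codes assigned to the long options are illustrative and may not match fastrgz's actual parser.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical options struct; field names are illustrative only.
 * 0 or NULL means the option was not given on the command line. */
typedef struct {
    int decompress, to_stdout, keep, force; /* -d -c -k -f */
    const char *output;                     /* -o PATH */
    int threads, level;                     /* --threads N, --level N */
    const char *seq_type;                   /* --seq_type NAME */
    const char *headers_file;               /* --headers_file PATH */
    const char *input;                      /* first non-option argument */
} opts_t;

/* Parse gzip-style flags with getopt_long. Returns 0 on success,
 * -1 on an unknown option or a missing input file. */
static int parse_args(int argc, char **argv, opts_t *o)
{
    static const struct option longopts[] = {
        {"threads",      required_argument, NULL, 't'},
        {"level",        required_argument, NULL, 'L'},
        {"seq_type",     required_argument, NULL, 's'},
        {"headers_file", required_argument, NULL, 'H'},
        {NULL, 0, NULL, 0}
    };
    int c;

    *o = (opts_t){0};
    optind = 1; /* start scanning at argv[1] */
    while ((c = getopt_long(argc, argv, "dckfo:", longopts, NULL)) != -1) {
        switch (c) {
        case 'd': o->decompress = 1; break;
        case 'c': o->to_stdout = 1; break;
        case 'k': o->keep = 1; break;
        case 'f': o->force = 1; break;
        case 'o': o->output = optarg; break;
        case 't': o->threads = atoi(optarg); break;
        case 'L': o->level = atoi(optarg); break;
        case 's': o->seq_type = optarg; break;
        case 'H': o->headers_file = optarg; break;
        default:  return -1; /* getopt has already printed a message */
        }
    }
    if (optind >= argc)
        return -1;
    o->input = argv[optind];
    return 0;
}
```

Choosing defaults for anything left unset (thread count, gzip level, sequencing type) is left to the caller in this sketch.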

Examples

Compress

# Simplest: reads.fastq -> reads.fastq.gz (input removed on success)
./fastrgz reads.fastq

# Keep the original, and use all cores explicitly
./fastrgz -k --threads 8 reads.fastq

# Illumina data, default mode 2, maximum gzip level, keep input
./fastrgz -k --seq_type illumina --level 9 reads.fastq

# ONT data, choose an explicit output name, overwrite if it exists
./fastrgz -k -f --seq_type ont -o ont_reads.fastq.gz ont_reads.fastq

# Compress to stdout (e.g. to pipe or redirect yourself)
./fastrgz -c reads.fastq > reads.fastq.gz
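The --threads example above sets the worker count by hand. To use all cores without hard-coding a number, a program can query the online CPU count with sysconf(_SC_NPROCESSORS_ONLN), which glibc, macOS, and the BSDs provide even though it is not part of strict POSIX. choose_threads is a hypothetical helper; fastrgz's own default when --threads is omitted is not stated in this README.

```c
#include <unistd.h>

/* Hypothetical helper: an explicit --threads value wins; otherwise use
 * the number of online CPUs, falling back to 1 if it cannot be read. */
static int choose_threads(int requested)
{
    if (requested > 0)
        return requested;
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}
```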

Decompress

# reads.fastq.gz -> reads.fastq (input removed on success)
./fastrgz -d reads.fastq.gz

# Keep the .gz and write to an explicit path
./fastrgz -d -k -o restored.fastq reads.fastq.gz

# Decompress straight to stdout (does not touch the input)
./fastrgz -dc reads.fastq.gz | head

# Decompressing mode-3 data needs the external headers file
./fastrgz -d --headers_file reads_headers.txt -o reads.fastq reads.fastq.gz
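For intuition about what reconstruction involves: each FASTQ record is four lines (@header, sequence, +, quality), and the quality string must be as long as the sequence. The sketch below rebuilds one record from a header line, such as one taken from an external headers file, plus a sequence and a quality string. The headers-file layout is not documented in this README, so storing each header without its leading @, one per record, is an assumption; format_fastq_record is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one 4-line FASTQ record into `out`.
 * Assumes `header` is stored without the leading '@'.
 * Returns the number of bytes written, or -1 if sequence and quality
 * lengths differ or the record does not fit in `out`. */
static int format_fastq_record(const char *header, const char *seq,
                               const char *qual, char *out, size_t out_size)
{
    if (strlen(seq) != strlen(qual)) /* FASTQ requires equal lengths */
        return -1;
    int n = snprintf(out, out_size, "@%s\n%s\n+\n%s\n", header, seq, qual);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```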

Citation:

If you use FASTR in your work, please cite:

Adrian Tkachenko, Sepehr Salem, Ayotomiwa Ezekiel Adeniyi, Zulal Bingol, Mohammed Nayeem Uddin, Akshat Prasanna, Alexander Zelikovsky, Serghei Mangul, Can Alkan and Mohammed Alser. "FASTR: Reimagining FASTQ via Compact Image-inspired Representation." bioRxiv (2026).

