Fqutils is a collection of command line tools for working with files in FASTQ format, and complements the Fastx Toolkit developed by Greg Hannon's lab at Cold Spring Harbor. Fqutils is intended to be useful as part of the early portions of postsequencing pipelines and quality assessment processes. Note that Fqutils correctly parses FASTQ as specified in the standard, and as a convenience produces Lazy-FASTQ output -- four line (header1-sequence-header2-quality) records. At present the package provides the following:
|fqu_are_paired||Checks if two files are associated pairwise: Does the first record appearing in FASTQ file_1 have the same identifier as the first record in FASTQ file_2? ..., etc.|
|fqu_cat||Like Unix cat, but for files in FASTQ format. Assuming well-formed input the output is guaranteed to be in lazy-FASTQ records. Especially useful when your own parsers assume this simpler format.|
|fqu_cull||Reports a subset of a FASTQ stream, based on a collection of identifiers.|
|fqu_degen||Replaces A, C, G, T with degenerate codes. Options include conversion to purines and pyrimidines (A, G -> R and C, T -> Y) and to a GC representation (C, G -> S and A, T -> W).|
|fqu_rselect||Randomly select a subset of records from FASTQ input.|
|fqu_splitq||Segregate FASTQ input into two output files, based on a collection of identifiers.|
|fqu_summary||Generate summary statistics for FASTQ input. These include total record (sequence) count, maximum sequence size, a estimation of the quality offset (33 or 64), basecounts per position, a GC content histogram, and per position quartiles of the qualities.|
|fqu_tidy||Write FASTQ output based on FASTQ input. Like fqu_cat, output is guaranteed to be in four line records. Provides a "--squash" option for reducing the size of each record, currently by ignoring the second header, and a "--tab-delimited" option if you happen to want the output stream in a four column format. These single line records can be particularly convenient for internal processing pipelines.|
|fqu_wc||Generates word (kmer) counts for FASTQ input.|
The package is written in Standard C, and ought to be easily portable to all Unix-like systems. If you find this isn't the case please let me know. Fqutils requires Jean-Loup Gailly's and Mark Adler's Zlib 1.2.4 or later, which in turn allows the tools to work transparently with text FASTQ as well as gzip-compressed FASTQ. It also makes use of Troy Hanson's convenient uthash macro collection, which is included.
### Chdir into the source directory, then configure-make ... $ ./configure --prefix=/path/to/install_dir $ make $ make check $ make install
The "make check" step ought to run rather quickly since at present it doesn't do any testing at all. If your Zlib installation is in a nonstandard location, provide guidance to the configure script using the standard environment variables; for instance:
$ LDFLAGS=-L/opt/zlib-1.2.4/lib ./configure --prefix=/path/to/install_dir
### fqu_tidy $ cat foo.fq | fqu_tidy $ gunzip -c foo.fq.gz | fqu_tidy $ cat foo.fq.gz | fqu_tidy
The third example above demonstrates that gzipped FASTQ streams can be handled directly without explicit uncompression.
### fqu_summary $ fqu_summary foo.fq.gz > foo.fqsum $ cat foo.fq | fqu_summary > foo.fqsum ### fqu_splitq $ cat foo.fq | fqu_splitq --out baz idfile1.txt idfile2.txt $ cat foo.fq.gz | fqu_splitq --out baz idfile1.txt idfile2.txt
Special thanks to:
- Laurent Guerby and the good folks at the GCC Compile Farm for providing an environment for testing across multiple systems.
See the LICENSE file in the distribution. Fqutils is distributed under the zlib/libpng license.
Note that Fqutils is provided by the copyright holders "as is" and without any express or implied warranties, including, without limitation, the implied warranties of merchantibility and fitness for a particular purpose.
Copyright 2012 by John A. Crow and the National Center for Genome Resources. All rights reserved.
If you find what appears to be a bug, send me the smallest possible test set you can pull together that demonstrates the problem.
If you find Fqutils useful, or have ideas for improvement, drop me a note or better yet open an issue at the Fqutils GitHub page.
John A. Crow, firstname.lastname@example.org.