A Fast and All-in-one Preprocessor for FastQ files


fastp

This tool is designed to provide fast, all-in-one preprocessing for FastQ files. It is developed in C++ with multithreading support for high performance. It has the following features:

  • filter out bad reads (too low quality, too short, or too many N...)
  • trim all reads at the front and tail
  • cut low-quality bases from the 5' and 3' ends of each read by evaluating the mean quality in a sliding window (like Trimmomatic but faster).
  • cut adapters (automatic for paired-end data; for single-end data, the adapter sequence must be provided).
  • report results in JSON format for further interpretation.
  • visualize quality control and filtering results on a single HTML page (like FASTQC but faster and more informative).
  • split the output to multiple files (0001.R1.gz, 0002.R1.gz...) to support parallel processing.
  • ...
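The sliding-window cutting described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not fastp's C++ implementation: the function names `phred33_scores` and `cut_by_quality` are hypothetical, the defaults (window size 4, mean quality 20) are taken from the `-W` and `-M` options in the usage text, and the handling of reads shorter than one window is an assumption.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a phred33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def cut_by_quality(seq: str, qual: str, window_size: int = 4,
                   mean_quality: int = 20, cut_front: bool = True,
                   cut_tail: bool = True) -> tuple[str, str]:
    """Drop bases from the 5' and/or 3' end while the window mean is too low.

    Assumption: a remaining stretch shorter than one window is left untouched.
    """
    scores = phred33_scores(qual)
    start, end = 0, len(scores)

    def window_mean(lo: int, hi: int) -> float:
        return sum(scores[lo:hi]) / (hi - lo)

    if cut_front:
        # Slide from the 5' end, dropping one base per low-quality window.
        while (end - start >= window_size
               and window_mean(start, start + window_size) < mean_quality):
            start += 1
    if cut_tail:
        # Slide from the 3' end in the same way.
        while (end - start >= window_size
               and window_mean(end - window_size, end) < mean_quality):
            end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # '#' encodes Q2 and 'I' encodes Q40 in phred33.
    print(cut_by_quality("ACGTACGTACGT", "####IIIIIIII"))
```

Because the decision is based on the window mean, a low-quality base can survive when its neighbours in the window are good enough to lift the mean above the threshold.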

This tool is under active development, and new features can be added quickly if they are considered useful. If you have any additional requirements for fastp, please file an issue: https://github.com/OpenGene/fastp/issues/new

Examples of reports

fastp creates reports in both HTML and JSON format.
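Since the JSON report is intended for further processing, a downstream script can load it with any JSON parser. The sketch below assumes the default report name `fastp.json` (from the `-j` option) and makes no assumption about the report's schema, which is not described here: it only lists the top-level keys. The helper name `report_sections` is hypothetical.

```python
import json
from pathlib import Path


def report_sections(path: str = "fastp.json") -> list[str]:
    """Return the sorted top-level keys of a JSON report file."""
    with Path(path).open(encoding="utf-8") as handle:
        report = json.load(handle)
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(report)


if __name__ == "__main__":
    # Prints whatever sections the report contains; nothing is hard-coded.
    for name in report_sections():
        print(name)
```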

Get fastp

Download

Get latest

# download by http
https://github.com/OpenGene/fastp/archive/master.zip

# or clone by git
git clone https://github.com/OpenGene/fastp.git

Get the stable releases
https://github.com/OpenGene/fastp/releases/latest

Build

fastp depends only on libz, which is usually available on Linux and Mac systems. If your system has no libz, install it first.

cd fastp
make

Install

After the build is done, run:

sudo make install

Usage

usage: fastp -i <in1> -o <out1> [-I <in2> -O <out2>] [options...]
options:
  # I/O options
  -i, --in1                          read1 input file name (string)
  -o, --out1                         read1 output file name (string [=])
  -I, --in2                          read2 input file name (string [=])
  -O, --out2                         read2 output file name (string [=])
  -6, --phred64                      indicates the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33)
  -z, --compression                  compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default is 2. (int [=2])
  
  # adapter trimming options
  -A, --disable_adapter_trimming     adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled
  -a, --adapter_sequence             for single end data, adapter sequence is required for adapter trimming (string [=])
  
  # global trimming options
  -f, --trim_front1                  trimming how many bases in front for read1, default is 0 (int [=0])
  -t, --trim_tail1                   trimming how many bases in tail for read1, default is 0 (int [=0])
  -F, --trim_front2                  trimming how many bases in front for read2. If it's not specified, it will follow read1's settings (int [=0])
  -T, --trim_tail2                   trimming how many bases in tail for read2. If it's not specified, it will follow read1's settings (int [=0])
  
  # per read cutting by quality options
  -5, --cut_by_quality5              enable per read cutting by quality in front (5'), default is disabled (WARNING: this will interfere with deduplication for both PE/SE data)
  -3, --cut_by_quality3              enable per read cutting by quality in tail (3'), default is disabled (WARNING: this will interfere with deduplication for SE data)
  -W, --cut_window_size              the size of the sliding window for sliding window trimming, default is 4 (int [=4])
  -M, --cut_mean_quality             the bases in the sliding window with mean quality below cutting_quality will be cut, default is Q20 (int [=20])
  
  # quality filtering options
  -Q, --disable_quality_filtering    quality filtering is enabled by default. If this option is specified, quality filtering is disabled
  -q, --qualified_quality_phred      the quality value at or above which a base is considered qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
  -u, --unqualified_percent_limit    what percentage of bases are allowed to be unqualified (0~100). Default 40 means 40% (int [=40])
  -n, --n_base_limit                 if one read's number of N base is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])
  
  # length filtering options
  -L, --disable_length_filtering     length filtering is enabled by default. If this option is specified, length filtering is disabled
  -l, --length_required              reads shorter than length_required will be discarded. (int [=30])
  
  # reporting options
  -j, --json                         the json format report file name (string [=fastp.json])
  -h, --html                         the html format report file name (string [=fastp.html])
  
  # thread options
  -w, --thread                       worker thread number, default is 3 (int [=3])
  
  # output splitting options
  -s, --split                        if this option is specified, the output will be split to multiple (--split) files (e.g. 0001.out.fq, 0002.out.fq...). (int [=0])
  -d, --split_prefix_digits          the digits for the slice number padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to disable padding (int [=4])
  
  # help
  -?, --help                         print this message
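To make the quality, N-base, and length filtering options concrete, here is a minimal Python sketch of how a single read could be checked against the documented defaults. It is an illustration under stated assumptions, not fastp's actual code: the function name `passes_filters` is hypothetical, the quality string is assumed to be phred33, the order of the checks is arbitrary, and a read is assumed to fail only when its unqualified percentage is strictly above the limit.

```python
def passes_filters(seq: str, qual: str,
                   qualified_quality_phred: int = 15,
                   unqualified_percent_limit: int = 40,
                   n_base_limit: int = 5,
                   length_required: int = 30) -> bool:
    """Return True if a read survives length, N-base, and quality filtering."""
    # Length filter (-l): reads shorter than length_required are discarded.
    if len(seq) < length_required:
        return False
    # N filter (-n): discard when the number of N bases is > n_base_limit.
    if seq.count("N") > n_base_limit:
        return False
    # Quality filter (-q, -u): count bases below the qualified threshold.
    unqualified = sum(1 for ch in qual if ord(ch) - 33 < qualified_quality_phred)
    # Assumption: exactly hitting the limit is still allowed.
    if unqualified * 100 > unqualified_percent_limit * len(qual):
        return False
    return True


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in phred33.
    print(passes_filters("A" * 40, "I" * 40))
    print(passes_filters("A" * 40, "#" * 20 + "I" * 20))
```

Passing `-Q` or `-L` on the command line corresponds to skipping the quality or length check entirely, which this sketch does not model.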
