Tutorial
Run the program with:
java -jar FastQParse.jar
Then add commands right after it, each separated by a single space.
These are sample use cases.
java -jar FastQParse.jar -r /input/file.fastq.gz -o /output/dir/ -s /sample/file.txt -gz
Providing a sample file will automatically tell the program to demultiplex.
java -jar FastQParse.jar -r /input/file1.fastq /input/file2.fastq -o /output/dir/ -s /sample/file.txt -gz -m
Filter Low Quality Reads, Trim 'N', Quality Trim, Remove Adapters, Demultiplex Paired-Ends, and Output .gz Files (in Parallel!)
java -jar FastQParse.jar -r /input/file1.fastq.gz /input/file2.fastq.gz -o /output/dir/ -s /sample/file.txt -gz -Q 20 -n -q 10 -a ATCG -z GTCA -P
Reads are filtered by average quality score with a threshold of 20 (-Q 20). 'N' is trimmed with the default threshold of 50% (to specify a custom threshold, add the percentage as a decimal after -n). The 5' end of forward reads and the 3' end of reverse reads are quality trimmed with a threshold of 10 (-q 10). The 5' adapter removed from forward reads is 'ATCG' (-a ATCG), and the 3' adapter removed from reverse reads is 'GTCA' (-z GTCA).
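To make the average quality threshold more concrete, here is a minimal shell sketch of the mean quality score of a single read. It assumes Phred+33 encoding and a plain arithmetic mean rounded down. mean_quality is a hypothetical helper and not part of FastQParse, whose own calculation may differ.

```shell
# Hypothetical helper, not part of FastQParse.
# Prints the mean Phred score (rounded down) of one FASTQ quality string,
# assuming Phred+33 encoding.
mean_quality() {
    # od prints each byte as an unsigned decimal number;
    # awk subtracts the Phred+33 offset and averages the scores.
    printf '%s' "$1" | od -A n -v -t u1 | awk '
        { for (i = 1; i <= NF; i++) { sum += $i - 33; n++ } }
        END { if (n > 0) printf "%d\n", sum / n }'
}

mean_quality 'IIII'   # prints 40
mean_quality '5555'   # prints 20
```

Under these assumptions, a read whose quality line is '5555' sits exactly at a threshold of 20, while '!!!!' (mean 0) falls below it.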
java -jar FastQParse.jar -r /input/file.fastq -i /index/file.fastq -o /output/dir/ -s /sample/file.txt -dB
The random UMI length will be 12 by default.
java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -s /sample/file.txt
Make sure the sample file has 'NA' as the enzyme name so that the program does not match enzymes. FASTX-Toolkit counts 'N' (undetermined bp) as a mismatch, and its semi-global alignment differs from FastQParse's.
java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -s /sample/file.txt -S
GBSX counts 'N' (undetermined bp) as a mismatch and removes reads that match more than one barcode. GBSX has a built-in adapter trimming feature that needs to be disabled before comparing with FastQParse using the command above.
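As a rough illustration of what counting 'N' as a mismatch means when comparing results across tools, the sketch below counts position-wise mismatches between a barcode and the start of a read. count_mismatches and its wildcard mode are hypothetical and only for illustration. None of the tools mentioned here is implemented this way, and the sketch does not perform semi-global alignment.

```shell
# Hypothetical helper, not part of FastQParse, FASTX-Toolkit, or GBSX.
# Counts position-wise mismatches between a barcode and the start of a read.
#   mode "mismatch" (default): 'N' is compared like any other character,
#                              so it mismatches A, C, G, and T.
#   mode "wildcard":           positions where the read has 'N' are skipped.
count_mismatches() {
    awk -v barcode="$1" -v seq="$2" -v mode="${3:-mismatch}" 'BEGIN {
        mismatches = 0
        for (i = 1; i <= length(barcode); i++) {
            b = substr(barcode, i, 1)
            s = substr(seq, i, 1)
            if (mode == "wildcard" && s == "N") continue
            if (b != s) mismatches++
        }
        print mismatches
    }'
}

count_mismatches ATCG ATNG            # prints 1
count_mismatches ATCG ATNG wildcard   # prints 0
```

With a mismatch allowance of 0, the first mode rejects this read and the second accepts it, which is the kind of difference that can show up when comparing demultiplexing output between tools.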
java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -q 20 --alt-quality-trim
Quality trim using a threshold of 20. FastQParse has two quality trimming algorithms, and the second one is more similar to Cutadapt's, so --alt-quality-trim needs to be specified.
java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -A ATCG
Cut adapters from the 3' end of each read, with 'ATCG' being the adapter sequence.
java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -A ATCG -pA 0.7
Cut adapters from the 3' end of each read, with 'ATCG' being the adapter sequence. The prior probability of a read having an adapter is 70%.
SampleName1 ATCG PstI
SampleName2 GTCA PstI
SampleName3 ATCGATCG PstI
Column names, from left to right: sample name, sample barcode, enzyme name.
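A sample file in this layout can be written and sanity-checked from the shell, as in the sketch below. The tab separator and the output path are assumptions made for this example, so check which separator FastQParse expects before relying on it.

```shell
# Path of the sample file to create (an arbitrary choice for this example).
sample_file="${TMPDIR:-/tmp}/samples.txt"

# Write one line per sample: sample name, sample barcode, enzyme name.
# printf reuses the format for every group of three arguments.
# Tabs are used as the separator here, which is an assumption.
printf '%s\t%s\t%s\n' \
    SampleName1 ATCG PstI \
    SampleName2 GTCA PstI \
    SampleName3 ATCGATCG PstI > "$sample_file"

# Report any line that does not have exactly three fields,
# and exit with a non-zero status if one is found.
awk 'NF != 3 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
     END { exit bad }' "$sample_file"
```

The awk check prints nothing when every line has three fields, so a silent run means the layout is consistent.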
SampleName1 ATCG PstI PstI GTCA
SampleName2 GTCA PstI PstI ATCG
SampleName3 ATCGATCG PstI PstI GTCAGTCA
Column names, from left to right: sample name, forward read barcode, forward read enzyme name, reverse read enzyme name, reverse read barcode. To make the program actually match reverse enzymes and barcodes, specify -R as one of the commands. Use the -E command to specify more enzymes if needed.
- Removed reads are put into the undetermined file(s). The only exception is duplicate reads removed by deduplication, which are deleted unless --save-dup is specified.
- All percentages should be given as decimals (50% is 0.5, 100% is 1).
To download the binaries (.jar), go to Download.
For a tutorial on how to use this program, go to Tutorials.
For a list of commands, go to Commands.
For a description of the algorithms used, go to Algorithms and Implementation Details.
The full source code is available in the GitHub repo.