Tutorial

c0deb0t edited this page Jan 30, 2018 · 10 revisions

How to run the compiled binaries (.jar):

java -jar FastQParse.jar

Then add commands (options) right after the jar name, each separated by a single space.

Recipes:

These are sample use cases.

Demultiplex and Output .gz Files

java -jar FastQParse.jar -r /input/file.fastq.gz -o /output/dir/ -s /sample/file.txt -gz

Providing a sample file will automatically tell the program to demultiplex.

Demultiplex, Merge Paired-Ends, and Output .gz Files

java -jar FastQParse.jar -r /input/file1.fastq /input/file2.fastq -o /output/dir/ -s /sample/file.txt -gz -m

Filter Low Quality Reads, Trim 'N', Quality Trim, Remove Adapters, Demultiplex Paired-Ends, and Output .gz Files (in Parallel!)

java -jar FastQParse.jar -r /input/file1.fastq.gz /input/file2.fastq.gz -o /output/dir/ -s /sample/file.txt -gz -Q 20 -n -q 10 -a ATCG -z GTCA -P

Reads are filtered by average quality score with a threshold of 20 (-Q 20). 'N' is trimmed with the default threshold of 50% (-n; to specify a custom threshold, add the percentage as a decimal after it). The 5' end of forward reads and the 3' end of reverse reads are quality trimmed with a threshold of 10 (-q 10). The 5' adapter to remove from forward reads is 'ATCG' (-a ATCG), and the 3' adapter to remove from reverse reads is 'GTCA' (-z GTCA). -P runs the processing in parallel.
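To make the average quality filter concrete, here is a minimal Python sketch of the general idea. It is not taken from FastQParse: the function names are hypothetical, Phred+33 encoding is assumed, and whether the tool treats a read exactly at the threshold as passing is also an assumption.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding (offset 33).
    """
    return [ord(ch) - offset for ch in quality_line]


def passes_average_quality(quality_line, threshold=20):
    """Return True if the mean Phred score is at least `threshold`.

    Assumptions: empty reads fail, and the comparison is inclusive.
    """
    scores = phred_scores(quality_line)
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold
```

Under these assumptions, a read whose quality string is all 'I' (score 40) passes a threshold of 20, while a read whose quality string is all '!' (score 0) does not.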

Demultiplex and Deduplicate UMI (keep highest quality read in case of duplicate)

java -jar FastQParse.jar -r /input/file.fastq -i /index/file.fastq -o /output/dir/ -s /sample/file.txt -dB

The random UMI length will be 12 by default.
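The "keep highest quality read in case of duplicate" rule can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, reads are assumed to count as duplicates when both their UMI and their sequence are identical, and "highest quality" is assumed to mean highest mean Phred+33 score. FastQParse may define duplicates and quality differently.

```python
def mean_quality(quality_line, offset=33):
    """Mean Phred score of a non-empty quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def deduplicate(reads):
    """Keep one read per (UMI, sequence) pair, preferring the highest mean quality.

    `reads` is an iterable of (umi, sequence, quality_line) tuples.
    Assumption: two reads are duplicates only if UMI and sequence both match exactly.
    """
    best = {}
    for umi, sequence, quality_line in reads:
        key = (umi, sequence)
        current = best.get(key)
        # Replace the stored read only when the new one has strictly better quality.
        if current is None or mean_quality(quality_line) > mean_quality(current[2]):
            best[key] = (umi, sequence, quality_line)
    return list(best.values())
```

In this sketch the lower quality duplicates are simply dropped, which mirrors the default behavior described under Other Information.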

Demultiplex, Similar to FASTX-Toolkit (without semi-global alignment)

java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -s /sample/file.txt

Make sure that the sample file contains 'NA' as the enzyme name so that the program does not try to match enzymes. Note that FASTX-Toolkit counts 'N' (undetermined bp) as a mismatch, and its semi-global alignment is different from FastQParse's.

Demultiplex, Similar to GBSX (without adapter trimming)

java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -s /sample/file.txt -S

GBSX counts 'N' (undetermined bp) as a mismatch and removes reads that match more than one barcode. GBSX has a built-in adapter trimming feature that needs to be disabled before comparing its output with FastQParse run using the command above.

Quality Trim (Similar to Cutadapt)

java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -q 20 --alt-quality-trim

This quality trims reads using a threshold of 20. FastQParse has two quality trimming algorithms; the second one is more similar to Cutadapt's quality trimming, so --alt-quality-trim needs to be specified.
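For reference, Cutadapt documents its 3' quality trimming as: subtract the cutoff from every quality score, compute partial sums from the end of the read, and cut at the position where that sum is smallest. The Python sketch below follows that description. It is not taken from the FastQParse source, the function name is hypothetical, and how closely --alt-quality-trim matches it is not confirmed here.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    `qualities` is a list of integer Phred scores. Bases from the returned
    index onward are trimmed; the read keeps qualities[:index].
    """
    running = 0              # partial sum of (quality - cutoff), from the 3' end
    lowest = 0               # smallest partial sum seen so far
    cut = len(qualities)     # default: trim nothing
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            # The sum turned positive, so no earlier position can be the minimum.
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut


# Example from the Cutadapt documentation, with a cutoff of 10.
qualities = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
sequence = "ACGTACGTAC"
cut = quality_trim_index(qualities, 10)
print(sequence[:cut])  # prints "ACGT": the read is trimmed to its first four bases
```

Note that the single quality of 11 near the end does not stop the trimming, because the partial sum is still negative at that point.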

Cut 3' Adapters, Edit Based

java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -A ATCG

Cut adapters from the 3' end of each read, with 'ATCG' being the adapter sequence.

Cut 3' Adapters, Probability Based (Similar to Scythe, using filter command)

java -jar FastQParse.jar -r /input/file.fastq -o /output/dir/ -A ATCG -pA 0.7

Cut adapters from the 3' end of each read, with 'ATCG' being the adapter sequence. The prior probability of a read having an adapter is 70%.
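To show how a prior such as 0.7 enters a probability based decision, here is a generic Bayes' rule sketch in Python. It is an assumption-laden illustration, not Scythe's or FastQParse's exact model: the function name is hypothetical, sequencing errors are assumed to be independent with probability 10^(-Q/10), a mismatch is assumed to produce each wrong base with equal probability, and the alternative model is uniformly random sequence.

```python
def adapter_posterior(read_bases, qualities, adapter, prior):
    """Posterior probability that `read_bases` is adapter rather than random sequence.

    `qualities` holds integer Phred scores for `read_bases`.
    Assumption: 0 < prior < 1.
    """
    likelihood_adapter = 1.0
    likelihood_random = 1.0
    for base, quality, expected in zip(read_bases, qualities, adapter):
        error = 10 ** (-quality / 10)   # probability that the base call is wrong
        # Under the adapter model a match means "no error"; a mismatch means an
        # error that happened to produce this particular wrong base.
        likelihood_adapter *= (1 - error) if base == expected else error / 3
        # Under the random model every base is equally likely.
        likelihood_random *= 0.25
    numerator = prior * likelihood_adapter
    return numerator / (numerator + (1 - prior) * likelihood_random)
```

In this sketch, a high quality exact match to 'ATCG' with a prior of 0.7 yields a posterior close to 1, while a sequence that mostly mismatches the adapter yields a posterior close to 0. A tool would then cut the adapter when the posterior favors the adapter model.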

Sample File Example 1

SampleName1  ATCG      PstI
SampleName2  GTCA      PstI
SampleName3  ATCGATCG  PstI

Column names, from left to right: sample name, sample barcode, enzyme name.
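As an illustration of how such a sample file drives demultiplexing, the following Python sketch parses the three columns and assigns a read to a sample by its leading barcode. The function names are hypothetical. The sketch assumes whitespace-separated columns, exact barcode matches only, and that longer barcodes take priority (so that 'ATCGATCG' is not mistaken for 'ATCG'); FastQParse itself may allow mismatches and may use the enzyme site to resolve such cases.

```python
def parse_sample_file(text):
    """Parse lines of 'name barcode enzyme' separated by whitespace."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, barcode, enzyme = fields[:3]
        samples.append((name, barcode, enzyme))
    return samples


def assign_sample(sequence, samples):
    """Return the name of the sample whose barcode starts the read, or None.

    Assumptions: exact matching, and longer barcodes are tried first.
    A result of None corresponds to an undetermined read.
    """
    by_length = sorted(samples, key=lambda sample: len(sample[1]), reverse=True)
    for name, barcode, _enzyme in by_length:
        if sequence.startswith(barcode):
            return name
    return None


sample_text = """SampleName1  ATCG      PstI
SampleName2  GTCA      PstI
SampleName3  ATCGATCG  PstI
"""
samples = parse_sample_file(sample_text)
print(assign_sample("ATCGATCGTGCAG", samples))  # prints "SampleName3"
```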

Sample File Example 2

SampleName1  ATCG      PstI  PstI  GTCA
SampleName2  GTCA      PstI  PstI  ATCG
SampleName3  ATCGATCG  PstI  PstI  GTCAGTCA

Column names, from left to right: sample name, forward read barcode, forward read enzyme name, reverse read enzyme name, reverse read barcode. To make the program match reverse enzymes and barcodes, specify -R as one of the commands. Use the -E command to specify more enzymes if needed.

Other Information:

  • Removed reads will be put into the undetermined file(s). The only exception to this is deduplicated reads. They are deleted unless --save-dup is specified.
  • All percentages should be decimals. (50% would be 0.5, 100% would be 1)