The GBS SNP Calling Reference Optional Pipeline (GBS-SNP-CROP) is executed via a sequence of seven Perl scripts that integrate custom parsing and filtering procedures with well-known, vetted bioinformatic tools, giving the user full access to all intermediate files. The pipeline calls variants (SNPs and indels) with a novel strategy based on the correspondence between within-individual and across-population patterns of polymorphism. This lets it identify high-confidence variants and distinguish them from both sequencing and PCR errors, whether or not a reference genome is available. When no reference genome is available, the pipeline uses a clustering strategy to build a population-tailored "Mock Reference" from the same GBS data, which is then used for downstream variant calling and genotyping. Designed for paired-end (PE) or single-end (SE) libraries with reads of arbitrary length, GBS-SNP-CROP maximizes data usage by avoiding the unnecessary culling of reads that length-uniformity requirements would otherwise impose. GBS-SNP-CROP is a complete bioinformatics pipeline developed primarily to support curation, research, and breeding programs that wish to use GBS for cost-effective, genome-wide characterization of plant genetic resources.
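To make the idea of matching within-individual and across-population evidence more concrete, here is a toy sketch in Python. It is not GBS-SNP-CROP's algorithm. The function names (`call_genotype`, `filter_site`), the thresholds, and the rule that the alternate allele must be called in at least two individuals are all hypothetical simplifications; consult the User Manual for the pipeline's actual filtering parameters.

```python
from typing import List, Optional, Tuple

# Hypothetical thresholds for illustration only; NOT GBS-SNP-CROP parameters.
MIN_DEPTH = 4         # minimum read depth needed to attempt a genotype call
HOM_MAX_MINOR = 0.1   # minor-allele fraction at or below this -> homozygous
HET_MIN_MINOR = 0.2   # minor-allele fraction at or above this -> heterozygous
MIN_CARRIERS = 2      # population level: alt allele must be called in >= 2 individuals


def call_genotype(ref_depth: int, alt_depth: int) -> Optional[str]:
    """Within-individual step: call a diploid genotype from allele depths."""
    total = ref_depth + alt_depth
    if total < MIN_DEPTH:
        return None  # too little evidence to call anything
    alt_frac = alt_depth / total
    minor = min(alt_frac, 1 - alt_frac)
    if minor <= HOM_MAX_MINOR:
        return "0/0" if alt_frac < 0.5 else "1/1"
    if minor >= HET_MIN_MINOR:
        return "0/1"
    return None  # ambiguous allele ratio: possibly sequencing or PCR error


def filter_site(depths: List[Tuple[int, int]]) -> Optional[List[Optional[str]]]:
    """Across-population step: keep the site only if the alternate allele is
    supported by genotype calls in several individuals."""
    calls = [call_genotype(ref, alt) for ref, alt in depths]
    carriers = sum(1 for g in calls if g in ("0/1", "1/1"))
    return calls if carriers >= MIN_CARRIERS else None
```

For example, `filter_site([(10, 0), (5, 5), (0, 8), (19, 1)])` keeps the site with calls `['0/0', '0/1', '1/1', '0/0']`, whereas `filter_site([(10, 0), (12, 1), (7, 3)])` returns `None`, because only one individual carries the alternate allele.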
Stage 1. Process the raw GBS data
- Step 1: Parse the raw reads
- Step 2: Trim based on quality and adaptors
- Step 3: Demultiplex
Stage 2. Build the Mock Reference
- Step 4: Cluster reads and assemble the Mock Reference
Stage 3. Map the processed reads and generate standardized alignment files
- Step 5: Align with BWA-mem and process with SAMtools
- Step 6: Parse mpileup outputs and produce the variant discovery matrix
Stage 4. Call variants and genotypes
- Step 7: Filter variants and call genotypes
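Step 3 assigns each read to its sample according to the barcode at the start of the read. The Python sketch below shows the basic idea under simplifying assumptions: exact prefix matching only, with the barcode removed from the assigned read. The function `demultiplex` is hypothetical and is not the pipeline's script. Real GBS demultiplexers usually also check the restriction-site remnant that follows the barcode and may tolerate sequencing errors.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def demultiplex(reads: List[Read], barcodes: Dict[str, str]) -> Dict[str, List[Read]]:
    """Assign each read to a sample by exact barcode-prefix match and strip
    the barcode. Reads with no matching barcode go to 'unassigned'."""
    # Try longer barcodes first so that a short barcode which is a prefix of
    # a longer one cannot capture the longer barcode's reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    bins: Dict[str, List[Read]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for bc in ordered:
            if seq.startswith(bc):
                bins[barcodes[bc]].append((read_id, seq[len(bc):]))
                break
        else:  # no barcode matched this read
            bins["unassigned"].append((read_id, seq))
    return bins
```

With `barcodes = {"ACGT": "S1", "TTAGC": "S2"}`, the read `"ACGTTGCAGGA"` is assigned to `S1` as `"TGCAGGA"`, and a read whose start matches no barcode ends up under `"unassigned"`.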
PLEASE NOTE: GBS-SNP-CROP is an intentionally modular and flexible pipeline. If your data are already demultiplexed and filtered, simply skip Stage 1 and enter the pipeline at Stage 2. If you have a reference genome and no need for a Mock Reference, simply skip Stage 2 and go directly to Stage 3. Refer to the User Manual for input file naming conventions for each Step.
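The entry-point rule in this note can be stated as a small helper that lists which Steps still need to run. This is a hypothetical Python illustration (the names `STAGE_STEPS` and `steps_to_run` are invented here) and is not part of GBS-SNP-CROP. It uses the Stage-to-Step grouping from the workflow list.

```python
from typing import Dict, List

# Steps grouped by Stage, following the workflow list.
STAGE_STEPS: Dict[int, List[int]] = {
    1: [1, 2, 3],  # Process the raw GBS data
    2: [4],        # Build the Mock Reference
    3: [5, 6],     # Map reads and generate standardized alignment files
    4: [7],        # Call variants and genotypes
}


def steps_to_run(demultiplexed_and_filtered: bool, has_reference: bool) -> List[int]:
    """Return the Steps to execute, skipping Stage 1 for data that are already
    demultiplexed and filtered, and Stage 2 when a reference genome is used."""
    stages = [1, 2, 3, 4]
    if demultiplexed_and_filtered:
        stages.remove(1)
    if has_reference:
        stages.remove(2)
    return [step for stage in stages for step in STAGE_STEPS[stage]]
```

For instance, demultiplexed data aligned to a reference genome would enter at Step 5 and run Steps 5, 6, and 7.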
Below is a schematic of the workflow, with inputs and outputs (boxes) indicated for each Step (arrows).
Release history:
- v.4.1: Released on 10/6/2019
- v.4.0: Released on 10/22/2018
- v.3.0: Released on 2/8/2018
- v.2.0: Released on 2/22/2017
- v.1.1: Released on 3/11/2016
- v.1.0: Released on 1/12/2016
Begin by carefully reading the GBS-SNP-CROP User Manual. Before posting a question or starting a discussion, please consult the FAQ page. Also check that your barcode ID file contains no empty or invisible characters or blank spaces, and that it was saved as a tab-delimited file. If you are still facing an issue or have suggestions for improving this tool, please submit your question or comment to our Google Groups page.
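The barcode ID file checks recommended here can be automated. The Python sketch below assumes a hypothetical layout of one barcode and one sample name per line, separated by a single tab, with no header row; confirm the exact format in the User Manual. It reports Windows (CRLF) line endings, lines without exactly two tab-separated fields, empty fields, whitespace inside fields, non-ACGT barcodes, and duplicate barcodes. The function `check_barcode_file` is illustrative only.

```python
import re
from typing import List, Set

# Assumed (hypothetical) layout: "<barcode>\t<sample name>" per line, no header.
BARCODE_RE = re.compile(r"[ACGT]+")


def check_barcode_file(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems: List[str] = []
    seen: Set[str] = set()
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline is fine
    for n, line in enumerate(lines, start=1):
        if line.endswith("\r"):
            problems.append(f"line {n}: Windows (CRLF) line ending")
            line = line[:-1]
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(f"line {n}: expected 2 tab-separated fields, found {len(fields)}")
            continue
        barcode, sample = fields
        for label, value in (("barcode", barcode), ("sample name", sample)):
            if value == "":
                problems.append(f"line {n}: empty {label}")
            elif any(ch.isspace() for ch in value):
                problems.append(f"line {n}: whitespace in {label} {value!r}")
        if barcode.strip() and not BARCODE_RE.fullmatch(barcode.strip()):
            problems.append(f"line {n}: barcode {barcode!r} contains non-ACGT characters")
        if barcode in seen:
            problems.append(f"line {n}: duplicate barcode {barcode!r}")
        seen.add(barcode)
    return problems
```

Read the file with `open(path, newline="")` before passing its contents in. In Python's default text mode, `\r\n` is translated to `\n`, which would hide CRLF line endings from the check.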
Dependencies:
- Java 7 or higher: the latest version of GBS-SNP-CROP (v.4.1) was tested with Java 8 (update 221)
- Trimmomatic: latest version tested with v.0.39 (Bolger et al., 2014)
- PEAR: latest version tested with v.0.9.11 (Zhang et al., 2014)
- VSEARCH: latest version tested with v.2.13.7 (Rognes et al., 2016)
- BWA aligner: latest version tested with v.0.7.12 (Li & Durbin, 2009)
- SAMtools: latest version tested with v.1.7 (Li et al., 2009)
- The following five CPAN modules must also be installed: Getopt::Long, IO::Zlib, List::Util, List::MoreUtils, Parallel::ForkManager
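Before running the pipeline, you can check whether these dependencies are in place. The Python sketch below is a hypothetical helper, not part of GBS-SNP-CROP. It assumes the executables are on `PATH` under the names `java`, `pear`, `vsearch`, `bwa`, and `samtools`; adjust these names to match your installation. Trimmomatic is usually launched as a Java `.jar` file, so only `java` is checked for it. A Perl module is treated as installed when `perl -M<Module> -e 1` exits successfully.

```python
import shutil
import subprocess
from typing import Dict, Iterable

# Assumed executable names; adjust to match your installation.
EXECUTABLES = ("java", "pear", "vsearch", "bwa", "samtools")
PERL_MODULES = ("Getopt::Long", "IO::Zlib", "List::Util",
                "List::MoreUtils", "Parallel::ForkManager")


def find_executables(names: Iterable[str] = EXECUTABLES) -> Dict[str, bool]:
    """Map each executable name to whether it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


def perl_module_installed(module: str) -> bool:
    """True if `perl -M<module> -e 1` runs and exits with status 0."""
    try:
        result = subprocess.run(["perl", f"-M{module}", "-e", "1"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # perl itself is not on PATH
    return result.returncode == 0


if __name__ == "__main__":
    for name, found in find_executables().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
    for module in PERL_MODULES:
        print(f"{module:22s} {'ok' if perl_module_installed(module) else 'MISSING'}")
```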