GRASPER: Genome Rearrangement Analysis using Short Paired-End Reads


Heewook Lee


GRASPER (Genome Rearrangement Analysis using Short Paired-End Reads) is a de novo structural variation (SV) caller that is capable of detecting repetitive SVs.

It uses the BLAST to A-Bruijn program to construct A-Bruijn graphs of a given reference genome. These graphs capture approximate repeats (e.g. 95% sequence similarity or higher), and SVs are then detected on the graphs.

GRASPER requires a reference genome sequence in a FASTA-formatted file along with Illumina paired-end sequencing data of a sample genome.

Currently, it supports the following SV types:

1) Duplicative transposition
2) Deletion of non-repetitive region
3) Deletion of repetitive region
4) Deletion of non-repetitive region bounded by repeats (via homologous recombination)
5) Inversion
6) Tandem-duplication

Unsupported events are still reported in the form of breakpoints. GRASPER first calls breakpoints, then assigns SV events based on the well-known paired SV signatures along with read-depth information. Any breakpoint event without an SV event assignment is reported separately.

To build and run GRASPER, the following are required:

- JDK 1.6 or higher

- Unix-like OS (Linux, Mac OS X, ... )

- Legacy BLAST (we used version 2.2.25)

- Burrows-Wheeler Aligner by Heng Li (version 0.7.9 or higher)

- BLAST to A-Bruijn graph package

- Illumina or Illumina-like paired-end reads (whole-genome sequencing)

- a reference genome sequence

- As of v0.1.1, the .medMAD file is generated automatically by RepGraph (v0.1.1). This file contains the median and Median Absolute Deviation (MAD) of the library insert size, written as a single line of 2 values delimited by a tab. Under a normal distribution, 1 SD ~ 1.4826 MAD.
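
As an illustration of the .medMAD layout, here is a minimal Python sketch that reads the two tab-delimited values and converts the MAD to an approximate standard deviation. It assumes the median comes first and the MAD second; the function names and the sample values are made up for the example and are not part of GRASPER or RepGraph.

```python
MAD_TO_SD = 1.4826  # 1 SD ~ 1.4826 MAD under a normal distribution


def parse_medmad(text):
    """Parse the single tab-delimited line of a .medMAD file.

    Assumption: the first value is the median insert size and the
    second value is the MAD.
    """
    first_line = text.strip().splitlines()[0]
    median_str, mad_str = first_line.split("\t")
    return float(median_str), float(mad_str)


def estimated_sd(mad):
    """Approximate the insert-size standard deviation from the MAD."""
    return MAD_TO_SD * mad


# Made-up sample content, for illustration only.
median, mad = parse_medmad("470\t10\n")
print(median, mad, estimated_sd(mad))
```

To read a real file, pass the result of `open(path).read()` to `parse_medmad`.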


After downloading the GRASPER source distribution and unpacking it, change into the top-level directory:

> cd grasper

Then compile and create the .jar files:

> make

This will create a new directory "bin" under the grasper directory with the following jar file:


       Config file
The configuration file contains the parameters that GRASPER, RepGraph, BLAST, and bwa need.

An example configuration file can be found in the "test_data" directory.

      How to run

Although GRASPER can run as a stand-alone program, it first needs two inputs: an A-Bruijn graph representation of the reference genome, which is generated by the RepGraph package, and a SAM-formatted alignment of the paired-end reads. For this reason, a single script is provided to tie all these dependencies together.

Here is the list of commands for running on test_data:

1. Move into the test_case directory under the GRASPER directory

2. Indexing for BLAST and bwa (ONLY needs to be run once for a reference genome)
> ../ I example_config.txt

3. Run pair-wise BLASTN on a given reference genome and construct A-Bruijn graphs (ONLY needs to be run once for a reference genome)
> ../ G example_config.txt

4. Align via BWA
> ../ A example_config.txt 20Insertions_per_element_1TH_pIRS_20X_11_90_470_1.fq.gz 20Insertions_per_element_1TH_pIRS_20X_11_90_470_2.fq.gz

5. Depth serialization, mid-sorting, discordant pair removal, SV detection
> ./ DS example_config.txt

Note that the commands A, D, and S can be run separately or combined all together. Run the script without any parameters to see more explanation.
> ./

A screen dump of a run on test_data can be found in test_data/test_data.screendump

*.thread : A-Bruijn graph threading information

*.depth : serialization of the depth arrays

*.discordant.midsorted : midpoint-sorted SAM file containing only the discordant mappings

*.SV : SV calls from GRASPER

       .SV file
Two-breakpoint events (TRANSPOSITION or INVERSION) have 23 columns; one-breakpoint events have only the first 13 columns.

*** COLUMNS ***
Column 1 : Event  ( (I) means inverted )
Column 2 : event classifier (internal purpose)

Column 3/5/20/22 : These columns indicate the number of reads in each cluster
Column 4/6/21/23 : These columns indicate the number of places each cluster can map to on the linear reference. Clusters on the graph that lie on repetitive paths will have numbers > 1 to indicate their multiplicities.

Column ( 7-8-9 / 10-11-12 / 14-15-16 / 17-18-19 ) : Each triplet indicates 5'boundary-3'boundary-ClusteringDirection of a cluster of reads

Column 3-4-7-8-9 indicates a single cluster (meaning the boundaries and direction are described by columns 7-8-9, and the #reads and multiplicity information of this cluster are in columns 3-4).
Column 5-6-10-11-12 indicates a single cluster.
Column 20-21-14-15-16 indicates a single cluster.
Column 22-23-17-18-19 indicates a single cluster.

Clusters that cannot be assigned to a specific event are appended at the end under "#		UNASSIGNED CLUSTERS" section.
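
The column layout above can be sketched as a small Python parser that groups the columns of one record into clusters. This is only an illustration under stated assumptions: the fields are assumed to be tab-delimited, all values are kept as text because their exact format is not described here, and lines starting with "#" (such as the UNASSIGNED CLUSTERS header) are skipped. The layout of the lines inside the unassigned section is not handled. The names `Cluster`, `CLUSTER_COLUMNS` and `parse_sv_line` are hypothetical and not part of GRASPER.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    reads: str         # number of reads in the cluster
    multiplicity: str  # number of places the cluster maps to on the linear reference
    five_prime: str    # 5' boundary
    three_prime: str   # 3' boundary
    direction: str     # clustering direction


# 1-based column numbers: (reads, multiplicity, 5' boundary, 3' boundary, direction)
CLUSTER_COLUMNS = [
    (3, 4, 7, 8, 9),
    (5, 6, 10, 11, 12),
    (20, 21, 14, 15, 16),
    (22, 23, 17, 18, 19),
]


def parse_sv_line(line, sep="\t"):
    """Return (event, classifier, clusters), or None for blank and '#' lines.

    Assumption: fields are separated by tabs.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split(sep)
    if len(cols) not in (13, 23):
        raise ValueError("expected 13 or 23 columns, got %d" % len(cols))
    # Keep only the clusters whose columns exist in this record
    # (two clusters for 13-column records, four for 23-column records).
    clusters = [
        Cluster(*(cols[n - 1] for n in spec))
        for spec in CLUSTER_COLUMNS
        if max(spec) <= len(cols)
    ]
    return cols[0], cols[1], clusters


# Made-up record whose fields are named after their column numbers.
fields = ["TRANSPOSITION"] + ["c%d" % i for i in range(2, 24)]
event, classifier, clusters = parse_sv_line("\t".join(fields))
print(event, classifier)
for cluster in clusters:
    print(cluster)
```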

*** EVENT BOUNDARIES ***
1) Deletion: Deletion boundaries are roughly defined by [column8, column10] (Direction of clusters : --> <--)

2) Inversion: Inversion boundaries are roughly defined by [column8/column10 , column15/column16] 

3) Transposition: The segment being transposed is roughly defined by [column3, column7] (<--- --->), and it is transposed to the target location, roughly around column15/column16 (---> <---). The midpoint of column15 and column16 is probably a reasonable guess.

4) Tandem duplication: Segment that is being tandemly duplicated is roughly defined by [column7, column11] (Direction: <--- --->)
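
As a rough illustration of rules 1 and 4, the following Python sketch pulls the approximate deletion and tandem duplication boundaries out of an already split record. It assumes the boundary columns hold integer coordinates and that the caller already knows which kind of event the record describes. The function names are hypothetical, and the sample values are made up.

```python
def deletion_bounds(cols):
    """Rough deletion boundaries: [column 8, column 10].

    `cols` is the list of fields of one record (index 0 is column 1).
    Assumption: boundary columns are integer coordinates.
    """
    return int(cols[7]), int(cols[9])


def tandem_duplication_bounds(cols):
    """Rough tandemly duplicated segment: [column 7, column 11]."""
    return int(cols[6]), int(cols[10])


# Made-up 13-column record; only the boundary columns carry real values.
record = ["x"] * 13
record[6], record[7], record[9], record[10] = "100", "250", "900", "1050"

print(deletion_bounds(record))
print(tandem_duplication_bounds(record))
```

Only these two rules are sketched because their column references line up directly with the boundary columns of the first two clusters.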