tardis

Toolkit for Automated and Rapid DIscovery of Structural variants

Soylev, A., Kockan, C., Hormozdiari, F., & Alkan, C. (2017). Toolkit for automated and rapid discovery of structural variants. Methods, 129, 3-7. https://doi.org/10.1016/j.ymeth.2017.05.030

Requirements

Fetching TARDIS

git clone https://github.com/BilkentCompGen/tardis.git --recursive

Compilation

Type:

make libs
make
cp tardis /path/to/your/favorite/binaries

SONIC file (annotations container)

SONIC files for some human reference genome versions are available in an external repository: https://github.com/BilkentCompGen/sonic-prebuilt

  • human_g1k_v37.sonic: SONIC file for Human Reference Genome GRCh37 (1000 Genomes Project version)
    • Also download the reference genome from: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
  • ucsc_hg19.sonic: SONIC file for the human reference genome, UCSC version build hg19.
  • ucsc_hg38.sonic: SONIC file for the human reference genome build 38.

Make sure that the reference genome used to align the reads (to produce the BAM file) is the same one used to create the SONIC file. The SONIC files and the reference FASTA files linked above are compatible.
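
One way to catch a mismatched reference early is to compare the sequence names and lengths in the BAM header against the FASTA index. The following is a minimal sketch, not part of TARDIS: it assumes you have the header text (for example from `samtools view -H myinput.bam`) and the contents of the `.fai` file produced by `samtools faidx`. All function names are hypothetical, and the check covers only the BAM and the FASTA; it does not inspect the SONIC file itself.

```python
def contigs_from_sam_header(header_text):
    """Collect {name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def contigs_from_fai(fai_text):
    """Collect {name: length} from a FASTA index (first two columns)."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def reference_mismatches(header_text, fai_text):
    """List BAM contigs that are missing from, or differ in length from, the FASTA."""
    bam = contigs_from_sam_header(header_text)
    ref = contigs_from_fai(fai_text)
    problems = []
    for name, length in bam.items():
        if name not in ref:
            problems.append(f"{name}: in BAM header but not in reference")
        elif ref[name] != length:
            problems.append(f"{name}: length {length} in BAM, {ref[name]} in reference")
    return problems


# Illustrative inputs; the offset columns of the .fai lines are made up.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"
fai = "1\t249250621\t52\t60\t61\n2\t243199373\t253404903\t60\t61\n"
print(reference_mismatches(header, fai))
```

An empty list means every contig named in the BAM header exists in the FASTA with the same length.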

Building the SONIC file

Please refer to the SONIC development repository: https://github.com/calkan/sonic/

However, you can still build the SONIC file using TARDIS:

tardis --ref human_g1k_v37.fasta --make-sonic human_g1k_v37.sonic \
	--dups human_g1k_v37.segmental_duplications.bed \
	--gaps human_g1k_v37.assembly_gaps.bed \
	--reps human_g1k_v37.repeatmasker.out 

Running TARDIS - QUICK mode

tardis -i myinput.bam --ref human_g1k_v37.fasta --sonic human_g1k_v37.sonic  \
	--out myoutput
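
When TARDIS is driven from a script, it can help to assemble the command as an argument list instead of a shell string. The sketch below is an illustration only: `build_tardis_command` is a hypothetical helper, the file names are placeholders, and it uses only the `--input`, `--ref`, `--sonic`, and `--out` options documented in this README.

```python
import shlex


def build_tardis_command(bams, ref, sonic, out_prefix):
    """Return the argument list for a quick-mode TARDIS run."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = ["tardis"]
    for bam in bams:
        # One --input per BAM file.
        cmd += ["--input", bam]
    cmd += ["--ref", ref, "--sonic", sonic, "--out", out_prefix]
    return cmd


# Placeholder file names, for illustration only.
cmd = build_tardis_command(
    ["sample1.bam", "sample2.bam"],
    "human_g1k_v37.fasta",
    "human_g1k_v37.sonic",
    "myoutput",
)
print(shlex.join(cmd))
```

Once `tardis` is on your `PATH`, the list can be passed to `subprocess.run(cmd, check=True)`.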

Additional parameters in SENSITIVE mode, helpful when debugging:

--skip-fastq --skip-sort --skip-remap

All parameters

--bamlist   [bamlist file] : A text file that lists the input BAM files, one per line.
--input [BAM files]        : Input files in sorted and indexed BAM format. You can pass multiple BAMs using multiple --input parameters.
--out   [output prefix]    : Prefix for the output file names.
--ref   [reference genome] : Reference genome in FASTA format.
--sonic [sonic file]       : SONIC file that contains assembly annotations.
--mei   ["Alu:L1:SVA"]     : List of mobile element names.
--no-soft-clip             : Skip soft clip remapping.
--first-chr [chr_index]    : Start running from a specific chromosome [index in ref].
--last-chr  [chr_index]    : Run up to a specific chromosome [index in ref].
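
For many samples, a `--bamlist` file is easier to manage than repeated `--input` options. The following sketch shows one way to produce such a file, one path per line. The helper name `bamlist_text` and the BAM file names are hypothetical, and the example writes to a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def bamlist_text(bam_paths):
    """Format BAM paths as bamlist content: one path per line."""
    paths = [str(p).strip() for p in bam_paths]
    if not paths or any(not p for p in paths):
        raise ValueError("every entry must be a non-empty path")
    return "\n".join(paths) + "\n"


# Placeholder BAM names, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    bamlist = Path(tmp) / "bamlist.txt"
    bamlist.write_text(bamlist_text(["sample1.bam", "sample2.bam"]))
    print(bamlist.read_text(), end="")
```

The resulting file would then be passed as `--bamlist bamlist.txt` in place of the `--input` options.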

Additional parameters for sensitive mode:

--sensitive                : Sensitive mode that uses all map locations. Requires mrFAST remapping.
--skip-fastq               : Skip FASTQ dump for discordants. Use this only if you are regenerating the calls. Sensitive mode only.
--skip-sort                : Skip FASTQ sort for discordants. Use this only if you are regenerating the calls. Sensitive mode only. 
--skip-remap               : Skip FASTQ remapping for discordants. Use this only if you are regenerating the calls. Sensitive mode only.
--threads                  : Number of threads for mrFAST to remap discordant reads.

Additional parameters to build SONIC file within TARDIS:

--make-sonic [sonic file]  : SONIC file that will contain the assembly annotations.
--sonic-info ["string"]    : SONIC information string to be used as the reference genome name.
--gaps  [gaps file]        : Assembly gap coordinates in BED3 format.
--dups  [dups file]        : Segmental duplication coordinates in BED3 format.
--reps  [reps file]        : RepeatMasker annotation coordinates in RepeatMasker format. See manual for details.
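
Because `--gaps` and `--dups` expect BED3 input, a quick format check before building the SONIC file can save a failed run. This is a rough sketch under stated assumptions: `bed3_errors` is a hypothetical helper, it treats every non-blank line as a data line, and it only flags lines with fewer than three tab-separated columns, non-numeric coordinates, or a start greater than the end. How TARDIS itself handles extra columns or header lines is not covered here.

```python
def bed3_errors(lines):
    """Return a list of human-readable problems found in BED3 lines."""
    errors = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            errors.append(f"line {number}: expected 3 tab-separated columns, found {len(fields)}")
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            errors.append(f"line {number}: start and end must be non-negative integers")
        elif int(start) > int(end):
            errors.append(f"line {number}: start {start} is greater than end {end}")
    return errors


# Made-up coordinates, for illustration only.
example = ["1\t10000\t177417\n", "1\t267719\t227417\n"]
for problem in bed3_errors(example):
    print(problem)
```

To check a real file, pass an open file handle, for example `bed3_errors(open("human_g1k_v37.assembly_gaps.bed"))`.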

Information:

--version                  : Print version and exit.
--help                     : Print this help screen and exit.

Converting output VCF file to BED

awk '!/^#/ {print $1"\t"($2-1)"\t"(substr($8,match($8,/SVLEN=[0-9]+/)+length("SVLEN="),RLENGTH-length("SVLEN="))+$2-1)}' out.vcf > out.bed

Alternatively, use VCFlib: https://github.com/vcflib/vcflib
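
The same conversion can be sketched in Python for readers who prefer it to awk. It follows the arithmetic of the command above: the BED start is `POS - 1` and the BED end is `POS - 1 + SVLEN`, with `SVLEN` read from the INFO column. The function name `vcf_to_bed` and the sample record are hypothetical. Unlike the awk command, this sketch skips records that carry no `SVLEN` value.

```python
import re

SVLEN_RE = re.compile(r"SVLEN=(\d+)")


def vcf_to_bed(vcf_lines):
    """Convert VCF records to BED3 lines using POS and INFO/SVLEN."""
    bed = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        match = SVLEN_RE.search(info)
        if match is None:
            continue  # no SVLEN to derive the end coordinate from
        start = pos - 1
        end = start + int(match.group(1))
        bed.append(f"{chrom}\t{start}\t{end}")
    return bed


# A made-up record, for illustration only.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tN\t<DEL>\t.\tPASS\tEND=150;SVLEN=50\n",
]
print("\n".join(vcf_to_bed(sample)))
```

To convert a file, pass an open handle: `vcf_to_bed(open("out.vcf"))`.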