LAVA: Lightweight Assignment of Variant Alleles

LAVA is an NGS-based computational SNP array. It calls the vast majority of SNPs in dbSNP and on Affymetrix’s Genome-Wide Human SNP Array 6.0 with high accuracy, while running 4-7 times faster than a standard NGS genotyping pipeline. This makes it a flexible and scalable replacement for SNP arrays: the set of variants assayed can be modified in silico without redesigning an array, and its size is not bounded by the physical limits of a chip.

Usage

Preprocessing
lava dict <input FASTA> <input SNP list> <output ref dict> <output SNP dict>

The input FASTA file is the reference sequence. The input SNP list should be in UCSC's txt-based format.

Processing
lava lava <input ref dict> <input SNP dict> <input FASTQ> <chrlens file> <output file>

The "chrlens file" is generated during the preprocessing stage and is named ref_file.fa.chrlens, where ref_file.fa is the reference sequence FASTA file.
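The two stages can be chained in a small shell helper. This is only a sketch: the function name lava_pipeline, the RUN dry-run prefix, and the dictionary file names (ref_file.fa.refdict, ref_file.fa.snpdict) are hypothetical choices for illustration, and it assumes the chrlens file is written alongside the FASTA file, which may not hold for your setup.

```shell
#!/bin/sh
# Optional command prefix (hypothetical convenience, not part of LAVA):
# set RUN=echo to print the commands instead of running them.
RUN=${RUN:-}

# Usage: lava_pipeline <ref FASTA> <SNP list> <FASTQ> <output file>
lava_pipeline() {
    ref=$1
    snps=$2
    reads=$3
    result=$4

    # Preprocessing: build the reference and SNP dictionaries.
    # The dictionary names below are arbitrary; pick any paths you like.
    $RUN lava dict "$ref" "$snps" "$ref.refdict" "$ref.snpdict" || return 1

    # Processing: the chrlens file is named after the reference FASTA.
    # Assumption: it sits in the same directory as the FASTA.
    $RUN lava lava "$ref.refdict" "$ref.snpdict" "$reads" "$ref.chrlens" "$result"
}
```

With RUN=echo set, calling lava_pipeline ref_file.fa snps.txt reads.fastq calls.txt prints the two commands without running them, which is a cheap way to check the argument order before committing ~60 gigabytes of RAM to a real run.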

Requirements

  • ~60 gigabytes of RAM for typical reference genomes
  • GCC 4.8.4 or later (not tested on earlier versions)
  • make

TODO

  • Make the error rate and average coverage user-specified parameters. For now they are constants in lava.h.
  • Multithreading