LAVA: Lightweight Assignment of Variant Alleles
LAVA is an NGS-based computational SNP array. It calls the vast majority of SNPs in dbSNP and Affymetrix’s Genome-Wide Human SNP Array 6.0 with high accuracy, while running 4-7 times faster than a standard NGS genotyping pipeline. As such, it is a flexible and scalable replacement for SNP arrays: the set of variants assayed can be modified in silico without redesigning an array, and its size is not bounded by the physical limits of a chip.
lava dict <input FASTA> <input SNP list> <output ref dict> <output SNP dict>
The input FASTA file is the reference sequence. The input SNP list should be in UCSC's txt-based format.
lava lava <input ref dict> <input SNP dict> <input FASTQ> <chrlens file> <output file>
The "chrlens file" is generated in the preprocessing stage, and should have a name of
ref_file.fa is the reference sequence FASTA file.
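As a rough sketch, the two commands above can be chained in a small shell script. All file names other than ref_file.fa are hypothetical placeholders, and CHRLENS must be set to whatever chrlens file the preprocessing stage actually wrote; the value used here is a stand-in, not the real name. The `run` helper and the `DRY_RUN` variable are conveniences of this sketch and are not part of LAVA: by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own paths.
REF_FASTA="ref_file.fa"   # reference sequence FASTA
SNP_LIST="snps.txt"       # SNP list in UCSC's txt-based format (name assumed)
REF_DICT="ref.dict"       # reference dictionary written by `lava dict` (name assumed)
SNP_DICT="snps.dict"      # SNP dictionary written by `lava dict` (name assumed)
READS="reads.fq"          # input FASTQ (name assumed)
OUT="calls.txt"           # output file (name assumed)

# Placeholder only: point this at the chrlens file produced during preprocessing.
CHRLENS="${CHRLENS:-CHRLENS_FILE}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: build the reference and SNP dictionaries.
run lava dict "$REF_FASTA" "$SNP_LIST" "$REF_DICT" "$SNP_DICT"

# Step 2: genotype the reads using the dictionaries from step 1.
run lava lava "$REF_DICT" "$SNP_DICT" "$READS" "$CHRLENS" "$OUT"
```

Running the script with `DRY_RUN=0` executes both steps in order, provided `lava` is on the PATH. The dictionaries only need to be rebuilt when the reference or the SNP list changes, so step 1 can be skipped on later runs against the same inputs.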
- ~60 gigabytes of RAM for typical reference genomes
- GCC 4.8.4 or later (not tested on earlier versions)
- Make error rate and average coverage parameters user-specified. For now they are constants in