01 Overview

Tim Dunn edited this page Feb 10, 2024 · 4 revisions

vcfdist evaluates the correctness of a set of phased variant calls (query VCF) relative to a set of phased ground-truth variant calls (truth VCF) within a subset (regions BED) of the genome of interest (reference FASTA). It was designed to evaluate human genomes, but should also work for other monoploid and diploid species. vcfdist can evaluate variants of any type, including STRs (simple tandem repeats) and CNVs (copy number variants); during evaluation, however, it classifies variants as SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), or SVs (structural variants). Evaluating variants larger than 10,000 bases is currently not recommended, because it requires large amounts of memory (over 50 GB of RAM). Below is a diagrammatic overview of vcfdist. Inputs are shown in red, internal steps in yellow, and optional steps in gray.

[Figure: overview diagram of vcfdist]
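The SNP / INDEL / SV classification described above can be illustrated with a small sketch. This is not vcfdist's actual implementation (which is written in C++); the function name `classify_variant`, the 50-base SV cutoff, and the length-based size measure are all assumptions made for illustration. Only the 10,000-base memory guideline comes from the text above.

```python
# Hypothetical sketch: classify a variant by its REF/ALT allele lengths.
# SV_THRESHOLD is an assumed, commonly used cutoff, not a value stated on this page.
SV_THRESHOLD = 50
# From the overview: variants above this size are not recommended (memory use).
MAX_RECOMMENDED_SIZE = 10_000


def classify_variant(ref: str, alt: str, sv_threshold: int = SV_THRESHOLD) -> str:
    """Return 'SNP', 'INDEL', or 'SV' for a single REF/ALT allele pair."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    # Simplification: use the longer allele as the variant size.
    # Equal-length multi-base substitutions therefore land in 'INDEL' here.
    size = max(len(ref), len(alt))
    return "SV" if size >= sv_threshold else "INDEL"


def exceeds_recommended_size(ref: str, alt: str) -> bool:
    """True if the variant is larger than the size the overview recommends."""
    return max(len(ref), len(alt)) > MAX_RECOMMENDED_SIZE


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATTG"), ("A", "A" + "T" * 60)]:
        print(len(ref), len(alt), classify_variant(ref, alt))
```

A helper like `exceeds_recommended_size` could be used to filter a VCF before evaluation, so that very large variants do not trigger the memory usage mentioned above.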

Index

Repository Structure

| Folder | Description |
| --- | --- |
| src | contains all C++ source code for vcfdist |
| demo | contains a simple self-contained vcfdist example script, including inputs and expected output |
| analysis | contains analysis scripts for "vcfdist: accurately benchmarking phased small variant calls" |
| analysis-v2 | contains analysis scripts for "Jointly benchmarking phased small and structural variant calls with vcfdist" |
| docs | contains old wiki documentation |