viterbi

C++ code for finding shared haplotypes between individuals under the Li and Stephens (2003) model using the Viterbi algorithm. The program is designed to find long haplotypes shared between a given individual and a larger reference panel.
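
To illustrate the idea, here is a minimal sketch of Viterbi decoding under a simplified haplotype-copying model. It is not taken from viterbi.cpp: the function name `viterbi_copy_path`, the '0'/'1' allele coding, and the use of a single constant per-site `switch_prob` are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the program's actual code): for each site, return
// the index of the panel haplotype most likely being copied by `target`.
// Alleles are coded '0' or '1'. `switch_prob` is an assumed constant per-site
// probability of changing the copied haplotype; `error` is the probability of
// an allele mismatch between the target and the copied haplotype.
std::vector<std::size_t> viterbi_copy_path(const std::string& target,
                                           const std::vector<std::string>& panel,
                                           double switch_prob, double error) {
    const std::size_t n_sites = target.size();
    const std::size_t n_haps = panel.size();
    assert(n_sites > 0 && n_haps > 0);
    for (const std::string& hap : panel) assert(hap.size() == n_sites);
    const double k = static_cast<double>(n_haps);

    // Log emission: a match has probability 1 - error, a mismatch has error.
    auto log_emit = [&](std::size_t h, std::size_t j) {
        return std::log(panel[h][j] == target[j] ? 1.0 - error : error);
    };
    // Log transitions: keep copying the same haplotype, or jump to another.
    const double log_stay = std::log(1.0 - switch_prob + switch_prob / k);
    const double log_jump = std::log(switch_prob / k);

    std::vector<double> score(n_haps), next(n_haps);
    std::vector<std::vector<std::size_t>> back(
        n_sites, std::vector<std::size_t>(n_haps, 0));
    for (std::size_t h = 0; h < n_haps; ++h)
        score[h] = log_emit(h, 0) - std::log(k);  // uniform prior over the panel

    for (std::size_t j = 1; j < n_sites; ++j) {
        // Since log_stay >= log_jump, the best predecessor of state h is
        // either h itself or the overall best state at the previous site.
        const std::size_t best = static_cast<std::size_t>(
            std::max_element(score.begin(), score.end()) - score.begin());
        for (std::size_t h = 0; h < n_haps; ++h) {
            const double stay = score[h] + log_stay;
            const double jump = score[best] + log_jump;
            back[j][h] = (stay >= jump) ? h : best;
            next[h] = std::max(stay, jump) + log_emit(h, j);
        }
        std::swap(score, next);
    }

    // Trace back from the best final state.
    std::vector<std::size_t> path(n_sites);
    path[n_sites - 1] = static_cast<std::size_t>(
        std::max_element(score.begin(), score.end()) - score.begin());
    for (std::size_t j = n_sites - 1; j > 0; --j)
        path[j - 1] = back[j][path[j]];
    return path;
}
```

For example, calling `viterbi_copy_path("000111", {"000000", "111111"}, 0.1, 0.0)` returns the path 0, 0, 0, 1, 1, 1: the target copies the first panel haplotype for three sites and then switches to the second. Runs of identical indices in the path correspond to shared haplotype segments.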

Usage

To compile, run 'make'. This creates an executable called viterbi.

Usage is as follows:

./viterbi --gzvcf <vcf_1> --gzvcf <vcf_2> --chr <chr> --test-indv <indv_ID> --error <error_rate> > output_file.txt

Parameters
- --gzvcf : Specify (phased) input VCF file. Can be used more than once, which merges the VCFs internally at biallelic sites with matching alleles. The files must not include overlapping individuals.
- --chr : Specify the chromosome on which to run the algorithm.
- --test-indv : Specify the individual for which you want to find the shared haplotypes. (Can be used more than once to specify multiple individuals).
- --error-rate : Specify the probability of an allele mismatch between shared haplotypes (default: 0.0).
- --recomb : Specify the assumed recombination rate in cM/Mb (default: 1.0 cM/Mb).
- --Ne : Specify the effective population size (default: 10000).
- --out : Specify prefix of output files.
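
As a rough illustration of how the recombination rate and effective population size could enter the model, the sketch below computes a switch probability between two sites in the form used by Li and Stephens (2003), 1 - exp(-rho * d / K) with rho = 4 * Ne * c. Whether viterbi uses exactly this expression is an assumption; the function name `switch_probability` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (the program's own formula is not confirmed here):
// probability that the copied haplotype changes between two sites that are
// `distance_bp` base pairs apart, given a recombination rate in cM/Mb, an
// effective population size `Ne`, and `panel_size` reference haplotypes.
double switch_probability(double distance_bp, double recomb_cM_per_Mb,
                          double Ne, double panel_size) {
    assert(distance_bp >= 0.0 && recomb_cM_per_Mb >= 0.0);
    assert(Ne > 0.0 && panel_size > 0.0);
    // Convert to genetic distance in Morgans:
    // 1 cM = 0.01 Morgans and 1 Mb = 1e6 bp.
    const double morgans = recomb_cM_per_Mb * 0.01 * distance_bp / 1e6;
    // 1 - exp(-x), computed with expm1 for accuracy when x is small.
    return -std::expm1(-4.0 * Ne * morgans / panel_size);
}
```

With the defaults listed above (1.0 cM/Mb and Ne of 10000), two sites 1000 bp apart give 4 * Ne * d = 0.4 before dividing by the panel size, so larger panels yield a smaller per-haplotype switch probability.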

Output

The output file has the following columns:

HAP : The haplotype in the test individual. The haplotype is suffixed with _1 for the individual's first haplotype, and _2 for the second.
COPY : The haplotype being copied in the reference panel.
CHR : Chromosome.
START : Haplotype start position.
END : Haplotype end position.
N_SNPS : Number of SNPs in the haplotype.
MIN_ALLELE_COUNT : The minimum allele count of alleles present on the haplotype.
MIN_ALLELE_POS : The position of the allele with the minimum allele count.
EXACT_HAP_COUNT : The count of the current haplotype in the whole sample.
HAP_COUNT_1_PRCT_MISMATCH : The count of the current haplotype in the whole sample, allowing a 1% mismatch.
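
For downstream processing, one way to read a row of this output is sketched below. It assumes the columns are whitespace-separated and appear in the order listed above, and that START and END are inclusive positions; none of this is confirmed by the source. The names `SharedHaplotype`, `parse_row`, and `span_bp` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record mirroring the documented output columns.
struct SharedHaplotype {
    std::string hap;    // HAP, e.g. an individual ID suffixed with _1 or _2
    std::string copy;   // COPY
    std::string chr;    // CHR
    long start = 0;     // START
    long end = 0;       // END
    long n_snps = 0;    // N_SNPS
    long min_allele_count = 0;           // MIN_ALLELE_COUNT
    long min_allele_pos = 0;             // MIN_ALLELE_POS
    long exact_hap_count = 0;            // EXACT_HAP_COUNT
    long hap_count_1_prct_mismatch = 0;  // HAP_COUNT_1_PRCT_MISMATCH
};

// Parse one whitespace-separated line (assumed layout). Returns false if any
// field is missing or non-numeric where a number is expected, so a textual
// header line is rejected rather than misread.
bool parse_row(const std::string& line, SharedHaplotype& row) {
    std::istringstream in(line);
    return static_cast<bool>(
        in >> row.hap >> row.copy >> row.chr >> row.start >> row.end >>
        row.n_snps >> row.min_allele_count >> row.min_allele_pos >>
        row.exact_hap_count >> row.hap_count_1_prct_mismatch);
}

// Physical length of the segment in base pairs, assuming inclusive positions.
long span_bp(const SharedHaplotype& row) {
    assert(row.end >= row.start);
    return row.end - row.start + 1;
}
```

Reading the output file line by line with `std::getline` and keeping only rows for which `parse_row` returns true would then allow filtering on `span_bp` or `n_snps` to select long shared haplotypes.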

Notes

This program can use a fair amount of resources. For chr22 on 1000 Genomes Phase 3 data, it needs a few GB of RAM. For chr1, it needs ~40 GB. A run on chr22 should take about 4 hours on a 3.4 GHz machine, whereas chr1 will take closer to 24 hours.
