Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) #1312
Are you planning to support gVCF merging and genotyping on Spark / ADAM?
As far as I know, the only way to variant call 100K samples is through creating a gVCF file per sample, followed by gVCF merging and genotyping.
The best-known production-ready implementation of this is from the Broad, in GATK:
For variant calling of the 100K+ samples in ExAC/gnomAD, the first merge step was replaced with GenomicsDB from Intel. As I understand it, GenomicsDB efficiently stores per-sample gVCF tracks and can then efficiently stream merged VCF into GenotypeGVCFs.
GenomicsDB is based on Intel TileDB.
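To make the merge step concrete, here is a minimal toy sketch of what combining per-sample gVCF blocks involves. It is not how CombineGVCFs or GenomicsDB are implemented. It assumes a single contig, and the `Block` tuple, the `merge_gvcf_blocks` function, and the use of `./.` for uncovered positions are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: one sample's gVCF on a single contig is a list of
# (start, end, genotype) blocks, 1-based inclusive, sorted, non-overlapping.
Block = Tuple[int, int, str]
MergedRow = Tuple[int, int, Dict[str, str]]


def merge_gvcf_blocks(samples: Dict[str, List[Block]]) -> List[MergedRow]:
    """Split all samples' blocks on shared breakpoints and emit one row per interval."""
    # Every block start, and every position just past a block end, is a breakpoint.
    cuts = sorted(
        {p for blocks in samples.values() for s, e, _ in blocks for p in (s, e + 1)}
    )
    merged: List[MergedRow] = []
    for start, nxt in zip(cuts, cuts[1:]):
        end = nxt - 1
        row = {}
        for name, blocks in samples.items():
            # Because every boundary is a breakpoint, each interval lies fully
            # inside one block or fully outside all blocks of a given sample.
            row[name] = next(
                (gt for s, e, gt in blocks if s <= start and end <= e), "./."
            )
        # Skip intervals that no sample covers at all.
        if any(gt != "./." for gt in row.values()):
            merged.append((start, end, row))
    return merged
```

In a real pipeline the merged rows would carry genotype likelihoods rather than hard genotype calls, so that joint genotyping can re-evaluate every sample at each site. The linear scan per interval is also only acceptable for a toy; at 100K samples the storage layout is the hard part, which is the gap GenomicsDB appears to fill.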
Something similar to CombineGVCFs/GenotypeGVCFs/GenomicsDB is being developed by DNAnexus; it also supports on-demand joint genotyping from FreeBayes gVCFs:
Are you also planning scalable gVCF storage with on-demand gVCF merging and joint genotyping on top of Spark / ADAM?
added a commit
Oct 15, 2017