Bug: GenotypeGVCFs --max_genotype_count needs to handle large ploidy #2946

Open
droazen opened this issue Jun 5, 2017 · 0 comments
droazen commented Jun 5, 2017

@sooheelee commented on Fri Feb 17 2017

A fix was implemented for HaplotypeCaller but was not ported to GenotypeGVCFs, CombineGVCFs, or CombineVariants. The user is running v3.7-0-gcfedb67, but my understanding is that fixes of this type will only be made in GATK4.


Test data submitted by user can be found at

/humgen/gsa-scr1/pub/incoming/bugrep_jgeibel_1.tar.gz

This includes chicken reference files.


The command that triggers the error is very long because it takes many gVCFs as input:

Program Args: -T GenotypeGVCFs -R /usr/users/geibel/chicken/chickenrefgen/galGal5_Dec2015/galGal5.fa --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72631_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72632_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72633_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72634_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72635_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72636_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72637_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72638_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72639_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72640_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72641_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72642_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72643_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72644_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72645_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72646_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72647_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72648_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72649_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72650_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72651_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72652_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72653_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72654_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_WL_72655_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72656_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72657_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72658_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72659_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72660_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72661_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72662_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72663_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72664_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72665_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72666_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72667_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72668_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72669_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72670_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72671_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72672_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72673_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72674_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72675_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72676_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72678_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72680_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72682_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/i_BL_72683_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AB_0001_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AR_0002_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_AS_0003_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BA_0004_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BK_0005_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_BH_0006_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CG_0007_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CS_0008_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_DG_0009_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_FG_0010_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_HO_0011_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GG_0012_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GS_0013_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KW_0014_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_IT_0015_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KA_0016_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KG_0017_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_PA_0018_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_LE_0019_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_MA_0020_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_MR_0021_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OH_0022_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OR_0023_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OM_0024_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_NH_0025_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SB_0026_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SE_0027_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SH_0028_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SA_0029_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_SN_0030_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_TO_0031_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WT_0032_chr26.raw.g.vcf --variant 
/usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WY_0033_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_YO_0034_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_ZC_0035_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_GV_0036_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_CW_0037_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_DL_0038_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_KS_0039_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_OF_0040_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_WR_0041_chr26.raw.g.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/gVCF/pl_RI_0042_chr26.raw.g.vcf -nt 10 --max_genotype_count 1024 -L chr26 --dbsnp /usr/users/geibel/chicken/chickenrefgen/ENSEMBL_20170106/Gallus_gallus.updated.vcf -o /usr/users/geibel/chicken/pool_sequence_nov2016/data/rawVCF/IndandPool_chr26.raw.vcf 

The test data bundle also includes the user's shell script for this command, JointGenotyping_chr26.sh.


The run fails with the following error:

##### ERROR --
##### ERROR stack trace 
java.lang.IllegalArgumentException: the number of genotypes is too large for ploidy 20 and allele 16: approx. 3247943160
	at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypeLikelihoodCalculators.getInstance(GenotypeLikelihoodCalculators.java:319)
	at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.mergeRefConfidenceGenotypes(ReferenceConfidenceVariantContextMerger.java:461)
	at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:164)
	at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:302)
	at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:135)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
	at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
	at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
	at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: the number of genotypes is too large for ploidy 20 and allele 16: approx. 3247943160
##### ERROR ------------------------------------------------------------------------------------------
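The count in the error message can be reproduced by hand. The sketch below assumes the standard multiset formula for the number of unordered genotypes, C(ploidy + alleles - 1, ploidy); the function names are hypothetical, and this is not GATK's code. It shows that ploidy 20 with 16 alleles gives exactly the reported 3247943160, which exceeds the largest signed 32-bit integer, and it estimates how many alleles would fit under the user's `--max_genotype_count 1024`. Trimming alleles to fit the cap is only an assumption here about what a fix in GenotypeGVCFs might do.

```python
from math import comb

# Largest value of a signed 32-bit integer (Java's Integer.MAX_VALUE).
INT_MAX = 2**31 - 1


def genotype_count(ploidy: int, allele_count: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy`
    drawn from `allele_count` distinct alleles."""
    return comb(ploidy + allele_count - 1, ploidy)


def max_alleles_within(ploidy: int, max_genotype_count: int) -> int:
    """Largest allele count (reference included) whose genotype count
    does not exceed `max_genotype_count`. Illustrative helper only."""
    if ploidy < 1 or max_genotype_count < 1:
        raise ValueError("ploidy and max_genotype_count must be >= 1")
    alleles = 1
    while genotype_count(ploidy, alleles + 1) <= max_genotype_count:
        alleles += 1
    return alleles


if __name__ == "__main__":
    n = genotype_count(20, 16)
    print(n)                             # 3247943160, as in the error message
    print(n > INT_MAX)                   # True
    print(max_alleles_within(20, 1024))  # 3
```

Under this formula, a ploidy-20 pool stays within 1024 genotypes only up to 3 alleles (231 genotypes); 4 alleles already yield 1771.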
