Drop the incorrect genotype attributes for decomposed SNPs. #1334
Comments
Neill; It might be worth suggesting to drop (or fix) these in vcfallelicprimitives itself. I know @zeeev and @ekg have been working on vcflib recently, so they might have ideas on how to make it do the right thing.
If you use I need to make this more clear, because it's come up many times. But, I can't make this the default behavior as it is not correct and I don't know how to automatically re-derive the fields after decomposition. |
@ekg Are you saying that after breaking a tri-allelic site the genotype likelihoods are no longer valid? I was just bitten by the same behavior in vcffilter. Maybe a tool is in order?
Erik and Zev;
So it is possible to filter out the alleles and maintain correct genotype
On Thu, Apr 14, 2016 at 2:05 AM Brad Chapman notifications@github.com
@NeillGibson I am running into this same issue, but isn't |
Hi @nh13 I think vcfallelicprimitives itself trims the alternative alleles that are duplicate alt alleles after converting to primitives. My guess is that the Something like an alt allele that arises from sequencing noise and is put in the alt alleles field but was never assigned to a sample genotype.
- Use --strict-vcf to avoid Integer/Float problems for genotype quality (samtools/bcftools#420)
- Remove FMT/DPR since bcftools does not like outputs annotated as A for REF/ALT (#1334)
- Removes needs for /dev/null stderr redirection since FreeBayes includes contig lines.
- Fixes gVCF reference allele problems. Thanks to @lijiayong
Neill and Nils; |
Please open and update if it is still an issue.
Hi,
Would it be possible to drop the incorrect genotype attributes for decomposed variants?
The FreeBayes variant calling pipe currently decomposes longer variants into SNPs/indels where possible and keeps all genotype attributes.
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py#L128
The total set of genotype attributes kept is
Of these genotype attributes the following can be corrupt for decomposed multi-allelic variants.
They are corrupt because the number of values for these attributes no longer corresponds to the number of alleles in the VCF ALT column.
This causes a BCFtools assertion to fail that is used for some subset operations, for example
samtools/bcftools#404
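To make the mismatch concrete, here is a minimal sketch of how the value counts could be checked against the ALT column. The helper names are hypothetical, and the `Number` codes in `ASSUMED_NUMBERS` (AO and QA as `A`, PL as `G`) are an assumption about what the VCF header declares; check the header of your own file. The example record is invented for illustration.

```python
from math import comb

def expected_values(number, n_alt, ploidy=2):
    """How many values a FORMAT field should carry for a given VCF Number code."""
    if number == ".":          # unknown / variable count: nothing to check
        return None
    if number == "A":          # one value per ALT allele
        return n_alt
    if number == "R":          # one value per allele, REF included
        return n_alt + 1
    if number == "G":          # one value per possible genotype
        return comb(n_alt + ploidy, ploidy)
    return int(number)         # fixed count such as "1"

def mismatched_fields(alt, fmt, sample, numbers, ploidy=2):
    """Return FORMAT keys whose value count disagrees with the ALT column."""
    n_alt = len(alt.split(","))
    bad = []
    for key, value in zip(fmt.split(":"), sample.split(":")):
        want = expected_values(numbers.get(key, "."), n_alt, ploidy)
        if want is not None and value != "." and len(value.split(",")) != want:
            bad.append(key)
    return bad

# Assumed header declarations; not taken from a real file.
ASSUMED_NUMBERS = {"GT": "1", "DP": "1", "AO": "A", "QA": "A", "PL": "G"}

# Invented record: one ALT allele left after decomposition, but the sample
# column still carries values sized for two ALT alleles.
print(mismatched_fields("T", "GT:DP:AO:QA:PL",
                        "0/1:20:8,3:290,100:250,0,300,280,310,500",
                        ASSUMED_NUMBERS))
```

For this record the check flags AO, QA and PL: with a single ALT allele a diploid sample should have one AO value, one QA value and three PL values, not two, two and six.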
This could be fixed by dropping the PL, AO and QA attributes for all variants or just the decomposed variants.
This would leave the following attributes for the decomposed SNPs.
I tried dropping the PL, AO and QA attributes for the decomposed variants, and the resulting VCF file still seems to be valid. I did this by first splitting the VCF file into decomposed and non-decomposed variants.
I thought the following command would just remove the possibly corrupt genotype attributes from the decomposed variants and also output the non-decomposed variants unchanged.
However, it filters out all the non-decomposed variants and outputs only variants with the DECOMPOSED flag set.
Maybe another tool could be added that only removes the possibly corrupt genotype attributes for the decomposed variants?
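A rough sketch of what such a tool might do, in plain Python with no VCF library. It assumes that decomposed records carry a `DECOMPOSED` flag in the INFO column and that PL, AO and QA are the only attributes to drop; the function names are hypothetical and this is not an existing vcflib or bcbio feature. Header lines and non-decomposed records pass through untouched.

```python
DROP = {"PL", "AO", "QA"}  # attributes that may be corrupt after decomposition

def strip_decomposed(line, drop=DROP):
    """Remove `drop` FORMAT keys from a VCF record if it is flagged DECOMPOSED."""
    if line.startswith("#"):
        return line                      # header lines are left alone
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line                      # no FORMAT/sample columns present
    info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
    if "DECOMPOSED" not in info_keys:
        return line                      # non-decomposed record: unchanged
    keys = fields[8].split(":")
    keep = [i for i, key in enumerate(keys) if key not in drop]
    fields[8] = ":".join(keys[i] for i in keep)
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Sample columns may omit trailing fields, hence the bounds check.
        fields[col] = ":".join(values[i] for i in keep if i < len(values))
    return "\t".join(fields) + "\n"

def strip_decomposed_stream(lines, drop=DROP):
    """Apply strip_decomposed to every line of an iterable of VCF lines."""
    for line in lines:
        yield strip_decomposed(line, drop)

# Invented records for illustration only.
decomposed = ("chr1\t100\t.\tA\tT\t50\t.\tDP=20;DECOMPOSED\t"
              "GT:DP:AO:QA:PL\t0/1:20:8,3:290,100:250,0,300,280,310,500\n")
untouched = ("chr1\t200\t.\tC\tG\t60\t.\tDP=25\t"
             "GT:DP:AO:QA:PL\t0/1:25:10:350:300,0,320\n")
for out in strip_decomposed_stream([decomposed, untouched]):
    print(out, end="")
```

The function could be applied to a whole file by passing an open file handle to `strip_decomposed_stream` and writing each returned line out. The header would still declare PL, AO and QA, which is fine because the non-decomposed records keep them.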
Or I could open a ticket at vcflib to ask for an option "--keep-minimal-geno"?
Do you have any ideas on how to handle this?