Problem with binomial calculation #2186

Open
chundruv opened this issue May 17, 2024 · 1 comment

Comments

@chundruv

Hi Petr,

I noticed that the binomial calculation on allele depths fails for multiallelic variants, but works when the file is normalised (split) first.

I had one position which was set to missing by setGT when it was run before splitting, but not when it was run after splitting. It was a 0/6 genotype, the LAD was 19,21, and the GQ was 15.
When you print out the binom before splitting it gives a “.”, whereas after splitting it gives the actual probabilities.
However, when I manually change the genotype to 0/1, it gives the correct probability without splitting, so the problem seems to be specific to non-0/1 genotypes.
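
For reference, here is a rough sketch of the value I would expect for depths 19,21. It is a generic exact two-sided binomial test with p = 0.5, written in plain Python; the function name is made up for illustration, and I am only assuming that this is close to what the binom expression computes. It does not model how the two depths are picked for a 0/6 genotype, which is the part that seems to go wrong.

```python
from math import comb


def binom_two_sided(n_ref: int, n_alt: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for n_ref successes in n_ref + n_alt trials.

    Hypothetical helper for illustration only; not taken from bcftools.
    """
    n = n_ref + n_alt
    if n == 0:
        return 1.0
    # Probability of every possible ref-read count under Binomial(n, p).
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[n_ref]
    # Sum all outcomes that are no more likely than the observed one
    # (small relative tolerance to absorb floating-point noise).
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


print(round(binom_two_sided(19, 21), 4))  # 0.8746
```

So a balanced 19,21 site should not be anywhere near a typical allelic-imbalance cutoff.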

Also, is there an option when splitting multiallelics to set the non-ref calls on other alleles to missing? Just thinking that if you set it to homref it might not be the best for some analyses. Also, is there a way to update the DP and other format fields when splitting?

Thanks,
Karitk

@pd3
Member

pd3 commented May 21, 2024

Any chance you could provide a concrete test case? Since you are quoting LAD, it is not clear what exactly the input data is like.

> Also, is there an option when splitting multiallelics to set the non-ref calls on other alleles to missing? Just thinking that if you set it to homref it might not be the best for some analyses.

The command bcftools norm has the option

--multi-overlaps 0|.        Fill in the reference (0) or missing (.) allele when splitting multiallelics [0]
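
To illustrate the difference between the two settings, here is a minimal Python sketch of the idea. It is a conceptual model only, not the bcftools implementation: the function name and the assumption that allele order is preserved within each split genotype are made up for illustration.

```python
def split_genotype(alleles, n_alt, fill="0"):
    """Split one multiallelic genotype into n_alt biallelic genotypes.

    alleles: allele indexes of the call, e.g. [0, 6]; None means missing.
    fill:    what to write for ALT alleles that belong to another split
             record, "0" (reference) or "." (missing).
    Hypothetical helper for illustration only.
    """
    records = []
    for alt in range(1, n_alt + 1):
        out = []
        for a in alleles:
            if a is None:
                out.append(".")   # already missing, stays missing
            elif a == 0:
                out.append("0")   # reference allele is unchanged
            elif a == alt:
                out.append("1")   # this record's ALT becomes allele 1
            else:
                out.append(fill)  # ALT carried by a different record
        records.append("/".join(out))
    return records


# A 0/6 call at a site with six ALT alleles:
print(split_genotype([0, 6], 6, fill="0"))  # ['0/0', '0/0', '0/0', '0/0', '0/0', '0/1']
print(split_genotype([0, 6], 6, fill="."))  # ['0/.', '0/.', '0/.', '0/.', '0/.', '0/1']
```

With the default, the five records that do not carry allele 6 look like hom-ref calls; with the missing fill, they are flagged as partially unknown instead.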

> Also, is there a way to update the DP and other format fields when splitting?

The program does not attempt to recalculate DP from the split AD values, if that's what you mean. However, there is the plugin +fill-tags, which can do that:

bcftools +fill-tags in.bcf -Ob -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
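
In other words, the expression sets each sample's DP to the sum of that sample's AD values. A small Python sketch of the same arithmetic follows; the function name and the treatment of missing values (skipped, with an all-missing AD giving a missing DP) are assumptions made for illustration and may not match what the plugin does.

```python
def recompute_dp(ad):
    """Return the per-sample depth as the sum of the non-missing AD values.

    ad: list of allele depths for one sample, with None for missing entries.
    Hypothetical helper for illustration only.
    """
    present = [d for d in ad if d is not None]
    if not present:
        return None  # nothing to sum, so DP stays missing
    return int(sum(present))


print(recompute_dp([19, 21]))  # 40
```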
