Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools mpileup + call seemingly confident in allele only present in a few reads? #2185

Open
emos8710 opened this issue May 17, 2024 · 3 comments

Comments

@emos8710
Copy link

Hi!

I have an issue that is puzzling me. I have two salmonella samples that I'm comparing and expect them to be highly similar if not identical. I have mapped them with bowtie2 (three different versions as I was investigating if it was the mapping that was the issue) and created a mpileup and called variants with bcftools v 1.14 and 1.17 (no difference in the result)

bcftools call -m -v -O v --ploidy 1

I'm getting some variant calls that I don't understand. Here is the best example. There are three copies of each sample as I was checking if there was any difference between bowtie2 versions.

NC_003197.2	1349066	.	C	T	720.935	.	DP=815;VDB=0.00653457;SGB=-21.0274;RPBZ=-12.8027;MQBZ=-8.78292;MQSBZ=4.75308;BQBZ=6.99271;SCBZ=0;FS=0;MQ0F=0.00368098;AC=3;AN=6;DP4=4,76,70,101;MQ=37	GT:PL	0:0,112	0:0,112	0:0,118	1:255,0	1:255,0	1:255,0

I have also tried running the mpileup with a MQ threshold of 20, which barely makes a difference.
For sample 1 there are 23 reads supporting REF and 126 reads supporting ALT. For sample 2 there are 8 reads supporting REF and 112 reads supporting ALT (plus 2 reads supporting a second ALT). However bcftools seems pretty confident in the REF call for sample 1 and very confident in the ALT call for sample 2. Why is this? I would expect more uncertainty, so I could filter this position :)

@Lilu-guo
Copy link

Yes, I also found that the specified MQ threshold didn't seem to work.
For example, when using -q 20, it still uses 30 as the threshold; and when using -q 40 (greater than 30), it uses 40 normally.

Maybe we can discuss bcftools further. It will be more efficient to use WeChat or other instant tools.

@pd3
Copy link
Member

pd3 commented May 21, 2024

I am unable to comment without seeing the data, ideally a small bam slice + corresponding fasta reference chunk.

EDIT: This script can be used to create a small test case https://github.com/pd3/mpileup-tests/blob/main/misc/create-bam-test

@emos8710
Copy link
Author

Thanks for your reply, I will get back with some data!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants