Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

homozygous with equal allele depth #2056

Open
abhortas opened this issue Dec 13, 2023 · 2 comments
Open

homozygous with equal allele depth #2056

abhortas opened this issue Dec 13, 2023 · 2 comments

Comments

@abhortas
Copy link

abhortas commented Dec 13, 2023

I am genotyping 10 individuals with mpileup and call commands and I am getting some strange results.

I am using this code:

bcftools mpileup --skip-indels --annotate FORMAT/AD,FORMAT/DP,INFO/AD -C 0 -d 550 -Ou -f $ref $name | bcftools call -mv -Ov -o calls.vcf

variable $name has the 10 bam files of each individual and $ref the reference genome.

Here is one of the lines of the final vcf:

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

........GT:PL:DP:AD 1/1:31,6,0:26:13,13 1/1:0,7,1:26:14,12 1/1:8,9,0:38:17,21 1/1:0,16,1:44:28,16 1/1:28,6,0:24:14,10 1/1:3,5,0:22:11,11 1/1:9,0,10:17:11,6 1/1:2,0,6:20:10,10 1/1:10,12,0:57:33,24 1/1:6,0,10:24:9,15

so, it looks so strange for me that the pipeline calls all the 10 genotypes 1/1 when the allele depths are similar in every case.

If I just look at the allele depth I would say all the genotypes should be 0/1

What am I missing?

And yes, I can see the Shred-scaled genotype likelihoods, and several case looks to indicate homozygous 1/1 (here for example 1/1:31,6,0:26:13,13), but in other cases, wouldn't be heterozygous? (here for example 1/1:28,6,0:24:14,10).

And another final question. If I am getting all individuals 1/1, in other words.. a monomorphic site.....why is this site present in the final vcf? I would spect at least one allele of ALT (or REF).

@pd3
Copy link
Member

pd3 commented Dec 14, 2023

Starting with the last question, homozygous alternate genotypes count as variants relative to the reference genome.
Regarding the genotypes, it depends on the actual reads and base qualities, and BAQ realignment - try to switch it off. The results look dodgy, a small test case would be required to understand and explain what is going on

@abhortas
Copy link
Author

Thanks for the quick response.

Ok, I understand now why I have some monomorphic sites, I hadn't thought it but it makes sense.

On the other hand, I've already tried running the pipeline with BAQ realignment off, and the results are practically identical. All the genotypes are 1/1 and just change few allele depths, but nothing important, just one or two values.

it continue looks strange genotyping but I guess it is due to bases qualities and the internal algorithm of the pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants