Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools mpileup call with empty output #2069

Open
valentinaOpazo opened this issue Jan 5, 2024 · 6 comments
Open

bcftools mpileup call with empty output #2069

valentinaOpazo opened this issue Jan 5, 2024 · 6 comments

Comments

@valentinaOpazo
Copy link

Hello, I'm interesting in identify if my sample has a insertion, deletion or if it's heterozygote (in a specific region). To do this I ran the next code
bcftools mpileup -Ou -r chrX:start-end -f genome.fa $Input_Path/input.bam | bcftools call -Ou -mv -o test_option3

When I run it I don't get any error messages. The following are the terminal messages.

Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 25

The problem is my output file test_option3 contain only the header. The last rows are:

##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##bcftools_callVersion=1.19-3-gd6f535f1+htslib-1.19-1-g61b037bb
##bcftools_callCommand=call -cv; Date=Fri Jan 5 13:47:43 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TCGA-A2-A0YF-01A-31-A615-42-X007-S01_aliquot

I also tried to execute the same analysis in Galaxy and I got the same output file, Is it possible that the error was on my bam file? How can I test it?

@pd3
Copy link
Member

pd3 commented Jan 8, 2024

Check if there are any reads in that region

samtools view your.bam chrX:start-end | less

A common problem is a mismatch in chromosome naming convention (chrX vs X) between bam, fasta reference or the user.

@valentinaOpazo
Copy link
Author

Thanks for your quick answer, I appreciate it a lot!
I checked if there are reads in the region and there are (below is a screenshot of the output of samtools view)

image

Also, bam files and reference use the same chromosome convention name. So I'm still having trouble with the output bcf file

@pd3
Copy link
Member

pd3 commented Jan 9, 2024

The next thing to check then is what raw mpileup looks like. Maybe all positions in that region are non-variant, therefore bcftool call -v removes everything? You can test with

bcftools mpileup -r chrX:start-end -f genome.fa file.bam | less

or

bcftools mpileup -r chrX:start-end -f genome.fa file.bam | bcftools call -mA | less

@valentinaOpazo
Copy link
Author

You are right! When I removed -v option on bcftools call, the output isn't empty anymore.
However, I don't completely understand the output. When do you say that region are non-variant, what does it mean? I'm analyzing one sample per run code, so does it mean that my sample is equal to the reference genome? Below is one output file

image

In another sample, I got an output file that looks like the screenshot below. In this case, the interpretation must be that It has a deletion of 5 bases?
image

@pd3
Copy link
Member

pd3 commented Jan 16, 2024

In the first screenshot all bases in all reads are identical to the reference, hence there are no variants, nothing to call.

The second screenshot shows several bases with no coverage. There seem to be no overlapping reads, therefore the five positions with zero coverage are not called as deletion. It is possible there was a read with an indel and was filtered by mpileup (eg because of the --min-ireads option), but this I cannot tell just from seeing the screenshot.

@pd3
Copy link
Member

pd3 commented May 24, 2024

This requires a test case to reproduce and debug the problem. This script can be used to create a small slice of the bam and the reference https://github.com/pd3/mpileup-tests/blob/main/misc/create-bam-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants