Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map2cov parses more reads than expected by samtools flagstat #46

Closed
filip-husnik opened this issue Mar 15, 2017 · 4 comments
Closed

map2cov parses more reads than expected by samtools flagstat #46

filip-husnik opened this issue Mar 15, 2017 · 4 comments

Comments

@filip-husnik
Copy link

filip-husnik commented Mar 15, 2017

Hi Dominik,

My issue is similar to #39 where blobtools also reports 'more' reads than expected by flagstat. My bam file with corrected PacBio fasta reads mapped to the reference genome by bwa mem raises a warning with map2cov. It does not seem that the differences in numbers represent multimappings somehow missed by samflags used by map2cov. samtools view -f 1 -F 1024 -F 256 -F 2048 according to #40

I have quite a lot of supplementary reads. Is there maybe a problem in the way how BtIO.py gets the numbers of mapping and total reads? The warning says Based on samtools flagstat: expected 5031818 reads, 5529014 reads were parsed but when I run flagstat separately it correctly prints 5529014 + 0 mapped (99.71% : N/A)

Thanks!

Filip

My blobtools version is blobtools v0.9.19.
I have only samtools 1.3.1 (+htslib 1.3.1) in my $PATH and this is its flagstat output:

5545330 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
497196 + 0 supplementary
0 + 0 duplicates
5529014 + 0 mapped (99.71% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

This is the map2cov warning:

[STATUS]	: 	Checking with 'samtools flagstat'
[STATUS]	: 	Mapping reads = 5,031,818, total reads = 5,048,134 (mapping rate = 99.7%)
[PROGRESS]	: 	100%
[PROGRESS]	: 	109% 
[WARN]		: Based on samtools flagstat: expected 5031818 reads, 5529014 reads were parsed
@DRL
Copy link
Owner

DRL commented Mar 15, 2017

Hi Filip,

The command that is being run in the background in order to actually extract the reads is:

samtools view -f 1 -F 1024 -F 256 -F 2048

where the issue concerning your data is the -f 1 which mean the reads have to be paired, which I think yours are not. Hence the output will be wrong.

Bamfilter was meant for paired-end reads since SE reads can easily be grep'ed (which is also faster):

  1. grep the contig-names you want to keep/delete
  2. cut the readname column and put into file
  3. Use something like Heng Li's seqtk subseq to extract the reads based on the readname.

Normally this warning is not a problem, but in your case it is misleading since you have SE reads... will add a check for this.

Hope that helps.

Cheers,

dom

@filip-husnik
Copy link
Author

Thanks, Dominik. I see, I'm too used to work with PE reads that I did not realize the flag will be wrong for the PacBio SE reads. I don't really need to filter the bamfile, I was just checking the bamfilter functionality (unfortunately with a wrong file).

Does it mean that read_cov and base_cov values in the .cov file generated by map2cov are correct for a bam file with SE reads and I should just ignore the warning with similar bam files such as from PEARed/FLASHed PE reads?

Cheers,
Filip

@DRL
Copy link
Owner

DRL commented Mar 15, 2017

No worries.

Yes, map2cov (as well as create) will output a correct read_cov/base_cov for both PE and SE, as it uses the following flags:

samtools view -F 1024 -F 4 -F 256

Remember, that the message is just a "warning" which means that counts are different than expected, based on flagstats. But it is not an "error".

The only reason flagstats is actually invoked is to be able to produce a progress-bar while parsing. As long as you a happy with only getting coverage from reads that pass the flags, everything will be fine.

cheers,

dom

@DRL DRL closed this as completed Mar 15, 2017
@filip-husnik
Copy link
Author

Awesome, thanks.

Filip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants