covstats supporting cram #45
Comments
P.S. I think I've figured out how to calculate median coverage from the mosdepth output. It's not as fast as covstats, but still faster than most other tools out there.
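For readers wondering how a median could be derived from mosdepth output, here is a minimal sketch. It is not necessarily the approach meant in the comment above. It assumes the input is mosdepth's cumulative distribution file (`*.mosdepth.global.dist.txt`) and that each row holds a chromosome name (or `total`), a depth, and the proportion of bases covered at that depth or higher. The function names `parse_dist_line` and `median_depth` are hypothetical.

```python
def parse_dist_line(line):
    """Split one tab-separated row into (chrom, depth, fraction).

    Assumed row layout: chrom<TAB>depth<TAB>fraction_at_or_above.
    """
    chrom, depth, fraction = line.rstrip("\n").split("\t")
    return chrom, int(depth), float(fraction)


def median_depth(rows, chrom="total"):
    """Return the largest depth that at least half of the bases reach.

    rows: iterable of (chrom, depth, fraction_at_or_above) tuples.
    Returns None if no row for `chrom` has a fraction of at least 0.5.
    """
    best = None
    for name, depth, fraction in rows:
        if name != chrom or fraction < 0.5:
            continue
        if best is None or depth > best:
            best = depth
    return best
```

To use it on a real file, read the file line by line, pass each line through `parse_dist_line`, and hand the resulting tuples to `median_depth`.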
I think it should be straightforward to support CRAM in covstats; I'll just do a system call to samtools view. I'll have a look at this next week if not sooner.
Much appreciated, thank you!
I just had a look at this, and the CRAM index does not store the total number of mapped reads, so it's not possible to estimate the coverage from the index the way it is for the BAM index. We could iterate over a few CRAM slices, count reads, and note the byte offsets in the index, then use that rate and the total file size to estimate. This will be relatively accurate, but less so than even the BAM index estimate.
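The extrapolation described in this comment can be sketched as follows. This is only an illustration of the arithmetic, not covstats code: it assumes you already have, for a few sampled slices, the compressed size in bytes and the number of reads counted. The function names, the fixed read length, and the genome size parameter are all hypothetical inputs.

```python
def estimate_total_reads(samples, file_size_bytes):
    """Extrapolate a whole-file read count from a few sampled slices.

    samples: list of (compressed_bytes, read_count), one pair per slice.
    The reads-per-byte rate of the samples is applied to the full file size.
    """
    sampled_bytes = sum(nbytes for nbytes, _ in samples)
    sampled_reads = sum(nreads for _, nreads in samples)
    if sampled_bytes <= 0:
        raise ValueError("sampled slices must cover at least one byte")
    return sampled_reads * file_size_bytes / sampled_bytes


def estimate_coverage(total_reads, read_length, genome_size):
    """Mean coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_reads * read_length / genome_size
```

The estimate is only as good as the assumption that the sampled slices compress like the rest of the file, which is one reason it would be less accurate than a count taken from a BAM index.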
Are there any updates on this? I've been trying to implement a workflow involving covstats, and when it runs on CRAM files, the coverage is reported as zero. Other stats appear to be accurate.
Given the lack of a CRAM parser in Go, I don't plan to support this. It's possible to get actual coverage for a 30X WGS CRAM in < 5 minutes using mosdepth.
Thanks for your reply. I've noticed that covstats seems to run slowly on CRAMs, even small ones. When you say there's no CRAM parser in Go... what is it doing for the other stats? Apologies if this is a silly question, I'm very new to Go. In other words -- is running covstats on CRAMs supported, just not for coverage? Or should I consider CRAMs as not supported, period?
Actually, I forgot what it was doing. It makes a system call to samtools, which converts CRAM to BAM, then parses the BAM. So the sampling stuff (the parts that you notice are filled out by covstats) works fine with CRAM. But the number of mapped reads is taken from the BAM index and is not present in the CRAM index, so it can't estimate coverage.
What would you recommend to very quickly extract mean/median coverage (plus the insert size info) for CRAMs? Would it make sense to slice the CRAM to a few random regions, convert to a small BAM, and then use covstats?
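As an illustration of the slicing idea in this question, here is a minimal sketch that only builds the samtools command line. It assumes the CRAM is indexed and that the reference FASTA is available locally. The helper name `build_slice_command`, the file names, and the regions are hypothetical. `-b` (BAM output), `-T` (reference FASTA), and `-o` (output file) are standard `samtools view` options.

```python
def build_slice_command(cram_path, reference_fasta, regions, out_bam):
    """Build a `samtools view` command that extracts regions of a CRAM as BAM.

    regions: list of region strings such as "chr1:1000000-2000000".
    Returns the command as an argument list suitable for subprocess.run.
    """
    if not regions:
        raise ValueError("at least one region is required")
    return [
        "samtools", "view",
        "-b",                   # write BAM instead of SAM
        "-T", reference_fasta,  # reference needed to decode the CRAM
        "-o", out_bam,
        cram_path,
        *regions,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, and the resulting BAM would then need to be indexed with `samtools index` before running covstats on it. Note that the index of such a sliced BAM only counts reads from the extracted regions, so a coverage figure derived from it may not reflect the whole genome.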
I suppose I could make it calculate the mean for the first part of the chromosome with covstats.
Any chance of covstats supporting the CRAM format? Unsurprisingly, when I tried, I got an error:
$ goleft covstats input.cram