Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

covstats supporting cram #45

Open
hdashnow opened this issue Jun 21, 2018 · 10 comments
Open

covstats supporting cram #45

hdashnow opened this issue Jun 21, 2018 · 10 comments

Comments

@hdashnow
Copy link

Any chance of covstats supporting cram format? Unsurprisingly, when I tried I got an error:

$ goleft covstats input.cram

panic: gzip: invalid header

goroutine 1 [running]:
github.com/brentp/goleft/covstats.pcheck(0xa8ae60, 0xc4200d8410)
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:28 +0x4a
github.com/brentp/goleft/covstats.Main()
	/home/brentp/go/src/github.com/brentp/goleft/covstats/covstats.go:231 +0x913
main.main()
	/home/brentp/go/src/github.com/brentp/goleft/cmd/goleft/goleft.go:68 +0x17f```
@hdashnow
Copy link
Author

P.S. I think I've figured out how to calculate median coverage from the mosdepth output. It's not as fast as covstats, but still faster than most other tools out there.

@brentp
Copy link
Owner

brentp commented Jun 21, 2018

I think it should be straight-forward to support CRAM in covstats, I'll just do a system call to samtools view. I'll have a look at this next week if not sooner.

@hdashnow
Copy link
Author

Much appreciated, thank you!

@brentp
Copy link
Owner

brentp commented Jun 22, 2018

I just had a look at this and the cram index does not store the total number of mapped reads so it's not possible to estimate the coverage like that as it is for the bam index. we could iterate over a few cram slices, count reads, and not the byte offsets in the index, then use that rate and the total file size to estimate. this will be relatively accurate, but less-so than even the bam index estimate.

@aofarrel
Copy link

Are there any updates on this? I've been trying to implement a workflow involving covstats, and when it runs on CRAM files, the coverage reports as zero. Other stats appear to be accurate.

@brentp
Copy link
Owner

brentp commented Oct 26, 2020

given the lack of cram parser in go, I don't plan to support this. It's possible to get actual coverage for a 30X WGS cram in < 5 minutes using mosdepth.

@aofarrel
Copy link

aofarrel commented Oct 26, 2020

Thanks for your reply. I've noticed that covstats seems to run slowly on crams, even small ones. When you say there's no cram parser in go... what is it doing for the other stats? Apologies if this is a silly question, I'm very new to go.

In other words -- is covstats running on crams supported, just not for coverage? Or should I consider crams as not supported, period?

@brentp
Copy link
Owner

brentp commented Oct 26, 2020

Actually, I forgot what it was doing. It makes a system call to samtools which converts cram to bam, then parses the bam. So the sampling stuff (the parts that you notice are filled out from covstats) work fine with cram. But the mapped reads is taken from the bam index and not present in the cram index so it can't estimate coverage.

@jmonlong
Copy link

What would you recommend to very quickly extract mean/median coverage (+the insert size info) for CRAMs?
5 mins with mosdepth is pretty good but say we don't want that much info, just the median depth.

Would it make sense to slice the CRAM to a few random regions, convert to a small BAM and then use covstats?
We'd need to do this manually outside covstats, right? I see a "Regions" option but I'm not sure how it's used in the code, including if the full CRAM would be converted to BAM before computing coverage on these regions or not.

@brentp
Copy link
Owner

brentp commented Oct 27, 2020

I suppose I could make it calculate the mean for the first part of the chromosome with covstats.
As soon as you starting converting to bam, you're going to be going up in time, even if it's only part of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants