
Faster bgzf compression/decompression with libdeflate? #124

Open
chrchang opened this issue Jun 29, 2018 · 2 comments

@chrchang

If the latest htslib is built on a system with libdeflate (https://github.com/ebiggers/libdeflate), .bam and .vcf.gz compression and decompression speed is roughly doubled over stock zlib, and substantially better than what you get with Intel and Cloudflare zlib as well. I've put together a simple cgo wrapper for libdeflate (see https://godoc.org/github.com/grailbio/base/compress/libdeflate), and confirmed that modifying hts/bgzf to use its functions over compress/gzip (or klauspost/compress/gzip) produces a similar speedup.

The catch is that the libdeflate package currently requires cgo, and it would take an unreasonable amount of work (from my perspective, anyway) to change this. I see that there's currently no cgo dependency anywhere in biogo, though it appears to have existed in the past. Does this disqualify the proposal, or is there a way to introduce a cgo dependency that you'd consider acceptable?

If the latter is true, I'll go ahead and create a pull request of the appropriate form.

@kortschak
Member

I am happy for this to happen if it is guarded by a build tag and off by default.
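
For illustration, a build-tag guard of this kind might look like the sketch below. It is a minimal example, not code from hts/bgzf: the tag name `libdeflate` and the identifiers `codecName`, `deflateBlock` and `inflateBlock` are all hypothetical. This file is the default pure-Go side, compiled unless the tag is set; a sibling file guarded by `//go:build libdeflate && cgo` would define the same names on top of the cgo wrapper.

```go
//go:build !libdeflate

// Default codec: compiled unless the (hypothetical) "libdeflate" build tag
// is set, so a plain "go build" never needs cgo.
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// codecName reports which implementation was compiled in.
const codecName = "compress/flate"

// deflateBlock compresses src as a raw DEFLATE stream using the standard library.
func deflateBlock(src []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, level)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(src); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflateBlock decompresses a raw DEFLATE stream produced by deflateBlock.
func inflateBlock(src []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(src))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	z, err := deflateBlock([]byte("hello hello hello"), flate.DefaultCompression)
	if err != nil {
		panic(err)
	}
	out, err := inflateBlock(z)
	if err != nil {
		panic(err)
	}
	fmt.Printf("codec=%s compressed=%d bytes restored=%q\n", codecName, len(z), out)
}
```

With this layout, `go build` selects the pure-Go file, and `go build -tags libdeflate` would select the cgo-backed sibling instead, which keeps the cgo dependency off by default.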

@chrchang
Author

Okay, sounds good.

I also noticed that existing dependencies are restricted to the standard library and other github.com/biogo repositories, so I assume a fork of the libdeflate wrapper should be placed somewhere around here. I'll plan on positioning it at hts/bgzf/libdeflate (since the library has a buffer-size-must-be-set-in-advance constraint which makes it only useful for bgzf and very similar use cases), but let me know if you'd rather organize this differently.
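
To illustrate why the fixed-buffer constraint fits bgzf, here is a minimal sketch of reading both sizes out of a block before decompressing it. Each BGZF block records its total length (BSIZE, in the gzip extra field) and its uncompressed length (ISIZE, in the trailer), and ISIZE never exceeds 64 KiB, so the output buffer can be allocated up front. The function names are hypothetical, `compress/flate` stands in for libdeflate, and the parser assumes the `BC` subfield is the only extra subfield, as in blocks written by htslib.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
)

// eofBlock is the standard 28-byte empty BGZF block that terminates a file.
var eofBlock = []byte{
	0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, // gzip header
	0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, // XLEN, 'B','C', SLEN, BSIZE
	0x03, 0x00, // empty DEFLATE stream
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // CRC32, ISIZE
}

const (
	headerLen  = 18 // fixed header when BC is the only extra subfield
	trailerLen = 8  // CRC32 + ISIZE
)

// blockSizes returns the total block length and the uncompressed length.
func blockSizes(b []byte) (total, isize int, err error) {
	if len(b) < headerLen+trailerLen {
		return 0, 0, errors.New("bgzf: short block")
	}
	if b[0] != 0x1f || b[1] != 0x8b || b[2] != 8 || b[3]&0x04 == 0 ||
		binary.LittleEndian.Uint16(b[10:12]) != 6 || b[12] != 'B' || b[13] != 'C' {
		return 0, 0, errors.New("bgzf: unsupported header layout")
	}
	total = int(binary.LittleEndian.Uint16(b[16:18])) + 1 // BSIZE is total-1
	if total < headerLen+trailerLen || total > len(b) {
		return 0, 0, errors.New("bgzf: bad BSIZE")
	}
	isize = int(binary.LittleEndian.Uint32(b[total-4 : total]))
	if isize > 65536 {
		return 0, 0, errors.New("bgzf: ISIZE exceeds 64 KiB")
	}
	return total, isize, nil
}

// decodeBlock inflates one block into a buffer sized from ISIZE in advance.
func decodeBlock(b []byte) ([]byte, error) {
	total, isize, err := blockSizes(b)
	if err != nil {
		return nil, err
	}
	dst := make([]byte, isize) // size known before decompression starts
	r := flate.NewReader(bytes.NewReader(b[headerLen : total-trailerLen]))
	defer r.Close()
	if _, err := io.ReadFull(r, dst); err != nil {
		return nil, err
	}
	if crc32.ChecksumIEEE(dst) != binary.LittleEndian.Uint32(b[total-8:total-4]) {
		return nil, errors.New("bgzf: CRC mismatch")
	}
	return dst, nil
}

func main() {
	total, isize, err := blockSizes(eofBlock)
	if err != nil {
		panic(err)
	}
	out, err := decodeBlock(eofBlock)
	if err != nil {
		panic(err)
	}
	fmt.Printf("total=%d isize=%d decoded=%d bytes\n", total, isize, len(out))
}
```

A general-purpose gzip stream gives no such bound on the output size before decompression finishes, which is the reason a wrapper with this constraint is of little use outside bgzf-like formats.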
