Support more compression formats #408

Open
eschnett opened this issue Nov 8, 2023 · 3 comments

@eschnett
eschnett commented Nov 8, 2023

CHORD (https://www.chord-observatory.ca) is a radio telescope currently under construction in Canada. We are considering and experimenting with file formats for various data products, and ASDF looks interesting because it (a) is simple and (b) can be streamed efficiently.

In the past, compression algorithms very similar to Blosc's https://www.blosc.org/pages/ "bitshuffle" have proven very useful. I wonder whether these could be added to the standard.
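
To illustrate the idea behind a bitshuffle filter, here is a minimal pure-Python sketch. It transposes the bits of fixed-size elements so that bit 0 of every element is stored first, then bit 1, and so on; for slowly varying data the high-order bit planes become long runs of zeros, which a general-purpose compressor handles well. This is not the Blosc implementation (which works block-wise and uses its own layout), and the names `bitshuffle` and `bitunshuffle` are made up for this illustration.

```python
import struct
import zlib


def bitshuffle(data: bytes, itemsize: int) -> bytes:
    """Group bit k of every element together (illustrative layout only)."""
    nelem, rem = divmod(len(data), itemsize)
    if rem or nelem % 8:
        raise ValueError("need a multiple of 8 whole elements")
    values = [
        int.from_bytes(data[i * itemsize:(i + 1) * itemsize], "little")
        for i in range(nelem)
    ]
    out = bytearray()
    for bit in range(itemsize * 8):
        # Pack bit `bit` of 8 consecutive elements into one output byte.
        for start in range(0, nelem, 8):
            byte = 0
            for k in range(8):
                byte |= ((values[start + k] >> bit) & 1) << k
            out.append(byte)
    return bytes(out)


def bitunshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of bitshuffle()."""
    nbits = itemsize * 8
    plane_len = len(data) // nbits  # bytes per bit plane
    values = [0] * (plane_len * 8)
    for bit in range(nbits):
        plane = data[bit * plane_len:(bit + 1) * plane_len]
        for j, byte in enumerate(plane):
            for k in range(8):
                values[j * 8 + k] |= ((byte >> k) & 1) << bit
    return b"".join(v.to_bytes(itemsize, "little") for v in values)


# Small demo: 16-bit samples that only use their low 4 bits.
samples = struct.pack("<16H", *range(16))
shuffled = bitshuffle(samples, 2)
print(len(zlib.compress(samples)), len(zlib.compress(shuffled)))
```

In the demo, only the first four bit planes carry information; the remaining twelve are all zero bytes after shuffling.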

As an experiment, I have added support for c-blosc, c-blosc2, and zstd to https://github.com/eschnett/asdf-cxx . Would you be interested, in principle, in augmenting the standard to use blsc, bls2, and zstd as the compression strings for these?
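
For context on why the proposed strings are four characters long, here is a rough sketch of packing an ASDF binary block header in Python. The layout follows my reading of the ASDF standard (magic token, big-endian header size, flags, 4-byte compression field, three sizes, MD5 checksum) and should be checked against the standard before relying on it; the helper name `pack_block_header` is hypothetical.

```python
import hashlib
import struct

BLOCK_MAGIC = b"\xd3BLK"
# Assumed field order: flags, compression, allocated size, used size,
# data size, MD5 checksum of the stored payload. All big-endian.
HEADER_FMT = ">I4sQQQ16s"


def pack_block_header(compression: bytes, payload: bytes, data_size: int) -> bytes:
    """Build a block header for an already-compressed payload."""
    if len(compression) != 4:
        raise ValueError("compression label must be exactly 4 bytes")
    header = struct.pack(
        HEADER_FMT,
        0,                              # flags: not a streamed block
        compression,                    # e.g. b"zstd", b"blsc", b"bls2"
        len(payload),                   # allocated size on disk
        len(payload),                   # used size on disk
        data_size,                      # size after decompression
        hashlib.md5(payload).digest(),  # checksum of the stored bytes
    )
    return BLOCK_MAGIC + struct.pack(">H", len(header)) + header


header = pack_block_header(b"zstd", b"compressed bytes", 1024)
print(len(header), header[10:14])
```

Because the compression field is a fixed 4-byte slot, any new algorithm needs a label that fits it, which is why the labels must be agreed on between implementations.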

@braingram
Contributor

Thanks for opening an issue and for sharing your work. It's exciting to see another asdf implementation!

There has been some recent discussion about adding zstd support to asdf (see PR: asdf-format/asdf#1570). The Python asdf library now supports adding compression algorithms via extensions (see this example adding zstd support: https://github.com/braingram/asdf-zstd), so we'd like to soon create a new asdf-compressors package that adds a number of compression algorithms (the roadmap mentions this plan: https://github.com/asdf-format/asdf/wiki/Roadmap#changes-not-tied-to-a-particular-version). It would be great to coordinate this with asdf-cxx to make sure the labels match and the features are compatible.
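
As a rough illustration of why matching labels matter, a reader typically dispatches on the 4-byte label stored with each block. The sketch below is not the asdf extension API; it is a standalone registry using only the Python standard library, with the existing "zlib" and "bzp2" labels as examples, and the names `CODECS`, `register`, and `decode_block` are hypothetical.

```python
import bz2
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# label -> (compress, decompress)
CODECS: Dict[bytes, Codec] = {
    b"zlib": (zlib.compress, zlib.decompress),
    b"bzp2": (bz2.compress, bz2.decompress),
}


def register(label: bytes, compress, decompress) -> None:
    """Add a codec under a new 4-byte label, refusing duplicates."""
    if len(label) != 4:
        raise ValueError("label must be exactly 4 bytes")
    if label in CODECS:
        raise ValueError(f"label {label!r} is already registered")
    CODECS[label] = (compress, decompress)


def decode_block(label: bytes, payload: bytes) -> bytes:
    """Decompress a block payload according to its label."""
    if label == b"\0\0\0\0":  # four null bytes: stored uncompressed
        return payload
    try:
        _, decompress = CODECS[label]
    except KeyError:
        raise ValueError(f"unsupported compression label {label!r}") from None
    return decompress(payload)
```

If one implementation wrote a block as "zstd" and another expected a different label for the same algorithm, `decode_block` would reject a file that is otherwise perfectly readable.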

I will give asdf-cxx a closer look. Have you done much testing with files written by asdf-cxx and read by the Python (or IDL) implementation of asdf (and vice versa)? It would be great to hear more about asdf-cxx and your impressions of asdf.

@braingram
Contributor

braingram commented Nov 9, 2023

FYI: I ran your demo-compression example (thanks for providing that with your code!). I had to modify it slightly so that it does not attempt to save using blosc2 (I didn't immediately find it on Homebrew). The file it generated was readable in Python with the new modifications to the asdf-compression package (this is a work in progress and I hope to move it to the asdf-format organization soon). I opened an issue to track some compatibility tests (there is one other package that has already added some form of blosc support via an extension): https://github.com/braingram/asdf-compression/issues/3

@eschnett
Author

I think blosc2 is not available from Homebrew, Debian, etc. The main difference between blosc2 and blosc is that the former supports uncompressed data sizes larger than 2 GByte. For the time being just using blosc would be good enough.
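
A writer that supports both could pick the label from the uncompressed size, along these lines. This is only a sketch: the exact limit is my assumption (a signed 32-bit maximum minus a 16-byte header overhead), and `pick_blosc_label` is a hypothetical helper, not part of asdf-cxx.

```python
# Assumed blosc (version 1) limit on the uncompressed buffer size:
# INT_MAX minus 16 bytes of header overhead, i.e. just under 2 GiB.
BLOSC1_MAX_BUFFERSIZE = 2**31 - 1 - 16


def pick_blosc_label(nbytes: int) -> bytes:
    """Return b"blsc" when blosc can hold the data, else b"bls2"."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    return b"blsc" if nbytes <= BLOSC1_MAX_BUFFERSIZE else b"bls2"


print(pick_blosc_label(1024))          # small array
print(pick_blosc_label(3 * 2**30))     # 3 GiB array
```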

To follow suit, I am now adding support for liblz4 as a compressor. I think you're using lz4f as the token.
