Support for bytedelta filter #456
Using the latest bytedelta version here, together with one Python script for downloading data and another for transcoding from the ERA5 dataset, I am getting pretty impressive figures. Using regular shuffle in Blosc2:
Using shuffle + bytedelta:
Using bitshuffle:
All in all, bar the |
LGTM!
This bytedelta filter computes a delta for every byte in a data stream. It is implemented as a plugin for Blosc2 and is mainly meant to be used after a shuffle filter.
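To make the idea concrete, here is a minimal pure-Python sketch of shuffle followed by bytedelta. It is not the plugin's code (which is C and uses SIMD); the function names are hypothetical, and it assumes that each of the `typesize` byte streams produced by shuffle is delta-encoded independently, starting from a previous value of 0.

```python
import struct


def shuffle(data: bytes, typesize: int) -> bytes:
    """Byte shuffle: gather byte 0 of every item, then byte 1, and so on."""
    assert len(data) % typesize == 0
    return b"".join(data[k::typesize] for k in range(typesize))


def bytedelta_encode(data: bytes, typesize: int) -> bytes:
    """Replace each byte with its difference (mod 256) from the previous byte.

    Assumption: the input is already shuffled, so it consists of `typesize`
    contiguous streams, and the delta restarts from 0 on each stream.
    """
    n = len(data) // typesize
    out = bytearray(len(data))
    for k in range(typesize):
        prev = 0
        for i in range(k * n, (k + 1) * n):
            out[i] = (data[i] - prev) & 0xFF
            prev = data[i]
    return bytes(out)


def bytedelta_decode(data: bytes, typesize: int) -> bytes:
    """Inverse of bytedelta_encode: running sum (mod 256) per stream."""
    n = len(data) // typesize
    out = bytearray(len(data))
    for k in range(typesize):
        prev = 0
        for i in range(k * n, (k + 1) * n):
            prev = (data[i] + prev) & 0xFF
            out[i] = prev
    return bytes(out)


# Eight slowly increasing little-endian uint32 values, as a stand-in for
# smooth numerical data.
raw = struct.pack("<8I", *range(1000, 1008))
shuffled = shuffle(raw, 4)
encoded = bytedelta_encode(shuffled, 4)
assert bytedelta_decode(encoded, 4) == shuffled
```

On smooth data like this, the low-order byte stream turns into a run of small, repeated values after the delta step, which is what the downstream codec can then compress more easily.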
This is essentially based on @aras-p's fascinating blog post: https://aras-p.info/blog/2023/03/01/Float-Compression-7-More-Filtering-Optimization/, but with the shuffle part removed. For speed, it would make sense to intertwine shuffle and bytedelta, as is done in the blog, but that would require far more work, since Blosc supports shuffle for general (2 to 255) typesizes (channels in the blog's jargon).
This is still somewhat preliminary (I need to assess the new dependency on SSE4.1). Some benchmarks should follow soon too.