Add Blosc2 filter #969

Merged
merged 100 commits into from Dec 9, 2022

oscargm98 (Collaborator) commented Oct 26, 2022

This PR implements Blosc2 filter support. For more details see: RELEASE_NOTES.rst

t20100 commented Nov 28, 2022

Some early feedback from starting to integrate the blosc2 filter into hdf5plugin (silx-kit/hdf5plugin#201), and to answer silx-kit/hdf5plugin#201 (comment):

  • For the default number of threads, I would use 1 and let users change it if needed (through an env var?). The issue I see with defaulting to anything other than 1 is that it is not a good default when using multiprocessing.
  • For the filter cd_values, I've just checked and it seems possible to pass an arbitrary number of integers. What about leveraging this to allow setting a pipeline of filters, rather than using the same parameters as for blosc1? For example, passing (0, 0, 0, 0, compression, compression level, filter1, filter1_param, filter2, filter2_param, varying number...). It probably does not cover all cases, but would already enable using delta or trunc_prec + bitshuffle. Otherwise this is only available through direct chunk write.

FrancescAlted (Member) commented:

> Some early feedback from starting to integrate the blosc2 filter into hdf5plugin (silx-kit/hdf5plugin#201), and to answer silx-kit/hdf5plugin#201 (comment):
>
>   • For the default number of threads, I would use 1 and let users change it if needed (through an env var?). The issue I see with defaulting to anything other than 1 is that it is not a good default when using multiprocessing.

That's a good suggestion. We have just implemented support for BLOSC_* env vars (in particular BLOSC_NTHREADS) for the super-chunk API in C-Blosc2 2.5.0, and are working towards releasing a Python-Blosc2 that includes C-Blosc2 2.5.0. Once that is available, we can set the default number of threads to 1 and let users control it via BLOSC_NTHREADS.
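A minimal sketch of the behaviour discussed here: default to a single thread unless BLOSC_NTHREADS says otherwise. The helper name `default_nthreads` is hypothetical and this is not the C-Blosc2 implementation; in particular, falling back to 1 for missing, non-numeric, or non-positive values is an assumption made for illustration.

```python
import os


def default_nthreads(env=None):
    """Return the thread count to use (hypothetical helper, not a Blosc2 API).

    Uses BLOSC_NTHREADS when it holds a positive integer; otherwise returns 1.
    The fallback rules are an assumption for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("BLOSC_NTHREADS")
    if raw is None:
        return 1  # variable not set: single-threaded default
    try:
        nthreads = int(raw)
    except ValueError:
        return 1  # not an integer: fall back to the default
    return nthreads if nthreads > 0 else 1
```

With a default of 1, a pool of N worker processes starts N compression threads in total rather than N times the core count; a user who does not use multiprocessing could opt in with something like `BLOSC_NTHREADS=4` in the environment.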

>   • For the filter cd_values, I've just checked and it seems possible to pass an arbitrary number of integers. What about leveraging this to allow setting a pipeline of filters, rather than using the same parameters as for blosc1? For example, passing (0, 0, 0, 0, compression, compression level, filter1, filter1_param, filter2, filter2_param, varying number...). It probably does not cover all cases, but would already enable using delta or trunc_prec + bitshuffle. Otherwise this is only available through direct chunk write.

Good to know and agreed. Let's start with something simple first and let's see how we can accommodate more parameters in the future.
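To make the proposed layout concrete, here is a rough sketch of how a variable-length cd_values tuple of the form (0, 0, 0, 0, compression, compression level, filter1, filter1_param, ...) could be packed and unpacked. This only illustrates the suggestion in the quoted comment; it is not what the merged filter does. The function names and the numeric codes in the usage note are hypothetical placeholders.

```python
def pack_cd_values(compression, clevel, filters=()):
    """Build a cd_values tuple in the proposed layout (hypothetical helper).

    Layout: four leading zeros, then compression and compression level,
    followed by one (filter, filter_param) pair per pipeline stage.
    """
    values = [0, 0, 0, 0, compression, clevel]
    for filter_id, filter_param in filters:
        values.extend((filter_id, filter_param))
    return tuple(values)


def unpack_cd_values(cd_values):
    """Split a tuple built by pack_cd_values back into its parts."""
    n_fixed = 6  # four leading slots + compression + compression level
    if len(cd_values) < n_fixed or (len(cd_values) - n_fixed) % 2:
        raise ValueError("expected 6 fixed values followed by (filter, param) pairs")
    compression, clevel = cd_values[4], cd_values[5]
    rest = cd_values[n_fixed:]
    filters = list(zip(rest[0::2], rest[1::2]))
    return compression, clevel, filters
```

For example, with placeholder codes 5 for the compressor, 4 for a trunc_prec-style filter keeping 10 bits, and 2 for bitshuffle, `pack_cd_values(5, 9, [(4, 10), (2, 0)])` yields a ten-element tuple that `unpack_cd_values` turns back into the same compressor, level, and two-stage pipeline.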

FrancescAlted (Member) commented:

We think this is mostly ready to be merged. If nobody objects, we will merge this and start the release process next week.

avalentino (Member) left a comment

It is mostly OK for me.
Just a minor comment.

doc/source/usersguide/installation.rst (outdated, resolved)
requirements.txt (outdated, resolved)
FrancescAlted merged commit c9b8aa8 into master Dec 9, 2022
FrancescAlted deleted the direct-chunking-blosc2 branch December 9, 2022 09:51
5 participants