Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COMPAT] New split mode for forward compatibility by default #216

Merged
merged 10 commits into from Feb 22, 2018

Conversation

FrancescAlted
Copy link
Member

New public function blosc_set_splitmode() for selecting another split mode than default. The supported ones are:

  • BLOSC_FORWARD_COMPAT_SPLIT (offers reasonably forward compatibility)
  • BLOSC_AUTO_SPLIT (for nearly optimal results (based on heuristics))
  • BLOSC_NEVER_SPLIT (for experimenting when trying to get compression ratios or speed)
  • BLOSC_ALWAYS_SPLIT (for experimenting when trying to get compression ratios or speed)

Still need to implement support for setting the split mode via environment variables.

…splitmode() for selecting another split mode.
@FrancescAlted
Copy link
Member Author

Adresses #215.

@kalvdans
Copy link
Contributor

kalvdans commented Feb 16, 2018

The patch looks good to me. Very nice to add tests for previous formats.

Should we add a note in the format description in README_HEADER.rst that compressor flags have changed meaning without bumping the compressor version? That will help other implementations also be forward compatible without having to dig into the git history.

Am I correct assuming the threadsafe API blosc_compress_ctx() will always use the reasonably forward compatibility mode? We should probably add a note to that function.

(I am a colleague to @estan)

@FrancescAlted
Copy link
Member Author

The patch looks good to me. Very nice to add tests for previous formats.

The tests are not there yet (only the files), but my plan is to add them soon.

Should we add a note in the format description in README_HEADER.rst that compressor flags have changed meaning without bumping the compressor version? That will help other implementations also be forward compatible without having to dig into the git history.

Yes, the plan is to add a section about this in the RELEASE_NOTES.rst. However, the format has not changed this time; the change is only about having the split flag activated or not by default, so there should be not a need to change README_HEADER.rst. The main message in the release notes should be: "from now on, all the buffers created starting with blosc 1.14.0 will be forward compatible with any previous versions of the library, at least until 1.3.0, when support for multi-codecs was introduced, and that of zstd codec is not used (zstd was introduced in 1.11.0). Versions from 1.11.0 to 1.14.0 might generate buffers that cannot be read with versions < 1.11.0 (this was due to a forward incompatible change introduced in 1.11.1).".

Am I correct assuming the threadsafe API blosc_compress_ctx() will always use the reasonably forward compatibility mode? We should probably add a note to that function.

Yes. The new 'BLOSC_FORWARD_COMPAT_SPLIT' mode will be the default for all the compression functions. And the blosc_set_splitmode() function will affect to all compression functions too.

@kalvdans
Copy link
Contributor

However, the format has not changed this time;

I was unclear, I meant the format change done in 9d06255. To avoid similar problems in future we should check that the reserved bits are zeros (I can make a pull request).

Your suggested release note sounds detailed and good!

@FrancescAlted
Copy link
Member Author

I see. Agreed, I should have bumped the BLOSC_VERSION_FORMAT when I did
9d06255. Please feel free to file a PR about this.

@estan
Copy link

estan commented Feb 16, 2018

This looks great.

Some comments:

I think the explanation of what BLOSC_FORWARD_COMPAT_SPLIT actually means should also go in the code documentation, not just the release notes.

But, I'm actually a little worried about the naming here. If I casually read BLOSC_FORWARD_COMPAT_SPLIT in some code, I'd assume that means forward compatibility is ensured, period. But when in fact, it's forward compat with a twist (not forward compatible if ZSTD is used).

Correct me if I'm wrong, but it is in fact the BLOSC_ALWAYS_SPLIT mode that ensures forward compatibility with all previous versions?

How about naming it like this:

  • BLOSC_DEFAULT_SPLIT
  • BLOSC_OPTIMAL_SPLIT
  • BLOSC_NEVER_SPLIT
  • BLOSC_ALWAYS_SPLIT
  • BLOSC_FORWARD_COMPAT_SPLIT (alias of BLOSC_ALWAYS_SPLIT)

The advantage is that when a user sees BLOSC_DEFAULT_SPLIT, will immediately know

  1. It is the default.
  2. The docs must be consulted to know what the meaning is (the name does not convey the semantics, which are a little complicated)

and when a user sees BLOSC_FORWARD_COMPAT_SPLIT, that is indeed (as can expected by the name) the mode that gives forward compatibility with all previous versions.

@estan
Copy link

estan commented Feb 16, 2018

Sorry, my bad! I had missed the fact that zstd was introduced in the very same version as the no splitting stuff (I thought it was a couple of releases earlier). So the names do indeed make sense.

@FrancescAlted FrancescAlted merged commit 4719f7e into master Feb 22, 2018
@FrancescAlted FrancescAlted deleted the forward-compat branch February 22, 2018 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants