Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable compressRunOnSerialization for Roaring bitmaps. #3228

Merged
merged 1 commit into from
Jul 8, 2016

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Jul 7, 2016

To get the benefit of the changes in metamx/bytebuffer-collections#15 we need to pass compressRunOnSerialization = true.

@gianm gianm added this to the 0.9.2 milestone Jul 7, 2016
@gianm gianm force-pushed the roaring-compress-configure branch from dd7a743 to be4d494 Compare July 7, 2016 13:32
@lemire
Copy link

lemire commented Jul 7, 2016

This is important.

@lemire
Copy link

lemire commented Jul 7, 2016

Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016.
http://arxiv.org/abs/1603.06549

@fjy
Copy link
Contributor

fjy commented Jul 7, 2016

👍

|Field|Type|Description|Required|
|-----|----|-----------|--------|
|`type`|String|Must be `roaring`.|yes|
|`compressRunOnSerialization`|Boolean|Use a run-length encoding where it is estimated as more space efficient.|no (default == `false`)|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not enable RLE by default ?

@nishantmonu51
Copy link
Member

LGTM, 👍 , would be nice to make RLE enable by default in case it does not have any significant overheads.

@gianm
Copy link
Contributor Author

gianm commented Jul 7, 2016

@nishantmonu51 The main reason I didn't make it default was that the discussion in metamx/bytebuffer-collections#15 indicated that the RLE containers are not readable by pre 0.5 versions of roaring. Druid updated to 0.5 in 0.8.2 (#1759), so that means bitmaps written with this option won't be readable by Druid 0.8.1 or earlier.

Maybe that's fine?

@lemire is the above correct?

@lemire
Copy link

lemire commented Jul 7, 2016

@gianm

No, there should be no significant downside to activating run compression. In theory, there is a tiny overhead during serialization, but it is small, and often more than compensated by storage savings so that the end result is faster overall serialization. In the worst case, run compression brings little storage benefit and slightly lower performance, but there should be many cases where you have a large net win. We recommend it as the default.

As for compatibility problems... I think that what you are saying is correct...

RLE containers are not readable by pre 0.5 versions of roaring.

... but I am not sure there is a problem.

Roaring is backward compatible. So the library will happily load bitmaps without run compression. The older versions of the library will balk at bitmaps with run compression, but then why would it ever have to read them? It would only be a problem if someone upgraded Druid, rewrote the indexes, and then downgraded Druid... then you'd get (clean) exceptions (something about an unrecognized cookie).

If this ever happened and you wanted to make you index "go back in time", that is, undo the added compression, the Roaring API got you covered if you ever want to support this... https://github.com/RoaringBitmap/RoaringBitmap/blob/master/src/main/java/org/roaringbitmap/buffer/MutableRoaringBitmap.java#L1194

Please note that Roaring uses header cookies. This comes at a tiny cost storage-wise, but it has the benefit that the Roaring bitmaps are "self described". It limits the risk regarding data corruption. That is, you don't get Roaring trying to load and interpret garbage... instead, it balks after reading one word and throws up an exception which you can quickly interpret.

@gianm
Copy link
Contributor Author

gianm commented Jul 7, 2016

It would only be a problem if someone upgraded Druid, rewrote the indexes, and then downgraded Druid...

Yeah, that's the case I was thinking of. It can happen when someone updates a cluster, decides something about it isn't working for them after a few days, and rolls it back. Maybe that's fine though, we could point this out in the release notes and anyone concerned about that could set the property to false until they're confident they won't need to roll back.

I'm OK with setting the default to "true" and including something in the release notes. Anyone care to chime in on that plan?

@drcrallen
Copy link
Contributor

@gianm 👍 We only officially support version-to-version upgrades, not jumping around. So as long as a rollback can occur between, say, 0.9.2 and 0.9.1, I think its OK. Anyone expecting to roll back to 0.8.1 from 0.9.2 has other problems.

Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
@gianm
Copy link
Contributor Author

gianm commented Jul 7, 2016

@fjy @nishantmonu51 @drcrallen @lemire I changed the default to true and updated the docs

@nishantmonu51
Copy link
Member

thanks @gianm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants