roaring bitmaps by default #9548
Sounds good, but can you document a version of the PR description in the doc file where the configuration is described (or create a separate bitmaps.md file and link to it), so that users can make an informed decision? I know the paper is linked, but it would be useful to have a summary of the pros and cons in the Druid docs themselves.
I added a 'compression' section to the segment documentation page that attempts to dissuade people from changing the defaults unless they have verified that the settings are in fact better for their use case. (That is why I left this out at first: people probably shouldn't change these unless they know what they are doing, imo.)
+1 after CI.
LGTM, including the new docs.
LGTM, thanks.
Description
I think it is finally time to switch the default bitmap type from CONCISE to Roaring. Druid with Roaring is well tested by now, and in most cases I think it will provide a better out-of-the-box experience: the speed is generally worth the potentially larger segments that come with high cardinalities. There will still be datasets with ultra-high-cardinality columns where CONCISE produces smaller segments because of the overhead of the Roaring format, but it makes more sense to me for the operator to opt into the smallest possible segments, at the potential cost of speed, than for that to be the default.
Related: http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf (though this paper apparently uses a custom, from-scratch version of Roaring)
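For operators who do want to opt back into CONCISE, the choice is made per ingestion task rather than globally. The following is a minimal sketch, assuming the `indexSpec` object of an ingestion `tuningConfig` with its `bitmap.type` field; the helper name `index_spec_with_bitmap` is hypothetical and exists only to illustrate the shape of the JSON fragment.

```python
import json

# Bitmap index types discussed in this PR; "roaring" is the proposed default.
SUPPORTED_BITMAP_TYPES = ("roaring", "concise")


def index_spec_with_bitmap(bitmap_type="roaring"):
    """Build an indexSpec fragment selecting the bitmap type (hypothetical helper)."""
    if bitmap_type not in SUPPORTED_BITMAP_TYPES:
        raise ValueError(f"unknown bitmap type: {bitmap_type!r}")
    # Assumed placement: this fragment is merged into the task's tuningConfig.
    return {"indexSpec": {"bitmap": {"type": bitmap_type}}}


if __name__ == "__main__":
    # Opt back into CONCISE for a datasource where segment size matters most.
    print(json.dumps(index_spec_with_bitmap("concise"), indent=2))
```

Leaving `indexSpec` unset would pick up whichever default the running Druid version ships with, so it is worth comparing segment size and query speed on real data before pinning either value.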
This PR has:
Key changed/added classes in this PR
DefaultBitmapSerdeFactory