clone smallest bitmap, adjust naive/workshy and thresholds in FastAggregation.and
#612
I don't have time to evaluate this exhaustively, and it has already been determined to be effective by Apache Pinot, but I ran a quick benchmark on my laptop:
before:
after:
@@ -35,7 +35,7 @@ public static RoaringBitmap and(Iterator<? extends RoaringBitmap> bitmaps) {
    * @return aggregated bitmap
    */
   public static RoaringBitmap and(RoaringBitmap... bitmaps) {
-    if (bitmaps.length > 2) {
+    if (bitmaps.length > 10) {
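For readers unfamiliar with the "clone smallest bitmap" idea in the title, here is a minimal sketch of a naive N-way AND that starts from a copy of the smallest input. It uses `java.util.BitSet` as a stand-in for `RoaringBitmap`, and the class and method names (`NaiveAndSketch`, `naiveAnd`) are hypothetical; this is not the library's implementation, only an illustration of why starting from the smallest input keeps the copy and every later intersection cheap.

```java
import java.util.BitSet;

class NaiveAndSketch {
    /** Intersects all inputs without mutating any of them (illustrative only). */
    static BitSet naiveAnd(BitSet... bitmaps) {
        if (bitmaps.length == 0) {
            return new BitSet();
        }
        // Pick the input with the fewest set bits: the result can never be larger than it.
        BitSet smallest = bitmaps[0];
        for (BitSet b : bitmaps) {
            if (b.cardinality() < smallest.cardinality()) {
                smallest = b;
            }
        }
        // Copy only the smallest input, then AND the others into the copy.
        BitSet result = (BitSet) smallest.clone();
        for (BitSet b : bitmaps) {
            if (result.isEmpty()) {
                break; // nothing left to intersect
            }
            if (b != smallest) {
                result.and(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0, 1000); // bits 0..999
        BitSet b = new BitSet();
        b.set(10);
        b.set(20);
        b.set(5000);
        BitSet evens = new BitSet();
        for (int i = 0; i < 1000; i += 2) {
            evens.set(i);
        }
        System.out.println(naiveAnd(a, b, evens)); // prints {10, 20}
    }
}
```

In this sketch the smallest input is chosen by cardinality; a size-in-bytes measure would serve the same purpose.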
Optionally, '10' could be made a named variable with a comment indicating that it is based on a heuristic (so we know it is a rule of thumb).
The user can already call the underlying methods if they want to, and I think the parameter would be confusing.
@richardstartin This is a fine answer.
@frensjan Would you be willing to check this out and maybe run a benchmark?
Yes, I'd be happy to. Sorry for the lack of response on #608, still enjoying my winter vacation :)
LGTM! In my macro benchmarking for a particular query, ...
In my work I dropped the use of ...

    RoaringBitmap postings = null;
    for( ... ) {
        RoaringBitmap p = readBitmap(...);
        postings = and(postings, p);
    }

    private static RoaringBitmap and(RoaringBitmap a, RoaringBitmap b) {
        if (a == null) {
            return b;
        }
        // Clone the smaller bitmap so the copy and the in-place AND touch less data.
        if (a.getLongSizeInBytes() < b.getLongSizeInBytes()) {
            RoaringBitmap result = a.clone();
            result.and(b);
            return result;
        } else {
            RoaringBitmap result = b.clone();
            result.and(a);
            return result;
        }
    }

This improvement in ...
I've also tried a variant with

    if (a.cardinalityExceeds(b.getLongCardinality())) {
        ...
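The accumulate-as-you-read pattern in the comment above can be tried without the library. The following is a rough, self-contained sketch that swaps in `java.util.BitSet` for `RoaringBitmap` and `BitSet.size()` for `getLongSizeInBytes()`; the class name `PairwiseAndSketch` and the `intersectAll` helper are hypothetical and only mirror the shape of the snippet.

```java
import java.util.BitSet;
import java.util.List;

class PairwiseAndSketch {
    /** Pairwise AND that clones the smaller operand; a null accumulator means "no input yet". */
    static BitSet and(BitSet a, BitSet b) {
        if (a == null) {
            return b; // same instance as the input, as in the snippet above
        }
        // size() is the number of bits of storage in use, a stand-in for a size-in-bytes check.
        if (a.size() < b.size()) {
            BitSet result = (BitSet) a.clone();
            result.and(b);
            return result;
        } else {
            BitSet result = (BitSet) b.clone();
            result.and(a);
            return result;
        }
    }

    /** Folds the pairwise AND over a sequence of inputs; returns null for an empty list. */
    static BitSet intersectAll(List<BitSet> postingsLists) {
        BitSet postings = null;
        for (BitSet p : postingsLists) {
            postings = and(postings, p);
        }
        return postings;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        a.set(2);
        a.set(3);
        a.set(100_000);
        BitSet b = new BitSet();
        b.set(2);
        b.set(3);
        BitSet c = new BitSet();
        c.set(3);
        c.set(7);
        System.out.println(intersectAll(List.of(a, b, c))); // prints {3}
    }
}
```

Because every step after the first returns a fresh clone, none of the inputs is modified, at the cost of one small copy per step.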
@lemire I'm pretty confident that this isn't going to cause problems, so I recommend we merge this.
Thanks for the feedback @frensjan, much appreciated.
Merging. I will issue a release.
The release doesn't seem to have propagated yet, but it looks like the tag was issued 45 minutes ago.
Yes. A new release is upcoming. |
### What changes were proposed in this pull request?
This PR aims to upgrade RoaringBitmap to 0.9.38.

### Why are the changes needed?
This version brings a bug fix:
- RoaringBitmap/RoaringBitmap#613

a performance optimization:
- RoaringBitmap/RoaringBitmap#612

other changes as follows:
RoaringBitmap/RoaringBitmap@0.9.36...0.9.38

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #39613 from LuciferYang/SPARK-42092.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
This is based on observations made in Apache Pinot and separately in #608