Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clone smallest bitmap, adjust naive/workshy and thresholds in FastAggregation.and #612

Merged
merged 1 commit into from Jan 4, 2023

Conversation

richardstartin
Copy link
Member

This is based on observations made in Apache Pinot and separately in #608

  • for small numbers of bitmaps, naive_and can be faster than the more complex workShyAnd method
  • putting the smallest bitmap first in the aggregation can accelerate intersections

@richardstartin richardstartin changed the title clone smallest bitmap, adjust naive/workshy and thresholds clone smallest bitmap, adjust naive/workshy and thresholds in FastAggregation.and Jan 3, 2023
@richardstartin
Copy link
Member Author

I don't have time to evaluate this exhaustively, this has been determined to be effective already by Apache Pinot, but ran a quick benchmark on my laptop:

before:

Benchmark                         (constructionStrategy)  (count)  (probability)  (seed)   (size)   Mode  Cnt      Score     Error  Units
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10           0.01   99999  1000000  thrpt    5   8469.991 ± 710.023  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10            0.1   99999  1000000  thrpt    5  18620.905 ± 229.084  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10            0.5   99999  1000000  thrpt    5  12077.086 ±  21.410  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100           0.01   99999  1000000  thrpt    5   1444.833 ± 135.439  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100            0.1   99999  1000000  thrpt    5    163.065 ±   3.768  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100            0.5   99999  1000000  thrpt    5    457.596 ±  15.069  ops/s

after:

Benchmark                         (constructionStrategy)  (count)  (probability)  (seed)   (size)   Mode  Cnt      Score      Error  Units
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10           0.01   99999  1000000  thrpt    5   9597.927 ± 1438.935  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10            0.1   99999  1000000  thrpt    5  18970.302 ±  186.505  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY       10            0.5   99999  1000000  thrpt    5  12065.076 ±   50.169  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100           0.01   99999  1000000  thrpt    5   1459.275 ±   19.018  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100            0.1   99999  1000000  thrpt    5    163.538 ±    2.663  ops/s
FastAggregationRLEStressTest.and         CONSTANT_MEMORY      100            0.5   99999  1000000  thrpt    5    457.225 ±    8.471  ops/s

@@ -35,7 +35,7 @@ public static RoaringBitmap and(Iterator<? extends RoaringBitmap> bitmaps) {
* @return aggregated bitmap
*/
public static RoaringBitmap and(RoaringBitmap... bitmaps) {
if (bitmaps.length > 2) {
if (bitmaps.length > 10) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optionally '10' could be made a variable with a comment indicating that it is based on a heuristic (so we know it is a rule of thumb).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can already call the underlying methods if they want to and I think the parameter would be confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardstartin This is a fine answer.

@lemire
Copy link
Member

lemire commented Jan 3, 2023

@frensjan Would you be willing to check this out and maybe run a benchmark?

@frensjan
Copy link

frensjan commented Jan 4, 2023

Yes, I'd be happy to. Sorry for the lack of response on #608, still enjoying my winter vacation :)

@frensjan
Copy link

frensjan commented Jan 4, 2023

LGTM!

In my macro benchmarking for a particular query, FastAggregation.and used to dominate the benchmark at ~80% in cloning a large bitmap to ~4% in FastAggregation.and in total.

In my work I dropped the use of FastAggregation.and(...) and incrementally computed the intersection. Something like:

RoaringBitmap postings = null;

for( ... ) {
    RoaringBitmap p = readBitmap(...);
    postings = and(postings, p);
}

private static RoaringBitmap and(RoaringBitmap a, RoaringBitmap b) {
    if (a == null) {
        return b;
    }

    if (a.getLongSizeInBytes() < b.getLongSizeInBytes()) {
        RoaringBitmap result = a.clone();
        result.and(b);
        return result;
    } else {
        RoaringBitmap result = b.clone();
        result.and(a);
        return result;
    }
}

This improvement in FastAggregation yields slightly faster results in my macro benchmark. The difference seems to stem from the inability to use a cheaper metric to compare the bitmaps on. getLongSizeInBytes comes up as a fairly significant method CPU-wise in profiling; bitmap.highLowContainer.size() as @richardstartin is using should be way cheaper.

I've also tried a variant with cardinalityExceeds but that doesn't make much of a difference.

if (a.cardinalityExceeds(b.getLongCardinality())) {
    ...
`

@richardstartin
Copy link
Member Author

@lemire I'm pretty confident that this isn't going to cause problems so I recommend we merge this

@richardstartin
Copy link
Member Author

Thanks for the feedback @frensjan, much appreciated.

@lemire
Copy link
Member

lemire commented Jan 4, 2023

Merging.

I will issue a release.

@lemire lemire merged commit b2ad9b3 into master Jan 4, 2023
@richardstartin
Copy link
Member Author

Merging.

I will issue a release.

The release doesn't seem to have propagated yet, but it looks like the tag was issued 45 minutes ago

@lemire
Copy link
Member

lemire commented Jan 4, 2023

The release doesn't seem to have propagated yet, but it looks like the tag was issued 45 minutes ago

Yes. A new release is upcoming.

srowen pushed a commit to apache/spark that referenced this pull request Jan 17, 2023
### What changes were proposed in this pull request?
This pr aims upgrade RoaringBitmap 0.9.38

### Why are the changes needed?
This version bring a bug fix:

- RoaringBitmap/RoaringBitmap#613

a performance optimization:

- RoaringBitmap/RoaringBitmap#612

other changes as follows:

RoaringBitmap/RoaringBitmap@0.9.36...0.9.38

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #39613 from LuciferYang/SPARK-42092.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants