
Optimise FastAggregation#and by cloning the lowest cardinality bitmap #608

Closed

frensjan opened this issue Dec 21, 2022 · 7 comments

@frensjan

Is your feature request related to a problem? Please describe.
I noticed that FastAggregation#and was consuming a lot of time in RoaringArray#clone. It turned out that I was AND-ing bitmaps with greatly different cardinalities, e.g. one bitmap with 100k+ bits set and another with fewer than 100 bits set, with an intersection of only ~10 bits.

Describe the solution you'd like
It would be ideal if FastAggregation optimised this away. In a particular benchmark I'm running, selecting the bitmap with the lowest cardinality as the one to clone yields a ~4x performance improvement.

Selecting the smallest bitmap isn't straightforward with the current interface of FastAggregation#and for the form that takes an Iterator as argument. But the array version could easily use this heuristic, and adding an overload that takes a List<RoaringBitmap> or a Collection could also work. I'd be happy to contribute this if desired.

The big question, of course, is what such an optimisation costs in cases with more homogeneous cardinalities. Any thoughts and feedback on this are welcome.
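For illustration, a rough sketch of what such an overload might look like; the class name is hypothetical, and getLongSizeInBytes is used as a cheap proxy for cardinality:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

public final class Aggregations {
    public static RoaringBitmap and(Collection<RoaringBitmap> bitmaps) {
        // sort a copy of the inputs so the cheapest bitmap is the one
        // that gets cloned, then delegate to the existing iterator form
        List<RoaringBitmap> sorted = new ArrayList<>(bitmaps);
        sorted.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
        return FastAggregation.and(sorted.iterator());
    }
}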

@lemire
Member

lemire commented Dec 22, 2022

That's 100% reasonable and it does not look like a PR would be difficult to produce.

PR invited.

@frensjan
Author

I've been working on this a bit; some observations.

Given a benchmark setup:

@Param({"0.1", "0.5"})
double density;

@Param({"0.001", "0.01", "0.1", "0.5", "1.0"})
double overlap;

@Setup(Level.Trial)
public void setup() {
    int limit = 0x100000;
    int cardinality1 = (int) (limit * density);
    int cardinality2 = (int) (cardinality1 * overlap);
    
    SplittableRandom random = new SplittableRandom(42);
    
    bitmap1 = new RoaringBitmap();
    while (bitmap1.getCardinality() < cardinality1) {
        bitmap1.add(random.nextInt(limit));
    }
    
    bitmap2 = selectRandom(cardinality2, random);
    bitmap3 = selectRandom(cardinality2, random);
    bitmap4 = selectRandom(cardinality2, random);
    
    bitmaps = List.of(bitmap1, bitmap2, bitmap3, bitmap4);
}

private RoaringBitmap selectRandom(int cardinality, SplittableRandom random) {
    int[] ints = bitmap1.toArray();

    for (int i = 0; i < cardinality; i++) {
        int tmp = ints[i];
        int j = random.nextInt(bitmap1.getCardinality());
        ints[i] = ints[j];
        ints[j] = tmp;
    }

    ints = Arrays.copyOf(ints, cardinality);
    partialRadixSort(ints);

    return RoaringBitmap.bitmapOf(ints);
}

and benchmarks like:

// variant 1: iterator version, unsorted
return FastAggregation.and(bitmaps.iterator());

// variant 2: array version, unsorted
return FastAggregation.and(bitmaps.toArray(RoaringBitmap[]::new));

// variant 3: array version, sorted by size
RoaringBitmap[] array = bitmaps.toArray(RoaringBitmap[]::new);
Arrays.sort(array, Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(array);

// variant 4: iterator version over a size-sorted list
List<RoaringBitmap> list = new ArrayList<>(bitmaps);
list.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(list.iterator());

I get the impression that for 'overlap densities' of 0.1 or less, sorting the input by cardinality, size, or something similar improves performance.

When comparing the array version of FastAggregation.and with the iterator version over a size-sorted list:

  • With a first-bitmap density of 50% and an overlap density of 10% for the subsequent bitmaps, the sorted iterator version is ~67% faster; already a worthwhile gain.
  • At a density of 10% and an overlap density of 10%, the speed improvement is over 6x.
  • At a density of 10% and an overlap density of 0.1%, the speed improvement is a whopping 39x!

So the performance improvement appears to come from sorting by size and then using the naive_and implementation, which clones the first (now smallest) bitmap. Sorting the input by size in combination with the array version of FastAggregation.and yields no measurable gain.

Note that for a density of 10% with an overlap density of 0.1%, the iterator version of FastAggregation.and is over 2x faster than the array version even without sorting!

However, at higher densities of the first bitmap (10% and 50%) combined with higher overlap densities (50% and 100%), the array version of FastAggregation.and has the clear upper hand, being 3 to 4x faster than the iterator version over bitmaps sorted by size. A rough sketch of the heuristic these numbers suggest is shown below.
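For concreteness, here is a sketch of the kind of dispatch heuristic these numbers suggest; the method name and the 8x skew threshold are mine and untuned, not anything in the library:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap adaptiveAnd(List<RoaringBitmap> bitmaps) {
    long min = Long.MAX_VALUE;
    long max = 0;
    for (RoaringBitmap bitmap : bitmaps) {
        long size = bitmap.getLongSizeInBytes();
        min = Math.min(min, size);
        max = Math.max(max, size);
    }
    if (min * 8 < max) {
        // heavily skewed inputs: sort by size so the iterator form
        // (naive_and) clones the smallest bitmap
        List<RoaringBitmap> sorted = new ArrayList<>(bitmaps);
        sorted.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
        return FastAggregation.and(sorted.iterator());
    }
    // similarly sized inputs: the array version wins in the benchmarks above
    return FastAggregation.and(bitmaps.toArray(new RoaringBitmap[0]));
}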

I would appreciate feedback on this. @lemire / @richardstartin?

@lemire
Member

lemire commented Dec 25, 2022

Such problems are always data-dependent, so it is important to take a broad view and include many tests. We also include more than one approach in our API because no single approach can be best; it is all about tradeoffs.

Could you add FastAggregationRLEStressTest to your benchmarks?

    public void createBitmaps() {
      random = new SplittableRandom(seed);
      RoaringBitmapWriter<RoaringBitmap> bitmapWriter = constructionStrategy.create();
      bitmaps = new RoaringBitmap[count];
      bufferBitmaps = new ImmutableRoaringBitmap[count];
      // each bit is set with probability p, chosen so that the intersection
      // of all `count` bitmaps has expected density `probability`
      double p = Math.pow(probability, 1D/count);
      for (int i = 0; i < count; ++i) {
        // advance by geometrically distributed gaps; for p close to 1 this
        // produces long runs, which exercises the run-container (RLE) paths
        for (int j = (int)(Math.log(random.nextDouble())/Math.log(1-p));
             j < size;
             j += (int)(Math.log(random.nextDouble())/Math.log(1-p)) + 1) {
            bitmapWriter.add(j);
        }
        bitmaps[i] = bitmapWriter.get();
        bufferBitmaps[i] = bitmaps[i].toMutableRoaringBitmap();
        bitmapWriter.reset();
      }
    }
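For reference, a sketch of how these generated inputs might then be consumed in the benchmark methods (the method names here are illustrative):

@Benchmark
public RoaringBitmap andArray() {
    return FastAggregation.and(bitmaps);
}

@Benchmark
public RoaringBitmap andIterator() {
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}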

@lemire
Member

lemire commented Dec 25, 2022

> sorting the input by its cardinality, size or something similar improves performance.

You probably do not even need to sort. If you have a tiny bitmap, putting it first should help. I will add a remark to our documentation.
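For instance, something along these lines (a sketch, not library code): a single pass to move the smallest input to the front, instead of a full sort:

import java.util.Arrays;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap smallestFirstAnd(RoaringBitmap[] bitmaps) {
    // find the smallest input and swap it to position 0, so that it is
    // the bitmap that gets cloned by the iterator (naive_and) form
    int smallest = 0;
    for (int i = 1; i < bitmaps.length; i++) {
        if (bitmaps[i].getLongSizeInBytes() < bitmaps[smallest].getLongSizeInBytes()) {
            smallest = i;
        }
    }
    RoaringBitmap tmp = bitmaps[0];
    bitmaps[0] = bitmaps[smallest];
    bitmaps[smallest] = tmp;
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}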

@richardstartin
Member

Putting the bitmap with the smallest number of containers first would be beneficial. This came up in Pinot, which now sorts by ascending cardinality, but having access to the container counts allows for better worst-case behaviour.
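A rough sketch of that idea; note that it reaches into highLowContainer, an internal detail whose visibility and stability across versions are not guaranteed, so treat this as pseudocode:

import java.util.Arrays;
import java.util.Comparator;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap fewestContainersFirst(RoaringBitmap[] bitmaps) {
    // order by number of containers rather than by cardinality
    Arrays.sort(bitmaps,
        Comparator.comparingInt((RoaringBitmap b) -> b.highLowContainer.size()));
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}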

@lemire
Member

lemire commented Dec 25, 2022

@richardstartin Added an issue: #611

@frensjan
Author

frensjan commented Jan 4, 2023

Closed through #612.

frensjan closed this as completed Jan 4, 2023