
Optimise FastAggregation#and by cloning the lowest cardinality bitmap #608

Closed

frensjan opened this issue Dec 21, 2022 · 7 comments

@frensjan

Is your feature request related to a problem? Please describe.
I noticed that FastAggregation#and was consuming a lot of time in RoaringArray#clone. It turned out that I was AND-ing bitmaps with greatly different cardinalities, e.g. one bitmap with 100k+ bits set and another with fewer than 100 bits set, with an intersection of only ~10 bits.

Describe the solution you'd like
It would be ideal if FastAggregation optimised this away. In a particular benchmark I'm running, selecting the bitmap with the lowest cardinality as the one to clone yields a ~4x performance improvement.

Selecting the smallest bitmap isn't straightforward with the current interface of FastAggregation#and for the form that takes an Iterator as argument. But the array version could easily use this heuristic, and adding an overload that takes a List<RoaringBitmap> or a Collection could also work. I'd be happy to contribute this if desired.

The big question, of course, is what such an optimisation costs in cases with more homogeneous cardinalities. Any thoughts and feedback on this are welcome.
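For illustration, a rough sketch of what such an overload might look like; the class name is hypothetical, and getLongSizeInBytes is used as a cheap proxy for cardinality:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

public final class Aggregations {
    public static RoaringBitmap and(Collection<RoaringBitmap> bitmaps) {
        // sort a copy of the inputs so the cheapest bitmap is the one
        // that gets cloned, then delegate to the existing iterator form
        List<RoaringBitmap> sorted = new ArrayList<>(bitmaps);
        sorted.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
        return FastAggregation.and(sorted.iterator());
    }
}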

@lemire
Member

lemire commented Dec 22, 2022

That's 100% reasonable and it does not look like a PR would be difficult to produce.

PR invited.

@frensjan
Author

I've been working on this a bit; some observations.

Given a benchmark setup:

@Param({"0.1", "0.5"})
double density;

@Param({"0.001", "0.01", "0.1", "0.5", "1.0"})
double overlap;

@Setup(Level.Trial)
public void setup() {
    int limit = 0x100000;
    int cardinality1 = (int) (limit * density);
    int cardinality2 = (int) (cardinality1 * overlap);
    
    SplittableRandom random = new SplittableRandom(42);
    
    bitmap1 = new RoaringBitmap();
    while (bitmap1.getCardinality() < cardinality1) {
        bitmap1.add(random.nextInt(limit));
    }
    
    bitmap2 = selectRandom(cardinality2, random);
    bitmap3 = selectRandom(cardinality2, random);
    bitmap4 = selectRandom(cardinality2, random);
    
    bitmaps = List.of(bitmap1, bitmap2, bitmap3, bitmap4);
}

private RoaringBitmap selectRandom(int cardinality, SplittableRandom random) {
    int[] ints = bitmap1.toArray();

    for (int i = 0; i < cardinality; i++) {
        int tmp = ints[i];
        int j = random.nextInt(bitmap1.getCardinality());
        ints[i] = ints[j];
        ints[j] = tmp;
    }

    ints = Arrays.copyOf(ints, cardinality);
    partialRadixSort(ints);

    return RoaringBitmap.bitmapOf(ints);
}

and benchmarks like:

// variant 1: iterator version, unsorted
return FastAggregation.and(bitmaps.iterator());

// variant 2: array version, unsorted
return FastAggregation.and(bitmaps.toArray(RoaringBitmap[]::new));

// variant 3: array version, sorted by size
RoaringBitmap[] array = bitmaps.toArray(RoaringBitmap[]::new);
Arrays.sort(array, Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(array);

// variant 4: iterator version over a size-sorted list
List<RoaringBitmap> list = new ArrayList<>(bitmaps);
list.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(list.iterator());

I get the impression that for 'overlap densities' of 0.1 or less, sorting the input by cardinality, size, or something similar improves performance.

When comparing the array version of FastAggregation.and with the iterator version over a size-sorted list:

  • With a first-bitmap density of 50% and an overlap density of 10% for the subsequent bitmaps, the sorted iterator version is ~67% faster; already a worthwhile gain.
  • At a density of 10% and an overlap density of 10%, the speed improvement is over 6x.
  • At a density of 10% and an overlap density of 0.1%, the speed improvement is a whopping 39x!

So the performance improvement appears to come from sorting by size and then using the naive_and implementation, which clones the first (now smallest) bitmap. Sorting the input by size in combination with the array version of FastAggregation.and yields no measurable gain.

Note that for a density of 10% with an overlap density of 0.1%, the iterator version of FastAggregation.and is over 2x faster than the array version even without sorting!

However, at higher densities of the first bitmap (10% and 50%) combined with higher overlap densities (50% and 100%), the array version of FastAggregation.and has the clear upper hand, being 3 to 4x faster than the iterator version over bitmaps sorted by size. A rough sketch of the heuristic these numbers suggest is shown below.
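For concreteness, here is a sketch of the kind of dispatch heuristic these numbers suggest; the method name and the 8x skew threshold are mine and untuned, not anything in the library:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap adaptiveAnd(List<RoaringBitmap> bitmaps) {
    long min = Long.MAX_VALUE;
    long max = 0;
    for (RoaringBitmap bitmap : bitmaps) {
        long size = bitmap.getLongSizeInBytes();
        min = Math.min(min, size);
        max = Math.max(max, size);
    }
    if (min * 8 < max) {
        // heavily skewed inputs: sort by size so the iterator form
        // (naive_and) clones the smallest bitmap
        List<RoaringBitmap> sorted = new ArrayList<>(bitmaps);
        sorted.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
        return FastAggregation.and(sorted.iterator());
    }
    // similarly sized inputs: the array version wins in the benchmarks above
    return FastAggregation.and(bitmaps.toArray(new RoaringBitmap[0]));
}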

I would appreciate feedback on this. @lemire / @richardstartin?

@lemire
Member

lemire commented Dec 25, 2022

Such problems are always data-dependent, so it is important to take a broad view and include many tests. We also include more than one approach in our API because no single approach can be best; it is all about tradeoffs.

Could you add FastAggregationRLEStressTest to your benchmarks?

    public void createBitmaps() {
      random = new SplittableRandom(seed);
      RoaringBitmapWriter<RoaringBitmap> bitmapWriter = constructionStrategy.create();
      bitmaps = new RoaringBitmap[count];
      bufferBitmaps = new ImmutableRoaringBitmap[count];
      // each bit is set with probability p, chosen so that the intersection
      // of all `count` bitmaps has expected density `probability`
      double p = Math.pow(probability, 1D/count);
      for (int i = 0; i < count; ++i) {
        // advance by geometrically distributed gaps; for p close to 1 this
        // produces long runs, which exercises the run-container (RLE) paths
        for (int j = (int)(Math.log(random.nextDouble())/Math.log(1-p));
             j < size;
             j += (int)(Math.log(random.nextDouble())/Math.log(1-p)) + 1) {
            bitmapWriter.add(j);
        }
        bitmaps[i] = bitmapWriter.get();
        bufferBitmaps[i] = bitmaps[i].toMutableRoaringBitmap();
        bitmapWriter.reset();
      }
    }
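For reference, a sketch of how these generated inputs might then be consumed in the benchmark methods (the method names here are illustrative):

@Benchmark
public RoaringBitmap andArray() {
    return FastAggregation.and(bitmaps);
}

@Benchmark
public RoaringBitmap andIterator() {
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}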

@lemire
Member

lemire commented Dec 25, 2022

> sorting the input by its cardinality, size or something similar improves performance.

You probably do not even need to sort. If you have a tiny bitmap, putting it first should help. I will add a remark to our documentation.
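For instance, something along these lines (a sketch, not library code): a single pass to move the smallest input to the front, instead of a full sort:

import java.util.Arrays;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap smallestFirstAnd(RoaringBitmap[] bitmaps) {
    // find the smallest input and swap it to position 0, so that it is
    // the bitmap that gets cloned by the iterator (naive_and) form
    int smallest = 0;
    for (int i = 1; i < bitmaps.length; i++) {
        if (bitmaps[i].getLongSizeInBytes() < bitmaps[smallest].getLongSizeInBytes()) {
            smallest = i;
        }
    }
    RoaringBitmap tmp = bitmaps[0];
    bitmaps[0] = bitmaps[smallest];
    bitmaps[smallest] = tmp;
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}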

@richardstartin
Member

Putting the bitmap with the smallest number of containers first would be beneficial. This came up in Pinot, which now sorts by ascending cardinality, but having access to the container counts allows for better worst-case behaviour.
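A rough sketch of that idea; note that it reaches into highLowContainer, an internal detail whose visibility and stability across versions are not guaranteed, so treat this as pseudocode:

import java.util.Arrays;
import java.util.Comparator;

import org.roaringbitmap.FastAggregation;
import org.roaringbitmap.RoaringBitmap;

static RoaringBitmap fewestContainersFirst(RoaringBitmap[] bitmaps) {
    // order by number of containers rather than by cardinality
    Arrays.sort(bitmaps,
        Comparator.comparingInt((RoaringBitmap b) -> b.highLowContainer.size()));
    return FastAggregation.and(Arrays.asList(bitmaps).iterator());
}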

@lemire
Member

lemire commented Dec 25, 2022

@richardstartin Added an issue: #611

@frensjan
Author

frensjan commented Jan 4, 2023

Closed through #612.

frensjan closed this as completed Jan 4, 2023