New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise FastAggregation#and by cloning the lowest cardinality bitmap #608
Comments
That's 100% reasonable and it does not look like a PR would be difficult to produce. PR invited. |
I've been working this a bit; some observations. Given a benchmark setup: @Param({"0.1", "0.5"})
double density;
@Param({"0.001", "0.01", "0.1", "0.5", "1.0"})
double overlap;
@Setup(Level.Trial)
public void setup() {
int limit = 0x100000;
int cardinality1 = (int) (limit * density);
int cardinality2 = (int) (cardinality1 * overlap);
SplittableRandom random = new SplittableRandom(42);
bitmap1 = new RoaringBitmap();
while (bitmap1.getCardinality() < cardinality1) {
bitmap1.add(random.nextInt(limit));
}
bitmap2 = selectRandom(cardinality2, random);
bitmap3 = selectRandom(cardinality2, random);
bitmap4 = selectRandom(cardinality2, random);
bitmaps = List.of(bitmap1, bitmap2, bitmap3, bitmap4);
}
private RoaringBitmap selectRandom(int cardinality, SplittableRandom random) {
int[] ints = bitmap1.toArray();
for (int i = 0; i < cardinality; i++) {
int tmp = ints[i];
int j = random.nextInt(bitmap1.getCardinality());
ints[i] = ints[j];
ints[j] = tmp;
}
ints = Arrays.copyOf(ints, cardinality);
partialRadixSort(ints);
return RoaringBitmap.bitmapOf(ints);
} and benchmarks like: return FastAggregation.and(bitmaps.iterator()); return FastAggregation.and(bitmaps.toArray(RoaringBitmap[]::new)); RoaringBitmap[] array = bitmaps.toArray(RoaringBitmap[]::new);
Arrays.sort(array, Comparator.comparing(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(array); List<RoaringBitmap> list = new ArrayList<>(bitmaps);
list.sort(Comparator.comparingLong(RoaringBitmap::getLongSizeInBytes));
return FastAggregation.and(list.iterator()); I get the impression that for 'overlap densities' of 0.1 and less, sorting the input by its cardinality, size or something similar improves performance. When comparing the array version of
So it seems the performance improvement is caused by sorting by size and using the Note that for density 10% with overlap density 0.1%, the iterator version of However, at higher densities of the first bitmap (10% and 50%) and higher overlap densities (50% and 100%), the array version of I would appreciate feedback on this. @lemire / richardstartin? |
Such problems are always data-dependent and so it is important to take a broad view and include many tests. We also include more than one approach in our API because not one approach can be best, it is all about tradeoffs. Could you add FastAggregationRLEStressTest to your benchmarks? public void createBitmaps() {
random = new SplittableRandom(seed);
RoaringBitmapWriter<RoaringBitmap> bitmapWriter = constructionStrategy.create();
bitmaps = new RoaringBitmap[count];
bufferBitmaps = new ImmutableRoaringBitmap[count];
double p = Math.pow(probability, 1D/count);
for (int i = 0; i < count; ++i) {
for (int j = (int)(Math.log(random.nextDouble())/Math.log(1-p));
j < size;
j += (int)(Math.log(random.nextDouble())/Math.log(1-p)) + 1) {
bitmapWriter.add(j);
}
bitmaps[i] = bitmapWriter.get();
bufferBitmaps[i] = bitmaps[i].toMutableRoaringBitmap();
bitmapWriter.reset();
}
} |
You probably do not even need to sort. If you have a tiny bitmap, putting it first should help. I will add a remark to our documentation. |
Putting the bitmap with the smallest number of containers first would be beneficial. This came up in Pinot, which sorts by ascending cardinality now, but having access to the containers allows for better worst case implications. |
@richardstartin Added an issue: #611 |
closed through #612 |
Is your feature request related to a problem? Please describe.
I noticed that
FastAggregation#and
was consuming a lot of time inRoaringArray#clone
. It turned out that I was anding bitmaps with greatly different cardinalities; e.g. on bitmap with 100k+ and another with < 100 bits set with an intersection of ~ 10 bits.Describe the solution you'd like
It would be ideal if FastAggregation optimises this issue away. In a particular benchmark I'm running, selecting the bitmap with the lowest cardinality yields a ~4x performance improvement.
Selecting the smallest bitmap isn't straightforward with the current interface of
FastAggregation#and
for the form with anIterator
as argument. But the array version could easily use this heuristic and adding a version with aList<RoaringBitmap>
orCollection
could also work. I'd be happy to contribute this if desired.The big question of course is what the cost of such optimisation is for cases with more homogeneous cardinalities. Any thoughts and feedback on this are welcome.
The text was updated successfully, but these errors were encountered: