Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalise bitmap construction and expose heuristics #276

Merged
merged 9 commits into from
Oct 26, 2018
Merged

generalise bitmap construction and expose heuristics #276

merged 9 commits into from
Oct 26, 2018

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Oct 20, 2018

This PR generalises efficient construction of bitmaps. This introduces a few extra constructors and marker interfaces, which I expect need socialising amongst users. This breaks the interface of OrderedWriter yet again, but this is unlikely to affect any user but myself, no warnings or compiler errors are introduced anywhere else. The aim is to be able to make optimisations to bitmap build times without needing to modify source code within the library, by providing various toggles, and to make the choice between implementations simpler to switch between using generics.

  • OrderedWriter becomes RoaringBitmapWriter because it can accept writes in any order
  • Adds add(long min, long max) to RoaringBitmapWriter
  • Adds addMany(int.. data) to RoaringBitmapWriter
  • Generalises Container and MappeableContainer with the introduction of a marker interface as a generic type bound
  • Generalises appending to RoaringArray and MutableRoaringArray via a marker interface
  • Implements a "wizard" to allow construction of any mutable variety of bitmap
  • Introduces heuristics for bitmap construction, such as the range of values to expect, how best to initialise the RoaringArray, whether to partially sort data first, whether to expect mostly array, runs or bitmaps
  • Allows fast construction of FastRankRoaringBitmap - it's possible FastRankRoaringBitmap could be properly integrated into FastAggregation and ParallelAggregation by adopting this approach to constructing bitmaps.

For example, to create a large, contiguous "existence bitmap"

RoaringBitmapWriter<RoaringBitmap> writer = RoaringBitmapWriter.writer()
.expectedRange(0L, Integer.MAX_VALUE) // initialises RoaringArray accordingly
.get();
writer.add(0L, Integer.MAX_VALUE);
RoaringBitmap existence = writer.get();

To create a FastRankRoaringBitmap with constant memory during construction:

RoaringBitmapWriter<FastRankRoaringBitmap> writer = RoaringBitmapWriter.writer()
.constantMemory() // use an 8kB array to buffer writes during construction to reduce allocations
.fastRank() // indicate that fast rank and select operations are desired
.expectedDensity(0.001) // useful if this can be guessed
.get();
writer.addMany(0, 25, 250000);
FastRankRoaringBitmap bitmap = writer.get();

And to build a buffer bitmap:

RoaringBitmapWriter<MutableRoaringBitmap> writer = RoaringBitmapWriter.bufferWriter()
.optimiseForRuns() // if known, let the writer know that the best guess id RunContainer
.initialCapacity(20) // tune the size of the MutableRoaringArray
.get();
IntStream.range(0, 20 * 0xFFFF).forEach(writer::add);
MutableRoaringBitmap bitmap = writer.get();

@coveralls
Copy link

coveralls commented Oct 20, 2018

Coverage Status

Coverage decreased (-0.005%) to 91.742% when pulling 5f263be on richardstartin:ordered-writer-array-heuristics into 02c397c on RoaringBitmap:master.

@lemire
Copy link
Member

lemire commented Oct 22, 2018

Great work. Maybe someone wants to review this ?

(I will review.)

@richardstartin
Copy link
Member Author

OK. I have just removed one of the marker interfaces to reduce the verbosity required for type bounds over the different implementations.

Since there are quite a few interesting features contributed by various users it would be good to try to tie them together into a consistent API somehow: this PR attempts to start that process by generalising construction. I am happy to make changes if necessary.

@lemire
Copy link
Member

lemire commented Oct 23, 2018

I have just asked @blacelle and @rafael-telles to review... It would be great to get wider feedback.

@richardstartin
Copy link
Member Author

I've fixed an existing bug where the key wraps.

@ppiotrow
Copy link
Member

I've put it in the bitmap builders codebase I maintain. I love the API, all is clear, and suited for advanced users. I can see differences in processing time between different heuristics used. LGTM! great job!

As next step I would imagine a support for users in deciding, which heuristics to enable. Given that some of my bitmaps collections are quite similar every day I build them, I'd like to decide which heuristics to use. The simplest version could be a tool to analyse some simple statistics like
public BitmapStatistics analyse(Iterator<RoaringBitmap> bitmaps)
where BitmapStatistics contains the number of containers each type, container cardinality distribution. Given that used could make some smarter decision on which wizard features to use.

The more advanced would be
public BuilderRecommendation analyse(Iterator<RoaringBitmap> bitmaps)
where BuilderRecommendation would return more detailed recommendations like optimiseForArrays together with expectedValuesPerContainer(128).

@richardstartin
Copy link
Member Author

@ppiotrow that would be quite useful. Actually, I'm interested in automated data quality checks and producing a significantly different "recommendation" day on day would be a very cheap indication that a new dataset is significantly different. Do you have ideas about how to implement it?

@ppiotrow
Copy link
Member

Looking at your code, the most powerful method seems to be expectedValuesPerContainer.
Maybe good enough recommendation is just to return average number of elements per containers?

I wonder if we could use information about number of array, bitmap and run containers, also the average number of elements per each container type to apply some expert rules. But no better ideas now.

@richardstartin
Copy link
Member Author

The performance rationale for the writer abstraction in this PR, along with the benefits of doing a partial radix sort, is described here.

@richardstartin
Copy link
Member Author

Are there any concerns to be addressed?

@ppiotrow
Copy link
Member

The org.roaringbitmap slowly became bag of classes - not only because of this PR. There are around 40 java files. Maybe builders can be stored in org.roaringbitmap.builders package with some of classes being package private?

@richardstartin
Copy link
Member Author

richardstartin commented Oct 24, 2018

I'm inclined to agree, but not without quite a lot of changes like changing the visibility of e.g. Util. Perhaps there could be a bit of a spring clean in another PR?

@ppiotrow
Copy link
Member

I think this might be good place&time to share. While building gigabytes of bitmaps we are getting best results by not using G1GC.
Optimising for throughput this setup has proven to be twice to infinity (as G1 can just fail) times better for our huge data.
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=10000

@richardstartin
Copy link
Member Author

@ppiotrow does that relate to the features in this PR? Or just GC tuning advice?

@ppiotrow
Copy link
Member

General tip for creating Roaring Bitmaps. I assume, that the new builders code (if used smartly) should decrease number of allocations. Still I expect better results with ParallelGC than G1 using new builders code, as this is generally advised setup while optimising for throughput.
I think this might be useful to compare/mention if you publish results from your improvement.
Also quite worth is to at least consider such options in the JMH benchmarks . .shouldDoGC(true).jvmArgs("-Xmx128M", "-XX:+UseG1GC").

@richardstartin
Copy link
Member Author

OK. I guess they don't call it the "throughput collector" for nothing! Maybe you could get yourself onto JDK11 and give ZGC a try? Performance wise, these features are designed for mostly ordered insertions, and for completely random insertions could only be slightly worse than repeatedly calling RoaringBitmap.add. For completely or mostly ordered insertions, the performance should be a lot better.

@ppiotrow
Copy link
Member

ppiotrow commented Oct 26, 2018

I'm mostly working ordered data. Previous OrderedWriter was a little bit too heavy because of allocating new long[1024] for each created bitmap. Because of building millions of them in parallel I would need to grant huge memory just for buffers.
The new API is perfect as I could define different builders strategies for different groups of bitmaps. I plan to switch to new API in around month. During that for sure I'll check ZGC - good idea - and share results.

@richardstartin
Copy link
Member Author

If you have 32 threads, 32 * 8kB = 256kB really isn't that much memory, particularly compared to what you're building or what it's being built from, or considering container transitions. I think it might be a good idea to allow resetting the writer (clearing the buffer and reinitialising the underlying bitmap) and then you would always have a lot less than 1MB for buffering.

@ppiotrow
Copy link
Member

I meant holding millions of bitmaps in memory to build them. Think of new OrderedWriter[5000000]. This allocates 40GB just for buffers.

@richardstartin
Copy link
Member Author

I don't know anything about how your application works but the moment you put a single bitmap container in those 5M bitmaps you have 40GB anyway, so it feels like having 5M bitmaps, let alone building 5M bitmaps at the same time in a single process, could be part of the problem. @lemire discusses two encoding strategies for reducing the number of bitmaps required here.

@ppiotrow
Copy link
Member

Not if this is array or run container.

@richardstartin
Copy link
Member Author

Sure, but 1 BitmapContainer per bitmap = 40GB, because you have 5 million bitmaps.

@ppiotrow
Copy link
Member

That is what I meant. The previous API was just classic "space–time tradeoff" example where OrderedWriter used Space (8kB per bitmap) to get better Time results. The new API is beautiful because I can "prefer bitmap" for my dense features, while "preferring arrays" for sparse features (most of features I have are sparse). Also I can help preallocating valid sizes for arrays. Good job, I can't wait to try it.

@richardstartin
Copy link
Member Author

No I understand entirely, and hope I didn't give you the impression I didn't like what you were saying. I just never imagined anyone creating 5 million of these!

@lemire
Copy link
Member

lemire commented Oct 26, 2018

Ok. So this looks good to me.

We lost 0.005% in coverage and somehow coveralls thinks this warrants a red flag. Mysterious.

@lemire lemire merged commit e21398e into RoaringBitmap:master Oct 26, 2018

boolean isEmpty();

T runOptimize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's little too late but I've noticed this.
I think this method should be called more general toEfficientContainer and have that logic implemented.
Right now if we "optimiseForBitmaps", we may end up with BitmapContainers with cardinality lower than 4096 in the final bitmap which should not happen.
Also it's happy coincidence that RunContainer::runOptimize calls RunContainer::toEfficientContainer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not too late because nobody (even me) is using this yet, and optimiseForBitmaps should probably either not exist, or just call constantMemory.

The idea of this interface was to simply create a generic type bound, and leave the implementations alone. toEfficientContainer isn't defined for all container types at the moment. But I think it should be an easy change to make.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my second question why there is both constantMemory and optimiseForBitmaps. I'd remove optimiseForBitmaps if we still can.

@lemire
Copy link
Member

lemire commented Oct 27, 2018

We can issue a new release.

@richardstartin richardstartin deleted the ordered-writer-array-heuristics branch February 19, 2019 17:32
Smallhi pushed a commit to Smallhi/RoaringBitmap that referenced this pull request Jun 14, 2021
* generalise bitmap construction and expose heuristics

* remove need for one marker interface

* get rid of unnecessary Appender interface

* avoid wrap around for key, test writes to max key succeed

* add buffer test cases

* feed expected container size into arry containers

* allow resetting of a writer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants