-
Notifications
You must be signed in to change notification settings - Fork 570
Implement support for 64-bit integers #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
thx,that`s what i want. |
hi , lemire, i 've checked the "Roaring over 64-bit" code , it support long quite good , but it's didn't realize any Serialization and Deserialization which eagerly needed in my project .. can use suggest any other java BitMap which support 64 bit ? BestWishes ! 😃 |
I do not. Sorry. However, you can fork the code and build up the features you need. |
+1 for this feature in RoaringBitmap. Now we have to create RoaringBitmap[] to hold huge BloomFilters( up to 10 billions bits) |
Warning: this issue is marked as "help wanted" meaning that nobody is working on it. |
@lemire what about using a long to store the most significant bits in RoaringArray? Using a long costs 48 bits per key on the short required for 32 bit storage, and allows storage of up to 80 bit integers. A cost of 48 bits per key is competitive with any solution requiring an object reference (32 bits on a small heap, 64 bits > 32GB heap). Thoughts? Does an admissible solution require that RoaringBitmaps storing only 32 bit integers do not get larger or slower under any circumstances? |
@openkappa Yes, what you describe is one possibility out of a few that are quite reasonable. We have a paper http://r-libre.teluq.ca/930/ that reviews these possibilities and there are some experiments. Sadly the paper is in French. There is more work to be done on this front. |
@lemire It sounds like this isn't a decision you're likely to rush into, is this perhaps an open research topic in its early stages? If this functionality is required in a short time-frame, especially by anyone more interested in the "D" than "R" of "R&D", is it recommended to take a view on which implementation is best (for a particular definition of "good") and fork? I noticed the rust implementation went for RoaringTreemap (RoaringBitmap/roaring-rs#38) - is this the preferred data structure? For anyone else who prefers English, here's what I gleaned from the paper about the three proposals:
The implementations were benchmarked on some synthetic data over a range of densities generated from a uniform distribution and Zipfian distribution separately. Compression, performance of union, intersection and insertion were measured for each data set. For uniform data each proposal produces a similar compression ratio. Compression ratio across a range of densities is not presented for Zipfian data. Each implementation is similar for intersection performance, for both Zipfian and uniform data. LazyRoaring and Roaring2Level out perform RoaringTreemap for union, for all densities and both uniform and Zipfian data. RoaringTreemap and Roaring2Level significantly outperform LazyRoaring for insertion; Roaring2Level is better for high density uniformly distributed data; results for Zipfian data are not presented. |
@openkappa If this functionality is required in a short time-frame, especially by anyone more interested in the "D" than "R" of "R&D", is it recommended to take a view on which implementation is best (for a particular definition of "good") and fork? A fork is always possible, but I wish people wouldn't do this. Instead, one could probably implement RoaringTreemap in a weekend as an extension of the existing RoaringBitmap library. This was done in the past by Samy Chambi. His code might still be around on bitbucket. So we know that it is doable. And because we had benchmarks, we know that the results are likely to be reasonable. The same thing has been done in CRoaring: someone implemented a 64-bit version of Roaring by extending the 32-bit version. Forks lead to effort duplication. By building on the existing library, we can minimize duplication and maximize code reuse. It sounds like this isn't a decision you're likely to rush into Roaring is a community driven project. Look at the CRoaring library, for example, it supports 64-bit integers using one model. This happened because people with the talent, time and interest push a solution forward with a willingness to maintain the end result for the duration. is this perhaps an open research topic in its early stages? I wouldn't say so. I don't think that there are deep computer science problems. It is a matter of producing quality code, testing it and so forth. In open source projects, if something is not happening, it might simply be because nobody showed enough interest. |
@lemire Sure, I get the community aspect. I ask because it's not clear to me if there is consensus on the preferred data structure - is there? The paper indicates that there's a tradeoff between Roaring2Level and RoaringTreemap depending on whether you care about inserts or treat bitmaps as immutable and just want query performance. If someone goes off one weekend and unilaterally chooses to implement one or the other, then it might not be the best outcome, even if it's better than not supporting longs at all. |
@openkappa
I sort of gave you an answer: The strong argument in favor of RoaringTreemap is that it is easier to implement. Can something else be implemented and end up being better for some use cases? Sure. Now, here is the thing: with this sort of work, there will never be one solution that is always best in all cases. I mean, obviously you can micro-optimize one given design, and this is win-win... but there are design choices to be made and some choices will favor one type of use cases over others. That's just how it is. It is slightly frustrating because whatever you implement, someone will come up with a use case that makes you look bad. But then there is something of a fix. What you can do is setup a competition against some reasonable alternatives and measure the results in a use case you like. So people can say "in my use case, you work is useless", and then you can say "sure, but look at this other example". Watch my recent talk where I sort of do this... https://www.youtube.com/watch?v=ubykHUyNi_0 Finally, there is another point to consider: code that is correct and reasonably efficient always win over code that might theoretically exist but that nobody has coded. |
It is now supported. |
We have several implementations of Roaring over 64-bit integers in separate repositories...
https://bitbucket.org/samytto/lazyroaring64-bits
https://bitbucket.org/samytto/roaring64bits2levels
https://bitbucket.org/samytto/roaringtreemap
... as well as an evaluation...
https://bitbucket.org/samytto/datastructures64-bitints
Work is needed to bring the best of these implementations in the main RoaringBitmap library.
The text was updated successfully, but these errors were encountered: