-
Notifications
You must be signed in to change notification settings - Fork 277
Support for bitmaps of size > 2^32 in C #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
|
I'm going to leave this open. |
We have several implementations of Roaring over 64-bit integers in separate repositories... https://bitbucket.org/samytto/lazyroaring64-bits https://bitbucket.org/samytto/roaring64bits2levels https://bitbucket.org/samytto/roaringtreemap ... as well as an evaluation... |
All the implementations above appear to be in Java; any news on a 64bit version for C/C++? We have various IDs which are 64bits (32 bits is a lot, yes, but when you're talking 100's of transactions / sec and services running for weeks/months, and being stored durably as well, it goes by pretty quickly...). We'll look into a two-layer approach. You mentioned a year ago that Samy Chambi was working on doing this and might publish something; did that happen and if so is there someplace to get a copy? Cheers! |
Not that I know. I have marked the issue as "help wanted" just now.
Yes. There is a paper in French with experimental results as well as links to (Java) implementations. http://r-libre.teluq.ca/930/1/Roaring64bits.pdf |
We have a relevant PR : #75 |
With the merger of PR #75, this issue is resolved for the c++ layer. |
I think that the issue remains open for C users. |
@lemire I just wrote a C++/CLI wrapper that allow's .NET callers (C#, F# etc..) to use the 64 bit CPP interface. It's able to be amalgamated since I used inline #pragma define's for the CLI mode. It requires a small change to the build settings in visual studio that specifies mixed mode (only for the shim.cpp). I can send you a pull request if you'd like, I'm not really great with cmake but can include a project settings file if that helps. The nice thing about mixed mode is that there is only the one DLL instead of at least two with other thunk mechanisms. |
I'd encourage you to issue a PR. |
@K2 My main concern at this point is to make sure that your code can be used by others in a reasonable manner. It is fine to ignore CMake... but I am concerned about dropping undocumented mysterious files in the middle of a medium-size project. Who is going to find these files and know how to use them? |
Sure I can take a look at CMake if you'd like or if you have other suggestions I'm open to it, maybe another .NET (the RogueException/CRoaring.Net is very good with 32bit) for 64. Similar to those who started this thread (and others Tornhoof/RoaringBitmap#2) regarding 64 implementations, I was looking over the various implementations for a 64 bit compatible version before going out and making my own when I noticed the CPP version and figured that it'd be nice if that were exported into .NET since it's an existing and tested implementation. The CLI code I wrote is almost (exception move operator, I could do better at simulating that;) 1:1 with the Roaring64Map in terms of interfaces. To me it seemed less error prone to add 64 bit .NET support by extending an existing underlying native implementation, I'd be assured not to inject any serious bugs. An added bonus was that as the Roaring64Map class performance improves it'd automatically boost this interface. I think in a perfect world if/when the underlying C interfaces are extended to 64bit then RogueException's exemplary work would do well for any other .NET language. I'd tend to defer to @RogueException to extend his code on his timeline, moving to 64 isn't quite as easy to-do and maintain good performance since the *Many operators essentially turn into enumerated forms over the inputs that slows everything down a lot, i.e. Roaring64Map::AddMany. If I were to rewrite those methods it'd be a lot nicer to sort of determine a *Many threshold to sort/bucket the inputs and use the underlying C *many call's to speed them up.... I can fork this off or invite @RogueException for suggestions or @ksajme or anybody else... |
Believe this is resolved (up to 2^64) by the addition of roaring64 to the c code. |
Closing. |
In OLAP workload, 2^32 rows is quickly reached. This limitation impedes the use of roaring bitmaps as bitmap indices on such system.
It seems the implementation (and the paper) are strongly built on 2^32 assumption. What are the implication of growing the number of buckets vs the size of a bucket?
The text was updated successfully, but these errors were encountered: