Expose roaring_bitmap_overwrite #59
Conversation
It was made public in roaring.h 2 years ago according to git blame.
Adding the C function signature in this file is not enough; you also have to add some Python code somewhere to actually call the function. So, I suggest adding an
Expose overwrite in bitmap.pxi
Fix.
Thanks for your fast reply. I added a method.

I'm probably out of use cases for now, so feel free to reject the change if you don't think the new method might help other people. For my application it seems that roaring bitmaps have too large an overhead. Memory was not a problem yet (the workload is similar to iterating over all intersections of a list of sets).
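A minimal usage sketch of the change from the Python side, assuming the method is exposed on `BitMap` as `overwrite` (name taken from the "Expose overwrite in bitmap.pxi" commit above) and copies the source's contents into an existing bitmap in place:

```python
from pyroaring import BitMap

dest = BitMap()                      # allocated once, meant to be reused
src = BitMap(range(0, 100_000, 3))

# Assumed method name, per the commit above: copy src into dest
# without creating a new Python object.
dest.overwrite(src)
assert dest == src
```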
Thank you!
Could you give more details? Roaring bitmaps should both be faster and use less memory than a classical set.
I don't know how much detail you want: I'm a SageMath developer. We use bitsets for all kinds of things in mathematics, and I have only tried one thing with roaring so far. For this, the only operations I really need are subset checks and intersections. I need only constant memory, which I can afford, so there is no reason to allocate anything more than once.

The problem boils down to the following (I'm making up some numbers to give you an idea): given 40 sets in range(0, 100 000), in a format of your choice, plus an extra one, intersect each of them with the extra one and determine the inclusion-maximal results (at the end you should be back in your original format so that you can continue). I want to do this many times (it's a depth-first search; the depth is maybe 10, maybe 20, but constant for each problem). The largest instance I have dealt with so far was 120 sets with about 10 million elements.

The best format I came up with so far was the following: an uncompressed bitmap/bitset plus an array of pointers to the non-zero 256-bit chunks. So instead of looping through the entire thing, I just loop through the non-zero 256-bit chunks. I haven't tried other ways to store the significant chunks yet (a poor design choice on my side, which I need to refactor so that the algorithm doesn't depend on the data structure). Maybe a bitset for this as well might be better (because you can get an extremely fast certificate that something is not a subset). If I remember correctly, the intersection takes up about a quarter of the time, because it has to go through the entire thing; the subset check usually stops early, I guess.

If roaring is faster, then the extra dependency shouldn't be a problem. The large problem above took a month on something like 40 cores. At the moment I'm working on a smaller problem, which I want to solve in a few minutes on a standard computer.

Regarding the density of the sets: I start with somewhat dense sets. The deeper I go into the inclusion, the sparser they get; eventually they contain just a few elements. But then again the number of sets decreases (as I'm only considering the inclusion-maximal ones).
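A compact sketch of that core step with pyroaring (illustrative only: the data is made up, and it allocates freely instead of reusing constant memory as described):

```python
from pyroaring import BitMap

def maximal_intersections(sets, extra):
    """Intersect every set with `extra`, then keep only the
    inclusion-maximal results (simple quadratic filter)."""
    candidates = [s & extra for s in sets]
    maximal = []
    for i, c in enumerate(candidates):
        # Drop c if some other candidate strictly contains it;
        # (c & d) == c is the subset test.
        if not any(j != i and c != d and (c & d) == c
                   for j, d in enumerate(candidates)):
            maximal.append(c)
    return maximal

# Made-up instance in the spirit of the description: 40 sets in
# range(0, 100_000), intersected with one extra set.
sets = [BitMap(range(i, 100_000, i + 2)) for i in range(40)]
extra = BitMap(range(0, 100_000, 3))
print(len(maximal_intersections(sets, extra)))
```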
Well, in my opinion, this kind of algorithm is exactly where roaring bitmaps should shine, both in terms of computation time and memory consumption, especially with a changing set density.
This might be a problem for pyroaring, however. As in most Python programs, the GIL will prevent a multithreaded program from using the 40 cores in parallel.
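One standard workaround is process-based parallelism rather than threads; a rough sketch, shipping serialized bitmaps to the workers (`serialize`/`deserialize` are existing pyroaring methods, but the data and the split into branches here are made up, and the copying overhead may eat into the speedup):

```python
import multiprocessing as mp
from pyroaring import BitMap

def intersect_all(payload):
    """Worker: deserialize the sets, intersect each with the extra
    one, and return the results in serialized form."""
    sets_bytes, extra_bytes = payload
    extra = BitMap.deserialize(extra_bytes)
    return [(BitMap.deserialize(b) & extra).serialize() for b in sets_bytes]

if __name__ == "__main__":
    # Made-up data: 4 independent branches of the search.
    branches = []
    for k in range(4):
        sets = [BitMap(range(i, 100_000, i + 2)).serialize() for i in range(40)]
        extra = BitMap(range(k, 100_000, 3)).serialize()
        branches.append((sets, extra))
    with mp.Pool() as pool:
        results = pool.map(intersect_all, branches)
```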
Yes, you are right. The examples were just way too small. For the associahedron, which is pretty sparse for the most part, I get the following timings (one could apply a trick, because this thing is simple, but that isn't done yet).
I will keep investigating. As mentioned, the implementation is very naive (appending to lists and the like). It sure looks like roaring is catching up fast. This naive implementation (nothing naive about roaring itself) is already faster than the current implementation in SageMath in this particular instance (which still uses bare uncompressed bitsets, just with a more sophisticated wrapper around them).

I would be glad if we could use roaring as the sophisticated bitset library in SageMath. There would be a couple of things to do, though, e.g. implementing bitwise shifts (a possible stopgap is sketched after this comment). From what I understand of roaring, this shouldn't be too hard to implement; I might even be able to figure it out myself.

As it currently compiles, pyroaring isn't capable of parallel computation. For me, it would suffice to just include the files as they are and define the

I don't know yet what other people in SageMath think. At the moment they told me that it is preferable to find a well-maintained and well-optimized bitset implementation instead of redoing this stuff. I also think it would be great to only have one roaring Python interface; no need to create another.
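Until a native shift lands, one can emulate it at the Python level by offsetting every element; a minimal sketch (correct but O(n), since it rebuilds the bitmap element by element, whereas a container-level implementation inside roaring could be much cheaper):

```python
from pyroaring import BitMap

def shifted(bm, offset):
    """Emulate a bitwise shift by `offset` positions (may be negative).
    Elements shifted below 0 are dropped; results must stay in uint32 range."""
    return BitMap([x + offset for x in bm if x + offset >= 0])

bm = BitMap([0, 5, 70_000])
assert shifted(bm, 3) == BitMap([3, 8, 70_003])
assert shifted(bm, -1) == BitMap([4, 69_999])
```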
It seems pretty weird to me that pyroaring is slower than the built-in set by a factor of 2. Would you mind sharing your code? Anyway, the rightmost column looks impressive; this one might be hard to beat.
It's not that hard to beat :-) It's already faster in this scenario. Exchanging the compressed bitset implementation for roaring (not pyroaring) and using an optimized wrapper around the algorithm, I get the following times with CRoaring on 1 core: 2.37s for the 10-dimensional case and 15s for the 11-dimensional case. Parallelization has almost no overhead, so I would say that is faster than those 5 seconds on 4 cores.

I also ran a few other examples, and roaring bitmaps are faster at some point. This is pretty impressive. So for this challenging calculation, which took a month, roaring bitmaps could have reduced the runtime significantly, I suppose. One might even think about trying the next-dimensional case with roaring, but I don't think the cause justifies the means.

So I would say the overhead of roaring just needs a range of at least 100 000 elements to pay off (which is not surprising, if each container holds 64k bits and the compression heuristics are made for this size). The example that performed so disappointingly has only range(782), but 782 sets. So I should have known that roaring bitmaps are not a candidate for this kind of problem.

Here comes my naive implementation:
I attached a zip file with two example data sets, so you can actually run it if you care:
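A minimal illustrative harness in the spirit of that comparison (made-up data and names, not the attached code):

```python
import time
from pyroaring import BitMap

def maximal_intersections(sets, extra):
    # Same quadratic inclusion-maximal filter as sketched earlier;
    # (c & d) == c is a subset test that works for both set types.
    cand = [s & extra for s in sets]
    return [c for i, c in enumerate(cand)
            if not any(j != i and c != d and (c & d) == c
                       for j, d in enumerate(cand))]

# Made-up instance mirroring the slow case: range(782), but 782 sets.
raw = [frozenset(range(i % 30, 782, (i % 7) + 2)) for i in range(782)]
extra_raw = frozenset(range(0, 782, 3))

for name, conv in (("builtin set", frozenset), ("pyroaring", BitMap)):
    sets = [conv(s) for s in raw]
    extra = conv(extra_raw)
    t0 = time.perf_counter()
    maximal_intersections(sets, extra)
    print(name, time.perf_counter() - t0)
```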