
Expose roaring_bitmap_overwrite #59

Merged · 3 commits · Sep 18, 2020

Conversation

@kliem (Contributor) commented Sep 16, 2020

It was made public in roaring.h 2 years ago according to git blame.

@Ezibenroc (Owner) commented

Adding the C function signature in this file is not enough; you also have to add some Python code somewhere to actually call the function.

So I suggest adding an overwrite method in the BitMap class (file bitmap.pxi); this should be straightforward.
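
For concreteness, such a method could look roughly like this inside the class in bitmap.pxi (a sketch only: the croaring module name and the _c_bitmap attribute are assumptions about pyroaring's internals, not the merged code):

def overwrite(self, AbstractBitMap other):
    """Clear this bitmap and overwrite it in place with the content of other."""
    # roaring_bitmap_overwrite returns false if reallocation fails
    if not croaring.roaring_bitmap_overwrite(self._c_bitmap, other._c_bitmap):
        raise MemoryError('roaring_bitmap_overwrite failed')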

Expose overwrite in bitmap.pxi
@kliem (Contributor, Author) commented Sep 18, 2020

Thanks for your fast reply. I added an overwrite method as suggested.
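
For the record, a minimal usage sketch of the new method (assuming it keeps this shape after merging):

from pyroaring import BitMap

a = BitMap([1, 2, 3])
b = BitMap(range(10))
a.overwrite(b)  # a is replaced in place by the content of b,
                # reusing the existing Python object
assert a == b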

I'm probably out of use cases for now, so feel free to reject the change if you don't think the new method might help other people.

For my application it seems that roaring bitmaps have too much overhead. Memory was not a problem yet (the workload is similar to iterating over all intersections of a list of sets).

@Ezibenroc (Owner) commented

Thank you!

> For my application it seems that roaring bitmaps have too much overhead. Memory was not a problem yet (the workload is similar to iterating over all intersections of a list of sets).

Could you give more details? Roaring bitmaps should be both faster and cheaper in memory than the classical set.
Or maybe it was simply not worth adding an additional dependency to your project?

@Ezibenroc merged commit 8b0f24b into Ezibenroc:master on Sep 18, 2020
@kliem (Contributor, Author) commented Sep 19, 2020

I don't know how much detail you want:

I'm a SageMath developer. We use bitsets for all kinds of things in mathematics, and I have only tried one thing with roaring so far. For this, the only operations I really need are subset checks and intersections. I need only constant memory, which I can afford, so there is no reason to allocate anything more than once. The problem boils down to the following (I'm making up some numbers to give you an idea):

Given 40 sets over range(0, 100000) in a format of your choice, plus one extra set: intersect each of them with the extra one and determine the inclusion-maximal results (ending up in the original format, so that you can continue). I want to do this many times (it's a depth-first search; the depth is maybe 10, maybe 20, but constant for each problem).
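
In code, one step of this could look as follows (made-up sizes, plain Python frozensets just to fix the semantics):

import random

random.seed(0)
universe = range(100000)
# 40 sets plus the extra one, with made-up sizes
sets = [frozenset(random.sample(universe, 5000)) for _ in range(40)]
extra = frozenset(random.sample(universe, 5000))

# intersect each set with the extra one ...
intersections = [s & extra for s in sets]
# ... and keep only the inclusion-maximal results
maximal = [a for a in intersections
           if not any(a < b for b in intersections)]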

The largest I dealt with so far was 120 sets with about 10 million elements.

The best format I came up with so far is the following: an uncompressed bitmap/bitset plus an array of pointers to the non-zero 256-bit chunks. So instead of looping through the entire thing, I just loop through the non-zero 256-bit chunks. I haven't tried other ways to store the significant chunks yet (a poor design choice on my side, which I need to refactor so that the algorithm doesn't depend on the data structure). Maybe a bitset for this as well would be better (because you can get an extremely fast certificate that something is not a subset).
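
A toy Python model of that format (for illustration only; the real code is compiled and stores pointers rather than indices):

from array import array

WORDS_PER_CHUNK = 4  # 4 x 64-bit words = one 256-bit chunk

def make_bitset(elements, universe_size):
    n_words = -(-universe_size // 64)  # ceil(universe_size / 64)
    words = array('Q', [0] * n_words)
    for e in elements:
        words[e >> 6] |= 1 << (e & 63)
    # remember which 256-bit chunks are non-zero so loops can skip zeros
    chunks = [c for c in range(0, n_words, WORDS_PER_CHUNK)
              if any(words[c:c + WORDS_PER_CHUNK])]
    return words, chunks

def is_subset_sparse(a, b):
    # assumes both bitsets were built with the same universe_size
    words_a, chunks_a = a
    words_b, _ = b
    # only a non-zero chunk of a can witness that a is not a subset of b,
    # so the loop skips the zero regions and can stop early
    for c in chunks_a:
        for i in range(c, min(c + WORDS_PER_CHUNK, len(words_a))):
            if words_a[i] & ~words_b[i]:
                return False
    return True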

If I remember correctly, the intersection takes up about a quarter of the time, because it has to go through the entire thing. The subset check usually stops early, I guess.

If it's faster, then the extra dependency shouldn't be a problem. The large problem above took a month on something like 40 cores. At the moment I'm working on some smaller problem, which I want to solve in a few minutes on a standard computer.

Regarding the density of the sets: I start with somewhat dense sets. The deeper I go into the inclusion hierarchy, the sparser they get. Eventually they contain just a few elements, but by then the number of sets has decreased as well (as I only consider the inclusion-maximal ones).

@Ezibenroc (Owner) commented

Well, in my opinion, this kind of algorithm is exactly where roaring bitmaps should shine, both in terms of computation time and memory consumption, especially with a changing set density.

> The large problem above took a month on something like 40 cores.

This might be a problem for pyroaring, however. As with most Python programs, the GIL will prevent the use of the 40 cores in parallel in a multithreaded program.

@kliem (Contributor, Author) commented Sep 21, 2020

Yes, you are right. The examples were just way too small.
I just reimplemented my problem in a naive, very short Python algorithm and compared pyroaring with Python sets and with uncompressed bitsets (implemented in Sage, but they work as you would expect: no intrinsics, just normal bitwise operations).

For the associahedron, which is pretty sparse for the most part, I get the following timings (one could apply a trick here, because this polytope is simple, but that isn't done yet):

| recursion depth | range | max n sets | python set | uncompressed bitset | pyroaring | own Cython/C++ (4 physical cores) |
|---|---|---|---|---|---|---|
| 10 | 58786 | 65 | 18s | 48s | 41s | 0.3s |
| 11 | 208012 | 77 | 110s | 1204s | 228s | 5s |

I will keep investigating. As mentioned, the implementation is very naive (appending to lists, etc.).

It sure looks like roaring is catching up fast. This naive implementation (nothing naive about roaring itself) is already faster than the current implementation in SageMath in this particular instance (which still uses bare uncompressed bitsets, just with a more sophisticated wrapper around them).

I would be glad if we could use roaring as the sophisticated bitset library in SageMath. There would be a couple of things to do, though, e.g. implementing bitwise shifts. From what I understand of roaring, this shouldn't be too hard to implement. I might even be able to figure it out.
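
To pin down the intended semantics, here is a naive emulation of such a shift on top of pyroaring (illustration only; a real implementation would work container by container):

from pyroaring import BitMap

def shifted(bm, offset):
    # shift every element by offset, dropping anything that
    # leaves the uint32 universe
    return BitMap(x + offset for x in bm if 0 <= x + offset < 2**32)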

As it is currently compiled, pyroaring isn't capable of parallel computation. For me, it would suffice to just include the files as they are and declare the cdef extern functions as nogil. Then the interface and the annoying stuff like memory management are taken care of, and if you really want to go for speed, you can access the attributes and release the GIL. This is how I'm currently doing it.
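
Roughly what that looks like in Cython (a sketch: roaring_bitmap_is_subset is a real CRoaring function, the rest of the setup is assumed):

cdef extern from "roaring.h" nogil:
    ctypedef struct roaring_bitmap_t:
        pass
    bint roaring_bitmap_is_subset(const roaring_bitmap_t *r1,
                                  const roaring_bitmap_t *r2)

cdef bint is_subset_fast(const roaring_bitmap_t *a,
                         const roaring_bitmap_t *b):
    cdef bint result
    with nogil:
        # the GIL is released, so several threads can run this at once
        result = roaring_bitmap_is_subset(a, b)
    return result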

I don't know yet what other people in SageMath think. So far they have told me that it is preferable to find a well-maintained and well-optimized bitset implementation instead of redoing this stuff ourselves.

I also think it would be great to have only one roaring Python interface; there is no need to create another.

@Ezibenroc (Owner) commented

It seems pretty weird to me that pyroaring is slower than the built-in set by a factor of 2. Would you mind sharing your code?

Anyway, the rightmost column looks impressive; this one might be hard to beat.

@kliem (Contributor, Author) commented Sep 22, 2020

It's not that hard to beat :-)

It's already faster in this scenario. Exchanging the uncompressed bitset implementation for roaring (not pyroaring) and using an optimized wrapper around the algorithm, I get the following times with croaring on 1 core: 2.37s for the 10-dimensional case and 15s for the 11-dimensional case. Parallelization has almost no overhead, so I would say that is faster than those 5 seconds on 4 cores. I also ran a few other examples, and roaring bitmaps become faster at some point.

This is pretty impressive. So for that challenging calculation, which took a month, roaring bitmaps could have reduced the runtime significantly, I suppose. One might even think about trying the next-dimensional case with roaring, but I don't think the end would justify the means.

So I would say the overhead of roaring needs a range of at least 100 000 elements to pay off (which is not surprising, given that each container holds 64k bits and the compression heuristics are tuned for that size).

The example that performed so disappointingly has a range of only range(782), but 782 sets. So I should have known that roaring bitmaps are not a candidate for this kind of problem.
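
A quick way to see this overhead (a micro-benchmark sketch with made-up densities, not measurements from the thread):

import timeit
from pyroaring import BitMap

a, b = set(range(0, 782, 2)), set(range(0, 782, 3))
ra, rb = BitMap(a), BitMap(b)

# for such a tiny universe, the per-operation overhead of the roaring
# machinery can outweigh its advantages over a plain Python set
print(timeit.timeit(lambda: a & b, number=100000))
print(timeit.timeit(lambda: ra & rb, number=100000))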

Here is my naive implementation:

from itertools import chain

# intersection and is_subset are format-specific helpers (the attachment
# presumably defines its own); for Python sets and pyroaring
# (Frozen)BitMaps the built-in operators work:
def intersection(a, b):
    return a & b

def is_subset(e, f):
    return e <= f

def face_iterator(coatoms, ignored_sets):
    while coatoms:
        a = coatoms.pop()
        yield a
        new_coatoms = [intersection(a, b) for b in coatoms]
        new_coatoms = remove_visited(new_coatoms, ignored_sets)
        new_coatoms = inclusion_maximal(new_coatoms)
        yield from face_iterator(new_coatoms, ignored_sets.copy())

        ignored_sets.append(a)

def remove_visited(new_coatoms, ignored_sets):
    # drop every set that is contained in an already visited one
    output = []
    for e in new_coatoms:
        for f in ignored_sets:
            if is_subset(e, f):
                break
        else:
            output.append(e)

    return output

def inclusion_maximal(new_coatoms):
    # keep only the sets that are not contained in any other
    output = []
    while new_coatoms:
        e = new_coatoms.pop()
        for f in chain(output, new_coatoms):
            if is_subset(e, f):
                break
        else:
            output.append(e)

    return output

I attached a zip file with two example data sets, so you can actually run it if you care:

load('naive_implementation.py')
facets = load_poly('asso_10', FrozenBitMap)
%time sum_f_vector(facets)

pyroaring_compare.zip
