
Expose roaring_bitmap_overwrite #59

Merged · 3 commits · Sep 18, 2020

Conversation

@kliem (Contributor) commented Sep 16, 2020

It was made public in roaring.h 2 years ago according to git blame.

@Ezibenroc (Owner) commented

Adding the C function signature in this file is not enough; you also have to add some Python code somewhere to actually call the function.

So I suggest adding an overwrite method in the BitMap class (file bitmap.pxi); this should be straightforward.
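
For concreteness, such a method could look roughly like this inside the class in bitmap.pxi (a sketch only: the croaring module name and the _c_bitmap attribute are assumptions about pyroaring's internals, not the merged code):

def overwrite(self, AbstractBitMap other):
    """Clear this bitmap and overwrite it in place with the content of other."""
    # roaring_bitmap_overwrite returns false if reallocation fails
    if not croaring.roaring_bitmap_overwrite(self._c_bitmap, other._c_bitmap):
        raise MemoryError('roaring_bitmap_overwrite failed')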

Expose overwrite in bitmap.pxi
@kliem (Contributor, Author) commented Sep 18, 2020

Thanks for your fast reply. I added an overwrite method as suggested.
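
For the record, a minimal usage sketch of the new method (assuming it keeps this shape after merging):

from pyroaring import BitMap

a = BitMap([1, 2, 3])
b = BitMap(range(10))
a.overwrite(b)  # a is replaced in place by the content of b,
                # reusing the existing Python object
assert a == b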

I'm probably out of use cases for now, so feel free to reject the change if you don't think the new method might help other people.

For my application it seems that roaring bitmaps have too much overhead. Memory was not a problem yet (the workload is similar to iterating over all intersections of a list of sets).

@Ezibenroc (Owner) commented

Thank you!

> For my application it seems that roaring bitmaps have too much overhead. Memory was not a problem yet (the workload is similar to iterating over all intersections of a list of sets).

Could you give more details? Roaring bitmaps should be both faster and cheaper in memory than the classical set.
Or maybe it was simply not worth adding an additional dependency to your project?

@Ezibenroc merged commit 8b0f24b into Ezibenroc:master on Sep 18, 2020
@kliem (Contributor, Author) commented Sep 19, 2020

I don't know how much detail you want:

I'm a SageMath developer. We use bitsets for all kinds of things in mathematics, and I have only tried one thing with roaring so far. For this, the only operations I really need are subset checks and intersections. I need only constant memory, which I can afford, so there is no reason to allocate anything more than once. The problem boils down to the following (I'm making up some numbers to give you an idea):

Given 40 sets over range(0, 100000) in a format of your choice, plus one extra set: intersect each of them with the extra one and determine the inclusion-maximal results (ending up in the original format, so that you can continue). I want to do this many times (it's a depth-first search; the depth is maybe 10, maybe 20, but constant for each problem).
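
In code, one step of this could look as follows (made-up sizes, plain Python frozensets just to fix the semantics):

import random

random.seed(0)
universe = range(100000)
# 40 sets plus the extra one, with made-up sizes
sets = [frozenset(random.sample(universe, 5000)) for _ in range(40)]
extra = frozenset(random.sample(universe, 5000))

# intersect each set with the extra one ...
intersections = [s & extra for s in sets]
# ... and keep only the inclusion-maximal results
maximal = [a for a in intersections
           if not any(a < b for b in intersections)]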

The largest I dealt with so far was 120 sets with about 10 million elements.

The best format I came up with so far is the following: an uncompressed bitmap/bitset plus an array of pointers to the non-zero 256-bit chunks. So instead of looping through the entire thing, I just loop through the non-zero 256-bit chunks. I haven't tried other ways to store the significant chunks yet (a poor design choice on my side, which I need to refactor so that the algorithm doesn't depend on the data structure). Maybe a bitset for this as well would be better (because you can get an extremely fast certificate that something is not a subset).
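
A toy Python model of that format (for illustration only; the real code is compiled and stores pointers rather than indices):

from array import array

WORDS_PER_CHUNK = 4  # 4 x 64-bit words = one 256-bit chunk

def make_bitset(elements, universe_size):
    n_words = -(-universe_size // 64)  # ceil(universe_size / 64)
    words = array('Q', [0] * n_words)
    for e in elements:
        words[e >> 6] |= 1 << (e & 63)
    # remember which 256-bit chunks are non-zero so loops can skip zeros
    chunks = [c for c in range(0, n_words, WORDS_PER_CHUNK)
              if any(words[c:c + WORDS_PER_CHUNK])]
    return words, chunks

def is_subset_sparse(a, b):
    # assumes both bitsets were built with the same universe_size
    words_a, chunks_a = a
    words_b, _ = b
    # only a non-zero chunk of a can witness that a is not a subset of b,
    # so the loop skips the zero regions and can stop early
    for c in chunks_a:
        for i in range(c, min(c + WORDS_PER_CHUNK, len(words_a))):
            if words_a[i] & ~words_b[i]:
                return False
    return True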

If I remember correctly, the intersection takes up about a quarter of the time, because it has to go through the entire thing. The subset check usually stops early, I guess.

If it's faster, then the extra dependency shouldn't be a problem. The large problem above took a month on something like 40 cores. At the moment I'm working on some smaller problem, which I want to solve in a few minutes on a standard computer.

Regarding the density of the sets: I start with somewhat dense sets. The deeper I go into the inclusion hierarchy, the sparser they get. Eventually they contain just a few elements, but by then the number of sets has decreased as well (as I only consider the inclusion-maximal ones).

@Ezibenroc (Owner) commented

Well, in my opinion, this kind of algorithm is exactly where roaring bitmaps should shine, both in terms of computation time and memory consumption, especially with a changing set density.

> The large problem above took a month on something like 40 cores.

This might be a problem for pyroaring, however. As with most Python programs, the GIL will prevent the use of the 40 cores in parallel in a multithreaded program.

@kliem (Contributor, Author) commented Sep 21, 2020

Yes, you are right. The examples were just way too small.
I just reimplemented my problem in a naive, very short Python algorithm and compared pyroaring with Python sets and with uncompressed bitsets (implemented in Sage, but they work as you would expect: no intrinsics, just normal bitwise operations).

For the associahedron, which is pretty sparse for the most part, I get the following timings (one could apply a trick here, because this polytope is simple, but that isn't done yet):

| recursion depth | range | max n sets | python set | uncompressed bitset | pyroaring | own Cython/C++ (4 physical cores) |
|---|---|---|---|---|---|---|
| 10 | 58786 | 65 | 18s | 48s | 41s | 0.3s |
| 11 | 208012 | 77 | 110s | 1204s | 228s | 5s |

I will keep investigating. As mentioned, the implementation is very naive (appending to lists, etc.).

It sure looks like roaring is catching up fast. This naive implementation (nothing naive about roaring itself) is already faster than the current implementation in SageMath in this particular instance (which still uses bare uncompressed bitsets, just with a more sophisticated wrapper around them).

I would be glad if we could use roaring as the sophisticated bitset library in SageMath. There would be a couple of things to do, though, e.g. implementing bitwise shifts. From what I understand of roaring, this shouldn't be too hard to implement. I might even be able to figure it out.
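
To pin down the intended semantics, here is a naive emulation of such a shift on top of pyroaring (illustration only; a real implementation would work container by container):

from pyroaring import BitMap

def shifted(bm, offset):
    # shift every element by offset, dropping anything that
    # leaves the uint32 universe
    return BitMap(x + offset for x in bm if 0 <= x + offset < 2**32)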

As it is currently compiled, pyroaring isn't capable of parallel computation. For me, it would suffice to just include the files as they are and declare the cdef extern functions as nogil. Then the interface and the annoying stuff like memory management are taken care of, and if you really want to go for speed, you can access the attributes and release the GIL. This is how I'm currently doing it.
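
Roughly what that looks like in Cython (a sketch: roaring_bitmap_is_subset is a real CRoaring function, the rest of the setup is assumed):

cdef extern from "roaring.h" nogil:
    ctypedef struct roaring_bitmap_t:
        pass
    bint roaring_bitmap_is_subset(const roaring_bitmap_t *r1,
                                  const roaring_bitmap_t *r2)

cdef bint is_subset_fast(const roaring_bitmap_t *a,
                         const roaring_bitmap_t *b):
    cdef bint result
    with nogil:
        # the GIL is released, so several threads can run this at once
        result = roaring_bitmap_is_subset(a, b)
    return result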

I don't know yet what other people in SageMath think. So far they have told me that it is preferable to find a well-maintained and well-optimized bitset implementation instead of redoing this stuff ourselves.

I also think it would be great to have only one roaring Python interface; there is no need to create another.

@Ezibenroc (Owner) commented

It seems pretty weird to me that pyroaring is slower than the built-in set by a factor of 2. Would you mind sharing your code?

Anyway, the rightmost column looks impressive; this one might be hard to beat.

@kliem (Contributor, Author) commented Sep 22, 2020

It's not that hard to beat :-)

It's already faster in this scenario. Exchanging the uncompressed bitset implementation for roaring (not pyroaring) and using an optimized wrapper around the algorithm, I get the following times with croaring on 1 core: 2.37s for the 10-dimensional case and 15s for the 11-dimensional case. Parallelization has almost no overhead, so I would say that is faster than those 5 seconds on 4 cores. I also ran a few other examples, and roaring bitmaps become faster at some point.

This is pretty impressive. So for that challenging calculation, which took a month, roaring bitmaps could have reduced the runtime significantly, I suppose. One might even think about trying the next-dimensional case with roaring, but I don't think the end would justify the means.

So I would say the overhead of roaring needs a range of at least 100 000 elements to pay off (which is not surprising, given that each container holds 64k bits and the compression heuristics are tuned for that size).

The example that performed so disappointingly has a range of only range(782), but 782 sets. So I should have known that roaring bitmaps are not a candidate for this kind of problem.
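
A quick way to see this overhead (a micro-benchmark sketch with made-up densities, not measurements from the thread):

import timeit
from pyroaring import BitMap

a, b = set(range(0, 782, 2)), set(range(0, 782, 3))
ra, rb = BitMap(a), BitMap(b)

# for such a tiny universe, the per-operation overhead of the roaring
# machinery can outweigh its advantages over a plain Python set
print(timeit.timeit(lambda: a & b, number=100000))
print(timeit.timeit(lambda: ra & rb, number=100000))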

Here is my naive implementation:

from itertools import chain

# intersection and is_subset are format-specific helpers (the attachment
# presumably defines its own); for Python sets and pyroaring
# (Frozen)BitMaps the built-in operators work:
def intersection(a, b):
    return a & b

def is_subset(e, f):
    return e <= f

def face_iterator(coatoms, ignored_sets):
    while coatoms:
        a = coatoms.pop()
        yield a
        new_coatoms = [intersection(a, b) for b in coatoms]
        new_coatoms = remove_visited(new_coatoms, ignored_sets)
        new_coatoms = inclusion_maximal(new_coatoms)
        yield from face_iterator(new_coatoms, ignored_sets.copy())

        ignored_sets.append(a)

def remove_visited(new_coatoms, ignored_sets):
    # drop every set that is contained in an already visited one
    output = []
    for e in new_coatoms:
        for f in ignored_sets:
            if is_subset(e, f):
                break
        else:
            output.append(e)

    return output

def inclusion_maximal(new_coatoms):
    # keep only the sets that are not contained in any other
    output = []
    while new_coatoms:
        e = new_coatoms.pop()
        for f in chain(output, new_coatoms):
            if is_subset(e, f):
                break
        else:
            output.append(e)

    return output

I attached a zip file with two example data sets, so you can actually run it if you care:

load('naive_implementation.py')
facets = load_poly('asso_10', FrozenBitMap)
%time sum_f_vector(facets)

pyroaring_compare.zip
