Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement compatible bitfield #466

Merged
merged 16 commits into from
Jun 1, 2020
Merged

Implement compatible bitfield #466

merged 16 commits into from
Jun 1, 2020

Conversation

austinabell
Copy link
Contributor

Summary of changes
Changes introduced in this pull request:

Was previously setup because it was optimistic that a basic bitvec would be sufficient and the API would have all the necessary functionality. Because of inconsistencies in how the bitvectors are decoded and when they error, this functionality was setup.

It doesn't match the go impl 1:1 because to match their iterators is quite tedious and going to PR the functionality in first and I'll open an issue to benchmark and optimize. There are some benefits to having it the way this is rather than those iterators, so will wait for benchmarks to draw conclusions. (This impl does less encoding/decoding but requires more memory in an actual environment probably)

cc @dutterbutter let me know if this interface fits your needs, happy to tweak things if needed. I sometimes require mutable reference and flush to avoid inefficiencies and unnecessary decoding but I can change if needed.

Reference issue to close (if applicable)

Closes

Other information and links

Copy link
Contributor

@timvermeulen timvermeulen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool!

I haven't looked too much into the Go implementation but I'm curious what the exact trade-off is for flushing the bitfield on operations that otherwise only need read-only access. Surely something like a single get call on an RLE+-encoded bitvec can be done more efficiently than decoding it first. So I suppose the idea is that you usually do more than one get (or other operations that are more efficient on decoded bitvecs than encoded ones), and at some point it pays off to decode it up front?

We could always expose the flush operation and have the user of the type decide whether to do this or not (if they have a mutable reference to it). Then we can make get/first/count etc work on shared references as well.

utils/bitfield/tests/bitfield_tests.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Show resolved Hide resolved
utils/bitfield/src/lib.rs Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Outdated Show resolved Hide resolved
utils/bitfield/tests/bitfield_tests.rs Outdated Show resolved Hide resolved
@austinabell
Copy link
Contributor Author

I haven't looked too much into the Go implementation but I'm curious what the exact trade-off is for flushing the bitfield on operations that otherwise only need read-only access.

Lookups are O(1) after the first flush, instead of checking the set and unset hashsets as well as iterating over the RLE encoded bits.

Surely something like a single get call on an RLE+-encoded bitvec can be done more efficiently than decoding it first.

Yes, I was planning on doing this in a seperate issue which would include benchmarking, but you can iterate over the RLE encoded bits to see if the index bit is set (that is the only other thing preventing) but waiting for benchmarking because checking the hash of the index against the set and unset cache as well as iterating over the RLE bits isn't trivial in extreme cases

So I suppose the idea is that you usually do more than one get (or other operations that are more efficient on decoded bitvecs than encoded ones), and at some point it pays off to decode it up front?

Yeah, that's the idea. I don't particularly like it, but it solves our need and matches the go impl (and where it would error, which is important)

We could always expose the flush operation and have the user of the type decide whether to do this or not (if they have a mutable reference to it). Then we can make get/first/count etc work on shared references as well.

Yes, that was what I wanted to do, but this being drop in functionality for the specs-actors, invariants of different error handling and inconsistent functionality is not something worth this change right now, especially since this will most likely be refactored. The main goal is to match functionality and have a consistent API

Copy link
Contributor

@timvermeulen timvermeulen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good enough for now

@austinabell
Copy link
Contributor Author

Applied f94217d @timvermeulen @AshantiMutinta Sorry to do after your review, but I liked Ashanti's suggestion and it was a very small change and not functional difference to make in another PR

@timvermeulen
Copy link
Contributor

@austinabell Oh I missed that but lgtm 🙂

@austinabell austinabell merged commit 6308313 into master Jun 1, 2020
@austinabell austinabell deleted the austin/bitfield branch June 1, 2020 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants