Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitfield improvements #506

Merged
merged 25 commits into from
Jun 26, 2020
Merged

Bitfield improvements #506

merged 25 commits into from
Jun 26, 2020

Conversation

timvermeulen
Copy link
Contributor

Summary of changes

  • BitField now always contains an encoded bitvec and never a decoded one, which means that most methods no longer need a &mut self reference
  • RLEPlus is an RLE+ encoded BitVec that has been verified to be encoded correctly
  • BitField and RLEPlus work with usize rather than u64 in order to minimize numeric conversions
  • AHashMap replaces FnvHashMap due to speedups of around 30%-50% according to pretty rudimentary benchmarks
  • the RangeIterator trait was added for working with lazy sequences of disjoint ranges
    • the RLE+ encoding works with runs of 1s and 0s, but working with ranges of 1s instead makes implementing several algorithms a lot less error-prone
    • consequently, a (meaningless) trailing run of 0s is no longer encoded, so a few tests needed minor changes to not use a trailing run of 0s
    • any RangeIterator is still a regular Iterator<Item = Range<usize>> so you can still use them in for loops and stuff like that
  • BitField borrows some behavior from HashMap: union, intersection, and difference return (lazy) RangeIterators, and the |, &, and - operators are their eager counterparts that return BitFields
    • the existing union for taking the union of any number of BitFields has been renamed multi_union
    • unlike in the Go implementation, these combinators are completely lazy and are linear over the number of ranges, not on the number of bits
  • a BitField can be iterated with ranges() as well as with iter() which returns an iterator over the set bits of the bit field
    • the private retrieve_set_indices was therefore renamed bounded_iter(max: usize) and was made public, it now also returns an iterator instead of a B: FromIterator
    • we no longer need BitField::for_each which can be directly replaced with .iter().try_for_each(...)
  • BitReader and BitWriter are used to efficiently read and write multiple bits at a time (up to 8) which significantly speeds up RLE+ encoding and decoding

I think I haven't forgotten anything!

Reference issue to close

Closes #468

Copy link
Contributor

@austinabell austinabell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the changes sooo much.

Only actual issue I have is that there will be inconsistencies in cases with the go implementation (specifically with the actors). The issue is that the bitfield is iterated and checked to be valid on instantiation which has an upfront cost but also in the cases of an invalid bitfield (or partially invalid) the go implementation will not error on serialization (which is an ErrSerialization exit code) and instead will error when the invalid part of the bitfield is iterated over (ErrIllegalState usually and in different places so it's possible another error was hit before).

One case that will for sure cause us to fork the network is if a bitfield is sent in params of a transaction that is invalid. We will fail immediately on deserialization and return that exit code, where the go implementation will either throw a different exit code or not error at all, which will fork the network.

To be clear though, I much prefer this way, it makes more sense for sure, but we need to match the go implementation and they are only erroring when the invalid ranges are iterated over.

utils/bitfield/Cargo.toml Outdated Show resolved Hide resolved
utils/bitfield/src/iter.rs Outdated Show resolved Hide resolved
utils/bitfield/src/iter.rs Outdated Show resolved Hide resolved
utils/bitfield/src/iter.rs Outdated Show resolved Hide resolved
utils/bitfield/src/lib.rs Show resolved Hide resolved
Ok(())
});
Ok(st.info.sector_size)
if declared_recoveries.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as another potential inconsistency (would have failed on serialization instead of ErrIllegalState exit code)

format!("failed to union faults: {}", e),
)
})?;
let all_declared = BitField::multi_union(&declared_sectors);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (sorry for comment spam)

Comment on lines +980 to +981
let recoveries = &st.recoveries & &all_declared;
let new_faults = &all_declared - &recoveries;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two also (I'll stop checking for these now)

vm/actor/src/builtin/miner/state.rs Outdated Show resolved Hide resolved
vm/actor/src/builtin/miner/mod.rs Show resolved Hide resolved
@austinabell austinabell changed the base branch from master to main June 24, 2020 15:24
Copy link
Member

@ec2 ec2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, aside from Austins concerns

@timvermeulen
Copy link
Contributor Author

For future reference, Austin's concerns have been resolved — this is now consistent with the Go implementation

@timvermeulen
Copy link
Contributor Author

@austinabell @ec2 I pushed some minor changes to deal with integer overflow in the same way the Go implementation does for which I had to drop down from ranges to runs because a trailing 0 run still shouldn't overflow 64 bits, which is why DecodedRanges was replaced by Runs. This PR should be good to go now.

I'll make a separate PR for removing the bitvec dependency.

@timvermeulen timvermeulen merged commit 6e33c23 into main Jun 26, 2020
@timvermeulen timvermeulen deleted the tim/bitfield branch June 26, 2020 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BitField optimization and benchmarking
4 participants