
Feature/materialized view commitment #1071

Merged
merged 44 commits into develop on Aug 9, 2019

Conversation

@jcnelson
Member

commented Jul 18, 2019

Implementation of SIP 004. However, the MARF index has not been tied to the Clarity VM DB yet.

jcnelson added 6 commits Jul 18, 2019
…ementation)
@jcnelson jcnelson requested review from lgalabru, kantai and zone117x Jul 18, 2019
jcnelson added 12 commits Jul 18, 2019
…iehash
…ocks. Found and fixed a bug when walking multiple 1-block fork segments.
… MARF handle, and add a way to begin, insert-batch, and commit key/value pairs as strings. The insert-batch will only calculate the ART hash once (unlike doing many inserts individually). Also, add a stress-test that tests the MARF across many different forks by building a 256-node binary tree out of blocks, and disable long-running tests.
… stores the hash of some data, whose contents live in another data store (e.g. sqlite)
…bers
…T to disk by writing it to a temporary file and rename-ing it into place. Adds an fsync() to make the disk data durable, and adds a method for identifying in-progress but interrupted writes.
…t_ available via the fs::File interface.
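The write-to-a-temporary-file, fsync, then rename-into-place durability pattern described in the commits above can be sketched as follows (a generic illustration of the technique, not the codebase's actual code; `atomic_write` is a hypothetical name):

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;

// Write to "<path>.tmp", fsync, then rename into place, so a reader never
// observes a partially written file and an interrupted write leaves an
// identifiable ".tmp" file behind.
fn atomic_write(path: &str, bytes: &[u8]) -> std::io::Result<()> {
    let tmp_path = format!("{}.tmp", path);
    let mut tmp = OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(&tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?; // fsync: make the data durable before the rename
    fs::rename(&tmp_path, path)?; // atomic replacement on POSIX filesystems
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("marf_atomic_write_demo.bin");
    let path = path.to_str().unwrap().to_string();
    atomic_write(&path, b"hello")?;
    assert_eq!(fs::read(&path)?, b"hello");
    fs::remove_file(&path)?;
    println!("ok");
    Ok(())
}
```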
@codecov


commented Jul 29, 2019

Codecov Report

Merging #1071 into develop will increase coverage by 2.7%.
The diff coverage is 83.04%.


@@            Coverage Diff             @@
##           develop    #1071     +/-   ##
==========================================
+ Coverage    77.13%   79.83%   +2.7%     
==========================================
  Files           91       94      +3     
  Lines        17761    23139   +5378     
==========================================
+ Hits         13700    18474   +4774     
- Misses        4061     4665    +604
Impacted Files Coverage Δ
src/chainstate/stacks/index/node.rs 91.44% <ø> (ø)
src/util/macros.rs 79.54% <100%> (+2.8%) ⬆️
src/util/mod.rs 55% <100%> (+11.25%) ⬆️
src/chainstate/stacks/index/storage.rs 67.32% <67.32%> (ø)
src/chainstate/stacks/index/mod.rs 71.42% <71.42%> (ø)
src/chainstate/stacks/index/proofs.rs 75.65% <75.65%> (ø)
src/chainstate/stacks/index/bits.rs 78.43% <78.43%> (ø)
src/chainstate/stacks/index/marf.rs 85.72% <85.72%> (ø)
src/chainstate/stacks/index/fork_table.rs 92.73% <92.73%> (ø)
src/chainstate/stacks/index/trie.rs 94.05% <94.05%> (ø)
... and 13 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6295a16...bee377d. Read the comment docs.

…o we don't have to match everything, and use more-descriptive variable names for TrieStorage and TrieCursor instances (as well as TriePaths and TrieLeafs, where appropriate)
// what's the next block we'll shunt to?
let mut idx = 0;
for i in 0..ancestor_hashes.len() {
if (1u32 << i) <= back_block {
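The truncated excerpt above scans the skip-list by powers of two; a self-contained sketch of the same index computation (the function name and signature are illustrative, not the codebase's actual API):

```rust
/// Hypothetical standalone version of the lookup: the largest index i with
/// 2^i <= back_block is the farthest skip-list hop we can take in one step.
fn next_skiplist_index(back_block: u32) -> Option<usize> {
    if back_block == 0 {
        return None; // already at the target block; no hop needed
    }
    // The highest set bit of back_block is the largest power of two
    // not exceeding it.
    Some(31 - back_block.leading_zeros() as usize)
}

fn main() {
    assert_eq!(next_skiplist_index(0), None);
    assert_eq!(next_skiplist_index(1), Some(0)); // 2^0 = 1
    assert_eq!(next_skiplist_index(5), Some(2)); // 2^2 = 4 <= 5 < 8
    assert_eq!(next_skiplist_index(64), Some(6)); // 2^6 = 64
    println!("ok");
}
```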

@kantai

kantai Aug 2, 2019

Member

These skip-list lookups (i.e., finding the next index in the skip list) should probably be functions of the MARF struct, to ensure that backwards iteration over the list is the same everywhere.

@jcnelson

jcnelson Aug 3, 2019

Author Member

I can do that.

@jcnelson

jcnelson Aug 4, 2019

Author Member

Actually, looking at this more deeply, it's not clear to me what would be gained by moving this to the MARF struct. The MARF never needs to use the skip-list to look up prior nodes -- it just uses the fork table within the TrieFileStorage instance. The MARF only needs to write the skip-list, on calls to update_root_hash(). Only the prover needs to read the skip-list, and I think moving that logic away from the prover would make the verifier harder for the reader to understand.

Member

left a comment

This code looks good to me -- the MARF looks like the perfect data structure for what we need, and I'm excited to be integrating this.

I have a couple comments that should be addressed before merging:

  1. A number of functions and constructors are called pretty frequently with similar arguments -- these should be factored to the extent possible (e.g., getting a root node pointer). I've commented in the places it seemed most appropriate, but there's probably room for more factoring.
  2. There's some error handling that should be panics (I've commented in the locations that seem like they may be corruption paths).
  3. Interactions with the TrieCursor appear to be fraught with danger. First, callers are responsible for cleaning up internal structures of the Cursor when the cursor stops for various reasons (back pointers, a node is being upgraded). Second, there's a number of cases where the cursor may stop -- it can stop because it reached a backpointer, because the path diverged, because it reached a leaf (which was the expected path), or because it reached the expected path which was an intermediate node -- that's four conditions, but the set of possible combinations of eop, eonp, div, and is_backptr(node.id) is 2^4.
    I think the cursor handling code should be refactored to have walk return an enum indicating why it stopped.
  4. We shouldn't compute hashes any more than necessary -- not out of a performance concern (I'm not too worried about a few extra hashes), but from a correctness perspective: we should be able to point at the block of code that guarantees that a trie's Merkle hashes have been correctly updated. Splitting or duplicating this responsibility leaves me less certain that structure updates are handled correctly.
  5. I think this is a misuse of Vec. fast_extend... will panic if a Vec isn't an appropriate length, which leads me to believe that we shouldn't use Vecs at all -- a safety benefit is eliminating the unsafe block and getting slice length enforcement from the compiler. There would also likely be performance benefits (Vec has memory overhead and may allocate more than the reserved capacity, and many clones would necessarily be eliminated). I realize this would require a lot of effort, so I'd say we should tag this as an issue and merge this PR without doing it.
@jcnelson

Member Author

commented Aug 3, 2019

Thanks for your feedback @kantai!

A number of functions and constructors are called pretty frequently with similar arguments -- these should be factored to the extent possible (e.g., getting a root node pointer). I've commented in the places it seemed most appropriate, but there's probably room for more factoring.

Agreed. I will clean this up.

There's some error handling that should be panics (I've commented in the locations that seem like they may be corruption paths).

Yup, will fix those.

Interactions with the TrieCursor appear to be fraught with danger. First, callers are responsible for cleaning up internal structures of the Cursor when the cursor stops for various reasons (back pointers, a node is being upgraded).

Yes, but I don't think this can be helped. Different callers need to be able to do different things when the cursor cannot make progress, and when those things are done, the TrieCursor needs to be "restarted" so it can continue walking along the Trie path -- especially if the Trie structure itself changes in-between calls to walk(). Trying to put those various routines into the TrieCursor will have the (IMHO more-undesirable) effect of moving unrelated concerns into the TrieCursor, and away from the larger bodies of code that are designed to work with them. So, I think the TrieCursor should continue to expect the caller to restart it.

Second, there's a number of cases where the cursor may stop -- it can stop because it reached a backpointer, because the path diverged, because it reached a leaf (which was the expected path), or because it reached the expected path which was an intermediate node -- that's four conditions, but the set of possible combinations of eop, eonp, div, and is_backptr(node.id) is 2^4.

Yeah, in retrospect, this would have been a better approach. I'll go ahead and make this change.

We shouldn't compute hashes any more than necessary -- not out of a performance concern (I'm not too worried about a few extra hashes), but from a correctness perspective: we should be able to point at the block of code that guarantees that a trie's Merkle hashes have been correctly updated. Splitting or duplicating this responsibility leaves me less certain that structure updates are handled correctly.

Please see my inline comment above.

I think this is a misuse of Vec. fast_extend... will panic if a Vec isn't an appropriate length, which leads me to believe that we shouldn't use Vecs at all -- a safety benefit is eliminating the unsafe block and getting slice length enforcement from the compiler. There would also likely be performance benefits (Vec has memory overhead and may allocate more than the reserved capacity, and many clones would necessarily be eliminated). I realize this would require a lot of effort, so I'd say we should tag this as an issue and merge this PR without doing it.

Yeah, getting rid of Vecs entirely would be good -- they're heap-allocated, so mutating one potentially means an expensive call to mmap() or sbrk() as well as grabbing/releasing one or more locks. But, I'd prefer to tackle this last, once everything else is to this team's liking.

jcnelson added 8 commits Aug 4, 2019
…ptionError (since other parts of the codebase expect this)
…couldn't finish walking to the next node (call this a CursorError). Use this for following backptrs and COW'ing nodes instead of trying to deduce what happened.
…k to the next node
…hunt_proof() into two methods -- one for backptrs and one for blocks
…s fail; update all related tests
@jcnelson

Member Author

commented Aug 4, 2019

Interactions with the TrieCursor appear to be fraught with danger. First, callers are responsible for cleaning up internal structures of the Cursor when the cursor stops for various reasons (back pointers, a node is being upgraded).

So, what I've done now is add three explicit repair_*() methods to TrieCursor that can only be called if the last call to .walk() encountered the particular error that makes that repair_*() method appropriate. For example, following a back-pointer allows the caller to call repair_backptr_step_backptr() or repair_backptr_step_finish(), but not repair_retarget(). Similarly, encountering a backptr on a walk() entitles the caller to call repair_backptr_*(), but not repair_retarget(). Failure to observe either of these rules will result in an immediate panic!() instead of possibly soldiering on with corrupt state.

Second, there's a number of cases where the cursor may stop -- it can stop because it reached a backpointer, because the path diverged, because it reached a leaf (which was the expected path), or because it reached the expected path which was an intermediate node -- that's four conditions, but the set of possible combinations of eop, eonp, div, and is_backptr(node.id) is 2^4.

So for this, I made TrieCursor's .walk() method return a CursorError type, which in turn indicates why it could not continue walking -- i.e. the child doesn't exist, the child is a backptr, or the path diverged. The error type is now used to determine the correct course of action, instead of deducing what went wrong through a combination of calls to .div() (now removed), .eop(), and checks on the last returned node and its ptr.
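A minimal sketch of that control flow, with illustrative names (the actual variants and repair methods in the codebase may differ):

```rust
// Illustrative stop-reason enum: walk() reports *why* it stopped instead of
// leaving the caller to deduce it from eop()/div() flags and the last node.
#[derive(Debug, PartialEq)]
enum CursorError {
    ChildNotExist,      // the expected child slot is empty
    BackptrEncountered, // the child lives in an earlier block's trie
    PathDiverged,       // the stored node path no longer matches the key path
}

// Callers match on the error to pick the one repair action it permits.
fn next_action(res: Result<usize, CursorError>) -> &'static str {
    match res {
        Ok(_) => "continue walking",
        Err(CursorError::ChildNotExist) => "attach a new leaf",
        Err(CursorError::BackptrEncountered) => "resolve the backptr, then resume",
        Err(CursorError::PathDiverged) => "split the node at the divergence point",
    }
}

fn main() {
    assert_eq!(next_action(Ok(3)), "continue walking");
    assert_eq!(
        next_action(Err(CursorError::PathDiverged)),
        "split the node at the divergence point"
    );
    println!("ok");
}
```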

jcnelson added 2 commits Aug 4, 2019
…d node's hash correctly (and by extension, all descendent modified node hashes)
@kantai
kantai approved these changes Aug 5, 2019
Member

left a comment

This looks great to me, thanks for making these changes, @jcnelson !

Member

left a comment

Awesome and dense! I'll continue my review tomorrow.

src/chainstate/stacks/index/node.rs (resolved)
fork_queue.push_back(root_hash.clone());

while fork_queue.len() > 0 {
let next_hash = match fork_queue.pop_front() {

@lgalabru

lgalabru Aug 5, 2019

Member

We can probably re-use the pattern
while let Some(next_hash) = fork_queue.pop_front() {
suggested by @kantai earlier.
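For reference, the suggested idiom in a runnable form (a VecDeque of strings stands in for the queue of block hashes):

```rust
use std::collections::VecDeque;

fn main() {
    let mut fork_queue: VecDeque<String> = VecDeque::new();
    fork_queue.push_back("root_hash".to_string());
    fork_queue.push_back("child_hash".to_string());

    // `while let` tests for and unwraps the front element in one step,
    // replacing the `len() > 0` check plus the inner match on pop_front().
    let mut visited = Vec::new();
    while let Some(next_hash) = fork_queue.pop_front() {
        visited.push(next_hash);
    }

    assert_eq!(visited, vec!["root_hash", "child_hash"]);
    assert!(fork_queue.is_empty());
    println!("ok");
}
```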

Member

left a comment

I think I'm starting to understand how all these pieces fit together.
I need to spend more time on the shunt proofs / Merkle skip-list.
This is art :)

src/chainstate/stacks/index/marf.rs (resolved)
src/chainstate/stacks/index/marf.rs (resolved)
src/chainstate/stacks/index/trie.rs (resolved)
src/chainstate/stacks/index/trie.rs (resolved)
src/chainstate/stacks/index/trie.rs (outdated, resolved)
src/chainstate/stacks/index/marf.rs (resolved)
let cur_block_hash = self.storage.get_cur_block();
let cur_block_rw = self.storage.readwrite();

let marf_value = MARFValue::from_value(value);

@kantai

kantai Aug 7, 2019

Member

it wasn't clear on my first read, but the interface for the insert functions is pretty difficult to use --

The insert takes a string value, but it doesn't actually store the value -- instead it opaquely hashes it and stores that hash (which get would return, not the value itself). This interface would be impossible to use for a caller who is managing side storage for large values themselves (which is probably the best approach -- managing the item storage seems like an orthogonal concern to the marf index).

I think an interface like:

insert(&mut self, key: &[u8; 32], value: &[u8; 40]) -> Result<(), Error>;
get(&mut self, block_hash: &BlockHeaderHash, key: &[u8; 32]) -> Result<Option<[u8;40]>, Error>;

Would be more ergonomic.

@jcnelson

jcnelson Aug 7, 2019

Author Member

This interface would be impossible to use for a caller who is managing side storage for large values themselves (which is probably the best approach

I agree that something needs to change, but I don't think it's this. When you insert data, the algorithm should be:

1. Get the MARF hash of the value
2. Store the key/value-hash in the MARF
3. Store the value and value-hash in the data store

To read data, the algorithm is:

1. Given the chain tip and key, get the MARF value hash
2. Given the value hash, get the associated value
3. (optional) Verify the value hashes to the value hash

So, the MARF interface would be:

fn insert(&mut self, key: &String, value_hash: &MARFValue) -> Result<(), Error>;
fn get(&mut self, block_hash: &BlockHeaderHash, key: &String) -> Result<Option<MARFValue>, Error>

And, the interface for MARFValue would be:

fn from_value(s: &String) -> MARFValue;
fn verify(&self, s: &String) -> bool;
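The proposed split of responsibilities can be sketched with plain HashMaps standing in for the MARF index and the side store (`hash_value` is an arbitrary stand-in hash for illustration, not the real MARFValue hashing):

```rust
use std::collections::HashMap;

// Stand-in value hash; the real code would use a cryptographic hash.
fn hash_value(s: &str) -> u64 {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

fn main() {
    let mut marf: HashMap<String, u64> = HashMap::new(); // key -> value hash
    let mut side_store: HashMap<u64, String> = HashMap::new(); // value hash -> value

    // Insert: (1) hash the value, (2) store key/value-hash in the index,
    // (3) store the value under its hash in the side store.
    let (key, value) = ("balance/alice", "100");
    let vh = hash_value(value);
    marf.insert(key.to_string(), vh);
    side_store.insert(vh, value.to_string());

    // Read: (1) look up the value hash via the index, (2) fetch the value,
    // (3) optionally verify the value against the hash.
    let got_hash = marf[key];
    let got_value = &side_store[&got_hash];
    assert_eq!(hash_value(got_value), got_hash); // optional verification
    assert_eq!(got_value.as_str(), "100");
    println!("ok");
}
```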

@kantai

kantai Aug 7, 2019

Member

These two hashes are only ever computed in those interface functions, so it doesn't seem like something that they should necessarily be required to do -- callers can quite easily perform that themselves.

The other issue for integration (a much larger issue) is that gets require a mutable reference. Part of this is because of the usage tracking, but also part of it is because of the structure of the MarfFileStorage struct.

@jcnelson

jcnelson Aug 7, 2019

Author Member

These two hashes are only ever computed in those interface functions, so it doesn't seem like something that they should necessarily be required to do -- callers can quite easily perform that themselves.

Why do you think that the caller should be expected to provide two "bare" byte slices as arguments? I can't see why the caller should need to know or care about the TriePath structure or the MARFValue's internal hash, because presumably the caller only cares about building an index over key/value pairs, and not the particulars of how the index itself is calculated. That's why I'm proposing to take a String as a key and a MARFValue as a value.

I can see why you might want to avoid String since we're really dealing with serialized Clarity objects. I'd be happy to take a Vec<u8> instead for each of these, but I can't see how supplying "bare" byte slices with oddly-specific lengths that are dependent on the internal implementation improves things.

The other issue for integration (a much larger issue) is that gets require a mutable reference. Part of this is because of the usage tracking, but also part of it is because of the structure of the MarfFileStorage struct.

I'm not sure why this is a problem? Regardless of how the MARF is implemented, a mutable reference is always going to be required as a direct consequence of having to deal with a file descriptor internally. You don't want concurrent access to it, since it has internal mutable state like an offset, a read-ahead buffer, and a last-error.

@kantai

kantai Aug 7, 2019

Member

I can see why you might want to avoid String since we're really dealing with serialized Clarity objects. I'd be happy to take a Vec<u8> instead for each of these, but I can't see how supplying "bare" byte slices with oddly-specific lengths that are dependent on the internal implementation improves things.

Because the MARF doesn't need to be responsible for either hashing keys or managing the actual storage of values.

I'm not sure why this is a problem?

The borrow checker gets pretty unhappy if every function requires mutable pointers (which would be the case for the Clarity VM to use .get)

@jcnelson

jcnelson Aug 7, 2019

Author Member

Because the MARF doesn't need to be responsible for either hashing keys or managing the actual storage of values.

Actually, it does -- the whole point of hashing is to ensure that the keys are evenly distributed, no matter what the schedule of inserts is. This is particularly important to us, because one attack avenue a network adversary has is to hammer the MARF index by inserting and reading many keys that are close to each other in the key space (since these paths will have the most nodes in them and will thus incur the highest I/O costs to service). Even if they are suitably paid for, it still slows down the peer. Ensuring that all paths are derived from a cryptographic hash thwarts this attack somewhat, since the adversary is computationally bound. But nevertheless, at some point, I was going to audit the code to ensure that the insert/get paths include the VRF seed hash, in order to ensure that keys are evenly distributed, no matter how the contract data are structured and no matter what kind of reasonable computing power the adversary has.

The MARF doesn't store values at all. It stores keys, index nodes and value hashes.

The borrow checker gets pretty unhappy if every function requires mutable pointers (which would be the case for the Clarity VM to use .get)

The only solution I can think of is to refactor get to open/close a file handle internally as a stack variable. We'd want to think very carefully about this because it can lead to a resource exhaustion attack -- i.e. we'll want to limit the number of concurrent calls to .get() globally.

@kantai

kantai Aug 7, 2019

Member

Actually, it does -- the whole point of hashing is to ensure that the keys are evenly distributed, no matter what the schedule of inserts is.

Okay, that's fine then.

The only solution I can think of is to refactor get to open/close a file handle internally as a stack variable. We'd want to think very carefully about this because it can lead to a resource exhaustion attack -- i.e. we'll want to limit the number of concurrent calls to .get() globally.

The mutability won't be too much of an issue for now, since the Clarity VM does a lot more cloning than it really needs to. As we try to eliminate spurious clones, though, the borrow checker's mutability checks will start to scream at us.

However, the file handle issue you mention is already present: any back pointer traversed opens a file descriptor. Making get work on an immutable pointer would only really add one more file descriptor open.

@jcnelson

jcnelson Aug 7, 2019

Author Member

However, the file handle issue you mention is already present: any back pointer traversed opens a file descriptor. Making get work on an immutable pointer would only really add one more file descriptor open.

Understood.

The resource exhaustion attack I had in mind becomes relevant once we hook the RPC and P2P network interfaces up to the Clarity VM database to allow clients to query key/value pairs. Because these calls can be concurrent, the MARF's get will either need to take an immutable self reference (possibly by way of creating the initial file descriptor on the stack), or we'll want to create a bunch of read-only thread-local Clarity VM database handles (each of which contains a read-only MARF struct) and give each P2P and RPC thread its own MARF. I'm in favor of the latter approach because it lets us bound the number of file descriptors used by the MARF globally in a very natural way -- it's a function of the (fixed) number of threads we allocate to these other subsystems. We can also continue to allow get to take a mutable self reference. Trying to accomplish something similar by tracking the number of concurrent get calls on a single thread-shared MARF sounds much hairier by comparison. But we don't have to solve this problem today.
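The "one read-only handle per thread" idea could look roughly like this (MarfHandle is a hypothetical stand-in type, not the real MARF struct; the point is that each thread mutates only its own handle, so get can keep taking &mut self):

```rust
use std::cell::RefCell;

// Stand-in for a read-only MARF handle with internal mutable state
// (fd offset, read-ahead buffers, etc. in the real thing).
struct MarfHandle {
    reads: u32,
}

impl MarfHandle {
    fn open_readonly() -> MarfHandle {
        MarfHandle { reads: 0 }
    }
    // `get` still takes &mut self; no cross-thread sharing is needed.
    fn get(&mut self, _key: &str) -> Option<String> {
        self.reads += 1;
        None // stand-in: no real storage behind this sketch
    }
}

thread_local! {
    // Each P2P/RPC thread lazily opens its own handle, so the number of
    // file descriptors is bounded by the (fixed) number of threads.
    static MARF: RefCell<MarfHandle> = RefCell::new(MarfHandle::open_readonly());
}

fn main() {
    let threads: Vec<_> = (0..3)
        .map(|i| {
            std::thread::spawn(move || {
                MARF.with(|m| {
                    m.borrow_mut().get(&format!("key-{}", i));
                    assert_eq!(m.borrow().reads, 1); // this thread's own handle
                });
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    println!("ok");
}
```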

jcnelson added 2 commits Aug 8, 2019
* the hash of the key (the path) is consistent with the segment proofs
* the hash of the value (the tail of the first segment proof) is consistent with the value

for i in 0..node.ptrs().len() {
if node.ptrs()[i].id() == TrieNodeID::Empty {
hashes.push(TrieHash::from_data(&[]));

@lgalabru

lgalabru Aug 8, 2019

Member

Why do we keep all this empty nodes in our proofs?

@lgalabru

lgalabru Aug 8, 2019

Member

Should we at least cache TrieHash::from_data(&[])?
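If caching is wanted, one option is a lazily initialized static, assuming the empty hash is a pure function of no inputs (TrieHash::from_data below is a placeholder, not the real implementation):

```rust
use std::sync::OnceLock;

#[derive(Clone, PartialEq, Debug)]
struct TrieHash([u8; 4]);

impl TrieHash {
    // Placeholder hash for illustration: length-prefixed zero bytes.
    fn from_data(data: &[u8]) -> TrieHash {
        TrieHash([data.len() as u8, 0, 0, 0])
    }

    // Compute the "empty" hash once, on first use, and reuse it thereafter.
    fn empty() -> &'static TrieHash {
        static EMPTY: OnceLock<TrieHash> = OnceLock::new();
        EMPTY.get_or_init(|| TrieHash::from_data(&[]))
    }
}

fn main() {
    assert_eq!(TrieHash::empty(), &TrieHash::from_data(&[]));
    // Subsequent calls return the same cached value.
    assert_eq!(TrieHash::empty(), TrieHash::empty());
    println!("ok");
}
```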

@jcnelson

jcnelson Aug 9, 2019

Author Member

The prover needs to supply sibling hashes, even if they are the "empty" hash. However, when the prover goes to encode the proof to transmit over the network, it can (and should) compress the proof to eliminate the redundant hashes. But if it's okay with you, I'd like to address this in a separate PR :)

Member

left a comment

Looks good to me! Thank you for the intellectual challenge 😂.
I'd volunteer to write tests if we think we should spend more time on this.

@jcnelson

Member Author

commented Aug 9, 2019

Thanks @lgalabru and @kantai for reviewing!

@lgalabru I'll gladly take more tests if you want to provide them ;) We'll want to hammer this part of the codebase as much as possible before shipping it, since it is consensus-critical.

@jcnelson jcnelson merged commit 44f64ec into develop Aug 9, 2019
6 checks passed
ci/circleci: build Your tests passed on CircleCI!
ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: test_demo Your tests passed on CircleCI!
codecov/patch 83.04% of diff hit (target 77.13%)
codecov/project 79.83% (+2.7%) compared to 6295a16
license/cla Contributor License Agreement is signed.