Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IAVL Spec #167

Merged
merged 10 commits into from Sep 24, 2019
Merged

IAVL Spec #167

merged 10 commits into from Sep 24, 2019

Conversation

AdityaSripal
Copy link
Member

Write documentation for current structure and behavior of the IAVL repo. Should make it easier for future developers to understand and contribute to the project.

@AdityaSripal AdityaSripal added the C:docs Component: Documentation label Aug 20, 2019
@AdityaSripal AdityaSripal changed the title IAVL Spec WIP: IAVL Spec Aug 20, 2019

When a node is no longer part of the latest IAVL tree, it is called an orphan. The orphaned node will exist in the nodeDB so long as there are versioned IAVL trees that are persisted in nodeDB that contain the orphaned node. Once all trees that referred to orphaned node have been deleted from database, the orphaned node will also get deleted.

In Ethereum, the analog is [Patricia tries](http://en.wikipedia.org/wiki/Radix_tree). There are tradeoffs. Keys do not need to be hashed prior to insertion in IAVL+ trees, so this provides faster iteration in the key space which may benefit some applications. The logic is simpler to implement, requiring only two types of nodes -- inner nodes and leaf nodes. On the other hand, while IAVL+ trees provide a deterministic merkle root hash, it depends on the order of transactions. In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.

I think this is definitely a problem in practice. It makes importing / exporting the tree data much harder.

Another problem is IAVL requires rebalancing, which is extra work (and I think potential security concerns?) for gas cost estimates. I'm not sure concretely how expensive I/O is for a rebalance, I would anticipate that its large.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the I/O cost for a single rebalance is large, since the iAVL monitors for anytime the difference in height of any 2 branches of a node exceeds 2 and rebalances immediately.

The bigger problem is probably the frequency of these relatively inexpensive rebalances which can definitely add up

Still requires concrete analysis tho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern I have is that rebalancing is typically analyzed in the random write setting. We really want to be secure / optimize for malicious writes, as we should expect there to exist a malicious attacker who will write and delete in a strategy that will require the largest rebalancing. (An example of this sort of idea is an attacker placing many elements with the same hash or consecutive hashes within a hashmap)

A single rebalance is ill-defined, as it depends on the size of the sub-tree being rebalanced, we have to consider the cost for an attacker to cause worst-case rebalances (e.g. large sub-trees)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, our AVL tree re-balance time is linear in the sub-tree size, due to having to merkelize the entire sub-tree.

Copy link
Member

@jackzampolin jackzampolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spec @AdityaSripal! Had a couple of small Qs but otherwise 👍 from me.

if t.root == nil {
return 0, nil
}
return t.root.get(t, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explain what t.root.get(t, key) does here?

if t.root == nil {
return nil, nil
}
return t.root.getByIndex(t, index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explain what t.root.getByIndex(t, index) does here?

// orphanKey: o|50|30|0xABCD
var toVersion, fromVersion int64
var hash []byte
orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash) function do? Just break up the key on |?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the orphan Key has an expected format o|toVersion|fromVersion|hash. Scan will use this expected format to parse the key and place the toVersion, fromVersion, hash encoded in the key into the respective pointers that are passed in

*/
```

The order of the orphan KeyFormat matters. Since deleting a version `v` will delete all orphans whose `toVersion = v`, we can easily retrieve all orphans from nodeDb by iterating over the key prefix: `o|v`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


Visualization (Nodes are numbered to show correct key order is still preserved):

Before `RotateRight(node8)`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great diagrams!

@tnachen
Copy link
Contributor

tnachen commented Sep 11, 2019

@AdityaSripal thanks a lot for this!

docs/node.md Outdated Show resolved Hide resolved
}
```

Get by index will return both the key and the value. The index is the index in the sorted list of leaf nodes. The leftmost leaf has index 0. It's neighbor has index 1 and so on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused about what the index is--how are the leaf nodes sorted, and is a sorted list stored somewhere? And also, why is it useful to be able to fetch a key/value by index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes are lexicographically sorted by keys. This is what allows for efficient iteration over key ranges. This is one of the key properties of the IAVL tree.

Index is the absolute position of a given key.

Let me reread the spec and see how to make this clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the leaf nodes are sorted lexicographically in the tree itself. So getting index 0 will be the leftmost key, and so on. Though I haven't seen where GetByIndex is actually used in the code. Since IterateRange doesn't actually use GetByIndex in its implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool. Thank you both!

@tac0turtle tac0turtle changed the title WIP: IAVL Spec IAVL Spec Sep 19, 2019

The IAVL tree is a versioned, snapshottable (immutable) AVL+ tree for persistent data.

The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed. The tree is balanced using a variant of the [AVL algorithm](http://en.wikipedia.org/wiki/AVL_tree) so all operations are O(log(n)).
Copy link
Contributor

@liamsi liamsi Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-iterate what I said in the call. This is excellent documentation presenting the status quo of the iavl tree. Maybe it is out of scope for this spec but we should definitely describe for instance why the tree is balancing (as opposed to for instance a sparse merkle tree where the merkle hash trail would always the same size)?
A document explaining the overall design space, the required/desired properties and explains the rationale for the current design (what does the iavl tree optimize for and why?) and advantages over other designs is needed. I'm happy to team up on this effort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree that a rationale document is needed, don't believe its in scope for this PR tho. let's open an issue on it, and im happy to help out writing it up tho I could definitely benefit from more context/help on it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a huge help @liamsi !!!! Happy to see you back in these parts 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liamsi I'd be happy to pair on working on such a document. Its been something thats been on the back of my mind for awhile. (In particular because I think the I in IAVL is unnecessary, and that we can get better performance with a patricia trie variant that supports iteration)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdityaSripal could you make some follow up issues then we can merge this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref #176

@AdityaSripal AdityaSripal marked this pull request as ready for review September 19, 2019 17:25
@jackzampolin jackzampolin marked this pull request as ready for review September 19, 2019 19:55
@tac0turtle tac0turtle merged commit f9d4b44 into master Sep 24, 2019
@tac0turtle tac0turtle deleted the aditya/spec branch September 24, 2019 06:21
@tac0turtle tac0turtle mentioned this pull request Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:docs Component: Documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants