-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IAVL Spec #167
IAVL Spec #167
Conversation
|
||
When a node is no longer part of the latest IAVL tree, it is called an orphan. The orphaned node will exist in the nodeDB so long as there are versioned IAVL trees that are persisted in nodeDB that contain the orphaned node. Once all trees that referred to orphaned node have been deleted from database, the orphaned node will also get deleted. | ||
|
||
In Ethereum, the analog is [Patricia tries](http://en.wikipedia.org/wiki/Radix_tree). There are tradeoffs. Keys do not need to be hashed prior to insertion in IAVL+ trees, so this provides faster iteration in the key space which may benefit some applications. The logic is simpler to implement, requiring only two types of nodes -- inner nodes and leaf nodes. On the other hand, while IAVL+ trees provide a deterministic merkle root hash, it depends on the order of transactions. In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.
I think this is definitely a problem in practice. It makes importing / exporting the tree data much harder.
Another problem is IAVL requires rebalancing, which is extra work (and I think potential security concerns?) for gas cost estimates. I'm not sure concretely how expensive I/O is for a rebalance, I would anticipate that its large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the I/O cost for a single rebalance is large, since the iAVL monitors for anytime the difference in height of any 2 branches of a node exceeds 2 and rebalances immediately.
The bigger problem is probably the frequency of these relatively inexpensive rebalances which can definitely add up
Still requires concrete analysis tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern I have is that rebalancing is typically analyzed in the random write setting. We really want to be secure / optimize for malicious writes, as we should expect there to exist a malicious attacker who will write and delete in a strategy that will require the largest rebalancing. (An example of this sort of idea is an attacker placing many elements with the same hash or consecutive hashes within a hashmap)
A single rebalance is ill-defined, as it depends on the size of the sub-tree being rebalanced, we have to consider the cost for an attacker to cause worst-case rebalances (e.g. large sub-trees)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, our AVL tree re-balance time is linear in the sub-tree size, due to having to merkelize the entire sub-tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great spec @AdityaSripal! Had a couple of small Qs but otherwise 👍 from me.
docs/immutable_tree.md
Outdated
if t.root == nil { | ||
return 0, nil | ||
} | ||
return t.root.get(t, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explain what t.root.get(t, key)
does here?
docs/immutable_tree.md
Outdated
if t.root == nil { | ||
return nil, nil | ||
} | ||
return t.root.getByIndex(t, index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explain what t.root.getByIndex(t, index)
does here?
docs/key_format.md
Outdated
// orphanKey: o|50|30|0xABCD | ||
var toVersion, fromVersion int64 | ||
var hash []byte | ||
orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash)
function do? Just break up the key on |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea the orphan Key has an expected format o|toVersion|fromVersion|hash
. Scan
will use this expected format to parse the key and place the toVersion, fromVersion, hash encoded in the key into the respective pointers that are passed in
docs/key_format.md
Outdated
*/ | ||
``` | ||
|
||
The order of the orphan KeyFormat matters. Since deleting a version `v` will delete all orphans whose `toVersion = v`, we can easily retrieve all orphans from nodeDb by iterating over the key prefix: `o|v`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
docs/mutable_tree.md
Outdated
|
||
Visualization (Nodes are numbered to show correct key order is still preserved): | ||
|
||
Before `RotateRight(node8)`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great diagrams!
@AdityaSripal thanks a lot for this! |
docs/immutable_tree.md
Outdated
} | ||
``` | ||
|
||
Get by index will return both the key and the value. The index is the index in the sorted list of leaf nodes. The leftmost leaf has index 0. It's neighbor has index 1 and so on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused about what the index is--how are the leaf nodes sorted, and is a sorted list stored somewhere? And also, why is it useful to be able to fetch a key/value by index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nodes are lexicographically sorted by keys. This is what allows for efficient iteration over key ranges. This is one of the key properties of the IAVL tree.
Index is the absolute position of a given key.
Let me reread the spec and see how to make this clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the leaf nodes are sorted lexicographically in the tree itself. So getting index 0
will be the leftmost key, and so on. Though I haven't seen where GetByIndex
is actually used in the code. Since IterateRange
doesn't actually use GetByIndex
in its implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, cool. Thank you both!
|
||
The IAVL tree is a versioned, snapshottable (immutable) AVL+ tree for persistent data. | ||
|
||
The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed. The tree is balanced using a variant of the [AVL algorithm](http://en.wikipedia.org/wiki/AVL_tree) so all operations are O(log(n)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-iterate what I said in the call. This is excellent documentation presenting the status quo of the iavl tree. Maybe it is out of scope for this spec but we should definitely describe for instance why the tree is balancing (as opposed to for instance a sparse merkle tree where the merkle hash trail would always the same size)?
A document explaining the overall design space, the required/desired properties and explains the rationale for the current design (what does the iavl tree optimize for and why?) and advantages over other designs is needed. I'm happy to team up on this effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree that a rationale document is needed, don't believe its in scope for this PR tho. let's open an issue on it, and im happy to help out writing it up tho I could definitely benefit from more context/help on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a huge help @liamsi !!!! Happy to see you back in these parts 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liamsi I'd be happy to pair on working on such a document. Its been something thats been on the back of my mind for awhile. (In particular because I think the I in IAVL is unnecessary, and that we can get better performance with a patricia trie variant that supports iteration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AdityaSripal could you make some follow up issues then we can merge this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref #176
Write documentation for current structure and behavior of the IAVL repo. Should make it easier for future developers to understand and contribute to the project.