IAVL Spec #167

AdityaSripal · 2019-08-20T00:03:40Z

Write documentation for current structure and behavior of the IAVL repo. Should make it easier for future developers to understand and contribute to the project.

ValarDragon · 2019-09-10T10:11:41Z

docs/overview.md

+
+When a node is no longer part of the latest IAVL tree, it is called an orphan. The orphaned node will exist in the nodeDB so long as there are versioned IAVL trees that are persisted in nodeDB that contain the orphaned node. Once all trees that referred to orphaned node have been deleted from database, the orphaned node will also get deleted.
+
+In Ethereum, the analog is [Patricia tries](http://en.wikipedia.org/wiki/Radix_tree).  There are tradeoffs.  Keys do not need to be hashed prior to insertion in IAVL+ trees, so this provides faster iteration in the key space which may benefit some applications.  The logic is simpler to implement, requiring only two types of nodes -- inner nodes and leaf nodes.  On the other hand, while IAVL+ trees provide a deterministic merkle root hash, it depends on the order of transactions.  In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.


In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.

I think this is definitely a problem in practice. It makes importing / exporting the tree data much harder.

Another problem is IAVL requires rebalancing, which is extra work (and I think potential security concerns?) for gas cost estimates. I'm not sure concretely how expensive I/O is for a rebalance, I would anticipate that its large.

I don't believe the I/O cost for a single rebalance is large, since the iAVL monitors for anytime the difference in height of any 2 branches of a node exceeds 2 and rebalances immediately.

The bigger problem is probably the frequency of these relatively inexpensive rebalances which can definitely add up

Still requires concrete analysis tho

The concern I have is that rebalancing is typically analyzed in the random write setting. We really want to be secure / optimize for malicious writes, as we should expect there to exist a malicious attacker who will write and delete in a strategy that will require the largest rebalancing. (An example of this sort of idea is an attacker placing many elements with the same hash or consecutive hashes within a hashmap)

A single rebalance is ill-defined, as it depends on the size of the sub-tree being rebalanced, we have to consider the cost for an attacker to cause worst-case rebalances (e.g. large sub-trees)

AFAIK, our AVL tree re-balance time is linear in the sub-tree size, due to having to merkelize the entire sub-tree.

jackzampolin

Great spec @AdityaSripal! Had a couple of small Qs but otherwise 👍 from me.

jackzampolin · 2019-09-11T00:09:37Z

docs/immutable_tree.md

+	if t.root == nil {
+		return 0, nil
+	}
+	return t.root.get(t, key)


Do we need to explain what t.root.get(t, key) does here?

jackzampolin · 2019-09-11T00:09:59Z

docs/immutable_tree.md

+	if t.root == nil {
+		return nil, nil
+	}
+	return t.root.getByIndex(t, index)


Do we need to explain what t.root.getByIndex(t, index) does here?

jackzampolin · 2019-09-11T00:11:36Z

docs/key_format.md

+// orphanKey: o|50|30|0xABCD
+var toVersion, fromVersion int64
+var hash []byte
+orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash)


What does this orphanKeyFormat.Scan(orphanKey, &toVersion, &fromVersion, hash) function do? Just break up the key on |?

Yea the orphan Key has an expected format o|toVersion|fromVersion|hash. Scan will use this expected format to parse the key and place the toVersion, fromVersion, hash encoded in the key into the respective pointers that are passed in

jackzampolin · 2019-09-11T00:13:07Z

docs/key_format.md

+*/
+```
+
+The order of the orphan KeyFormat matters. Since deleting a version `v` will delete all orphans whose `toVersion = v`, we can easily retrieve all orphans from nodeDb by iterating over the key prefix: `o|v`.


This is cool!

jackzampolin · 2019-09-11T00:41:59Z

docs/mutable_tree.md

+
+Visualization (Nodes are numbered to show correct key order is still preserved):
+
+Before `RotateRight(node8)`:


Great diagrams!

tnachen · 2019-09-11T16:26:36Z

@AdityaSripal thanks a lot for this!

docs/node.md

tessr · 2019-09-16T23:09:54Z

docs/immutable_tree.md

+}
+```
+
+Get by index will return both the key and the value. The index is the index in the sorted list of leaf nodes. The leftmost leaf has index 0. It's neighbor has index 1 and so on.


I am a little confused about what the index is--how are the leaf nodes sorted, and is a sorted list stored somewhere? And also, why is it useful to be able to fetch a key/value by index?

Nodes are lexicographically sorted by keys. This is what allows for efficient iteration over key ranges. This is one of the key properties of the IAVL tree.

Index is the absolute position of a given key.

Let me reread the spec and see how to make this clearer.

Yes, the leaf nodes are sorted lexicographically in the tree itself. So getting index 0 will be the leftmost key, and so on. Though I haven't seen where GetByIndex is actually used in the code. Since IterateRange doesn't actually use GetByIndex in its implementation

OK, cool. Thank you both!

liamsi · 2019-09-19T15:33:47Z

docs/overview.md

+
+The IAVL tree is a versioned, snapshottable (immutable) AVL+ tree for persistent data.
+
+The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed.  The tree is balanced using a variant of the [AVL algorithm](http://en.wikipedia.org/wiki/AVL_tree) so all operations are O(log(n)).


To re-iterate what I said in the call. This is excellent documentation presenting the status quo of the iavl tree. Maybe it is out of scope for this spec but we should definitely describe for instance why the tree is balancing (as opposed to for instance a sparse merkle tree where the merkle hash trail would always the same size)?
A document explaining the overall design space, the required/desired properties and explains the rationale for the current design (what does the iavl tree optimize for and why?) and advantages over other designs is needed. I'm happy to team up on this effort.

Definitely agree that a rationale document is needed, don't believe its in scope for this PR tho. let's open an issue on it, and im happy to help out writing it up tho I could definitely benefit from more context/help on it

That would be a huge help @liamsi !!!! Happy to see you back in these parts 😉

@liamsi I'd be happy to pair on working on such a document. Its been something thats been on the back of my mind for awhile. (In particular because I think the I in IAVL is unnecessary, and that we can get better performance with a patricia trie variant that supports iteration)

@AdityaSripal could you make some follow up issues then we can merge this

…spec

node and nodedb docs

71b515d

AdityaSripal added the C:docs Component: Documentation label Aug 20, 2019

AdityaSripal changed the title ~~IAVL Spec~~ WIP: IAVL Spec Aug 20, 2019

AdityaSripal added 5 commits August 20, 2019 16:23

start tree docs

83ad1c2

add set/remove documentation

b225366

remove redundant section

71d1e35

add balance docs

e313bb1

add save/delete docs

0547bdc

tac0turtle assigned AdityaSripal Sep 3, 2019

ValarDragon reviewed Sep 10, 2019

View reviewed changes

tac0turtle requested review from brapse, cwgoes, ebuchman, jaekwon, josef-widder, milosevic and ValarDragon September 10, 2019 19:02

jackzampolin approved these changes Sep 11, 2019

View reviewed changes

zmanian approved these changes Sep 11, 2019

View reviewed changes

tessr reviewed Sep 11, 2019

View reviewed changes

docs/node.md Outdated Show resolved Hide resolved

tessr reviewed Sep 16, 2019

View reviewed changes

tac0turtle changed the title ~~WIP: IAVL Spec~~ IAVL Spec Sep 19, 2019

liamsi reviewed Sep 19, 2019

View reviewed changes

AdityaSripal added 2 commits September 19, 2019 10:24

organize docs and provide suggested order

d83ba39

Fix spacing

9e96c97

AdityaSripal marked this pull request as ready for review September 19, 2019 17:25

jackzampolin marked this pull request as ready for review September 19, 2019 19:55

AdityaSripal added 2 commits September 20, 2019 15:29

Merge branch 'master' of github.com:tendermint/iavl into aditya/spec

0b1ea04

Merge branch 'aditya/spec' of github.com:tendermint/iavl into aditya/…

df890ca

…spec

tac0turtle mentioned this pull request Sep 24, 2019

Define design Decisions of IAVL #176

Closed

tac0turtle merged commit f9d4b44 into master Sep 24, 2019

tac0turtle deleted the aditya/spec branch September 24, 2019 06:21

tac0turtle mentioned this pull request Nov 12, 2019

Write Spec #56

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IAVL Spec #167

IAVL Spec #167

AdityaSripal commented Aug 20, 2019

ValarDragon Sep 10, 2019

AdityaSripal Sep 11, 2019

ValarDragon Sep 11, 2019

ValarDragon Sep 11, 2019

jackzampolin left a comment

jackzampolin Sep 11, 2019

jackzampolin Sep 11, 2019

jackzampolin Sep 11, 2019

AdityaSripal Sep 11, 2019

jackzampolin Sep 11, 2019

tessr Sep 16, 2019

jackzampolin Sep 11, 2019

tnachen commented Sep 11, 2019

tessr Sep 16, 2019

zmanian Sep 17, 2019

AdityaSripal Sep 17, 2019

tessr Sep 17, 2019

liamsi Sep 19, 2019 •

edited

AdityaSripal Sep 19, 2019

jackzampolin Sep 19, 2019

ValarDragon Sep 20, 2019

tac0turtle Sep 21, 2019

tac0turtle Sep 24, 2019


		When a node is no longer part of the latest IAVL tree, it is called an orphan. The orphaned node will exist in the nodeDB so long as there are versioned IAVL trees that are persisted in nodeDB that contain the orphaned node. Once all trees that referred to orphaned node have been deleted from database, the orphaned node will also get deleted.

		In Ethereum, the analog is [Patricia tries](http://en.wikipedia.org/wiki/Radix_tree). There are tradeoffs. Keys do not need to be hashed prior to insertion in IAVL+ trees, so this provides faster iteration in the key space which may benefit some applications. The logic is simpler to implement, requiring only two types of nodes -- inner nodes and leaf nodes. On the other hand, while IAVL+ trees provide a deterministic merkle root hash, it depends on the order of transactions. In practice this shouldn't be a problem, since you can efficiently encode the tree structure when serializing the tree contents.


		Visualization (Nodes are numbered to show correct key order is still preserved):

		Before `RotateRight(node8)`:


		The IAVL tree is a versioned, snapshottable (immutable) AVL+ tree for persistent data.

		The purpose of this data structure is to provide persistent storage for key-value pairs (say to store account balances) such that a deterministic merkle root hash can be computed. The tree is balanced using a variant of the [AVL algorithm](http://en.wikipedia.org/wiki/AVL_tree) so all operations are O(log(n)).

IAVL Spec #167

IAVL Spec #167

Conversation

AdityaSripal commented Aug 20, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jackzampolin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tnachen commented Sep 11, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liamsi Sep 19, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liamsi Sep 19, 2019 •

edited