add new function Walk() to trie #146

Closed
wants to merge 3 commits into from

@p4u p4u commented Oct 29, 2020

Walk allows iteration over all values stored in the SMT.

This is a useful method for exporting the tree (and importing it afterwards) and also for searching for specific values.

Signed-off-by: p4u <pau@dabax.net>

@p4u
Author

p4u commented Oct 30, 2020

This is not 100% correct... Sometimes the test will fail and I don't know why (yet).

Maybe someone with more knowledge on this SMT implementation can find the problem.

@p4u
Author

p4u commented Oct 30, 2020

So with this last implementation, go test -timeout 30s . -run ^TestTrieWalk$ -count=1 always works, but -count=10 crashes... I'm not sure whether this comes from my changes or from something wrong in trie_test.go.

Walk allows iteration over all key/values stored into the SMT.
If the callback function returns true, walk will stop.

Signed-off-by: p4u <pau@dabax.net>
Signed-off-by: p4u <pau@dabax.net>
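
The callback contract described in the commit message (return true to stop the walk) can be illustrated with a minimal sketch. It assumes a plain in-memory binary tree and a synchronous walk; the `node` type and the `walk`, `sampleTree` and `visited` helpers are hypothetical and are not the trie code from this PR, which passes results over a channel.

```go
package main

import "fmt"

// node is a hypothetical binary tree node; only leaves carry a key/value pair.
type node struct {
	left, right *node
	key, value  []byte
}

// walk visits leaves depth-first, left to right. It returns true as soon as
// the callback asks to stop, so callers higher up stop descending as well.
func walk(n *node, callback func(key, value []byte) bool) bool {
	if n == nil {
		return false
	}
	if n.left == nil && n.right == nil {
		return callback(n.key, n.value)
	}
	return walk(n.left, callback) || walk(n.right, callback)
}

// sampleTree builds a small tree with the leaves a, b and c.
func sampleTree() *node {
	leaf := func(k, v string) *node {
		return &node{key: []byte(k), value: []byte(v)}
	}
	return &node{
		left:  &node{left: leaf("a", "1"), right: leaf("b", "2")},
		right: leaf("c", "3"),
	}
}

// visited returns the keys seen before the callback stops at stopKey.
func visited(root *node, stopKey string) []string {
	var keys []string
	walk(root, func(k, v []byte) bool {
		keys = append(keys, string(k))
		return string(k) == stopKey
	})
	return keys
}

func main() {
	fmt.Println(visited(sampleTree(), "b")) // [a b]
	fmt.Println(visited(sampleTree(), ""))  // [a b c]
}
```

Returning the stop signal up the recursion keeps the early exit local to the walk, without shared state between the walker and the callback.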
@paouvrard
Contributor

Thank you @p4u for your input. Have you checked this repository, which exports an Aergo state snapshot?
https://github.com/aergoio/state-tools

@p4u
Author

p4u commented Nov 2, 2020

I did not know about the existence of that repository, I'll take a look.

However, from the perspective of an SMT API consumer with experience using several Go Merkle tree implementations for blockchain development, I think the following methods are missing:

  1. A Walk() as proposed here
  2. A Snapshot(root) in order to make an immutable tree (will not change over time)
  3. Get and Walk should allow specifying a root, something like smt.Get(key, root []byte)
  4. A Count(root) method in order to get the number of leaves
  5. A way to export and import the tree so that the imported tree has exactly the same root hash

Does that make sense to you?
To this end, some functions from the state-tools repository could be added to the SMT trie package. I think it would help the adoption of this trie implementation by other open source blockchain projects.

This is our interface and our current implementations (where I want to add Aergo SMT): https://gitlab.com/vocdoni/go-dvote/-/tree/master/statedb

@paouvrard
Contributor

Definitely makes sense.

I know some projects use the same trie but usually adapt it to their needs. So if there is consensus on an interface/usage that is useful to many, I'm happy to add features. We might have to refactor outside of the Aergo client code.
The interface you linked mentions Iterate; is that the same as Walk, but for iterating from a subtree?

There is also the newer approach taken by https://github.com/ledgerwatch/turbo-geth, which stores the leaves directly instead of looking them up from the root in O(log N), and reconstructs the root when a leaf is updated. It brings performance benefits but changes how we think about the trie (we need to keep track of the current version of the trie).

Contributor

@paouvrard paouvrard left a comment

Thank you for your contribution! Nice work navigating the trie serialization 👍
Please see comments in the code for solving the random bug.

// Walk over the whole tree and compare the values
i := 0
if err := smt.Walk(func(v *WalkResult) bool {
if string(v.Value) != string(values[i]) {
Contributor

Byte slices can be compared with bytes.Equal()
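
A small sketch of that suggestion: `bytes.Equal` from the standard library compares two slices directly, without first converting them to strings. The `sameValue` wrapper is a hypothetical helper added only for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameValue reports whether two byte slices hold the same bytes.
func sameValue(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(sameValue([]byte("abc"), []byte("abc"))) // true
	fmt.Println(sameValue([]byte("abc"), []byte("abd"))) // false
	fmt.Println(sameValue(nil, []byte{}))                // true: nil and empty compare equal
}
```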

return err
}
if isShortcut {
var key []byte
Contributor

Key can be accessed directly as:
lnode[:HashLength]
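
A sketch of what that slice expression does, under stated assumptions: `HashLength` is taken to be 32 bytes, and the left child of a shortcut node is assumed to hold the key followed by one extra flag byte. Both the layout and the `shortcutKey` helper are assumptions made for illustration, not a description of the package internals.

```go
package main

import "fmt"

// HashLength is assumed to be 32 bytes for this sketch.
const HashLength = 32

// shortcutKey returns the key portion of a shortcut node's left child,
// assuming the stored slice is the key followed by extra flag bytes.
// The result shares memory with lnode; no copy is made.
func shortcutKey(lnode []byte) []byte {
	return lnode[:HashLength]
}

func main() {
	key := make([]byte, HashLength)
	key[0] = 0xab
	// Key plus one trailing flag byte (assumed layout).
	lnode := append(append([]byte{}, key...), 1)
	fmt.Printf("%x\n", shortcutKey(lnode))
}
```

The slice expression panics if `lnode` is shorter than `HashLength`, so it is only safe once the node is known to be a shortcut node.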

// walk fetches the value of a key given a trie root
func (s *Trie) walk(walkc chan (*WalkResult), stop *bool, root []byte, batch [][]byte, iiBatch, height int) error {
if len(root) == 0 || *stop {
// the trie does not contain the key or stop bool is set tu true
Contributor

// the sub tree is empty or stop walking

return nil
}
// Fetch the children of the node
nbatch, iBatch, lnode, rnode, isShortcut, err := s.loadChildren(root, height, iiBatch, batch)
Contributor

For consistency, nbatch and iiBatch can be renamed to batch and iBatch.

for {
select {
case <-close:
break
Contributor

@paouvrard paouvrard Nov 27, 2020

I tracked down the random bug you mentioned to this function. There are cases where the Walk() finishes iterating the tree and returns while the callback hasn't finished executing. So if calling Walk() 2 times in a row, the i++ from the 1st walk can override the i = 1 reset for the 2nd walk.

To prevent this, you need confirmation that the goroutine has finished executing before Walk() returns, by using a blocking channel that waits for the goroutine to return for example.

Also, this break should be a return otherwise only select is broken and not the for loop.
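
The difference between the two statements can be seen in a minimal sketch. The `drainWithBreak` and `drainWithReturn` functions are hypothetical, and an iteration cap is added so that the version with `break` terminates at all.

```go
package main

import "fmt"

// drainWithBreak counts loop iterations. The bare break leaves only the
// select, so the loop keeps running until the iteration cap is reached.
func drainWithBreak(done <-chan struct{}, maxIter int) int {
	iterations := 0
	for iterations < maxIter {
		iterations++
		select {
		case <-done:
			break // exits the select, not the for loop
		}
	}
	return iterations
}

// drainWithReturn leaves the function, and therefore the loop, at once.
func drainWithReturn(done <-chan struct{}, maxIter int) int {
	iterations := 0
	for iterations < maxIter {
		iterations++
		select {
		case <-done:
			return iterations
		}
	}
	return iterations
}

func main() {
	done := make(chan struct{})
	close(done) // a closed channel is always ready to receive
	fmt.Println(drainWithBreak(done, 5))  // 5
	fmt.Println(drainWithReturn(done, 5)) // 1
}
```

A labeled break on the `for` statement would be another way to leave the loop when the function has more work to do afterwards.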

Using WaitGroup doesn't seem necessary here.

The close channel should be renamed, since close is a Go builtin.

For example:

	s.lock.RLock()
	defer s.lock.RUnlock()
	s.atomicUpdate = false
	walkc := make(chan *WalkResult)
	exit := make(chan bool)
	finishedWalk := make(chan bool)
	stop := false
	go func() {
		for {
			select {
			case <-finishedWalk:
				exit <- true
				return
			case value := <-walkc:
				if stop = callback(value); stop {
					// break leaves only the select; the loop continues until case <-finishedWalk
					break
				}
			}
		}
	}()
	err := s.walk(walkc, &stop, s.Root, nil, 0, s.TrieHeight)
	finishedWalk <- true
	<-exit // wait for the goroutine to return
	return err
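
The idea of waiting for the consumer goroutine can also be shown in isolation. The following is a simplified variation, not the code above: it closes the value channel instead of using a separate finishedWalk channel, and `walkAll` and `countTwice` are hypothetical names. It reproduces the scenario from the test, two walks in a row sharing one counter.

```go
package main

import "fmt"

// walkAll sends every value to a consumer goroutine and returns only after
// the consumer has run the callback for the last value.
func walkAll(values []int, callback func(int)) {
	valuesc := make(chan int)
	done := make(chan struct{})
	go func() {
		defer close(done) // signals that no callback is still running
		for v := range valuesc {
			callback(v)
		}
	}()
	for _, v := range values {
		valuesc <- v
	}
	close(valuesc) // no more values: lets the consumer loop end
	<-done         // block until the consumer goroutine has returned
}

// countTwice runs two walks back to back with a shared counter.
func countTwice() (first, second int) {
	i := 0
	walkAll([]int{1, 2, 3}, func(int) { i++ })
	first = i
	i = 0 // safe to reset: the first walk's callbacks are all done
	walkAll([]int{1, 2, 3}, func(int) { i++ })
	second = i
	return first, second
}

func main() {
	fmt.Println(countTwice()) // 3 3
}
```

Because `walkAll` blocks on `done`, the reset between the two walks can no longer be overwritten by a late increment from the first walk.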

@p4u
Author

p4u commented Dec 10, 2020

Hey @paouvrard I have been out for some days but now I'm back. Thank you for your review, I am going to incorporate the changes.

@p4u
Author

p4u commented Dec 10, 2020

I applied your suggestions and now go test -timeout 30s . -run=^TestTrieWalk$ -count=100 works fine, thank you!

I added another function named GetWithRoot in order to Get a value for a specific Root. In addition, I added the root as a parameter to Walk() so the caller can choose which Root to walk.

I kept everything in the same PR because I did not split it in my local repository; I hope that's not a problem.

Looking forward to your review :)

@p4u
Author

p4u commented Dec 10, 2020

Damn it, go test -timeout 30s . -count=100 fails... I'll look into it.

@p4u
Author

p4u commented Dec 11, 2020

Fixed! Now go test -timeout 120s . -count=100 works for me.

Now Walk() is data race safe and does not return until all callbacks have finished.

Signed-off-by: p4u <pau@dabax.net>
@p4u
Author

p4u commented Dec 12, 2020

The last version solves concurrency problems and it is data race safe. Tested with go test -timeout 120s . -run ^TestTrieWalk$ -count=100 -race
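
The final code is not shown in this conversation, so the following is only a sketch of one common way to share a stop flag between a producer and a consumer goroutine without tripping the race detector. It assumes an `int32` flag accessed through `sync/atomic`; the `produce` function and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// produce sends values to a consumer until the consumer sets the stop flag.
// The flag is only touched through sync/atomic, so the producer reading it
// while the consumer writes it is not a data race.
func produce(values []int, stopAt int) (seen []int) {
	var stop int32
	valuesc := make(chan int)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for v := range valuesc {
			if atomic.LoadInt32(&stop) == 1 {
				continue // drain values sent before the producer noticed
			}
			seen = append(seen, v)
			if v == stopAt {
				atomic.StoreInt32(&stop, 1)
			}
		}
	}()
	for _, v := range values {
		if atomic.LoadInt32(&stop) == 1 {
			break
		}
		valuesc <- v
	}
	close(valuesc)
	<-done // seen is read only after the consumer has finished
	return seen
}

func main() {
	fmt.Println(produce([]int{1, 2, 3, 4, 5}, 2)) // [1 2]
}
```

Running such code under `go test -race` or `go run -race` is how a remaining unsynchronized access would show up.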

@p4u
Author

p4u commented Dec 12, 2020

If this PR finally gets merged, the next step could be to add a String() function, Export() and Import(), which IMO would make the API more powerful. Something like:

func (t *trie) String() string {
	var buf bytes.Buffer
	t.Walk(nil, func(k, v []byte) int32 {
		fmt.Fprintf(&buf, "%x => %x\n", k, v)
		return 0
	})
	return buf.String()
}

@p4u
Author

p4u commented Mar 27, 2021

So I see there is probably no interest in having these new methods on the SMT. That's fine. I forked the code and applied some changes (including a new abstraction layer). If anyone is interested, the code is here: https://github.com/p4u/asmt

@kroggen
Member

kroggen commented May 4, 2023

The aergoio/SMT repo appears to be better for this

The aergoio/state-tools is also related

@kroggen kroggen closed this May 4, 2023