Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oneeman/cherry pick discovery bootnode fix #1194

Merged
merged 5 commits into from
Dec 16, 2020

Conversation

oneeman
Copy link
Contributor

@oneeman oneeman commented Oct 23, 2020

Description

This is a fix for ethereum/go-ethereum#20630. If a node initially fails to contract the bootnode for the purpose of discovery, there were two code paths which led to the bootnode being removed from the discovery table, leaving it empty and preventing further discovery. These are visible in the logs @tkporter included in that issue, where it says Removed dead node and where it says Too many findnode failures, dropping. There is an upstream PR, cherry-picked here, which fixes the second case by not dropping the node if the bucket it's in is less than half-full. For the second case, I added a comment on the go-ethereum issue about it to see whether a PR to fix it would be welcome and what logic it should use. In the meantime, I've included a commit in the PR to fix it, by not dropping the node if it's the only one in the bucket.

Note that there is also a task called doRefresh() which runs every 30 minutes and re-adds the bootnodes to the discovery table, so the inability to discover other nodes would not have been permanent

Other changes

I've also cherry-picked two prior commits from upstream to ease the merging by avoiding out-of-order commits on the same areas of the code.

Tested

  • Unit tests pass
  • Monorepo e2e sync test passes
  • Running a mainnet full node, it succeeds in discovering peers, connecting to them, and starting to sync
  • Running two mainnet full nodes, A and B, where B is given A as its bootnode, and B is started a minute before A is started (i.e. the setup @tkporter reported as not working). Until A is started, B does not discover any nodes. Once A is started, B immediately discovers multiple nodes, connects to them, and starts syncing.

Related issues

Backwards compatibility

The changes here are fixes and improvements to the discover package and do not include or assume protocol changes, so there shouldn't be any question of backwards incompatibility. Also, as mentioned in the testing section, this branch successfully discovers and syncs from mainnet peers using the mainnet bootnode.

fjl and others added 4 commits October 23, 2020 11:48
celo-blockchain cherry-pick conflicts:
  cmd/devp2p/discv4cmd.go
  p2p/discover/common.go
  p2p/discover/v4_udp.go

This adds an implementation of the current discovery v5 spec.

There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.
celo-blockchain cherry-pick conflicts:
  p2p/discover/v4_udp.go
  p2p/discover/v4_udp_test.go

This moves all v4 protocol definitions to a new package, p2p/discover/v4wire.
The new package will be used for low-level protocol tests.
…pty (#21396)

This change improves discovery behavior in small networks. Very small
networks would often fail to bootstrap because all member nodes were
dropping table content due to findnode failure. The check is now changed
to avoid dropping nodes on findnode failure when their bucket is almost
empty. It also relaxes the liveness check requirement for FINDNODE/v4
response nodes, returning unverified nodes as results when there aren't
any verified nodes yet.

The "findnode failed" log now reports whether the node was dropped
instead of the number of results. The value of the "results" was
always zero by definition.

Co-authored-by: Felix Lange <fjl@twurst.com>
If a node initially has trouble reaching the bootnode, the bootnode was being removed in doRevalidate(), leaving the table empty.
@oneeman
Copy link
Contributor Author

oneeman commented Oct 23, 2020

Seeing some lint failures: one about a file having to end with a newline, but others about unused private functions and a field in the code from upstream. Added the "do not merge" tag for now

@kevjue
Copy link
Contributor

kevjue commented Oct 23, 2020

What's the cherry-picked commit from go-ethereum?

@oneeman
Copy link
Contributor Author

oneeman commented Oct 24, 2020

The one with the fix is ethereum/go-ethereum#21396 . The other two I included because they were previous changes on the same code: the first was ethereum/go-ethereum#20750 and the second was ethereum/go-ethereum#21147 . You can see them above, with different hashes but the commit messages will be the same.

@oneeman
Copy link
Contributor Author

oneeman commented Oct 26, 2020

Took care of the lint warnings and removed the "do not merge" label.

@oneeman oneeman mentioned this pull request Nov 4, 2020
@oneeman
Copy link
Contributor Author

oneeman commented Nov 4, 2020

Noticed an issue that sounds like it's due to the bugs fixed in this PR: #939. So I updated the PR description to say that it fixes that issue as well.

Copy link
Contributor

@mcortesi mcortesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say i did a deep review. I don't have a deep knowledge of the discovery refactory, and i need to take a look at it first

Still i think this is good to merge.

My reasoning is:

  • cherry picked commits are most probably valid. They come from already production code that has been running for a while on ethereum. Also, we haven't changed discovery, it's not likely they could break any of our code
  • I've reviewed our commit on top, and seems reasonable, and quite a small case concerning a corner case.

So i'm happy with merging this.

@oneeman oneeman merged commit 35908b1 into master Dec 16, 2020
@oneeman oneeman deleted the oneeman/cherry-pick-discovery-bootnode-fix branch December 16, 2020 14:36
oneeman pushed a commit that referenced this pull request Jan 5, 2021
The conflicts arose from #1194 having been on master, which includes an unpstream PR from 1.9.13 as well as an upstream PR from later on that was cherry-picked.  Resolving them required checking the changes' context, but was then straightforward.

Conflicts:
	cmd/devp2p/discv4cmd.go
	cmd/devp2p/discv5cmd.go
	p2p/discover/lookup.go
	p2p/discover/v4_udp.go
@oneeman oneeman mentioned this pull request Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

p2p/discover: bootnodes not retried often enough when table becomes empty Slow discovery
5 participants