Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disconnecting unelected validators in ReplaceValidatorPeers() #1191

Merged
merged 1 commit into from
Oct 20, 2020

Conversation

oneeman
Copy link
Contributor

@oneeman oneeman commented Oct 19, 2020

Description

This is a partial fix for #948. That issue mentions that we aren't removing validators if we're not connected to them at that moment. But there was another bug which caused ReplaceValidatorPeers() to not disconnect validators at all. This PR fixes that bug.

Notes:

  1. This bug only affected ReplaceValidatorPeers(), whereas the problem of not removing validators we're not connected to right then affects both ReplaceValidatorPeers() and ClearValidatorPeers().
  2. A full fix for When removing ValidatorPurpose peers, all remote nodes with that purpose should be considered, not just those currently peered with #948 will require some new code in p2p/server.go an p2p/dial.go, which we probably don't have time to do before the upcoming release (and which would be best done after cherry-picking p2p: new dial scheduler ethereum/go-ethereum#20592 from upstream). That's why I'm opening this PR to at least fix this much right now.
  3. In the absence of a full fix, it can still happen that an unelected validator disconnects from us first and so we don't remove them from our static and trusted list. In such a case, we will try to reconnect to them. If we succeed, then the next time RefreshValPeers() runs (which is every 5 minutes), we will remove them properly and all will be well. On the other hand, if they don't let us connect (say they're at their max peers limit already), then we will keep trying to connect to them unsuccessfully. To avoid this, we need a full fix for When removing ValidatorPurpose peers, all remote nodes with that purpose should be considered, not just those currently peered with #948.

Tested

Units tests pass, e2e sync test pases. As far as I know, we don't have tests that specifically check dropping unelected validators. That said, the code here is clear enough to have confidence in the change, I think.

Related issues

This doesn't directly fix #948, but it does fix a bug with similar effects.

Backwards compatibility

This will only cause validators to disconnect from unelected validators, and so poses no problems as far as backwards compatibility.

Copy link
Contributor

@tkporter tkporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants