Fix disconnecting unelected validators in ReplaceValidatorPeers() #1191
Description
This is a partial fix for #948. That issue points out that we don't remove validators if we're not connected to them at that moment. But there was another bug, which caused ReplaceValidatorPeers() to not disconnect validators at all. This PR fixes that bug.
Notes:
- The bug fixed here affects only ReplaceValidatorPeers(), whereas the problem of not removing validators we're not connected to right then affects both ReplaceValidatorPeers() and ClearValidatorPeers().
- A full fix for #948 would require changes to p2p/server.go and p2p/dial.go, which we probably don't have time to make before the upcoming release (and which would be best done after cherry-picking ethereum/go-ethereum#20592, "p2p: new dial scheduler", from upstream). That's why I'm opening this PR to at least fix this much right now.
- If we are connected to the unelected validators when RefreshValPeers() runs (which is every 5 minutes), we will remove them properly and all will be well. On the other hand, if they don't let us connect (say they're at their max peers limit already), then we will keep trying to connect to them unsuccessfully. To avoid this, we need a full fix for #948 ("When removing ValidatorPurpose peers, all remote nodes with that purpose should be considered, not just those currently peered with").
Tested
Unit tests pass, and the e2e sync test passes. As far as I know, we don't have tests that specifically check dropping unelected validators. That said, I think the code here is clear enough to have confidence in the change.
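For illustration only, here is a minimal sketch of the behavior such a test would check. Everything in it is a hypothetical stand-in: the validatorPeers type, its connected map, and this replaceValidatorPeers method are assumptions made for the example, not the actual code in p2p/server.go. It models only what this PR addresses: connected validator peers that are no longer elected get disconnected. Nodes we are not connected to are outside the sketch, which is the part #948 still covers.

```go
package main

import (
	"fmt"
	"sort"
)

// validatorPeers is a hypothetical stand-in for the server's bookkeeping of
// currently connected peers that carry the validator purpose.
type validatorPeers struct {
	connected map[string]bool // node ID -> connected with validator purpose
}

// replaceValidatorPeers disconnects every connected validator peer whose ID is
// not in elected and returns the dropped IDs in sorted order.
// Nodes in elected that are not yet connected are left untouched here.
func (v *validatorPeers) replaceValidatorPeers(elected []string) []string {
	keep := make(map[string]bool, len(elected))
	for _, id := range elected {
		keep[id] = true
	}
	var dropped []string
	for id := range v.connected {
		if !keep[id] {
			delete(v.connected, id) // deleting during range is safe in Go
			dropped = append(dropped, id)
		}
	}
	sort.Strings(dropped)
	return dropped
}

func main() {
	v := &validatorPeers{connected: map[string]bool{"a": true, "b": true, "c": true}}
	fmt.Println(v.replaceValidatorPeers([]string{"a", "c", "d"})) // prints [b]
}
```

A real test would assert the same thing against the server itself: after the elected set changes, previously connected validators that are no longer elected are gone from the peer list.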
Related issues
This doesn't directly fix #948, but it does fix a bug with similar effects.
Backwards compatibility
This will only cause validators to disconnect from unelected validators, so it poses no backwards-compatibility problems.