consensus: disconnect from bad peers #789

greg-szabo · 2023-05-03T19:55:24Z

Note: this is the copy of an old (but valid) issue opened here: tendermint/tendermint#2871 It is accompanied by an unmerged PR where Ethan tried to fix most of these: tendermint/tendermint#9417

Security researchers keep finding areas of vulnerabilities here. We should dive into it and figure out if there's really anything, what needs to be updated and whatnot. Possibly more of an epic than a one-off issue. Might be a good candidate for a deep-flow week goal.

The consensus reactor only disconnects from peers in three cases:
* msg fails to decode

* msg fails basic validity checks

* bad VoteSetMaj23Message
While we log other kinds of errors in processing (invalid blocks, proposals, votes), we never disconnect from peers due to them, which open us up to being spammed with bad consensus messages.

We also don't enforce that our peers ever send us anything useful. While we mark them as good if they send us 10,000 unique block parts or votes, we never mark them bad if they just fail to send us anything. This could result in us being connected to lots of useless peers.

We need to address both of these things:
* peers sending us bad messages

* peers not sending us any good messages
Bad Messages

Let's enumerate all messages and see when they indicate a bad peer (beyond failing their ValidateBasic() check).

In general we could probably add some more rules around all of these to prevent eg. receiving the same information multiple times from the same peer, but we'd need to ensure on the sending side that it never happens too.

NewRoundStepMessage, NewValidBlockMessage, HasVoteMessage

These are just informational so we know what to send the peer.

HasVote may contain an index beyond the size of the validator set for the given height, which should be a sign of malice, but currently we just ignore it.

VoteSetMaj23Message

We already stop peers for errors here.

ProposalHeartbeatMessage

Comes with a signature, and could be checked, but not actually helpful and should probably be eliminated outright (#2626).

ProposalMessage

Handled in the receiveRoutine - calls setProposal.

Any error here should result in disconnect from peers (as noted in #2158 (comment))

ProposalPOLMessage

Don't think there's much we can do here

BlockPartMessage

Handled in the receiveRoutine - calls addProposalBlockPart.

Some errors here (eg. in AddPart) should cause us to disconnect from the peer, but others (eg. error in unmarshaling) are not the particular peers fault, but the original proposer, and we don't know which peer corresponds to the actual proposer, so there's nothing we can do here (ultimately, we'd like to publish evidence about this so the app can punish the proposer).

VoteMessage

Handled in the receiveRoutine - calls tryAddVote. This can result in many kinds of errors - some resulting in disconnecting, and others not, hence we should use sentinels (#1327)

tryAddVote calls addVote, which may return ErrVoteHeightMismatch, for which we shouldn't disconnect.

We then call cs.LastCommit.AddVote. Some errors here could come from honest peers, so we shouldn't disconnect:
* ErrVoteUnexpectedStep

* ErrVoteNonDeterministicSignature

* NewConflictingVoteError
The rest are from bad peers, and we should disconnect:
* ErrVoteInvalidValidatorIndex

* ErrVoteInvalidValidatorAddress

* vote.Verify errors (needs sentinel)
VoteSetBitsMessage

Don't think there's much we can do here

Useless Peers

A peer could just never send us any consensus messages and we wouldn't disconnect from it.

We need to enforce some cadence on our peers. While this is partially mitigated by preferentially connecting to at least some peers marked good via the PEX, we would probably benefit from having additional protections within the reactor, though we can leave that to a separate issue.

Summary

For now we should do the following, as described above:
* [ ]  Stop peers for bad proposals

* [ ]  Stop peers for bad block parts

* [ ]  Stop peers for bad votes

* [x]  Remove the heartbeat message
This subsumes #2158, #1327, #2626

The text was updated successfully, but these errors were encountered:

cason · 2023-05-05T09:04:47Z

So, one of the first things here is to create a new version of this PR tendermint/tendermint#9417 pointing to the main branch. It is huge PR now because the main branch of the Tendermint Core repository was updated (force-pushed).

The PR, in any case, was a work in progress.

greg-szabo added bug Something isn't working needs-triage This issue/PR has not yet been triaged by the team. labels May 3, 2023

cason added consensus enhancement New feature or request labels May 5, 2023

cason removed the needs-triage This issue/PR has not yet been triaged by the team. label May 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consensus: disconnect from bad peers #789

consensus: disconnect from bad peers #789

greg-szabo commented May 3, 2023

Bad Messages

NewRoundStepMessage, NewValidBlockMessage, HasVoteMessage

VoteSetMaj23Message

ProposalHeartbeatMessage

ProposalMessage

ProposalPOLMessage

BlockPartMessage

VoteMessage

VoteSetBitsMessage

Useless Peers

Summary

cason commented May 5, 2023

consensus: disconnect from bad peers #789

consensus: disconnect from bad peers #789

Comments

greg-szabo commented May 3, 2023

Bad Messages

NewRoundStepMessage, NewValidBlockMessage, HasVoteMessage

VoteSetMaj23Message

ProposalHeartbeatMessage

ProposalMessage

ProposalPOLMessage

BlockPartMessage

VoteMessage

VoteSetBitsMessage

Useless Peers

Summary

cason commented May 5, 2023