To improve reliability and catch potentially stale connections early (rather than only when we actually want to send data over them), add a periodic ping/pong exchange with peers.
If a peer fails to respond to a ping in a timely manner, terminate the connection (but do not block the peer).
3540: Network watchdog r=marc-casperlabs a=marc-casperlabs
Closes #3530.
This PR adds a network watchdog in the form of a ping/pong functionality:
* Nodes will periodically send a `Ping` down every outgoing connection.
* Any node receiving a `Ping` will respond with a `Pong`.
* These pings and pongs contain nonces, which prevent false positives on retries and keep peers from spamming pongs (after a certain number of invalid pongs, the peer is banned).
* If a ping times out, it is retried a few times.
* Once a certain number of ping timeouts is hit, the connection is terminated (but the peer is *not* banned).
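
The behaviour listed above can be sketched as a small per-connection state machine. This is only an illustration under stated assumptions, not the code in `health.rs`: the names `HealthConfig`, `ConnectionHealth`, `Action` and `PongOutcome`, and every field and parameter, are hypothetical. The caller is assumed to supply a fresh random nonce and the current time on each call.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tuning knobs; names and values are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct HealthConfig {
    pub ping_interval: Duration, // pause between a valid pong and the next ping
    pub ping_timeout: Duration,  // how long to wait for a pong
    pub ping_retries: u32,       // retries allowed before dropping the connection
    pub pong_limit: u32,         // invalid pongs tolerated before banning the peer
}

/// What the caller should do after a periodic `tick`.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Nothing,
    SendPing(u64),
    DropConnection, // terminate the connection, but do not ban the peer
}

/// Result of handling an incoming pong.
#[derive(Debug, PartialEq, Eq)]
pub enum PongOutcome {
    Accepted(Duration), // measured round-trip time
    Invalid,            // unexpected nonce, counted against the peer
    BanPeer,            // too many invalid pongs
}

/// Health state for a single outgoing connection (hypothetical type).
pub struct ConnectionHealth {
    next_ping_at: Instant,
    in_flight: Option<(u64, Instant)>, // (nonce, time sent)
    timeouts: u32,
    invalid_pongs: u32,
    last_rtt: Option<Duration>,
}

impl ConnectionHealth {
    pub fn new(now: Instant) -> Self {
        ConnectionHealth {
            next_ping_at: now,
            in_flight: None,
            timeouts: 0,
            invalid_pongs: 0,
            last_rtt: None,
        }
    }

    /// Called periodically. `nonce` must be fresh; it is used only if a ping is sent.
    pub fn tick(&mut self, cfg: &HealthConfig, now: Instant, nonce: u64) -> Action {
        let in_flight = self.in_flight;
        match in_flight {
            // A ping is outstanding and has not timed out yet.
            Some((_, sent_at)) if now.saturating_duration_since(sent_at) < cfg.ping_timeout => {
                Action::Nothing
            }
            // The outstanding ping timed out: retry with a new nonce or give up.
            Some(_) => {
                self.timeouts += 1;
                if self.timeouts > cfg.ping_retries {
                    self.in_flight = None;
                    Action::DropConnection
                } else {
                    self.in_flight = Some((nonce, now));
                    Action::SendPing(nonce)
                }
            }
            // Nothing outstanding and the interval has elapsed: send a ping.
            None if now >= self.next_ping_at => {
                self.in_flight = Some((nonce, now));
                Action::SendPing(nonce)
            }
            None => Action::Nothing,
        }
    }

    /// Called for every pong received. Only the nonce of the latest ping is valid,
    /// so a late pong for an already retried ping counts as invalid.
    pub fn on_pong(&mut self, cfg: &HealthConfig, now: Instant, nonce: u64) -> PongOutcome {
        let in_flight = self.in_flight;
        match in_flight {
            Some((expected, sent_at)) if expected == nonce => {
                let rtt = now.saturating_duration_since(sent_at);
                self.in_flight = None;
                self.timeouts = 0;
                self.next_ping_at = now + cfg.ping_interval;
                self.last_rtt = Some(rtt);
                PongOutcome::Accepted(rtt)
            }
            _ => {
                self.invalid_pongs += 1;
                if self.invalid_pongs > cfg.pong_limit {
                    PongOutcome::BanPeer
                } else {
                    PongOutcome::Invalid
                }
            }
        }
    }

    /// Last measured round-trip time, if any pong has been accepted.
    pub fn last_rtt(&self) -> Option<Duration> {
        self.last_rtt
    }
}
```

Passing `now` and the nonce in from the outside keeps the sketch deterministic and easy to unit test; a real node would draw the nonce from a random number generator and drive `tick` from a timer.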
The core motivation for adding this to 1.5 is to prevent unlikely but possible connection stalls due to deadlock while interdependent nodes fetch backpressured tries from each other. As a side benefit, really slow or stalled connections are also terminated.
Test coverage is extensive for the actual logic (see `health.rs`), where the tests attempt to cover every possible edge case. Proper integration into the layers above is also tested, but a certain amount of testing remains manual, as there is currently no good way to easily write a test that puts the nodes into the deadlocked state.
As a nice side benefit, the node can now be queried for round-trip times to other nodes through `net-info` on the diagnostics port.
Security aspects:
* Ping floods are prevented through rate limiting.
* Pong floods are prevented through rate limiting; in addition, nodes ban peers that send too many unasked pongs.
* A 2:1 cost ratio of ping:pong prevents blowing up a peer's memory through pings.
Co-authored-by: Marc Brinkmann <marc@casperlabs.io>