Network sync fix 1 (real) #480
Merged
If requests fail, we halve the number of requested blocks. We don't drop peers quickly.
This covers another class of network errors, such as receiving too much data and a few others; none of these are malicious.
I rechecked the current workspace, and the build blocker is gone now: Findings
If you want, I can keep digging for any other behavioral regressions in the sync path.
czareko approved these changes Apr 7, 2026
czareko added a commit that referenced this pull request Apr 7, 2026
* pull in network sync package
  modify to make it disconnect less often
* bugfix
* reset network sync
* Network sync fix 1 (real) (#480)
* original backoff code from PR 190 added
  if requests fail, we halve the requested blocks
  we don't drop peers quickly
* configure block request timeout
* format
* use single state tx pool
* taplo format
* apply peer drop limit to OutboundFailure::Io
  this covers some other class of network error, like received too much data, and a few others, none of these malicious
* renamed RequestSignature to SyncRequestParams
* format
* feature gate external packages tests
---------
Co-authored-by: Nikolaus Heger <nheger@gmail.com>
czareko added a commit that referenced this pull request Apr 8, 2026
* draft: Planck profile
* feat: Runtime + boot nodes update
* fix: json spec updated
* feat: Chain-Spec: IP node for Planck
* feat: Planck genesis refreshed
* feat: Heisenberg refreshed
* fix printout
* rename rewards_preimage to rewards_inner_hash
  add nice error messages and checks for the inner hash format
* fix: FMT
* Add network sync crate (#482)
* pull in network sync package
  modify to make it disconnect less often
* bugfix
* reset network sync
* Network sync fix 1 (real) (#480)
* original backoff code from PR 190 added
  if requests fail, we halve the requested blocks
  we don't drop peers quickly
* configure block request timeout
* format
* use single state tx pool
* taplo format
* apply peer drop limit to OutboundFailure::Io
  this covers some other class of network error, like received too much data, and a few others, none of these malicious
* renamed RequestSignature to SyncRequestParams
* format
* feature gate external packages tests
---------
Co-authored-by: Nikolaus Heger <nheger@gmail.com>
---------
Co-authored-by: Nikolaus Heger <nheger@gmail.com>



OG network sync fix
All this helps during network congestion and/or slow networks.
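The backoff mentioned in the commit notes (halve the requested blocks when a request fails) could look roughly like the sketch below. This is not the code from this PR: `BlockRequestBackoff`, `MIN_BLOCKS`, `MAX_BLOCKS`, and the reset-on-success behaviour are all hypothetical names and assumptions chosen for illustration.

```rust
/// Hypothetical bounds; the real values are not stated in this PR.
const MIN_BLOCKS: u32 = 1;
const MAX_BLOCKS: u32 = 64;

/// Tracks how many blocks to ask a peer for in the next request.
struct BlockRequestBackoff {
    current: u32,
}

impl BlockRequestBackoff {
    fn new() -> Self {
        Self { current: MAX_BLOCKS }
    }

    /// A request failed: halve the request size, never going below MIN_BLOCKS.
    fn on_failure(&mut self) -> u32 {
        self.current = (self.current / 2).max(MIN_BLOCKS);
        self.current
    }

    /// A request succeeded: return to the full request size.
    /// Assumption: the PR text does not say how recovery works.
    fn on_success(&mut self) -> u32 {
        self.current = MAX_BLOCKS;
        self.current
    }
}

fn main() {
    let mut backoff = BlockRequestBackoff::new();
    for _ in 0..3 {
        println!("retrying with {} blocks", backoff.on_failure());
    }
    println!("after success: {} blocks", backoff.on_success());
}
```

Smaller requests are more likely to complete within the request timeout on a congested or slow link, which is why shrinking the request is tried before giving up on the peer.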
Handling OutboundFailure::Io more generously
OutboundFailure::Io(std::io::Error) is produced in four scenarios in the libp2p handler:
1. codec.write_request() fails (lines 193-194) -- error writing the request to the stream. This would be a genuine network I/O failure (broken pipe, connection reset, etc.).
2. stream.close() fails (line 195) -- couldn't cleanly half-close the outbound stream. Similar genuine network issues.
3. codec.read_response() fails (lines 196-197) -- error reading the response. This is where your error comes from. Looking at request_responses.rs, read_response returns Err(io::Error) in these cases:
   - Response size exceeds the limit (InvalidInput) -- your case, a misconfiguration
   - Varint decode error on the length prefix (InvalidInput) -- malformed/corrupt data from the peer
   - io.read_exact() failure -- TCP-level read error (connection reset, broken pipe, etc.)
4. Max sub-streams reached (lines 210-213) -- local resource exhaustion, ErrorKind::Other.
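As a rough illustration of the list above, the error kinds it mentions can be grouped with plain `std::io`. The `IoFailureCause` enum and `classify` function are hypothetical and not part of libp2p or this PR. Note that `ErrorKind::InvalidInput` alone cannot tell an oversized response apart from a malformed length prefix.

```rust
use std::io::{Error, ErrorKind};

/// Hypothetical grouping of the Io causes listed above.
#[derive(Debug, PartialEq)]
enum IoFailureCause {
    /// Oversized response or bad varint length prefix (InvalidInput).
    BadOrOversizedResponse,
    /// Local sub-stream limit reached (ErrorKind::Other).
    LocalResourceLimit,
    /// Anything else: broken pipe, connection reset, and similar.
    Transport,
}

/// Maps an io::Error to one of the groups by its ErrorKind only.
fn classify(err: &Error) -> IoFailureCause {
    match err.kind() {
        ErrorKind::InvalidInput => IoFailureCause::BadOrOversizedResponse,
        ErrorKind::Other => IoFailureCause::LocalResourceLimit,
        _ => IoFailureCause::Transport,
    }
}

fn main() {
    let samples = [
        Error::new(ErrorKind::InvalidInput, "response size exceeds limit"),
        Error::new(ErrorKind::Other, "max sub-streams reached"),
        Error::from(ErrorKind::ConnectionReset),
    ];
    for err in &samples {
        println!("{:?} -> {:?}", err.kind(), classify(err));
    }
}
```

None of the three groups points at deliberate misbehaviour by the remote peer, which is the argument the rest of this description builds on.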
rep::IO is -(1 << 10) = -1024. That's a mild penalty (same weight as TIMEOUT). The ban threshold is at 71% of i32::MIN (~-1.5 billion). So the reputation hit itself is harmless -- it would take ~1.5 million consecutive IO errors to hit the ban threshold through reputation alone.

The problem was never the rep::IO penalty. It was the unconditional disconnect_peer call which then fed into disconnected_peers.rs, which applies a fatal (i32::MIN) ban after just 3 disconnects-during-request.

So the gating fix is correct: it prevents the disconnect_peer call during major sync, which prevents the cascade into disconnected_peers's fatal ban. The rep::IO penalty is still applied when should_drop_peer is true, and it's appropriately mild.

To summarize: Io(_) covers transient network errors and local resource limits -- none are signals of a malicious peer worthy of a ban. The gating is the right approach, same as we do for Timeout, DialFailure, ConnectionClosed, and Refused.
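The arithmetic and the gating idea can be checked with a small sketch. The constants use the values quoted above; the integer form of the threshold and the whole `should_drop_peer` signature (including `drop_limit` and `consecutive_failures`) are assumptions for illustration, not the actual implementation in this PR.

```rust
/// rep::IO penalty as quoted above.
const IO_PENALTY: i32 = -(1 << 10);
/// 71% of i32::MIN, written in integer arithmetic (assumed form).
const BAN_THRESHOLD: i32 = 71 * (i32::MIN / 100);

/// Number of whole IO penalties that fit into the ban threshold,
/// ignoring any reputation decay.
fn io_penalties_within_threshold() -> i32 {
    BAN_THRESHOLD / IO_PENALTY
}

/// Hypothetical gate: outside major sync a failing peer is dropped right
/// away; during major sync only after `drop_limit` consecutive failures.
fn should_drop_peer(is_major_syncing: bool, consecutive_failures: u32, drop_limit: u32) -> bool {
    !is_major_syncing || consecutive_failures >= drop_limit
}

fn main() {
    println!("IO penalty:    {}", IO_PENALTY);
    println!("ban threshold: {}", BAN_THRESHOLD);
    println!("penalties within threshold: {}", io_penalties_within_threshold());

    // During major sync a single Io failure does not trigger a disconnect,
    // so it cannot count toward the 3-disconnect fatal ban.
    println!("drop after 1 failure in major sync: {}", should_drop_peer(true, 1, 3));
}
```

The point of the gate is that the mild reputation penalty stays, while the disconnect, which is the step that feeds the fatal ban, is withheld for transient failures.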