p2p: When close to the tip, download blocks in parallel from additional peers to prevent stalling #29664

Draft
wants to merge 3 commits into
base: master

mzumsande
Contributor

@mzumsande mzumsande commented Mar 15, 2024

Problem:
For stalling at the tip, we have a parallel download mechanism for compact blocks that was added in #27626.
For stalling during IBD, we have a lookahead window of 1024 blocks, and if that is exceeded, we disconnect the stalling peer.
However, if we are close to but not at the tip (<=1024 blocks), neither of these mechanisms applies: we can't use compact blocks yet, and the stalling mechanism doesn't trigger because the 1024-block window cannot be exceeded.

As a result, we have to fall back on BLOCK_DOWNLOAD_TIMEOUT_BASE, which only disconnects a peer after 10 minutes (plus 5 more minutes for each additional peer we currently have blocks in flight from). This is too long in my opinion, especially since peers get assigned up to 16 blocks (MAX_BLOCKS_IN_TRANSIT_PER_PEER) and could repeat this process to stall us even longer if they send us a block after 10 minutes.
This issue was observed in #29281 and #12291 (comment) with broken peers that didn't send us blocks.
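
To make the timeout arithmetic above concrete, here is a minimal sketch in Python (illustrative only, not the C++ in this PR). The 10-minute base and 5-minute per-peer increment are taken from the description; the function and constant names are hypothetical.

```python
from datetime import timedelta

# Values as stated in the description above; the names are illustrative.
BASE_TIMEOUT = timedelta(minutes=10)
PER_PEER_TIMEOUT = timedelta(minutes=5)


def block_download_timeout(other_peers_with_blocks_in_flight: int) -> timedelta:
    """Time before a peer that is not delivering a block gets disconnected."""
    if other_peers_with_blocks_in_flight < 0:
        raise ValueError("peer count cannot be negative")
    return BASE_TIMEOUT + PER_PEER_TIMEOUT * other_peers_with_blocks_in_flight


if __name__ == "__main__":
    for peers in range(4):
        print(peers, block_download_timeout(peers))
```

With no other peers downloading, the stalling peer is kept for 10 minutes; with two other peers downloading, it is kept for 20 minutes.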

Proposed solution:
If we are 1024 or fewer blocks away from the tip and haven't requested or received a block from any peer for 30 seconds, add another peer from which to download the critical block, i.e. the block that would let us advance our tip. Add up to two additional peers this way.
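
The decision rule can be sketched roughly as follows, in illustrative Python rather than the C++ of the actual change. The three constants come from the description; the `NearTipState` class, its fields, and the function name are hypothetical, and the sketch assumes that "distance from the tip" means best known header height minus our current tip height.

```python
from dataclasses import dataclass

LOOKAHEAD_WINDOW = 1024        # blocks, from the description
STALL_TIMEOUT_SECONDS = 30.0   # from the description
MAX_ADDITIONAL_PEERS = 2       # from the description


@dataclass
class NearTipState:
    """Hypothetical bookkeeping; not the data structures used in the PR."""
    tip_height: int             # height of our current chain tip
    best_header_height: int     # height of the best header we know of
    last_block_activity: float  # time (s) of the last block request or receipt
    additional_peers: int = 0   # extra peers already asked for the critical block


def should_add_parallel_peer(state: NearTipState, now: float) -> bool:
    """Return True if another peer should be asked for the critical block."""
    blocks_behind = state.best_header_height - state.tip_height
    if blocks_behind <= 0 or blocks_behind > LOOKAHEAD_WINDOW:
        return False  # at the tip already, or far enough behind for IBD stalling logic
    if state.additional_peers >= MAX_ADDITIONAL_PEERS:
        return False  # already added the maximum of two extra peers
    return now - state.last_block_activity >= STALL_TIMEOUT_SECONDS
```

In this model, a caller that adds a peer would increment `additional_peers` and update `last_block_activity`, since requesting the block from the new peer counts as block activity; that bookkeeping is an assumption of the sketch.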

Other thoughts

  • I also considered the alternative of extending the existing stalling mechanism, which disconnects the stalling peer, instead of introducing parallel downloads. This could potentially be less wasteful, but we might be over-eager to disconnect peers when really close to the tip, and it might lead to cycling through lots of peers in extreme situations where we have a very slow internet connection.
  • The chosen timeout of 30 seconds could lead to inefficiencies / bandwidth waste when we have a really slow internet connection. Maybe it could make sense to track the last successful download times from existing peers and use a dynamic timeout based on those statistics, instead of setting it to a fixed value.
  • I will leave this PR in draft until I have tested it a bit more in the wild.
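
One possible shape for the dynamic timeout mentioned in the second point, as a sketch in illustrative Python. Nothing here is implemented in this PR: the use of a median, the multiplier, and the upper cap are all assumptions made for the example, and every name is hypothetical. Only the 30-second default comes from the description.

```python
from statistics import median
from typing import Mapping

DEFAULT_TIMEOUT = 30.0  # seconds, the fixed value from the description
MULTIPLIER = 3.0        # hypothetical: how much slack to allow over a typical download
MAX_TIMEOUT = 600.0     # hypothetical cap, so the timeout cannot grow without bound


def dynamic_stall_timeout(last_download_seconds: Mapping[str, float]) -> float:
    """Derive a stall timeout from each peer's last successful download time.

    Keys are peer identifiers and values are seconds taken by that peer's
    last successful block download. With no data, use the fixed default.
    """
    if not last_download_seconds:
        return DEFAULT_TIMEOUT
    typical = median(last_download_seconds.values())
    # Never go below the fixed default, never above the cap.
    return min(max(DEFAULT_TIMEOUT, MULTIPLIER * typical), MAX_TIMEOUT)
```

A median is used here so that a single unusually slow (or fast) peer does not dominate the result; other statistics would be equally plausible.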

Fixes #29281

This is in preparation to add more subtests.
It's a pure refactor of the existing test.
And remove the parts that covered this logic from ibd_stalling.
@DrahtBot
Contributor

DrahtBot commented Mar 15, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

If we are 1024 or fewer blocks away from the tip and haven't requested or received
a block from any peer for 30 seconds, add another peer to download the critical
block from. Add up to two additional peers this way.
@mzumsande mzumsande changed the title p2p: When close to the tip, download blocks in pararallel from additional peers to prevent stalling p2p: When close to the tip, download blocks in parallel from additional peers to prevent stalling Mar 16, 2024
@nanlour
Contributor

nanlour commented Mar 17, 2024

Maybe it could make sense to track the last successful download times from existing peers and use a dynamic timeout based on those statistics, instead of setting it to a fixed value.

Is it possible for other peers to make me believe I have a slow internet connection?

@mzumsande
Contributor Author

Is it possible for other peers to make me believe I have a slow internet connection?

Sorry, I missed this question. For single peers, yes, but the idea is that the data would be taken from multiple outbound peers; it is hard for an attacker to control several of these (see eclipse attacks).

Will soon come back to this PR and take it out of draft.

Successfully merging this pull request may close these issues.

Post startup stalling