Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: make block download logic aware of limited peers threshold #28120

Merged
merged 3 commits into from Mar 11, 2024

Conversation

furszy
Copy link
Member

@furszy furszy commented Jul 21, 2023

Even when the node believes it has IBD completed, need to avoid
requesting historical blocks from network-limited peers.
Otherwise, the limited peer will disconnect right away.

The simplest scenario could be a node that gets synced, drops
connections, and stays inactive for a while. Then, once it re-connects
(IBD stays completed), the node tries to fetch all the missing blocks
from any peer, getting disconnected by the limited ones.

Note:
Can verify the behavior by cherry-picking the test commit alone on
master. It will fail there.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 21, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, mzumsande, pinheadmz, achow101
Stale ACK andrewtoth

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@DrahtBot DrahtBot added the P2P label Jul 21, 2023
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe getting disconnected could be beneficial in this rare situation? If our tip is that much behind but we're not in IBD, we can't help them and they can't help us, so our priority should be to get the missing blocks asap, and getting disconnected by useless peers seems like it's not such a bad thing towards that goal.

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe getting disconnected could be beneficial in this rare situation? If our tip is that much behind but we're not in IBD, we can't help them and they can't help us, so our priority should be to get the missing blocks asap, and getting disconnected by useless peers seems like it's not such a bad thing towards that goal.

Agree but, while the outcome might be the desired one, the approach to achieve it isn't the best. It seems unwise to have the node sending a message through the wire, knowing it will fail, just to force the other peer to disconnect.
The node should be smarter than that and not rely on external factors to take decisions.

Moreover, in cases where the other peer chooses to ignore the message without disconnecting (maybe due to using different software than bitcoin-core), the node will end up waiting for a response that will never arrive, wasting time and resources, which could have been easily avoided.

And also, while the node has more connection slots available, I don't see why should disconnect a peer that is behaving properly.

Instead, I think that we should try to keep processes separated. The eviction process should be in charge of disconnecting peers that aren't/will not provide any meaningful data. And the block downloading logic should only be responsible of deciding which blocks request to each peer and which ones not.

@naumenkogs
Copy link
Member

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

I agree with this part. I think the solution should be expanded to give a chance in such cases, and it should be merged within the same PR.

@luke-jr
Copy link
Member

luke-jr commented Jul 25, 2023

Agree with @mzumsande and @naumenkogs. Maybe not the same PR if it's substantially different, but the new eviction logic should be merged before this fix.

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

Right now, this will hardly happen. The reason is CheckForStaleTipAndEvictPeers. The logic is as follows:

When the node doesn't update the tip for a period longer than 30 minutes (TipMayBeStale()), it triggers the extra outbound connection process. Which instructs the net layer to bypass the maximum outbound connection restriction and create another full relay outbound connection to sync up the missing blocks.

Then, considering the worst-case scenario: when the extra outbound connection is also a limited peer. In that case, 45 seconds after connecting to the extra peer (scheduler task timeout), CheckForStaleTipAndEvictPeers will call EvictExtraOutboundPeers, which will evict the peer that least recently announced a block header.
Then, in case of no chain movement; the process starts again, adding an extra outbound connection, and so on, until it finds a peer that can provide the missing blocks.

Note:
While this sooner or later corrects the stalling situation, it's not ideal to continue establishing connections to limited peers, on the extra outbound connection process, when the peer is stalled. So, to address this issue, have created #28170. Which basically disallows connections to limited peers when the node is behind enough to know that limited peers will not provide any of the required historical blocks (although they will still provide valid headers).

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased. Conflicts solved.

src/net_processing.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK c1612ea

src/net_processing.cpp Show resolved Hide resolved
src/net_processing.cpp Outdated Show resolved Hide resolved
src/net_processing.cpp Outdated Show resolved Hide resolved
test/functional/p2p_node_network_limited.py Outdated Show resolved Hide resolved
test/functional/p2p_node_network_limited.py Show resolved Hide resolved
test/functional/p2p_node_network_limited.py Outdated Show resolved Hide resolved
@furszy furszy force-pushed the 2023_net_limited_peers branch 2 times, most recently from 9eba981 to 534996a Compare November 2, 2023 22:33
Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per feedback, thanks for the review!

Fixed the two blocks buffer extension (now is reduction). Small diff.

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per feedback, thanks for the review @vasild!

The node will now request blocks within the limited peers threshold when the block download window contemplates them and not only when the previous blocks arrived or are in-flight. Added test coverage exercising this behavior and verifying that the node can get synced even when the blocks arrived out of order.

@bitcoin bitcoin deleted a comment from sicelo7 Feb 6, 2024
@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 15, 2024 17:37
@@ -1451,6 +1451,7 @@ void PeerManagerImpl::FindNextBlocks(std::vector<const CBlockIndex*>& vBlocks, c
{
std::vector<const CBlockIndex*> vToFetch;
int nMaxHeight = std::min<int>(state->pindexBestKnownBlock->nHeight, nWindowEnd + 1);
bool is_limited_peer = IsLimitedPeer(peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool is_limited_peer{IsLimitedPeer};

@@ -1475,6 +1476,11 @@ void PeerManagerImpl::FindNextBlocks(std::vector<const CBlockIndex*>& vBlocks, c
return;
}

// Don't request blocks that go further than what limited peers can provide
if (is_limited_peer && (state->pindexBestKnownBlock->nHeight - pindex->nHeight >= static_cast<int>(NODE_NETWORK_LIMITED_MIN_BLOCKS) - 2 /* two blocks buffer for possible races */)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @naumenkogs is referring to here is that if this condition is true then pindex will hit this condition on the first pindex in the loop and every pindex up until this condition is no longer true. This can be more efficient by just bumping pindex to the first pindex that is past the network limited blocks threshold.

So, we can either change the loop from range based to regular for-loop and advance the index, or move this check out of this loop and into the loop on line 1464 and break early if the condition is met.

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 21, 2024 23:40
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK c5b5843

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 26, 2024 23:12
Copy link
Member

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c5b5843

Built and ran all functional and unit tests on x86 Ubuntu. Confirmed test fails on master, passes with the patch in the branch. Reviewed code and existing discussion. Regarding the decision to keep the pruning peer connected at all, I agree with the author that the affected code is only responsible for requesting available blocks from a peer and it is reasonable to take the limited flag into account. I like the clean up in the first commit as well, it's easier to read the loop with continue.

For sanity I added checks in the test to confirm blocks from the pruned node were "found" by the full node but had 'confirmations': -1, which makes sense because there is a gap between the full node's chain tip and the blocks available from the pruning peer.

Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK c5b5843d8f10d96f76ee6b95f2b1b1b4ce755f75
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmXs7UsACgkQ5+KYS2KJ
yTpnUg//b9HSUh/s1VJRWSoX23pcmFJtmeA4ndd5dGM+Q2RbyBZenI4ZwVlkunxy
l5B+U/gpD7cL/6iZsV6Eph/ouX9Om8NKQcxMn0TXzvOBYcByCZB++r5oNwfeKohu
aUJWRce9MYDj4GtdbW7p5DwcsKaX6nC8oOnxfv9ivP473F7BDjnYZpX7F0nKsrx7
b5vKMWCY7fXxmHpe0y0FnWVAJ10hvIQFD6I0EMO1mQpq14U3NUcG+Axfm6kCrJRE
EbeAKtxz/tITNNiN5AZg2NdgnmOR1MaOatkOuCxWgMer+N2EOWZgeW+/42Iy0Isz
hOnv9BQi4MwewY/I5ruM7Yx2oghUEpIkIhBX8/JPeA9iD2pnVIcXi6Xkaawd31C1
Xl7OMpX1mOVtPjNPvmYxpbFjze6BhGpaa3h7UJFSBzqhFm0gpe70f+hza4HbXUQj
MihsnKASJPmCDLFY/kGddad/E+yJ5QNxSHhLqEugkWPGPnIa4CHyd91uU5qQwwOv
H1sN9hAlNYguDZs79jPY32yIXgpufQyUWQA1+emQpDQun68jHjTxTdgUFEbvlIDU
Nl+fxsY9Fqj4D3HpdnnwIUcX1W68itMhPPBQUDu3R1abdUHMc/ntE4gwr6BP3LcL
AFv+d4aflQrEG9RntgmkyUSnYw6VIj9INlQMS6ql4r70Kh/6Bhk=
=C8tL
-----END PGP SIGNATURE-----

pinheadmz's public key is on keybase

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth March 9, 2024 23:23
@achow101
Copy link
Member

ACK c5b5843

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth March 11, 2024 12:09
@achow101 achow101 merged commit 4a90374 into bitcoin:master Mar 11, 2024
16 checks passed
@furszy furszy deleted the 2023_net_limited_peers branch March 11, 2024 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants