Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce --rotate-external-peers flag to improve bootstrap client peering #3307

Merged
merged 10 commits into from
Oct 22, 2024

Conversation

vicsn
Copy link
Contributor

@vicsn vicsn commented Jun 10, 2024

Motivation

Currently, the bootstrap client connections fill up too quickly, making it cumbersome for new clients/provers to join the network. By letting bootstrap clients flush their peers more frequently, new clients have an opportunity to connect.

Naming nits are welcome, we can also go for:

  • --flush-external-peers
  • --evict-external-peers
  • --bootstrap-client

A complementary change would be to increase the number of bootstrap peers.

Why not just increase max peers with a --max-peers flag?

  1. It does not fully solve the underlying problem, a bootstrap node with more peers can fill up just as well
  2. It should be hard to mess up configuration for average joe. Having a larger number of peers has not been sufficiently tested and will likely lead to performance degredation, so if possible the option should not be available.
  3. Also note that unfortunately doing this cleanly requires a lot of code changes because the existing MAXIMUM_NUMBER_OF_PEERS variable is a compile-time variable and tighly coupled with other logic. Probably because @ljedrz mentions "~20 peers in a P2P setting being a desirable number is a popular notion".

Test Plan

After approval this should be deployed and tested.

Related PRs

Aims to tackle:

@zosorock zosorock requested review from a team, gluax and voximity June 10, 2024 13:53
gluax
gluax previously approved these changes Jun 10, 2024
Copy link

@gluax gluax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, but I'd also like to propose a question of why not have a config file/cli commands with:

  • an allowlist
  • and denylist

This would not affect the maximum number of peers.
The allowlist could be greater than the max, but then arbitrary ones from the list are chosen.
This would not conflict with the initial peers list, unless one is explicitly deny listed(ow you can deny this behavior).
It's not an uncommon setting to give node operators control of in networks.
Allows for node operators to deny list those nodes that continuously spam your node as a malicious actor.

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of peer rotation by bootstrappers, but we need to be cautious, as the heartbeat interval is currently only 25s, which may not provide enough time for the new peers to learn about others. I don't think we currently provide a list of peers as one of the initial post-handshake (OnConnect) actions, but we may want to do so now, at least for those that enable peer rotation.

@vicsn
Copy link
Contributor Author

vicsn commented Jun 13, 2024

why not have a config file/cli commands with: an allowlist and denylist

I can see that being useful, though out of scope for this PR.

I'm in favor of peer rotation by bootstrappers, but we need to be cautious, as the heartbeat interval is currently only 25s, which may not provide enough time for the new peers to learn about others. I don't think we currently provide a list of peers as one of the initial post-handshake (OnConnect) actions, but we may want to do so now, at least for those that enable peer rotation.

Indeed, that's an issue. Letting the bootstrap peer send a PeerResponse uninvited will not be accepted by the counterparty, but we can let the counterparty send a PeerRequest: efd5ef0

The 4 existing bootstrap peers may relay 4*MEDIAN_NUMBER_OF_PEERS - 4 = 36 other clients, of which some hopefully are able to welcome the new client. If that's experimentally shown not to be sufficient, more (light-weight) bootstrap peers can be added to the bootstrap peer list.

Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats using if rand(..10) < 1, or if ledger.height % 10 == 0. But I feel its overcomplicating. What do you think?

@vicsn vicsn marked this pull request as draft June 13, 2024 14:21
@ljedrz
Copy link
Collaborator

ljedrz commented Jun 13, 2024

Additionally, the client could choose to rotate not every heartbeat, but once every 10 heartbeats

I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.

@vicsn
Copy link
Contributor Author

vicsn commented Jun 18, 2024

I've considered this too; this certainly makes a lot of sense (assuming a 25s window IIRC) for peer shuffling, but this may also be useful for some other heartbeat activities that don't need to happen as often.

2f7f799 - CI result

@vicsn vicsn marked this pull request as ready for review June 18, 2024 19:39
ljedrz
ljedrz previously approved these changes Jun 19, 2024
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit

Co-authored-by: ljedrz <3750347+ljedrz@users.noreply.github.com>
@raychu86
Copy link
Contributor

raychu86 commented Jul 3, 2024

One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing.

One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.

@vicsn
Copy link
Contributor Author

vicsn commented Jul 4, 2024

One consideration is that this may affect syncing in a non-trivial way. More frequent disconnect + connections will likely slow down syncing.
One possible approach is if we can prevent the client from rotating clients if it is currently syncing. Though, this does not protect from the other side where the the node sending blocks.

Good callout. Given that clients won't voluntarily disconnect from bootstrap peers, I don't think the "node sending blocks disconnecting" is a concern. But we should ensure bootstrap nodes (and even any client) does not disconnect from other clients they're syncing from. 7eac67f

@vicsn
Copy link
Contributor Author

vicsn commented Sep 25, 2024

@zosorock can you confirm if this has still been running succesfully on a bootstrap node? If so I think we can mark this ready to merge

@zosorock
Copy link
Contributor

zosorock commented Oct 3, 2024

@vicsn

Yes, it has been running on all 3 networks without any noticeable issues. I am of the opinion that one should not be syncing with a bootstrap node, especially given lots of people abuse them and connect dozens of times nonstop. I say rotate them often and make syncing with bootstrap nodes a wasteful endeavor.

@vicsn
Copy link
Contributor Author

vicsn commented Oct 3, 2024

Amazing. @zosorock noticed one more small improvement which should increase the frequency of disconnects: 9dccba4

Copy link
Contributor

@raychu86 raychu86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zosorock zosorock merged commit 38d0973 into AleoNet:staging Oct 22, 2024
21 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants