-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Network] Add support for latency aware peer dialing #11814
Conversation
.into_iter() | ||
.filter(|(_, peer)| peer.ping_latency_ms.is_none()) | ||
.collect::<Vec<_>>(); | ||
let num_peers_to_ping = peers_to_ping.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should often be zero, if so it would be nice to return
here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM -- added it 😄
// Identify the eligible peers that don't already have latency information | ||
let peers_to_ping = eligible_peers | ||
.into_iter() | ||
.filter(|(_, peer)| peer.ping_latency_ms.is_none()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next round: give ping_lattency a TTL after which we should ping again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if we want to make this more intelligent, we can definitely do that, e.g., give each ping measurement a TTL and refresh it periodically. We could then combine that with some type of support for disconnects, such that if you discover better peers to dial, you could "swap" a couple of your current connections with the more optimal peers (assuming you're already at your max outbound connection limit).
But, I figured that'd be best to leave to a future PR/effort (if we see the need). At least now, a simple node reboot should refresh the peer ping latencies and connections.
self.enable_latency_aware_dialing, | ||
) { | ||
// Ping the eligible peers (so that we can fetch missing ping latency information) | ||
self.ping_eligible_peers(eligible_peers.clone()).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little sad that no connections will be tried until ping is done; this will mostly slow down a newly started node connecting by a few seconds? It looks like in the steady state things already have a ping time so there's no waiting for pings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the pinging does unfortunately block the first dial (and any subsequent dials if there are peers we failed to ping last round). But, thankfully, the delay isn't super noticeable. For example, this dial loop is only invoked every 5 or 10 seconds anyway (depending on network/node). So, when you add the additional ping time on (a couple seconds), it's not very noticeable.
If we do notice it becoming a problem, we could try to make things async (or heavily reduce the timeout), but it should hopefully be reasonable enough to start with 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
8dbe271
to
bab1632
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
688e53a
to
52f9b21
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
52f9b21
to
1563256
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
Description
This PR updates the networking stack to add support for latency aware peer dialing. Instead of selecting peers to dial randomly, we now select peers based on peer ping latencies (i.e., the lower the peer latency, the higher the probability of dialing the peer). Experiments show this reduces end-to-end transaction latencies by several
100ms
(for high-latency PFNs).The PR offers several commits:
f64
for peer ping latencies (to improve metrics).A couple notes about the PR:
Test Plan
New and existing test infrastructure (as well as manual verification).