[Mempool] Improve peer priority selection. #12537
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##             main   #12537       +/-   ##
===========================================
+ Coverage    64.1%    69.9%      +5.7%
===========================================
  Files         819     2284      +1465
  Lines      182919   431953    +249034
===========================================
+ Hits       117397   302142    +184745
- Misses      65522   129811     +64289
Force-pushed from dbf5335 to 86b450d
pub fn ready_for_update(&self, peers_changed: bool) -> bool {
    // If our peers have changed, or we haven't observed ping latencies
    // for all peers yet, we should update the prioritized peers again.
    if peers_changed || !self.observed_all_ping_latencies {
Can you explain the reasoning behind updating this if we have not observed ping latency for all peers?
Aah, it's because ping latency information is only populated some time after a new peer connects (e.g., ~30 seconds), but as soon as the peer connects, mempool is notified and tries to prioritize the peer (without any latency information). We basically keep updating our priorities (every few seconds) until all the ping latency information is populated for the current set of peers. This is enough to provide an optimal/best-effort selection. After that point, we move to the more stable schedule (e.g., every 10 mins).
I see, makes sense. It might be better to provide this context with a comment in the code.
SGTM! Will add a comment 😄
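For reference, a minimal sketch of what that contextual comment (and the surrounding state) might look like; the struct shape, field names, and intervals below are assumptions based on this thread, not the actual implementation:

```rust
use std::time::{Duration, Instant};

/// Hypothetical state backing prioritized peer selection (a sketch).
pub struct PrioritizedPeersState {
    observed_all_ping_latencies: bool,
    last_update: Instant,
    stable_update_interval: Duration, // e.g., 10 minutes once latencies are known
}

impl PrioritizedPeersState {
    pub fn ready_for_update(&self, peers_changed: bool) -> bool {
        // Ping latencies are only populated some time after a peer connects
        // (e.g., ~30 seconds), but mempool is notified and prioritizes the
        // peer as soon as it connects. So, until latencies have been observed
        // for all peers, keep re-prioritizing frequently (every few seconds);
        // after that, fall back to the stable schedule below.
        if peers_changed || !self.observed_all_ping_latencies {
            return true;
        }
        self.last_update.elapsed() >= self.stable_update_interval
    }
}
```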
.iter()
.sorted_by(|peer_a, peer_b| {
    let ordering = &self.peer_comparator.compare(peer_a, peer_b);
    ordering.reverse() // Prioritize higher values (i.e., sorted by descending order)
@JoshLind - Maybe I am missing something, but this seems to be returning the list in ascending order of priority, because the peer_comparator is returning the highest priority first?
It's because sorted_by sorts in ascending order, but we want high priority peers to be at the beginning of the list (i.e., descending order, as mempool selects peers from the front of the list). So, the easiest way to do that is to reverse the ordering here 😄
(We can also do this by flipping peer_a and peer_b, if that's clearer 🤔)
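To illustrate the equivalence with a toy example (plain integers standing in for peers, and itertools' sorted_by as in the snippet above):

```rust
use itertools::Itertools;

fn main() {
    let scores = vec![3, 1, 2];
    // Sort descending by reversing the ascending ordering...
    let reversed: Vec<_> = scores.iter().sorted_by(|a, b| a.cmp(b).reverse()).collect();
    // ...or, equivalently, by flipping the comparison arguments.
    let flipped: Vec<_> = scores.iter().sorted_by(|a, b| b.cmp(a)).collect();
    assert_eq!(reversed, flipped); // both yield [3, 2, 1]
}
```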
Thanks - this makes sense to me.
LGTM
Can we gate the priority sorting based on ping times?
One case I am concerned about is when there are a lot of outstanding items in mempool: getting a new peer into the priority peers means all the outstanding items have to be broadcast to the new peer (in FIFO order), incurring significant latency for incoming transactions as they wait in a queue to be broadcast. Not sure how problematic this could be in practice, though.
There's no foolproof solution I can think of to resolve this. Here are some ideas:
- Constrain the number of peers to be re-prioritized in each 10 min interval to half of the peers (1 in our case)?
- For a new peer, constrain the number of "old" txns we forward to the new peer (a sketch of this idea follows below).
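To make the second idea concrete, a rough sketch (the function, types, and cap are all hypothetical, not part of this PR):

```rust
type TxnId = u64; // stand-in for the real transaction identifier

/// When a peer is newly prioritized, only forward the most recent
/// `max_old_txns` outstanding transactions, rather than draining the
/// whole backlog to it in FIFO order.
fn txns_for_new_peer(outstanding: &[TxnId], max_old_txns: usize) -> &[TxnId] {
    let start = outstanding.len().saturating_sub(max_old_txns);
    &outstanding[start..]
}
```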
// Update the prioritized peers
let mut prioritized_peers = self.prioritized_peers.write();
if new_prioritized_peers != *prioritized_peers {
    info!(
Can we add an explicit metric for how many prioritized peers were changed? This can help us debug in case priorities change too frequently.
As discussed offline, I've updated the PR to do this 😄 The "intelligent" sorting is enabled by default, but it can be disabled in the node config. If it is disabled, mempool retains the same behaviour as before (i.e., it only reprioritizes on peer changes, and uses a simpler prioritization algorithm). Note: I have modified the original behaviour slightly: the simple prioritization algorithm no longer uses the "peer role"; instead, it sorts by network and then peer hash. This is because the peer roles are imprecise, so it makes sense to fall back to random selection (instead of prioritizing seeds) when the feature is disabled. Let me know what you think!
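As a sketch of what such a config gate might look like (the field name and defaults here are assumptions, not the actual node config):

```rust
/// Hypothetical mempool config fragment (a sketch).
#[derive(Clone, Debug)]
pub struct MempoolConfig {
    /// Enables the latency-aware peer prioritization. When false, mempool
    /// only reprioritizes on peer changes and sorts by network and then
    /// peer hash (i.e., effectively random selection within a network).
    pub enable_intelligent_peer_prioritization: bool,
}

impl Default for MempoolConfig {
    fn default() -> Self {
        Self {
            enable_intelligent_peer_prioritization: true,
        }
    }
}
```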
Thanks! One comment on the metrics, but approving to unblock
mempool/src/counters.rs (outdated)

@@ -429,6 +430,20 @@ pub fn shared_mempool_pending_broadcasts(peer: &PeerNetworkId) -> IntGauge {
    ])
}

/// Counter tracking the number of peers that changed priority in shared mempool
static SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT: Lazy<IntGaugeVec> = Lazy::new(|| {
Looks like IntGauge would be sufficient, instead of IntGaugeVec? But actually I'm wondering if just a Histogram might work better? E.g., to count how many changes there were over the past 30 mins.
Aah, thanks @bchocho -- I missed IntGauge 🤦 Will change to that. Note: I'm not sure Histogram makes sense because we update this metric really infrequently (e.g., every 10 minutes). It's probably easiest to just graph the raw gauge value instead of trying to work with averages, rate changes, etc. But we can always update this if we need more 😄
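For reference, the IntGauge variant might look roughly like this (the metric name and helper function are assumptions, following the prometheus crate conventions visible in counters.rs):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge, IntGauge};

/// Gauge tracking the number of peers that changed priority in shared mempool.
pub static SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT: Lazy<IntGauge> = Lazy::new(|| {
    register_int_gauge!(
        "aptos_shared_mempool_priority_change_count",
        "Number of peers that changed priority in shared mempool"
    )
    .unwrap()
});

/// Record how many peers moved in the latest reprioritization pass.
pub fn update_priority_change_count(num_changed: i64) {
    SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT.set(num_changed);
}
```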
✅ Forge suite passed
✅ Forge suite passed
Note: most of this PR is new unit tests.
Description
This PR improves the peer selection logic for mempool (i.e., deciding which peers to forward transactions to).
Testing Plan
New and existing test infrastructure. I also ran several PFN-only tests to ensure that the average broadcast latencies are reduced.