
[Mempool] Improve peer priority selection. #12537

Merged · 1 commit merged into main from mempool_opt on Mar 26, 2024
Conversation

@JoshLind (Contributor) commented Mar 14, 2024:

Note: most of this PR is new unit tests.

Description

This PR improves the peer selection logic for mempool (i.e. deciding which peers to forward transactions to):

  • Currently, the logic just selects peers based on network types and roles, but this is somewhat inefficient (especially when nodes define seeds that override more performant peers).
  • To avoid this, the PR updates the selection logic to prioritize peers based on: (i) network type; (ii) distance from the validators (e.g., to avoid misconfigured and/or disconnected VFNs); and (iii) peer ping latencies (i.e., to favour closer peers). A rough sketch of this ordering is shown after this list.
  • To avoid excessively reprioritizing peers (which can be detrimental to mempool under load), we only update the peer priorities when: (i) our peers change; (ii) we're still waiting for the peer monitoring service to populate the peer ping latencies; or (iii) neither (i) nor (ii) is true and ~10 minutes (configurable) have elapsed since the last update. This avoids overly reprioritizing peers at steady state.
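For illustration, here is a minimal Rust sketch of the three-step ordering described above. The types and field names (PeerSummary, is_validator_network, validator_distance, ping_latency_secs) are hypothetical, not the PR's actual code; the real comparator lives in the mempool peer prioritization logic.

```rust
use std::cmp::Ordering;

// Hypothetical peer summary; the real code derives these values from the
// network id, peer monitoring metadata, and measured ping latencies.
struct PeerSummary {
    is_validator_network: bool,      // (i) network type
    validator_distance: Option<u64>, // (ii) hops from the validator set
    ping_latency_secs: Option<f64>,  // (iii) monitored ping latency
}

// Smaller values are better; peers missing the data rank last.
fn rank_smaller_first<T: PartialOrd>(a: &Option<T>, b: &Option<T>) -> Ordering {
    match (a, b) {
        (Some(x), Some(y)) => y.partial_cmp(x).unwrap_or(Ordering::Equal),
        (Some(_), None) => Ordering::Greater, // `a` has data, so it wins
        (None, Some(_)) => Ordering::Less,
        (None, None) => Ordering::Equal,
    }
}

// Greater means higher priority; the caller sorts by this comparator in
// descending order so the best peers land at the front of the list.
fn compare_peers(a: &PeerSummary, b: &PeerSummary) -> Ordering {
    a.is_validator_network
        .cmp(&b.is_validator_network) // (i) prefer the validator-facing network
        .then_with(|| rank_smaller_first(&a.validator_distance, &b.validator_distance)) // (ii)
        .then_with(|| rank_smaller_first(&a.ping_latency_secs, &b.ping_latency_secs)) // (iii)
}
```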

Testing Plan

New and existing test infrastructure. I also ran several PFN-only tests to ensure that the average broadcast latencies are reduced.

trunk-io bot commented Mar 14, 2024:

⏱️ 39h 28m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 7h 53m 🟩🟩🟩🟩🟩 (+14 more)
windows-build 5h 51m 🟩🟩🟩🟩🟩 (+17 more)
forge-e2e-test / forge 5h 42m 🟩🟩🟩🟩🟩 (+14 more)
rust-images / rust-all 5h 2m 🟩🟩🟩🟩🟩 (+13 more)
rust-unit-coverage 4h 21m 🟩
rust-smoke-coverage 3h 50m 🟩
rust-lints 2h 25m 🟩🟩🟩🟩🟩 (+14 more)
check 1h 20m 🟩🟩🟩🟩 (+18 more)
run-tests-main-branch 1h 8m 🟥🟥🟥🟥🟥 (+13 more)
general-lints 37m 🟩🟩🟩🟩🟩 (+14 more)
check-dynamic-deps 35m 🟩🟩🟩🟩🟩 (+16 more)
determine-test-metadata 19m 🟩🟩🟩🟩🟩 (+10 more)
semgrep/ci 9m 🟩🟩🟩🟩🟩 (+16 more)
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+13 more)
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+14 more)
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+16 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+16 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+16 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+16 more)
determine-docker-build-metadata 57s 🟩🟩🟩🟩🟩 (+13 more)
permission-check 51s 🟩🟩🟩🟩🟩 (+16 more)
permission-check 50s 🟩🟩🟩🟩🟩 (+15 more)
upload-to-codecov 12s 🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job: rust-images / rust-all — duration 19m vs. 7d avg 15m (delta +21%)


@JoshLind added the CICD:run-forge-e2e-perf label (Run the e2e perf forge only) on Mar 14, 2024


codecov bot commented Mar 15, 2024:

Codecov Report

Attention: Patch coverage is 79.58115%, with 78 lines in your changes missing coverage. Please review.

Project coverage is 69.9%. Comparing base (b169adf) to head (31014fe).
Report is 1 commit behind head on main.

❗ Current head 31014fe differs from pull request most recent head f47df24. Consider uploading reports for the commit f47df24 to get more accurate results

Files Patch % Lines
mempool/src/shared_mempool/network.rs 79.5% 78 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #12537       +/-   ##
===========================================
+ Coverage    64.1%    69.9%     +5.7%     
===========================================
  Files         819     2284     +1465     
  Lines      182919   431953   +249034     
===========================================
+ Hits       117397   302142   +184745     
- Misses      65522   129811    +64289     



@JoshLind force-pushed the mempool_opt branch 3 times, most recently from dbf5335 to 86b450d, on March 15, 2024 20:33


pub fn ready_for_update(&self, peers_changed: bool) -> bool {
// If our peers have changed, or we haven't observed ping latencies
// for all peers yet, we should update the prioritized peers again.
if peers_changed || !self.observed_all_ping_latencies {
Contributor:
Can you explain the reasoning behind updating this if we have not observed ping latency for all peers?

@JoshLind (Contributor, Author) replied on Mar 21, 2024:

Aah, it's because ping latency information is only populated after a new peer connects (e.g., 30 seconds), but as soon as the peer connects, mempool gets notified and tries to prioritize the peer (without any latency information). We basically just keep updating our priorities (every few seconds) until all the ping latency information is populated for the current set of peers. This is enough to provide an optimal/best effort selection. After this point, we just move to the more stable schedule (e.g., every 10 mins).

Contributor:

I see, makes sense. It might be better to provide this context with a comment in the code.

@JoshLind (Contributor, Author):

SGTM! Will add a comment 😄
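To summarize this thread, here is a small sketch of the update gating being discussed. The struct and most field names are hypothetical (only ready_for_update and observed_all_ping_latencies mirror the excerpt above); the interval is the ~10 minutes mentioned in the description, and it is configurable.

```rust
use std::time::{Duration, Instant};

// Hypothetical bookkeeping kept alongside the prioritized peer list.
struct PrioritizedPeersState {
    observed_all_ping_latencies: bool,
    last_update_time: Instant,
    update_interval: Duration, // configurable, e.g. ~10 minutes
}

impl PrioritizedPeersState {
    fn ready_for_update(&self, peers_changed: bool) -> bool {
        // Update eagerly if the peer set changed, or while we're still waiting
        // for the peer monitoring service to report all peer ping latencies.
        if peers_changed || !self.observed_all_ping_latencies {
            return true;
        }
        // Otherwise, stick to the slow steady-state schedule.
        self.last_update_time.elapsed() >= self.update_interval
    }
}
```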

.iter()
.sorted_by(|peer_a, peer_b| {
let ordering = &self.peer_comparator.compare(peer_a, peer_b);
ordering.reverse() // Prioritize higher values (i.e., sorted by descending order)
Contributor:

@JoshLind - Maybe I am missing something, but this seems to be returning the list in ascending order of priority, because the peer_comparator returns the highest priority first?

@JoshLind (Contributor, Author):

It's because sorted_by sorts in ascending order, but we want high-priority peers to be at the beginning of the list (i.e., descending order, as mempool selects peers from the front of the list). So, the easiest way to do that is to reverse the ordering here 😄

@JoshLind (Contributor, Author) added on Mar 21, 2024:

(We can also do this by flipping peer_a and peer_b, if that's clearer 🤔)

Contributor:

Thanks - this makes sense to me.
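As a standalone illustration of this exchange (plain integers instead of the mempool types, assuming the itertools crate that provides sorted_by): reversing the comparison and flipping the arguments produce the same descending order.

```rust
use itertools::Itertools;

fn main() {
    let scores = vec![3, 1, 2];

    // sorted_by sorts in ascending order, so reverse the comparison...
    let by_reverse: Vec<_> = scores.iter().sorted_by(|a, b| a.cmp(b).reverse()).collect();

    // ...or, equivalently, flip the arguments being compared.
    let by_flip: Vec<_> = scores.iter().sorted_by(|a, b| b.cmp(a)).collect();

    assert_eq!(by_reverse, by_flip); // both yield [3, 2, 1]
}
```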

@sitalkedia (Contributor) left a comment:

LGTM

@bchocho (Contributor) left a comment:

Can we gate the priority sorting based on ping times?

One case I am concerned about: when there are a lot of outstanding items in mempool, getting a new peer into the priority peers means all the outstanding items have to be broadcast to the new peer (in FIFO order), incurring significant latency for incoming transactions as they wait in the queue to be broadcast. Not sure how problematic this could be in practice, though.

There's no foolproof solution I can think of to resolve this. Here are some ideas:

  • Constrain the number of peers to be re-prioritized in each 10 min interval to half of the peers (1 in our case)?
  • For a new peer, constrain the number of "old" txns we forward to the new peer.

// Update the prioritized peers
let mut prioritized_peers = self.prioritized_peers.write();
if new_prioritized_peers != *prioritized_peers {
info!(
Contributor:

Can we add an explicit metric for how many prioritized peers were changed? This can help us debug in case there are issues with frequent changes.


@JoshLind (Contributor, Author) commented Mar 25, 2024:

Can we gate the priority sorting based on ping times?

As discussed offline, I've updated the PR to do this. 😄 The "intelligent" sorting is enabled by default, but it can be disabled in the node config. If it is disabled, it means mempool will retain the same behaviour as before (i.e., only reprioritize on peer changes, and use a simpler prioritization algorithm).

Note: I have modified the original behaviour slightly: the simple prioritization algorithm no longer uses the "peer role"; instead, it sorts by network and then peer hash. This is because the peer roles are imprecise, so it makes sense to fall back to random selection (instead of prioritizing seeds) when the feature is disabled. A rough sketch of this fallback appears after this comment.

Let me know what you think!
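For reference, a rough sketch of the simple fallback described above, with hypothetical types and names (the real comparator and the node-config flag that gates the latency-aware sorting are not shown here): prefer the validator-facing network, then break ties by peer hash.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical peer shape: a network marker plus a hashable identifier.
#[derive(Hash)]
struct Peer {
    on_validator_network: bool,
    id: u64,
}

fn peer_hash(peer: &Peer) -> u64 {
    let mut hasher = DefaultHasher::new();
    peer.hash(&mut hasher);
    hasher.finish()
}

// Simple fallback comparator: prefer the validator-facing network, then order
// by peer hash, which approximates random selection and ignores peer roles.
fn simple_compare(a: &Peer, b: &Peer) -> Ordering {
    a.on_validator_network
        .cmp(&b.on_validator_network)
        .then_with(|| peer_hash(a).cmp(&peer_hash(b)))
}
```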


@bchocho (Contributor) left a comment:

Thanks! One comment on the metrics, but approving to unblock

@@ -429,6 +430,20 @@ pub fn shared_mempool_pending_broadcasts(peer: &PeerNetworkId) -> IntGauge {
])
}

/// Counter tracking the number of peers that changed priority in shared mempool
static SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT: Lazy<IntGaugeVec> = Lazy::new(|| {
@bchocho (Contributor):

Looks like IntGauge would be sufficient, instead of IntGaugeVec?

But actually I'm wondering if just a Histogram might work better? E.g., to count how many changes there were for the past 30 mins.

@JoshLind (Contributor, Author) replied on Mar 26, 2024:

Aah, thanks @bchocho -- I missed IntGauge 🤦 Will change to that. Note: I'm not sure a Histogram makes sense because we update this metric really infrequently (e.g., every 10 minutes). It's probably easiest to just graph the raw gauge value instead of trying to work with averages, rate changes, etc. But we can always update this if we need more 😄
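For completeness, a minimal sketch of the label-free gauge being discussed, assuming the prometheus-style registration macros and once_cell used by prometheus-based counters (the metric name and usage here are illustrative, not the PR's exact code):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge, IntGauge};

/// Gauge tracking how many peers changed priority during the last update
/// (illustrative metric name).
static SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT: Lazy<IntGauge> = Lazy::new(|| {
    register_int_gauge!(
        "aptos_shared_mempool_priority_change_count",
        "Number of peers that changed priority during the last prioritization update"
    )
    .unwrap()
});

// Usage sketch: set the gauge after recomputing the prioritized peers.
// SHARED_MEMPOOL_PRIORITY_CHANGE_COUNT.set(num_changed_peers as i64);
```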


✅ Forge suite realistic_env_max_load success on f47df2487f02d4758fefb1d1a43f938d5af2d81d

two traffics test: inner traffic : committed: 8604 txn/s, latency: 4564 ms, (p50: 4400 ms, p90: 5300 ms, p99: 9900 ms), latency samples: 3708680
two traffics test : committed: 100 txn/s, latency: 1909 ms, (p50: 1800 ms, p90: 2100 ms, p99: 6200 ms), latency samples: 1780
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.211, avg: 0.202", "QsPosToProposal: max: 0.224, avg: 0.206", "ConsensusProposalToOrdered: max: 0.434, avg: 0.420", "ConsensusOrderedToCommit: max: 0.366, avg: 0.353", "ConsensusProposalToCommit: max: 0.787, avg: 0.774"]
Max round gap was 1 [limit 4] at version 1813521. Max no progress secs was 4.733936 [limit 15] at version 1813521.
Test Ok


✅ Forge suite compat success on aptos-node-v1.9.5 ==> f47df2487f02d4758fefb1d1a43f938d5af2d81d

Compatibility test results for aptos-node-v1.9.5 ==> f47df2487f02d4758fefb1d1a43f938d5af2d81d (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 7061 txn/s, latency: 4702 ms, (p50: 4800 ms, p90: 7000 ms, p99: 7600 ms), latency samples: 247160
2. Upgrading first Validator to new version: f47df2487f02d4758fefb1d1a43f938d5af2d81d
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1001 txn/s, latency: 28959 ms, (p50: 31700 ms, p90: 42100 ms, p99: 43100 ms), latency samples: 57080
3. Upgrading rest of first batch to new version: f47df2487f02d4758fefb1d1a43f938d5af2d81d
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 346 txn/s, submitted: 450 txn/s, expired: 104 txn/s, latency: 38337 ms, (p50: 42300 ms, p90: 56600 ms, p99: 69100 ms), latency samples: 36715
4. upgrading second batch to new version: f47df2487f02d4758fefb1d1a43f938d5af2d81d
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2036 txn/s, latency: 14645 ms, (p50: 16900 ms, p90: 18300 ms, p99: 19300 ms), latency samples: 93680
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> f47df2487f02d4758fefb1d1a43f938d5af2d81d passed
Test Ok

@JoshLind merged commit 5fe67b8 into main on Mar 26, 2024 (78 of 98 checks passed).
@JoshLind deleted the mempool_opt branch on March 26, 2024 at 18:44.