Skip to content

feat: peers test#368

Open
mskrzypkows wants to merge 31 commits intomainfrom
peers_test
Open

feat: peers test#368
mskrzypkows wants to merge 31 commits intomainfrom
peers_test

Conversation

@mskrzypkows
Copy link
Copy Markdown
Contributor

No description provided.

@mskrzypkows
Copy link
Copy Markdown
Contributor Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

Claude finished @mskrzypkows's task in 4m 3s —— View job


Code Review — PR #368: Peers test

  • Gather context and read changed files
  • Review crates/cli/src/commands/test/peers.rs (main change, +889/-13)
  • Review crates/cli/src/commands/test/mod.rs
  • Review crates/eth2util/src/enr.rs
  • Review crates/cli/Cargo.toml and crates/cli/src/main.rs
  • Compile and summarize findings

Summary

This PR implements the pluto test peers command, porting Charon's peer connectivity test suite (~900 lines). The structure is sound — relay HTTP tests, self TCP-port tests, and per-peer libp2p tests (Ping/PingMeasure/PingLoad/DirectConn) run correctly in parallel. A few bugs and missing hardening need addressing before merge.


Issues

Bug — ENR deduplication is broken (peers.rs:220–228)

let self_enr = Record::from_key(&private_key)?;
let mut all_enrs = enr_strings.clone();
if !all_enrs.contains(&self_enr.to_string()) {
    all_enrs.push(self_enr.to_string());
}

Record::from_key creates an ENR with no address fields. Cluster ENRs fetched from --enrs or the lock file carry address info (IP, TCP port), so their string representation will always differ from self_enr.to_string(). The self ENR is therefore always appended even when the operator's key is already in enr_strings, producing a different hash than the relay server computes and breaking relay room routing. Verify against the Go source how the ENR set is constructed for this hash and either use the raw ENR string already present in all_enrs, or strip addresses from all ENRs before hashing.

Fix this →


Bug — relay_ping_test accepts HTTP 3xx as success (peers.rs:437–441)

Ok(resp) if resp.status().as_u16() <= 399 => result.ok(),

<= 399 accepts 1xx, 2xx, and 3xx. A redirect (301, 302…) means the relay is not at that URL — it should be a failure. Use resp.status().is_success() (2xx only), or explicitly allow a known set of codes. Note that request_rtt in helpers.rs:611 already follows the same pattern, so this is a pre-existing concern, but relay_ping_test introduces its own copy.

Fix this →


Bug — relay HTTP requests have no timeout (peers.rs:430–440)

relay_ping_test creates a plain reqwest::Client with no timeout, so if the relay server holds the TCP connection open without responding, the test hangs indefinitely (the CancellationToken is only checked between tests, not during a request). Add .timeout(Duration::from_secs(5)) to the client builder. request_rtt (used by relay_ping_measure_test) has the same issue in helpers.rs.

Fix this →


Bug — silent panic suppression (peers.rs:262–263)

let result = self_task.await.unwrap_or_default();

JoinHandle::await returns Err(JoinError) when the task panics. unwrap_or_default() silently swallows the panic and returns an empty HashMap, making the self-test disappear from results without any error. Change to:

let result = self_task.await
    .unwrap_or_else(|e| {
        tracing::error!(err = %e, "self-test task panicked");
        HashMap::new()
    });

Logic issue — DirectConn error message uses has_load_test as a proxy (peers.rs:826–832)

} else if has_load_test && !state.direct_dial_attempted {
    r.fail(CliError::Other(
        "no direct addresses available from identify".to_string(),
    ))

has_load_test means "PingLoad is in the queue", not "we were connected long enough for identify to run". These are unrelated: an operator can run DirectConn without PingLoad. Use a dedicated identify_received: bool field on PeerState instead, which handle_swarm_event sets to true on Identify::Received.


Correctness — synchronous file I/O in async context (peers.rs:363–370)

fetch_enrs_from_lock calls std::fs::read_to_string inside an async function, blocking the executor thread. Switch to tokio::fs::read_to_string(...).await.


Minor / Style

Double tracing::initrun() calls pluto_tracing::init(...) unconditionally, but main.rs already calls it before dispatching to peers::run. The second call is harmless (let _ = ...) but confusing. Remove the one inside run().

#[allow(clippy::too_many_arguments)] on run_peer_event_loop — 8 arguments. Consider grouping queued_tests, relay_peers, enr_strings, and args into a small PeerTestConfig struct to reduce the argument surface.

libp2p_tcp_port_open_test — hardcoded multistream header — The comment documents why it's there, which is good. Consider extracting the constant with a note that it's keyed to multistream-select V1 so future libp2p upgrades are easy to detect.

No unit tests — ~900 lines of new logic, but helper functions like format_enr, peer_is_done, build_peer_results, all_peers_done, and fetch_enrs_from_lock are all pure/testable. Even a small set of targeted tests would catch regressions.


Non-issues (confirmed correct)

  • CategoryScore::max()A < B < C per Rust discriminant order; max() returns the worst score. ✓
  • Keep-alive loop correctly polls the swarm and handles cancellation. ✓
  • PingLoad window logic (connect_time.elapsed() < load_test_duration in peer_is_done, filtered same way in build_peer_results) is consistent. ✓
  • mod.rs list_test_cases for Peers now enumerates all three groups. ✓
  • enr.rs Record::from_key is a clean, minimal constructor. ✓
  • ct (cancellation token) is correctly propagated to relay tests, peer loop, and keep-alive loop. ✓

@mskrzypkows mskrzypkows changed the title Peers test feat: peers test May 4, 2026
Copy link
Copy Markdown
Collaborator

@varex83agent varex83agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR turns the previously stubbed peers test category into a real implementation (~900 LOC). The overall shape of the suite (Ping / PingMeasure / PingLoad / DirectConn / Libp2pTCPPortOpen / Relay HTTP tests) maps to Charon's testpeers.go, and several non-trivial pieces (the relay reservation behaviour, ENR-hash relay routing, multistream self-probe) are well thought through. Self-tested as the commit messages note.

That said, there are two concrete bugs and a handful of meaningful parity / robustness gaps that should be addressed before merge:

Bugs

  1. build_peer_results for DirectConn (peers.rs:894) gates the "no direct addresses available from identify" diagnostic on has_load_test, which has nothing to do with whether identify produced direct addresses. Running --test-cases DirectConn alone makes that branch unreachable and yields a misleading error.
  2. PingLoad (peers.rs:866) is not actually a load test. Charon spawns a fresh continuous-ping goroutine every second over LoadTestDuration; Pluto only filters the libp2p ping-keepalive samples that happened to land in the window. Over the default 30s window that's ~2 RTT samples per peer with no real load applied — verdicts effectively duplicate PingMeasure.

Major parity / robustness issues

  • DirectConn attempts a single dial after the first Identify event and never retries; Charon retries up to ~120x over DirectConnectionTimeout (peers.rs:777).
  • Missing --definition-file flag and no mutual-exclusion validation between --enrs / --lock-file (peers.rs:86, 165).
  • Relay HTTP tests run sequentially before peer/self tests, with no per-request timeout and no in-flight cancellation; one stalled relay blocks the whole suite (peers.rs:216, 446).
  • Self-ENR dedup uses raw string comparison against the lock file's ENR strings; canonical-vs-padded base64url encodings can produce different enr_hash values across cluster peers, putting them in different relay routing rooms (peers.rs:221).
  • Libp2pTCPPortOpen probes addresses sequentially (and self-tests run sequentially); Charon parallelises both via errgroup (peers.rs:478, 504).
  • Zero unit tests for ~900 lines of new logic; Charon's testpeers_internal_test.go is ~576 lines. Several pure functions (peer_is_done, build_peer_results, format_enr, fetch_enrs) are easy to cover and would have caught the bug at line 894.

Minor / nit findings cover error typing, Option<PathBuf> instead of empty-string sentinel, overflow-prone Instant + Duration, swallowed JoinError, hard-coded relay defaults, and a few code-organisation suggestions.

Verdict: requesting changes for the two bugs and the missing-flag / no-cancellation / no-tests issues.

Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs Outdated
@mskrzypkows
Copy link
Copy Markdown
Contributor Author

Regarding ping load test: the 1s ping interval already generates load; PingLoad's evaluate_highest_rtt on 30 samples is semantically different from PingMeasure's first(). Fixed the filter to use saturating arithmetic.

@mskrzypkows mskrzypkows marked this pull request as ready for review May 6, 2026 10:23
@mskrzypkows mskrzypkows requested review from iamquang95 and varex83 May 6, 2026 10:23
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs Outdated
Comment thread crates/cli/src/commands/test/peers.rs Outdated
@mskrzypkows
Copy link
Copy Markdown
Contributor Author

@iamquang95 Resolved comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants