fix: use per-family vote pools for dual-stack ENR auto-discovery #334
On dual-stack hosts (IPv4 + IPv6), the node only auto-discovers the external IPv4 address. IPv6 never accumulates enough votes because AddrVotes uses a single pool for both families: when IPv4 reaches the threshold, clear() wipes all votes, including the IPv6 ones.

Changes:
- Split addrVotes into per-family pools (ip4/ip6), initialized from ipMode. Each family independently tracks votes and updates the ENR.
- Add getSocketAddressOnENRByFamily() to compare against the correct family when checking for ENR changes.
- Extract maybeUpdateLocalEnrFromVote() to route votes by address type.
- Fix pre-existing bug: MAX_VOTES eviction now decrements tallies.

Closes ChainSafe/lodestar#8808
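To make the failure mode and the fix concrete, here is a minimal sketch of per-family vote pools. It is not the discv5 source: `VotePool`, `PerFamilyVotes`, and the string-keyed voters and addresses are hypothetical names chosen for illustration, and the duplicate-vote handling is an assumption. It shows two points from the description: reaching the threshold clears only the pool of the winning family, and evicting the oldest vote also decrements that address's tally.

```typescript
// Illustrative sketch only: all names here are hypothetical, not the discv5 API.
type IpFamily = "ip4" | "ip6";

class VotePool {
  private readonly votes = new Map<string, string>(); // voter -> address (insertion-ordered)
  private readonly tallies = new Map<string, number>(); // address -> live vote count

  constructor(private readonly threshold: number, private readonly maxVotes: number) {}

  count(addr: string): number {
    return this.tallies.get(addr) ?? 0;
  }

  /** Records a vote and returns the address if it just reached the threshold. */
  addVote(voter: string, addr: string): string | undefined {
    const previous = this.votes.get(voter);
    if (previous === addr) return undefined; // assumed: a repeated identical vote changes nothing
    if (previous !== undefined) {
      this.votes.delete(voter);
      this.decrement(previous);
    }
    if (this.votes.size >= this.maxVotes) {
      // Evict the oldest vote and keep the tally in sync with it.
      const oldest = this.votes.keys().next();
      if (!oldest.done) {
        const evictedAddr = this.votes.get(oldest.value);
        this.votes.delete(oldest.value);
        if (evictedAddr !== undefined) this.decrement(evictedAddr);
      }
    }
    this.votes.set(voter, addr);
    const total = this.count(addr) + 1;
    this.tallies.set(addr, total);
    if (total < this.threshold) return undefined;
    this.clear(); // only this family's pool is reset
    return addr;
  }

  private decrement(addr: string): void {
    const remaining = this.count(addr) - 1;
    if (remaining > 0) this.tallies.set(addr, remaining);
    else this.tallies.delete(addr);
  }

  private clear(): void {
    this.votes.clear();
    this.tallies.clear();
  }
}

class PerFamilyVotes {
  private readonly pools = new Map<IpFamily, VotePool>();

  constructor(families: IpFamily[], threshold: number, maxVotes: number) {
    for (const family of families) this.pools.set(family, new VotePool(threshold, maxVotes));
  }

  addVote(voter: string, family: IpFamily, addr: string): string | undefined {
    const pool = this.pools.get(family);
    return pool === undefined ? undefined : pool.addVote(voter, addr);
  }
}
```

With a single shared pool, the IPv4 win in this scenario would have erased the pending IPv6 vote; with one pool per family, the IPv6 vote survives and the next IPv6 vote can still reach the threshold.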
@codex review
Code Review —
Add three mechanisms from Rust sigp/discv5 that are critical for IPv6 auto-discovery in dual-stack mode:

1. Immediate PING on outgoing session establishment: When a new peer is inserted into the routing table via an outgoing connection, send a PING immediately instead of waiting for the ping_interval (5 min default). This allows PONG-based address votes to arrive within seconds of connecting.
2. requireMoreIpVotes() bootstrap: In dual-stack mode, when a family (IPv4 or IPv6) hasn't accumulated enough votes to reach the update threshold, accept PONG votes from ANY peer (not just connected outgoing ones). This is critical for IPv6 bootstrap on networks where most peers are IPv4-only.
3. Ping on routing table insertion failure: When the kbuckets are full and reject a new outgoing peer, if we still need more IPv6 votes and the peer has an IPv6 address, ping them anyway just to gather the address vote.

These mechanisms match the behavior documented in Rust sigp/discv5's service.rs (used by Lighthouse and Grandine).
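The three rules above can be read as small predicates. The sketch below is an interpretation of this commit message only, not the discv5 or sigp/discv5 code: `VoteState`, `acceptPongVote`, and `pingImmediately` are hypothetical names, and the exact condition inside `requireMoreIpVotes` (best tally below the threshold) is an assumption.

```typescript
// Illustrative sketch only: types and helper names are hypothetical.
type IpFamily = "ip4" | "ip6";
type InsertResult = "Inserted" | "FailedBucketFull";

interface VoteState {
  dualStack: boolean;
  threshold: number; // votes needed before the ENR is updated
  bestCount: Record<IpFamily, number>; // highest tally seen so far per family
}

/** Assumed rule: only dual-stack nodes bootstrap, and only for a family still below the threshold. */
function requireMoreIpVotes(state: VoteState, family: IpFamily): boolean {
  return state.dualStack && state.bestCount[family] < state.threshold;
}

/** A PONG vote counts if the peer is a connected outgoing one, or if this family still needs votes. */
function acceptPongVote(state: VoteState, family: IpFamily, connectedOutgoing: boolean): boolean {
  return connectedOutgoing || requireMoreIpVotes(state, family);
}

/** Decide whether to PING right after a routing table insertion attempt. */
function pingImmediately(
  state: VoteState,
  result: InsertResult,
  outgoing: boolean,
  peerHasIp6: boolean
): boolean {
  if (!outgoing) return false;
  if (result === "Inserted") return true;
  // Bucket full: ping only to harvest an IPv6 vote we still need.
  return peerHasIp6 && requireMoreIpVotes(state, "ip6");
}
```

Keeping the bootstrap behind `dualStack` means single-stack nodes continue to accept votes only from connected outgoing peers.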
Added a second commit with three mechanisms from Rust
Live testing context
In my IPv6-only mainnet test:
Build ✅ | Lint ✅ | 57 tests ✅
Remote peers may report our IPv4 address as an IPv4-mapped IPv6 address (::ffff:x.x.x.x) in their PONG responses. Without normalization, these votes are classified as IPv6 (16-byte octets) and routed to the IPv6 vote pool, splitting the vote count between two different representations of the same IPv4 address.

Live mainnet testing confirmed this: 3 out of 8 IPv6 votes were actually ::ffff:85.246.147.78 (our IPv4 address in mapped form), preventing both the IPv4 pool (missing 3 votes) and IPv6 pool (polluted with wrong votes) from reaching the threshold.

Changes:
- Add isIpv4MappedIpv6() to detect ::ffff:0:0/96 addresses
- Add normalizeIp() to convert mapped addresses to native IPv4
- Apply normalization at the entry point of maybeUpdateLocalEnrFromVote()
- 6 new unit tests for detection and conversion
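As a rough illustration of the normalization step, the sketch below detects the `::ffff:0:0/96` prefix (ten zero bytes followed by `0xff 0xff`) and returns the trailing four bytes as a native IPv4 address. The function names follow the commit message, but the `IpOctets` shape and the bodies are assumptions, not the actual discv5 implementation.

```typescript
// Illustrative sketch only: the IpOctets shape is an assumed stand-in for the real address type.
type IpOctets = { type: 4 | 6; octets: Uint8Array };

/** True for 16-byte addresses in ::ffff:0:0/96 (80 zero bits, then 16 one bits). */
function isIpv4MappedIpv6(octets: Uint8Array): boolean {
  if (octets.length !== 16) return false;
  for (let i = 0; i < 10; i++) {
    if (octets[i] !== 0) return false;
  }
  return octets[10] === 0xff && octets[11] === 0xff;
}

/** Converts a mapped address to native IPv4; every other address is returned unchanged. */
function normalizeIp(ip: IpOctets): IpOctets {
  if (ip.type === 6 && isIpv4MappedIpv6(ip.octets)) {
    return { type: 4, octets: ip.octets.slice(12) };
  }
  return ip;
}
```

Applying this before any family check means a vote for `::ffff:192.0.2.1` and a vote for `192.0.2.1` land in the same IPv4 pool.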
✅ Live mainnet dual-stack test: IPv6 auto-discovered!
Third commit (IPv4-mapped normalization) confirmed the fix works end-to-end. ENR after ~10 min on mainnet:
Vote distribution:
Timeline:
All 3 commits working together:
Build ✅ | Lint ✅ | 63 tests ✅
SessionService emits 'established' before storing the session in its internal map (newSession). A synchronous sendPing from the connectionUpdated handler fires before the session exists, triggering a second handshake that collides with the first — causing 'Invalid Authentication header' and session drops. Defer both immediate PINGs (InsertResult.Inserted and FailedBucketFull) via setTimeout(fn, 0) so the session is stored before the PING is sent.
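The ordering problem can be reproduced with a toy model. In the sketch below, `ToySessionService` is a hypothetical stand-in that, like the behavior described above, notifies listeners before it stores the session. A handler that acts synchronously sees no session, while the same check deferred with `setTimeout(fn, 0)` runs after `establish()` has returned and the session is stored.

```typescript
// Illustrative sketch only: ToySessionService is a toy, not the discv5 SessionService.
type Listener = (nodeId: string) => void;

class ToySessionService {
  private readonly sessions = new Set<string>();
  private readonly listeners: Listener[] = [];

  onEstablished(listener: Listener): void {
    this.listeners.push(listener);
  }

  hasSession(nodeId: string): boolean {
    return this.sessions.has(nodeId);
  }

  establish(nodeId: string): void {
    // Listeners run synchronously here, before the session is stored.
    for (const listener of this.listeners) listener(nodeId);
    this.sessions.add(nodeId);
  }
}

const observed: string[] = [];
const service = new ToySessionService();

service.onEstablished((nodeId) => {
  // A synchronous PING at this point would find no session and start a second handshake.
  observed.push(`sync:${service.hasSession(nodeId)}`);
  // Deferring to the next macrotask lets establish() finish storing the session first.
  setTimeout(() => observed.push(`deferred:${service.hasSession(nodeId)}`), 0);
});

service.establish("peer-1");
```

The deferral costs one event-loop turn, which is negligible next to the network round trip of the PING itself.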
Bumps discv5 to v12.0.1, which includes per-family vote pools for dual-stack IPv6 auto-ENR discovery (ChainSafe/discv5#334).

Key fixes in discv5 v12.0.1:
- Per-family AddrVotes pools (IPv4/IPv6 votes no longer interfere)
- Immediate PING on outgoing session establishment (faster vote bootstrap)
- requireMoreIpVotes() for dual-stack IPv6 bootstrap
- IPv4-mapped IPv6 address normalization (prevents vote pool pollution)
- RangeError crash fix on malformed ENR port values

Also bumps @chainsafe/enr to v6.0.1 (required by discv5 v12.0.1).

Closes ChainSafe#8808
…#9049)

## Motivation

Fixes #8808: Lodestar nodes running dual-stack (IPv4 + IPv6) fail to auto-discover their external IPv6 address in the ENR.

## Description

Bumps `@chainsafe/discv5` to v12.0.1 (ChainSafe/discv5#334) and `@chainsafe/enr` to v6.0.1.

### Root causes fixed in discv5 v12.0.1

1. **Single vote pool**: IPv4 votes reach the threshold first, then `addVote()` calls `clear()`, which wipes accumulated IPv6 votes
2. **No immediate ping on session establishment**: first PONG votes arrive only after `pingInterval` (5 min default)
3. **IPv4-mapped IPv6 addresses pollute IPv6 pool**: some peers report IPv4 as `::ffff:x.x.x.x`, routing to the wrong pool

### Changes in this PR

- Bump `@chainsafe/discv5` from `^12.0.0` to `^12.0.1`
- Bump `@chainsafe/enr` from `^6.0.0` to `^6.0.1` (required by discv5 v12.0.1)
- Add `@chainsafe/discv5` and `@chainsafe/enr` to `minimumReleaseAgeExclude` in `pnpm-workspace.yaml`

### Testing

Ran a local Lodestar beacon node (checkpoint sync + engineMock) on mainnet with the bumped discv5:

- IPv4 discovered in ~1 min ✅
- IPv6 discovered in ~2 min ✅
- Both addresses stable for 8+ minutes across all samples
- 75-92 connected peers

*This PR was authored by an AI contributor, @lodekeeper, with human review by @nflaig.*

---------

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
Fix dual-stack IPv6 auto-ENR discovery
Fixes the issue where Lodestar nodes running both IPv4 and IPv6 fail to auto-discover their external IPv6 address in the ENR (lodestar#8808).
Root causes (3 bugs)
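1. Single vote pool: IPv4 votes reach the threshold first, then `addVote()` calls `clear()`, which wipes accumulated IPv6 votes. IPv6 never accumulates enough.
2. No immediate ping on session establishment: the first PONG votes arrive only after `pingInterval` (5 min default), starving both families during bootstrap.
3. IPv4-mapped IPv6 addresses pollute the IPv6 pool: some peers report our IPv4 address as `::ffff:x.x.x.x` (IPv4-mapped IPv6), which routes to the wrong pool and splits the vote count.

Changes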
Commit 1: Per-family vote pools
- Replace the single `AddrVotes` instance with per-family pools (ip4/ip6)
- Route votes by `addr.ip.type` to the correct pool
- Add `getSocketAddressOnENRByFamily(enr, family)` helper for per-family ENR comparison
- Extract `maybeUpdateLocalEnrFromVote()` for cleaner vote routing
- Fix pre-existing bug: `MAX_VOTES` eviction didn't decrement tallies

Commit 2: Immediate ping + vote bootstrap (matching Rust sigp/discv5)
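- Immediate PING on outgoing session establishment: send a PING as soon as a new peer is inserted into the routing table via an outgoing connection, instead of waiting for `pingInterval` (5 min default). Matches Rust `sigp/discv5` `connection_updated` behavior.
- `requireMoreIpVotes()` bootstrap: In dual-stack mode, when a family hasn't accumulated enough votes, accept PONG votes from ANY peer (not just connected outgoing). Critical for IPv6 bootstrap on IPv4-dominated networks.
- Ping on routing table insertion failure: when the kbuckets are full and reject a new outgoing peer, if we still need IPv6 votes and the peer has `udp6`, ping anyway just for the vote.

Commit 3: IPv4-mapped IPv6 normalization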
- Normalize IPv4-mapped IPv6 addresses (`::ffff:0:0/96`) in PONG responses
- Add `isIpv4MappedIpv6()` and `normalizeIp()` helpers with full test coverage

Test results
Unit tests: 63 passing (12 new: 5 addrVotes + 1 currentVoteCount + 6 normalizeIp/isIpv4MappedIpv6)
Live mainnet dual-stack test (all 3 fixes applied):
Design decisions
- Separate `AddrVotes` instances (not internal redesign): Smallest behavioral delta
- `addrVotesToUpdateEnr` applies per family (docstring updated)
- `requireMoreIpVotes` only in dual-stack: Single-stack nodes don't need this bootstrap

References
- `sigp/discv5` `IpVote` struct: separate `ipv4_votes` / `ipv6_votes` HashMaps
- `sigp/discv5` `require_more_ip_votes()`: vote bootstrap for underrepresented families
- `sigp/discv5` `connection_updated()`: immediate ping on `InsertResult::Inserted` + `Outgoing`