Skip to content

fix: use per-family vote pools for dual-stack ENR auto-discovery#334

Merged
wemeetagain merged 4 commits intoChainSafe:masterfrom
lodekeeper:fix/dual-stack-ipv6-addr-votes
Mar 16, 2026
Merged

fix: use per-family vote pools for dual-stack ENR auto-discovery#334
wemeetagain merged 4 commits intoChainSafe:masterfrom
lodekeeper:fix/dual-stack-ipv6-addr-votes

Conversation

@lodekeeper
Copy link
Contributor

@lodekeeper lodekeeper commented Mar 15, 2026

Fix dual-stack IPv6 auto-ENR discovery

Fixes the issue where Lodestar nodes running both IPv4 and IPv6 fail to auto-discover their external IPv6 address in the ENR (lodestar#8808).

Root causes (3 bugs)

  1. Single vote pool — IPv4 votes reach threshold first, addVote() calls clear() which wipes accumulated IPv6 votes. IPv6 never accumulates enough.
  2. No immediate ping on session establishment — first PONG votes arrive only after pingInterval (5 min default), starving both families during bootstrap.
  3. IPv4-mapped IPv6 addresses pollute IPv6 pool — some peers report our IPv4 address as ::ffff:x.x.x.x (IPv4-mapped IPv6), which routes to the wrong pool and splits the vote count.

Changes

Commit 1: Per-family vote pools

  • Replace single AddrVotes instance with per-family pools (ip4 / ip6)
  • Route votes by addr.ip.type to the correct pool
  • Add getSocketAddressOnENRByFamily(enr, family) helper for per-family ENR comparison
  • Extract maybeUpdateLocalEnrFromVote() for cleaner vote routing
  • Fix pre-existing bug: MAX_VOTES eviction didn't decrement tallies

Commit 2: Immediate ping + vote bootstrap (matching Rust sigp/discv5)

  • Immediate PING on outgoing session establishment: Send a PING as soon as a new outgoing peer is inserted into the routing table, instead of waiting for pingInterval (5 min default). Matches Rust sigp/discv5 connection_updated behavior.
  • requireMoreIpVotes() bootstrap: In dual-stack mode, when a family hasn't accumulated enough votes, accept PONG votes from ANY peer (not just connected outgoing). Critical for IPv6 bootstrap on IPv4-dominated networks.
  • Ping on routing table insertion failure: When kbuckets are full, if we need more IPv6 votes and the peer has udp6, ping anyway just for the vote.

Commit 3: IPv4-mapped IPv6 normalization

  • Detect IPv4-mapped IPv6 addresses (::ffff:0:0/96) in PONG responses
  • Convert to native IPv4 and route to IPv4 pool instead of polluting IPv6 pool
  • Add isIpv4MappedIpv6() and normalizeIp() helpers with full test coverage

Test results

Unit tests: 63 passing (12 new: 5 addrVotes + 1 currentVoteCount + 6 normalizeIp/isIpv4MappedIpv6)

Live mainnet dual-stack test (all 3 fixes applied):

ip:  <redacted-ipv4>                          ✅ IPv4
ip6: <redacted-ipv6>                          ✅ IPv6
udp: 9100  udp6: 9100  seq: 10
  • 199 total votes after ~10 min
  • 14 unique IPv6 voters (threshold: 10)
  • 4 IPv4-mapped votes correctly redirected to IPv4 pool
  • IPv4 discovered in ~1 min, IPv6 in ~8 min

Design decisions

  • Two separate AddrVotes instances (not internal redesign): Smallest behavioral delta
  • No config/API changes: addrVotesToUpdateEnr applies per family (docstring updated)
  • requireMoreIpVotes only in dual-stack: Single-stack nodes don't need this bootstrap
  • No 30% clear majority or ConnectivityState: Deferred to follow-up

References

  • Rust sigp/discv5 IpVote struct: separate ipv4_votes/ipv6_votes HashMaps
  • Rust sigp/discv5 require_more_ip_votes(): vote bootstrap for underrepresented families
  • Rust sigp/discv5 connection_updated(): immediate ping on InsertResult::Inserted + Outgoing

On dual-stack hosts (IPv4 + IPv6), the node only auto-discovers the
external IPv4 address. IPv6 never accumulates enough votes because
AddrVotes uses a single pool for both families — when IPv4 reaches
the threshold, clear() wipes all votes including IPv6.

Changes:
- Split addrVotes into per-family pools (ip4/ip6), initialized from
  ipMode. Each family independently tracks votes and updates the ENR.
- Add getSocketAddressOnENRByFamily() to compare against the correct
  family when checking for ENR changes.
- Extract maybeUpdateLocalEnrFromVote() to route votes by address type.
- Fix pre-existing bug: MAX_VOTES eviction now decrements tallies.

Closes ChainSafe/lodestar#8808
@lodekeeper lodekeeper requested a review from a team as a code owner March 15, 2026 21:51
@nflaig
Copy link
Member

nflaig commented Mar 15, 2026

@codex review

@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 15, 2026

Code Review — fix/dual-stack-ipv6-addr-votes

Summary

This PR fixes the dual-stack IPv6 auto-ENR discovery issue by splitting the single AddrVotes pool into per-family pools. The approach is clean and minimal. Also includes a bonus fix for a pre-existing bug where MAX_VOTES eviction didn't decrement tallies.


✅ Correctness

Root cause fix is sound. The single AddrVotes pool meant IPv4 wins (reaching threshold → clear()) would wipe accumulated IPv6 votes. With per-family pools (ip4?: AddrVotes; ip6?: AddrVotes), each family independently accumulates votes and updates the ENR without cross-contamination.

addVote already calls this.clear() internally before returning true, so there's no risk of pingConnectedPeers() spam after a winning vote — the pool resets on win.

getSocketAddressOnENRByFamily is correct. The ip.type !== family guard properly rejects malformed ENR entries (e.g., ip key containing 16-byte IPv6 octets). The refactored getSocketAddressOnENR preserves its existing semantics (IPv6 preferred in dual-stack).

Constructor reorder is safe. Moving this.ipMode assignment before this.addrVotes initialization is correct since this.sessionService is already available at that point.

✅ Bug Fix: MAX_VOTES Eviction

The decrementTally() extraction and its use during eviction correctly fixes the pre-existing accounting bug. Previously, evicting old votes didn't update tallies, which could lead to inflated counts and premature threshold crossings.

🟡 Minor Observations

  1. log("Local ENR (IP & UDP) updated: %s", isWinningVote)isWinningVote is a boolean (true), so this logs "Local ENR (IP & UDP) updated: true". Consider logging the actual address for better debugging: log("Local ENR (IP & UDP) updated for %s", observedAddr.ip.type === 4 ? "IPv4" : "IPv6").

  2. Double pingConnectedPeers() in dual-stack — If both IPv4 and IPv6 reach threshold in quick succession, pingConnectedPeers() fires twice. The second call sends the fully updated ENR (with both addresses), so correctness is fine, but it's a minor efficiency consideration. Low priority.

  3. decrementTally with missing key — If called when the key doesn't exist in tallies, nextTally = -1 and the delete is a no-op. This is safe and matches the original inline behavior, but a debug assertion could help catch accounting errors during development.

🟡 Test Coverage

The new tests cover the key scenarios:

  • ✅ Separate pools don't interfere
  • ✅ MAX_VOTES eviction decrements tallies
  • getSocketAddressOnENRByFamily per-family lookup
  • ✅ Dual-stack getSocketAddressOnENR IPv6 preference preserved

Could be stronger with:

  • A test that specifically proves "IPv4 winning does NOT clear IPv6 votes" (the exact root cause scenario) — currently implied by the separate-pools test but not explicitly demonstrated with interleaved votes
  • A test for the if (!votes) return path (IPv6 vote arriving on an IPv4-only node)

⚠️ Limitation: IPv6 Peer Bootstrapping

This fix is necessary but may not be sufficient on its own. Local testing on mainnet shows IPv4 auto-discovery works (<redacted-ipv4> discovered in ~3 min) but IPv6 is not discovered after 15+ minutes because 0 out of 50+ connected peers have IPv6 addresses in their ENR. Without IPv6-capable peers in the routing table, no PINGs are sent via IPv6, and no IPv6 PONGs arrive.

This is the exact chicken-and-egg problem described in the issue: "if the node is reachable via IPv4 then other nodes will just 'pong' us on that interface." The Rust sigp/discv5 implementation addresses this with require_more_ip_votes(is_ipv6) — a bootstrap mechanism that accepts votes from ALL peers (not just outgoing) when a family hasn't reached its threshold.

This fix correctly prevents vote pool contamination, which is a prerequisite. The bootstrap mechanism could be a follow-up PR.

Verdict

Approve with minor suggestions. The core fix is correct, well-structured, and well-tested. The vote pool separation directly addresses the reported bug. The minor log message improvement and the IPv6 bootstrap mechanism are follow-up items.

Add two mechanisms from Rust sigp/discv5 that are critical for IPv6
auto-discovery in dual-stack mode:

1. Immediate PING on outgoing session establishment:
   When a new peer is inserted into the routing table via an outgoing
   connection, send a PING immediately instead of waiting for the
   ping_interval (5 min default). This allows PONG-based address votes
   to arrive within seconds of connecting.

2. requireMoreIpVotes() bootstrap:
   In dual-stack mode, when a family (IPv4 or IPv6) hasn't accumulated
   enough votes to reach the update threshold, accept PONG votes from
   ANY peer (not just connected outgoing ones). This is critical for
   IPv6 bootstrap on networks where most peers are IPv4-only.

3. Ping on routing table insertion failure:
   When the kbuckets are full and reject a new outgoing peer, if we
   still need more IPv6 votes and the peer has an IPv6 address, ping
   them anyway just to gather the address vote.

These mechanisms match the behavior documented in Rust sigp/discv5's
service.rs (used by Lighthouse and Grandine).
@lodekeeper
Copy link
Contributor Author

Added a second commit with three mechanisms from Rust sigp/discv5 that are critical for making IPv6 auto-discovery actually work on mainnet:

  1. Immediate PING on outgoing session establishment — instead of waiting 5 min for pingInterval, send a PING right when a new outgoing peer is inserted. This makes ENR address votes arrive within seconds.

  2. requireMoreIpVotes() bootstrap — in dual-stack mode, when a family (e.g. IPv6) hasn't reached the vote threshold, accept PONG votes from ANY peer instead of only connected+outgoing. This expands the voter pool for the underrepresented family.

  3. Ping on routing table insertion failure — when kbuckets are full and reject a new peer, if we still need IPv6 votes and the peer has udp6, ping them anyway just for the address vote. The Rust code comments: "On large networks with limited IPv6 nodes, it is hard to get enough PONG votes."

Live testing context

In my IPv6-only mainnet test:

  • Without immediate ping: 0 votes for 5+ min, then only 8 unique IPv6 voters (below threshold of 10)
  • 5/17 mainnet bootnodes have IPv6 (4 Lighthouse, 1 Lodestar)
  • IPv4 discovery works correctly in dual-stack (confirms the per-family pools work)

Build ✅ | Lint ✅ | 57 tests ✅

Remote peers may report our IPv4 address as an IPv4-mapped IPv6 address
(::ffff:x.x.x.x) in their PONG responses. Without normalization, these
votes are classified as IPv6 (16-byte octets) and routed to the IPv6
vote pool, splitting the vote count between two different representations
of the same IPv4 address.

Live mainnet testing confirmed this: 3 out of 8 IPv6 votes were actually
::ffff:85.246.147.78 (our IPv4 address in mapped form), preventing both
the IPv4 pool (missing 3 votes) and IPv6 pool (polluted with wrong
votes) from reaching the threshold.

Changes:
- Add isIpv4MappedIpv6() to detect ::ffff:0:0/96 addresses
- Add normalizeIp() to convert mapped addresses to native IPv4
- Apply normalization at the entry point of maybeUpdateLocalEnrFromVote()
- 6 new unit tests for detection and conversion
@lodekeeper
Copy link
Contributor Author

lodekeeper commented Mar 16, 2026

✅ Live mainnet dual-stack test — IPv6 auto-discovered!

Third commit (IPv4-mapped normalization) confirmed the fix works end-to-end.

ENR after ~10 min on mainnet:

ip:  <redacted-ipv4>                            ✅ IPv4
ip6: <redacted-ipv6>  ✅ IPv6
udp: 9100  udp6: 9100  seq: 10

Vote distribution:

  • 199 total votes (after ~10 min)
  • 14 unique IPv6 voters → reached threshold of 10 → ENR updated
  • IPv4-mapped normalization working: 4 votes that arrived as ::ffff:<redacted-ipv4> were correctly normalized to IPv4 and routed to IPv4 pool (not polluting IPv6)

Timeline:

  • IPv4 discovery: ~1 min (immediate pings help)
  • IPv6 discovery: ~8 min (fewer IPv6 peers, but got there with the combined fixes)

All 3 commits working together:

  1. Per-family vote pools — IPv4 winning doesn't wipe IPv6 votes
  2. Immediate ping on outgoing session + requireMoreIpVotes bootstrap — votes arrive faster
  3. IPv4-mapped normalization::ffff:x.x.x.x mapped addresses redirected to IPv4 pool

Build ✅ | Lint ✅ | 63 tests ✅

SessionService emits 'established' before storing the session in its
internal map (newSession). A synchronous sendPing from the connectionUpdated
handler fires before the session exists, triggering a second handshake that
collides with the first — causing 'Invalid Authentication header' and
session drops.

Defer both immediate PINGs (InsertResult.Inserted and FailedBucketFull)
via setTimeout(fn, 0) so the session is stored before the PING is sent.
@wemeetagain wemeetagain merged commit 1198de5 into ChainSafe:master Mar 16, 2026
5 checks passed
This was referenced Mar 16, 2026
lodekeeper added a commit to lodekeeper/lodestar that referenced this pull request Mar 16, 2026
Bumps discv5 to v12.0.1 which includes per-family vote pools for
dual-stack IPv6 auto-ENR discovery (ChainSafe/discv5#334).

Key fixes in discv5 v12.0.1:
- Per-family AddrVotes pools (IPv4/IPv6 votes no longer interfere)
- Immediate PING on outgoing session establishment (faster vote bootstrap)
- requireMoreIpVotes() for dual-stack IPv6 bootstrap
- IPv4-mapped IPv6 address normalization (prevents vote pool pollution)
- RangeError crash fix on malformed ENR port values

Also bumps @chainsafe/enr to v6.0.1 (required by discv5 v12.0.1).

Closes ChainSafe#8808
nflaig pushed a commit to ChainSafe/lodestar that referenced this pull request Mar 16, 2026
…#9049)

## Motivation

Fixes #8808 — Lodestar nodes running dual-stack (IPv4 + IPv6) fail to
auto-discover their external IPv6 address in the ENR.

## Description

Bumps `@chainsafe/discv5` to v12.0.1
([ChainSafe/discv5#334](ChainSafe/discv5#334))
and `@chainsafe/enr` to v6.0.1.

### Root causes fixed in discv5 v12.0.1

1. **Single vote pool** — IPv4 votes reach threshold first, `addVote()`
calls `clear()` which wipes accumulated IPv6 votes
2. **No immediate ping on session establishment** — first PONG votes
arrive only after `pingInterval` (5 min default)
3. **IPv4-mapped IPv6 addresses pollute IPv6 pool** — some peers report
IPv4 as `::ffff:x.x.x.x`, routing to the wrong pool

### Changes in this PR

- Bump `@chainsafe/discv5` from `^12.0.0` to `^12.0.1`
- Bump `@chainsafe/enr` from `^6.0.0` to `^6.0.1` (required by discv5
v12.0.1)
- Add `@chainsafe/discv5` and `@chainsafe/enr` to
`minimumReleaseAgeExclude` in `pnpm-workspace.yaml`

### Testing

Ran a local Lodestar beacon node (checkpoint sync + engineMock) on
mainnet with the bumped discv5:
- IPv4 discovered in ~1 min ✅
- IPv6 discovered in ~2 min ✅
- Both addresses stable for 8+ minutes across all samples
- 75-92 connected peers

*This PR was authored by an AI contributor — @lodekeeper, with human
review by @nflaig.*

---------

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants