
fix(discovery): Remove discovery limit #3548

Closed

Conversation

@guillaumemichel commented Jul 3, 2024

Currently, FindPeers is called without a limit, so the hardcoded default limit of 100 is used. Celestia mainnet now has more than 100 peers, which means that some peers may never be discovered because of this default limit.

```go
peers, err := d.disc.FindPeers(findCtx, d.tag)
```

Using a limit of 0 ensures that all advertisers are returned.
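
Concretely, a minimal sketch of the proposed call site, assuming go-libp2p's discovery.Limit option (from github.com/libp2p/go-libp2p/core/discovery); d, findCtx, and d.tag are the existing names from the snippet above:

```go
import "github.com/libp2p/go-libp2p/core/discovery"

// A limit of 0 means "no limit" to the discovery backend, so every
// advertiser for the tag is returned instead of only the first 100.
peers, err := d.disc.FindPeers(findCtx, d.tag, discovery.Limit(0))
```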

@github-actions github-actions bot added the external Issues created by non node team members label Jul 3, 2024
@renaynay (Member) left a comment

@guillaumemichel Thanks for the contribution!

Something to note - celestia-node also performs "peer discovery" ambiently via its instance of gossipsub, which is used for propagating recent headers throughout the network. Since all node types subscribe to and rebroadcast headers, it's possible for nodes to discover nodes of all kinds that way.

We currently have 2 instances of discovery running in our latest release: one for full and one for archival -- in both, we try to find and maintain connectivity to 5 peers of the desired topic. So these instances of discovery aren't our only source of discovered peers; they are there to guarantee connection to at least 5 archival and 5 full peers at any given time.

My LN for example is currently connected to ~250 peers.

Is there an observed reason for this change?

Also, if we were to remove the limit, it looks like some changes would have to be made in discovery to ensure that we're constantly draining the peer channel - right now, we only read from it for a minute at a time per discovery loop. Additionally, passing -1 feels like an implicit hack to get around the limit (https://github.com/libp2p/go-libp2p-kad-dht/blob/25d299e64f3cb9a3da337e9d4af1a1b722ae503d/routing.go#L530).
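
For illustration, a rough sketch of what draining the channel continuously could look like, assuming FindPeers returns a <-chan peer.AddrInfo as in go-libp2p (handlePeer is a hypothetical stand-in for whatever we do per discovered peer):

```go
import (
	"context"

	"github.com/libp2p/go-libp2p/core/peer"
)

// drainPeers keeps reading discovered peers until the channel closes or
// the context is cancelled, rather than reading for a fixed one-minute
// window per discovery loop.
func drainPeers(ctx context.Context, peers <-chan peer.AddrInfo) {
	for {
		select {
		case p, ok := <-peers:
			if !ok {
				return // discovery finished: all peers consumed
			}
			handlePeer(ctx, p) // hypothetical helper: connect to / track the peer
		case <-ctx.Done():
			return
		}
	}
}
```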

@guillaumemichel (Author)

> Additionally, passing -1 feels like an implicit hack to get around the limit (https://github.com/libp2p/go-libp2p-kad-dht/blob/25d299e64f3cb9a3da337e9d4af1a1b722ae503d/routing.go#L530).

Actually, the parameter should be 0 and not -1 (I edited the PR). So it isn't a hack around the limit: with a count of 0, findAll will be set to true and all discovered peers will be returned.

If this doesn't seem necessary, and you want to optimize to only get 5 peers, then the limit could be set to size. But there is a caveat.

The order in which the peers are returned depends on the provider store implementation. A deterministic implementation would always return the peers in the same order, centralizing the network if the response is truncated to the first 5 (or 100) peers.

@renaynay (Member) commented Jul 3, 2024

@guillaumemichel wouldn't passing 0 as the option just wind up with the same issue here? I thought passing -1 was the hack that allowed bypassing this check.

Good point re the provider store - can you describe the problem a bit further? For context, we use badgerDB as an instance of datastore.Batching, which is used as the dht.providerStore.

@guillaumemichel (Author)

@renaynay you are right - I just opened an issue because I believe that a count of 0 should be allowed.

Basically, the DHT server storing the provider records will query the provider store with the requested key. This returns a list of peers, which is then serialized and sent over the wire to the requester. The requester will then deserialize it and write the first count peers from the list to the peerOut channel (code).
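
A simplified paraphrase of that truncation step (not the actual go-libp2p-kad-dht code; the function name and loop are illustrative):

```go
import "github.com/libp2p/go-libp2p/core/peer"

// forwardProviders mimics, in simplified form, how the DHT client
// forwards deserialized provider records to the caller: only the first
// `count` peers are written to peerOut unless count == 0 (findAll).
func forwardProviders(providers []peer.AddrInfo, peerOut chan<- peer.AddrInfo, count int) {
	findAll := count == 0
	sent := 0
	for _, p := range providers {
		if !findAll && sent >= count {
			return // remaining peers are silently dropped
		}
		peerOut <- p
		sent++
	}
}
```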

So the order defined by the provider store's Get has a direct influence on which peers are returned to the caller of FindPeers. Because all the peers are received by the DHT client anyway, they should all be returned to the caller, so that the caller can randomly sample the peers if only a subset is required, as in the sketch below.
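
A minimal sketch of that sampling, assuming FindPeers returns a <-chan peer.AddrInfo as in go-libp2p (samplePeers is a hypothetical helper; rand.Shuffle is the stdlib function):

```go
import (
	"math/rand"

	"github.com/libp2p/go-libp2p/core/peer"
)

// samplePeers drains the discovery channel, then returns up to n peers
// chosen uniformly at random, so no fixed provider-store ordering can
// bias which peers the caller connects to.
func samplePeers(peers <-chan peer.AddrInfo, n int) []peer.AddrInfo {
	all := make([]peer.AddrInfo, 0)
	for p := range peers {
		all = append(all, p)
	}
	rand.Shuffle(len(all), func(i, j int) { all[i], all[j] = all[j], all[i] })
	if len(all) > n {
		all = all[:n]
	}
	return all
}
```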

@renaynay (Member) commented Jul 3, 2024

Thanks for the context @guillaumemichel

It sounds like this is not really an issue with the datastore we're using, but rather with how we choose peers from those returned by FindPeers, IIUC.

We'll audit the implementation shortly. It would be interesting if we could confirm this with a visualisation of the current network topology.

Anyway, I'll close this PR as it is for now. Thanks again.

@renaynay closed this Jul 4, 2024