
fix(discovery): Remove discovery limit #3548

Closed

Conversation

@guillaumemichel commented Jul 3, 2024

Currently, FindPeers is called without a limit, so the hardcoded default limit of 100 is used. Celestia mainnet now has more than 100 peers, which means that some peers may never be discovered because of this default limit.

```go
peers, err := d.disc.FindPeers(findCtx, d.tag)
```

Using a limit of 0 ensures that all advertisers are returned.
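
Concretely, a minimal sketch of the proposed call site, assuming go-libp2p's discovery.Limit option (from github.com/libp2p/go-libp2p/core/discovery); d, findCtx, and d.tag are the existing names from the snippet above:

```go
import "github.com/libp2p/go-libp2p/core/discovery"

// A limit of 0 means "no limit" to the discovery backend, so every
// advertiser for the tag is returned instead of only the first 100.
peers, err := d.disc.FindPeers(findCtx, d.tag, discovery.Limit(0))
```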

@github-actions github-actions bot added the external Issues created by non node team members label Jul 3, 2024
@renaynay (Member) left a comment

@guillaumemichel Thanks for the contribution!

Something to note - celestia-node also performs "peer discovery" ambiently via its instance of gossipsub, which is used for propagating recent headers throughout the network. Since all node types subscribe to and rebroadcast headers, it's possible for nodes to discover nodes of all kinds that way.

We currently have 2 instances of discovery running in our latest release: one for full and one for archival -- in both, we try to find and maintain connectivity to 5 peers of the desired topic. So these instances of discovery aren't our only source of discovered peers; they are there to guarantee connection to at least 5 archival and 5 full peers at any given time.

My LN for example is currently connected to ~250 peers.

Is there an observed reason for this change?

Also, if we were to remove the limit, it looks like some changes would have to be made in discovery to ensure that we're constantly draining the peer channel - right now, we only read from it for a minute at a time per discovery loop. Additionally, passing -1 feels like an implicit hack to get around the limit (https://github.com/libp2p/go-libp2p-kad-dht/blob/25d299e64f3cb9a3da337e9d4af1a1b722ae503d/routing.go#L530).
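
For illustration, a rough sketch of what draining the channel continuously could look like, assuming FindPeers returns a <-chan peer.AddrInfo as in go-libp2p (handlePeer is a hypothetical stand-in for whatever we do per discovered peer):

```go
import (
	"context"

	"github.com/libp2p/go-libp2p/core/peer"
)

// drainPeers keeps reading discovered peers until the channel closes or
// the context is cancelled, rather than reading for a fixed one-minute
// window per discovery loop.
func drainPeers(ctx context.Context, peers <-chan peer.AddrInfo) {
	for {
		select {
		case p, ok := <-peers:
			if !ok {
				return // discovery finished: all peers consumed
			}
			handlePeer(ctx, p) // hypothetical helper: connect to / track the peer
		case <-ctx.Done():
			return
		}
	}
}
```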

@guillaumemichel (Author)

> Additionally, passing -1 feels like an implicit hack to get around the limit (https://github.com/libp2p/go-libp2p-kad-dht/blob/25d299e64f3cb9a3da337e9d4af1a1b722ae503d/routing.go#L530).

Actually, the parameter should be 0 and not -1 (I edited the PR). So it isn't a hack around the limit: with a count of 0, findAll will be set to true and all discovered peers will be returned.

If this doesn't seem necessary, and you want to optimize to only get 5 peers, then the limit could be set to size. But there is a caveat.

The order in which the peers are returned depends on the provider store implementation. A deterministic implementation would always return the peers in the same order, centralizing the network if the response is truncated to the first 5 (or 100) peers.

@renaynay (Member) commented Jul 3, 2024

@guillaumemichel wouldn't passing 0 as the option just wind up with the same issue here? I thought passing -1 was the hack that allowed bypassing this check.

Good point re the provider store - can you describe the problem a bit further? For context, we use badgerDB as an instance of datastore.Batching, which is used as the dht.providerStore.

@guillaumemichel (Author)

@renaynay you are right - I just opened an issue because I believe that a count of 0 should be allowed.

Basically, the DHT server storing the provider records will query the provider store with the requested key. This returns a list of peers, which is then serialized and sent over the wire to the requester. The requester will then deserialize it and write the first count peers from the list to the peerOut channel (code).
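
A simplified paraphrase of that truncation step (not the actual go-libp2p-kad-dht code; the function name and loop are illustrative):

```go
import "github.com/libp2p/go-libp2p/core/peer"

// forwardProviders mimics, in simplified form, how the DHT client
// forwards deserialized provider records to the caller: only the first
// `count` peers are written to peerOut unless count == 0 (findAll).
func forwardProviders(providers []peer.AddrInfo, peerOut chan<- peer.AddrInfo, count int) {
	findAll := count == 0
	sent := 0
	for _, p := range providers {
		if !findAll && sent >= count {
			return // remaining peers are silently dropped
		}
		peerOut <- p
		sent++
	}
}
```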

So the order defined by the provider store's Get has a direct influence on which peers are returned to the caller of FindPeers. Because all the peers are received by the DHT client anyway, they should all be returned to the caller, so that the caller can randomly sample the peers if only a subset is required, as in the sketch below.
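
A minimal sketch of that sampling, assuming FindPeers returns a <-chan peer.AddrInfo as in go-libp2p (samplePeers is a hypothetical helper; rand.Shuffle is the stdlib function):

```go
import (
	"math/rand"

	"github.com/libp2p/go-libp2p/core/peer"
)

// samplePeers drains the discovery channel, then returns up to n peers
// chosen uniformly at random, so no fixed provider-store ordering can
// bias which peers the caller connects to.
func samplePeers(peers <-chan peer.AddrInfo, n int) []peer.AddrInfo {
	all := make([]peer.AddrInfo, 0)
	for p := range peers {
		all = append(all, p)
	}
	rand.Shuffle(len(all), func(i, j int) { all[i], all[j] = all[j], all[i] })
	if len(all) > n {
		all = all[:n]
	}
	return all
}
```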

@renaynay (Member) commented Jul 3, 2024

Thanks for the context @guillaumemichel

It sounds like this is not really an issue with the datastore we're using, but rather with how we choose peers from those returned by FindPeers, IIUC.

We'll audit the implementation shortly. It would be interesting if we could confirm this with a visualisation of the current network topology.

Anyway, I'll close this PR as it is for now. Thanks again.

@renaynay closed this Jul 4, 2024