bug(share/discovery): amount of peers in discovery shrinks over time #2107

walldiss · 2023-04-20T11:32:49Z

Problem

It was observed in multiple runs that the number of peer connections established by discovery shrinks over time and can drop to a single peer. Currently, shrex uses only peers maintained by discovery for historical syncing. With few peers available from discovery, all blocksync requests hit a few peers (or just one), which puts very high load on those peers. The root cause needs to be investigated.

@Wondertan

Closes #2107

There was a discrepancy between the number of peers in the peer manager's full node pool and the number in discovery's limitedSet. The node could end up in a state where it had no full nodes left to sample from, because peers in the limitedSet were blocked on the connection and were never handed off to the OnPeersUpdate callback.

To fix this, we (@Wondertan, @walldiss and I):

* Set a context deadline for the call to Connect with a peer. Previously, it would never time out, because RoutedHost blocks until the context is canceled, even though there is a peer dial timeout.
* Only add peers to the limitedSet after we have successfully connected. This turns the previous peerLimit into a "soft" limit. In-progress connections can no longer clog slots in the limitedSet, but more peers than the limit may end up in the set (we don't throw away already connected peers).
* Removed the timer reset upon a peer disconnecting. Peer disconnections happen often, and each reset significantly delayed the next call to FindPeers.
* Added logs and cleaned up various comments.
* Backoff:
  * Adds a RemoveBackoff method to the backoffConnector to remove canceled connections from the cache.
  * Adds a Backoff method.
* Cleans up the limited set.
* Added tests for Discovery.
* Handles the case where a discovered peer is already connected.
* Introduces importable Discovery params.
* Revisits constants.

Co-authored-by: Wondertan <hlibwondertan@gmail.com>
Co-authored-by: Vlad <vlad@celestia.org>
Co-authored-by: Vlad <13818348+walldiss@users.noreply.github.com>

## Problem

#2107

## Overview

This could hotfix the problem by allowing the peer manager to save nodes discovered through shrexSub and use them in the full nodes pool. The solution would also benefit shrex performance by distributing requests across more peers.

---------

Co-authored-by: Ryan <ryan@celestia.org>
Co-authored-by: Hlib Kanunnikov <hlibwondertan@gmail.com>
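The hotfix idea, feeding peers seen on shrexSub into the same full-node pool that discovery fills, can be sketched roughly as below. This is a hypothetical sketch, not the actual peer manager: `fullNodePool`, `onDiscoveryUpdate`, `onShrexSubMessage` and `pick` are illustrative names, peers are plain strings, and round-robin selection is assumed only to show how a larger pool spreads requests. It is not necessarily how shrex chooses peers.

```go
package main

import "sync"

// fullNodePool is a hypothetical, simplified stand-in for the peer manager's
// pool of full nodes. Peers reported by discovery and peers seen sending
// shrexSub messages both feed the same deduplicated pool.
type fullNodePool struct {
	mu    sync.Mutex
	peers []string
	known map[string]struct{}
	next  int
}

func newFullNodePool() *fullNodePool {
	return &fullNodePool{known: make(map[string]struct{})}
}

// add inserts a peer once, no matter which source reported it.
func (p *fullNodePool) add(peer string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.known[peer]; ok {
		return
	}
	p.known[peer] = struct{}{}
	p.peers = append(p.peers, peer)
}

// onDiscoveryUpdate and onShrexSubMessage are hypothetical hooks for the two
// peer sources; both simply add the peer to the pool.
func (p *fullNodePool) onDiscoveryUpdate(peer string) { p.add(peer) }
func (p *fullNodePool) onShrexSubMessage(from string) { p.add(from) }

// size returns the number of distinct full nodes in the pool.
func (p *fullNodePool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.peers)
}

// pick hands out peers in round-robin order so that requests are spread over
// every known full node rather than concentrated on one.
func (p *fullNodePool) pick() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.peers) == 0 {
		return "", false
	}
	peer := p.peers[p.next%len(p.peers)]
	p.next++
	return peer, true
}
```

With only one discovered peer, every `pick` returns that peer, which is the overload described in the issue. Each extra peer learned from shrexSub takes an equal share of requests.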


walldiss added the bug (Something isn't working) and area:shares (Shares and samples) labels Apr 20, 2023

walldiss self-assigned this Apr 20, 2023

walldiss mentioned this issue Apr 20, 2023

[EPIC] shrex stability + performance #2106

Open

walldiss assigned Wondertan Apr 20, 2023

walldiss mentioned this issue Apr 20, 2023

feat(share/p2p/peer-manager): use shrexSub peers as full nodes #2105

Merged

distractedm1nd mentioned this issue Apr 21, 2023

fix(share/discovery)!: revamp Discovery #2117

Merged

Wondertan closed this as completed in #2117 May 3, 2023
