bug(share/discovery): amount of peers in discovery shrinks over time #2107
Wondertan added a commit that referenced this issue on May 3, 2023:
Closes #2107

There was a discrepancy between the number of peers in the peer manager's full node pool and the limitedSet inside discovery. The node could get into a state where it had no more full nodes to sample from, because peers inside the limitedSet were blocked on the connection and never handed off to the OnPeersUpdate callback. To fix this, we (@Wondertan, @walldiss and I):

* Set a context deadline for the call to Connect with a peer. Previously, it would never time out, because RoutedHost blocks until the context is canceled, even though there is a peer dial timeout.
* Only add peers to the limitedSet after we have successfully connected. This turns the previous peerLimit into a "soft" limit: in-progress connections can no longer clog spots in the limitedSet, but more peers may end up in the set than the limit allows (we don't throw away already connected peers).
* Removed the timer reset upon a peer disconnecting. Disconnections happen often and were causing a significant delay before the next call to FindPeers.
* Added logs and cleaned up various comments.
* Backoff:
  * Added a RemoveBackoff method to the backoffConnector, to remove cancelled connections from the cache.
  * Added a Backoff method.
* Cleaned up the limited set.
* Added tests for Discovery.
* Handled the case where a discovered peer is already connected.
* Introduced importable Discovery params.
* Revisited constants.

Co-authored-by: Wondertan <hlibwondertan@gmail.com>
Co-authored-by: Vlad <vlad@celestia.org>
Co-authored-by: Vlad <13818348+walldiss@users.noreply.github.com>
distractedm1nd added a commit that referenced this issue on May 22, 2023:
## Problem

#2107

## Overview

Hotfixes the problem by allowing the peer manager to save nodes discovered through shrexSub for use in the full nodes pool. The solution also benefits shrex performance by distributing requests across more peers.

Co-authored-by: Ryan <ryan@celestia.org>
Co-authored-by: Hlib Kanunnikov <hlibwondertan@gmail.com>
vgonkivs pushed a commit to vgonkivs/celestia-node that referenced this issue on May 22, 2023:
(celestiaorg#2105)

## Problem

celestiaorg#2107

## Overview

Hotfixes the problem by allowing the peer manager to save nodes discovered through shrexSub for use in the full nodes pool. The solution also benefits shrex performance by distributing requests across more peers.

Co-authored-by: Ryan <ryan@celestia.org>
Co-authored-by: Hlib Kanunnikov <hlibwondertan@gmail.com>
Problem
It was observed over multiple runs that the number of peer connections established by discovery shrinks over time and can drop to as little as one peer. Currently, shrex uses only peers maintained by discovery for historical syncing, so a low number of peers available from discovery means all blocksync requests hit only a few (or one) peers, putting very high load on them. The root cause needs to be investigated.
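A back-of-the-envelope sketch of how the load concentrates (the numbers and the `usablePeers` helper are illustrative, not from the codebase): as stuck dials occupy slots in a hard-limited set without ever being handed off, the number of usable peers shrinks and each remaining peer absorbs a growing share of the blocksync requests.

```go
package main

import "fmt"

// usablePeers models a hard-limited set in which in-flight (possibly
// hung) dials occupy slots: only the remaining slots hold peers that can
// actually serve requests, and in the worst case a single peer is left.
func usablePeers(limit, stuckDials int) int {
	usable := limit - stuckDials
	if usable < 1 {
		usable = 1
	}
	return usable
}

func main() {
	const limit = 5
	const total = 100 // blocksync requests to distribute
	for stuck := 0; stuck < limit; stuck++ {
		n := usablePeers(limit, stuck)
		fmt.Printf("stuck=%d usable=%d requests/peer=%d\n", stuck, n, total/n)
	}
}
```

With all but one slot stuck, the sole remaining peer takes every request, matching the observed "few (or one) peers" overload.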