Seednode CLOSE_REQUESTED_BY_PEER + Timeout == 90s delay in startup #3797
Comments
The seednodes should not intentionally disconnect nodes. We should find out why that is the case (as you stated, they are probably overloaded, or maybe it's at the restart process...).

@wiz Could you have a look at this issue?

This is a tricky one. Making a change to this behavior (in order to address this issue) might result in huge testing efforts. I suggest a wont-fix, as we have other countermeasures coming up. @ripcurlx @sqrrm?

Can you add monitoring for the number of connections on each seednode and display it on Grafana?

I expect the number of connections for each seednode to be close to their maximum (50 or something?), simply because we only have 8 seednodes and each client fires up 2 connections -> 200 clients can be connected to the seednodes at any point in time. If another client comes along, the seednode decides which connection to close according to the rules. However, you might be able to grep a …
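For reference, the back-of-the-envelope math in that comment as a runnable sketch; the 50-connection cap is the commenter's guess, not a confirmed configuration value:

```java
// Rough seednode capacity estimate from the comment above.
// All numbers are assumptions, not confirmed Bisq configuration values.
public class SeednodeCapacity {
    public static void main(String[] args) {
        int seednodes = 8;             // current number of seednodes
        int maxConnsPerSeednode = 50;  // guessed per-seednode connection cap
        int connsPerClient = 2;        // each client opens 2 seednode connections

        int totalSlots = seednodes * maxConnsPerSeednode;  // 400 connection slots
        int maxClients = totalSlots / connsPerClient;      // 200 concurrent clients
        System.out.printf("Slots: %d, concurrent clients: %d%n", totalSlots, maxClients);
    }
}
```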
Description
During startup, the Bisq client connects to 2 seednodes to sync the initial data. In the event that one seednode closes the connection with CLOSE_REQUESTED_BY_PEER and the other times out, the client hangs at (3/4) for the full 90s timeout window.
There isn't much that can be done about timeouts, but if a seednode closes the connection intentionally, the client could immediately send another request in its place rather than waiting for the "backup" connection to time out (see the sketch below).
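A minimal sketch of that idea, detached from the actual Bisq networking code: fire the two initial requests in parallel and, when one fails fast (e.g. the peer closed it), immediately start a replacement from a backup list instead of sitting out the other request's 90s timeout. Everything here (requestFrom, the node names) is hypothetical, not a Bisq API:

```java
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical illustration only: requestFrom(node) stands in for the real
// seednode data request; in practice it could fail fast (peer-requested close)
// or slowly (timeout).
public class FastRetrySketch {
    static CompletableFuture<String> requestFrom(String node) {
        return CompletableFuture.supplyAsync(() -> "data from " + node);
    }

    // If a request fails, immediately retry with the next backup seednode
    // instead of leaving the startup sequence blocked on one slow connection.
    static CompletableFuture<String> requestWithReplacement(String node, Deque<String> backups) {
        return requestFrom(node)
                .handle((result, err) -> {
                    if (err == null)
                        return CompletableFuture.completedFuture(result);
                    String next = backups.poll();
                    return next == null
                            ? CompletableFuture.<String>failedFuture(err)
                            : requestWithReplacement(next, backups);
                })
                .thenCompose(f -> f);
    }

    public static void main(String[] args) {
        Deque<String> backups = new ConcurrentLinkedDeque<>(List.of("seed3", "seed4"));
        CompletableFuture<String> a = requestWithReplacement("seed1", backups);
        CompletableFuture<String> b = requestWithReplacement("seed2", backups);
        CompletableFuture.allOf(a, b).join();
        System.out.println(a.join() + " | " + b.join());
    }
}
```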
I'm not sure if we track metrics on seednode intentional disconnects, but this might be a sign of overloaded seednodes.
Version
v1.2.4
Steps to reproduce
I haven't deliberately reproduced this scenario for testing, but in my normal usage of Bisq there have been a few instances where (3/4) hangs for the full 90s before cycling to another seednode and proceeding. The logs indicate the error described above.
Expected behaviour
If a seednode intentionally disconnects a starting client, the client will retry with another valid seednode "immediately".
Actual behaviour
Connections intentionally closed by the seednode are not restarted with another valid seednode.
Screenshots
Device or machine
Ubuntu 19.10
Additional info
I don't have a lot of experience with the networking code, but it looks like RequestDataManager::onDisconnect only removes the handler from the internal map and doesn't trigger any action.

Dec-16 09:24:25.738 - CLOSE_REQUESTED_BY_PEER (Seednode 1)
Dec-16 09:25:40.094 - Timeout (Seednode 2)
Dec-16 09:25:43.691 - Retry
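A rough, self-contained sketch of the direction that observation suggests for the handler; only the onDisconnect name and CLOSE_REQUESTED_BY_PEER come from this issue, and everything else (the enum values, handlerMap, retryWithAnotherSeednode) is a hypothetical stand-in for whatever RequestDataManager actually uses internally:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only, NOT the real Bisq classes.
public class RequestDataManagerSketch {
    enum CloseConnectionReason { CLOSE_REQUESTED_BY_PEER, SOCKET_TIMEOUT, OTHER }

    private final Map<String, Object> handlerMap = new ConcurrentHashMap<>();

    public void onDisconnect(CloseConnectionReason reason, String peerAddress) {
        handlerMap.remove(peerAddress);  // roughly what happens today: cleanup only

        if (reason == CloseConnectionReason.CLOSE_REQUESTED_BY_PEER) {
            // The seednode dropped us deliberately; waiting out the surviving
            // request's 90s timeout gains nothing, so retry right away.
            retryWithAnotherSeednode(peerAddress);
        }
    }

    private void retryWithAnotherSeednode(String failedPeer) {
        System.out.println("would immediately request data from a seednode other than " + failedPeer);
    }
}
```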