xud will not reconnect after connection is lost #1699
Comments
I think I hit this issue once on the testnet env, but without the 30 minutes offline, and when I tried to connect manually I got a connection refused error. Restarting the env fixed the problem for me and for @erkarl.
It seems I can easily reproduce it by switching my VPN on and off.
I noticed that I didn't have any peers on testnet today. Restarting
And at times failing to reconnect multiple times in a row because peer says we are already connected:
At least the latter should be fixable by checking connected peers (which can be inbound) before firing a reconnect. It looks like we don't do that and keep trying to reconnect even though the peer has already connected to us. What is your take, @LePremierHomme?
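A minimal sketch of the check proposed above, assuming a pool that tracks connected peers by node public key regardless of direction. The names `PeerPool`, `PeerConnection`, and `shouldReconnect` are hypothetical and are not taken from xud's code base:

```typescript
// Hypothetical types for illustration only; not xud's actual API.
type Direction = 'inbound' | 'outbound';

interface PeerConnection {
  nodePubKey: string;
  direction: Direction;
}

class PeerPool {
  // Connected peers keyed by node public key, inbound and outbound alike.
  private connected = new Map<string, PeerConnection>();

  public addConnection(conn: PeerConnection): void {
    this.connected.set(conn.nodePubKey, conn);
  }

  public removeConnection(nodePubKey: string): void {
    this.connected.delete(nodePubKey);
  }

  // A reconnect attempt is only worthwhile if the peer is not already
  // connected to us in either direction.
  public shouldReconnect(nodePubKey: string): boolean {
    return !this.connected.has(nodePubKey);
  }
}
```

With this shape, a reconnect timer would call `shouldReconnect` right before dialing and stop retrying once the peer has connected to us inbound.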
@michael1011 observed a similar scenario on the boltz xud testnet node:
@kilrau from your first log paragraph I don’t see that the peer is rejecting the reconnection (perhaps because the log is partial), and in the second one I don’t see what precedes the rejection. We always check for connected peers before trying to open a new outbound connection. This makes me think that maybe the peer’s view is that we were still connected, which I’m not sure is possible (because we don’t disconnect upon receiving @kilrau @raladev Can you reproduce this and get the full logs (
@michael1011 can you still dig up these logs? #1699 (comment)
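To illustrate the hypothesis that one side may still consider the other connected, here is a rough sketch of how a node could spot entries for peers that have gone silent. This assumes each peer entry records when the last packet was received. `PeerEntry`, `lastSeenMs`, and `findStalePeers` are hypothetical names, and nothing here is confirmed to match xud's behavior:

```typescript
// Hypothetical sketch; field and function names are assumptions.
interface PeerEntry {
  nodePubKey: string;
  // Timestamp (ms) of the last packet received from this peer.
  lastSeenMs: number;
}

// Returns the public keys of peers that have been silent for longer than
// timeoutMs and are therefore candidates for being dropped as stale.
function findStalePeers(peers: PeerEntry[], nowMs: number, timeoutMs: number): string[] {
  return peers
    .filter((peer) => nowMs - peer.lastSeenMs > timeoutMs)
    .map((peer) => peer.nodePubKey);
}
```

If the remote node dropped such stale entries, a later reconnect attempt from our side would presumably no longer be rejected as "already connected".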
Description: Steps:
Logs: Note: UPDATE after 1 hour: Update after 45 min of offline
Just ran a test on a machine running testnet and mainnet arby, unplugging the network cable at 9:44 and re-plugging it at 10:25:
testnet_xud_1:
*UNPLUG
mainnet_xud_1:
*UNPLUG
Observations: xud reconnected to all peers just fine and arby started issuing orders again. Since we don't know how exactly to reproduce this until someone (cc @raladev) found a way to reliably reproduce this.
The mainnet xud-docker environment did not reconnect to existing peers for 6 hours after the host machine lost connectivity and came back online 30 minutes later.
I'm gathering logs and findings from future investigation into this issue.