Channels shutting down with "bad reestablish revocation_number" #5240
Comments
I've had contact with the admin of one of the other peers. Their node (running c-lightning 6ced555) wasn't restarting at the time of the disconnect either, and doesn't have any other problems. I struggle to understand how this sudden state loss(?) could have come to be. I do think all three channels were fairly active, so it could be some kind of race condition. Messages in their log around the event:
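For context, here is a minimal sketch (mine, not c-lightning source; the function name is made up) of the BOLT #2 rule behind a "bad reestablish revocation_number" failure. On reconnect, each side's channel_reestablish carries next_revocation_number: how many of our commitment revocations the peer claims to have received. It is simplified: the real check for the one-behind case also requires the peer to prove knowledge of our last per-commitment secret.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* 'theirs' is the peer's claimed next_revocation_number; 'ours' is how
 * many revocations we have actually sent.  (Hypothetical helper, not
 * channeld's actual function.) */
static bool reestablish_revnum_ok(uint64_t theirs, uint64_t ours)
{
	if (theirs == ours)
		return true;	/* normal case: peer is in sync */
	if (theirs + 1 == ours)
		return true;	/* peer missed our last revoke_and_ack */
	return false;		/* stale or bogus number: fail the channel */
}

int main(void)
{
	/* A replayed channel_reestablish from a long-dead connection can
	 * carry a very old number, e.g. 41 when we are really at 57: */
	printf("in sync : %s\n", reestablish_revnum_ok(57, 57) ? "ok" : "bad");
	printf("1 behind: %s\n", reestablish_revnum_ok(56, 57) ? "ok" : "bad");
	printf("stale   : %s\n", reestablish_revnum_ok(41, 57) ? "ok" : "bad");
	return 0;
}
```

So a peer presenting a much older number than expected looks, to the other side, exactly like state loss, even when no state was actually lost.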
I've started my node again (had shut it down in a kind of panic 😄) and am seeing no further channel closes.
Looks like #5235 describes a similar issue. I did not make any change to the CLTV setting, though.
First, it's worth pointing out that post channel open, the code path is the same.* In other words, once a channel's state has moved past the opening phase, it behaves identically regardless of how it was opened.

*The exception here is for 'leased' channels, which include an extra update message for the CSV lock that meters the lease timeout; but given that you've experienced the same issue for a non-leased channel, that's a good indication that that's likely not the problem here.

Thanks for the detailed report.
Thanks for the explanation. I agree. It was pretty far-fetched for it to be about the kind of opening, but I couldn't think of anything else.
I wondered about who was blaming whom!
We had multiple reports of channels being unilaterally closed because it seemed like the peer was sending old revocation numbers. Turns out, it was actually old reestablish messages!

When we have a reconnection, we would put the new connection aside, and tell lightningd to close the current connection: when it did, we would restart processing of the initial reconnection.

However, we could end up with *multiple* "reconnecting" connections, while waiting for an existing connection to close. Though the connections were long gone, there could still be messages queued (particularly the channel_reestablish message, which comes early on).

Eventually, a normal reconnection would cause us to process one of these reconnecting connections, and channeld would see the (perhaps very old!) messages, and get confused.

(I have a test which triggers this, but it also hangs the connect command, due to other issues we will fix in the next release...)

Fixes: ElementsProject#5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
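To make the failure mode concrete, here is a toy model of the bug (entirely hypothetical; not connectd's actual code or data structures). While waiting for lightningd to close the current connection, "reconnecting" entries pile up, each still holding its queued channel_reestablish; resuming the oldest entry later hands channeld a long-stale revocation number.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct reconnecting {
	uint64_t queued_revnum;		/* from its queued channel_reestablish */
	struct reconnecting *next;
};

static struct reconnecting *pending;	/* oldest-first list of parked conns */

static void park_connection(uint64_t revnum)
{
	struct reconnecting *r = malloc(sizeof(*r));
	struct reconnecting **p = &pending;

	r->queued_revnum = revnum;
	r->next = NULL;
	while (*p)
		p = &(*p)->next;
	*p = r;				/* append: old entries are never dropped */
}

int main(void)
{
	/* Three reconnects happen while the old connection lingers: */
	park_connection(41);
	park_connection(50);
	park_connection(57);

	/* Buggy behaviour: when the close finally completes, we resume the
	 * HEAD of the list -- the oldest, dead connection -- so channeld
	 * sees revocation number 41 although the channel is at 57. */
	printf("channeld sees revnum %llu (current is 57) -> "
	       "\"bad reestablish revocation_number\"\n",
	       (unsigned long long)pending->queued_revnum);

	while (pending) {
		struct reconnecting *next = pending->next;
		free(pending);
		pending = next;
	}
	return 0;
}
```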
We don't have to put aside a peer which is reconnecting and wait for lightningd to remove the old peer; we can now simply free the old and add the new.

Fixes: ElementsProject#5240
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
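Continuing the toy model above (again hypothetical, just illustrating the shape of the fix as the commit describes it): with no parking list, the old peer object, and any messages still queued on it, is freed the moment a new connection arrives, so nothing stale can be replayed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct peer {
	uint64_t queued_revnum;	/* stands in for the connection's queue */
};

static struct peer *current;	/* at most one live connection per peer */

static void peer_reconnected(uint64_t revnum)
{
	free(current);		/* old connection and its stale queue die here */
	current = malloc(sizeof(*current));
	current->queued_revnum = revnum;
}

int main(void)
{
	peer_reconnected(41);
	peer_reconnected(50);
	peer_reconnected(57);

	/* Only the newest connection's state survives: */
	printf("channeld sees revnum %llu (current is 57) -> ok\n",
	       (unsigned long long)current->queued_revnum);
	free(current);
	return 0;
}
```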
Issue and Steps to Reproduce
In the last day, on a fairly large node, three channels have shut down on me with these messages:
I am sure two of them are with c-lightning nodes, probably all three. Two of them are fairly recent channels (opened last week).
I have never restored from a backup, nor shut down c-lightning uncleanly, for at least the last month. Also, possibly important: the closes did not happen when my daemon started, but while it was running and the channels had been active for at least a few hours.
Using c-lightning master as of 43ace03.
getinfo output:
Plugins and configuration:
Logs before close:
Context information that may be useful:
- funderupdate fixed 100 false. Opening logs:
- FreeBSD 13.0-RELEASE-p2, c-lightning is compiled with clang version 11.0.1, default sqlite database backend