clboss nodes force close on peers for unknown reasons #121

Closed
heyambob opened this issue May 9, 2022 · 6 comments

@heyambob

heyambob commented May 9, 2022

Not sure if it's a clboss issue or something else, but there are complaints about clboss nodes firing mysterious force closes on peers, and I got force closed by a clboss node myself.

One node operator said a clboss node opened a channel to him and had decent routing traffic, then after about a month the clboss node force closed the channel. I also had a channel with the same clboss node, but I was the one who opened it. Same story: pretty good traffic, then a force close after about a month.

On my clightning log, it reads, "Onchain funding spend".

I thought maybe this was an issue specific to that node, but then today another node operator complained that he opened a channel to a different clboss node, had good traffic, and got force closed within a day.

Could there be some legitimate cause for the force close, like stuck HTLCs, with clboss firing the force close without giving the peer a reason?

@ZmnSCPxj
Owner

ZmnSCPxj commented May 10, 2022

CLBOSS only closes channels if you have configured --clboss-auto-close, in which case it will close a channel if the peer is offline for too long (connected less than 25% of the time over 3 days). "Offline" here means "not connected, even though I look like I am connected to the Internet (by pinging other peers and checking for www.google.com and other well-known large servers)". Offlineness checks are suspended while we ourselves are offline.
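
As a rough illustration of the rule described above, here is a minimal sketch. It is not CLBOSS's actual code: the Sample structure, the function name, and the sampling scheme are hypothetical, and it assumes the threshold means "peer connected in less than 25% of the usable observations".

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One periodic observation (hypothetical structure, not from CLBOSS)."""
    we_are_online: bool   # did our own Internet connectivity check succeed?
    peer_connected: bool  # was the peer connected to us at that moment?


def should_auto_close(samples, auto_close_enabled, min_uptime=0.25):
    """Return True if the peer was connected too rarely over the window.

    Samples taken while we ourselves were offline are ignored, mirroring
    the rule that offlineness checks are suspended if we are offline.
    """
    if not auto_close_enabled:
        return False  # without the option, never close
    usable = [s for s in samples if s.we_are_online]
    if not usable:
        return False  # no evidence either way
    uptime = sum(s.peer_connected for s in usable) / len(usable)
    return uptime < min_uptime
```

The caller would pass whatever observations it collected over the 3-day window; how often to sample is left open here.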

You can verify that closing is only done under --clboss-auto-close by running fgrep -R "close" Boss/Mod. Most instances of the word close exist only in Boss/Mod/PeerComplaintsDesk/, which records complaints about nodes being offline, and Boss/Mod/PeerJudge, which acts on those complaints by doing the actual closing. Boss/Mod/Complainer* are part of this facility --- they are the ones that feed complaints to the complaints desk. Boss/Mod/JitRebalancer.cpp mentions "close" because it looks at the unilateral_close feerate but does not actually do any closing --- you can look for an rpc.command( "close" or rpc->command( "close", which only exists in PeerJudge/Executioner.cpp.
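
If fgrep output is too noisy, the narrower search for the actual RPC call can be sketched in Python. This is only an illustration: the function name is hypothetical, and the assumption that the relevant sources end in .cpp or .hpp is mine, not something stated above.

```python
import re
from pathlib import Path

# Matches rpc.command( "close" and rpc->command( "close" with flexible spacing.
CLOSE_CALL = re.compile(r'rpc(?:\.|->)command\(\s*"close"')


def find_close_calls(root):
    """Return (path, line number) pairs where the close RPC is invoked."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Assumption: only C++ sources and headers are of interest.
        if path.suffix not in {".cpp", ".hpp"} or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CLOSE_CALL.search(line):
                hits.append((str(path), lineno))
    return hits
```

Calling find_close_calls("Boss/Mod") from a CLBOSS checkout would list the call sites to inspect.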

"Onchain funding spend seen" means it was force closed, yes.

Channel closure does not give any reason. Nothing in the wire-level protocol allows for it. The most usual cause is an HTLC timeout, which is still a problem that most devs do not really focus on, and which is compounded by high feerates:

  • A route is built from A -> B -> C -> D -> E -> F.
  • The owner of E accidentally trips on the power cord, shutting down the node, and hits their head on the wall, knocking them unconscious for several hours and unable to put their node back online.
  • D->E HTLC times out, forcing D to force close.
  • High feerates mean that D is unable to get the D->E channel close into a block, preventing it from claiming the D->E HTLC, and also preventing it from offchain-failing the C->D HTLC (because until D can onchain-timelock-claim the D->E HTLC, the D->E HTLC can still be onchain-hashlock-claimed by E if it is only pretending to be asleep).
  • C->D HTLC times out before D->E can be clawed back onchain, forcing C to force close.
  • Feerates are higher now since more transactions are competing for block space (D->E channel close and timeout clawback, C->D channel close and timeout clawback).
  • B->C HTLC times out before D->E and C->D can be clawed back onchain, forcing B to force close.
  • Feerates are even higher now....
  • etc.

High feerates and a random node going down simultaneously can interrupt many payment routings, leading to the above situation playing out over multiple payment routings and thus closing many channels, which further compounds the high feerates.
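
The cascade above can be sketched as a toy model. Everything here is a simplifying assumption for illustration: the function name, a uniform cltv_delta between hops, a node force-closing exactly at HTLC expiry, and a fixed confirm_delay (in blocks) for getting both the close and the timeout claim confirmed. In the scenario above the delay would grow as feerates rise, which only makes the cascade worse.

```python
def simulate_cascade(channels, failed_index, first_expiry, cltv_delta,
                     confirm_delay):
    """Return [(channel, close_height), ...] for channels that force-close.

    channels      -- channel names ordered from payer to payee
    failed_index  -- index of the channel whose receiving node went offline
    first_expiry  -- block height at which that channel's HTLC times out
    cltv_delta    -- extra blocks each upstream HTLC has before it expires
    confirm_delay -- blocks needed to resolve a timed-out HTLC onchain
    """
    closed = []
    downstream_resolved_at = None
    # Walk from the failed channel back toward the payer.
    for steps, idx in enumerate(range(failed_index, -1, -1)):
        expiry = first_expiry + steps * cltv_delta
        if downstream_resolved_at is not None and downstream_resolved_at <= expiry:
            # Downstream HTLC was clawed back in time, so this HTLC (and
            # every one further upstream) can be failed offchain.
            break
        closed.append((channels[idx], expiry))
        downstream_resolved_at = expiry + confirm_delay
    return closed


route = ["A->B", "B->C", "C->D", "D->E", "E->F"]
# Low fees: only D->E closes. Congested chain: the closes walk back to A->B.
print(simulate_cascade(route, 3, 1000, 40, 6))
print(simulate_cascade(route, 3, 1000, 40, 100))
```

The point of the model is the comparison between confirm_delay and cltv_delta: once onchain resolution takes longer than the gap between adjacent HTLC expiries, each upstream channel times out before its downstream one is settled.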

@btweenthebars

Thanks for explaining the chain of force closing, I never understood it.

If the cause is an HTLC timeout, why doesn't the log say "Offered HTLC x SENT_ADD_ACK_REVOCATION cltv y hit deadline"?

@ZmnSCPxj
Owner

It would say so if you are closing for that reason, not if the other side is closing. Again, closing does not give a reason to the remote side, the remote side simply has to handle the onchain case correctly.

Alternatively, it is possible that the other node is CLBOSS and has clboss-auto-close enabled. Some older versions (0.11B or so) can be fairly aggressive about closing channels, though that has been toned down in 0.12.

@ZmnSCPxj
Owner

ZmnSCPxj commented May 13, 2022

Possibly related? ElementsProject/lightning#5240 If the remote is a CLBOSS node it is a C-Lightning node, too. Closes triggered by the remote side do not give you any particular reports on your side, so it would really only be seen by the remote.

@heyambob
Author

> Possibly related? ElementsProject/lightning#5240 If the remote is a CLBOSS node it is a C-Lightning node, too. Closes triggered by the remote side do not give you any particular reports on your side, so it would really only be seen by the remote.

I don't think so: when my node fired the revocation force closes, my CLN peers also received the reason for the force close.

@ZmnSCPxj
Owner

> I don't think so as when my node fired the revocation force closes, my CLN peers also received the reason of force closing too.

Do you mean "revoked transaction close"? The reason for force closing is a suspected attempt to cheat (which may be accidental, e.g. you restored from an obsolete state). This can be detected onchain. But presumably you know if you tried to do this.

There is no protocol message that carries a human-readable reason from the closing node. If there is, show it to me on https://github.com/lightning/bolts

There is an error message which sends a human-readable message. However, it is the receiver of that message which initiates the unilateral close, as per the spec. Maybe this is what is confusing?

clboss does not trigger any error messages. It only closes using the close command, which first tries a mutual close, then after some time simply force-closes without sending any other messages. Force closes are detected onchain, not via the communication channel.
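
The "mutual close first, then force close after some time" behaviour can be sketched generically as below. This is not the C-Lightning implementation: the function and its callable parameters are hypothetical stand-ins for whatever negotiates a mutual close and broadcasts the commitment transaction.

```python
import time


def close_channel(try_mutual_close, broadcast_commitment, timeout_s,
                  poll_s=1.0, now=time.monotonic, sleep=time.sleep):
    """Attempt a cooperative close; fall back to a unilateral one.

    try_mutual_close     -- callable returning True once the peer cooperated
    broadcast_commitment -- callable that publishes our commitment transaction
    timeout_s            -- how long to keep trying the cooperative path
    """
    deadline = now() + timeout_s
    while now() < deadline:
        if try_mutual_close():
            return "mutual"
        sleep(poll_s)
    # No message is sent to the peer here; they only learn of the close
    # by seeing the funding output spent onchain.
    broadcast_commitment()
    return "unilateral"
```

The now and sleep parameters exist only so the timing can be replaced in tests; a real caller would leave them at their defaults.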

Even if CLBOSS wanted to send some kind of error message, it cannot, as there is no C-Lightning API that sends an error to a peer. In addition, sending an error in order to close is arguably bad design --- it forces the peer to do the unilateral close, which means they suffer the CSV locktime while our own node gets the funds onchain ASAP. Closing via this method is arguably unfair, since we are the one who wants to close yet it is our peer who suffers. The error message is intended to be a signal that "our understanding of the protocol spec is too different", not for typical management of channels.
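
The asymmetry can be made concrete with a small sketch. It is deliberately simplified and the names are hypothetical: it assumes the side that broadcasts the commitment transaction waits to_self_delay blocks (the CSV locktime) for its main balance while the other side can spend immediately, and it ignores in-flight HTLCs, anchor-style channels, and each side having a different delay.

```python
def funds_available_at(broadcaster, confirm_height, to_self_delay):
    """Return the height at which each side can spend its main balance.

    broadcaster    -- "us" or "peer": whoever published the commitment tx
    confirm_height -- block height at which that transaction confirmed
    to_self_delay  -- CSV delay, in blocks, imposed on the broadcaster
    """
    if broadcaster not in ("us", "peer"):
        raise ValueError("broadcaster must be 'us' or 'peer'")
    other = "peer" if broadcaster == "us" else "us"
    return {
        broadcaster: confirm_height + to_self_delay,  # waits out the CSV
        other: confirm_height,                        # spendable right away
    }


# If an error message forces the peer to broadcast, the peer waits, not us.
print(funds_available_at("peer", 700000, 144))
```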

Closing, wontfix.
