New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PostRestartHtlcCleaner handle channel closing #1338
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1338 +/- ##
==========================================
- Coverage 86.42% 86.39% -0.04%
==========================================
Files 119 119
Lines 9261 9306 +45
Branches 390 387 -3
==========================================
+ Hits 8004 8040 +36
- Misses 1257 1266 +9
|
1520ca3
to
5936b1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fail to see how 8815b91 is related to dust HTLCs. Overridden/Timed out yes, but dust?
BTW I think dust HTLCs should be failed quickly (as soon as the commitment tx confirms), but I don't think we do it?
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/channel/ChannelExceptions.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
@@ -149,10 +149,10 @@ class Relayer(nodeParams: NodeParams, router: ActorRef, register: ActorRef, comm | |||
|
|||
case Status.Failure(addFailed: AddHtlcFailed) => | |||
addFailed.origin match { | |||
case Origin.Local(id, None) => log.error(s"received unexpected add failed with no sender (paymentId=$id)") | |||
case Origin.Local(id, None) => postRestartCleaner forward addFailed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful, makes so much sense to have all those cases properly handled!
case Origin.Local(id, None) => postRestartCleaner forward addFailed | |
case Origin.Local(_, None) => postRestartCleaner forward addFailed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, I've always felt uneasy with those unhandled cases, I knew I was missing something but I didn't know what...Now I know, and I'll sleep better xD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're back and still not handled now, but this is because they really should never appear now that AddHtlcFailed
is used properly...still feels dangerous though, I don't know how I could improve that.
8815b91
to
813ad9c
Compare
813ad9c
to
4aee086
Compare
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/transactions/Scripts.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
When a channel is closed we want to remove its HTLCs from our list of pending broken HTLCs (they are being resolved on-chain). We should also ignore outgoing HTLCs that have already been settled upstream (which can happen when downstream is closing).
When a downstream channel is closing, we can safely fail upstream the HTLCs that were either timed out on-chain or not included in the broadcast commit transaction. Channels will not always raise events about those after a reboot, so we need to inspect the channel state and detect such HTLCs.
To extract the payment_hash or preimage from an HTLC script seen on-chain.
With MPP, it's possible that a channel contains multiple HTLCs for the same payment hash, and potentially even for the same expiry and amount. We add more fine-grained handling of HTLC timeouts that share the same payment hash. This allows a cleaner handling after a restart, and makes sure we correctly detect failure that should be propagated upstream. Otherwise we wouldn't be losing any money, but some channels may be closed that we can avoid.
6576390
to
0791853
Compare
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Show resolved
Hide resolved
eclair-core/src/main/scala/fr/acinq/eclair/payment/relay/PostRestartHtlcCleaner.scala
Outdated
Show resolved
Hide resolved
A couple refactorings to avoid duplication and some clean-up.
It may happen that a commit tx and some htlc-timeout txs end up in the same block. In that case, there is no guarantee on the order we'll receive the confirmation events. If any tx in a local/remoteCommitPublished is confirmed, that implicitly means that the commit tx is confirmed (because it spends from it). So we can consider the closing type known and forward the failure upstream.
79f8d37 fixes the transient integration tests failure. These failures happened because we may receive There are many ways we can fix that:
I chose solution 1 as I think it's the one that makes more sense. Maybe we should rename the |
eclair-core/src/main/scala/fr/acinq/eclair/channel/ChannelTypes.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was because doing what I did for revoked created a regression on the |
There are a few edges cases where we currently don't reconcile correctly after a restart (which may lead to channels being closed, which is inconvenient). It's recommended to review commit by commit.
There are dangling elements in the map that can be ignored: HTLCs that still appear downstream in a closing channel, but were correctly resolved upstream (adde74c).
A more complex scenario to handle is when the downstream channel is closing (because our peer didn't send a revocation in time, an HTLC timed out or some other failure) and we missed the notification from the channel because of a reboot. In some cases the channel will not re-emit an event, so we need to look at the channel state to correctly fail upstream (d90500a).
There was also some clean-up that could be done on the scripts helpers and htlc-timeout txs post-MPP (2245197 and 4aee086).