Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More detailed error when HTLC could not be sent #634

Merged
merged 7 commits into from
May 14, 2024

Conversation

t-bast
Copy link
Member

@t-bast t-bast commented May 2, 2024

We use more detailed errors when we fail to send an HTLC, since this may happen because the channel is offline, syncing, opening or closing.

It may also happen that we have a channel in the Normal state that has enough funds, but we cannot send the HTLC because it hits a channel limit (too many HTLCs pending, a splice is ongoing, etc). When that happens we will retry ignoring the channel, which leads to the default NoAvailableChannels error: in that case, the "real" error can be found in the failures list of the OutgoingPaymentFailure instance.

We now explicitly map every failure to a potentially user-friendly error (PartFailure), whenever possible. When the error cannot be easily interpreted, we use a PartFailure.Uninterpretable error to wrap the actual error. Applications like Phoenix should define localized error messages for each of these errors and print them on failures.

⚠️ We change the format of LightningOutgoingPayment.Part.Status.Failed, which is stored in the DB. The migration should be handled by:

  • adding a new column error_code and a mapping from PartFailure to an error code
  • for PartFailure.Uninterpretable, the existing details column should be used for the error message
  • old rows should use the error_code of PartFailure.Uninterpretable and just display the stored details message

What do you think @dpad85, is that an acceptable way of handling backwards-compat for previously failed payments?

Fixes #616

We use more detailed errors when we fail to send an HTLC, since this
may happen because the channel is offline, syncing, opening or closing.

It may also happen that we have a channel in the `Normal` state that
has enough funds, but we cannot send the HTLC because it hits a channel
limit (too many HTLCs pending, a splice is ongoing, etc). When that
happens we will retry ignoring the channel, which leads to the default
`NoAvailableChannels` error: in that case, the "real" error can be found
in the `failures` list of the `OutgoingPaymentFailure` instance.

Fixes #616
Some payment errors can be converted into friendly user errors, telling
them whether they should retry or not and what should change in their
payment attempt.

This isn't the case for all errors: some of them cannot be interpreted
easily when they come from remote nodes, but that's ok, we're doing
the best we can when the error can be interpreted.
Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just nits

And use sealed class hierarchy instead.
@dpad85
Copy link
Member

dpad85 commented May 3, 2024

What do you think @dpad85, is that an acceptable way of handling backwards-compat for previously failed payments?

For backward compat, it would work, yes.

However, these changes do not fix #616. When a payment fails, Phoenix does not display the failures for each parts. Payment parts are confusing for users, so we don't show them (except on iOS but we'll have to remove that).

Instead we show the FinalFailure of the overall LightningOutgoingPayment. This is how users can end up with the FinalFailure.NoAvailableChannels("no channels available to send payment") error message.

What would be great is a method that takes the FinalFailure + the list of PartFailure and returns a single, most pertinent type.

t-bast added 2 commits May 3, 2024 16:42
And update the `PaymentsDb` trait to use this type instead of an
`Either<ChannelException, FailureMessage>`.
@t-bast
Copy link
Member Author

t-bast commented May 3, 2024

What would be great is a method that takes the FinalFailure + the list of PartFailure and returns a single, most pertinent type.

I did something in 2932347, I'm not sure we can do better. There are always cases that won't be easy to programmatically analyze and will require the exact details of each payment attempt to figure out what went wrong, but it should work in most cases.

@t-bast
Copy link
Member Author

t-bast commented May 3, 2024

At some point we could simplify all of this greatly by assuming that lightning-kmp is always connected to a single trampoline peer with at most one channel available for payments, but that's a much larger refactoring.

The actual balance isn't useful and makes localization harder.
@dpad85
Copy link
Member

dpad85 commented May 6, 2024

Testing this PR on Phoenix, and trying to pay an already paid invoice a second time, I hoped to get a Part.Status...RecipientRejectedPayment failure.

Instead, the parent payment status is FinalFailure.RetryExhausted. The part child status is Part.Status...RecipientIsOffline (seems to have failed with a UnknownNextPeer error which maps to that failure).

FYI, the explain method in that case returns FinalFailure.RetryExhausted.

So we are still not able to provide a meaningful error message to the user, even with the improvements added in the PR.

dpad85 added a commit to ACINQ/phoenix that referenced this pull request May 6, 2024
Outgoing LN parts have a new failure type, which will
help provide more meaningful error messages. The
payments database interface has also changed to
accomodate these new types.

See ACINQ/lightning-kmp#634
Copy link
Member

@dpad85 dpad85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it's a nice improvement over what we have already.

@t-bast t-bast merged commit 6fd2a4e into master May 14, 2024
2 checks passed
@t-bast t-bast deleted the channel-error-sending-failed branch May 14, 2024 14:57
dpad85 added a commit to ACINQ/phoenix that referenced this pull request May 22, 2024
Outgoing LN parts have a new failure type, which will
help provide more meaningful error messages. The
payments database interface has also changed to
accomodate these new types.

See ACINQ/lightning-kmp#634
dpad85 added a commit to ACINQ/phoenix that referenced this pull request May 22, 2024
Outgoing LN parts have a new failure type, which will
help provide more meaningful error messages. The
payments database interface has also changed to
accomodate these new types.

See ACINQ/lightning-kmp#634
dpad85 added a commit to ACINQ/phoenix that referenced this pull request May 22, 2024
Outgoing LN parts have a new failure type, which will
help provide more meaningful error messages. The
payments database interface has also changed to
accomodate these new types.

See ACINQ/lightning-kmp#634
dpad85 added a commit to ACINQ/phoenix that referenced this pull request May 28, 2024
Outgoing LN parts have a new failure type, which will
help provide more meaningful error messages. The
payments database interface has also changed to
accomodate these new types.

See ACINQ/lightning-kmp#634

---------

Co-authored-by: Robbie Hanson <304604+robbiehanson@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More useful errors when sending fails
3 participants