
Split by fibonacci sequence #3951

Closed
ZmnSCPxj wants to merge 3 commits from the fibonacci-split branch

Conversation

Collaborator

@ZmnSCPxj ZmnSCPxj commented Aug 17, 2020

See: https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-August/002778.html

Closes #3942

I now prefer this over #3942 , as I conjecture that a Fibonacci split would be able to fit into channel capacities better (fewer splits and better utilization of capacity).

Also sneaked in a tightening of the split-based amount fuzzing. Larger fuzzes cause us to deviate from the ideal Fibonacci schedule more.
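
For illustration only (this is not the PR's code): a golden-ratio split, where the larger part is roughly 61.8% of the total, could look like the sketch below. struct amount_msat here is a simplified stand-in for c-lightning's type.

#include <stdint.h>

/* Simplified stand-in for common/amount.h's struct amount_msat. */
struct amount_msat { uint64_t millisatoshis; };

/* Split amt into a larger part *a and a smaller part *b so that a/b is
 * approximately the golden ratio (~1.618).  Repeatedly splitting the
 * larger part this way walks the amounts down an approximately
 * Fibonacci-like schedule. */
static void golden_ratio_split(struct amount_msat amt,
			       struct amount_msat *a,
			       struct amount_msat *b)
{
	/* The larger share is amt / phi, i.e. roughly 61.8% of the total. */
	a->millisatoshis = amt.millisatoshis * 618 / 1000;
	b->millisatoshis = amt.millisatoshis - a->millisatoshis;
}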

Collaborator

@niftynei niftynei left a comment


my general sense from reading over this and the few other PRs related to splitting is that for certain (i.e. large) payments, splitting at the 10k boundary is infeasible because of htlc-count bandwidth limits -- too many parts for too little htlc space. which is to say that we're going to move away from the '80% good' number anyway -- using the fibonacci sequence as a scaling factor for payment splits seems as reasonable as any for this.

one thing that confuses me a bit about this PR is that the fibonacci sequence steps, aside from the first step, aren't close to the 10k 80%-passable limit. if we're using a ladder of values, doesn't that push our payment reliability for the parts away from the 80% metric? what level of reliability can we expect at every step in the split ladder?

i do wonder if there's a more intelligent way to split large payments, i.e. one large piece and then a bunch of smaller ones near the 10k limit? though if the larger payment won't get through, the smaller ones are useless.

at what point is pre-emptive splitting really more harmful than helpful? what are we hoping to gain from doing splits on this ladder schedule?

plugins/libplugin-pay.c (outdated review thread, resolved)
@ZmnSCPxj
Collaborator Author

if we're using a ladder of values, doesn't that push our payment reliability for the parts away from the 80% metric? what level of reliability can we expect at every step in the split ladder?

For the steps on the ladder far above the 10k-sat boundary, the point is not so much reliability as it is privacy. Here I describe "standardized multipart splitting amounts": https://lists.linuxfoundation.org/pipermail/lightning-dev/2020-January/002431.html

If the payment succeeds at higher ladder steps, using standardized split sizes leaks less information.

Another idea is to just remove the presplitter entirely, but then you leak the initial payment size to some nodes.

Though note that a lot of the privacy boost of standardized multipart splitting amounts only really exists once we have path decorrelation.

@niftynei
Collaborator

For example, suppose we define standardized payment amounts 16, 8, 4, 2, 1, 0.5, 0.25, 0.125, 0.0625.
If we want to pay 1.3 units, we could split our payment into 1, 0.25, and 0.0625, leading to a total of 1.3125 units, for a 0.96% overpayment.

This PR doesn't do this though. afaict it picks a common target and splits the payment into equal-sized parts. It'd be nice if it did the adaptive fitting that you outlined in your post.
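
For reference, the adaptive fitting described in the quoted example could be sketched roughly as below. The unit table and the rule of rounding the remainder up to one smallest unit are taken from the example above; none of this is code from the PR.

#include <stdio.h>

int main(void)
{
	/* Standardized amounts from the example above, largest first. */
	const double units[] = { 16, 8, 4, 2, 1, 0.5, 0.25, 0.125, 0.0625 };
	const size_t n = sizeof(units) / sizeof(units[0]);
	double target = 1.3, remaining = target, total = 0;

	/* Greedily take each standardized amount while it still fits. */
	for (size_t i = 0; i < n; i++) {
		while (units[i] <= remaining) {
			remaining -= units[i];
			total += units[i];
			printf("part: %g\n", units[i]);
		}
	}

	/* Cover any leftover by overpaying with one smallest unit. */
	if (remaining > 0) {
		total += units[n - 1];
		printf("part: %g (covers remainder)\n", units[n - 1]);
	}

	/* For 1.3 this prints parts 1, 0.25, 0.0625: total 1.3125,
	 * i.e. the 0.96% overpayment from the example. */
	printf("total %g, overpayment %.2f%%\n",
	       total, (total - target) / target * 100.0);
	return 0;
}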

@ZmnSCPxj
Collaborator Author

Was thinking of implementing that as well. Give me a few days, am working on multifundchannel...

@ZmnSCPxj ZmnSCPxj force-pushed the fibonacci-split branch 5 times, most recently from 53d38e9 to 9e57d4f on August 21, 2020 04:33
@cdecker cdecker self-assigned this Aug 24, 2020
Changelog-Changed: pay: Adaptive splitter now splits by golden ratio, which should cause us to split approximately by a Fibonacci sequence.  This is conjectured to result in fewer splits.
…nd cap the number of presplits.

Changelog-Changed: pay: Cap the number of presplits, and use a Fibonacci schedule for presplits.
The fuzz only exists to prevent intermediate nodes from guessing
the distance to the payee by subtracting the nearest lower bin,
which gives us the fee paid to reach the payee.
But since we have a default limit on the max fee budget of 0.5%
(that some of our users complain is *too big*), fuzzing around the
bin size by +/-1% would be sufficient to obscure the fee paid to
reach the payee.

This reduces the presplitter's +/-25% and the adaptive splitter's
+/-10% fuzz to just +/-5%, which is more than enough to obscure the
fee paid to reach the payee, and could probably be made even smaller.
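
As a rough sketch of what a +/-5% fuzz could look like (the helper below is illustrative, not this commit's code; pseudorand() and struct amount_msat are simplified stand-ins for c-lightning's own):

#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for struct amount_msat and pseudorand(). */
struct amount_msat { uint64_t millisatoshis; };
static uint64_t pseudorand(uint64_t max) { return (uint64_t)rand() % max; }

/* Scale target by a factor drawn uniformly from [0.95, 1.05]. */
static struct amount_msat fuzz_amount(struct amount_msat target)
{
	uint64_t permyriad = 9500 + pseudorand(1001);	/* 9500..10500 */

	/* Divide before multiplying to avoid overflow on huge amounts. */
	target.millisatoshis = target.millisatoshis / 10000 * permyriad
		+ target.millisatoshis % 10000 * permyriad / 10000;
	return target;
}
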
@ZmnSCPxj
Collaborator Author

Rebased to adapt to paymod_log.

Member

@cdecker cdecker left a comment


I think you got carried away with "fibonacci everything", even where there's no good rationale.

Overall I think having a schedule that covers larger amounts is definitely needed, and fibonacci seems at first sight to be just as good as powers of two (they share the same asymptotic behavior btw), so I'm not opposed to that. However, there are a number of things that I don't like in this PR:

  • The choice of fibonacci, or any unequal splitting schedule, is based on the assumption that we can guarantee that the larger of the two splits reaches the channel that'd be the bottleneck before the smaller one. Just queuing them in the correct order does not provide that guarantee even locally where we have that information, and any ordering we might want to impose breaks down pretty quickly. If we want to take advantage of the larger preceding the smaller ones there's little we can do other than sending each HTLC sequentially, and even in that case how do you detect that it is time to start the next one?
  • From what I could gather there are now two splitting schedules that are mixed: a desired amount split and a fibonacci sequence split. Please stick to one, especially since it seems the desired amount split is a special case of the general fibonacci sequence one.

Comment on lines +2901 to +3267
/* Payment `a` is the larger split, thus less likely
 * to succeed, so start `a` first so that it gets into
 * channels earlier. */
Member


Notice that calling payment_start first on a does not guarantee that its sendonion call will actually be first (which is what gives the HTLC its ID and thus the order in which the HTLCs are applied to channels). Furthermore, even if we order them correctly locally, they might get reordered if the path prefix to the failing channel is not identical:

a -> b -> c -> d [large payment]
a -> c -> d [small payment]

In this case, if c -> d is the bottleneck, the smaller payment may obstruct the larger one (notice that this is more related to latency than to path length, so it's hard to discern these locally). Generally speaking we have very little influence over the order in which the HTLCs are applied to remote channels, which account for the vast majority of capacity failures.

Collaborator Author

@ZmnSCPxj ZmnSCPxj Aug 25, 2020


I intend to add a path diversity paymod later which will getroute once for the shortest route, then generate alternate routes from there, by trial-banning some of the nodes/channels on the original route to generate various routes. That paymod will specifically delay later payments until it has generated alternate routes, as long as the payment goes to the same destination (which is significant for payees with multiple routehints).

The effect is that later splits will go through longer paths, and are thus more likely to be delayed in reaching the destination relative to the earlier payments, since the paymod defers later payments to the same destination (it immediately triggers on the earliest payment as soon as it receives the shortest path, then delays subsequent payments to the same destination until it has acquired alternate, and presumably longer, paths).
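
As a very rough sketch of that trial-banning idea (the route representation and names below are hypothetical, not libplugin-pay's actual types; the real paymod would feed the exclusion into getroute):

#include <stddef.h>
#include <stdio.h>

/* Produce the exclusion list for one "trial-ban" re-query: ban the
 * ban_idx-th channel of the known-shortest route, then ask the routing
 * layer for a route again with that channel excluded. */
static size_t trial_ban(const char **best_route_scids, size_t n_hops,
			size_t ban_idx,
			const char **excluded, size_t max_excluded)
{
	if (ban_idx >= n_hops || max_excluded < 1)
		return 0;
	excluded[0] = best_route_scids[ban_idx];
	return 1;
}

int main(void)
{
	/* Hypothetical short channel ids of the shortest route's hops. */
	const char *best[] = { "103x1x0", "105x2x1", "109x1x0" };
	const char *excl[1];

	/* One alternate-route attempt per hop of the shortest route. */
	for (size_t i = 0; i < 3; i++) {
		size_t n = trial_ban(best, 3, i, excl, 1);
		printf("query %zu: exclude %s\n", i, n ? excl[0] : "(none)");
	}
	return 0;
}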

You want path diversity anyway, regardless of using a non-equal or equal split, since dumping all your payments on the same path is a waste. If you presplit into 2 parts and route both parts on the exact same path, like the current paymod system does, you just wasted fees and HTLC capacity.

The rule-of-thumb "earlier started is more likely to get available capacity" does not need to be perfect, just good enough to improve payment success rates.

Or just give up on MPP and use JIT routing, it is not as if we have any decent research on MPP, whereas JIT routing gets you 90% of what you need from MPP, is far simpler, and does not have any of this uncertainty of whether it would work out or not given this or that condition.

Member


I intend to add a path diversity paymod later which will getroute once for the shortest route, then generate alternate routes from there, by trial-banning some of the nodes/channels on the original route to generate various routes. That paymod will specifically delay later payments until it has generated alternate routes, as long as the payment goes to the same destination (which is significant for payees with multiple routehints).

Sounds like you're implementing Yen's k-shortest path algorithm :-)

The effect is that later splits will go through longer paths, and are thus more likely to be delayed in reaching the destination relative to the earlier payments, since the paymod defers later payments to the same destination (it immediately triggers on the earliest payment as soon as it receives the shortest path, then delays subsequent payments to the same destination until it has acquired alternate, and presumably longer, paths).

This is far from a guarantee however, given that we don't select by roundtrip latency, but rather by the costs that our payment incurs. So it can very well be that even though a "longer" path has numerically more hops, it is shorter latency-wise. Using the time to compute longer routes as a delay between parts of the same payment also seems strange, given that there are several orders of magnitude between network latency and the time to compute routes, so we'd be back to artificially introducing delays, adding to the overall time to completion.

The rule-of-thumb "earlier started is more likely to get available capacity" does not need to be perfect, just good enough to improve payment success rates.

I'm not saying it has to be perfect, just wanted to play devil's advocate and highlight this assumption, as it doesn't come with any guarantees.

You want path diversity anyway, regardless of using a non-equal or equal split, since dumping all your payments on the same path is a waste. If you presplit into 2 parts and route both parts on the exact same path, like the current paymod system does, you just wasted fees and HTLC capacity.

I think the path-diversity discussion is independent of the splitter discussion, but I totally agree that having more diversity is a good thing here. I have a patch that issues the getroute calls in sequential order, allowing a call to update the channel_hints before collecting exclusions for the next call, which can help avoid oversubscribing channels and introduce path diversity.

 * 13 seems like a reasonable fibonacci-sequence
 * number to use.
 */
#define REASONABLE_MAX_PRESPLIT 13
Member


Why does this have to be a fibonacci number? I get that aligning part amounts to fibonacci numbers might be sensible, but why the number of splits as well?

Collaborator Author


No reason. If you can give a better one I am all ears.

Collaborator Author

@ZmnSCPxj ZmnSCPxj Aug 25, 2020


You can probably get a better one by asking node operators what their max-concurrent-htlcs are, asking them for their routing logs to check how often HTLCs get failed due to lack of fees, and figuring out the average length paths get, how many channels intermediate nodes tend to have, and so on. The number currently there is derived ex asino anyway.

Comment on lines +2883 to +2890
	if (**comment)
		tal_append_fmt(comment, ", ");

	/* Append a description of the new child. */
	tal_append_fmt(comment, "new partid %"PRIu32" (%s)",
		       c->partid,
		       type_to_string(tmpctx, struct amount_msat,
				      &child_amount));
Member


These two can be combined to have a single allocation:

Suggested change (replace this):

	if (**comment)
		tal_append_fmt(comment, ", ");

	/* Append a description of the new child. */
	tal_append_fmt(comment, "new partid %"PRIu32" (%s)",
		       c->partid,
		       type_to_string(tmpctx, struct amount_msat,
				      &child_amount));

with this:

	/* Append a description of the new child. */
	tal_append_fmt(comment, "%snew partid %"PRIu32" (%s)",
		       **comment != '\0' ? ", " : "",
		       c->partid,
		       type_to_string(tmpctx, struct amount_msat,
				      &child_amount));

#define PRESPLIT_MAX_HTLC_SHARE 3
#define MPP_PRESPLIT_PREFERRED_AMOUNT 9227465L
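/* (Annotation, not part of the diff: 9227465 is a Fibonacci number, and
 * read as msat (~9227 sat) it is the Fibonacci number nearest the previous
 * 10k-sat preferred presplit amount of 10,000,000 msat.) */
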
Member


Why not just keep the 10ksat amount? It's unlikely to make any difference since we randomize on each split, so any precision here is wasted.

Collaborator Author


I do not think the big randomization is valuable, which is why a later commit reduces those randomizations.

Member


I'd argue it is, in particular for the remainder amounts that do not fit nicely into a bucket.

Comment on lines +2893 to +2899
/* Split by equal units of our preferred amount.
 * If dividing the amount by the preferred amount resulted in fewer
 * splits than the HTLC limit, we use this method of splitting, in
 * order to get the theoretical 80% percentile success.
 * Returns a one-line report, allocated off @ctx, of how we split up
 * the payment.
 */
Member


Please choose one way of splitting and do that all the way, no switching between splitting schedules. Any unnecessary complexity will likely come to bite us later on.

Comment on lines +3021 to +3023
	/* Flip a coin, if it comes up heads, stop the presplit. */
	if (pseudorand(2) == 0)
		break;
Member


Why?

Collaborator Author


Same reason we have a similar randomization in shadow routing.

plugins/libplugin-pay.c (review thread, resolved)
@cdecker
Member

cdecker commented Aug 27, 2020

Didn't expect #3986 to be merged so quickly, and come to think of it, it was probably not the right choice to mark this as Closed by #3986. @ZmnSCPxj shall we reopen this to continue looking for optimal splitting strategies?
