Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permuteroute #2890

Closed
wants to merge 14 commits into from
Closed

Permuteroute #2890

wants to merge 14 commits into from

Conversation

ZmnSCPxj
Copy link
Collaborator

@ZmnSCPxj ZmnSCPxj commented Aug 1, 2019

This is a new command/routing heuristic, inspired by @renepickhardt JIT-Routing.

It turns out to be an actual technique used in real-time strategy games (whose pathfinding issues actually surprisingly match that of LN: low acceptable latency for UX, dynamically-changing world, incomplete information (at least the really good games will not let your units know the "real" state of the world, only what your fog-of-war knows), large maps).

I will be posting a very lengthy article on lightning-dev mailinglist regarding permuteroute and other heuristics inspired by real-time strategy game techniques.

This is incomplete. Help wanted.

  • Test for permuteroute, especially for route smoothing (i.e. backing out of a node when all alternatives are excluded).
  • Modify pay plugin to use permuteroute and fall back on getroute if failed / too expensive.

@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 1, 2019

Article on permuteroute and other routing techniques on lightning-dev: https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-August/002095.html

@ZmnSCPxj ZmnSCPxj requested a review from cdecker as a code owner August 1, 2019 07:00
tests/test_gossip.py Outdated Show resolved Hide resolved
tests/test_gossip.py Outdated Show resolved Hide resolved
tests/test_gossip.py Outdated Show resolved Hide resolved
@darosior
Copy link
Collaborator

darosior commented Aug 1, 2019

I gave it a look and especially to integrate it in pay, so should we use getroute, use permaroute on error, and then fall back to getroute if permaroute's route is not acceptable ?

@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 2, 2019

Yes, that is approximately what is needed.

Some subtleties:

  1. If we keep repeating permuteroute on the same pivot (erring_index remains same) then the node might have just incoming capacity and we might be better off backing off and using getroute.
  2. In case of a NODE (bit 0x2000 in failure_code) error, we should really add the node to the exclude list. Unfortunately we do not support this yet (and we need a node_is_routable function to determine that a node is in the exclude list; alternately we just exclude every channel the node has). So for initial cut of pay using permuteroute just falling back to getroute for NODE failures would be doable. Also, for the purpose of permuteroute we should back off by one erring_index i.e. use erring_index - 1 for NODE errors; also, a local NODE error (erring_index == 0) means our own node is breaking down and we should probably fail all pay anyway.

@ZmnSCPxj ZmnSCPxj force-pushed the permuteroute branch 3 times, most recently from f17c5f4 to 072e815 Compare August 2, 2019 07:37
@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 2, 2019

Test passing whee!

@ZmnSCPxj ZmnSCPxj force-pushed the permuteroute branch 2 times, most recently from 676fa65 to 9ba5ba5 Compare August 4, 2019 06:59
@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 5, 2019

Re-marking as WIP. I am reconceptualizing the search algorithm, as the current naive depth-first search algorithm is likely to rescan the same nodes over and over when at nodes with many channels.

Sketch:

  • Make the struct node djikstra field a union among different search scratch spaces. union { struct { struct amount_msat total_fee; struct amount_msat total_risk; } djikstra; } s;
  • Define a new struct { struct chan *from_chan; int depth; } permute; in the above union (since we cannot do getroute and permuteroute in parallel anyway).
  • Have a fixed-length FIFO queue of struct node *. When full and we want to add nodes we just drop the nodes (and probably break out of the scanning loop).
  • Put the pivot as the first node in the queue, with depth = 1 for that pivot.
  • While the queue is not empty:
    • Get a node.
    • If it is a goal node, extract the channels to the pivot and exit the loop: we have found a route.
    • If node has depth == PERMUTE_DEPTH + 1 then skip back to the top of the loop.
    • For each channel of the node:
      • If channel is not routable, skip to next channel.
      • Get the other node of the channel.
      • If the other node has depth != 0 skip to the next channel.
      • Set the other node depth to this node depth plus 1, and its from_chan to the channel.
      • Put the node into the queue.
      • If queue is full, exit this loop (continue the outer loop).

@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 6, 2019

Tweaked algorithm seems good.

I need help with someone who can run a mainnet node to compare getroute vs. permuteroute times.

@rustyrussell
Copy link
Contributor

In general, I prefer to adapt new low-level commands and do fancy things in plugins.

route permutation can be done by asking to route from the node before the erroring channel to a node afterwards. That's currently no more efficient than doing a complete new route, since we don't limit our Dijkstra (we normally want to know if there's a possible route). Here's a patch which adds that (untested!) though:

https://gist.github.com/rustyrussell/3c752686991fb73ce256eefabee2ee2a

@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 8, 2019

True, but the currently-exposed getroute only allows one destination. Preferentially we want to have multiple points after the break to connect to, as multiple possible destinations allows for earlier return (and our main point is to reduce the time it takes to get an alternate route). Current getroute also runs from the destination to the source, so I am uncertain how to handle multiple potential destinations, hmm. It probably can be made to work.

We also need to somehow expose smoothen_route. Or if not that, expose a recalcroute, where we recompute the fees of a route. The latter is preferred, because recalcroute is needed to handle UPDATE failure codes i.e. we should not be excluding channels on UPDATE failure codes, we should instead recalcroute and see if the recalculated route has different fees, and fail the attempt (and exclude the channel) if the fee has not changed (meaning the UPDATE did not, in fact, change the costs).

smoothen_route can be done simply by performing the smoothening (i.e. if a node is duplicated in the route, cut from the first point to the duplicate point) and then doing a recalcroute. Smoothening is needed when splicing paths together, because loops are just useless, but could be formed by a naive pathfinding algo.

In any case I am now working on getroutequick, so what is our plan for this now? Modify getroute to make it multi-destination limited-hop and make permuteroute a plugin on top? Or go with this for now and make an issue to move permuteroute to a plugin later?

@ZmnSCPxj
Copy link
Collaborator Author

ZmnSCPxj commented Aug 8, 2019

https://gist.github.com/rustyrussell/3c752686991fb73ce256eefabee2ee2a

As I noted there, this adds at least 4 bytes to each struct node, and if we are not careful about alignment issues on 64-bit may end up adding 8 bytes, for a feature of getroute that only really benefits permuteroute.

Now, the reason to limit the number of hops to scan is really to limit the number of nodes to scan, because O(n log n) on number of nodes scanned. So a different thing we can do is to instead a max_nodes_scanned. This requires a single counter on the stack rather than having an extra counter per node. We can backward-compatibly default this to U32_MAX. We can also report number of nodes scanned back over the RPC interface (we are maintaining the variable anyway during the Dijkstra run, might as well report it back).

Then we can do some test getroute runs to find out how many nodes we typically scan during a typical getroute run, then divide by 10 or something and use that as the max_nodes_scanned for permuteroute.

Then permuteroute can just getroute from the before-break node to the after-break node. If this results in a route that passes through a later node before reaching the after-break node, route smoothening should be able to shorten the path.

@ZmnSCPxj ZmnSCPxj added the state::shelved This issue/PR has been shelved since an alternative/better approach has been proposed. label Sep 12, 2019
@ZmnSCPxj
Copy link
Collaborator Author

Shelved. In favor of #3001 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gossip help wanted pay-plugin routing state::shelved This issue/PR has been shelved since an alternative/better approach has been proposed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants