Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossip Store Optimization III: Now i can spell Dijkstra #2546

Merged

Conversation

Projects
None yet
2 participants
@rustyrussell
Copy link
Contributor

commented Apr 9, 2019

Based on #2545

This cuts down the size of 'struct node' since Dijkstra needs 1/20 of the room that Bellman Ford Gibson did, but with a bit of algorithmic violence we can make it do what we want.


There was a bug in the test.  We go from about 60 seconds to 12 though, not bad.

@rustyrussell rustyrussell requested a review from cdecker as a code owner Apr 9, 2019

@rustyrussell rustyrussell added this to the 0.7.1 milestone Apr 9, 2019

@rustyrussell rustyrussell force-pushed the rustyrussell:now-i-can-spell-dijkstra branch 3 times, most recently from 569ded5 to 3c5a4ac Apr 11, 2019

rustyrussell added some commits Apr 17, 2019

tools/bench-gossipd.sh: fix routing.
Channels have a htlc_minimum_msat of 10000, which is why we didn't
find routes.

This makes a significant speed drop:

-routing_sec:26.940000-27.990000(27.616+/-0.39)
+routing_sec:60.70

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd/test: add test for handling overlong routes.
This is a weakness with Dijkstra, so write an explicit unit test that
we can find a short enough (but more expensive) route.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: extract common functionality.
This will be needed by Dijkstra as well.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: speed Dijkstra a little.
Our uintmap can be a little slow with all the reallocation, so leave
NULL entries and walk to find the first one.  Since we don't clean
them up, keep a cache of where the min non-all-NULL value is in the
heap.

It's clearer benefit on really large tests, so here's 1M nodes:

Comparison using gossipd/test/run-bench-find_route 1000000 10:

Before:
	10 (10 succeeded) routes in 1000000 nodes in 91995 msec (9199532898 nanoseconds per route)

After:
	10 (10 succeeded) routes in 1000000 nodes in 20605 msec (2060539287 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: implement Dijkstra.
Use a uintmap as our minheap.

Note that Dijkstra can give overlength routes, so some checks are disabled.

Comparison using gossipd/test/run-bench-find_route 100000 10:

Before:
	10 (10 succeeded) routes in 100000 nodes in 120087 msec (12008708402 nanoseconds per route)
After:
	10 (10 succeeded) routes in 100000 nodes in 2269 msec (226925462 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: temporarily disable fuzz in routing.
This allows precise comparison between Dijkstra and Bellman-Ford without
worrying about fuzz.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd/routing: remove BFG implementation.
Now we can benchmark, and remove 500 bytes per node.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35093-37907(36146+/-1.1e+03)
	vsz_kb:555168
	store_rewrite_sec:12.120000-13.750000(12.7+/-0.6)
	listnodes_sec:1.270000-1.370000(1.322+/-0.039)
	listchannels_sec:29.770000-31.600000(30.82+/-0.64)
	routing_sec:0.00
	peer_write_all_sec:63.630000-67.850000(65.432+/-1.7)

MCP notable changes from pre-Dijkstra (>1 stddev):
	-vsz_kb:577456
	+vsz_kb:555168
	-routing_sec:60.70
	+routing_sec:12.04

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: re-add fuzz logic to routing.
Do it inside the can_reach() function, which is less optimal for BFG
which does 20 ops on the same channel, but fine for Dijkstra.

This does have a measurable cost, so we might want to use
non-cryptographic fuzz in future:

$ gossipd/test/run-bench-find_route 100000 100:

Before:
	100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)

After:
	100 (100 succeeded) routes in 100000 nodes in 113381 msec (1133813412 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd/routing: Iterate on Dijkstra when route is too long.
If a route is too long, we try to bias Dijkstra towards choosing a
shorter route by adding a per-hop cost.  We do a naive "shortest path"
pass, then using that cost as a ceiling on per-hop cost, we do a
binary search.

There are some subtleties: we use risk rather than total as our
counter field (we normally bias this by 1 anyway, so it's easy to make
that a variable), and we set riskfactor to a mimimal value once we're
iterating.  It's good enough to get a solution, we don't need to do a
2-dimensional search on riskfactor and riskbias.

Of course, this is extremely slow if we hit it on our benchmark,
though it doesn't happen in a more realistic network:

$ gossipd/test/run-bench-find_route 100000 100:

Before:
	100 (79 succeeded) routes in 100000 nodes in 25341 msec (253412314 nanoseconds per route)

After:
	100 (100 succeeded) routes in 100000 nodes in 97346 msec (973461784 nanoseconds per route)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

@rustyrussell rustyrussell force-pushed the rustyrussell:now-i-can-spell-dijkstra branch from 3c5a4ac to 73cf67d Apr 17, 2019

@rustyrussell

This comment has been minimized.

Copy link
Contributor Author

commented Apr 17, 2019

Rebased with benchmark fix prepended

@cdecker
Copy link
Member

left a comment

ACK 73cf67d

@cdecker

This comment has been minimized.

Copy link
Member

commented Apr 17, 2019

Excellent PR, had a lot of fun reading through and refreshing my memory about Dijkstra as well 😉

@rustyrussell rustyrussell merged commit 0fc4241 into ElementsProject:master Apr 18, 2019

2 checks passed

ackbot PR ack'd by cdecker
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rustyrussell rustyrussell added the MCP label May 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.