Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossip store optimizations II: the gossipd parts #2545

Merged

Conversation

Projects
None yet
2 participants
@rustyrussell
Copy link
Contributor

commented Apr 9, 2019

Based on #2541

At the end of this, we leave all the gossip on disk (except as needed for routing). There are more optimizations available, but vitally this cuts gossipd mem usage from 2.6G to 577M for the million channels project.

Three other statistically significant changes:

  1. Store rewrite goes from 36 to 13 seconds.
  2. listnodes goes from 0.6 to 1.3 seconds.
  3. Sending all gossip to peers increases from 50 to 67 seconds.

Across this entire PR:

-store_load_msec:32825-36365(34615.6+/-1.1e+03)
+store_load_msec:36034-37853(37109.8+/-5.9e+02)
-vsz_kb:2637488
+vsz_kb:577456
-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
+store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
+listnodes_sec:1.250000-1.480000(1.364+/-0.09)
-listchannels_sec:28.650000-31.080000(29.966+/-0.81)
+listchannels_sec:30.820000-31.480000(31.068+/-0.24)
-routing_sec:29.980000-33.100000(31.284+/-1.5)
+routing_sec:26.940000-27.990000(27.616+/-0.39)
-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
+peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)

rustyrussell added some commits Apr 9, 2019

gossipd: use u32 for timestamp.
We used an s64 so we could use -1 and save a check, but that's just
silly as we have adjacent non-u64 fields: wastes 7 bytes per node
and 16 per channel.

Interestingly, this seemed to make us a little slower for some reason.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35569-38776(37169.8+/-1.2e+03)
	vsz_kb:2621808
	store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
	listnodes_sec:0.740000-0.800000(0.768+/-0.023)
	listchannels_sec:29.820000-32.730000(30.972+/-0.99)
	routing_sec:30.110000-30.590000(30.346+/-0.18)
	peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:32825-36365(34615.6+/-1.1e+03)
	+store_load_msec:35569-38776(37169.8+/-1.2e+03)
	-vsz_kb:2637488
	+vsz_kb:2621808
	-store_rewrite_sec:35.150000-36.200000(35.59+/-0.4)
	+store_rewrite_sec:35.870000-40.290000(38.14+/-1.6)
	-listnodes_sec:0.590000-0.710000(0.682+/-0.046)
	+listnodes_sec:0.740000-0.800000(0.768+/-0.023)
	-peer_write_all_sec:49.020000-52.890000(50.376+/-1.5)
	+peer_write_all_sec:52.420000-59.160000(54.692+/-2.5)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossip_store: avoid gratuitous copy on load.
Doesn't make measurable difference, but an obvious optimization.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: embed broadcast information into each structure.
This is more compact, but also required once we replace the arbitrary
"index" with an actual offset into the gossip store.  That will let us
remove the in-memory variants entirely.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35685-38538(37090.4+/-9.1e+02)
	vsz_kb:2288768
	store_rewrite_sec:35.530000-41.230000(37.904+/-2.3)
	listnodes_sec:0.720000-0.810000(0.762+/-0.041)
	listchannels_sec:30.750000-35.990000(32.704+/-2)
	routing_sec:29.570000-34.010000(31.374+/-1.8)
	peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:2621808
	+vsz_kb:2288768

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: make routing_add_channel_update discard old timestamps.
This is currently done higher up, in handle_channel_update(), but
that's one reason why handle_channel_update() has to do a channel
lookup.  Moving the check down means handle_channel_update() can do a
minimal "get node id for this channel" so it can check the signature.

This helps, because the chan lookup semantics are changing in the next
few patches.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: handle duplicate nodes from unverified channel_announces pro…
…perly.

If we have a channel_announcement, we catch any node_announcement for
either end while we validate the channel_announcement.  But if we have
multiple channel_announcements and the first one failed to verify, it
would remove this catch, meaning we'd discard following node_announcements
even though there was a pending channel_announcement.

The answer is to use a simple reference count, and as a further
optimization, only place the `pending_node_announce` if there's no
node already.

We also move the process_pending_node_announcement() calls lower down,
so *any* new channel creation checks it.  This is more robust, and
will prove useful for the next patch, where we can use the same
mechanism to handle node_announcements on channel_announcements which
are verified, but don't yet have a channel_update.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: make gossip loading stats accurate.
They didn't count the header sizes when reporting bytes, which is
misleading.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: gossip_store_get() to load a single store entry.
This will allow us to load on demand, and not keep all messages in
memory.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: use file offset within store as broadcast index.
Instead of an arbitrary counter, we can use the file offset for our
partial ordering, removing a field.  It takes some care when we compact
the store, however, as this field changes.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:34271-35283(34789.6+/-3.3e+02)
	vsz_kb:2288784
	store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
	listnodes_sec:0.750000-0.850000(0.794+/-0.042)
	listchannels_sec:30.740000-31.760000(31.096+/-0.35)
	routing_sec:29.600000-33.560000(30.472+/-1.5)
	peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:35685-38538(37090.4+/-9.1e+02)
	+store_load_msec:34271-35283(34789.6+/-3.3e+02)
	-vsz_kb:2288768
	+vsz_kb:2288784
	-peer_write_all_sec:51.140000-58.350000(55.69+/-2.4)
	+peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
broadcast: don't keep payload pointer.
If we need the payload, pull it from the gossip store.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:30189-52561(39416.4+/-8.8e+03)
	vsz_kb:1812904
	store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
	listnodes_sec:1.120000-1.230000(1.176+/-0.044)
	listchannels_sec:38.900000-50.580000(44.716+/-3.9)
	routing_sec:45.080000-48.160000(46.814+/-1.1)
	peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:2288784
	+vsz_kb:1812904
	-store_rewrite_sec:38.060000-39.130000(38.426+/-0.39)
	+store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
	-listnodes_sec:0.750000-0.850000(0.794+/-0.042)
	+listnodes_sec:1.120000-1.230000(1.176+/-0.044)
	-listchannels_sec:30.740000-31.760000(31.096+/-0.35)
	+listchannels_sec:38.900000-50.580000(44.716+/-3.9)
	-routing_sec:29.600000-33.560000(30.472+/-1.5)
	+routing_sec:45.080000-48.160000(46.814+/-1.1)
	-peer_write_all_sec:49.220000-52.690000(50.892+/-1.3)
	+peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

@rustyrussell rustyrussell force-pushed the rustyrussell:gossip-store-optimizations branch from ee6c4da to f2edb66 Apr 10, 2019

@rustyrussell

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

Fixed some bugs found by travis, and rebased since merge of prereq.

@rustyrussell rustyrussell self-assigned this Apr 10, 2019

rustyrussell added some commits Apr 11, 2019

gossipd: don't create struct chan for yet-to-be-updated channels.
We currently create a struct chan when we receive a `channel_announcement`,
but we can only broadcast once we have a `channel_update` (since that
provides the timestamp).

This means a `struct chan` can be in a weird state where it exists,
but is unusable (can't use without an update), and also means we need to
keep the channel_announcement message around until an update arrives, so
we can put it in the gossip_store.

Instead, keep track of these "unupdated" channels separately, and check
for them in all the places we search for a specific channel to update.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:30640-33236(32202+/-8.7e+02)
	vsz_kb:1812956
	store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
	listnodes_sec:0.590000-0.660000(0.62+/-0.033)
	listchannels_sec:28.140000-29.560000(28.816+/-0.56)
	routing_sec:29.530000-32.590000(30.352+/-1.1)
	peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1812904
	+vsz_kb:1812956
	-store_rewrite_sec:21.390000-27.070000(23.596+/-2.4)
	+store_rewrite_sec:13.410000-16.970000(14.438+/-1.3)
	-listnodes_sec:1.120000-1.230000(1.176+/-0.044)
	+listnodes_sec:0.590000-0.660000(0.62+/-0.033)
	-listchannels_sec:38.900000-50.580000(44.716+/-3.9)
	+listchannels_sec:28.140000-29.560000(28.816+/-0.56)
	-routing_sec:45.080000-48.160000(46.814+/-1.1)
	+routing_sec:29.530000-32.590000(30.352+/-1.1)
	-peer_write_all_sec:58.780000-87.150000(72.278+/-9.7)
	+peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: don't use cached node_announcement for redundancy checking
Re-parse the existing message, since we'e going to get rid of those
fields.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: remove some fields from struct chan.
The txout_script field is unused; the local_disable only applies to
the handful of local channels, so move that into a hash table.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:39207-45089(41374.6+/-2.2e+03)
	vsz_kb:1202316
	store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
	listnodes_sec:1.290000-3.790000(1.938+/-0.93)
	listchannels_sec:30.190000-32.120000(31.31+/-0.69)
	routing_sec:28.220000-31.340000(29.314+/-1.2)
	peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:35107-37944(36686+/-1e+03)
	+store_load_msec:39207-45089(41374.6+/-2.2e+03)
	-vsz_kb:1218036
	+vsz_kb:1202316
	-listchannels_sec:28.510000-30.270000(29.6+/-0.6)
	+listchannels_sec:30.190000-32.120000(31.31+/-0.69)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: remove info fields from struct node.
Reload them from disk if they do listnodes.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35390-38659(37336.4+/-1.3e+03)
	vsz_kb:1780516
	store_rewrite_sec:13.800000-16.800000(15.02+/-0.98)
	listnodes_sec:1.280000-1.530000(1.382+/-0.096)
	listchannels_sec:28.700000-30.440000(29.34+/-0.68)
	routing_sec:30.120000-31.080000(30.526+/-0.35)
	peer_write_all_sec:65.910000-76.850000(69.462+/-4.1)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1792996
	+vsz_kb:1780516
	-listnodes_sec:1.030000-1.120000(1.068+/-0.032)
	+listnodes_sec:1.280000-1.530000(1.382+/-0.096)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: don't keep channel_announcement messages in memory.
MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:35107-37944(36686+/-1e+03)
	vsz_kb:1218036
	store_rewrite_sec:14.060000-17.970000(15.966+/-1.6)
	listnodes_sec:1.270000-1.350000(1.314+/-0.034)
	listchannels_sec:28.510000-30.270000(29.6+/-0.6)
	routing_sec:30.230000-31.510000(30.83+/-0.44)
	peer_write_all_sec:67.390000-70.710000(68.568+/-1.2)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1780516
	+vsz_kb:1218036

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: don't keep node_announcement messages in memory.
MCP results from 5 runs, min-max(mean +/- stddev):
store_load_msec:34779-38628(36903.4+/-1.4e+03)
vsz_kb:1792996
store_rewrite_sec:14.440000-15.040000(14.672+/-0.24)
listnodes_sec:1.030000-1.120000(1.068+/-0.032)
listchannels_sec:27.860000-32.850000(30.05+/-1.7)
routing_sec:30.020000-31.700000(31.044+/-0.56)
peer_write_all_sec:65.100000-70.600000(68.422+/-2)

-vsz_kb:1780516
+vsz_kb:1792996
-listnodes_sec:1.280000-1.530000(1.382+/-0.096)
+listnodes_sec:1.030000-1.120000(1.068+/-0.032)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:30640-33236(32202+/-8.7e+02)
	+store_load_msec:34779-38628(36903.4+/-1.4e+03)
	-vsz_kb:1812956
	+vsz_kb:1792996
	-listnodes_sec:0.590000-0.660000(0.62+/-0.033)
	+listnodes_sec:1.030000-1.120000(1.068+/-0.032)
	-peer_write_all_sec:60.380000-61.320000(60.836+/-0.37)
	+peer_write_all_sec:65.100000-70.600000(68.422+/-2)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: allow reading from the store during load.
When we no longer keep channel_updates in memory, there's a path where
we access them on load: when we promote a local channel to an
announced channel.

This breaks at the moment, since gs->fd == -1; change it to a writable
flag instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: unify is_chan_public / is_chan_announced.
We used to have a `struct chan` while we're waiting for an update; now we
keep that internally.  So a `struct chan` without a channel_announcement
in the store is private, and other is public.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: use pread in the store.
The next patch causes us to access the store while loading (we read
channel_updates for local peers), which messes up loading due to the
lseek involved.

Using pread() is atomic with seek & read, and also a bit more
efficient.  Make the header contiguous too, while we're here.

We don't need pwrite: we always open with O_APPEND which means the
seek-to-end is implicit.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:36771-38289(37529.6+/-5.3e+02)
	vsz_kb:1202316
	store_rewrite_sec:12.460000-13.280000(12.784+/-0.29)
	listnodes_sec:1.240000-1.410000(1.34+/-0.058)
	listchannels_sec:29.850000-31.840000(30.908+/-0.69)
	routing_sec:27.800000-31.790000(28.822+/-1.5)
	peer_write_all_sec:66.200000-68.720000(67.44+/-0.84)

MCP notable changes from previous patch (>1 stddev):
	-store_load_msec:39207-45089(41374.6+/-2.2e+03)
	+store_load_msec:36771-38289(37529.6+/-5.3e+02)
	-store_rewrite_sec:15.090000-16.790000(15.654+/-0.63)
	+store_rewrite_sec:12.460000-13.280000(12.784+/-0.29)
	-peer_write_all_sec:66.830000-76.850000(71.976+/-3.6)
	+peer_write_all_sec:66.200000-68.720000(67.44+/-0.84)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: free tmpctx children in gossip_store_load loop.
We're accumulating children, and we'll get more in the successive
patches.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: don't keep channel_updates in memory.
This requires some trickiness when we want to re-add unannounced channels
to the store after compaction, so we extract a common "copy_message" to
transfer from old store to new.

MCP results from 5 runs, min-max(mean +/- stddev):
	store_load_msec:36034-37853(37109.8+/-5.9e+02)
	vsz_kb:577456
	store_rewrite_sec:12.490000-13.250000(12.862+/-0.27)
	listnodes_sec:1.250000-1.480000(1.364+/-0.09)
	listchannels_sec:30.820000-31.480000(31.068+/-0.24)
	routing_sec:26.940000-27.990000(27.616+/-0.39)
	peer_write_all_sec:65.690000-68.600000(66.698+/-0.99)

MCP notable changes from previous patch (>1 stddev):
	-vsz_kb:1202316
	+vsz_kb:577456

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
gossipd: adjust peers' broadcast_offset when compacting store.
When we compact the store, we need to adjust the broadast index for
peers so they know where they're up to.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

@rustyrussell rustyrussell force-pushed the rustyrussell:gossip-store-optimizations branch from f2edb66 to 169207a Apr 11, 2019

@niftynei

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2019

ACK 169207a

@niftynei niftynei merged commit 261921d into ElementsProject:master Apr 12, 2019

2 checks passed

ackbot PR ack'd by niftynei
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rustyrussell rustyrussell added the MCP label May 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.