Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autoclean gains SUPER POWERS (and other optimizations) #5594

Merged
merged 30 commits into from
Sep 22, 2022

Conversation

rustyrussell
Copy link
Contributor

autoclean is generalized to delete old listforwards/listpays/listinvoices (not just expired invoices). And much optimization is done, so that it takes only 3-6 minutes to delete 1M entries.

This should help with larger nodes, in particular.

There are some new APIs in here! listhtlcs and delforward in particular...

@rustyrussell
Copy link
Contributor Author

Rebased....

@rustyrussell
Copy link
Contributor Author

Fixed up rust generated stuff, and also updated delpay docs (and added missing request schema!).

Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts regarding the RPC batching support, which I don't see how it can work without jeopardizing correctness.

lightningd/pay.c Show resolved Hide resolved
wallet/wallet.c Show resolved Hide resolved
wallet/wallet.c Outdated Show resolved Hide resolved
wallet/db.c Show resolved Hide resolved
plugins/autoclean.c Show resolved Hide resolved
doc/lightning-batching.7.md Show resolved Hide resolved
@rustyrussell
Copy link
Contributor Author

rustyrussell commented Sep 19, 2022

Rebased on master, and:

  • Final commit to null-out the now-unused short_channel_id field in channels (use scid!)
  • expand documentation of "batching" command:
The **batching** RPC command allows (but does not guarantee!) database
commitments to be deferred when multiple commands are issued on this RPC
connection.  This is only useful if many commands are being given at once, in
which case it can offer a performance improvement (the cost being that if
there is a crash, it's unclear how many of the commands will have been
persisted).

@rustyrussell
Copy link
Contributor Author

Ack 1de8bca

@rustyrussell
Copy link
Contributor Author

Extended timeouts in test_autoclean: CI is slow esp under valgrind!

Ack e134332

@rustyrussell rustyrussell force-pushed the guilt/more-autoclean branch 2 times, most recently from b77f1fd to 282c8b2 Compare September 22, 2022 03:41
@rustyrussell
Copy link
Contributor Author

Trivial rebase for generated code clash (contrib/pyln-testing/pyln/testing/node_pb2.py).

Ack 282c8b2

This can happen, and in fact does below in our test_autoclean_once
test where we update the datastore, and return from the cmd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
contrib/giantnode.py shows we're spending a lot of time looking up
payments by payment_hash: sendpays does it to see if we've already
paid, and bookkeeper does it in listsendpays:

```
-   94.52%     0.00%  lightningd  lightningd            [.] read_json
   - 94.52% read_json                                                
      - 94.48% parse_request                                         
         - 94.46% plugin_hook_call_rpc_command                       
            - plugin_hook_call_                                      
               - rpc_command_hook_final                              
                  - 94.46% command_exec                              
                     - 49.08% json_sendpay                           
                        - 49.01% send_payment                        
                           - 48.86% send_payment_core                
                              - 48.84% wallet_payment_list           
                                 - 48.80% db_step                    
                                    + db_sqlite3_step                
                     - 45.38% json_listsendpays                      
                        - 45.36% wallet_payment_list                 
                           - 45.30% db_step                          
                              + 45.30% db_sqlite3_step               
```

This doesn't actually make much of a difference, so see next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Filtering by status is rare, so we can do it in the caller; just let sqlite3
filter by payment_hash.

With ~650,000 payments in db:

Before:
```
129/300000 complete 5.60/sec (33078 invs, 169 pays, 0 retries) in 30 seconds. 19 hours-14 hours remaining.
201/300000 complete 7.20/sec (43519 invs, 241 pays, 0 retries) in 40 seconds. 16 hours-11 hours remaining.
257/300000 complete 5.60/sec (54568 invs, 289 pays, 0 retries) in 50 seconds. 16 hours-14 hours remaining.
305/300000 complete 4.80/sec (65772 invs, 337 pays, 0 retries) in 60 seconds. 16 hours-17 hours remaining.
361/300000 complete 5.60/sec (75875 invs, 401 pays, 0 retries) in 70 seconds. 16 hours-14 hours remaining.
```

After:
```
760/300000 complete 40.00/sec (19955 invs, 824 pays, 0 retries) in 20 seconds. 2 hours-2 hours remaining.
1176/300000 complete 41.60/sec (30082 invs, 1224 pays, 0 retries) in 30 seconds. 2 hours-119 minutes remaining.
1584/300000 complete 40.80/sec (40224 invs, 1640 pays, 0 retries) in 40 seconds. 2 hours-2 hours remaining.
1984/300000 complete 40.00/sec (49938 invs, 2048 pays, 0 retries) in 50 seconds. 2 hours-2 hours remaining.
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This one directly contains the scids of the channels involved, not
references, so can outlive the channels.  As a side-effect, however,
it now never lists `payment_hash`.  Having it listed (via join) is not
possible as it is a *string* in the channels table, and difficult
anyway because of channel aliases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…_id_str, add db_bind_scid.

Although it's deprecated already (because it stores as string), it's
better to make the name explicit.  And create a new helper which stores as BIGINT.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Without helpers, we were using the u64 functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Normally, we'd use the delete_columns function to remove the old
`short_channel_id` string field, *but* we can't do that for sqlite, as
there are other tables with references to it.  So add a FIXME to do
it once everyone has upgraded to an sqlite3 which has native support
for column deletion.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…s table.

And remove the now-unused string-based helper functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And document that we never know payment_hash.

Changelog-Added: JSON-RPC: `listforwards` now shows `in_htlc_id` and `out_htlc_id`
Changelog-Changed: JSON-RPC: `listforwards` now never shows `payment_hash`; use `listhtlcs`.
Using `listfowards` for this wrong; expose this directly if people
care (and unlike listforwards, which could be deleted, we have to
remember these while the channel is still open!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `listhtlcs` new command to list all known HTLCS.
Changelog-Added: JSON-RPC: `delforward` command to delete listforwards entries.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And take the opportunity to rename l0 and l1 in the tests to the
more natural l1 l2 (since we add l3).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's more natural: we will eventually support dynamic config variables,
so this will be quite nice.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Plugins: `autoclean` can now delete old forwards, payments, and invoices automatically.
And remove deprecated autocleaninvoice docs.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: Plugins: `autoclean-status` command to see what autoclean is doing.
Changelog-Deprecated: JSON-RPC: `autocleaninvoice` (use option `autoclean-expiredinvoices-age`)
Changelog-Added: Plugins: `autoclean-once` command for a single cleanup.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They slow down benchmarking, which is kind of unfair!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
autoclean was using 98% of its time in memmove; we should simply keep
an offset, and memmove when it's empty.  And also, only memmove the
used region, not the entire buffer!

Running on product of giantnodes.py:

	$ time l1-cli autoclean-once failedpays 1
	{
	   "autoclean": {
	      "failedpays": {
	         "cleaned": 26895,
	         "uncleaned": 1000000
	      }
	   }
	}

Before:
	real	20m46.579s
	user	0m0.000s
	sys	0m0.001s
	
After:
	real	2m10.568s
	user	0m0.001s
	sys	0m0.000s

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use linked list, not an array.

```
+   97.89%     0.01%  autoclean  autoclean             [.] next_plan
-   97.08%     0.01%  autoclean  autoclean             [.] json_stream_output_write
   - 97.06% json_stream_output_write
      - 84.29% ld_stream_complete
         - 83.87% tal_arr_remove_
            + 83.71% __memcpy_avx_unaligned_erms (inlined)
      + 12.76% rpc_stream_complete
+   96.59%     0.03%  autoclean  autoclean             [.] tal_arr_remove_
+   96.48%     0.00%  autoclean  libc.so.6             [.] __memcpy_avx_unaligned_erms (inlined)
+   94.98%    94.98%  autoclean  libc.so.6             [.] __memmove_avx_unaligned_erms
+   84.29%     0.01%  autoclean  autoclean             [.] ld_stream_complete
+   12.76%     0.00%  autoclean  autoclean             [.] rpc_stream_complete
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This shows the benefits of batching when being slammed by autoclean:


Before:

```
   plugin-autoclean: last 1000 deletes took 6699022 nsec each
   plugin-autoclean: last 1000 deletes took 6734025 nsec each
   plugin-autoclean: last 1000 deletes took 6681189 nsec each
   plugin-autoclean: last 1000 deletes took 6597148 nsec each
   plugin-autoclean: last 1000 deletes took 6637085 nsec each
   plugin-autoclean: last 1000 deletes took 6801425 nsec each
   plugin-autoclean: last 1000 deletes took 6788572 nsec each
   plugin-autoclean: last 1000 deletes took 6603641 nsec each
   plugin-autoclean: last 1000 deletes took 6642947 nsec each
   plugin-autoclean: last 1000 deletes took 6590495 nsec each
   plugin-autoclean: last 1000 deletes took 6695076 nsec each
   plugin-autoclean: last 1000 deletes took 6477981 nsec each
```

After:
```
   plugin-autoclean: last 1000 deletes took 342764 nsec each
   plugin-autoclean: last 1000 deletes took 375031 nsec each
   plugin-autoclean: last 1000 deletes took 357564 nsec each
   plugin-autoclean: last 1000 deletes took 381581 nsec each
   plugin-autoclean: last 1000 deletes took 337989 nsec each
   plugin-autoclean: last 1000 deletes took 329391 nsec each
   plugin-autoclean: last 1000 deletes took 328322 nsec each
   plugin-autoclean: last 1000 deletes took 372810 nsec each
   plugin-autoclean: last 1000 deletes took 351228 nsec each
   plugin-autoclean: last 1000 deletes took 413885 nsec each
   plugin-autoclean: last 1000 deletes took 348317 nsec each
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previous commit was a hack which *always* batched where possible, this
is a more sophisticated opt-in varaint, with a timeout sanity check.

Final performance for cleaning up 1M pays/forwards/invoices:

```
$ time l1-cli autoclean-once succeededpays 1
{
   "autoclean": {
      "succeededpays": {
         "cleaned": 1000000,
         "uncleaned": 26895
      }
   }
}

real	6m9.828s
user	0m0.003s
sys	0m0.001s
$ time l2-cli autoclean-once succeededforwards 1
{
   "autoclean": {
      "succeededforwards": {
         "cleaned": 1000000,
         "uncleaned": 40
      }
   }
}

real	3m20.789s
user	0m0.004s
sys	0m0.001s
$ time l3-cli autoclean-once paidinvoices 1
{
   "autoclean": {
      "paidinvoices": {
         "cleaned": 1000000,
         "uncleaned": 0
      }
   }
}

real	6m47.941s
user	0m0.001s
sys	0m0.000s
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Added: JSON-RPC: `batching` command to allow database transactions to cross multiple back-to-back JSON commands.
Schema was too loose since we did deprecations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is actually what the autoclean plugin wants, especially since
you can't otherwise delete a payment which has failed then succeeded.

But insist on neither or both being specified, at least for now.

Changelog-Added: JSON-RPC: `delpay` takes optional `groupid` and `partid` parameters to specify exactly what payment to delete.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
```
"label": "db/db_postgres.c:294:char[]",
"backtrace": [
  "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
  "ccan/ccan/tal/tal.c:471 (tal_alloc_arr_)",
  "ccan/ccan/tal/str/str.c:91 (tal_vfmt_)",
  "ccan/ccan/tal/str/str.c:44 (tal_fmt_)",
  "db/db_postgres.c:294 (db_postgres_delete_columns)",
  "wallet/db.c:1546 (migrate_payments_scids_as_integers)",
  "wallet/db.c:986 (db_migrate)",
  "wallet/db.c:1019 (db_setup)",
  "wallet/wallet.c:101 (wallet_new)",
  "lightningd/lightningd.c:1023 (main)",
  "../csu/libc-start.c:308 (__libc_start_main)"
],
"parents": [
  "db/utils.c:317:struct db",
  "lightningd/lightningd.c:107:struct lightningd"
]
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…r migration

Suggested-by: @cdecker
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CI is really slow: it sees all three expire at once.  But making the
timeouts too long is painful in non-VALGRIND, so I ended up making it
conditional.

```
         # First it expires.
        wait_for(lambda: only_one(l3.rpc.listinvoices('inv1')['invoices'])['status'] == 'expired')
        # Now will get autocleaned
        wait_for(lambda: l3.rpc.listinvoices('inv1')['invoices'] == [])
>       assert l3.rpc.autoclean_status()['autoclean']['expiredinvoices']['cleaned'] == 1
E       assert 3 == 1

tests/test_plugin.py:2975: AssertionError
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker
Copy link
Member

cdecker commented Sep 22, 2022

ACK 2ac334d

@cdecker cdecker merged commit 651753b into ElementsProject:master Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants