Faster block sync #7101

jgriffiths · 2024-02-21T21:44:57Z

Per discussion on #6984, adding another introspection wrapper to avoid non-interesting matches is non-optimal. It's a duplicate of the is_xxx script call already exposed, and it doesn't fix the many other places where allocations are made unnecessarily.

It seems the original rationale for the copies is behaviour that was since fixed (see commit descriptions for details). But also, the script introspection functions are inconsistent w.r.t taking lengths. Removing this inconsistency by adding length args exposes a lot of places that were allocating redundantly which this PR addresses. In particular, we avoid allocations for interesting scripts which #6984 does not. We are then able to remove the allocating wrapper calls completely, ending up with less code overall.

There are 2 other drive-by fixes to improve performance: Avoid DB queries for non-interesting script types, and avoid computing unused key fingerprints when identifying wallet addresses. The latter in particular avoids a hash160 for every key tested, which is all used wallet keys plus the gap limit in the case of negative lookups, for every potentially interesting tx output.

The end result is less code, more code consistency and much less memory/cpu usage, especially for large wallets/after many txs have been made.

Reverts/replaces #6984.

cc: @cdecker

See the next commit for context on this revert. This reverts commit d185b0f. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

…ating" The is_xxx script functions already perform these checks for us without allocating. They currently expect their memory to be tal-allocated, which is why a copy is made. However: - The memory already *is* tal-allocated since wally is configured to do so. - The tal-dependency is artifical and is removed in a future commit in this PR. This reverts commit c329756. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Standardizes the is_xxx script function all take a script length, and changes their first-level callers to pass it. This has several knock on benefits: - We remove the repeated tal_count/tal_bytelen calls on the script, in particular the redundant calls that result when we must check for multiple types of script - which is almost all cases. - We remove the dependency on the memory being tal-allocated (It is, in all cases, but theres no reason we need to require that). - We remove all cases where we create a copy of the script just to id it. - We remove all allocations for non-interesting scripts while iterating block txs in process_getfilteredblock_step1(). - We remove all allocations *including for potentially interesting scripts* in topo_add_utxos(). Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Only one caller needs it, and they can trivially and cheaply compute it themselves. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Avoids a hash160 for every key we derive to test. Callers that need the key re-derive it without the skip flag, so there are no side effects from this optimization. Changelog-Changed: core: Processing blocks should now be faster Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

The pattern of making tal-allocated copies of wally data to pass around was made redundant after these calls were added by the use of tal_wally_start/tal_wally_end to parent wally allocations. We can thus just pass the data directly and avoid the allocations. Removes redundant allocations when checking tx filters and computing fees. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

rustyrussell · 2024-02-22T01:54:16Z

This, like the previous patch it replaces, ARE MISSING BENCHMARKS.

Without that, I have no way of evaluating it. Is processing blocks really a bottleneck? Under what circumstances? Should we stop re-evaluating block on restart, instead? Should we query bitcoind via the bitcoin protocol instead of this hexdump?

cdecker · 2024-02-22T11:50:00Z

I still have the benchmarking setup from #6984 and will run this a couple of times.

jgriffiths · 2024-02-22T20:16:08Z

@rustyrussell despite the PR name, the primary motivation for this change was clean up the code, and remove the non-general optimization from the reverted commit. In particular, removing the extra tx wrapper calls, two of which are based on an outdated premise and are now an anti-pattern (*), and the other which duplicate already available logic. It also makes the script functions more general (they can operate on memory from anywhere) and internally consistent vs the previous mix of some taking lengths and other not. If you look at the complete diff you should be able to see that a bunch of code has been removed and what remains is now simpler/more consistent. Its a happy coincidence that this also removes a bunch of memory allocations and other logic on a previously identified hotspot.

The 2 drive-by optimizations are trivial, especially the key fingerprint one. I probably should have called this PR something else, that's my bad. IMO even if there were no performance improvement, this PR should still be merged. The code will be faster since we are avoiding doing a bunch of stuff we didn't need to, but the e.g. wall-time difference is not the primary metric by which the changes should be judged.

@cdecker Thanks for benchmarking, much appreciated! One thing to note is that as the wallet has more addresses generated, the effect of e.g. not computing the fingerprint grows larger. A simple test of syncing with a new/empty wallet will not reflect this - the longer a wallet has been used, the more difference this will make. If you generate a few hundred/thousand addresses before testing, your result will be more indicative of the effects in the real world.

(*) i.e. force the user to create redundant copies, when the need to do this has been long fixed.

cdecker · 2024-02-26T17:20:01Z

Took me a while to benchmark, since I redid all 3 comparison runs as well, but the results are in:


|-----------------------+------------------------------------------+--------+---------+-----------------+-----------|
| Name                  | Commit                                   | Blocks |    Time | Blocks / second |   Speedup |
|-----------------------+------------------------------------------+--------+---------+-----------------+-----------|
| Baseline              | 3cbe35e7da71c918434510b8051cc9d33fc8a23f |   1000 |  216.57 |       4.6174447 |        0. |
| #6583                 | 5650dfdc24cc19e8534885c5ec6a89eb8f38a262 |   1000 | 183.233 |       5.4575322 | 18.193775 |
| #6583 & #6984         | 0a18732b8b8af7b039835ea5c6c276e0b4f8095e |   1000 |  176.38 |       5.6695770 | 22.786030 |
| #6953 & #6984 & #7101 | afe613cc1a4a2aff6975d288848777c09eae5fa1 |   1000 |  157.20 |       6.3613232 | 37.767176 |

This is a different machine than the one I used to test in #6984, hence the different results. So there is quite a visible performance improvement with this patch added 👍

jgriffiths added 10 commits February 22, 2024 10:25

Revert "core: Defer extracting the script until we're sure we'll use it"

ad54ef9

See the next commit for context on this revert. This reverts commit d185b0f. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

script: remove repeated tal_count calls from scripteq

48f2625

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

wallet: avoid database lookups for non-interesting script types

5ba3a4b

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

addr: avoid a redundant allocation for unknown script types

6caba6c

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

wallet: remove output_is_p2sh from wallet_can_spend

b12d005

Only one caller needs it, and they can trivially and cheaply compute it themselves. Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

wallet: perform wallet_can_spend exit logic only once

2984458

Signed-off-by: Jon Griffiths <jon_p_griffiths@yahoo.com>

jgriffiths changed the title ~~WIP: Faster block sync~~ Faster block sync Feb 21, 2024

rustyrussell merged commit f39d2ee into ElementsProject:master Mar 18, 2024
36 checks passed

jgriffiths deleted the faster_block_sync branch March 18, 2024 02:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster block sync #7101

Faster block sync #7101

jgriffiths commented Feb 21, 2024 •

edited

rustyrussell commented Feb 22, 2024

cdecker commented Feb 22, 2024

jgriffiths commented Feb 22, 2024

cdecker commented Feb 26, 2024

Faster block sync #7101

Faster block sync #7101

Conversation

jgriffiths commented Feb 21, 2024 • edited

rustyrussell commented Feb 22, 2024

cdecker commented Feb 22, 2024

jgriffiths commented Feb 22, 2024

cdecker commented Feb 26, 2024

jgriffiths commented Feb 21, 2024 •

edited