cluster mempool: cluster linearization algorithm #30126

sipa · 2024-05-16T20:29:15Z

Part of cluster mempool: #30289

This introduces low-level cluster linearization code, including tests and some benchmarks. It is currently not hooked up to anything.

Ultimately, what this PR adds is a function Linearize which operates on instances of DepGraph (instances of which represent pre-processed transaction clusters) to produce and/or improve linearizations for that cluster.

To provide assurance, the code heavily relies on fuzz tests. A novel approach is used here, where the fuzz input is parsed using the serialization.h framework rather than FuzzedDataProvider, with a custom serializer/deserializer for DepGraph objects. By including serialization, it's possible to ascertain that the format can represent every relevant cluster, as well as potentially permitting the construction of ad-hoc fuzz inputs from clusters (not included in this PR, but used during development).

The Linearize(depgraph, iteration_limit, rng_seed, old_linearization) function is an implementation of the (single) LIMO algorithm, with the $S$ in every iteration found as the best out of (a) the best remaining ancestor set and (b) randomized computationally-bounded search. It incrementally builds up a linearization by finding good topologically-valid subsets to move to the front, in such a way that the resulting linearization has a diagram that is at least as good as the old_linearization passed in (if any).

Despite using both best ancestor set and search, this is not Double LIMO, as no intersections between these are involved; just the best of the two.
The iteration_limit and rng_seed only control the (b) randomized search. Even with 0 iterations, the result will be as good as the old linearization, and the included sets at every point will have a feerate at least as high as the best remaining ancestor set at that point.

The search algorithm used in the (b) step is very basic, and largely matches Section 2.1 of How to Linearize your Cluster.. See #30286 for optimizations to make it more efficient.

For background and references, see Introduction to cluster linearization.

DrahtBot · 2024-05-16T20:29:17Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	instagibbs, glozow, sdaftuar

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30286 (cluster mempool: optimized candidate search by sipa)
#30285 (cluster mempool: merging & postprocessing of linearizations by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot · 2024-05-16T22:28:59Z

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

_{Debug: https://github.com/bitcoin/bitcoin/runs/25072594213}

sipa · 2024-05-20T18:53:52Z

Benchmarks on my Ryzen 5950X system:

ns/op	op/s	err%	total	benchmark
2,373.94	421,240.11	0.1%	1.10	`LinearizeNoIters16TxWorstCase`
7,530.22	132,798.26	0.0%	1.07	`LinearizeNoIters32TxWorstCase`
16,585.34	60,294.20	0.1%	1.10	`LinearizeNoIters48TxWorstCase`
28,591.70	34,975.18	0.1%	1.10	`LinearizeNoIters64TxWorstCase`
53,918.56	18,546.49	0.0%	1.10	`LinearizeNoIters75TxWorstCase`
93,589.21	10,684.99	0.1%	1.10	`LinearizeNoIters99TxWorstCase`

ns/iters	iters/s	err%	total	benchmark
45.36	22,045,550.98	0.5%	1.10	`LinearizePerIter16TxWorstCase`
35.57	28,111,376.58	0.1%	1.10	`LinearizePerIter32TxWorstCase`
33.04	30,262,951.89	0.0%	1.10	`LinearizePerIter48TxWorstCase`
33.21	30,107,745.17	0.1%	1.10	`LinearizePerIter64TxWorstCase`
75.98	13,161,530.63	0.4%	1.07	`LinearizePerIter75TxWorstCase`
76.62	13,051,066.77	0.5%	1.08	`LinearizePerIter99TxWorstCase`

ns/op	op/s	err%	total	benchmark
332.97	3,003,274.74	0.0%	1.10	`PostLinearize16TxWorstCase`
1,121.92	891,330.77	0.0%	1.10	`PostLinearize32TxWorstCase`
3,358.33	297,767.01	0.3%	1.13	`PostLinearize48TxWorstCase`
5,826.72	171,623.05	0.5%	1.11	`PostLinearize64TxWorstCase`
7,453.31	134,168.55	0.1%	1.07	`PostLinearize75TxWorstCase`
12,476.44	80,151.09	0.1%	1.10	`PostLinearize99TxWorstCase`

This means that for a 64-transaction cluster, it should be possible to linearize (28.59 µs) with 100 candidate search iterations (3.32 µs) plus postlinearize (5.83 µs), within a total of 37.74 µs, on my system.

src/util/bitset.h

sipa · 2024-05-23T15:18:30Z

I've dropped the dependency on #29625, and switched to using FastRandomContext instead; there is a measurable slowdown from using the (ChaCha20-based) FastRandomContext over the (xoroshiro128++-based) InsecureRandomContext introduced there, but it's no more than 1-2%. I can switch back to that approach if 29625 were to make it in.

DrahtBot · 2024-05-23T18:33:11Z

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

File	commit `83ae1ba` (master)	commit `e5cbc23` (master and this pull)
SHA256SUMS.part	`24fd016e03e8c7da...`	`15fae3483445e33b...`
*-aarch64-linux-gnu-debug.tar.gz	`94942cf7dedf3604...`	`23eeccf77ee5799d...`
*-aarch64-linux-gnu.tar.gz	`4b30ca93b6788f48...`	`ed8e5024d960f53e...`
*-arm-linux-gnueabihf-debug.tar.gz	`a0f57c45e5f02bb1...`	`f22f89c1eba49dda...`
*-arm-linux-gnueabihf.tar.gz	`9f0376baaf54b988...`	`17da8a968635c492...`
*-arm64-apple-darwin-unsigned.tar.gz	`9b952b32db70d099...`	`16d805ab4bcf8d54...`
*-arm64-apple-darwin-unsigned.zip	`d49361bbbc5529fc...`	`e225d79a24b058a5...`
*-arm64-apple-darwin.tar.gz	`34e9cf4b79cbc190...`	`29b28e6d57761201...`
*-powerpc64-linux-gnu-debug.tar.gz	`5f322a7b213e244e...`	`cb5f37b036b5c52c...`
*-powerpc64-linux-gnu.tar.gz	`bb57b46482c5b1e6...`	`57adf954458a27d5...`
*-riscv64-linux-gnu-debug.tar.gz	`d1a3a405c5b45fff...`	`237eb467f8547d22...`
*-riscv64-linux-gnu.tar.gz	`68d7e6671e2dba30...`	`29d9f1e9052e96d3...`
*-x86_64-apple-darwin-unsigned.tar.gz	`6fb22000e8c14c40...`	`67e5bd5b86483c8a...`
*-x86_64-apple-darwin-unsigned.zip	`1c5f2a216e87cbf5...`	`abdbca97fafc146f...`
*-x86_64-apple-darwin.tar.gz	`66f17a574163ecaf...`	`f002830b4b8da330...`
*-x86_64-linux-gnu-debug.tar.gz	`a5044f956a824228...`	`0791685e39e80672...`
*-x86_64-linux-gnu.tar.gz	`23af1dc6cb921b37...`	`ff9625165c3f19c2...`
*.tar.gz	`caac4a182deb1e04...`	`ba8abeef4165dafb...`
guix_build.log	`c7cc0190f7085f04...`	`100da60c2f0e6686...`
guix_build.log.diff		`7c460aa3b1aafc32...`

sipa · 2024-07-25T12:15:33Z

At this point could we maybe not rebase on master unless necessary? :)

I shall resist.

instagibbs · 2024-07-25T12:19:13Z

reACK 448374f

via git range-diff master df749c06e9d45ba92a4699e1d3e58522f93ad6bb 448374f2c901c48824904357b4e1297105e97315

only changes to MakeHardGraph to reflect the intended topology

glozow · 2024-07-25T13:08:20Z

reACK 448374f via range-diff

…ypes This primarily adds the DepGraph class, which encapsulates precomputed ancestor/descendant information for a given transaction cluster, with a number of utility features (inspectors for set feerates, computing reduced parents/children, adding transactions, adding dependencies), which will become needed in future commits.

This introduces a bespoke fuzzing-focused serialization format for DepGraphs, and then tests that this format can represent any graph, roundtrips, and then uses that to test the correctness of DepGraph itself. This forms the basis for future fuzz tests that need to work with interesting graphs.

This is a class that encapsulates precomputed ancestor set feerates, and presents an interface for getting the best remaining ancestor set.

Similar to AncestorCandidateFinder, this encapsulates the state needed for finding good candidate sets using a search algorithm.

A fuzz test is added which verifies various of its expected properties, including correctness

This adds a first version of the overall linearization interface, which given a DepGraph constructs a good linearization, by incrementally including good candidate sets (found using AncestorCandidateFinder and SearchCandidateFinder).

Add benchmarks for known bad graphs for the purpose of search (as an upper bound on work per search iterations) and ancestor sorting (as an upper bound on linearization work with no search iterations).

Switch to BFS exploration of the search tree in SearchCandidateFinder instead of DFS exploration. This appears to behave better for real world clusters. As BFS has the downside of needing far larger search queues, switch back to DFS temporarily when the queue grows too large.

To make search non-deterministic, change the BFS logic from always picking the first queue item to randomly picking the first or second queue item.

It encapsulates a given linearization in chunked form, permitting arbitrary subsets of transactions to be removed from the linearization. Its purpose is adding the Intersect function, which is a crucial operation that will be used in a further commit to make Linearize improve existing linearizations.

This implements the LIMO algorithm for linearizing by improving an existing linearization. See https://delvingbitcoin.org/t/limo-combining-the-best-parts-of-linearization-search-and-merging for details.

sipa · 2024-07-25T14:19:06Z

Ok, the mermaid diagrams should be fixed now, match the code, and I've verified (post- #30286) that the claims about difficulty of linearization hold for the code.

I shall resist.

I failed.

instagibbs · 2024-07-25T15:04:06Z

reACK 647fa37

via git range-diff master 448374f2c901c48824904357b4e1297105e97315 647fa37cdbadbeebba147ca6b24e138559cffaaf

I traced through the generated diagrams + code, seems to match. Didn't verify difficulty.

glozow · 2024-07-25T15:05:11Z

reACK 647fa37, both code and mermaid diagram look correct to me

The git diff is always greener
When you rebase by mistake
We dream about cluster mempool
Don care how many reACKs it take
Just look at the code around you
The bench was correct before!
Under the sea, Under the sea...

sdaftuar · 2024-07-25T20:41:09Z

ACK 647fa37

In addition to some manual testing, I ran all the fuzz targets on my machine for a while as well.

hebasto · 2024-07-31T09:55:39Z

Ported to the CMake-based build system in hebasto#290.

…ions bbcee5a clusterlin: improve rechunking in LinearizationChunking (optimization) (Pieter Wuille) 04d7a04 clusterlin: add MergeLinearizations function + fuzz test + benchmark (Pieter Wuille) 4f8958d clusterlin: add PostLinearize + benchmarks + fuzz tests (Pieter Wuille) 0e2812d clusterlin: add algorithms for connectedness/connected components (Pieter Wuille) 0e52728 clusterlin: rename Intersect -> IntersectPrefixes (Pieter Wuille) Pull request description: Part of cluster mempool: #30289 Depends on #30126, and was split off from it. #28676 depends on this. This adds the algorithms for merging & postprocessing linearizations. The `PostLinearize(depgraph, linearization)` function performs an in-place improvement of `linearization`, using two iterations of the [Linearization post-processing](https://delvingbitcoin.org/t/linearization-post-processing-o-n-2-fancy-chunking/201/8) algorithm. The first running from back to front, the second from front to back. The `MergeLinearizations(depgraph, linearization1, linearization2)` function computes a new linearization for the provided cluster, given two existing linearizations for that cluster, which is at least as good as both inputs. The algorithm is described at a high level in [merging incomparable linearizations](https://delvingbitcoin.org/t/merging-incomparable-linearizations/209). For background and references, see [Introduction to cluster linearization](https://delvingbitcoin.org/t/introduction-to-cluster-linearization/1032). ACKs for top commit: sdaftuar: ACK bbcee5a glozow: code review ACK bbcee5a instagibbs: ACK bbcee5a Tree-SHA512: d2b5a3f132d1ef22ddf9c56421ab8b397efe45b3c4c705548dda56f5b39fe4b8f57a0d2a4c65b338462d80bb5b9b84a9a39efa1b4f390420a8005ce31817774e

ismaelsadeeq · 2024-08-26T16:57:50Z

post merge ACK 647fa37

9ad2fe7 clusterlin: only start/use search when enough iterations left (Pieter Wuille) bd04435 clusterlin: improve heuristic to decide split transaction (optimization) (Pieter Wuille) 71f2629 clusterlin: include topological pot subsets automatically (optimization) (Pieter Wuille) e20fda7 clusterlin: reduce computation of unnecessary pot sets (optimization) (Pieter Wuille) 6060a94 clusterlin bench: add example hard cluster benchmarks (Pieter Wuille) 2965fbf clusterlin: track upper bound potential set for work items (optimization) (Pieter Wuille) 9e43e4c clusterlin: use feerate-sorted depgraph in SearchCandidateFinder (Pieter Wuille) b80e6df clusterlin: add reordering support for DepGraph (Pieter Wuille) 85a285a clusterlin: separate initial search entries per component (optimization) (Pieter Wuille) e4faea9 clusterlin bench: have low/high iter benchmarks instead of per-iter (Pieter Wuille) Pull request description: Part of cluster mempool: #30289 Depends on #30126, and was split off from it. This improves the candidate search algorithm introduced in the previous PR with a variety of optimizations. The resulting search algorithm largely follows Section 2 of [How to linearize your cluster](https://delvingbitcoin.org/t/how-to-linearize-your-cluster/303#h-2-finding-high-feerate-subsets-5), though with a few changes: * Connected component analysis is performed inside the search algorithm (creating initial work items per component for each candidate), rather than once at a higher level. This duplicates some work but is significantly simpler in implementation. * No ancestor-set based presplitting inside the search is performed; instead, the `best` value is initialized with the best topologically valid set known to the LIMO algorithm before search starts: the better one out of the highest-feerate remaining ancestor set, and the highest-feerate prefix of remaining transactions in `old_linearization`. * Work items are represented using an included set *inc* and an undefined set *und*, rather than included and excluded. * Potential sets *pot* are not computed for work items with empty *inc*. At a high level, the only missing optimization from that post is bottleneck analysis; my thinking is that it only really helps with clusters that are already relatively cheap to linearize (doing so would need to be done at a higher level, not inside the search algorithm). --- Overview of the impact of each commit here on linearize performance: * **[clusterlin bench: have low/high iter benchmarks instead of per-iter](https://github.com/bitcoin/bitcoin/pull/30286/commits/21a184db63bd13cf0c2dff9e6e61ca85f2ff4454)**: no impact * **[separate initial search entries per component (optimization)](https://github.com/bitcoin/bitcoin/pull/30286/commits/c84c5c86ba6c12ce11e997d0c59d60694c667488)**: reduce iterations, increase start-up cost * **[add reordering support for DepGraph](https://github.com/bitcoin/bitcoin/pull/30286/commits/019ff2960976b677d91bab47e75be6ed180b37a3)**: no impact * **[use feerate-sorted depgraph in SearchCandidateFinder](https://github.com/bitcoin/bitcoin/pull/30286/commits/8e27dd5a220237da5d362376db6406e47f063776)**: typically reduce iterations, increase start-up cost * **[track upper bound potential set for work items](https://github.com/bitcoin/bitcoin/pull/30286/commits/781e0fb3aa5cbd0983b01547481eefa0804dfb5b)**: reduce iterations, increase cost per iteration * **[reduce computation of unnecessary pot sets](https://github.com/bitcoin/bitcoin/pull/30286/commits/9fe834fa979c8ab038aec57dac2f468ca70c41a9)**: reduce cost per iteration * **[include topological pot subsets automatically](https://github.com/bitcoin/bitcoin/pull/30286/commits/30612710a4e5fc8c950bd1ff4ab9fa843b59132d)**: reduce iterations, increase cost per iteration * **[improve heuristic to decide split transaction](https://github.com/bitcoin/bitcoin/pull/30286/commits/1880c00ab1661dbfc867f69e7556b28bc3cf7a42)**: typically reduce iterations, increase cost per iteration * **[only start/use search when enough iterations left](https://github.com/bitcoin/bitcoin/pull/30286/commits/12760a57b3a6cd1aeb3b7532311f648de2b45aa4)**: just account for start-up cost as equivalent iterations ACKs for top commit: sdaftuar: ACK 9ad2fe7 instagibbs: reACK 9ad2fe7 glozow: reACK 9ad2fe7, just have a question about the docs Tree-SHA512: 108bcbb0676f36071eb83954059b5f3d6646c745015b644a2a5d7f5a8ac9424c2d01d339fa6318a3aff4cf313308e85bb80b0090899720a3fcba027b8025590a

DrahtBot added the CI failed label May 16, 2024

sipa force-pushed the 202405_clusterlin branch 5 times, most recently from 079f02d to 07f68f9 Compare May 17, 2024 13:38

DrahtBot removed the CI failed label May 17, 2024

laanwj added the Mempool label May 17, 2024

sipa force-pushed the 202405_clusterlin branch 2 times, most recently from 079f02d to b4bb178 Compare May 20, 2024 01:35

DrahtBot added the CI failed label May 20, 2024

DrahtBot mentioned this pull request May 20, 2024

scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] #29641

Draft

DrahtBot removed the CI failed label May 20, 2024

sipa force-pushed the 202405_clusterlin branch from b4bb178 to 88fe1e3 Compare May 20, 2024 18:49

sipa force-pushed the 202405_clusterlin branch from 88fe1e3 to c9558d5 Compare May 20, 2024 21:33

sipa added the DrahtBot Guix build requested label May 21, 2024

glozow mentioned this pull request May 23, 2024

Package Relay Project Tracking #27463

Open

54 tasks

theuni reviewed May 23, 2024

View reviewed changes

src/util/bitset.h Outdated Show resolved Hide resolved

sipa force-pushed the 202405_clusterlin branch from c9558d5 to 19fb843 Compare May 23, 2024 15:15

sipa force-pushed the 202405_clusterlin branch from 19fb843 to c05a487 Compare May 23, 2024 15:28

sipa mentioned this pull request May 23, 2024

util: add BitSet #30160

Merged

sipa removed the DrahtBot Guix build requested label May 23, 2024

sipa mentioned this pull request May 23, 2024

util: add VecDeque #30161

Merged

sipa force-pushed the 202405_clusterlin branch from c05a487 to 03bc4a5 Compare May 23, 2024 18:23

DrahtBot mentioned this pull request May 24, 2024

Several randomness improvements #29625

Merged

sipa force-pushed the 202405_clusterlin branch from df749c0 to 448374f Compare July 25, 2024 12:14

DrahtBot requested a review from glozow July 25, 2024 12:19

sipa added 12 commits July 25, 2024 10:16

clusterlin: add AncestorCandidateFinder class

4828079

This is a class that encapsulates precomputed ancestor set feerates, and presents an interface for getting the best remaining ancestor set.

clusterlin: add SearchCandidateFinder class

2a41f15

Similar to AncestorCandidateFinder, this encapsulates the state needed for finding good candidate sets using a search algorithm.

clusterlin: add chunking algorithm

ee0ddfe

A fuzz test is added which verifies various of its expected properties, including correctness

clusterlin: add Linearize function

46aad9b

This adds a first version of the overall linearization interface, which given a DepGraph constructs a good linearization, by incrementally including good candidate sets (found using AncestorCandidateFinder and SearchCandidateFinder).

bench: Candidate finding and linearization benchmarks

d9b235e

Add benchmarks for known bad graphs for the purpose of search (as an upper bound on work per search iterations) and ancestor sorting (as an upper bound on linearization work with no search iterations).

clusterlin: randomize the SearchCandidateFinder search order

d5918dc

To make search non-deterministic, change the BFS logic from always picking the first queue item to randomly picking the first or second queue item.

clusterlin: permit passing in existing linearization to Linearize

2854979

This implements the LIMO algorithm for linearizing by improving an existing linearization. See https://delvingbitcoin.org/t/limo-combining-the-best-parts-of-linearization-search-and-merging for details.

bench: add cluster linearization improvement benchmark

647fa37

sipa force-pushed the 202405_clusterlin branch from 448374f to 647fa37 Compare July 25, 2024 14:17

glozow merged commit 37bd70a into bitcoin:master Jul 26, 2024
16 checks passed

hebasto added the Needs CMake port label Jul 31, 2024

hebasto mentioned this pull request Jul 31, 2024

cmake: Regular rebasing of the cmake-staging branch hebasto/bitcoin#290

Closed

hebasto removed the Needs CMake port label Jul 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cluster mempool: cluster linearization algorithm #30126

cluster mempool: cluster linearization algorithm #30126

sipa commented May 16, 2024 •

edited

Loading

DrahtBot commented May 16, 2024 •

edited

Loading

DrahtBot commented May 16, 2024

sipa commented May 20, 2024 •

edited

Loading

sipa commented May 23, 2024 •

edited

Loading

DrahtBot commented May 23, 2024

sipa commented Jul 25, 2024

instagibbs commented Jul 25, 2024

glozow commented Jul 25, 2024

sipa commented Jul 25, 2024

instagibbs commented Jul 25, 2024 •

edited

Loading

glozow commented Jul 25, 2024

sdaftuar commented Jul 25, 2024

hebasto commented Jul 31, 2024

ismaelsadeeq commented Aug 26, 2024

cluster mempool: cluster linearization algorithm #30126

cluster mempool: cluster linearization algorithm #30126

Conversation

sipa commented May 16, 2024 • edited Loading

DrahtBot commented May 16, 2024 • edited Loading

Code Coverage

Reviews

Conflicts

DrahtBot commented May 16, 2024

sipa commented May 20, 2024 • edited Loading

sipa commented May 23, 2024 • edited Loading

DrahtBot commented May 23, 2024

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

sipa commented Jul 25, 2024

instagibbs commented Jul 25, 2024

glozow commented Jul 25, 2024

sipa commented Jul 25, 2024

instagibbs commented Jul 25, 2024 • edited Loading

glozow commented Jul 25, 2024

sdaftuar commented Jul 25, 2024

hebasto commented Jul 31, 2024

ismaelsadeeq commented Aug 26, 2024

sipa commented May 16, 2024 •

edited

Loading

DrahtBot commented May 16, 2024 •

edited

Loading

sipa commented May 20, 2024 •

edited

Loading

sipa commented May 23, 2024 •

edited

Loading

instagibbs commented Jul 25, 2024 •

edited

Loading