-
Notifications
You must be signed in to change notification settings - Fork 36.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Index for UTXO Set Statistics #18000
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
0376a22
to
663dbfb
Compare
I'm reviewing and testing it right now but from some benchmarks I did, I saw how the time goes down by 50%. However the performance increases by 19% percent. Regtest results only right now On master (ef8e2ce):
With this PR
|
Can you say more about current use cases of the |
I think it's perfectly fine to drop the "transactions" statistics. It doesn't really mean anything; it just happened to be easy to compute pre-pertxout, but even now it's mostly a hack. |
@emilengler Thanks for giving it a try! But I think the benchmarks you are comparing are not well suited in this case. I assume you are running a standard regtest network, maybe generated a few hundred blocks and made some transactions by hand. In this case the UTXO set is very small and iterating over it will be extremely fast, so that it would not surprise me if there is no measurable upside to using the index (maybe it will be even a little slower as you saw). But the goal of this index is that the time of the call will stay constant for a realistically sized UTXO set. So a better indicator would be to try it on a synced mainnet or testnet node, where @ryanofsky Sure, I will go along the different statistics the call provides to highlight which ones I think are more or less important:
For future use cases (aside from integration with |
@fjahr I'm a bit confused now, I ran the tests again on a full chain and master is MUCH faster than this Pull Request (4 MINUTES faster)
On fjahr:utxo-stats-index-rebase (663dbfb)
|
Thanks, this information is really clarifying. I wonder if some of it could be incorporated in the reference documentation (without mentioning specific CVE's or software projects), or maybe if it would be useful as part of some wiki. In any case, this definitely helps motivate the PR, so concept ACK from me |
@emilengler did you actually start |
no, sorry oversaw that |
Hmmm I get this error message now:
the chain state is about ~2 days old |
Yeah, that means the index is syncing at the moment. You should see the progress in the logs of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
benchmark: have you run
gettxoutsetinfo
with-and-without this PR on few UTXO sets across the years to check than performance holds linearly with utxo set growth (or a least give an intuition of it) ? -
assume-utxo: As security model of assume-utxo lays on public audit of a hardcoded hash of a utxo set, lowering the bar to compute it will increase review process trustworthiness with a higher number of participants. So IMO this point is worthy enough to Concept ACK this proposal (though @jamesob may already have work here?)
Quickly skimmed over the new code, it's clean enough if we want later to memory-isolate indexes in their own space. Don't have opinion on either MuHash or ECMH (but no lattice).
663dbfb
to
cce22aa
Compare
First of all, thanks for taking early looks and comments @ariard , @ryanofsky, @emilengler , @jnewbery (offline). Your comments should be mostly addressed in the latest code changes and in further information below. With the latest fixes and performance improvements, this should be now ready for real review and testing. Short recap of changes (this changed slightly)
Benchmarking
How long does the sync of the CoinStatsIndex take?
How about the growth of the UTXO set (ariard's question above)?
TBD
|
cce22aa
to
f7bcb12
Compare
Rebased |
In #19145 (missing in OP) you're adding a Adding MuHash to the index later might be annoying, however if the plan is to support different hash types, it sounds like MuHash should be in a separate index from the coin stats anyway? |
I needed to reflect a bit on this but now do I think it would be good to offer the index already without the hash. But I think to keep the hash in a different index and letting users set these with separate settings might be too much responsibility for the users. I would suggest a more opinionated approach. Rather than having I will prepare the alternate PRs with only |
40506bf test: Test gettxouttsetinfo hash_type option (Fabian Jahr) f17a4d1 rpc: Add hash_type NONE to gettxoutsetinfo (Fabian Jahr) a712cf6 rpc: gettxoutsetinfo can specify hash_type (only legacy option for now) (Fabian Jahr) 605884e refactor: Extract GetBogoSize function (Fabian Jahr) Pull request description: This is another intermediate part of the Coinstats Index (tracked in #18000). Sjors suggested [here](#18000 (comment)) that the part of the changes in #19145 that don't rely on the new `hash_type` muhash, i.e. that are for `hash_type=none`, could be merged separately from everything involving muhash. So these changes are extracted from #19145 here and can be merged without any other requirements. Building the index with no UTXO set hash is still valuable because `gettxoutsetinfo` can still be used to audit the `total_amount` for example. By itself this PR is not a huge improvement, `hash_type=none` is speeding up `gettxoutsetinfo` by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer. ACKs for top commit: MarcoFalke: ACK 40506bf 🖨 Sjors: tACK 40506bf Tree-SHA512: 3964c2b8eed427511b1aa9b2ef285dff27dc4d1537d72c3911e435b6e6b40912232da4acb3a09bd19a0372ddffa44103388d8a650169d95a4a727b970d210add
40506bf test: Test gettxouttsetinfo hash_type option (Fabian Jahr) f17a4d1 rpc: Add hash_type NONE to gettxoutsetinfo (Fabian Jahr) a712cf6 rpc: gettxoutsetinfo can specify hash_type (only legacy option for now) (Fabian Jahr) 605884e refactor: Extract GetBogoSize function (Fabian Jahr) Pull request description: This is another intermediate part of the Coinstats Index (tracked in bitcoin#18000). Sjors suggested [here](bitcoin#18000 (comment)) that the part of the changes in bitcoin#19145 that don't rely on the new `hash_type` muhash, i.e. that are for `hash_type=none`, could be merged separately from everything involving muhash. So these changes are extracted from bitcoin#19145 here and can be merged without any other requirements. Building the index with no UTXO set hash is still valuable because `gettxoutsetinfo` can still be used to audit the `total_amount` for example. By itself this PR is not a huge improvement, `hash_type=none` is speeding up `gettxoutsetinfo` by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer. ACKs for top commit: MarcoFalke: ACK 40506bf 🖨 Sjors: tACK 40506bf Tree-SHA512: 3964c2b8eed427511b1aa9b2ef285dff27dc4d1537d72c3911e435b6e6b40912232da4acb3a09bd19a0372ddffa44103388d8a650169d95a4a727b970d210add
124e1ee doc: Add release notes for getindexinfo RPC (Fabian Jahr) c447b09 test: Add tests for getindexinfo RPC (Fabian Jahr) 667bc7a rpc: Add getindexinfo RPC (Fabian Jahr) Pull request description: As I was playing with indices a I was missing an RPC that gives information about the active indices in the node. I think this can be helpful for many users, especially since there are some new index candidates coming up (#14053, #18000) that can give a quick overview without the user having to parse the logs. Feature summary: - Adds new RPC `listindices` (placed in Util section) - That RPC only lists the actively running indices - For each index it gives the name, whether it is synced and up to which block height it is synced ACKs for top commit: laanwj: Re-ACK 124e1ee jonatack: Code review re-ACK 124e1ee per `git range-diff a57af89 47a5372 124e1ee` no change since my last re-ACK, rebase only Tree-SHA512: 3b7174c87951e6457fef099f530337803906baf32fb64261410b8def2c0917853d6a1bf3059cd590b1cc1523608f8916dafb327a431d27ecbf8d7454406b5b35
124e1ee doc: Add release notes for getindexinfo RPC (Fabian Jahr) c447b09 test: Add tests for getindexinfo RPC (Fabian Jahr) 667bc7a rpc: Add getindexinfo RPC (Fabian Jahr) Pull request description: As I was playing with indices a I was missing an RPC that gives information about the active indices in the node. I think this can be helpful for many users, especially since there are some new index candidates coming up (bitcoin#14053, bitcoin#18000) that can give a quick overview without the user having to parse the logs. Feature summary: - Adds new RPC `listindices` (placed in Util section) - That RPC only lists the actively running indices - For each index it gives the name, whether it is synced and up to which block height it is synced ACKs for top commit: laanwj: Re-ACK 124e1ee jonatack: Code review re-ACK 124e1ee per `git range-diff a57af89 47a5372 124e1ee` no change since my last re-ACK, rebase only Tree-SHA512: 3b7174c87951e6457fef099f530337803906baf32fb64261410b8def2c0917853d6a1bf3059cd590b1cc1523608f8916dafb327a431d27ecbf8d7454406b5b35
36ec980 test: Add chacha20 test vectors in muhash (Fabian Jahr) 0e2b400 test: Add basic Python/C++ Muhash implementation parity unit test (Fabian Jahr) b85543c test: Add Python MuHash3072 implementation to test framework (Pieter Wuille) ab30cec test: Move modinv to util and add unit test (Fabian Jahr) Pull request description: This is the second in a [series of pull requests](#18000) to implement an Index for UTXO set statistics. This pull request adds a Python implementation of Muhash3072, a homomorphic hashing algorithm to be used for hashing the UTXO set. The Python implementation can then be used to compare behavior with the C++ version. ACKs for top commit: jnewbery: utACK 36ec980 laanwj: Code review ACK 36ec980 Tree-SHA512: a3519c6e11031174f1ae71ecd8bcc7f3be42d7fc9c84c77f2fbea7cfc5ad54fcbe10b55116ad8d9a52ac5d675640eefed3bf260c58a02f2bf3bc0d8ec208baa6
36ec980 test: Add chacha20 test vectors in muhash (Fabian Jahr) 0e2b400 test: Add basic Python/C++ Muhash implementation parity unit test (Fabian Jahr) b85543c test: Add Python MuHash3072 implementation to test framework (Pieter Wuille) ab30cec test: Move modinv to util and add unit test (Fabian Jahr) Pull request description: This is the second in a [series of pull requests](bitcoin#18000) to implement an Index for UTXO set statistics. This pull request adds a Python implementation of Muhash3072, a homomorphic hashing algorithm to be used for hashing the UTXO set. The Python implementation can then be used to compare behavior with the C++ version. ACKs for top commit: jnewbery: utACK 36ec980 laanwj: Code review ACK 36ec980 Tree-SHA512: a3519c6e11031174f1ae71ecd8bcc7f3be42d7fc9c84c77f2fbea7cfc5ad54fcbe10b55116ad8d9a52ac5d675640eefed3bf260c58a02f2bf3bc0d8ec208baa6
PR 19105 is merged |
9815332 test: Change MuHash Python implementation to match cpp version again (Fabian Jahr) 01297fb fuzz: Add MuHash consistency fuzz test (Fabian Jahr) b111410 test: Add MuHash3072 fuzz test (Fabian Jahr) c122527 bench: Add Muhash benchmarks (Fabian Jahr) 7b12422 test: Add MuHash3072 unit tests (Fabian Jahr) adc708c crypto: Add MuHash3072 implementation (Fabian Jahr) 0b4d290 crypto: Add Num3072 implementation (Fabian Jahr) 589f958 build: Check for 128 bit integer support (Fabian Jahr) Pull request description: This is the first split of #18000 which implements the Muhash algorithm and uses it to calculate the UTXO set hash in `gettxoutsetinfo`. ACKs for top commit: laanwj: Code review ACK 9815332 Tree-SHA512: 4bc090738f0e3d80b74bdd8122e24a8ce80121120fd37c7e4335a73e7ba4fcd7643f2a2d559e2eebf54b8e3a3bd5f12cfb27ba61ded135fda210a07a233eae45
9815332 test: Change MuHash Python implementation to match cpp version again (Fabian Jahr) 01297fb fuzz: Add MuHash consistency fuzz test (Fabian Jahr) b111410 test: Add MuHash3072 fuzz test (Fabian Jahr) c122527 bench: Add Muhash benchmarks (Fabian Jahr) 7b12422 test: Add MuHash3072 unit tests (Fabian Jahr) adc708c crypto: Add MuHash3072 implementation (Fabian Jahr) 0b4d290 crypto: Add Num3072 implementation (Fabian Jahr) 589f958 build: Check for 128 bit integer support (Fabian Jahr) Pull request description: This is the first split of bitcoin#18000 which implements the Muhash algorithm and uses it to calculate the UTXO set hash in `gettxoutsetinfo`. ACKs for top commit: laanwj: Code review ACK 9815332 Tree-SHA512: 4bc090738f0e3d80b74bdd8122e24a8ce80121120fd37c7e4335a73e7ba4fcd7643f2a2d559e2eebf54b8e3a3bd5f12cfb27ba61ded135fda210a07a233eae45
e987ae5 test: Add test for deterministic UTXO set hash results (Fabian Jahr) 6ccc8fc test: Add test for gettxoutsetinfo RPC with MuHash (Fabian Jahr) 0d3b2f6 rpc: Add hash_type MUHASH to gettxoutsetinfo (Fabian Jahr) 2474645 refactor: Separate hash and stats calculation in coinstats (Fabian Jahr) a1fccea refactor: Improve encapsulation between MuHash3072 and Num3072 (Fabian Jahr) Pull request description: This is another Pr in the series PRs for Coinstatsindex (see overview in #18000). This PR adds the `hash_type` option `muhash` to `gettxoutsetinfo` through which the user can calculate the serialized muhash of the utxo set. This PR does not use the index yet. ACKs for top commit: Sjors: tACK e987ae5 achow101: ACK e987ae5 jonatack: Tested re-ACK e987ae5 per `git diff 3506d90 e987ae5`, reviewed diff, debug built, ran gettxoutsetinfo -signet and help on this branch vs master, at height 23127 both returned `hash_serialized_2` of `2b72d65f3b6efb2311f58374ea2b939abf49684d44f4bafda45faa3b5452a454` and this branch returned `muhash` of `c9f1ff12d345ccf9939c6bbf087e6f7399b6115adee1569287e9c5c43dbb475c` ryanofsky: Code review ACK e987ae5. Looks very good. I left one suggestion to simplify code, but feel free to ignore it here and maybe consider it for later since PR has already had a lot of review. Tree-SHA512: 9a739ce375e73749fa69a467262b60d3e5314ef384e2d7150b3bbc8e4125cd9fd1db95306623bb9a632fcbaf5d9d2bf2f5cc43bf717d4ff5e2c9c4b52dd9296c
e987ae5 test: Add test for deterministic UTXO set hash results (Fabian Jahr) 6ccc8fc test: Add test for gettxoutsetinfo RPC with MuHash (Fabian Jahr) 0d3b2f6 rpc: Add hash_type MUHASH to gettxoutsetinfo (Fabian Jahr) 2474645 refactor: Separate hash and stats calculation in coinstats (Fabian Jahr) a1fccea refactor: Improve encapsulation between MuHash3072 and Num3072 (Fabian Jahr) Pull request description: This is another Pr in the series PRs for Coinstatsindex (see overview in bitcoin#18000). This PR adds the `hash_type` option `muhash` to `gettxoutsetinfo` through which the user can calculate the serialized muhash of the utxo set. This PR does not use the index yet. ACKs for top commit: Sjors: tACK e987ae5 achow101: ACK e987ae5 jonatack: Tested re-ACK e987ae5 per `git diff 3506d90 e987ae5`, reviewed diff, debug built, ran gettxoutsetinfo -signet and help on this branch vs master, at height 23127 both returned `hash_serialized_2` of `2b72d65f3b6efb2311f58374ea2b939abf49684d44f4bafda45faa3b5452a454` and this branch returned `muhash` of `c9f1ff12d345ccf9939c6bbf087e6f7399b6115adee1569287e9c5c43dbb475c` ryanofsky: Code review ACK e987ae5. Looks very good. I left one suggestion to simplify code, but feel free to ignore it here and maybe consider it for later since PR has already had a lot of review. Tree-SHA512: 9a739ce375e73749fa69a467262b60d3e5314ef384e2d7150b3bbc8e4125cd9fd1db95306623bb9a632fcbaf5d9d2bf2f5cc43bf717d4ff5e2c9c4b52dd9296c
fs::path path = GetDataDir() / "indexes" / "coinstats"; | ||
fs::create_directories(path); | ||
|
||
m_db = MakeUnique<CoinStatsIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use std::make_unique
in new code.
I don't mean to spam the PR, but the chart was updated incorrectly. PR 19055 is merged but marked as open Thank you for all your work! |
Thanks, I was too hastily. Fixed. |
5f96d7d rpc: gettxoutsetinfo rejects hash_serialized_2 for specific height (Fabian Jahr) 23fe504 test: Add test for coinstatsindex behavior in reorgs (Fabian Jahr) 90c966b rpc: Allow gettxoutsetinfo and getblockstats for stale blocks (Fabian Jahr) b936239 index, rpc: Add use_index option for gettxoutsetinfo (Fabian Jahr) bb7788b test: Test coinstatsindex robustness across restarts (Fabian Jahr) e0938c2 test: Add tests for block_info in gettxoutsetinfo (Fabian Jahr) 2501576 rpc, index: Add verbose amounts tracking to Coinstats index (Fabian Jahr) 655d929 test: add coinstatsindex getindexinfo coverage, improve current tests (Jon Atack) ca01bb8 rpc: Add Coinstats index to getindexinfo (Fabian Jahr) 57a026c test: Add unit test for Coinstats index (Fabian Jahr) 6a4c0c0 test: Add functional test for Coinstats index (Fabian Jahr) 3f166ec rpc: gettxoutsetinfo can be requested for specific blockheights (Fabian Jahr) 3c914d5 index: Coinstats index can be activated with command line flag (Fabian Jahr) dd58a4d index: Add Coinstats index (Fabian Jahr) a8a46c4 refactor: Simplify ApplyStats and ApplyHash (Fabian Jahr) 9c8a265 refactor: Pass hash_type to CoinsStats in stats object (Fabian Jahr) 2e2648a crypto: Make MuHash Remove method efficient (Fabian Jahr) Pull request description: This is part of the coinstats index project tracked in #18000 While the review of the new UTXO set hash algorithm (MuHash) takes longer recently #19328 was merged which added the possibility to run `gettxoutsetinfo` with a specific hash type. As the first type it added `hash_type=none` which skips the hashing of the UTXO set altogether. This alone did not make `gettxoutsetinfo` much faster but it allows the use of an index for the remaining coin statistics even before a new hashing algorithm has been added. Credit to Sjors for the idea to take this intermediate step. Features summary: - Users can start their node with the option `-coinstatsindex` which syncs the index in the background - After the index is synced the user can use `gettxoutsetinfo` with `hash_type=none` or `hash_type=muhash` and will get the response instantly out of the index - The user can specify a height or block hash when calling `gettxoutsetinfo` to see coin statistics at a specific block height ACKs for top commit: Sjors: re-tACK 5f96d7d jonatack: Code review re-ACK 5f96d7d per `git range-diff 13d27b4 07201d3 5f96d7d` promag: Tested ACK 5f96d7d. Light code review ACK 5f96d7d. Tree-SHA512: cbca78bee8e9605c19da4fbcd184625fb280200718396c694a56c7daab6f44ad23ca9fb5456d09f245d8b8d9659fdc2b3f3ce5e953c1c6cf4003dbc74c0463c2
Summary: > This is another intermediate part of the Coinstats Index (tracked in [[bitcoin/bitcoin#18000 | core#18000]]). > > Sjors suggested here that the part of the changes in #19145 that don't rely on the new hash_type muhash, i.e. that are for hash_type=none, could be merged separately from everything involving muhash. So these changes are extracted from #19145 here and can be merged without any other requirements. > > Building the index with no UTXO set hash is still valuable because gettxoutsetinfo can still be used to audit the total_amount for example. By itself this PR is not a huge improvement, hash_type=none is speeding up gettxoutsetinfo by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer. This is a backport of [[bitcoin/bitcoin#19328 | core#19328]] [1/4] bitcoin/bitcoin@605884e Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D9978
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closing this as almost everything that was aimed for has been implemented by now. Thanks to everyone who supported this project with advice, code reviews, or just general words of encouragement! There are still two open PRs that are related and could use some review:
|
This PR will not be updated anymore because the project is now split up into multiple pull requests. But I will use it to keep track of the projects PRs:
Since it's starting to get confusing I am trying to visualize the dependencies of the open PRs in this depency graph:
This implements an index of coin statistics with the goal of making the response time of the
gettxoutsetinfo
RPC call dramatically faster. Currently, this RPC is scanning the full UTXO set every time it is called which makes it hard to use for users that want to continually check the coin supply or compare UTXO set hashes between different nodes. It is especially challenging in periods of multiple quickly mined blocks, even relatively fast machines.Implementation overview:
gettxoutsetinfo
to scan the UTXO set-coinstatsindex
Todos/Open questions:
txindex
and the transactions count from itThis is an extension to my rolling UTXO set hash proposal from last year.
Edit: 3c5c1ca should probably be squashed into the prior commits but I wanted to leave sipa's commits unchanged for the start.