Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Index for UTXO Set Statistics #18000

Closed
wants to merge 13 commits into from

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Jan 24, 2020

This PR will not be updated anymore because the project is now split up into multiple pull requests. But I will use it to keep track of the projects PRs:

Content PR Status Review Club
Add Muhash and SHA256Writer (unused) #19055 Merged notes
Add Muhash Python implementation (unused) #19105 Merged notes
Add hash_type NONE (Muhash code independent) #19328 Merged -
Add hash_type MUHASH #19145 Merged ?
Add ASM optimizations for MuHash #19181 Open ?
Add CoinStatsIndex #19521 Merged ?
Parallelize gettxoutsetinfo
Remove legacy UTXO set hash (future)

Since it's starting to get confusing I am trying to visualize the dependencies of the open PRs in this depency graph:

+--------------+   +-------------+   +----------+
|#19328        |   |#19105       |   |#19055    |
|hash_type NONE|   |Muhash Python|   |Muhash CPP|
+----+-----+---+   +------+------+   +-+----+---+
     ^     ^              ^            ^    ^
     |     |              |            |    |
     |     |    +---------+------+     |    |
     |     +----+#19145          +-----+    |
     |          |hash_type MUHASH|          |
     |          +----------------+          |
     |                               +------+---+
+----+--------+                      |#19181    |
|#19521       |                      |MuHash ASM|
|No hash Index|                      +----------+
+-------------+


This implements an index of coin statistics with the goal of making the response time of the gettxoutsetinfo RPC call dramatically faster. Currently, this RPC is scanning the full UTXO set every time it is called which makes it hard to use for users that want to continually check the coin supply or compare UTXO set hashes between different nodes. It is especially challenging in periods of multiple quickly mined blocks, even relatively fast machines.

Implementation overview:

  • The current serialization of the UTXO set for the purpose of hashing is changed based on sipa's concept in proposed on the mailing list in 2017 and [WIP] 3072-bit MuHash based hash_serialized #10434
  • The hashing algorithm in for the UTXO set is changed to Muhash, which was also implemented by sipa
  • CoinStatsIndex is added which keeps an index of all values that would require gettxoutsetinfo to scan the UTXO set
  • CoinStatsIndex can be activated through the flag -coinstatsindex

Todos/Open questions:

  • The transactions count is currently not implemented as it seems not knowable ex-post without running a full IBD to build up the index. I am looking for a solution to this. Ideas are:
    • Require txindex and the transactions count from it
    • Remove transactions count or mark it as unreliable in a different way if coinstatsindex is enabled
  • The transactions count question is also interesting because it seems to be the only obstacle to providing access to historical coin statistics, which may be a nice follow-up feature if transaction counts can be ignored somehow
  • More benchmarking to evaluate potential switch from Muhash to ECMH as the hashing algorithm
  • IBD benchmarking with the index enabled
  • Tests could be improved

This is an extension to my rolling UTXO set hash proposal from last year.

Edit: 3c5c1ca should probably be squashed into the prior commits but I wanted to leave sipa's commits unchanged for the start.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 25, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fjahr fjahr force-pushed the utxo-stats-index-rebase branch 6 times, most recently from 0376a22 to 663dbfb Compare January 27, 2020 15:21
@cvengler
Copy link
Contributor

I'm reviewing and testing it right now but from some benchmarks I did, I saw how the time goes down by 50%. However the performance increases by 19% percent.

Regtest results only right now

On master (ef8e2ce):

$ time src/bitcoin-cli --regtest gettxoutsetinfo
...
> src/bitcoin-cli --regtest gettxoutsetinfo  0,00s user 0,00s system 23% cpu 0,014 total

With this PR

$ time src/bitcoin-cli --regtest gettxoutsetinfo
...
> src/bitcoin-cli --regtest gettxoutsetinfo  0,00s user 0,00s system 42% cpu 0,007 total

@ryanofsky
Copy link
Contributor

Can you say more about current use cases of the gettxoutsetinfo function and maybe future use cases if it's significantly faster with this change?

@sipa
Copy link
Member

sipa commented Jan 27, 2020

I think it's perfectly fine to drop the "transactions" statistics. It doesn't really mean anything; it just happened to be easy to compute pre-pertxout, but even now it's mostly a hack.

@fjahr
Copy link
Contributor Author

fjahr commented Jan 27, 2020

@emilengler Thanks for giving it a try! But I think the benchmarks you are comparing are not well suited in this case. I assume you are running a standard regtest network, maybe generated a few hundred blocks and made some transactions by hand. In this case the UTXO set is very small and iterating over it will be extremely fast, so that it would not surprise me if there is no measurable upside to using the index (maybe it will be even a little slower as you saw). But the goal of this index is that the time of the call will stay constant for a realistically sized UTXO set. So a better indicator would be to try it on a synced mainnet or testnet node, where gettxoutset currently takes several minutes without the index, even on pretty high powered hardware.

@ryanofsky Sure, I will go along the different statistics the call provides to highlight which ones I think are more or less important:

  • total_amount The total amount of the current bitcoin supply. Tracking this number on a continual basis allows users to detect potential inflation bugs, such as CVE-2018-17144 (Example: Bitmex Research). I think all users should be given the chance to continually check this, even on more affordable hardware. I am currently running a mainnet node on an Odroid HC2 and gettxoutsetinfo takes 7min 50s. Specs are roughly comparable to a Raspberry Pi 4, so it can keep up well with the chain otherwise. If I wanted to run a continual inflation detection on this hardware, I would not be able to do so when the UTXO set has grown another 20-25% (ignoring hashrate fluctuations).
  • hash_serialized_2 The serialized hash of the UTXO set can be used for a quick comparison of the UTXO set between different notes. Essentially, it can be a stand-in for the latest blockhash and compared for sanity checks. BTCPayServer has been using it for their FastSync feature, which will probably be replaced with assume_utxo, where a UTXO set hash (not necessarily this one) will play a role as well. I still need to sync up with @jamesob about the latest plans for the hash and how this proposal can be compatible with his or even help/support it.
  • txouts The total number of unspent outputs is an interesting number to track from an operations standpoint. Users can see if coins are being consolidated or at which rate the UTXO set is growing. Example: satoshi.info
  • bogosize and disk_size might be also interesting to track but not as actionable, disc usage is probably much more efficient to collect from the OS
  • transactions it seems to be a nice-to-have metric that does not serve any real purpose afaict, as @sipa also noted
  • height and bestblock can be ignored since they are available from other RPCs, but are necessary to compare numbers between different machines

For future use cases (aside from integration with assume_utxo potentially) I don't have specific ideas yet, but it opens the door to track any other statistics of the UTXO set or block content in the index. So it may collect further metrics of interest where users currently parse the block data with their own custom software.

@cvengler
Copy link
Contributor

@fjahr I'm a bit confused now, I ran the tests again on a full chain and master is MUCH faster than this Pull Request (4 MINUTES faster)
On master (2755b2b)

$ time src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo
{
  "height": 614943,
  "bestblock": "0000000000000000000fa03a2ce4c6d7fb763e55402deac32371b898e8faba5e",
  "transactions": 38747934,
  "txouts": 65245740,
  "bogosize": 4903711195,
  "hash_serialized_2": "011d3eccd148b68a3bf039f0695bb4d77a23ba208dcd4d8ac073d867fd5bbef6",
  "disk_size": 4035082036,
  "total_amount": 18186617.32125282
}
src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo  0,00s user 0,01s system 0% cpu 57,428 total

On fjahr:utxo-stats-index-rebase (663dbfb)

$ time src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo
{
  "height": 614937,
  "bestblock": "00000000000000000005973a14ac338b681e410922c7566e435107db1b1d807b",
  "transactions": 38745882,
  "txouts": 65244661,
  "bogosize": 4903634025,
  "utxo_set_hash": "87d3e7f807d6b428bf60e983a73137ef410cb510de1b5f4e650e081d044fc772",
  "disk_size": 4082447626,
  "total_amount": 18186542.32125282
}
src/bitcoin-cli --datadir=/media/emil/Files/Bitcoin/ gettxoutsetinfo  0,01s user 0,00s system 0% cpu 5:01,42 total

@ryanofsky
Copy link
Contributor

@ryanofsky Sure, I will go along the different statistics the call provides to highlight which ones I think are more or less important:

Thanks, this information is really clarifying. I wonder if some of it could be incorporated in the reference documentation (without mentioning specific CVE's or software projects), or maybe if it would be useful as part of some wiki. In any case, this definitely helps motivate the PR, so concept ACK from me

@fjahr
Copy link
Contributor Author

fjahr commented Jan 30, 2020

@emilengler did you actually start bitcoind with -coinstatsindex? I mention it briefly in the PR description, but as @ryanofsky already pointed out as well, I need to do a much better job at documenting the intent and current state of the proposal. It seems so because otherwise you could not test it that quickly, building up the index takes a quite long time and while the index is building up gettxoutsetinfo is responding with "unable to read UTXO set". In fact, on my machine, it is currently so slow that I might have to address it. But running a plain IBD is also incredibly slow right now so it could also be a local issue. What is probably happening is that you are running bitcoind without the flag and so the stats are still calculated by iterating over the UTXO set but the hash is using the new hashing algorithm (Muhash), which is slower and results in slower performance when not using the index.

@cvengler
Copy link
Contributor

@emilengler did you actually start bitcoind with -coinstatsindex?

no, sorry oversaw that

@cvengler
Copy link
Contributor

Hmmm I get this error message now:

error code: -32603
error message:
Unable to read UTXO set

the chain state is about ~2 days old

@fjahr
Copy link
Contributor Author

fjahr commented Jan 30, 2020

Hmmm I get this error message now:

error code: -32603
error message:
Unable to read UTXO set

the chain state is about ~2 days old

Yeah, that means the index is syncing at the moment. You should see the progress in the logs of bitcoind. Something like Syncing coinstatsindex with block chain from height XXXXX.

Copy link
Contributor

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • benchmark: have you run gettxoutsetinfo with-and-without this PR on few UTXO sets across the years to check than performance holds linearly with utxo set growth (or a least give an intuition of it) ?

  • assume-utxo: As security model of assume-utxo lays on public audit of a hardcoded hash of a utxo set, lowering the bar to compute it will increase review process trustworthiness with a higher number of participants. So IMO this point is worthy enough to Concept ACK this proposal (though @jamesob may already have work here?)

Quickly skimmed over the new code, it's clean enough if we want later to memory-isolate indexes in their own space. Don't have opinion on either MuHash or ECMH (but no lattice).

src/node/coinstats.cpp Outdated Show resolved Hide resolved
src/index/coinstatsindex.cpp Show resolved Hide resolved
src/index/coinstatsindex.cpp Outdated Show resolved Hide resolved
@fjahr fjahr marked this pull request as ready for review March 6, 2020 23:08
@fjahr fjahr changed the title [WIP] Coin Statistics Index Index for Coin Statistics Mar 6, 2020
@fjahr
Copy link
Contributor Author

fjahr commented Mar 6, 2020

First of all, thanks for taking early looks and comments @ariard , @ryanofsky, @emilengler , @jnewbery (offline). Your comments should be mostly addressed in the latest code changes and in further information below. With the latest fixes and performance improvements, this should be now ready for real review and testing.

Short recap of changes (this changed slightly)

  • Introduces MuHash3072 and TruncatedSHA256Writer implementations (code by Pieter Wuille)
  • Uses above for MuHash hashing alogrithm for the UTXO set hash in gettxoutsetinfo
  • Removed tracking of the transactions count statistic
  • Introduces CoinStatsIndex class, an index for all the statistics in gettxoutsetinfo
  • Allows activating the index using the flag -coinstatsindex. When activated gettxoutsetinfo responds within seconds after the initial sync phase. Otherwise it works as before but will be slower to parse the UTXO set due to the change in the hashing algorithm.

Benchmarking

  • For reference, on my benchmarking server and my local machine, IBD takes around 18-19 hours
  • I can provide more detailed numbers if required but so far I am pretty confident that it's not a bottleneck even for users on under-powered nodes.

How long does the sync of the CoinStatsIndex take?

  • On my benchmarking server syncing the Coinstatsindex after IBD took between 8.5 and 9.5h hours
  • On my local machine, I also synced testnet and it took 8h 20min.

How about the growth of the UTXO set (ariard's question above)?

  • For the sync time of the index should actually not make much of a difference if the UTXO set shrinks or grows, it should rather grow linearly with the size of the blockchain.
  • Instead, growth of the UTXO set causes gettxoutsetinfo without the index to run longer because it iterates over it every time and so it is one of the reasons why I think the index is worth integrating into core.

TBD

  • Although its a lot of LOC I think the changes are still ok to be reviewed as one PR but if people would prefer me to split it into two or three smaller PRs that is easily doable.
  • The newly introduced metric coins count that is only used in Assume UTXO is not in the index and I don't think there is a practical way to integrate it (similar problem to transaction count). That means it will probably not be possible to have this index running and create a new assume utxo set on the same node at the same time. I will need to discuss this with @jamesob to make sure this dealt with correctly one Assume UTXO is finished.

@fjahr
Copy link
Contributor Author

fjahr commented Mar 7, 2020

Rebased

@Sjors
Copy link
Member

Sjors commented Jun 9, 2020

In #19145 (missing in OP) you're adding a hash_type option to gettxoutsetinfo, which can be none. This makes an index immediately useful, even without MuHash. So you make a PR for that independently.

Adding MuHash to the index later might be annoying, however if the plan is to support different hash types, it sounds like MuHash should be in a separate index from the coin stats anyway?

@fjahr
Copy link
Contributor Author

fjahr commented Jun 16, 2020

In #19145 (missing in OP) you're adding a hash_type option to gettxoutsetinfo, which can be none. This makes an index immediately useful, even without MuHash. So you make a PR for that independently.

Adding MuHash to the index later might be annoying, however if the plan is to support different hash types, it sounds like MuHash should be in a separate index from the coin stats anyway?

I needed to reflect a bit on this but now do I think it would be good to offer the index already without the hash. But I think to keep the hash in a different index and letting users set these with separate settings might be too much responsibility for the users. I would suggest a more opinionated approach. Rather than having -coinstatsindex=none and -coinstatsindex=muhash later I would suggest it stays -coinstatsindex and at first there would be no hash and users would only get a fast response when they use hash_type=none, then muhash can be added to the index and users will then be receiving responses from the index with hash_type=none and hash_type=muhash. Personally I think we don't want to have a lot of hash types, too much unnecessary choice for the user. I think we use this now to support hash_serialized_2 for some time and maybe also to run ECMH in parallel but ideally we should be ending up at one hash_type that is the default and most of the other options can probably be removed.

I will prepare the alternate PRs with only hash_type=none and I hope I can make so people are not totally confused by all the different PRs. :)

maflcko pushed a commit that referenced this pull request Jul 6, 2020
40506bf test: Test gettxouttsetinfo hash_type option (Fabian Jahr)
f17a4d1 rpc: Add hash_type NONE to gettxoutsetinfo (Fabian Jahr)
a712cf6 rpc: gettxoutsetinfo can specify hash_type (only legacy option for now) (Fabian Jahr)
605884e refactor: Extract GetBogoSize function (Fabian Jahr)

Pull request description:

  This is another intermediate part of the Coinstats Index (tracked in #18000).

  Sjors suggested [here](#18000 (comment)) that the part of the changes in #19145 that don't rely on the new `hash_type` muhash, i.e. that are for `hash_type=none`, could be merged separately from everything involving muhash. So these changes are extracted from #19145 here and can be merged without any other requirements.

  Building the index with no UTXO set hash is still valuable because `gettxoutsetinfo` can still be used to audit the `total_amount` for example. By itself this PR is not a huge improvement, `hash_type=none` is speeding up `gettxoutsetinfo` by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer.

ACKs for top commit:
  MarcoFalke:
    ACK 40506bf 🖨
  Sjors:
    tACK 40506bf

Tree-SHA512: 3964c2b8eed427511b1aa9b2ef285dff27dc4d1537d72c3911e435b6e6b40912232da4acb3a09bd19a0372ddffa44103388d8a650169d95a4a727b970d210add
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 8, 2020
40506bf test: Test gettxouttsetinfo hash_type option (Fabian Jahr)
f17a4d1 rpc: Add hash_type NONE to gettxoutsetinfo (Fabian Jahr)
a712cf6 rpc: gettxoutsetinfo can specify hash_type (only legacy option for now) (Fabian Jahr)
605884e refactor: Extract GetBogoSize function (Fabian Jahr)

Pull request description:

  This is another intermediate part of the Coinstats Index (tracked in bitcoin#18000).

  Sjors suggested [here](bitcoin#18000 (comment)) that the part of the changes in bitcoin#19145 that don't rely on the new `hash_type` muhash, i.e. that are for `hash_type=none`, could be merged separately from everything involving muhash. So these changes are extracted from bitcoin#19145 here and can be merged without any other requirements.

  Building the index with no UTXO set hash is still valuable because `gettxoutsetinfo` can still be used to audit the `total_amount` for example. By itself this PR is not a huge improvement, `hash_type=none` is speeding up `gettxoutsetinfo` by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer.

ACKs for top commit:
  MarcoFalke:
    ACK 40506bf 🖨
  Sjors:
    tACK 40506bf

Tree-SHA512: 3964c2b8eed427511b1aa9b2ef285dff27dc4d1537d72c3911e435b6e6b40912232da4acb3a09bd19a0372ddffa44103388d8a650169d95a4a727b970d210add
This was referenced Jul 14, 2020
laanwj added a commit that referenced this pull request Aug 20, 2020
124e1ee doc: Add release notes for getindexinfo RPC (Fabian Jahr)
c447b09 test: Add tests for getindexinfo RPC (Fabian Jahr)
667bc7a rpc: Add getindexinfo RPC (Fabian Jahr)

Pull request description:

  As I was playing with indices a I was missing an RPC that gives information about the active indices in the node. I think this can be helpful for many users, especially since there are some new index candidates coming up (#14053, #18000) that can give a quick overview without the user having to parse the logs.

  Feature summary:
  - Adds new RPC `listindices` (placed in Util section)
  - That RPC only lists the actively running indices
  - For each index it gives the name, whether it is synced and up to which block height it is synced

ACKs for top commit:
  laanwj:
    Re-ACK 124e1ee
  jonatack:
    Code review re-ACK 124e1ee per `git range-diff a57af89 47a5372 124e1ee` no change since my last re-ACK, rebase only

Tree-SHA512: 3b7174c87951e6457fef099f530337803906baf32fb64261410b8def2c0917853d6a1bf3059cd590b1cc1523608f8916dafb327a431d27ecbf8d7454406b5b35
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 20, 2020
124e1ee doc: Add release notes for getindexinfo RPC (Fabian Jahr)
c447b09 test: Add tests for getindexinfo RPC (Fabian Jahr)
667bc7a rpc: Add getindexinfo RPC (Fabian Jahr)

Pull request description:

  As I was playing with indices a I was missing an RPC that gives information about the active indices in the node. I think this can be helpful for many users, especially since there are some new index candidates coming up (bitcoin#14053, bitcoin#18000) that can give a quick overview without the user having to parse the logs.

  Feature summary:
  - Adds new RPC `listindices` (placed in Util section)
  - That RPC only lists the actively running indices
  - For each index it gives the name, whether it is synced and up to which block height it is synced

ACKs for top commit:
  laanwj:
    Re-ACK 124e1ee
  jonatack:
    Code review re-ACK 124e1ee per `git range-diff a57af89 47a5372 124e1ee` no change since my last re-ACK, rebase only

Tree-SHA512: 3b7174c87951e6457fef099f530337803906baf32fb64261410b8def2c0917853d6a1bf3059cd590b1cc1523608f8916dafb327a431d27ecbf8d7454406b5b35
laanwj added a commit that referenced this pull request Sep 1, 2020
36ec980 test: Add chacha20 test vectors in muhash (Fabian Jahr)
0e2b400 test: Add basic Python/C++ Muhash implementation parity unit test (Fabian Jahr)
b85543c test: Add Python MuHash3072 implementation to test framework (Pieter Wuille)
ab30cec test: Move modinv to util and add unit test (Fabian Jahr)

Pull request description:

  This is the second in a [series of pull requests](#18000) to implement an Index for UTXO set statistics.

  This pull request adds a Python implementation of Muhash3072, a homomorphic hashing algorithm to be used for hashing the UTXO set. The Python implementation can then be used to compare behavior with the C++ version.

ACKs for top commit:
  jnewbery:
    utACK 36ec980
  laanwj:
    Code review ACK 36ec980

Tree-SHA512: a3519c6e11031174f1ae71ecd8bcc7f3be42d7fc9c84c77f2fbea7cfc5ad54fcbe10b55116ad8d9a52ac5d675640eefed3bf260c58a02f2bf3bc0d8ec208baa6
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 3, 2020
36ec980 test: Add chacha20 test vectors in muhash (Fabian Jahr)
0e2b400 test: Add basic Python/C++ Muhash implementation parity unit test (Fabian Jahr)
b85543c test: Add Python MuHash3072 implementation to test framework (Pieter Wuille)
ab30cec test: Move modinv to util and add unit test (Fabian Jahr)

Pull request description:

  This is the second in a [series of pull requests](bitcoin#18000) to implement an Index for UTXO set statistics.

  This pull request adds a Python implementation of Muhash3072, a homomorphic hashing algorithm to be used for hashing the UTXO set. The Python implementation can then be used to compare behavior with the C++ version.

ACKs for top commit:
  jnewbery:
    utACK 36ec980
  laanwj:
    Code review ACK 36ec980

Tree-SHA512: a3519c6e11031174f1ae71ecd8bcc7f3be42d7fc9c84c77f2fbea7cfc5ad54fcbe10b55116ad8d9a52ac5d675640eefed3bf260c58a02f2bf3bc0d8ec208baa6
@jnewbery
Copy link
Contributor

PR 19105 is merged

laanwj added a commit that referenced this pull request Jan 7, 2021
9815332 test: Change MuHash Python implementation to match cpp version again (Fabian Jahr)
01297fb fuzz: Add MuHash consistency fuzz test (Fabian Jahr)
b111410 test: Add MuHash3072 fuzz test (Fabian Jahr)
c122527 bench: Add Muhash benchmarks (Fabian Jahr)
7b12422 test: Add MuHash3072 unit tests (Fabian Jahr)
adc708c crypto: Add MuHash3072 implementation (Fabian Jahr)
0b4d290 crypto: Add Num3072 implementation (Fabian Jahr)
589f958 build: Check for 128 bit integer support (Fabian Jahr)

Pull request description:

  This is the first split of #18000 which implements the Muhash algorithm and uses it to calculate the UTXO set hash in `gettxoutsetinfo`.

ACKs for top commit:
  laanwj:
    Code review ACK 9815332

Tree-SHA512: 4bc090738f0e3d80b74bdd8122e24a8ce80121120fd37c7e4335a73e7ba4fcd7643f2a2d559e2eebf54b8e3a3bd5f12cfb27ba61ded135fda210a07a233eae45
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 7, 2021
9815332 test: Change MuHash Python implementation to match cpp version again (Fabian Jahr)
01297fb fuzz: Add MuHash consistency fuzz test (Fabian Jahr)
b111410 test: Add MuHash3072 fuzz test (Fabian Jahr)
c122527 bench: Add Muhash benchmarks (Fabian Jahr)
7b12422 test: Add MuHash3072 unit tests (Fabian Jahr)
adc708c crypto: Add MuHash3072 implementation (Fabian Jahr)
0b4d290 crypto: Add Num3072 implementation (Fabian Jahr)
589f958 build: Check for 128 bit integer support (Fabian Jahr)

Pull request description:

  This is the first split of bitcoin#18000 which implements the Muhash algorithm and uses it to calculate the UTXO set hash in `gettxoutsetinfo`.

ACKs for top commit:
  laanwj:
    Code review ACK 9815332

Tree-SHA512: 4bc090738f0e3d80b74bdd8122e24a8ce80121120fd37c7e4335a73e7ba4fcd7643f2a2d559e2eebf54b8e3a3bd5f12cfb27ba61ded135fda210a07a233eae45
laanwj added a commit that referenced this pull request Feb 12, 2021
e987ae5 test: Add test for deterministic UTXO set hash results (Fabian Jahr)
6ccc8fc test: Add test for gettxoutsetinfo RPC with MuHash (Fabian Jahr)
0d3b2f6 rpc: Add hash_type MUHASH to gettxoutsetinfo (Fabian Jahr)
2474645 refactor: Separate hash and stats calculation in coinstats (Fabian Jahr)
a1fccea refactor: Improve encapsulation between MuHash3072 and Num3072 (Fabian Jahr)

Pull request description:

  This is another Pr in the series PRs for Coinstatsindex (see overview in #18000). This PR adds the `hash_type` option `muhash` to `gettxoutsetinfo` through which the user can calculate the serialized muhash of the utxo set. This PR does not use the index yet.

ACKs for top commit:
  Sjors:
    tACK e987ae5
  achow101:
    ACK e987ae5
  jonatack:
    Tested re-ACK e987ae5 per `git diff 3506d90 e987ae5`, reviewed diff, debug built, ran gettxoutsetinfo -signet and help on this branch vs master, at height 23127 both returned `hash_serialized_2` of `2b72d65f3b6efb2311f58374ea2b939abf49684d44f4bafda45faa3b5452a454` and this branch returned `muhash` of `c9f1ff12d345ccf9939c6bbf087e6f7399b6115adee1569287e9c5c43dbb475c`
  ryanofsky:
    Code review ACK e987ae5. Looks very good. I left one suggestion to simplify code, but feel free to ignore it here and maybe consider it for later since PR has already had a lot of review.

Tree-SHA512: 9a739ce375e73749fa69a467262b60d3e5314ef384e2d7150b3bbc8e4125cd9fd1db95306623bb9a632fcbaf5d9d2bf2f5cc43bf717d4ff5e2c9c4b52dd9296c
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 12, 2021
e987ae5 test: Add test for deterministic UTXO set hash results (Fabian Jahr)
6ccc8fc test: Add test for gettxoutsetinfo RPC with MuHash (Fabian Jahr)
0d3b2f6 rpc: Add hash_type MUHASH to gettxoutsetinfo (Fabian Jahr)
2474645 refactor: Separate hash and stats calculation in coinstats (Fabian Jahr)
a1fccea refactor: Improve encapsulation between MuHash3072 and Num3072 (Fabian Jahr)

Pull request description:

  This is another Pr in the series PRs for Coinstatsindex (see overview in bitcoin#18000). This PR adds the `hash_type` option `muhash` to `gettxoutsetinfo` through which the user can calculate the serialized muhash of the utxo set. This PR does not use the index yet.

ACKs for top commit:
  Sjors:
    tACK e987ae5
  achow101:
    ACK e987ae5
  jonatack:
    Tested re-ACK e987ae5 per `git diff 3506d90 e987ae5`, reviewed diff, debug built, ran gettxoutsetinfo -signet and help on this branch vs master, at height 23127 both returned `hash_serialized_2` of `2b72d65f3b6efb2311f58374ea2b939abf49684d44f4bafda45faa3b5452a454` and this branch returned `muhash` of `c9f1ff12d345ccf9939c6bbf087e6f7399b6115adee1569287e9c5c43dbb475c`
  ryanofsky:
    Code review ACK e987ae5. Looks very good. I left one suggestion to simplify code, but feel free to ignore it here and maybe consider it for later since PR has already had a lot of review.

Tree-SHA512: 9a739ce375e73749fa69a467262b60d3e5314ef384e2d7150b3bbc8e4125cd9fd1db95306623bb9a632fcbaf5d9d2bf2f5cc43bf717d4ff5e2c9c4b52dd9296c
fs::path path = GetDataDir() / "indexes" / "coinstats";
fs::create_directories(path);

m_db = MakeUnique<CoinStatsIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rspigler
Copy link
Contributor

Chart can be updated as PR #19055 and #19145 have been merged

@fjahr
Copy link
Contributor Author

fjahr commented Apr 24, 2021

Chart can be updated as PR #19055 and #19145 have been merged

Thanks for the reminder, done!

@Rspigler
Copy link
Contributor

I don't mean to spam the PR, but the chart was updated incorrectly.

PR 19055 is merged but marked as open
PR 19181 is open but marked as merged

Thank you for all your work!

@fjahr
Copy link
Contributor Author

fjahr commented Apr 25, 2021

I don't mean to spam the PR, but the chart was updated incorrectly.

PR 19055 is merged but marked as open
PR 19181 is open but marked as merged

Thank you for all your work!

Thanks, I was too hastily. Fixed.

laanwj added a commit that referenced this pull request Apr 30, 2021
5f96d7d rpc: gettxoutsetinfo rejects hash_serialized_2 for specific height (Fabian Jahr)
23fe504 test: Add test for coinstatsindex behavior in reorgs (Fabian Jahr)
90c966b rpc: Allow gettxoutsetinfo and getblockstats for stale blocks (Fabian Jahr)
b936239 index, rpc: Add use_index option for gettxoutsetinfo (Fabian Jahr)
bb7788b test: Test coinstatsindex robustness across restarts (Fabian Jahr)
e0938c2 test: Add tests for block_info in gettxoutsetinfo (Fabian Jahr)
2501576 rpc, index: Add verbose amounts tracking to Coinstats index (Fabian Jahr)
655d929 test: add coinstatsindex getindexinfo coverage, improve current tests (Jon Atack)
ca01bb8 rpc: Add Coinstats index to getindexinfo (Fabian Jahr)
57a026c test: Add unit test for Coinstats index (Fabian Jahr)
6a4c0c0 test: Add functional test for Coinstats index (Fabian Jahr)
3f166ec rpc: gettxoutsetinfo can be requested for specific blockheights (Fabian Jahr)
3c914d5 index: Coinstats index can be activated with command line flag (Fabian Jahr)
dd58a4d index: Add Coinstats index (Fabian Jahr)
a8a46c4 refactor: Simplify ApplyStats and ApplyHash (Fabian Jahr)
9c8a265 refactor: Pass hash_type to CoinsStats in stats object (Fabian Jahr)
2e2648a crypto: Make MuHash Remove method efficient (Fabian Jahr)

Pull request description:

  This is part of the coinstats index project tracked in #18000

  While the review of the new UTXO set hash algorithm (MuHash) takes longer recently #19328 was merged which added the possibility to run `gettxoutsetinfo` with a specific hash type. As the first type it added `hash_type=none` which skips the hashing of the UTXO set altogether. This alone did not make `gettxoutsetinfo` much faster but it allows the use of an index for the remaining coin statistics even before a new hashing algorithm has been added. Credit to Sjors for the idea to take this intermediate step.

  Features summary:
  - Users can start their node with the option `-coinstatsindex` which syncs the index in the background
  - After the index is synced the user can  use `gettxoutsetinfo` with `hash_type=none` or `hash_type=muhash` and will get the response instantly out of the index
  - The user can specify a height or block hash when calling `gettxoutsetinfo` to see coin statistics at a specific block height

ACKs for top commit:
  Sjors:
    re-tACK 5f96d7d
  jonatack:
    Code review re-ACK 5f96d7d per `git range-diff 13d27b4 07201d3 5f96d7d`
  promag:
    Tested ACK 5f96d7d. Light code review ACK 5f96d7d.

Tree-SHA512: cbca78bee8e9605c19da4fbcd184625fb280200718396c694a56c7daab6f44ad23ca9fb5456d09f245d8b8d9659fdc2b3f3ce5e953c1c6cf4003dbc74c0463c2
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Aug 31, 2021
Summary:
> This is another intermediate part of the Coinstats Index (tracked in [[bitcoin/bitcoin#18000 | core#18000]]).
>
> Sjors suggested here that the part of the changes in #19145 that don't rely on the new hash_type muhash, i.e. that are for hash_type=none, could be merged separately from everything involving muhash. So these changes are extracted from #19145 here and can be merged without any other requirements.
>
> Building the index with no UTXO set hash is still valuable because gettxoutsetinfo can still be used to audit the total_amount for example. By itself this PR is not a huge improvement, hash_type=none is speeding up gettxoutsetinfo by about 10%, but it enables the implementation of an index on top of it in a follow-up and that means large parts of the index code of Coinstats Index can be merged while reviews for the hashing algorithm might take longer.

This is a backport of [[bitcoin/bitcoin#19328 | core#19328]] [1/4]
bitcoin/bitcoin@605884e

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9978
@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?
  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@Sjors
Copy link
Member

Sjors commented Dec 15, 2021

@fjahr should this be closed? If #19181 is all that's left, it can just be its own PR.

@fjahr
Copy link
Contributor Author

fjahr commented Dec 19, 2021

Closing this as almost everything that was aimed for has been implemented by now. Thanks to everyone who supported this project with advice, code reviews, or just general words of encouragement!

There are still two open PRs that are related and could use some review:

@fjahr fjahr closed this Dec 19, 2021
@bitcoin bitcoin locked and limited conversation to collaborators Dec 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.