p2p, validation: Don't download witnesses for assumed-valid blocks when running in prune mode #27050
Conversation
src/net_processing.cpp
Outdated
uint32_t PeerManagerImpl::GetBlockFetchFlags(const Peer& peer, const CBlockIndex* index) const
{
    uint32_t fetch_flags = GetFetchFlags(peer);
    if (m_chainman.m_blockman.IsPruneMode() && WITH_LOCK(m_chainman.GetMutex(), return m_chainman.IsAssumedValid(index))) {
Can we also check if we are after the first segwit block? The blocks before have no witness data so don't require any more bandwidth, and then peers can still serve directly from disk.
Yes I think that would be possible, although I wonder if that should happen on the server side (That is, have the server always serve pre-segwit blocks straight from disk). It doesn't seem like it makes sense to support serving pre-segwit blocks not directly from disk.
Concept ACK
Related SE discussion: https://bitcoin.stackexchange.com/questions/117057/why-is-witness-data-downloaded-during-ibd-in-prune-mode
Concept ACK
As @sipa pointed out on Stack Exchange:
So it shifts the meaning of (our baked in)
Do you mean serving nodes that don't have the witness data? Or do you mean nodes that do (but that's already the case then)? For nodes that come fresh out of IBD we should carefully test the behavior before and after the assumevalid drops below the prune height. Because at that point we should signal
One potential issue with this is that when we write blocks to disk, we would be writing them without witness data. I think this could create a problem if you shut down a node in the middle of IBD and restart it with a different assumevalid value (such as disabling assumevalid) -- we would think we have a block that is invalid, because we're missing the witnesses. While it's an interesting thought experiment to see how much work would be required to optimize for this use case, I'm a bit skeptical that the complexity of implementing this optimization is worth it. But if you keep working on it and get to something that you think is robust I would be curious to see what it looks like.
I'm not sure skipping size checks is substantially different from skipping the script evaluation for security purposes. But I also tend to agree with @sdaftuar that the complexity needed here probably isn't worth it. Additionally, some thought would need to be given to privacy implications (we already identify version explicitly, but do we want to make full history validation an externally-visible attribute?).
Good point; it may require some flag in the block index to note "valid modulo the possible lack of witness data".
It's a pretty substantial savings in IBD bandwidth; in the last 100000 blocks, 43% of the serialized size is in witness data (47.8 GiB out of 110.6 GiB). Maybe that makes it worth looking into more?
Agree. I think it's important to note that it's a change in what constitutes assumevalid, but it's not like it's harder to run validation for the new set of things covered by it (in fact, it's hard to just check script validity without checking the rest...).
The change only affects IBD, and only affects blocks that won't be available to peers anymore once IBD is over (even in the scenario of interrupting and changing assumevalid in between). Are there other privacy concerns you're worried about?
Well I was more focused on my perception of the complexity than the benefit I guess, but you may be right. I have a number of thoughts that likely could be addressed if we can think about these issues clearly enough, but some things to consider:
When I first started thinking about this, I was contemplating that assumeutxo would be where we should spend our engineering resources, but now that I think a little more, maybe this feature (if we can design it properly) would compose nicely with assumeutxo (specifically around how much background validation happens when a node downloads the chain from scratch -- maybe we could allow users to have different levels of validation like we do today).
Frankly, we should probably have a way to indicate this for every softfork anyway. And in that sense, it probably makes more sense as a per-rule-introduction range of blocks that have been checked, rather than a flag on each block index entry. But that may be complicated due to the existence of multiple conflicting chains. (It might be possible to "cheat" on this issue for now and simply require those blocks to get pruned immediately.)
My thought is that during IBD, your peers will know you're doing an assumevalid IBD. This might actually rise to the level of a security issue since they then know they might be able to get away with feeding you blocks with invalid witnesses... |
@luke-jr wrote:
I don't believe so, because up until you've synced headers beyond the assumevalid block, your node is indistinguishable. Once you have the headers, your node has the initiative and is going to ask for specific blocks. (And you don't need the witness data itself to check the block hash.)
Concept ACK
My comment was about serving nodes that store witnesses, because we do not support an archival full node mode that doesn't keep the witnesses.
Yes, even currently, if asked for blocks without witnesses then we need to read, un-serialize and re-serialize without the witnesses. I was trying to imply that this does not happen very often currently, as (Bitcoin Core) nodes will always ask for blocks with witnesses from peers that signal

Thank you all for the discussion so far! I updated the PR description to include the edge cases and suggestions mentioned here as TODOs.
It's important to recognize that this is a security reduction, even though it only applies to assumed-valid blocks: one of the reasons why assume-valid is acceptable is because the data that we're assuming is valid is widely available for independent auditing, and our peers can't distinguish between nodes that are and are not actually checking it. Skipping the downloading of witnesses negates that security by making it possible to sync a pruned node without anyone having that data available.

A concrete example where this could become relevant is in the event that courts order Bitcoin node operators not to serve specific blocks that are claimed to contain illegal data: an adversary can easily spin up patched Bitcoin nodes that do not serve those witnesses, in conjunction with the distribution of a new assumed-valid hash. Similarly, in conjunction with the above - or as a stand-alone action - courts can simply order Bitcoin devs/node runners to use an assumed-valid hash that contains a transaction with an invalid signature to confiscate/"recover" funds. This of course is a possible outcome of the Craig Wright case.

The difference with this patch from the status quo is that we'd already be shipping software that, without any user intervention other than running with the appropriate assumed-valid flag, would appear to work even though it was impossible for anyone to prove that something nefarious was going on.

NACK - This is not worth the consequences for a 43% savings.
Not without also putting enough PoW in the chain with the alternative assumed-valid hash. This attack doesn't seem cheaper than a regular 51% large reorg. I do agree with these two more general concerns, but I'm not sure if they apply here, because the proposal is limited to pruned nodes:
Archival nodes will still have it. Pruned nodes only have it for recent blocks, just like they do for non-witness data.
This is partially the case, because nodes won't reveal if they're going to ask for the witness data until they have the headers. It also helps if during IBD we don't reveal whether the node is going to be pruned or not (I assume we set
On February 9, 2023 12:57:27 PM GMT+01:00, Sjors Provoost wrote:
> an adversary can easily spin up patched Bitcoin nodes that do not serve those witnesses, in conjunction with the distribution of a new assumed-valid hash.
Not without also putting enough PoW in the chain with the alternative assumed-valid hash. This attack doesn't seem cheaper than a regular 51% large reorg.
The status quo does not trust miners. What you are suggesting does trust miners.
Recently the US based Foundry USA approached 50% of hash power, and it appears that much of the hashing power pointed to them is contractually obliged to mine at Foundry. Assuming that miners only mine valid blocks is not a good idea.
Another way of looking at it is if you aren't validating signatures, at least indirectly, why are you even bothering to download witnesses? Why not just use a UTXO snapshot? You aren't even doing a good job of inflation control: theft of lost coins _is_ an example of inflation.
To clarify, so the scenario you have in mind here is a prolonged >51% attack involving blocks with invalid witness data, where in addition they have coopted Bitcoin Core to ship a false

No existing node would follow that chain. Nor would any unpruned node. The concern here is about freshly started pruned nodes.
Concept ACK; saving a substantial amount of download bandwidth for data we're not going to verify and will discard anyway seems worthwhile.
I agree with luke's comment on recording whether we downloaded (and validated) witness data and on how to do that. (Recording (rule-name, hash, since-height) might work -- that would create an extra entry for each rule for each valid-fork entry in getchaintips, I think, which seems okay.)

I think it would probably make sense to have a separate option, say -prepruneassumevalidwitness (softset to on when -prune is above 550 and assumevalid is set), so that people can turn off this behaviour without sacrificing IBD performance, for when they're doing weird special cases (particularly if they're running a recently released version with an assumevalid block that's only a couple of months old).
src/validation.cpp
Outdated
@@ -3612,7 +3612,8 @@ static bool ContextualCheckBlock(const CBlock& block, BlockValidationState& stat
     // {0xaa, 0x21, 0xa9, 0xed}, and the following 32 bytes are SHA256^2(witness root, witness reserved value). In case there are
     // multiple, the last one is used.
     bool fHaveWitness = false;
-    if (DeploymentActiveAfter(pindexPrev, chainman, Consensus::DEPLOYMENT_SEGWIT)) {
+    if (DeploymentActiveAfter(pindexPrev, chainman, Consensus::DEPLOYMENT_SEGWIT) &&
+        (!chainman.m_blockman.IsPruneMode() || WITH_LOCK(chainman.GetMutex(), return !chainman.IsAssumedValid(block.GetHash())))) {
Perhaps clearer to write
bool check_witness = DeploymentActiveAfter(..);
if (check_witness && IsPruneMode()) {
LOCK(chainman.GetMutex());
if (IsAssumedValid()) check_witness = false;
}
if (check_witness) { ... }
It's true that, if witness data from assumed-valid blocks is not available, the new type of pruned node will be placing more trust in a previously reviewed Assumevalid (the specific release of Assumevalid cannot be audited now because the witnesses are missing). So the trust assumption here is: if there is no archival node accessible, we still trust an Assumevalid release that was audited/reviewed although it's not possible to currently audit it. This is true because archival nodes, by definition, store all witness data.

So if there's no archival node but, for some reason, there are nodes serving all non-witness data and witness data from non-Assumevalid blocks, this type of pruned node will operate under a stronger assumption (that a previous Assumevalid release is valid, although not auditable).

However Bitcoin is already operating under the assumption that archival nodes are available. If they weren't available no full node would be able to sync. If the new pruned node type is introduced, these nodes could still sync if only witness data from assumed-valid blocks is missing. Ruling out a systematic failure that makes all archival nodes lose only witness data from assumed-valid blocks, the only way something like that could happen is if all archival nodes decided to selectively delete/not serve assumed-valid witness data, but store/serve everything else. If no node served historic data in general (witness and non-witness), it would be impossible for our new type of pruned node to sync, and if all "archival" nodes were pre-segwit, then our node couldn't sync either.

It seems to me that this doesn't change anything fundamentally. I could of course be wrong.
@petertodd Thanks for commenting and challenging the approach! If I understand correctly, your main concern is about witness availability. As it currently stands, that availability is required to sync a full node (pruned or not), as we are checking that the witness merkle root that is committed to matches the witnesses that are actually downloaded. So even though we are skipping script validation, we require the exact witnesses (which we are assuming to be valid) to be available. By enabling pruned nodes to not download the witnesses (which requires skipping the witness merkle root check), we remove the requirement of witness availability (at least for assumed-valid blocks) to sync a pruned node. Witness availability is obviously important to be able to audit the chain with

You are not arguing that no longer requesting witnesses (i.e. removing the need to have them available for syncing) somehow makes them less available, right? My thoughts on this would be similar/the same to what Sjors said: pruned nodes already don't serve historical witnesses and this PR does not prevent archival nodes from storing/serving them, so IMO witness availability is not directly threatened by this PR.

iiuc, you are saying that removing the witness availability requirement makes it possible to assume a chain valid without ever seeing the invalid witnesses (since witnesses are no longer forced to be revealed). This seems entirely possible, but updating the default assume-valid hash in Bitcoin Core always involves auditing (I would expect anyone to audit their own choice of that hash as well), and a chain that fails the audit due to witness unavailability would not be made the default. Of course, like you say, a court could try to mandate the inclusion of a specific assume-valid hash, but that just leads us to the final backstop we always have: as long as an economic majority can't be forced to upgrade their nodes, the new rules won't be adopted.
Under the assumption that that is in fact possible, a court can mandate worse things than an invalid assume-valid hash. It is currently not possible to set the assume-valid hash to a chain with unavailable witnesses, however it is currently possible to set it to a chain with available but invalid witnesses. What is worse, unavailable or invalid witnesses? I would argue, that they are equally bad because they can achieve the exact same nefarious outcomes.
Being forced to assume a chain valid which I can't audit (due to witness unavailability) sounds like a clear indication of something nefarious going on to me (but yea, one couldn't prove that).
One thing we could do to mitigate the concern of missing witness data is to download it for a small random sample of blocks. When that fails, we fall back to redownloading all witness data (which would presumably also fail). So in a scenario where witness data is really gone, pruned nodes will behave the same as unpruned nodes during IBD.
Concept ACK. W.r.t. the security reduction pointed out by @petertodd, I agree with the @JoseSK999 comment almost entirely. I agree with @sdaftuar it plays nicely together with assumeutxo. |
Notice how in the assume-UTXO work, we've chosen to not even give users the ability to choose their own assumed-valid UTXO set. Instead, the hashes are fixed by Bitcoin Core because we don't trust users to audit their own choice of UTXO set.

@naumenkogs assumeutxo is much more likely to "help node running culture" in terms of getting more pruned nodes up and running. And it has the advantage of having much clearer security considerations: you are trusting the assumed UTXO.

Anyway, a way to mitigate these concerns about the reduction in security margin would be to simply put "skip-witness" mode behind a command line flag and leave it default off. Remember that many, probably most, nodes have ample bandwidth available and thus won't actually see an improvement from skipping witness downloads. That's certainly been the case for most of my nodes, which are mostly in data centers, and have been IO/CPU limited during IBD, not bandwidth limited.

"Node-in-a-box" solutions like Umbrel and Start9, more frequently used on slower home internet connections and Tor, can consider enabling the skip-witness mode by default (or perhaps after a bandwidth test).
Making witness skipping opt-in makes sense. In addition our GUI initial setup wizard could help the user with this (as it does with suggesting pruning). |
Could we please pick a different name for old witnessless syncing than "pruned"? It's different from the existing pruned node type that entirely discards old blocks. Old witnessless nodes could reindex and rescan as well as provide old witnessless blocks to other nodes doing IBD of this type. We need a new name for this. |
@wtogami This PR does not introduce a new witnessless archival mode (which is what you are describing iiuc). It only makes pruned nodes skip downloading the witnesses up to the assume-valid point (since they're not validated and deleted shortly after anyway). I would agree that witnessless archival nodes should not be called "pruned", but again that is not what this PR is introducing. |
Love the adversarial thinking shared in quoted comments by Peter Todd, leaving aside the emotions. NACK: I do not agree with the changes made in this pull request.
Maybe we can have such useless options similar to the patch created by Luke Dashjr recently.

Note: Defaults matter a lot in Bitcoin Core and are used by 99% of nodes. Nobody wants to play with their money and use different values unless they know what they are doing.
The BlockStatus::BLOCK_PRUNED_WITNESSES flag is used to indicate when a post-segwit block was stored on disk without its witnesses.
…ed without witnesses

It can happen that no blocks were actually pruned while starting up in prune mode (e.g. manual prune mode with `-prune=1`) and when restarting with `-prune=0` we have to redownload if we stored blocks without their witnesses.
…s in prune mode

This commit widens what we assume to be valid as part of our assume-valid settings for *pruned* full nodes. Previously only script checking was skipped (i.e. they were assumed valid), but with this commit the witness merkle root is no longer validated and therefore assumed valid as well.
…e witnesses in prune mode

We want to avoid announcing and relaying blocks for which we do not have the witnesses. Otherwise a peer might request the block expecting it to include witnesses and disconnect us for failing to deliver.

TODO: With the default max tip age of 24h it is *very* unlikely that we will announce assumed-valid (i.e. witnessless) blocks while in prune mode. So this commit is a bit of a belt and suspenders, or maybe not even necessary at all?
… prune mode

If we are in prune mode, we can skip downloading witnesses for assumed-valid blocks, as we will not be using the witnesses and pruning/deleting them shortly after anyway.
🐙 This pull request conflicts with the target branch and needs rebase.
@@ -364,6 +364,12 @@ bool BlockManager::LoadBlockIndexDB(const Consensus::Params& consensus_params)
         LogPrintf("LoadBlockIndexDB(): Block files have previously been pruned\n");
     }

+    // Check whether we have ever pruned block & undo files
Not sure what this comment means.
"ever" could be a misspelt "every". If that's the case, best to change it from "every" to "all the".
Can be marked up for grabs.
What about adding a mode that skips downloading witness data for assumed-valid blocks and also discards downloaded witness data after validating it (for non assumed-valid blocks), but doesn't prune non-witness data at all?
I got a little nerd sniped by this idea, so I created this draft. I have done very little testing so far (syncing signet worked, currently syncing mainnet).
This PR does two things when running in prune mode: (a) assume witness merkle roots to be valid for assumed-valid blocks and (b) request assumed-valid blocks without MSG_WITNESS_FLAG.

In theory this is a good idea, because witnesses are not validated (i.e. they are assumed valid up to a certain block, -assumevalid) and get pruned anyway when running in prune mode, so not downloading them (for assumed-valid blocks) reduces the bandwidth that is needed for IBD (don't have any numbers on this yet).

One downside is that nodes serving blocks without witnesses can't serve them directly from disk. They have to un-serialize and re-serialize without the witnesses before sending them.
Even though this is a draft, conceptual and approach review is very welcome.
TODO: