policy: make unstructured annex standard #27926
Conversation
Force-pushed f0a2db2 to b4421da
Think this is failing CI because of an annex nonstandard check: bitcoin/test/functional/feature_taproot.py, Line 1209 in f1b4975.
@Crypt-iQ thanks for pointing that out. I still need to get used to this repository. Will definitely fix the tests, but I'm looking for a concept ACK first.
Use case?
@luke-jr annex covenants, see the link in the PR description.
IIRC from #19645 and beyond, witnesses could be concatenated on the fly by transaction-relaying nodes, thereby ensuring the best-feerate transaction propagates. As this opens its own wormhole of DoS issues, it's better to pursue the conversation on the mailing list (doing it or not sounds like it impacts L2/transaction-relay propagation and hardware signers, I believe).
src/policy/policy.cpp
Outdated

    // Calculate the size of the unstructured data by subtracting
    // one for the 0x00 signaling byte.
    size_t unstructured_annex_size = annex_size - 1;
I think here we might prefer not to discount the 0x00 signaling byte from the 256-byte anti-annex-inflation-attack limit; otherwise it sounds to me like we're introducing a coupling between the max witness size one can sign for and the tag signaling byte format itself. That tag signaling byte format might acquire consensus semantics in the future, and I think we would prefer to reserve its evolvability for consensus changes here.
My thought was that it might be good to choose a number of actually usable bytes that is a multiple of a commonly used data structure size, such as 32 bytes. Therefore 256. It is an arbitrary number, though, that is supposed to strike a balance between limiting the potency of annex inflation attacks and having enough space to make the annex useful.
Are you proposing to limit the annex size, including any tags, to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?
With the constant defined as suggested below, this suggestion does seem to make reading easier. Changed.
> Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?
Yes, for now it doesn't make a functional difference. The issue I have in mind is the following: assuming the unstructured annex format is deployed and we see applications building on top of it, the 256 (or 257 now) annex size limit will have to become a max-size data payload, which is going to leak into all the application toolchains (in the same sense that TCP's 64 KB max frame is often used as the limit for application transport design).

As of today, the signaling byte is 1 byte; however, if in the future we have a consensus definition of the tag being more than 1 byte, we might have to eat into the 255 bytes reserved for the application data payload. Not eating into the 255 bytes would mean the consensus-aware application will have to pay the witness cost of the 0x00 signaling byte, and this byte will have to be discarded by parsing logic.

So if those observations are correct, I think the constant could have some safety-margin padding for changes to the "structured/unstructured guard" signaling byte size. One can think of 256 bytes reserved for application data + 32 bytes of padding, of which the first byte of padding would be the current 0x00 one. And I think that way we can guarantee to the application a multiple of a 32-byte-based data structure.
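The arithmetic of this suggestion can be sketched as follows. Note that the constant names and the resulting 288-byte total are purely illustrative of this comment's idea; nothing like this exists in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout from the suggestion above: 256 bytes of application
// data plus 32 reserved bytes, the first of which is today's 0x00 guard byte.
constexpr std::size_t APP_DATA_BYTES{256};       // multiple of 32-byte structures
constexpr std::size_t RESERVED_TAG_PADDING{32};  // room for future multi-byte tags
constexpr std::size_t MAX_ANNEX_PAYLOAD{APP_DATA_BYTES + RESERVED_TAG_PADDING};

static_assert(APP_DATA_BYTES % 32 == 0, "application data stays 32-byte aligned");
static_assert(MAX_ANNEX_PAYLOAD == 288, "256 data bytes + 32 padding bytes");
```

The trade-off discussed below is that unstructured-data users would pay witness cost for the 32 reserved bytes today.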
I am not sure if I fully get what you mean. Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal? My thought was to simply reserve all non-0x00 first bytes for future use.
When adding 32 bytes of padding, do you mean that initially (in a world with only unstructured annex data), 31 of those bytes are wasted but paid for?
> Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal?
Sure: any future serialization format where the semantic "tag" is encoded on more than 8 bits, to allow more than 256 types of payload without having to pick which are the "most-used" 256 ones when the consensus changes. This is actually the serialization format on which the TLV proposal for the annex is based (with some other discussed tweaks for the length); see the comment above VarIntMode in src/serialize.h.

From my understanding, if we're following your approach where non-0x00 first bytes are reserved for future use, and let's say 0x01 is for the TLV semantics of the annex field, it means all annex payloads with consensus enforcement will have to pay the feerate cost of the 1-byte marker (either 0x00 or 0x01).

So I'm thinking the advantage of multi-byte padding is that we reserve a quantitatively significant number of bytes for the eventual adoption of a multi-byte serialization format. This comes at the burden of unstructured data having to pay for the cost of the reserved byte padding today, while making tomorrow's potential consensus-enforced annex payloads cheaper, I think.

Of course, 32 bytes might be considered too high a reserved-byte padding to make usage of the annex attractive for unstructured-data applications. Yet the question we might have to answer is how many consensus "tags" we might need to reserve for now; let's say in the future we have an automated consensus mechanism where people can "burn" satoshis/coins to assign an identifier to an annex tag (a weird use case that was proposed in the early days of Bitcoin for namespace experiments, IIRC). I don't know the answer to this "how many tags" question.
src/policy/policy.cpp
Outdated

    // Unstructured annexes are nonstandard if they are larger than
    // 256 bytes. This limits the potential of annex inflation
    // attacks.
    if (unstructured_annex_size > 256) {
I think we can constify the value, e.g. PER_INPUT_MAX_ANNEX_SIZE. More generally, it is unclear whether we would like both a per-input annex limit and a max limit over all of a transaction's annexes, e.g. MAX_ANNEX_BUDGET.

Splitting the limit enables flows where inputs can be combined in a non-deterministic order, e.g. by BIP174's Combiner, and rejected as a function of "witness weight slots" that can be distributed by such a combiner, which makes sense for coordinating flows between hardware signers.
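A minimal sketch of this two-limit idea, assuming hypothetical values for both constants (at this point the PR only defines a per-input limit, and MAX_ANNEX_BUDGET is merely a suggestion):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants; only the per-input cap is part of this PR.
constexpr std::size_t PER_INPUT_MAX_ANNEX_SIZE{256};
constexpr std::size_t MAX_ANNEX_BUDGET{1024};

// Reject if any single input's annex payload exceeds the per-input cap,
// or if the sum over all inputs exceeds the transaction-wide budget.
bool CheckAnnexLimits(const std::vector<std::vector<unsigned char>>& annexes)
{
    std::size_t total{0};
    for (const auto& annex : annexes) {
        if (annex.size() > PER_INPUT_MAX_ANNEX_SIZE) return false;
        total += annex.size();
    }
    return total <= MAX_ANNEX_BUDGET;
}
```

Because the per-input check is order-independent, a combiner could hand out weight slots against MAX_ANNEX_BUDGET before signatures are collected, regardless of the order in which inputs arrive.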
Added MAX_PER_INPUT_ANNEX_SIZE.

Not sure if a per-tx maximum is necessary. I'd say that opting in to annex usage has some risks to be aware of. Maybe you don't want to use it for multi-party protocols at all because of that. To me, MAX_PER_INPUT_ANNEX_SIZE is already debatable when you have opt-in, and MAX_ANNEX_BUDGET feels even more like an unnecessary precaution. In the end, this is all policy anyway and not a real protection against txs outside of these boundaries appearing.
> Maybe you don't want to use it for multi-party protocols at all because of that.
I think that's a good question. There is an annex inflation risk concerning multi-party protocol users, and there is a new CPU DoS risk faced by node operators due to the increase in annex data size. I think this CPU DoS risk comes from the fact that with BIP341 there is a sha_annex in the transaction digest if the annex is present, and therefore there are new SHA-256 ops and data bytes to hash that have to be done by the nodes.

I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.

> In the end, this is all policy anyway and not a real protection against txs outside of these boundaries appearing.

Yes, if you're considering the risk faced by multi-party protocol users, policy is a shallow protection against transactions appearing outside those boundaries by being relayed directly to the miners. However, if you're considering the CPU DoS protection for node operators, this is a real protection, especially in a world where you might have a lot of hobbyist nodes running on resource-constrained hosts.
> I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.
If it isn't that much of an increase, is it worth adding extra code for it? More code, more things to go wrong. In the end, they are also paying for this tx in sats and the computation cost only increases linearly with the number of inputs.
> If it isn't that much of an increase, is it worth adding extra code for it?
From my experience of DoS discussions on the Core side, even if we're able to come up with a sound evaluation of the worst-case input payloads one might submit during a time period, host performance is always an unknown, as you have low-perf Umbrel and Raspberry Pi nodes, and preserving their processing capabilities has always been pursued during Bitcoin Core development for technical-decentralization purposes, from my understanding.

There were a lot of dumb DoS vectors in the early days of Bitcoin Core; see https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures. The codebase still has a reasonable number of conservative belt-and-suspenders policy checks. Even if an attacker is paying for a DoSy malicious input in terms of fees, it might inflict more damage than the attacker's cost (e.g. knocking out nodes during a fork deployment).
    const auto& annex = SpanPopBack(stack);

    // Check that the annex is standard.
    if (!IsAnnexStandard(annex)) {
I think it makes sense to have an annexrelay option in src/init.cpp with DEFAULT_ANNEXRELAY_ENABLE=false, so that node operators running nodes on performance-constrained hosts can opt out of the DoS concerns arising from the processing of the annex itself.
Which DoS concerns are you referring to? Since #24007 hasn't landed yet, the tx can only be accepted once?
See the CPU DoS concern mentioned above about BIP341's sha_annex, if annexes start to be relayed.
Yes, that's clear now. Thanks.
Same as with MAX_ANNEX_BUDGET, my feeling is that this safeguard isn't necessary because the extra cost is only linear and paid for.
@@ -289,6 +332,12 @@ bool IsWitnessStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
            }
        }
    }

    // If any inputs commit to an annex, all inputs must commit to an annex.
    if (annex_input_count > 0 && annex_input_count != tx.vin.size()) {
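The all-or-nothing rule in this diff reduces to a simple count comparison. A sketch of that predicate (the function name is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Either no input carries an annex, or every input does. A partially
// annexed transaction would let one participant unilaterally add an
// annex, increasing the weight and lowering the fee rate for everyone.
bool AnnexOptInConsistent(std::size_t annex_input_count, std::size_t input_count)
{
    return annex_input_count == 0 || annex_input_count == input_count;
}
```

This is the opt-in mechanism described in the PR description: parties that do not want annex weight simply contribute inputs without one, which makes the whole transaction nonstandard if anyone else adds an annex.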
Should we introduce a limit on the number of inputs supported in a transaction opted in to the annex? That would make it easier to reason about worst-case scenarios from a second-layer perspective in case of multi-party flows, e.g. dual-funding. This number of inputs could be tied to the transaction's nVersion field.
My answer here would be the same as in #27926 (comment). I think you can argue that opt-in is enough protection and multi-party flows with annexes may not be a good idea. Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.
> Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.
If you're taking the use case of unstructured data discussed on the mailing list, I think there is still a leak in terms of the fee-bumping reserves that a user must provision for, in case of the worst annex inflation attack being carried out under the known limits of PER_INPUT_MAX_ANNEX_SIZE (and an eventual MAX_ANNEX_BUDGET).

Of course, as you're raising, you can always have an adversarial transaction broadcast directly to the mining pools' mempools, though in a world where miners select the block template (i.e. more so with Stratum V2 deployment), this is a high technical bar for an annex-inflation adversary. So the "default" policy limit on the max annex size is far from a second-order consideration for the level of fee-bumping reserves that one must maintain to prevent timevalue DoS.

(I don't deny this is quite a sophisticated discussion, encompassing the impact on fee-bumping reserves, but it is still the kind of realistic threat model we're considering for LN today.)
> If you're taking the use case of unstructured data discussed on the mailing list
Which one do you refer to exactly, the timelocked vaults using presigned txes? For that there is only a single party, so the inflation attack doesn't apply?
For the timelocked vaults, yes, there is a single party using presigned txes, so it shouldn't apply. For the other use cases, e.g. anchoring symbolic data in the chain, I think you might have multiple parties contributing to the transaction, though I'm less knowledgeable about those use cases, to be honest.

If the current annex deployment is not targeted at multi-party protocols, then yes, we don't have to discuss the impact of annex DoS on fee-bumping reserves.
@@ -208,11 +208,45 @@ bool AreInputsStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
    return true;
}

bool IsAnnexStandard(const std::vector<unsigned char>& annex)
Happy to do it if others agree that the current annex check isn't too minimal for its own file.
Force-pushed e100018 to ac119a5
Force-pushed ac119a5 to e71f939
Updated
Left a new round of reviews. I think the following conceptual concerns are pending:

I still have to reply to the latest mail post about standardizing the annex, which I'm aiming to do. I think at this stage the proposal would benefit from an "applications" BIP draft that would give one or two use cases in detail, and therefore serve as a technical tie-breaker for the conceptual concerns raised above. (Edited to add one concern from the mailing list.)
I agree that use cases can help guide the decision making in this PR. That's why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.
I understand how the spend transaction can be re-built.

I still have a concern on the economics of such a proposal, as "archive" data like the witness might be discarded by full nodes, so if the scheme becomes generalized we might see this subset of validation data pruned by the majority of full nodes, at the very least when it's older than X. I think the incentives to store historical chain data for regular full nodes are quite low, and one could argue a scheme could be introduced where this "unstructured data" must be re-paid for by a new spend every Y, therefore generating on-chain fees.

I think "unstructured data" is still compelling to improve self-custody solutions like vaults, especially if we can have a future where all you need to back up is at least 12 words (and here mnemonics can be used if physical storage is a concern) and access to the chain data, to safeguard $$$ of satoshi value with standard tools like the PoC you built. Yet I'm not sure this "reliable backup" advantage for advanced custody solutions (where the locking script is not composed only of standard scripts and keys, e.g. taproot script-path spends) is well understood across the ecosystem; cf. Dave Harding's comment on the mailing list thread, "don't use the block chain to store backup data".
Clearly, data backed up this way isn't guaranteed forever and isn't forever free. However, for specific users this might still represent the most feasible option available. At the very least, chain backup can serve as a safety net in case a conventional backup that is fully under the control of the user somehow gets lost or becomes inaccessible.
Chain backup serving as a safety net is definitely interesting for witness and chain-of-transactions state, or scarce collectibles, when the value of the coins locked or "digital items" is worth more than a range of block miner fees, at least in terms of Bitcoin crypto-economics IMHO. Whether that is the right equilibrium with full-node resources, I don't know.
@fanquake could you reference this PR and this mailing list thread started by the author of this PR (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/thread.html) in #28130? I think the conversation is capturing good technical insights into practical ways to improve data carriage and archiving in the Bitcoin ecosystem. I was going to do it in #28130, though you locked the PR before I could.
🐙 This pull request conflicts with the target branch and needs rebase.
@joostjager Bitcoin has had this predicate issue since genesis, and it is necessary to perform a risk analysis of it for every proposed change.
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
Closing for now due to inactivity since August. Leave a comment below if you are working on this again and want it reopened.
This PR opens up the annex in a limited way:
If any one of the inputs commits to an annex, all inputs must. This is a mechanism to opt in to annex use, preventing participants in a multi-party transaction from adding an unexpected annex, which would increase the tx weight and lower the fee rate. See https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021736.html
Allow empty annexes, to let users opt in to annex usage with the minimum number of bytes.
Additionally, only allow annexes that start with 0x00. This keeps open the possibility of supporting structured annexes, such as for example the TLV format proposed in BIPXXX: Taproot Annex Format (bips#1381). A potential future structured format would start with a non-zero byte.

Limit the maximum size of unstructured annex data to 256 bytes. Including the annex tag 0x50 and the unstructured data tag 0x00, this makes for a maximum total witness element size of 258 bytes. This upper limit somewhat protects participants in a multi-party transaction that uses the annex against annex inflation.
The constant 256 is chosen so that it provides enough space to accommodate several Schnorr signatures. This is useful for implementing annex covenants with multiple presigned transactions. For a PoC of a single-transaction annex covenant, see https://github.com/joostjager/annex-covenants
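As a sanity check on that capacity claim: a BIP340 Schnorr signature is 64 bytes (65 with an explicit sighash byte appended), so the 256-byte payload fits exactly four signatures without sighash bytes.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t SCHNORR_SIG_BYTES{64};      // BIP340 signature size
constexpr std::size_t UNSTRUCTURED_PAYLOAD{256};  // limit proposed by this PR

static_assert(UNSTRUCTURED_PAYLOAD / SCHNORR_SIG_BYTES == 4,
              "four presigned-transaction signatures fit in the annex payload");
```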
Todo:
Related: