policy: make unstructured annex standard #27926
Conversation
Force-pushed f0a2db2 to b4421da
Think this is failing CI because of an annex nonstandard check: bitcoin/test/functional/feature_taproot.py, Line 1209 in f1b4975.
@Crypt-iQ thanks for pointing that out. I still need to get used to this repository. Will definitely fix the tests, but I'm looking for a concept ACK first.
Use case?
@luke-jr annex covenants, see the link in the PR description.
IIRC from #19645 and beyond, witnesses could be concatenated on the fly by transaction-relaying nodes, thereby ensuring the best-feerate transaction propagates. As this opens its own wormhole of DoS issues, it's better to pursue the conversation on the mailing list (doing it or not sounds like it impacts L2/transaction-relay propagation and hardware signers, I believe).
src/policy/policy.cpp
Outdated

    // Calculate the size of the unstructured data by subtracting
    // one for the 0x00 signaling byte.
    size_t unstructured_annex_size = annex_size - 1;
I think here we might prefer not to discount the 0x00 signaling byte from the 256-byte anti-annex-inflation-attack limit; otherwise it sounds to me like we're introducing a coupling between the max witness size one can sign for and the tag signaling byte format itself. That tag signaling byte format might acquire consensus semantics in the future, and I think we would prefer to reserve its evolvability for consensus changes here.
My thought was that it might be good to choose a number of actually usable bytes that is a multiple of a commonly used data structure size, such as 32 bytes. Therefore 256. It is an arbitrary number, though, that is supposed to strike a balance between limiting the potency of annex inflation attacks and having enough space to make the annex useful.
Are you proposing to limit the annex size, including any tags, to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?
With the constant defined as suggested below, this suggestion does seem to make reading easier. Changed.
> Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?
Yes, for now it doesn't make a functional difference. The issue I have in mind is the following: assuming the unstructured annex format is deployed and we see applications building on top of it, the 256 (or 257 now) annex size limit will have to become a max-size data payload, which is going to leak into all the application toolchains (in the same sense that TCP's 64 KB max frame is often used as the limit for application transport design).

As of today, the signaling byte is 1 byte; however, if in the future we have a consensus definition of the tag being more than 1 byte, we might have to eat into the 255 bytes reserved for the application data payload. Not eating into the 255 bytes would mean the consensus-aware application will have to pay the witness cost of the 0x00 signaling byte, and this byte will have to be discarded by parsing logic.

So if those observations are correct, I think the constant could have some safety-margin padding for changes to the "structured/unstructured guard" signaling byte size. One can think of 256 bytes reserved for application data + 32 bytes of padding, of which the first byte of padding would be the current 0x00 one. And I think that way we can guarantee to the application a multiple of a 32-byte-based data structure.
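The arithmetic of this suggestion can be sketched as follows. Note that the constant names and the resulting 288-byte total are purely illustrative of this comment's idea; nothing like this exists in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout from the suggestion above: 256 bytes of application
// data plus 32 reserved bytes, the first of which is today's 0x00 guard byte.
constexpr std::size_t APP_DATA_BYTES{256};       // multiple of 32-byte structures
constexpr std::size_t RESERVED_TAG_PADDING{32};  // room for future multi-byte tags
constexpr std::size_t MAX_ANNEX_PAYLOAD{APP_DATA_BYTES + RESERVED_TAG_PADDING};

static_assert(APP_DATA_BYTES % 32 == 0, "application data stays 32-byte aligned");
static_assert(MAX_ANNEX_PAYLOAD == 288, "256 data bytes + 32 padding bytes");
```

The trade-off discussed below is that unstructured-data users would pay witness cost for the 32 reserved bytes today.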
I am not sure if I fully get what you mean. Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal? My thought was to simply reserve all non-0x00 first bytes for future use.
When adding 32 bytes of padding, do you mean that initially (in a world with only unstructured annex data), 31 of those bytes are wasted but paid for?
> Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal?
Sure: any future serialization format where the semantic "tag" is encoded on more than 8 bits, to allow more than 256 types of payload without having to pick which are the "most-used" 256 ones when the consensus changes. This is actually the serialization format on which the TLV proposal for the annex is based (with some other discussed tweaks for the length); see the comment above VarIntMode in src/serialize.h.

From my understanding, if we're following your approach where non-0x00 first bytes are reserved for future use, and let's say 0x01 is for the TLV semantics of the annex field, it means all annex payloads with consensus enforcement will have to pay the feerate cost of the 1-byte marker (either 0x00 or 0x01).

So I'm thinking the advantage of multi-byte padding is that we reserve a quantitatively significant number of bytes for the eventual adoption of a multi-byte serialization format. This comes at the burden of unstructured data having to pay for the cost of the reserved byte padding today, while making tomorrow's potential consensus-enforced annex payloads cheaper, I think.

Of course, 32 bytes might be considered too high a reserved-byte padding to make usage of the annex attractive for unstructured-data applications. Yet the question we might have to answer is how many consensus "tags" we might need to reserve for now; let's say in the future we have an automated consensus mechanism where people can "burn" satoshis/coins to assign an identifier to an annex tag (a weird use case that was proposed in the early days of Bitcoin for namespace experiments, IIRC). I don't know the answer to this "how many tags" question.
src/policy/policy.cpp
Outdated

    // Unstructured annexes are nonstandard if they are larger than
    // 256 bytes. This limits the potential of annex inflation
    // attacks.
    if (unstructured_annex_size > 256) {
I think we can constify the value, e.g. PER_INPUT_MAX_ANNEX_SIZE. More generally, it is unclear whether we would like both a per-input annex limit and a max limit over all of a transaction's annexes, e.g. MAX_ANNEX_BUDGET.

Splitting the limit enables flows where inputs can be combined in a non-deterministic order, e.g. by BIP174's Combiner, and rejected as a function of "witness weight slots" that can be distributed by such a combiner, which makes sense for coordinating flows between hardware signers.
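A minimal sketch of this two-limit idea, assuming hypothetical values for both constants (at this point the PR only defines a per-input limit, and MAX_ANNEX_BUDGET is merely a suggestion):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants; only the per-input cap is part of this PR.
constexpr std::size_t PER_INPUT_MAX_ANNEX_SIZE{256};
constexpr std::size_t MAX_ANNEX_BUDGET{1024};

// Reject if any single input's annex payload exceeds the per-input cap,
// or if the sum over all inputs exceeds the transaction-wide budget.
bool CheckAnnexLimits(const std::vector<std::vector<unsigned char>>& annexes)
{
    std::size_t total{0};
    for (const auto& annex : annexes) {
        if (annex.size() > PER_INPUT_MAX_ANNEX_SIZE) return false;
        total += annex.size();
    }
    return total <= MAX_ANNEX_BUDGET;
}
```

Because the per-input check is order-independent, a combiner could hand out weight slots against MAX_ANNEX_BUDGET before signatures are collected, regardless of the order in which inputs arrive.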
Added MAX_PER_INPUT_ANNEX_SIZE.

Not sure if a per-tx maximum is necessary. I'd say that opting in to annex usage has some risks to be aware of. Maybe you don't want to use it for multi-party protocols at all because of that. To me, MAX_PER_INPUT_ANNEX_SIZE is already debatable when you have opt-in, and MAX_ANNEX_BUDGET feels even more like an unnecessary precaution. In the end, this is all policy anyway and not a real protection against txs outside of these boundaries appearing.
> Maybe you don't want to use it for multi-party protocols at all because of that.
I think that's a good question. There is an annex inflation risk concerning multi-party protocol users, and there is a new CPU DoS risk faced by node operators due to the increase in annex data size. I think this CPU DoS risk comes from the fact that with BIP341 there is a sha_annex in the transaction digest if the annex is present, and therefore there are new SHA-256 ops and data bytes to hash that have to be done by the nodes.

I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.

> In the end, this is all policy anyway and not a real protection against txs outside of these boundaries appearing.

Yes, if you're considering the risk faced by multi-party protocol users, policy is a shallow protection against transactions appearing outside those boundaries by being relayed directly to the miners. However, if you're considering the CPU DoS protection for node operators, this is a real protection, especially in a world where you might have a lot of hobbyist nodes running on resource-constrained hosts.
> I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.
If it isn't that much of an increase, is it worth adding extra code for it? More code, more things to go wrong. In the end, they are also paying for this tx in sats and the computation cost only increases linearly with the number of inputs.
> If it isn't that much of an increase, is it worth adding extra code for it?
From my experience of DoS discussions on the Core side, even if we're able to come up with a sound evaluation of the worst-case input payloads one might submit during a time period, host performance is always an unknown, as you have low-perf Umbrel and Raspberry Pi nodes, and preserving their processing capabilities has always been pursued during Bitcoin Core development for technical-decentralization purposes, from my understanding.

There were a lot of dumb DoS vectors in the early days of Bitcoin Core; see https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures. The codebase still has a reasonable number of conservative belt-and-suspenders policy checks. Even if an attacker is paying for a DoSy malicious input in terms of fees, it might inflict more damage than the attacker's cost (e.g. knocking out nodes during a fork deployment).
    const auto& annex = SpanPopBack(stack);

    // Check that the annex is standard.
    if (!IsAnnexStandard(annex)) {
I think it makes sense to have an annexrelay option in src/init.cpp with DEFAULT_ANNEXRELAY_ENABLE=false, so that node operators running nodes on performance-constrained hosts can opt out of the DoS concerns arising from the processing of the annex itself.
Which DoS concerns are you referring to? Since #24007 hasn't landed yet, the tx can only be accepted once?
See the CPU DoS concern mentioned above about BIP341's sha_annex, if annexes start to be relayed.
Yes, that's clear now. Thanks.
Same as with MAX_ANNEX_BUDGET, my feeling is that this safeguard isn't necessary because the extra cost is only linear and paid for.
@@ -289,6 +332,12 @@ bool IsWitnessStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
            }
        }
    }

    // If any inputs commit to an annex, all inputs must commit to an annex.
    if (annex_input_count > 0 && annex_input_count != tx.vin.size()) {
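The all-or-nothing rule in this diff reduces to a simple count comparison. A sketch of that predicate (the function name is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Either no input carries an annex, or every input does. A partially
// annexed transaction would let one participant unilaterally add an
// annex, increasing the weight and lowering the fee rate for everyone.
bool AnnexOptInConsistent(std::size_t annex_input_count, std::size_t input_count)
{
    return annex_input_count == 0 || annex_input_count == input_count;
}
```

This is the opt-in mechanism described in the PR description: parties that do not want annex weight simply contribute inputs without one, which makes the whole transaction nonstandard if anyone else adds an annex.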
Should we introduce a limit on the number of inputs supported in a transaction opted in to the annex? That would make it easier to reason about worst-case scenarios from a second-layer perspective in case of multi-party flows, e.g. dual-funding. This number of inputs could be tied to the transaction's nVersion field.
My answer here would be the same as in #27926 (comment). I think you can argue that opt-in is enough protection and multi-party flows with annexes may not be a good idea. Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.
> Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.
If you're taking the use case of unstructured data discussed on the mailing list, I think there is still a leak in terms of the fee-bumping reserves that a user must provision for, in case of the worst annex inflation attack being carried out under the known limits of PER_INPUT_MAX_ANNEX_SIZE (and an eventual MAX_ANNEX_BUDGET).

Of course, as you're raising, you can always have an adversarial transaction broadcast directly to the mining pools' mempools, though in a world where miners select the block template (i.e. more so with Stratum V2 deployment), this is a high technical bar for an annex-inflation adversary. So the "default" policy limit on the max annex size is far from a second-order consideration for the level of fee-bumping reserves that one must maintain to prevent timevalue DoS.

(I don't deny this is quite a sophisticated discussion, encompassing the impact on fee-bumping reserves, but it is still the kind of realistic threat model we're considering for LN today.)
> If you're taking the use case of unstructured data discussed on the mailing list
Which one do you refer to exactly, the timelocked vaults using presigned txes? For that there is only a single party, so the inflation attack doesn't apply?
For the timelocked vaults, yes, there is a single party using presigned txes, so it shouldn't apply. For the other use cases, e.g. anchoring symbolic data in the chain, I think you might have multiple parties contributing to the transaction, though I'm less knowledgeable about those use cases, to be honest.

If the current annex deployment is not targeted at multi-party protocols, then yes, we don't have to discuss the impact of annex DoS on fee-bumping reserves.
@@ -208,11 +208,45 @@ bool AreInputsStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
    return true;
}

bool IsAnnexStandard(const std::vector<unsigned char>& annex)
Happy to do it if others agree that the current annex check isn't too minimal for its own file.
Force-pushed e100018 to ac119a5
Force-pushed ac119a5 to e71f939
Updated
Left a new round of reviews. I think the following conceptual concerns are pending:

I still have to reply to the latest mail post about standardizing the annex, which I'm aiming to do. I think at this stage the proposal would benefit from an "applications" BIP draft that would give one or two use cases in detail, and therefore serve as a technical tie-breaker for the conceptual concerns raised above. (Edited to add one concern from the mailing list.)
I agree that use cases can help guide the decision making in this PR. That's why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.
I understand how the spend transaction can be re-built.

I still have a concern on the economics of such a proposal, as "archive" data like the witness might be discarded by full nodes, so if the scheme becomes generalized we might see this subset of validation data pruned by the majority of full nodes, at the very least when it's older than X. I think the incentives to store historical chain data for regular full nodes are quite low, and one could argue a scheme could be introduced where this "unstructured data" must be re-paid for by a new spend every Y, therefore generating on-chain fees.

I think "unstructured data" is still compelling to improve self-custody solutions like vaults, especially if we can have a future where all you need to back up is at least 12 words (and here mnemonics can be used if physical storage is a concern) and access to the chain data, to safeguard $$$ of satoshi value with standard tools like the PoC you built. Yet I'm not sure this "reliable backup" advantage for advanced custody solutions (where the locking script is not composed only of standard scripts and keys, e.g. taproot script-path spends) is well understood across the ecosystem; cf. Dave Harding's comment on the mailing list thread, "don't use the block chain to store backup data".
Clearly, data backed up this way isn't guaranteed forever and isn't forever free. However, for specific users this might still represent the most feasible option available. At the very least, chain backup can serve as a safety net in case a conventional backup that is fully under the control of the user somehow gets lost or becomes inaccessible.
Chain backup serving as a safety net is definitely interesting for witness and chain-of-transactions state, or scarce collectibles, when the value of the coins locked or "digital items" is worth more than a range of block miner fees, at least in terms of Bitcoin crypto-economics IMHO. Whether that is the right equilibrium with full-node resources, I don't know.
@fanquake could you reference this PR and this mailing list thread started by the author of this PR (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/thread.html) in #28130? I think the conversation is capturing good technical insights into practical ways to improve data carriage and archiving in the Bitcoin ecosystem. I was going to do it in #28130, though you locked the PR before I could.
🐙 This pull request conflicts with the target branch and needs rebase.
@joostjager Bitcoin has had this predicate issue since genesis, and it is necessary to perform a risk analysis of it for every proposed change.
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
Closing for now due to inactivity since August. Leave a comment below if you are working on this again and want it reopened.
This PR opens up the annex in a limited way:
If any one of the inputs commits to an annex, all inputs must. This is a mechanism to opt in to annex use, preventing participants in a multi-party transaction from adding an unexpected annex, which would increase the tx weight and lower the fee rate. See https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021736.html
Allow empty annexes, to let users opt in to annex usage with the minimum number of bytes.
Additionally, only allow annexes that start with 0x00. This keeps open the possibility of supporting structured annexes, such as for example the TLV format proposed in BIPXXX: Taproot Annex Format (bips#1381). A potential future structured format would start with a non-zero byte.

Limit the maximum size of unstructured annex data to 256 bytes. Including the annex tag 0x50 and the unstructured data tag 0x00, this makes for a maximum total witness element size of 258 bytes. This upper limit somewhat protects participants in a multi-party transaction that uses the annex against annex inflation.
The constant 256 is chosen so that it provides enough space to accommodate several Schnorr signatures. This is useful for implementing annex covenants with multiple presigned transactions. For a PoC of a single-transaction annex covenant, see https://github.com/joostjager/annex-covenants
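As a sanity check on that capacity claim: a BIP340 Schnorr signature is 64 bytes (65 with an explicit sighash byte appended), so the 256-byte payload fits exactly four signatures without sighash bytes.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t SCHNORR_SIG_BYTES{64};      // BIP340 signature size
constexpr std::size_t UNSTRUCTURED_PAYLOAD{256};  // limit proposed by this PR

static_assert(UNSTRUCTURED_PAYLOAD / SCHNORR_SIG_BYTES == 4,
              "four presigned-transaction signatures fit in the annex payload");
```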
Todo:
Related: