policy: make unstructured annex standard #27926

Closed

Conversation

@joostjager commented Jun 21, 2023

This PR opens up the annex in a limited way:

  • If any one of the inputs commits to an annex, all inputs must. This is a mechanism to opt in to annex use, preventing participants in a multi-party transaction from adding an unexpected annex, which would increase the tx weight and lower the fee rate. See https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/021736.html

  • Allow empty annexes, to let users opt in to annex usage with the minimum number of bytes.

  • Additionally, only allow annexes that start with 0x00. This keeps open the possibility of supporting structured annexes, such as the TLV format proposed in BIPXXX: Taproot Annex Format (bips#1381). A potential future structured format would start with a non-zero byte.

  • Limit the maximum size of unstructured annex data to 256 bytes. Including the annex tag 0x50 and the unstructured data tag 0x00, this makes for a maximum total witness element size of 258 bytes. This upper limit somewhat protects participants in a multi-party transaction that uses the annex against annex inflation.

    The constant 256 is chosen so that it provides enough space to accommodate several Schnorr signatures. This is useful for implementing annex covenants with multiple presigned transactions. For a PoC of a single-transaction annex covenant, see https://github.com/joostjager/annex-covenants. The combined checks are sketched below.
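
Taken together, these rules might look roughly like the following policy check (a sketch for illustration, not the PR's actual diff; it assumes the annex is passed in with its leading 0x50 tag, and it uses the MAX_PER_INPUT_ANNEX_SIZE constant discussed later in this thread):

#include <cstddef>
#include <vector>

// Unstructured annex data may be at most 256 bytes (policy sketch).
static constexpr size_t MAX_PER_INPUT_ANNEX_SIZE{256};

bool IsAnnexStandard(const std::vector<unsigned char>& annex)
{
    // An annex containing only the 0x50 tag is the cheapest way to opt in.
    if (annex.size() <= 1) return true;

    // After the 0x50 tag, unstructured data must start with 0x00; all other
    // leading bytes are reserved for future structured formats.
    if (annex[1] != 0x00) return false;

    // Subtract the 0x50 annex tag and the 0x00 signaling byte, leaving at
    // most 256 data bytes in a 258-byte witness element.
    const size_t unstructured_annex_size = annex.size() - 2;
    return unstructured_annex_size <= MAX_PER_INPUT_ANNEX_SIZE;
}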

Todo:

  • fix/add tests

Related:

@DrahtBot (Contributor) commented Jun 21, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28246 (wallet: Use CTxDestination in CRecipient instead of just scriptPubKey by achow101)
  • #28244 (Break up script/standard.{h/cpp} by achow101)
  • #26291 (Update MANDATORY_SCRIPT_VERIFY_FLAGS by ajtowns)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/policy/policy.cpp (outdated review thread, resolved)
@joostjager force-pushed the std-unstructured-annex branch 2 times, most recently from f0a2db2 to b4421da on June 21, 2023
@Crypt-iQ (Contributor)

Think this is failing CI because of an annex nonstandard check

def spenders_taproot_nonstandard():

@joostjager (Author)

@Crypt-iQ thanks for pointing that out. I need to get used to this repository still. Will definitely fix tests, but looking for concept ack first.

@joostjager marked this pull request as draft on June 22, 2023
@luke-jr (Member) commented Jun 24, 2023

Use case?

@joostjager (Author)

@luke-jr annex covenants, see link in PR description

@ariard (Member) left a comment

IIRC from #19645 and beyond, witnesses could be concatenated on the fly by transaction-relaying nodes, thereby ensuring that the best-feerate transaction propagates. As this opens its own wormhole of DoS issues, it is better to pursue the conversation on the mailing list (doing so or not seems to impact L2/transaction-relay propagation and hardware signers, I believe).


// Calculate the size of the unstructured data by subtracting
// one for the 0x00 signaling byte.
size_t unstructured_annex_size = annex_size - 1;
@ariard (Member)

I think here we might prefer not to discount the 0x00 signaling byte from the 256-byte anti-annex-inflation limit; otherwise it seems to me we're introducing a coupling between the max witness size one can sign for and the tag signaling byte format itself. Such a tag signaling byte format might acquire consensus semantics in the future, and I think we would prefer to preserve evolvability with respect to consensus changes here.

@joostjager (Author)

My thought was that it might be good to choose a number of actually usable bytes that is a multiple of commonly used data structure sizes such as 32 bytes, hence 256. It is an arbitrary number, though, that is supposed to strike a balance between limiting the potency of annex inflation attacks and having enough space to make the annex useful.

Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?

@joostjager (Author)

With the constant defined as suggested below, this suggestion does seem to make reading easier. Changed.

@ariard (Member)

Are you proposing to limit the annex size including any tags to 257 bytes? I see the conceptual difference, but there wouldn't be a functional difference for now?

Yes, for now it doesn't make a functional difference. The issue I have in mind is the following: assuming the unstructured annex format is deployed and we see applications building on top of it, the 256 (or 257 now) annex size limit will have to become a max-size data payload, which is going to leak into all application toolchains (in the same sense that TCP's 64 KB maximum frame is often used as the limit for application transport design).

As of today, the signaling byte size is 1 byte; however, if in the future we have a consensus definition of the tag being more than 1 byte, we might have to eat into the 255 bytes reserved for the application data payload. Not eating into the 255 bytes would mean the consensus-aware application will have to pay the witness cost for the 0x00 signaling byte, and this byte will have to be discarded by parsing logic.

So if those observations are correct, I think the constant could have some safety-margin padding for changes in the size of the "structured/unstructured guard" signaling byte. One could think of 256 bytes reserved for application data plus 32 bytes of padding, of which the first padding byte would be the current 0x00? I think that way we can guarantee the application a multiple of a 32-byte-based data structure.

@joostjager (Author)

I am not sure if I fully get what you mean. Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal? My thought was to simply reserve all non-0x00 first bytes for future use.

When adding 32 bytes of padding, do you mean that initially (in a world with only unstructured annex data), 31 of those bytes are wasted but paid for?

@ariard (Member)

Can you give a concrete example of a future extended format that might interfere with the 0x00 + unstructured proposal?

Sure: any future serialization format where the semantic "tag" is encoded in more than 8 bits, to allow more than 256 types of payload without having to pick which are the "most-used" 256 when the consensus changes. This is actually the serialization format on which the TLV proposal for the annex is based (with some other discussed tweaks for the length); see the comment above VarIntMode in src/serialize.h.

From my understanding, if we follow your approach where non-0x00 first bytes are reserved for future use, and let's say 0x01 is for TLV semantics of the annex field, it means all annex payloads with consensus enforcement will have to pay the feerate cost of the 1-byte marker (either 0x00 or 0x01).

So I'm thinking the advantage of multi-byte padding is that we reserve a quantitatively significant number of bytes for the eventual adoption of a multi-byte serialization format. This comes at the burden of unstructured data having to pay the cost of the reserved byte padding today, while making tomorrow's potential consensus-enforced annex payloads cheaper, I think.

Of course, 32 bytes might be considered too high a reserve padding to make usage of the annex attractive for unstructured-data applications. Yet the question we might have to answer is how many consensus "tags" we need to reserve for now; say in the future we have an automated consensus mechanism where people can "burn" satoshis/coins to assign an identifier to an annex tag (a weird use case that was proposed in the early days of Bitcoin for namespace experiments, IIRC). I don't know the answer to this "how many tags" question.

// Unstructured annexes are nonstandard if they are larger than
// 256 bytes. This limits the potential of annex inflation
// attacks.
if (unstructured_annex_size > 256) {
@ariard (Member)

I think we can constify the value, e.g. PER_INPUT_MAX_ANNEX_SIZE. More generally, it is unclear whether we would like a per-input annex limit, a per-transaction limit across all annexes (e.g. MAX_ANNEX_BUDGET), or both.

Splitting the limit enables flows where inputs can be combined in a non-deterministic order, e.g. by BIP174's combiner, and rejected as a function of "witness weight slots" that such a combiner can distribute, which makes sense for coordinating flows between hardware signers.
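
For illustration, the two limits discussed here might be declared along these lines (a sketch; only the per-input constant was ultimately added to the PR, and the MAX_ANNEX_BUDGET value is hypothetical):

#include <cstddef>

/** Policy cap on unstructured annex data per input. */
static constexpr size_t MAX_PER_INPUT_ANNEX_SIZE{256};
/** Hypothetical policy cap on total annex bytes across one transaction. */
static constexpr size_t MAX_ANNEX_BUDGET{4 * MAX_PER_INPUT_ANNEX_SIZE};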

@joostjager (Author)

Added MAX_PER_INPUT_ANNEX_SIZE.

Not sure if a per-tx maximum is necessary. I'd say that opting in to annex usage has some risks to be aware of. Maybe you don't want to use it for multi-party protocols at all because of that. To me, MAX_PER_INPUT_ANNEX_SIZE is already debatable when you have opt-in, and MAX_ANNEX_BUDGET feels even more like an unnecessary precaution. In the end, this is all policy anyway and not a real protection against txs outside these boundaries appearing.

@ariard (Member)

Maybe you don't want to use it for multi-party protocols at all because of that.

I think that's a good question. There is an annex inflation risk concerning multi-party protocol users, and there is a new CPU DoS risk faced by node operators due to the increase in annex data size. I think this CPU DoS risk comes from the fact that with BIP341 there is a sha_annex in the transaction digest if the annex is present, and therefore there are new SHA-256 operations and data bytes to hash that have to be done by nodes.

I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.

In the end, this is all policy anyway and not a real protection against txs outside these boundaries appearing.

Yes, if you're considering the risk faced by multi-party protocol users, policy is shallow protection, since a transaction outside those boundaries can appear by being relayed directly to miners. However, if you're considering the CPU DoS protection for node operators, this is real protection, especially in a world where you might have a lot of hobbyist nodes running on resource-constrained hosts.
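
For reference on where that hashing cost comes from: BIP341 defines sha_annex as the single SHA256 of the annex with its compact-size length prefix, included in the signature message whenever an annex is present. A minimal sketch, assuming Bitcoin Core's HashWriter (serializing a byte vector writes its compact-size length first):

#include <hash.h>
#include <uint256.h>
#include <vector>

// sha_annex = SHA256(compact_size(size of annex) || annex), per BIP341.
uint256 ComputeShaAnnex(const std::vector<unsigned char>& annex)
{
    return (HashWriter{} << annex).GetSHA256();
}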

@joostjager (Author)

I don't think this CPU DoS risk, if correct, is that much of an increase in DoS surface, though a MAX_ANNEX_BUDGET can be a good conservative "belt-and-suspenders" DoS limit, like we have a lot of in Bitcoin Core.

If it isn't that much of an increase, is it worth adding extra code for it? More code, more things to go wrong. In the end, they are also paying for this tx in sats, and the computation cost only increases linearly with the number of inputs.

@ariard (Member)

If it isn't that much of an increase, is it worth adding extra code for it?

From my experience of DoS discussions on the Core side, even if we're able to come up with a sound evaluation of the worst-case input payloads one might submit during a time period, host performance is always an unknown: you have low-performance Umbrel and Raspberry Pi nodes, and preserving their processing capabilities has always been pursued during Bitcoin Core development for technical-decentralization purposes, from my understanding.

There were a lot of dumb DoS vectors in the early days of Bitcoin Core; see https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures. The codebase still has a reasonable number of conservative belt-and-suspenders policy checks. Even if the attacker is paying for a DoSy malicious input in terms of fees, it might inflict more damage than it costs the attacker (e.g. knocking out nodes during a fork deployment).

const auto& annex = SpanPopBack(stack);

// Check that the annex is standard.
if (!IsAnnexStandard(annex)) {
@ariard (Member)

I think it makes sense to have an annexrelay option in src/init.cpp with DEFAULT_ANNEXRELAY_ENABLE=false, thereby allowing node operators running on performance-constrained hosts to opt out of the DoS concerns arising from processing the annex itself.
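
A sketch of what that knob might look like, following the pattern of existing relay options in src/init.cpp (the option name, help text, and default are hypothetical; nothing like this is in the PR):

static constexpr bool DEFAULT_ANNEXRELAY_ENABLE{false};

// Registered alongside the other relay options:
argsman.AddArg("-annexrelay",
               strprintf("Relay transactions whose taproot inputs carry an annex (default: %u)",
                         DEFAULT_ANNEXRELAY_ENABLE),
               ArgsManager::ALLOW_ANY, OptionsCategory::NODE_RELAY);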

@joostjager (Author)

Which DoS concerns are you referring to? Since #24007 hasn't landed yet, the tx can only be accepted once?

@ariard (Member)

See the CPU DoS concern mentioned above about BIP341's sha_annex, if annexes start to be relayed.

@joostjager (Author)

Yes, that's clear now. Thanks.

Same as with MAX_ANNEX_BUDGET, my feeling is that this safeguard isn't necessary because the extra cost is only linear and paid for.

@@ -289,6 +332,12 @@ bool IsWitnessStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
}
}
}

// If any inputs commit to an annex, all inputs must commit to an annex.
if (annex_input_count > 0 && annex_input_count != tx.vin.size()) {
@ariard (Member)

Should we introduce a limit on the number of inputs supported in a transaction that opts in to the annex? That would make it easier to reason about worst-case scenarios from a second-layer perspective in multi-party flows, e.g. dual-funding. This number of inputs could be tied to the transaction's nVersion field.

@joostjager (Author)

My answer here would be the same as in #27926 (comment). I think you can argue that opt-in is enough protection and that multi-party flows with annexes may not be a good idea, especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.

@ariard (Member)

Especially if there are no concrete plans for multi-party protocols that require the unique properties of the annex.

If you're taking the use case of unstructured data discussed on the mailing list, I think there is still a leak in terms of the fee-bumping reserves a user must provision for, in case the worst annex inflation attack is carried out under the known limits of PER_INPUT_MAX_ANNEX_SIZE (and an eventual MAX_ANNEX_BUDGET).

Of course, as you raise, you can always have an adversarial transaction broadcast directly to mining pools' mempools, though in a world where miners select the block template (i.e. more so with Stratum V2 deployment), this is a high technical bar for an annex inflation adversary. So the "default" policy limit on the max annex size is far from a second-order consideration for the level of fee-bumping reserves one must maintain to prevent timevalue DoS.

(I don't deny this is quite a sophisticated discussion, encompassing the impact on fee-bumping reserves, but it's still the kind of realistic threat model we're considering for LN today.)

@joostjager (Author)

If you’re taking the use-case of unstructured data discussed on the mailing list

Which one do you refer to exactly, the timelocked vaults using presigned txes? For that there is only a single party, so the inflation attack doesn't apply?

@ariard (Member)

For the timelocked vaults, yes, there is a single party using presigned txes, so it shouldn't apply. For the other use case, e.g. anchoring symbolic data in the chain, I think you might have multiple parties contributing to the transaction, though I'm less knowledgeable about those use cases, to be honest.

If the current annex deployment is not targeted at multi-party protocols, then yes, we don't have to discuss the impact of annex DoS on fee-bumping reserves.

@@ -208,11 +208,45 @@ bool AreInputsStandard(const CTransaction& tx, const CCoinsViewCache& mapInputs)
return true;
}

bool IsAnnexStandard(const std::vector<unsigned char>& annex)
@ariard (Member)

Could we move the annex policy-only code into its own src/policy/annex, in the image of src/policy/packages.h, for code modularity? I think in the discussion of #1381 there has been a question of reusing the annex in the future for Graftroot.
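
If split out, such a module might mirror src/policy/packages.h; a hypothetical header sketch:

// src/policy/annex.h (hypothetical layout)
#ifndef BITCOIN_POLICY_ANNEX_H
#define BITCOIN_POLICY_ANNEX_H

#include <cstddef>
#include <vector>

/** Policy cap on unstructured annex data per input. */
static constexpr size_t MAX_PER_INPUT_ANNEX_SIZE{256};

/** Whether an annex witness element passes standardness rules. */
bool IsAnnexStandard(const std::vector<unsigned char>& annex);

#endif // BITCOIN_POLICY_ANNEX_H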

@joostjager (Author)

Happy to do it if others agree that the current annex check isn't too minimal for its own file.

@joostjager (Author)

Updated feature_taproot.py tests

@ariard (Member) commented Jul 4, 2023

Left a new round of reviews. I think the following conceptual concerns are pending:

  • padding of the signaling byte to preserve evolvability for future consensus validation of the annex data
  • adding a MAX_ANNEX_BUDGET or not, to mitigate potential CPU DoS concerns for full nodes and the potential impact on fee-bumping reserves
  • an opt-in annexrelay for node operators
  • annex policy limits versioned on the transaction's nVersion field
  • the "all-or-nothing" annex opt-in mechanism

I still have to reply to the latest mail post about standardizing the annex, which I'm aiming to do. I think at this stage the proposal would benefit from an "applications" BIP draft that gives one or two use cases in detail and therefore serves as a technical tie-breaker for the conceptual concerns raised above.

edited to add one concern from the mailing list

@joostjager (Author)

I think at this stage the proposal would benefit from an "applications" BIP draft that gives one or two use cases in detail and therefore serves as a technical tie-breaker for the conceptual concerns raised above.

I agree that use cases can help guide the decision making in this PR. That's why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.

@ariard (Member) commented Jul 24, 2023

I agree that use cases can help guide the decision making in this PR. That's why I created the demo/explanation mentioned in the PR description: https://github.com/joostjager/annex-covenants.

I understand how the spend transaction can be rebuilt from the annex-covenants stateless tool, and on my side I can see the interest of using the chain as a reliable backup for critical information, enabling vaulted funds to be rebuilt using only standard tools.

I still have a concern about the economics of such a proposal, as "archive" data like witnesses might be discarded by full nodes. If the scheme becomes generalized, we might see this subset of validation data pruned by the majority of full nodes, at the very least once it's older than X. I think the incentives for regular full nodes to store historical chain data are quite low, and one could argue a scheme could be introduced where this "unstructured data" must be re-paid for by a new spend every Y, thereby generating on-chain fees.

I think "unstructured data" is still compelling for improving self-custody solutions like vaults, especially if we can have a future where all you need to back up is at least 12 words (and here mnemonics can be used if physical storage is a concern) plus access to the chain data, to safeguard $$$ of satoshi value with standard tools like the PoC you built.

Yet I'm not sure this "reliable backup" advantage for advanced custody solutions (where the locking script is not composed only of standard scripts and keys, e.g. taproot script-path spends) is well understood across the ecosystem; cf. Dave Harding's comment on the mailing list thread, "don't use the block chain to store backup data".

@joostjager (Author)

Clearly, data backed up in this way isn't guaranteed to last forever or to be forever free. However, for specific users this might still represent the most feasible option available. At the very least, chain backup can serve as a safety net in case a conventional backup that is fully under the user's control somehow gets lost or becomes inaccessible.

@ariard (Member) commented Aug 12, 2023

Chain backup serving as a safety net is definitely interesting for witnesses and chains of transaction state, or for scarce collectibles, when the value of the locked coins or "digital items" is worth more than a range of block miner fees. At least in terms of Bitcoin crypto-economics, IMHO; what the right equilibrium with full-node resources is, I don't know.

@ariard (Member) commented Aug 12, 2023

@fanquake Could you reference this PR and this mailing list thread started by the author of this PR (https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-June/thread.html) in #28130? I think the conversation captures good technical insights into practical ways to improve data carriage and archiving in the Bitcoin ecosystem. I was thinking of doing it in #28130 myself, though you locked the PR first.

@DrahtBot (Contributor)

🐙 This pull request conflicts with the target branch and needs rebase.

@ChrisMartl

@joostjager
Could you please provide an analysis of which effects this proposal could have, and how it could increase the exploit exposure of the Bitcoin system, given the loose flexibility of Bitcoin's script (predicative processing)?

Bitcoin has had this predicate issue since genesis, and it is necessary to perform a risk analysis of it for every proposed change.

@DrahtBot (Contributor) commented Dec 2, 2023

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot (Contributor) commented Mar 1, 2024

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko (Member) commented Mar 1, 2024

Closing for now due to inactivity since August. Leave a comment below, if you are working on this again and want it reopened.

@maflcko closed this on Mar 1, 2024