Add BIP for OP_TXHASH and OP_CHECKTXHASHVERIFY #1500

Open
wants to merge 4 commits into
base: master

Conversation

stevenroose
Contributor

@stevenroose stevenroose commented Sep 30, 2023

Semantic changes

I thought it might be valuable to keep track of actual semantic changes being made since the initial out-of-draft version.

  • 2023-12-19: Added relative indices for individual mode.

Implementations

bip-txhash.mediawiki Outdated
Comment on lines 128 to 130
* If the first byte is exactly 0x00, the Script execution succeeds immediately.
//TODO(stevenroose) is this valuable? it's the only "exception case" that
could potentially be hooked for some future upgrade
Contributor

Why not allow extra bytes at the end to mean OP_SUCCESS?

Contributor Author

@roconnor-blockstream has previously warned about non-trivial OP_SUCCESS semantics. The current SUCCESS semantics are "any OP_SUCCESS opcode occurring in the script means SUCCESS". We could have different semantics that allow any opcode to trigger "instant success" internally, but (1) those are very different semantics that would require entirely different code, and (2) scripts become much harder to reason about.

IIRC, @sanket1729 also noted that such SUCCESS semantics make reasoning about scripts for things like miniscript way harder.

Actually this BIP seems outdated, I have to push a small update. I decided to propose to make the 0x00 special case mean "ALL" to make this more ergonomic to use as a sighash together with CSFS. ("ALL" isn't valuable as a template check because it contains the prevout scriptPubkey which should contain the hash) Other suggestions welcome.

@stevenroose
Contributor Author

Alternatively, though it complicates the cases slightly more: since the first two fields (version, locktime) are not very valuable without anything else (especially since we have OP_CLTV), we could introduce four special-cased bytes to mimic other popular SIGHASH modes: 0x00, 0x01, 0x02 and 0x03. Locktime might be useful with OP_TX at some point, though, so I would argue against that. Mimicking "regular" sighashes isn't super useful in the first place, because any system that expects to use regular sighashes can use current regular schnorr keys.

@stevenroose
Contributor Author

I just pushed an updated version of this BIP. It has a reference implementation that produces test vectors that are tested against an implementation for Bitcoin Core and for rust-bitcoin.

I think it should be ready for review. I have one small last TODO in the specification related to txfs malleability.

Member

@luke-jr luke-jr left a comment

Missing a section on backward compatibility

Contributor

@rustyrussell rustyrussell left a comment

Sorry for the delay! I've finally found a round tuit, and have performed a more detailed review.

# Summary

## OP_CHECKTXHASHVERIFY

Contributor

I realize it's traditional, but why are we adding new non-Taproot opcodes? Is there a case where this is desirable?

Contributor Author

Yeah, bare OP_CHECKTXHASHVERIFY is really efficient. CTV also adds them. It's 34 bytes output script and 0 bytes witness/scriptsig. As opposed to 34 (spk) + 33 (cb: ver + internal key) + 34 (tapscript) for taproot.
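
The byte counts quoted in this reply can be tallied as follows. This is only arithmetic on the numbers given in the comment; the constant and function names are made up for illustration, and witness discounting is deliberately ignored.

```python
# Sizes in bytes, taken from the comment above; names are illustrative only.
BARE_SCRIPT_PUBKEY = 34      # bare OP_CHECKTXHASHVERIFY output script
TAPROOT_SCRIPT_PUBKEY = 34   # taproot output script (spk)
CONTROL_BLOCK = 33           # leaf version byte + 32-byte internal key
TAPSCRIPT = 34               # the same script revealed as a tapscript leaf


def bare_footprint() -> int:
    """Output script only; nothing is needed in the witness or scriptSig."""
    return BARE_SCRIPT_PUBKEY


def taproot_footprint() -> int:
    """Output script plus the control block and tapscript revealed on spend."""
    return TAPROOT_SCRIPT_PUBKEY + CONTROL_BLOCK + TAPSCRIPT


print(bare_footprint(), taproot_footprint())
```

On these raw byte counts the taproot path is roughly three times the size of the bare one, which is the efficiency argument being made.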

Contributor

Ah, I forgot that OP_SUCCESSx was only a taproot thing, not a segwit thing. Yuck!

*  3. `TXFS_INOUT_NUMBER | TXFS_INOUT_SELECTION_ALL`
*  4. `TXFS_INOUT_NUMBER | TXFS_INOUT_SELECTION_ALL`
* the `0x00` byte: it is set equal to `TXFS_SPECIAL_ALL`, which means "ALL" and is primarily
useful to emulate `SIGHASH_ALL` when `OP_TXHASH` is used in combination
Contributor

Hmm, "would be useful if that were proposed which it isn't". I am skeptical of this magic value.

While I understand Russell O'Connor's dislike of runtime OP_SUCCESS, it is a lesser evil here than this kind of guessing of future utility which will no doubt prove suboptimal when we get there.

And for miniscript: sure, it will only generate and decode a push followed by TXHASH. But there are other things it can't decode too, and that's OK.

Contributor Author

I think the SUCCESS argument has merit, though. Also IMO it's not too much of a pain to pick one of the many SUCCESS opcodes tapscript still has to make an OP_TXHASH2 if really needed. I also don't like that witness input can turn an opcode into a SUCCESS operation for the entire script. This can be tricky when collaboratively constructing scripts.

summary, followed by a reference implementation of the CalculateTxHash function.

* There are two special cases for the TxFieldSelector:
* the empty value, zero bytes long: it is set equal to `TXFS_SPECIAL_TEMPLATE`,
Contributor

You re-use this term TXFS_SPECIAL_TEMPLATE twice for different things, which is confusing.

Contributor Author

Ah sorry, one of them is a typo and should be TXFS_SPECIAL_ALL. Fixing.

bip-txhash.md Outdated
Comment on lines 89 to 99
* The last (highest) bit of the first byte (`TXFS_CONTROL`), we will call the
"control bit", and it can be used to control the behavior of the opcode. For
`OP_TXHASH` and `OP_CHECKTXHASHVERIFY`, the control bit is used to determine
whether the TxFieldSelector itself has to be included in the resulting hash.
(For potential other uses of the TxFieldSelector (like a hypothetical
`OP_TX`), this bit can be repurposed.)
Contributor

That's a footnote, at best, mentioning how this could be expanded for a new OP_TX. But there's no reason to design for it now that I can see, except to leave a clear carve-out for future expansion.

So TXFS_CONTROL is a terrible name. TXFS_FIELD_SELECTOR perhaps?

bip-txhash.md Outdated

For both inputs and then outputs, do the following:

* If the "in/outputs" field is set to 1, another additional byte is expected:
Contributor

If I'm following this correctly, the (non-special) TxFieldSelector format is, in bytes:

CORE_SELECTOR [INOUT_SELECTOR] [IN_SELECTOR] [OUT_SELECTOR]

If TXFS_INPUTS is set in the CORE_SELECTOR, then INOUT_SELECTOR and IN_SELECTOR are present. If TXFS_OUTPUTS is set in CORE_SELECTOR, then INOUT_SELECTOR and OUT_SELECTOR are present?
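
The layout described in this question can be sketched as a small decision function. The bit positions used for `TXFS_INPUTS` and `TXFS_OUTPUTS` below are assumptions chosen for illustration, not values confirmed in this thread, and the sketch only reports which components are expected to follow, not how many bytes each one occupies.

```python
# Hypothetical bit positions within CORE_SELECTOR (assumed for this sketch).
TXFS_INPUTS = 1 << 5
TXFS_OUTPUTS = 1 << 6


def expected_components(core_selector: int) -> list:
    """Return the names of the selector components, in serialization order."""
    components = ["CORE_SELECTOR"]
    has_inputs = bool(core_selector & TXFS_INPUTS)
    has_outputs = bool(core_selector & TXFS_OUTPUTS)
    # INOUT_SELECTOR is shared: it appears once if either bit is set.
    if has_inputs or has_outputs:
        components.append("INOUT_SELECTOR")
    if has_inputs:
        components.append("IN_SELECTOR")
    if has_outputs:
        components.append("OUT_SELECTOR")
    return components


print(expected_components(TXFS_INPUTS | TXFS_OUTPUTS))
```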

Contributor Author

Exactly. I'm thinking of changing this as follows:

  • remove TXFS_INPUTS and TXFS_OUTPUTS bits
  • reader will know the entire size of the txfs, so when a second byte is present, look at the bits present in the INOUT_SELECTOR byte to know whether to expect IN_SELECTOR and/or OUT_SELECTOR.
  • this frees up two bits in the CORE_SELECTOR, one of which I'm thinking to repurpose for SPEND_SCRIPT (i.e. scriptCode for segwit v0 inputs and tapscript for v1 inputs, scriptPubkey for non-segwit)

bip-txhash.md Outdated
* the leading in/outputs up to 8192
* up to 64 individually selected in/outputs
** using absolute indices up to 16384
** using indices relative to the current input index from -64 to +64.
Contributor

This is incredibly complex, and seems to mismatch what I can see covenants being used for in practice. I anticipate fees being high in future, such that people will do a reasonable amount of engineering to minimize their total footprint. In particular, they will want to add fees after commitment, and want to batch transactions using stacking.

The first case implies you want to exclude a specific input and output, to allow for fees, or at least allow binding not to cover the final input/output. The second case implies you want to mul/divide an input number to get the corresponding range of outputs.

The simplest case is a single input and output pair: a-la SIGHASH_SINGLE. This both allows almost arbitrary fee inputs/outputs, and stacking.

But what if you want to bind a pair of inputs to one output? Or a pair of outputs to one input? Both seem reasonably common things to want to do (e.g. opening a dual-funded lightning channel, and closing a channel).

That means you need to be able to select outputs as "current input index / 2" or "current input index * 2 and current input index * 2 + 1". Numbers other than 2 are possible but this is the most likely case (since, in order to stack, all txs must be of same input-output number form, and I consider 1 and 2 by far the most likely numbers here).
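
The index arithmetic asked for here can be sketched in a few lines. The function names and the `ratio` parameters are hypothetical; nothing like this exists in the BIP text quoted in this thread.

```python
def outputs_for_input(input_index: int, outputs_per_input: int = 2) -> list:
    """One input bound to several outputs: index * n .. index * n + (n - 1)."""
    base = input_index * outputs_per_input
    return list(range(base, base + outputs_per_input))


def output_for_input(input_index: int, inputs_per_output: int = 2) -> int:
    """Several inputs bound to one output: index / n, rounded down."""
    return input_index // inputs_per_output


# Input 3 in a stack of 1-in/2-out transactions covers outputs 6 and 7.
print(outputs_for_input(3))
# Inputs 4 and 5 in a stack of 2-in/1-out transactions both map to output 2.
print(output_for_input(4), output_for_input(5))
```

With a ratio of 1 both functions reduce to the SIGHASH_SINGLE-style pairing of input `i` with output `i`.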

Contributor Author

Yeah this is true. Initially I didn't have relative indices. I'm still not entirely convinced they are useful. Precisely for the 1-in 2-out case which seems super common to me. I heard "you'd be surprised how easy it is to add an extra input".

My initial thought was that private aggregation (i.e. not through public broadcast media like mempools) would be easily possible as a user can just create/sign a thousand variants of their txs, for each possible input index. This works with absolute indices and doesn't need relative indices. It might even work with public broadcast.

The problem is that doing this with absolute indices only works if everyone in the protocol has the same in-out ratio. (Everyone needs 1-to-2 so you can sign 1in1,2out, 2in3,4out, 3in5,6out, etc.) Otherwise you get a quadratic amount of data. With relative indices, you can sign XinX,1out, XinX,2out, XinX,3out, ... and this way the coordinator can put you in any place, put your second output in some arbitrary place, and pick your signatures based on where your second output is placed.

Ok this doesn't really require relative indices, but it requires the ability to mix "current" with absolute.
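
One reading of the counting argument above, as a rough sketch: with absolute indices only, a participant pre-signs one variant per combination of own input position and second-output position, whereas mixing "current" with an absolute index leaves only the second-output position to enumerate. The function names and the tuple representation are invented for illustration, and edge cases such as index collisions are ignored.

```python
def variants_absolute_only(max_inputs: int, max_outputs: int) -> list:
    """One pre-signed variant per (input position, second output position)."""
    return [(i, o) for i in range(max_inputs) for o in range(max_outputs)]


def variants_current_plus_absolute(max_outputs: int) -> list:
    """'current' follows the input wherever it lands; only the second output varies."""
    return [("current", o) for o in range(max_outputs)]


n = 100
print(len(variants_absolute_only(n, n)))       # grows with n * n
print(len(variants_current_plus_absolute(n)))  # grows with n
```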

Contributor

This leaves some protocols vulnerable to the partial signature attacks. Say the covenant requires your outputs go half to pubkeyA and half to pubkeyB. Now I have two identical 1BTC covenant UTXO inputs, but re-use the same outputs to satisfy both, and steal the other 1BTC.

The same problem applies to "tell me the outputs in the witness data".
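
The attack described above can be illustrated with a toy model. This is not TXHASH semantics: the `Output` type, the covenant rule, and the script labels are all invented for the sketch, which only shows how two identical inputs can each be satisfied by the same pair of outputs.

```python
from dataclasses import dataclass

ONE_BTC = 100_000_000  # satoshis


@dataclass(frozen=True)
class Output:
    script: str
    amount_sat: int


def covenant_satisfied(input_amount_sat: int, outputs: list) -> bool:
    """Toy rule: outputs 0 and 1 each pay half of this input to pubkeyA / pubkeyB."""
    half = input_amount_sat // 2
    return (
        len(outputs) >= 2
        and outputs[0] == Output("pubkeyA", half)
        and outputs[1] == Output("pubkeyB", half)
    )


# Two identical 1 BTC covenant inputs, but only one set of covenant outputs.
inputs = [ONE_BTC, ONE_BTC]
outputs = [
    Output("pubkeyA", ONE_BTC // 2),
    Output("pubkeyB", ONE_BTC // 2),
    Output("attacker", ONE_BTC),  # the other 1 BTC is diverted
]

# Each input checks the same absolute output positions, so both checks pass.
both_pass = all(covenant_satisfied(amount, outputs) for amount in inputs)
print(both_pass)
```

In this toy model the check would have to bind each input to its own outputs (for example through the index mapping discussed earlier in this thread) for the second input to fail.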

bip-txhash.md Outdated
future addition of byte manipulation opcodes like `OP_CAT`, an additional
cost is specified per TransactionHash execution. Using the same validation
budget ("sigops budget") introduced in BIP-0342, each TransactionHash
decreases the validation budget by 10. If this brings the budget below zero,
Contributor

This needs much more justification. Why 10? It has an implied cost of 2 already, since you have to use the opcode and a selector. If it has to hash a lot, hasn't someone already paid that to make such a large transaction?

Contributor Author

Yeah it's tricky. In practice, it has a similar amortized per-tx hash cost to the one sighashes have. It's hard to count that toward the budget because it is amortized: it's basically hashing all the large tx fields once, so that if they are requested repeatedly their hashes can be reused.

After the amortized hash cost, it's just a finite series of ~32-byte chunks with maximally 64 in/out which in total can have 8 fields that are each ~32 bytes. This is ~16,384 bytes max.
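
The upper bound quoted above is the product of the three numbers in that sentence. The constant names below are illustrative, and the 32-byte chunk size is the approximate figure from the comment, not an exact serialization size.

```python
MAX_SELECTED_INOUTS = 64   # individually selected in/outputs
FIELDS = 8                 # fields per in/output, per the comment
BYTES_PER_FIELD = 32       # approximate size of each hashed chunk


def worst_case_hashed_bytes() -> int:
    """Non-amortized bytes hashed in the worst case, under the figures above."""
    return MAX_SELECTED_INOUTS * FIELDS * BYTES_PER_FIELD


print(worst_case_hashed_bytes())
```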

Then, another consideration is that it would be nice and reasonable if TXHASH+CSFS would not have a higher cost than what naturally would be placed in the witness, the 64-byte signature.

I see it like this: we have a 64-byte budget to divide over TXHASH+CSFS as I think it's reasonable that this combination doesn't cost more than 28% more than a CHECKSIG (which is 50).

So maybe it's right that TXHASH can actually cost more, something like 25 if CSFS would be priced at 35 or 40.
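
The pricing argument can be checked with simple arithmetic. This sketch assumes the "64-byte budget" is read as a ceiling of 64 budget units for TXHASH and CSFS combined, which is 28% above the CHECKSIG cost of 50; that reading and all names below are assumptions, and the candidate prices are the ones floated in the comment.

```python
CHECKSIG_COST = 50       # budget units, as stated in the comment
COMBINED_CEILING = 64    # assumed ceiling for TXHASH + CSFS together


def ceiling_overhead_percent() -> int:
    """How far the ceiling sits above a plain CHECKSIG, in percent."""
    return (COMBINED_CEILING - CHECKSIG_COST) * 100 // CHECKSIG_COST


def within_ceiling(txhash_cost: int, csfs_cost: int) -> bool:
    """True if the combined price stays at or below the assumed ceiling."""
    return txhash_cost + csfs_cost <= COMBINED_CEILING


print(ceiling_overhead_percent())
print(within_ceiling(25, 35), within_ceiling(25, 40))
```

Under this reading, pricing TXHASH at 25 leaves room for CSFS at 35 (total 60), while CSFS at 40 would put the pair at 65, one unit over the ceiling.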

Contributor

This is not a signature budget, it's a hashing budget. Perhaps we should make this a first-class citizen then?

See https://rusty.ozlabs.org/2023/12/22/script-limits-opcat.html#my-proposal-a-dynamic-limit-for-hashing


# Detailed Specification

A reference implementation in Rust is provided attached as part of this BIP
Contributor

OK, I would really appreciate a table of all the bits and exactly what and how they encode them. It's particularly nasty because some values are little-endian 32 bit encoded, not CScriptNum encoded, and others are varint encoded?

But it's nice to be explicit in each case, for people like me who are not deep in the weeds of bitcoin's onchain representation, since it helps when considering how to use this alongside things like OP_CAT and extended arithmetic opcodes.

Contributor Author

Ok, I agree. I think I tried to encode values the way they are consistently encoded in other contexts like sighashes and p2p. But I will go over them and list them in the BIP as well. It's true that I didn't consider the interactions between regular LE encoding and CScriptNum encoding which is what will be used when math is done in Script for things like values.
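
To make the encoding mismatch concrete, here is a sketch comparing a fixed-width little-endian 32-bit encoding with the minimal sign-magnitude encoding Script uses for numbers (CScriptNum). The helper names are invented for this example, and which transaction fields use which encoding is not specified here.

```python
import struct


def le32(value: int) -> bytes:
    """Fixed-width unsigned 32-bit little-endian, as used in tx serialization."""
    return struct.pack("<I", value)


def scriptnum(value: int) -> bytes:
    """Minimal little-endian sign-magnitude encoding used for Script numbers."""
    if value == 0:
        return b""
    negative = value < 0
    magnitude = abs(value)
    out = bytearray()
    while magnitude:
        out.append(magnitude & 0xFF)
        magnitude >>= 8
    if out[-1] & 0x80:
        # Top bit is taken by the magnitude, so the sign needs its own byte.
        out.append(0x80 if negative else 0x00)
    elif negative:
        out[-1] |= 0x80
    return bytes(out)


for n in (0, 1, 128):
    print(n, le32(n).hex(), scriptnum(n).hex())
```

The same value therefore has different byte strings in the two encodings, so a script that does arithmetic on a field produced in one form has to convert it first.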

* The element on the stack is at least 32 bytes long, fail otherwise.
* The first 32 bytes are interpreted as the TxHash and the remaining suffix bytes specify the TxFieldSelector.
* If the TxFieldSelector is invalid, fail.
* The actual TxHash of the transaction at the current input index, calculated

Should maybe specify that the element is not popped off the stack, or is that implicit?

Contributor Author

Hmm, it might be worth mentioning yeah, but I thought it was implicit as the other opcode explicitly mentions that it takes the items from the stack. It's kinda characteristic of a -VERIFY opcode to not touch the stack.
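
The first two rules in the quoted hunk (minimum length, then splitting into hash and selector) can be sketched as below. The function name is hypothetical, selector validation and the hash comparison are left out, and the sketch returns new values without modifying its input, mirroring the point that the stack element is left in place.

```python
from typing import Tuple

TXHASH_LEN = 32


def split_stack_element(element: bytes) -> Tuple[bytes, bytes]:
    """Split the top stack element into (TxHash, TxFieldSelector).

    Raises ValueError if the element is shorter than 32 bytes.
    """
    if len(element) < TXHASH_LEN:
        raise ValueError("element must be at least 32 bytes")
    return element[:TXHASH_LEN], element[TXHASH_LEN:]


tx_hash, selector = split_stack_element(bytes(32) + b"\xff")
print(tx_hash.hex(), selector.hex())
```

An element of exactly 32 bytes yields an empty selector, which is the zero-length special case discussed elsewhere in this thread.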

bip-txhash.md Outdated
*  4. `TXFS_INOUT_NUMBER | TXFS_INOUT_SELECTION_ALL`
* the `0x00` byte: it is set equal to `TXFS_SPECIAL_ALL`, which means "ALL" and is primarily
useful to emulate `SIGHASH_ALL` when `OP_TXHASH` is used in combination
with `OP_CHECKSIGFROMSTACK`.<br>Special case `TXFS_SPECIAL_TEMPLATE` is 4

This should be TXFS_SPECIAL_ALL? Maybe same as Rusty is saying.

Contributor Author

Fixed I think.

summary, followed by a reference implementation of the CalculateTxHash function.

* There are two special cases for the TxFieldSelector:
* the empty value, zero bytes long: it is set equal to `TXFS_SPECIAL_TEMPLATE`,

"what" is set equal to TXFS_SPECIAL_TEMPLATE? Maybe define what the bytes of the field selector means before the special cases.

Contributor Author

I think I improved this section. What I mean is "The input txfield selector is set from empty to this one, so whatever that one means".

* &nbsp;3. `TXFS_INOUT_NUMBER | TXFS_INOUT_SELECTION_ALL`
* &nbsp;4. `TXFS_INOUT_NUMBER | TXFS_INOUT_SELECTION_ALL`

* The first byte of the TxFieldSelector has its 8 bits assigned as follows, from lowest to highest:

Found this section very hard to follow. Would it be an idea to introduce an example field selector more gently, to show what it looks like (bit representation)?

Contributor Author

Hmm, it might be the case. In the latest version I added some example bit selectors after the written explanation. Can you see if they make sense to you?

@bitcoin bitcoin deleted a comment from mahmoudfranc Feb 28, 2024
bip-txhash.md Outdated
* all in/outputs
* the current input index
* the leading in/outputs up to 8192
* up to 64 individually selected in/outputs

Does this allow everlasting covenants (coins locked forever in a fixed set of addresses)? It is not clear from the text of BIP if this is possible and an intended use-case. IIUC, it is impossible because of chicken-and-egg problem: output script has to include a hash of itself to make an everlasting covenant which is impossible. But it is better to clarify this in BIP text explicitly, including a formal proof of why an everlasting covenant is impossible or (if it is possible) discuss use-cases and consequences.

Contributor Author

Yeah, this is not possible AFAIK. But I wouldn't be comfortable making that claim, if you see what people can do. Especially if we get OP_CAT and OP_TWEAKADD.

@Roasbeef
Contributor

Assigned BIP 346.

Instead, add a notice about malleability.
8 participants