Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Signature Hash Algorithm Proposal #950

Open
davecgh opened this issue Dec 20, 2017 · 6 comments
Open

Improved Signature Hash Algorithm Proposal #950

davecgh opened this issue Dec 20, 2017 · 6 comments
Labels
consensus discussion Discussion and details for potential consensus changes.

Comments

@davecgh
Copy link
Member

davecgh commented Dec 20, 2017

I'm creating this issue to open community discussion and request feedback on an improved signature hash algorithm that I would like to propose. Obviously, a full blown DCP with a lot more details such as enumerating the affected opcodes, specifying the exact byte sizes of each field, deployment specifics, test vectors, etc will be required before any type of vote on this could happen. The intention here is to focus solely on the algorithm semantics as preliminary work toward that end.

For some background, the current signature hash algorithm works correctly by faithfully generating a hash of a portion of transactions according to the chosen signature hash type and verifying signatures against the hash with a given public key. However, unfortunately, the SigHashAllValue signature hash type that is intended to allow the signature to commit to the input amount for the transaction input being signed is not usable currently due to a consensus rule which enforces strict signature encoding disallowing the hash type.

This SigHashAllValue signing mode is highly useful for offline transaction signing devices such as hardware wallets or lightweight air-gapped cold wallets, as it allows them to avoid requesting all of the referenced outputs in order to obtain the associated input amounts as they can instead safely use and commit to the input amount provided by the ValueIn field of the transaction itself as any invalid amounts would result in an invalid signature that would later fail to verify.

That said, it would be even more useful if the signature committed to all input amounts since then it would be possible for the aforementioned devices to safely use the total amount being spent by the transaction for providing better user error messages and calculating transaction fees.

So, while it would be possible to change the consensus rule in question to include SigHashAllValue, it would obviously require a consensus vote since it constitutes a change to the consensus rules. Given that a vote is required to make any changes in regards to this anyways, I propose that we should completely change the algorithm to not only address this issue, but also to address other shortcomings in regards to efficiency and complexity of the current algorithm in addition to committing to all input amounts per the aforementioned description.


The new signature hash algorithm should satisfy the following goals and requirements:

  • Reuse as much data between signature hash types and inputs as possible
  • Avoid the need to mutate an existing transaction to produce the required commitment serializations
  • Avoid including unnecessary information that is already implicitly committed to such as the number of inputs and outputs
  • Provide full support for SigHashAll, SigHashSingle, SigHashNone, and SigHashAnyOneCanPay
  • Unconditionally commit to all transaction fields that do not involve inputs and outputs (version, lock time, expiry)
  • Commit to all inputs and input amounts when SigHashAnyOneCanPay is NOT set (previous outpoint, sequence, input amount)
  • Unconditionally commit to the input being signed along with its input amount and input public key script (previous outpoint, sequence, amount, script version, script)
  • Commit to the transaction outputs depending the signature hash type:
    • Commit to all outputs when neither SigHashSingle or SigHashNone (SigHashAll as default)
    • Commit to none of the outputs for SigHashNone
    • Generate an error for SigHashSingle if the input index is greater than or equal to the number of outputs
    • Commit to the output at the same index for the input being signed for SigHashSingle
  • Allow the input-output pair to be repositioned within the transaction for the combination of SigHashSingle and SigHashAnyOneCanPay, so long as they are both at the same index

Proposal

My proposal for an algorithm that satisfies all of these goals is as follows:

  • Retain the current existing SigHashAll, SigHashNone, SigHashSingle, and SigHashAnyOneCanPay definitions and remove SigHashAllValue altogether in favor of always committing to the input values.
  • Calculate the signature hash as follows (individual fields are further defined below):
signature_hash = BLAKE256(signature_hash_type || tx_meta_commitment_fields || txins_midstate_hash || input_commitment_fields || txouts_midstate_hash)
  • Reuse txins_midstate_hash for other inputs being signed when SigHashAnyOneCanPay is NOT set, or replace it with a zero hash (32-bytes of zeros) in the case it is set
  • Reuse txouts_midstate_hash for other inputs being signed for SigHashAll, replace it with the hash of the relevant single output for SigHashSingle, or replace it with a zero hash (32-bytes of zeros) for SigHashNone

signature_hash_type

The signature hash type ensures the signature commits to the specific signature hashing mode used to produce the signature in order to prevent potential vulnerabilities that could otherwise arise through clever construction of transactions and combinations of signature hash types.

tx_meta_commitment_fields

The transaction meta commitment fields ensure the signature commits to the transaction fields which are not part of the inputs and outputs. Specifically, these fields consist of the transaction version, lock time, and expiration.

tx_meta_commitment_fields = tx_version || lock_time || expiration

txins_midstate_hash

The transaction inputs midstate hash is the BLAKE256 of all of the transaction input's previous outpoints, input sequence numbers, and input amounts. This allows the signature to selectively commit to all transaction inputs without having to recalculate the information for every input being signed.

txin[n] = prevout_hash || prevout_index || prevout_tree || sequence || input_amount
txins_midstate_hash = BLAKE256(txin[0] || txin[1] || txin[2] || ... || txin[n])

input_commitment_fields

The input commitment fields ensure the signature always commits to the specific input being signed as well as its input script and version. Specifically, these fields consist of the input's previous outpoints, input sequence number, input amount, input public key script version, and input public key script.

Note that some of these fields are potentially redundant with data already committed to via txins_midstate_hash when SigHashAnyOneCanPay is NOT set, however, it is important to commit to it again in order to ensure the input index is implicitly committed to in that case as well, since otherwise the logic would need to be more complex in order to selectively exclude fields and commit to the input index. That extra complexity does not seem like a good tradeoff for a potential savings of a relatively small number of constant bytes which the hashing function would very likely have to end up padding out due to its internal block size anyways.

input_commitment_fields = prevout_hash || prevout_index || prevout_tree || sequence || input_amount || script_version || prevout_script

txouts_midstate_hash

The transaction outputs midstate hash is the BLAKE256 of all of the transaction outputs' amounts, script versions, and scripts. This allows the signature to selectively commit to all transaction outputs without having to recalculate the information for every input being signed.

txout[n] = output_amount || script_version || public_key_script
txouts_midstate_hash = BLAKE256(txout[0] || txout[1] || txout[2] | | ... || txout[n])
@davecgh
Copy link
Member Author

davecgh commented Jan 10, 2018

As an update to this, I've written some quick and dirty and completely unoptimized prototype code that calculates the signature hash using the proposal I laid out in the description and benchmarked it against a couple of transactions in order to compare. One of the transactions is a very typically encountered form and the other is a much larger and less frequent form. The code can surely be further improved and is not ready to be shared yet, but I did want to share some preliminary results thus far since they are extremely promising:

Typical Transaction
-------------------
BenchmarkCalcSigHash        200   5978843 ns/op   4106825 B/op   17978 allocs/op
BenchmarkCalcSigHashNew   10000    142288 ns/op     56064 B/op    1524 allocs/op

Large Transaction
-----------------
BenchmarkCalcSigHash         2   733841600 ns/op   351540152 B/op   4027003 allocs/op
BenchmarkCalcSigHashNew   1000     2270216 ns/op     1028845 B/op     21776 allocs/op

So, in the typical transaction case, this unoptimized code using the proposed algorithm is roughly 4,100% faster, and in the larger and less frequent transaction case, it is roughly 32,225% faster.

@davecgh davecgh added the non-forking consensus Changes that involve modifying consensus code without causing any forking changes. label Jan 14, 2018
@StabbarN
Copy link

4,100% and 32,225% faster are a lot, great improvement! Is the existing code slow or is the prototype amazingly fast? I guess it's hard to compare to other coins' speed but it'd be even more interesting to see a benchmark with them, at least in terms of marketing Decred.

@davecgh
Copy link
Member Author

davecgh commented Jan 14, 2018

@magnus-staberg The change to the algorithm to drastically reduce the algorithmic complexity is where the majority of the speedup comes from. It's a change from quadratic time to linear time.

@matheusd
Copy link
Member

Possibly a dumb question, but I'm going to ask anyway. What about changing the signature hash calculation to:

signature_hash = BLAKE256(tx_meta_commitment_fields || txouts_midstate_hash || txins_midstate_hash ||  signature_hash_type || input_commitment_fields )

That way, using a "streaming" hash calculator (ie, the raw github.com/dchest/blake256) we could store the internal hash state between inputs (the initial BLAKE256(meta || outs_midstate || ins_midstate || type)) for the most common type (SigHashAll) with progressively less common hash types reusing less of the state.

Do you think there's not enough of a performance gain to justify the additional logic/memory required to keep track of the internal hash state at various stages or totally breaking the bitcoin standard (by moving the hash_type further down the list of hashed items) might be problematic or introduce a security issue?

@davecgh
Copy link
Member Author

davecgh commented Jun 25, 2018

It seems like a reasonable change worthy of consideration. The proposed algorithm is already a break from what Bitcoin does, so that's not a concern here. Also, I don't see any security issues the suggested permutation would introduce on the surface, though I'd certainly like to think through all of the possibilities to be sure.

We'd have to benchmark it to see the performance difference. I wouldn't expect there to be very much difference since we're only talking about a constant number of N bytes where N is very small (something like 77), but nevertheless there would likely be some gain since it should theoretically be possible to avoid 1 block of hashing. The real gain with the original proposal is moving from quadratic time to linear time though, so any potential gains here would definitely pale in comparison regardless.

That said, you can't actually store the intermediate internal state with the linked implementation because it's all done through an interface and the internal state is stored in an unexported struct, so you can't even type cast it to get at the internal state to clone it. It would, of course, be possible to expose it via a fork of the package which provides that functionality (I did something very similar for sha256 a long time ago in another context). I'm not sure going through the hassle of maintaining our own fork for just this purpose is worth the complexity here because where only talking about small value of N as previously mentioned, however, that doesn't necessarily mean the suggestion shouldn't be done as it would potentially allow other implementations to take advantage of it if they already have libs which support cloning the internal state at arbitrary points in the byte stream.

@matheusd
Copy link
Member

Agreed on possibly very low gains compared to the major switch to linear time.

Thanks for the detailed discussion!

@davecgh davecgh added consensus discussion Discussion and details for potential consensus changes. and removed non-forking consensus Changes that involve modifying consensus code without causing any forking changes. labels Dec 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus discussion Discussion and details for potential consensus changes.
Projects
None yet
Development

No branches or pull requests

3 participants