# Signature hash algorithm evolution

## Prerequisite knowledge
- For all notebooks:
    - A high-level understanding of bitcoin and the UTXO model, e.g. [Mastering Bitcoin](https://github.com/bitcoinbook/bitcoinbook) by Andreas Antonopoulos, in particular [Chapter 6](https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch06.asciidoc).
    - A conceptual understanding of [hash functions](https://www.thesslstore.com/blog/what-is-a-hash-function-in-cryptography-a-beginners-guide).
    - [Hexadecimal notation](https://inst.eecs.berkeley.edu/~cs61bl/r//cur/bits/decimal-binary-hex.html?topic=lab28.topic&step=2&course=) and [endianness](https://www.freecodecamp.org/news/what-is-endianness-big-endian-vs-little-endian/).
- Specific to this notebook:
    - SHA256, HASH256, HASH160 - '[Hash Functions chapter](https://github.com/DariusParvin/bitcoin-tx-tutorial/blob/main/appendix/hash-functions.ipynb)'


## Setup 
**You'll need to edit these next two lines for your local setup.**

In [1]:
path_to_bitcoin_functional_test = "/Users/dariuscognac/bitcoin/test/functional"
path_to_bitcoin_tx_tutorial = "/Users/dariuscognac/Documents/Github/bitcoin-tx-tutorial"

import sys

# Add the functional test framework to our PATH
sys.path.insert(0, path_to_bitcoin_functional_test)
from test_framework.test_shell import TestShell

# Add the bitcoin-tx-tutorial functions to our PATH
sys.path.insert(0, path_to_bitcoin_tx_tutorial)
from functions import *

## Introduction
Bitcoin transactions that require signatures are signed over the signature hash (sighash). The sighash is a hash over all the parts of the transaction that the signer wants their signature to commit to. For example, a signature might commit to one or all of the inputs, and one or all of the outputs. The idea is to concatenate all the parts of the transaction you want to commit to and then hash them (with a double SHA256 in legacy and segwit v0, and with a tagged SHA256 in taproot). If the value of any of those items is changed (by an additional signer, a miner, etc.), your signature will no longer verify against the parts of the transaction selected by the sighash flag.
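As a rough illustration of that commitment property, the sketch below hashes a concatenation of placeholder parts and then hashes it again after one part is changed. The byte strings are hypothetical stand-ins rather than real transaction fields, and `hash256` is redefined locally with `hashlib` so the snippet runs without the tutorial's `functions` module.

```python
import hashlib


def hash256(data: bytes) -> bytes:
    """Double SHA256 (local stand-in for the tutorial helper of the same name)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Hypothetical placeholders for the parts of a transaction being committed to
committed_parts = [b"version", b"inputs", b"outputs", b"locktime"]
original_digest = hash256(b"".join(committed_parts))

# Someone else changes one committed part (here, the outputs)
tampered_parts = [b"version", b"inputs", b"OUTPUTS", b"locktime"]
tampered_digest = hash256(b"".join(tampered_parts))

digests_match = original_digest == tampered_digest
print("original:", original_digest.hex())
print("tampered:", tampered_digest.hex())
print("digests match:", digests_match)
```

Because the two digests differ, a signature made over the first digest would not verify against the second.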

From legacy, to segwit v0, to taproot, the computation of the sighash has changed along the way. These changes were designed to make practical applications easier. For example, since segwit v0 the amount of the input being spent is included in the data being signed over. This lets a signer be sure of the amount it is signing over, rather than having to check it against the blockchain.

Other changes have improved the efficiency of signing. For example, in taproot transactions many of the intermediate hashes can be reused across different inputs. In this section we'll look in detail at how the sighash computation has evolved.

## Unsigned transaction

To illustrate how the sighash is calculated for the various types of bitcoin transactions, we'll define an example transaction with example values.

To start with, we'll need to define our example UTXO with an outpoint (txid and index) and an amount. 

In [2]:
# UTXO to spend from
txid_to_spend = "dee5f46bf2b13839b927a83e3c19ec9e64488c0792a66f3f8716f3d2fba84acf"
index_to_spend = 0
input_amount_sat = int(2.001 * 100_000_000)

Now we'll set all the fields for our unsigned transaction.

In [3]:
# VERSION
# version '2' indicates that we may use relative timelocks (BIP68)
version = bytes.fromhex("0200 0000")

# INPUTS
# We have just 1 input
input_count = bytes.fromhex("01")

# Convert txid and index to bytes (little endian)
txid = (bytes.fromhex(txid_to_spend))[::-1]
index = index_to_spend.to_bytes(4, byteorder="little", signed=False)

# For the unsigned transaction we use an empty scriptSig
empty_scriptsig = bytes.fromhex("")

# use 0xffffffff unless you are using OP_CHECKSEQUENCEVERIFY, locktime, or rbf
sequence = bytes.fromhex("ffff ffff")

# OUTPUTS
# 0x02 for our two outputs
output_count = bytes.fromhex("02")

# OUTPUT 1 
output1_value_sat = int(float("1.5") * 100000000)
output1_value = output1_value_sat.to_bytes(8, byteorder="little", signed=False)
output1_spk = bytes.fromhex("0014fc7250a211deddc70ee5a2738de5f07817351cef")

# OUTPUT 2
output2_value_sat = int(float("0.5") * 100000000)
output2_value = output2_value_sat.to_bytes(8, byteorder="little", signed=False)
output2_spk = bytes.fromhex("0014531260aa2a199e228c537dfa42c82bea2c7c1f4d")

# LOCKTIME
locktime = bytes.fromhex("0000 0000")

In [4]:
inputs = (
    txid
    + index
    + varint_len(empty_scriptsig)
    + empty_scriptsig
    + sequence
)

outputs = (
    output1_value
    + varint_len(output1_spk)
    + output1_spk
    + output2_value
    + varint_len(output2_spk)
    + output2_spk
)

unsigned_tx = (
    version
    + input_count
    + inputs
    + output_count
    + outputs
    + locktime
)

## Legacy signature hash

As you'll see from the example below, legacy transactions have a very straightforward signature hash algorithm that is based on hashing a slightly altered version of the unsigned transaction. While straightforward, it does have some weaknesses that are addressed in segwit.

In [5]:
# Sender pubkey 
sender_pubkey = bytes.fromhex("4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa")

# scriptPubkey of the UTXO we're spending from
pk_hash = hash160(sender_pubkey)
p2pkh_scriptpubkey = bytes.fromhex("76a914" + pk_hash.hex() + "88ac")

In [6]:
# STEP 1: replace the empty scriptSig with the input scriptPubkey
tx_digest_preimage = (
    # Version
    version
    
    # Inputs
    + input_count
    + txid
    + index
    + varint_len(p2pkh_scriptpubkey) # varint length of input scriptpubkey
    + p2pkh_scriptpubkey             # replace empty_scriptsig with input scriptpubkey
    + sequence
    
    # Outputs
    + output_count
    + outputs
    
    # Locktime
    + locktime
)

# STEP 2: Append the sighash flag to the transaction
sighash_type = bytes.fromhex("0100 0000") # SIGHASH_ALL
tx_digest_preimage += sighash_type

# STEP 3: HASH256
sighash = hash256(tx_digest_preimage)

print("Legacy sighash digest: ", sighash.hex())

Legacy sighash digest:  7afc74798a481b78b0d2dacb11f043e3b0b21311cfe6864a7e4acef56791e3b7


This signature hash algorithm is straightforward and intuitive. The preimage is simply the original unsigned transaction with the empty scriptSig replaced by the scriptPubKey of the output being spent, and with the sighash type appended. There are unfortunately two weaknesses with this algorithm:

1 - For the verification of each signature, the amount of data hashed is proportional to the size of the transaction. As you can see from the example above, the whole transaction is signed over. Suppose the transaction had 100 p2pkh inputs. Not only is the transaction large because of the number of inputs, but each signature also requires hashing the whole transaction using the algorithm above, so the total amount of hashed data grows as O(n<sup>2</sup>) in the number of inputs.

2 - The amount of the input being spent is not included in the signature hash. This is usually not a problem for online network nodes, as they can request the transaction being spent to obtain the output value. For an offline transaction signing device ("cold wallet"), however, not knowing the input amount makes it impossible to calculate the exact amount being spent and the transaction fee. To cope with this problem a cold wallet must also acquire the full transaction being spent, which can be a big obstacle when implementing a lightweight, air-gapped wallet. If the input value were instead part of the transaction digest, a cold wallet could safely sign a transaction after learning the value from an untrusted source: if a wrong value were provided and signed, the signature would be invalid and no funds would be lost.
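The quadratic growth described in the first weakness can be estimated with a small size model. This is only a sketch under assumed sizes: every input spends a p2pkh output (25-byte scriptPubKey), there are two p2pkh outputs, and the input and output counts fit in a single byte. The function names are hypothetical and not part of the tutorial.

```python
def legacy_preimage_size(n_inputs: int, n_outputs: int = 2) -> int:
    """Approximate byte size of one legacy SIGHASH_ALL preimage (assumed p2pkh sizes)."""
    signing_input = 36 + 1 + 25 + 4     # outpoint + script length + scriptPubKey + sequence
    other_input = 36 + 1 + 0 + 4        # the other inputs carry an empty script
    outputs = n_outputs * (8 + 1 + 25)  # value + script length + p2pkh scriptPubKey
    return (
        4                               # version
        + 1                             # input count (single byte assumed)
        + signing_input
        + (n_inputs - 1) * other_input
        + 1                             # output count (single byte assumed)
        + outputs
        + 4                             # locktime
        + 4                             # sighash type
    )


def legacy_total_hashed(n_inputs: int) -> int:
    """Total preimage bytes hashed when every input needs its own signature."""
    return n_inputs * legacy_preimage_size(n_inputs)


for n in (1, 10, 100):
    print(f"{n:>3} inputs -> about {legacy_total_hashed(n)} preimage bytes hashed")
```

In this model, going from 10 to 100 inputs multiplies the hashed data by far more than 10, which is the O(n<sup>2</sup>) behaviour described above.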

## Segwit v0

The segwit signature hash algorithm described in [BIP143](https://github.com/bitcoin/bips/blob/master/bip-0143.mediawiki) addresses the two weaknesses with the previous version above.


In [7]:
# For a segwit v0 transaction the input we're spending from is a P2WPKH output.
sender_pubkey = bytes.fromhex("4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa")

# BIP143 signs over a scriptCode rather than the scriptPubKey itself. For P2WPKH
# the scriptCode is the P2PKH-style script built from the pubkey hash.
pk_hash = hash160(sender_pubkey)
p2wpkh_scriptcode = bytes.fromhex("76a914" + pk_hash.hex() + "88ac")

In [8]:
# weakness 2 - The value of the input amount is now included in the sighash. Therefore a cold wallet cannot 
# be 'tricked' into signing off on an input with a different amount than anticipated (which would normally 
# affect the amount being spent on fees).
value = input_amount_sat.to_bytes(8, byteorder="little", signed=False)

# weakness 1 - Hashing over the inputs (prevouts, sequences) and outputs in isolation means these hashes can 
# be reused for constructing the sighash for each input. While not much of a difference for transactions with
# a single input, in the worst-case scenario for a transaction with many inputs, reusing these intermediate 
# hashes greatly reduces the total amount of hashing.
hashPrevOuts = hash256(txid + index)
hashSequence = hash256(sequence)
hashOutputs = hash256(outputs)

# Aside from incorporating the two changes above, the rest of the signature hash algorithm is largely the same.
sighash_type = bytes.fromhex("0100 0000") # SIGHASH_ALL

tx_digest_preimage = (
    version
    + hashPrevOuts # Intermediate hash for efficiency
    + hashSequence # Intermediate hash for efficiency
    + txid
    + index
    + varint_len(p2wpkh_scriptcode)
    + p2wpkh_scriptcode
    + value        # Input value is now included
    + sequence
    + hashOutputs  # Intermediate hash for efficiency
    + locktime
    + sighash_type
)

# Create sigHash to be signed
sighash = hash256(tx_digest_preimage)
print("Segwit v0 sighash digest: ", sighash.hex())

Segwit v0 sighash digest:  da5f24608b19b11542beee4c3787b218f58d7f3686fce49c8c30911badfb2aed


The new algorithm addresses the two weaknesses with the previous one.

1 - Rather than hash over the whole transaction for each signature hash, the `hashPrevOuts`, `hashSequence`, and `hashOutputs` only need to be computed once, and can be reused for other inputs. Thus the overall complexity of the whole hashing process reduces from O(n<sup>2</sup>) to O(n).

2 - The value of the input is now included in the signature hash. This makes it more convenient for offline signers to verify the total amount being spent.
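To make the reuse concrete, here is a minimal sketch of a hypothetical three-input transaction. The txids, amounts, serialized outputs and pubkey hash are dummy placeholders, and `hash256` is redefined locally with `hashlib`. The three shared hashes are computed once, and each input's preimage then has a fixed size no matter how many inputs the transaction has.

```python
import hashlib


def hash256(data: bytes) -> bytes:
    """Double SHA256 (local stand-in for the tutorial helper)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Hypothetical 3-input transaction with dummy txids; every input spends
# output 0 of its previous transaction and uses the default sequence.
outpoints = [bytes([i]) * 32 + (0).to_bytes(4, "little") for i in (1, 2, 3)]
sequences = [bytes.fromhex("ffffffff")] * 3
amounts_sat = [100_000, 200_000, 300_000]
dummy_outputs = bytes(31)  # placeholder for the serialized outputs

version = (2).to_bytes(4, "little")
locktime = (0).to_bytes(4, "little")
sighash_all = (1).to_bytes(4, "little")

# Placeholder P2WPKH scriptCode: length byte + P2PKH-style script with a dummy pubkey hash
script_code = bytes.fromhex("1976a914") + bytes(20) + bytes.fromhex("88ac")

# Computed once per transaction and shared by every input
hash_prevouts = hash256(b"".join(outpoints))
hash_sequence = hash256(b"".join(sequences))
hash_outputs = hash256(dummy_outputs)

# Only the outpoint, scriptCode, amount and sequence differ per input
preimages = [
    version + hash_prevouts + hash_sequence
    + outpoint + script_code + amount.to_bytes(8, "little") + sequence
    + hash_outputs + locktime + sighash_all
    for outpoint, amount, sequence in zip(outpoints, amounts_sat, sequences)
]
sighashes = [hash256(preimage) for preimage in preimages]

for i, (preimage, digest) in enumerate(zip(preimages, sighashes)):
    print(f"input {i}: preimage is {len(preimage)} bytes, sighash {digest.hex()}")
```

Since each per-input preimage has a constant size, the total amount of data hashed grows linearly with the number of inputs.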

## Taproot (Segwit v1)

Taproot introduced a number of upgrades for bitcoin. There are also a number of potential upgrades leveraging schnorr signatures, such as [cross-input aggregation](https://bitcointalk.org/index.php?topic=1377298.0), that are not ready for bitcoin in their present form. The signature hash algorithm introduced in taproot includes a byte reserved for future extensions.

Although the Segwit v0 signature hash algorithm was an improvement on the legacy version, there are still some weaknesses to be addressed. The taproot sighash algorithm improves upon the segwit v0 algorithm in the following ways:

1 - The values of all the inputs are hashed over. This makes it impossible to lie to an offline signing device about the fee of the transaction. In segwit v0, only the value of the input being signed is committed to in the sighash.

2 - The signature commits to the scriptPubKeys of all the outputs being spent. This prevents an offline signer from being lied to about the output being spent, even when the actually executed script is correct. This makes it possible to prove to the offline signer what (unused) execution paths existed. To put it another way, if the scriptPubKey was not included, the offline signer could be signing off on a transaction spending from a taproot output via script path, without knowing the merkle root. In this situation the offline signer wouldn't have enough information to verify what the unspent paths were.

A secondary benefit of committing to all the scriptPubKeys is that it also helps offline signers be aware of other inputs that belong to the same wallet.

3 - Hash operations use a single SHA256 instead of HASH256, avoiding unnecessary hashing.
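As a sketch of points 1 and 2, the snippet below builds the amount and scriptPubKey hashes for a hypothetical two-input transaction. The amounts and scripts are made-up placeholders, and `compact_size_len` is a hypothetical stand-in for the tutorial's `varint_len` helper that only handles scripts shorter than 253 bytes.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Single SHA256 (local stand-in for the tutorial helper)."""
    return hashlib.sha256(data).digest()


def compact_size_len(data: bytes) -> bytes:
    """Length prefix for short byte strings (hypothetical helper, < 253 bytes only)."""
    assert len(data) < 0xFD
    return bytes([len(data)])


# Hypothetical outputs being spent: (amount in satoshis, scriptPubKey)
spent_outputs = [
    (200_100_000, bytes.fromhex("5120") + b"\x11" * 32),  # taproot output, dummy key
    (50_000_000, bytes.fromhex("0014") + b"\x22" * 20),   # p2wpkh output, dummy hash
]

# Every input's amount and scriptPubKey goes into the shared hashes
sha_amounts = sha256(
    b"".join(amount.to_bytes(8, "little") for amount, _ in spent_outputs)
)
sha_scriptpubkeys = sha256(
    b"".join(compact_size_len(spk) + spk for _, spk in spent_outputs)
)

# Misreporting the amount of the *other* input changes the hash being signed over
misreported = [spent_outputs[0], (40_000_000, spent_outputs[1][1])]
sha_amounts_misreported = sha256(
    b"".join(amount.to_bytes(8, "little") for amount, _ in misreported)
)

print("sha_amounts:            ", sha_amounts.hex())
print("sha_amounts misreported:", sha_amounts_misreported.hex())
print("sha_scriptpubkeys:      ", sha_scriptpubkeys.hex())
```

A signer that was given a wrong amount for any input would therefore produce a signature that other nodes reject.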

In [9]:
# For a taproot transaction we'll define a new private key, x-only pubkey, and scriptPubkey for 
# the input we're spending from.

privkey = bytes.fromhex("1111111111111111111111111111111111111111111111111111111111111111")
pubkey = tr.pubkey_gen(privkey)
taproot_scriptPubkey = bytes.fromhex("5120") + pubkey

The `sighash_epoch` prefix allows reusing the hashTapSighash tagged hash in future signature algorithms that make invasive changes to how hashing is performed.

Note that unlike previous transaction types (legacy and segwit v0), `sha256` is used rather than `hash256`. `hash256` is simply two rounds of `sha256`, and the extra round of hashing is unnecessary as it provides no extra security.

Compared to segwit v0, taproot sighashes don't just include the hash of the previous outpoints, but also the hashes of the input amounts and scriptPubKeys in `sha_amounts` and `sha_scriptpubkeys`.

`spend_type` indicates whether the input is being spent using the key path or the script path, and whether an annex is present.

The sighash includes the index of the input being signed, `index_of_this_input`.

The digest is not computed with `hash256` as in previous versions, but with a tagged hash using the tag `"TapSighash"`.
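A tagged hash prefixes the message with two copies of the SHA256 of the tag before hashing. The sketch below shows that construction with a hypothetical `tagged_hash` helper and a placeholder message. It is not the tutorial's own helper, and the second tag is only there to show that different tags give unrelated digests.

```python
import hashlib


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """SHA256(SHA256(tag) || SHA256(tag) || msg) (hypothetical helper name)."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()


# Placeholder message: an epoch byte followed by dummy data
example_msg = bytes.fromhex("00") + b"example signature message"

tap_sighash_digest = tagged_hash("TapSighash", example_msg)
other_tag_digest = tagged_hash("SomeOtherTag", example_msg)

print("TapSighash tag:", tap_sighash_digest.hex())
print("other tag:     ", other_tag_digest.hex())
```

Using a distinct tag per purpose means a digest computed in one context cannot be mistaken for a digest from another context, even when the message bytes are identical.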

In [10]:
# Future versions may use a different sighash epoch for different sighash algorithms
sighash_epoch = bytes.fromhex("00") 

# Sighash algorithm now includes the index of the input being signed over.
# This makes it easier for offline signers 
index_of_this_input = bytes.fromhex("0000 0000")

# Control
sighash_type = bytes.fromhex("00") # SIGHASH_DEFAULT (a new sighash type meaning implied SIGHASH_ALL)

# 3 - Avoid unnecessary hashing by using SHA256 instead of HASH256
sha_prevouts = sha256(txid + index)

# 1 - Taproot transactions sign over all the input amounts (not just the one being signed)
# Note that input_amounts is the concatenation of all input amounts
input_amounts = input_amount_sat.to_bytes(8, byteorder="little", signed=False)

# 3 - Avoid unnecessary hashing by using SHA256 instead of HASH256
sha_amounts = sha256(input_amounts)

# 2 - Taproot transactions now sign over the scriptPubKeys of all the outputs being spent
sha_scriptpubkeys = sha256(
    varint_len(taproot_scriptPubkey)
    + taproot_scriptPubkey
)

# 3 - Avoid unnecessary hashing by using SHA256 instead of HASH256
sha_sequences = sha256(sequence)
sha_outputs = sha256(outputs) 

# Spend type indicates key path vs script path, and whether an annex is present
spend_type = bytes.fromhex("00") # key path spend, no annex present

sig_msg = (
    sighash_epoch
    + sighash_type
    + version
    + locktime
    + sha_prevouts
    + sha_amounts
    + sha_scriptpubkeys
    + sha_sequences
    + sha_outputs
    + spend_type
    + index_of_this_input # Now includes the index of the input being signed
)


# Taproot uses tagged hashes for the sighash
tag_hash = sha256("TapSighash".encode())
sighash = sha256(tag_hash + tag_hash + sig_msg)
print("Taproot sighash digest: ", sighash.hex())

Taproot sighash digest:  eba91a23cb1f908e27f2e9a44c28ea127a9ce724c25b97786a0e53fd24a186e6


## Quiz

1. A company stores most of its bitcoin holdings in an air-gapped signer. The company intends to move 10 btc from cold storage into their hot wallet using a transaction with one input and one output. Due to a bug in the wallet software, instead of using a p2wpkh UTXO with 10.001 btc as the input, they create the signed transaction using a 50 btc p2pkh UTXO. If they broadcast this transaction, would it get mined?

2. Suppose the selected input was instead a p2wpkh or p2tr UTXO worth 50 btc. Would the transaction get mined?

## Answers
1. The transaction would get mined, as the input was a p2pkh input and the legacy signature hash doesn't commit to the input amount. With a single 10 btc output, the remaining 40 btc would go to the miner as a fee.

2. For p2wpkh or p2tr, the input amount used by the signer to generate the signature hash would not match the one used by other bitcoin nodes. This would invalidate the signature and prevent the transaction from being mined.
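The fee in the first answer follows from simple arithmetic: whatever the inputs hold beyond the outputs goes to the miner. A minimal sketch using the quiz's amounts, with `implied_fee_sat` as a hypothetical helper name:

```python
SATS_PER_BTC = 100_000_000


def implied_fee_sat(input_values_sat, output_values_sat):
    """Fee implied by a transaction: sum of inputs minus sum of outputs."""
    return sum(input_values_sat) - sum(output_values_sat)


# What the company meant to sign: a 10.001 btc input paying a 10 btc output
intended_fee = implied_fee_sat([1_000_100_000], [10 * SATS_PER_BTC])

# What the buggy wallet actually built: a 50 btc input paying the same output
actual_fee = implied_fee_sat([50 * SATS_PER_BTC], [10 * SATS_PER_BTC])

print("intended fee (sat):", intended_fee)
print("actual fee (sat):  ", actual_fee)
print("actual fee (btc):  ", actual_fee // SATS_PER_BTC)
```

With a legacy input the signature is identical either way, so nothing in the signing process would flag the difference.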

## Exercise
