# Intro to Bitcoin Transaction Parsing

We start with a warning: Vitalik's pybitcointools and Peter Todd's python-bitcoinlib both install as `bitcoin`, so check which one is in your environment. This notebook uses Peter Todd's python-bitcoinlib.
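
One way to see which of the two packages is present is to ask Python's packaging metadata. This is only a sketch: the helper name `installed_version` is made up for illustration, and the distribution names `python-bitcoinlib` and `bitcoin` are assumptions about how the two projects are published, so adjust them if your installation differs.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; both are expected to provide a top-level `bitcoin` module.
for dist_name in ("python-bitcoinlib", "bitcoin"):
    print(dist_name, "->", installed_version(dist_name))
```

If both names report a version, the two packages are probably overwriting each other's `bitcoin` module, and a fresh virtual environment is the safest way out.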

1. Import a Transaction
2. Deserialize the Tx into metadata, inputs, and outputs
3. Investigate the metadata
4. Input and output parsing will be covered in the following notebooks.

### 1. Import a Transaction

TODO: RPC call to bitcoind, either block and #, or transaction hash?

If you don't have a bitcoind node and don't want to go find a transaction on a block explorer, here's a hardcoded
transaction to play with:

```0100000002d8c8df6a6fdd2addaf589a83d860f18b44872d13ee6ec3526b2b470d42a96d4d000000008b483045022100b31557e47191936cb14e013fb421b1860b5e4fd5d2bc5ec1938f4ffb1651dc8902202661c2920771fd29dd91cd4100cefb971269836da4914d970d333861819265ba014104c54f8ea9507f31a05ae325616e3024bd9878cb0a5dff780444002d731577be4e2e69c663ff2da922902a4454841aa1754c1b6292ad7d317150308d8cce0ad7abffffffff2ab3fa4f68a512266134085d3260b94d3b6cfd351450cff021c045a69ba120b2000000008b4830450220230110bc99ef311f1f8bda9d0d968bfe5dfa4af171adbef9ef71678d658823bf022100f956d4fcfa0995a578d84e7e913f9bb1cf5b5be1440bcede07bce9cd5b38115d014104c6ec27cffce0823c3fecb162dbd576c88dd7cda0b7b32b0961188a392b488c94ca174d833ee6a9b71c0996620ae71e799fc7c77901db147fa7d97732e49c8226ffffffff02c0175302000000001976a914a3d89c53bb956f08917b44d113c6b2bcbe0c29b788acc01c3d09000000001976a91408338e1d5e26db3fce21b011795b1c3c8a5a5d0788ac00000000```

In [79]:
# Reference transaction as a hex string; replace it with your own raw transaction if you have one
rawtx = "0100000002d8c8df6a6fdd2addaf589a83d860f18b44872d13ee6ec3526b2b470d42a96d4d000000008b483045022100b31557e47191936cb14e013fb421b1860b5e4fd5d2bc5ec1938f4ffb1651dc8902202661c2920771fd29dd91cd4100cefb971269836da4914d970d333861819265ba014104c54f8ea9507f31a05ae325616e3024bd9878cb0a5dff780444002d731577be4e2e69c663ff2da922902a4454841aa1754c1b6292ad7d317150308d8cce0ad7abffffffff2ab3fa4f68a512266134085d3260b94d3b6cfd351450cff021c045a69ba120b2000000008b4830450220230110bc99ef311f1f8bda9d0d968bfe5dfa4af171adbef9ef71678d658823bf022100f956d4fcfa0995a578d84e7e913f9bb1cf5b5be1440bcede07bce9cd5b38115d014104c6ec27cffce0823c3fecb162dbd576c88dd7cda0b7b32b0961188a392b488c94ca174d833ee6a9b71c0996620ae71e799fc7c77901db147fa7d97732e49c8226ffffffff02c0175302000000001976a914a3d89c53bb956f08917b44d113c6b2bcbe0c29b788acc01c3d09000000001976a91408338e1d5e26db3fce21b011795b1c3c8a5a5d0788ac00000000"

### 2. Deserialize the Tx into Metadata, Inputs, and Outputs

Bitcoin transactions are serialized in a rigid, fixed field order, described at the link below and summarized in the list that follows:

https://bitcoin.org/en/developer-reference#raw-transaction-format


* Version: 4 bytes (little-endian)
* Number of Inputs: CompactSize integer (1 to 9 bytes)
* Serialized Inputs
* Number of Outputs: CompactSize integer (1 to 9 bytes)
* Serialized Outputs
* Locktime: 4 bytes

Let's walk through deserializing our raw transaction step by step.

In [80]:
version = rawtx[0:8]
rawtx = rawtx[8:]
print("Version Bytes: ",version)

Version Bytes:  01000000
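
The version field is a little-endian integer, so the bytes `01000000` mean version 1, not 16777216. A minimal sketch of the conversion follows; the helper name `le_hex_to_int` is an illustrative assumption and is not part of any library.

```python
def le_hex_to_int(hex_str):
    """Interpret a hex string as a little-endian unsigned integer."""
    return int.from_bytes(bytes.fromhex(hex_str), byteorder="little")


version_number = le_hex_to_int("01000000")
print("Version: ", version_number)

# For comparison, reading the same digits as big-endian gives a very different number.
print("Naive big-endian reading: ", int("01000000", 16))
```

The same conversion applies to every fixed-width integer in the transaction, such as the output index and the locktime.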


In [81]:
rawtx

'02d8c8df6a6fdd2addaf589a83d860f18b44872d13ee6ec3526b2b470d42a96d4d000000008b483045022100b31557e47191936cb14e013fb421b1860b5e4fd5d2bc5ec1938f4ffb1651dc8902202661c2920771fd29dd91cd4100cefb971269836da4914d970d333861819265ba014104c54f8ea9507f31a05ae325616e3024bd9878cb0a5dff780444002d731577be4e2e69c663ff2da922902a4454841aa1754c1b6292ad7d317150308d8cce0ad7abffffffff2ab3fa4f68a512266134085d3260b94d3b6cfd351450cff021c045a69ba120b2000000008b4830450220230110bc99ef311f1f8bda9d0d968bfe5dfa4af171adbef9ef71678d658823bf022100f956d4fcfa0995a578d84e7e913f9bb1cf5b5be1440bcede07bce9cd5b38115d014104c6ec27cffce0823c3fecb162dbd576c88dd7cda0b7b32b0961188a392b488c94ca174d833ee6a9b71c0996620ae71e799fc7c77901db147fa7d97732e49c8226ffffffff02c0175302000000001976a914a3d89c53bb956f08917b44d113c6b2bcbe0c29b788acc01c3d09000000001976a91408338e1d5e26db3fce21b011795b1c3c8a5a5d0788ac00000000'

The next field is the number of inputs. This is our first CompactSize integer: https://bitcoin.org/en/developer-reference#compactsize-unsigned-integers . In practice, the number of inputs is almost always less than 253 (0xfd) and so fits in a single byte, but it's good practice to treat this field as variable-sized.

In [82]:
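def extract_compact_sized(raw_hex):
    """Return the hex digits of the CompactSize field at the start of raw_hex.

    The result includes the marker byte (fd/fe/ff) when one is present.
    """
    prefix = raw_hex[0:2].lower()
    if prefix == "ff":   # marker byte + 8-byte value
        return raw_hex[0:18]
    if prefix == "fe":   # marker byte + 4-byte value
        return raw_hex[0:10]
    if prefix == "fd":   # marker byte + 2-byte value
        return raw_hex[0:6]
    return raw_hex[0:2]  # values below 0xfd fit in a single byte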
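
`extract_compact_sized` only slices out the field. Turning it into a number takes one more step, because multi-byte values carry a marker byte followed by a little-endian payload. The sketch below combines both steps; the name `decode_compact_size` and its `(value, hex_chars_consumed)` return shape are illustrative choices, not an API from python-bitcoinlib.

```python
def decode_compact_size(raw_hex):
    """Return (value, hex_chars_consumed) for the CompactSize at the start of raw_hex."""
    prefix = int(raw_hex[0:2], 16)
    if prefix < 0xfd:
        # Single-byte form: the byte itself is the value.
        return prefix, 2
    # Marker byte -> payload width in bytes.
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[prefix]
    payload = raw_hex[2:2 + width * 2]
    return int.from_bytes(bytes.fromhex(payload), "little"), 2 + width * 2


print(decode_compact_size("02d8c8"))  # single-byte form
print(decode_compact_size("fd0001"))  # 0xfd marker, 2-byte little-endian payload
```

Returning the number of hex characters consumed lets the caller advance through the raw transaction without recomputing the field width.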

In [83]:
number_of_inputs_hex = extract_compact_sized(rawtx)
print("Number of Inputs in Hex: ",number_of_inputs_hex)
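# Multi-byte CompactSize values are a marker byte plus a little-endian payload,
# so drop the marker (if any) before converting.
number_of_inputs = int.from_bytes(bytes.fromhex(number_of_inputs_hex[2:] or number_of_inputs_hex), "little")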
print("Number of Inputs: ", number_of_inputs)
rawtx = rawtx[len(number_of_inputs_hex):]

Number of Inputs in Hex:  02
Number of Inputs:  2


In [84]:
input_reftx_1 = rawtx[0:32*2]
print("Input 1 Reference TxHash: ",input_reftx_1)
rawtx = rawtx[32*2:]

Input 1 Reference TxHash:  d8c8df6a6fdd2addaf589a83d860f18b44872d13ee6ec3526b2b470d42a96d4d
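
Note that the hash is serialized in internal byte order. Block explorers and RPC calls display transaction ids with the bytes reversed, so looking up the referenced transaction needs a flip first. A small sketch, with `reverse_byte_order` as a hypothetical helper name:

```python
def reverse_byte_order(hex_str):
    """Reverse a hex string byte by byte (two hex digits at a time)."""
    return bytes.fromhex(hex_str)[::-1].hex()


input_reftx_1 = "d8c8df6a6fdd2addaf589a83d860f18b44872d13ee6ec3526b2b470d42a96d4d"
print("Input 1 Reference TxHash (display order): ", reverse_byte_order(input_reftx_1))
```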


In [85]:
input_output_index_1_hex = rawtx[0:8]
print("Input 1 Output Index (hex): ", input_output_index_1_hex)
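# The output index is a 4-byte little-endian integer, so decode it byte-wise.
input_output_index_1 = int.from_bytes(bytes.fromhex(input_output_index_1_hex), "little")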
print("Input 1 Output Index: ", input_output_index_1)
rawtx = rawtx[8:]

Input 1 Output Index (hex):  00000000
Input 1 Output Index:  0


In [86]:
input_scriptsig_length_1_hex = extract_compact_sized(rawtx)
print("Input 1 ScriptSig Length (hex): ", input_scriptsig_length_1_hex)
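# Drop the CompactSize marker byte (if any) and decode the payload as little-endian.
input_scriptsig_length_1 = int.from_bytes(bytes.fromhex(input_scriptsig_length_1_hex[2:] or input_scriptsig_length_1_hex), "little")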
print("Input 1 ScriptSig Length", input_scriptsig_length_1)
rawtx = rawtx[len(input_scriptsig_length_1_hex):]

Input 1 ScriptSig Length (hex):  8b
Input 1 ScriptSig Length 139


In [88]:
input_scriptsig_1 = rawtx[0:input_scriptsig_length_1*2]
print("Input 1 ScriptSig: ",input_scriptsig_1)
rawtx = rawtx[input_scriptsig_length_1*2:]

Input 1 ScriptSig:  483045022100b31557e47191936cb14e013fb421b1860b5e4fd5d2bc5ec1938f4ffb1651dc8902202661c2920771fd29dd91cd4100cefb971269836da4914d970d333861819265ba014104c54f8ea9507f31a05ae325616e3024bd9878cb0a5dff780444002d731577be4e2e69c663ff2da922902a4454841aa1754c1b6292ad7d317150308d8cce0ad7ab


In [92]:
input_sequence_1 = rawtx[0:8]
print("Input 1 Sequence: ", input_sequence_1)
rawtx = rawtx[8:]

Input 1 Sequence:  ffffffff


In [93]:
print("RawTx after extracting first input: ", rawtx)

RawTx after extracting first input:  2ab3fa4f68a512266134085d3260b94d3b6cfd351450cff021c045a69ba120b2000000008b4830450220230110bc99ef311f1f8bda9d0d968bfe5dfa4af171adbef9ef71678d658823bf022100f956d4fcfa0995a578d84e7e913f9bb1cf5b5be1440bcede07bce9cd5b38115d014104c6ec27cffce0823c3fecb162dbd576c88dd7cda0b7b32b0961188a392b488c94ca174d833ee6a9b71c0996620ae71e799fc7c77901db147fa7d97732e49c8226ffffffff02c0175302000000001976a914a3d89c53bb956f08917b44d113c6b2bcbe0c29b788acc01c3d09000000001976a91408338e1d5e26db3fce21b011795b1c3c8a5a5d0788ac00000000
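
The steps used for the first input repeat unchanged for every other input, so they can be folded into one function. The following is a sketch under stated assumptions: `read_compact_size` and `parse_input` are hypothetical helpers written for this walkthrough, and the example string is synthetic rather than taken from the reference transaction.

```python
def read_compact_size(raw_hex, pos):
    """Decode the CompactSize at hex offset pos; return (value, new_pos)."""
    prefix = int(raw_hex[pos:pos + 2], 16)
    if prefix < 0xfd:
        return prefix, pos + 2
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[prefix]  # payload width in bytes
    payload = raw_hex[pos + 2:pos + 2 + width * 2]
    return int.from_bytes(bytes.fromhex(payload), "little"), pos + 2 + width * 2


def parse_input(raw_hex):
    """Split one serialized input off the front of raw_hex; return (fields, rest)."""
    prev_tx_hash = raw_hex[0:64]  # 32 bytes, internal byte order
    output_index = int.from_bytes(bytes.fromhex(raw_hex[64:72]), "little")
    script_len, pos = read_compact_size(raw_hex, 72)
    script_sig = raw_hex[pos:pos + script_len * 2]
    pos += script_len * 2
    sequence = raw_hex[pos:pos + 8]
    fields = {
        "prev_tx_hash": prev_tx_hash,
        "output_index": output_index,
        "script_sig": script_sig,
        "sequence": sequence,
    }
    return fields, raw_hex[pos + 8:]


# Synthetic input: 32-byte hash, index 1, 2-byte scriptSig, sequence, then trailing data.
example = "11" * 32 + "01000000" + "02" + "abcd" + "ffffffff" + "02c017"
fields, rest = parse_input(example)
print(fields["output_index"], fields["script_sig"], fields["sequence"], rest)
```

In this notebook, calling `parse_input(rawtx)` at this point would peel off the second input and leave the output count at the front of the remainder.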
