## Setup 

### Requirements
For this exercise we'll need Bitcoin Core. This notebook has been tested with [v24.0.1](https://github.com/bitcoin/bitcoin/releases/tag/v24.0.1).

Below, set the paths for:
1. The Bitcoin Core functional test framework directory.
2. The directory containing bitcoin-tx-tutorial.

**You'll need to edit these next two lines for your local setup.**

In [1]:
path_to_bitcoin_functional_test = "/Users/dariuscognac/bitcoin/test/functional"
path_to_bitcoin_tx_tutorial = "/Users/dariuscognac/Documents/Github/bitcoin-tx-tutorial"

import sys

# Add the functional test framework to Python's module search path (sys.path)
sys.path.insert(0, path_to_bitcoin_functional_test)
from test_framework.test_shell import TestShell

# Add the bitcoin-tx-tutorial functions to Python's module search path (sys.path)
sys.path.insert(0, path_to_bitcoin_tx_tutorial)
from functions import *

import json

# Bitcoin scriptPubKey formats and addresses

Here we will cover the different scriptPubKey formats as well as how they can be encoded and decoded.

The following functions have more explanation in the corresponding notebooks:
- [Hash Functions](https://github.com/DariusParvin/bitcoin-tx-tutorial/blob/main/appendix/hash-functions.ipynb) `HASH256`, `HASH160`.
- [Bitcoin Script](https://github.com/DariusParvin/bitcoin-tx-tutorial/blob/main/appendix/bitcoin-script.ipynb) `pushbytes`

## Introduction
When Alice sends Bob bitcoin, Alice does so by creating a new transaction where one (or more) of the outputs has a scriptPubKey (aka 'locking script') specified by Bob. What makes the output effectively belong to Bob is that only he knows how to create a scriptSig that will unlock the locking script.

If Bob were to send Alice the scriptPubKey as raw bytes, any error in communication could result in Alice sending the bitcoin to the wrong scriptPubKey, making the bitcoin impossible to recover.

To help prevent this problem, there are common address formats for encoding scriptPubKeys. These addresses are designed to be easier to read and contain a checksum to help with error detection.
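To make the error-detection idea concrete, here is a minimal sketch of the 4-byte double-SHA256 checksum used by the base58 address format. The payload below is a made-up example (a version byte followed by a dummy 20-byte hash), and the helper names `checksum4` and `is_valid` are hypothetical, chosen only for this illustration.

```python
import hashlib

def checksum4(payload: bytes) -> bytes:
    """First 4 bytes of SHA256(SHA256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def is_valid(message: bytes) -> bool:
    """Check that the last 4 bytes match the checksum of everything before them."""
    return checksum4(message[:-4]) == message[-4:]

# Hypothetical payload: a version byte followed by a dummy 20-byte hash
payload = bytes.fromhex("6f" + "11" * 20)
message = payload + checksum4(payload)

# Simulate a transcription error by flipping a single bit in the payload
corrupted = bytearray(message)
corrupted[5] ^= 0x01
corrupted = bytes(corrupted)

print(is_valid(message))    # True
print(is_valid(corrupted))  # False
```

A 4-byte checksum does not make errors impossible, but a random corruption only passes the check with a probability of roughly 1 in 2^32.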

Bitcoin uses three address types (base58, bech32, bech32m) that cover the standard scriptPubKey formats:
- Base58
    - P2PKH
    - P2SH
    - P2SH-P2WPKH
- Bech32
    - P2WPKH
    - P2WSH
- Bech32m
    - P2TR

### Base58 address prefixes
These address formats encode not only the scriptPubKey for the output but also a prefix that specifies which network (mainnet/testnet) the output is intended for. Other cryptocurrencies based on bitcoin forks (e.g. litecoin or zcash) use different prefix values to indicate which cryptocurrency the output is intended for. If a wallet implementation doesn't check that the prefix matches the type of transaction being created, the user may end up creating a transaction for a different cryptocurrency than the one intended.

Here are some commonly used bitcoin address prefixes:
- Base58
    - Mainnet
        - P2PKH - `0x00`
        - P2SH  - `0x05`
    - Testnet/Regtest
        - P2PKH - `0x6f`
        - P2SH  - `0xc4`
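As a small sketch, the version bytes can be kept in a lookup table so that a decoded prefix can be checked against the expected network. The testnet/regtest values are the ones used in the code cells later in this notebook (`0x6f` and `0xc4`); the function name `describe_version_byte` is hypothetical.

```python
# Base58 version bytes (regtest reuses the testnet values)
BASE58_VERSIONS = {
    0x00: ("mainnet", "P2PKH"),
    0x05: ("mainnet", "P2SH"),
    0x6F: ("testnet/regtest", "P2PKH"),
    0xC4: ("testnet/regtest", "P2SH"),
}

def describe_version_byte(version: int) -> tuple:
    """Hypothetical helper: map a base58 version byte to (network, output type)."""
    if version not in BASE58_VERSIONS:
        raise ValueError(f"unknown version byte: {version:#04x}")
    return BASE58_VERSIONS[version]

print(describe_version_byte(0x6F))  # ('testnet/regtest', 'P2PKH')
```

Raising an error for unknown values, rather than guessing, is one way a wallet could avoid building a transaction for the wrong network.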

### Bech32/Bech32m human readable part
Bech32 addresses contain a human readable part (hrp) prepended to the address to indicate the network the address is intended for.

- Bech32/Bech32m
    - Mainnet - 'bc'
    - Testnet - 'tb'
    - Regtest - 'bcrt'

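Unlike Base58 addresses, the hrp does not indicate whether the address corresponds to a pubkey hash (P2WPKH) or script hash (P2WSH) output. This is because bech32 encodes the witness version and witness program of the scriptPubKey directly, so the output type follows from the version and the program length (20 bytes for P2WPKH, 32 bytes for P2WSH).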

A full list of bitcoin address prefixes can be found here: https://en.bitcoin.it/wiki/List_of_address_prefixes
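The sketch below shows how the network and output type could be read off a bech32/bech32m address string by inspection alone. It is an illustration only: it does not verify the checksum, and the function name `describe_segwit_address` is hypothetical. It relies on the first data character encoding the witness version (`q` is 0, `p` is 1) and on the last six characters being the checksum.

```python
HRP_TO_NETWORK = {"bc": "mainnet", "tb": "testnet", "bcrt": "regtest"}
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def describe_segwit_address(address: str) -> tuple:
    """Return (network, witness version, output type) WITHOUT checking the checksum."""
    address = address.lower()
    sep = address.rfind("1")  # the last '1' separates the hrp from the data part
    if sep < 1:
        raise ValueError("missing separator")
    hrp, data = address[:sep], address[sep + 1:]
    if hrp not in HRP_TO_NETWORK or len(data) < 7:
        raise ValueError("unknown hrp or data part too short")

    version = CHARSET.index(data[0])      # first data character is the witness version
    program_chars = len(data) - 1 - 6     # drop the version char and 6 checksum chars
    program_bytes = program_chars * 5 // 8  # each character carries 5 bits

    types = {(0, 20): "P2WPKH", (0, 32): "P2WSH", (1, 32): "P2TR"}
    return HRP_TO_NETWORK[hrp], version, types.get((version, program_bytes), "unknown")

print(describe_segwit_address("tb1q0n68nma39lfj2swn73hlq4435gc88nkpwvn976"))
# ('testnet', 0, 'P2WPKH')
```

In practice an address should always be decoded with a library that validates the checksum before the result is trusted.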

## Base58

TODO - The examples below demonstrate using the base58 encoding/decoding functions. It would be nice for completeness to illustrate base58 encoding in a more verbose way, similar to the rest of the notebooks.

In [2]:
import base58

def encode_base58_checksum(b: bytes):
    return base58.b58encode(b + hash256(b)[:4]).decode()

def decode_base58(s: str):
    return base58.b58decode(s)
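For readers who want to see what the `base58` library is doing, here is a rough pure-Python sketch of Base58Check encoding and decoding. The function names are hypothetical and this is not the library's implementation: the bytes are treated as one big integer, repeatedly divided by 58, and each leading zero byte is represented by a leading `1`.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode_verbose(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # Leading zero bytes would be lost in the integer conversion, so encode each as '1'
    n_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * n_zeros + "".join(reversed(digits))

def b58decode_verbose(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + ALPHABET.index(ch)
    n_ones = len(s) - len(s.lstrip("1"))
    return b"\x00" * n_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")

def double_sha256(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def base58check_encode(payload: bytes) -> str:
    return b58encode_verbose(payload + double_sha256(payload)[:4])

def base58check_decode(address: str) -> bytes:
    raw = b58decode_verbose(address)
    payload, checksum = raw[:-4], raw[-4:]
    if double_sha256(payload)[:4] != checksum:
        raise ValueError("invalid checksum")
    return payload

# Version byte 0x00 followed by 20 zero bytes
print(base58check_encode(b"\x00" * 21))  # 1111111111111111111114oLvT2
```

Unlike `decode_base58` above, `base58check_decode` in this sketch strips the checksum and raises an error if it does not match.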

### Creating a base58 P2PKH address from a pubkey
Given the pubkey `02466d7fcae563e5cb09a0d1870bb580344804617879a14949cf22285f1bae3f27`, create a p2pkh address for regtest.

In [3]:
pubkey = bytes.fromhex("02466d7fcae563e5cb09a0d1870bb580344804617879a14949cf22285f1bae3f27")

# Take the hash (hash160) of the pubkey
pk_hash = hash160(pubkey)

# Set the address prefix. For regtest p2pkh we use 0x6f
# a list of prefixes can be found at https://en.bitcoin.it/wiki/List_of_address_prefixes
# In bitcoin core it is defined in chainparams.cpp
# https://github.com/bitcoin/bitcoin/blob/767d825e27b452d6e846280256e5932e906da44d/src/chainparams.cpp#L241
prefix = bytes.fromhex("6f")

# Prepend the prefix
payload = prefix + pk_hash

# Apply base58 encoding 
p2pkh_address = encode_base58_checksum(payload)

print('Base58 P2PKH address: ', p2pkh_address)

Base58 P2PKH address:  mo6CPsdW8EsnWdmSSCrQ6225VVDtpMBTug


For the rest of the notebooks we'll use the following function to convert pubkeys to base58 p2pkh addresses:

In [4]:
def pk_to_p2pkh(compressed: bytes, network: str):
    '''Creates a p2pkh address from a compressed pubkey'''
    pk_hash = hash160(compressed)
    if network == "regtest" or network == "testnet":
        prefix = bytes.fromhex("6f")
    elif network == "mainnet":
        prefix = bytes.fromhex("00")
    else:
        return "Enter the network: testnet/regtest/mainnet"
    return encode_base58_checksum(prefix + pk_hash)

### Creating a base58 P2SH address from a multisig script

Here we'll create a 2-of-3 multisig script from 3 pubkeys and use that to generate a base58 P2SH address.

Creating a P2SH base58 address is much like creating a P2PKH address, except that we use the _redeemScript_ hash instead of a pubkey hash, and a different prefix.

The opcodes `OP_2` and `OP_3` are represented by the bytes `0x52` and `0x53`. For more on the multisig script, refer to the 'Bitcoin Script' chapter.

In [5]:
pubkey1 = bytes.fromhex("034f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa")
pubkey2 = bytes.fromhex("02466d7fcae563e5cb09a0d1870bb580344804617879a14949cf22285f1bae3f27")
pubkey3 = bytes.fromhex("023c72addb4fdf09af94f0c94d7fe92a386a7e70cf8a1d85916386bb2535c7b1b1")

redeemScript = bytes.fromhex(
    "52"            # OP_2
    + "21"          # OP_PUSHBYTES_33 ("21" is the length of a 33 byte (compressed) pubkey in hex notation)
    + pubkey1.hex() # pubkey1
    + "21"          # OP_PUSHBYTES_33
    + pubkey2.hex() # pubkey2
    + "21"          # OP_PUSHBYTES_33
    + pubkey3.hex() # pubkey3
    + "53"          # OP_3
    + "ae"          # OP_CHECKMULTISIG
)

Now that we have our redeemScript, we can convert it to a base58 regtest P2SH address:

In [6]:
# Take the hash (hash160) of the redeemScript
script_hash = hash160(redeemScript)

# Set the address prefix. For regtest p2sh we use 0xc4
prefix = bytes.fromhex("c4")

# Prepend the prefix
payload = prefix + script_hash

# Apply base58 encoding 
p2sh_address = encode_base58_checksum(payload)

print('Base58 P2SH address: ', p2sh_address)

Base58 P2SH address:  2MuXogRGTh7uADB2wKBqFcsPTprVKnChJe6


For the rest of the notebooks we'll use the following function for converting a P2SH redeemScript to a base58 address:

In [7]:
def script_to_p2sh(redeemScript, network):
    rs_hash = hash160(redeemScript)
    if network == "regtest" or network == "testnet":
        prefix = bytes.fromhex("c4")
    elif network == "mainnet":
        prefix = bytes.fromhex("05")
    else:
        return "Enter the network: testnet/regtest/mainnet"
    return encode_base58_checksum(prefix + rs_hash)

### Decoding a base58 address
Now let's do the reverse. Given a base58 address, decode it to get the prefix and scriptPubKey.

In [8]:
address = 'mo6CPsdW8EsnWdmSSCrQ6225VVDtpMBTug'
address_decoded = decode_base58(address)

# Check the checksum is valid
decoded = address_decoded[:-4] # everything before the last 4 bytes is the message
checksum = address_decoded[-4:] # last 4 bytes are the checksum

# Check that the first four bytes of the hash are equal to the checksum
print("Is checksum valid: ", hash256(decoded)[:4] == checksum)

print("prefix: ", hex(decoded[0]))

pk_hash = decoded[1:]
print("pubkey hash: ", pk_hash.hex())

Is checksum valid:  True
prefix:  0x6f
pubkey hash:  531260aa2a199e228c537dfa42c82bea2c7c1f4d


#### Pubkey hash to scriptPubKey
- The checksum was valid, so it is safe to assume the data was received and read accurately. 
- The prefix `0x6f` tells us we are creating a scriptPubKey for a P2PKH output on bitcoin regtest.
- The last part of the data therefore encodes the pubkey hash, and we can create a P2PKH script with it.

To turn the pubkey hash into a P2PKH scriptPubKey, we insert it into the standard P2PKH script format:

`OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG`

We can look up the corresponding opcode bytes at https://en.bitcoin.it/wiki/Script.

Note that in front of `<pubKeyHash>` we need to add an opcode for the length of the hash. Since the pubkey hash is taken from hash160, we have a 20 byte hash, which is `0x14` in hex notation.

In [9]:
scriptPubkey = bytes.fromhex("76a914" + pk_hash.hex() + "88ac")
print("scriptPubkey: ", scriptPubkey.hex())

scriptPubkey:  76a914531260aa2a199e228c537dfa42c82bea2c7c1f4d88ac
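The same idea extends to P2SH addresses, where the 20-byte script hash goes into the template `OP_HASH160 <scriptHash> OP_EQUAL`. The sketch below picks the template based on the decoded version byte. The function name `base58_payload_to_spk` is hypothetical and is not part of the tutorial's `functions` module.

```python
OP_DUP, OP_HASH160, OP_EQUAL, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x87, 0x88, 0xAC

def base58_payload_to_spk(version: int, hash20: bytes) -> bytes:
    """Build a scriptPubKey from a decoded base58 version byte and 20-byte hash."""
    if len(hash20) != 20:
        raise ValueError("expected a 20-byte hash")
    push = bytes([len(hash20)])  # 0x14: push the next 20 bytes
    if version in (0x00, 0x6F):  # P2PKH on mainnet / testnet+regtest
        return (bytes([OP_DUP, OP_HASH160]) + push + hash20
                + bytes([OP_EQUALVERIFY, OP_CHECKSIG]))
    if version in (0x05, 0xC4):  # P2SH on mainnet / testnet+regtest
        return bytes([OP_HASH160]) + push + hash20 + bytes([OP_EQUAL])
    raise ValueError(f"unknown version byte: {version:#04x}")

pk_hash = bytes.fromhex("531260aa2a199e228c537dfa42c82bea2c7c1f4d")
print(base58_payload_to_spk(0x6F, pk_hash).hex())
# 76a914531260aa2a199e228c537dfa42c82bea2c7c1f4d88ac
```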


## Bech32/Bech32m

TODO - The examples below demonstrate using the bech32 encoding/decoding functions. It would be nice for completeness to illustrate bech32 encoding in a more verbose way, similar to the rest of the notebooks. 

In [10]:
import functions.bip_350_bech32_reference as bech32

### Creating a bech32 P2WPKH address from a pubkey
Given the pubkey `02466d7fcae563e5cb09a0d1870bb580344804617879a14949cf22285f1bae3f27`, create a p2wpkh address for testnet.

In [11]:
pubkey = bytes.fromhex("02466d7fcae563e5cb09a0d1870bb580344804617879a14949cf22285f1bae3f27")

# Take the hash (hash160) of the pubkey
pk_hash = hash160(pubkey)

# The human readable part for testnet
prefix = 'tb'

# 0 for Segwit v0. The function below can also be used for encoding v1 (bech32m) addresses
version = 0 

# By providing the version number (0), it knows to use bech32 (rather than bech32m) encoding
p2wpkh_address = bech32.encode(prefix, version, pk_hash)

print('Bech32 P2WPKH address: ', p2wpkh_address)

Bech32 P2WPKH address:  tb1q0n68nma39lfj2swn73hlq4435gc88nkpwvn976


### Creating a bech32 P2WSH address from a redeemScript
Here we'll use the same `redeemScript` as from the P2SH multisig script example: 

In [12]:
import hashlib

# Note that unlike P2SH which uses HASH160, for P2WSH we use SHA256
script_hash = hashlib.sha256(redeemScript).digest()

# The human readable part for mainnet
prefix = 'bc'

# 0 for Segwit v0. The function below can also be used for encoding v1 (bech32m) addresses
version = 0 

# By providing the version number (0), it knows to use bech32 (rather than bech32m) encoding
p2wsh_address = bech32.encode(prefix, version, script_hash)

print('Bech32 P2WSH address: ', p2wsh_address)

Bech32 P2WSH address:  bc1qpy8yjjs2l5neewx722mxve9w6m77zqsu7rldukggseflhwralerqdt85qc


### Creating a bech32m P2TR address from a x-only public key
Bech32m uses an almost identical encoding scheme to bech32, but with a different constant in the checksum calculation. The imported `bech32` library is able to encode either address format and does so based on the `version` number. A version number of 0 indicates that the output is Segwit v0 and uses bech32, and a version number of 1 indicates that the output is Segwit v1 (aka taproot) and uses bech32m.
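To illustrate where the constant enters, here is a condensed sketch of the encoding following the reference algorithms in BIP173 and BIP350. It is not the imported library itself, and the function names (`polymod`, `hrp_expand`, `to_5bit`, `segwit_encode`, `checksum_type`) are chosen for this illustration. The only difference between the two formats is the value XORed into the checksum: `1` for bech32 and `0x2bc830a3` for bech32m.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST, BECH32M_CONST = 1, 0x2BC830A3
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]

def polymod(values):
    """BCH checksum over a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk

def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def to_5bit(data: bytes):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the final group."""
    acc, bits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out

def segwit_encode(hrp: str, version: int, program: bytes) -> str:
    const = BECH32_CONST if version == 0 else BECH32M_CONST
    data = [version] + to_5bit(program)
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ const
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)

def checksum_type(address: str):
    """Return 'bech32', 'bech32m', or None if neither checksum matches."""
    address = address.lower()
    sep = address.rfind("1")
    data = [CHARSET.index(c) for c in address[sep + 1:]]
    residue = polymod(hrp_expand(address[:sep]) + data)
    return {BECH32_CONST: "bech32", BECH32M_CONST: "bech32m"}.get(residue)

print(checksum_type("a12uel5l"))  # bech32
print(checksum_type("a1lqfn3a"))  # bech32m
```

The two short strings are the minimal valid test vectors from BIP173 and BIP350: the same hrp with an empty data part produces a different checksum under each constant.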

Note that taproot introduces a new format of public key called _x-only_ public keys. For more on this see the note on public keys in '[Elliptic Curve Math Review](https://github.com/DariusParvin/bitcoin-tx-tutorial/blob/main/appendix/elliptic_curve_math_review.ipynb)'.

In this example we'll create a P2TR address for the following x-only pubkey `1059bf26660804ced9a3286a16497d7e70692d14dc04e1220c2dbef3667b74f7`.

In [13]:
# Unlike P2WPKH and P2WSH, the 32-byte x-only pubkey is used directly as the witness program (no hashing)
x_only_pubkey = bytes.fromhex("1059bf26660804ced9a3286a16497d7e70692d14dc04e1220c2dbef3667b74f7")

# The human readable part for regtest
prefix = 'bcrt'
# prefix = 'bc'

# 1 for Segwit v1 (taproot). The function below will create a bech32m address
version = 1
p2tr_address = bech32.encode(prefix, version, x_only_pubkey)

print('Bech32m P2TR address: ', p2tr_address)

Bech32m P2TR address:  bcrt1pzpvm7fnxpqzvakdr9p4pvjta0ecxjtg5mszwzgsv9kl0xenmwnmse95m37


### Decoding a bech32/bech32m address
Given a bech32 address, decode it to get the scriptPubKey. Here we'll decode the regtest P2TR (bech32m) address `bcrt1p5jhcyymfj7tkgv0jca4pz7tx9uzvznu0mlfymey6ph63f9h8x0gs7683vc`.

In [14]:
s = bech32.decode('bcrt', 'bcrt1p5jhcyymfj7tkgv0jca4pz7tx9uzvznu0mlfymey6ph63f9h8x0gs7683vc')

# First part of the tuple contains the segwit version
version = s[0]
print('Segwit version: ', version)

# Second part of the tuple contains the data for the scriptPubKey
script_data = bytearray(s[1])
print('Script data: ', script_data.hex())

# To turn this into a scriptPubKey, we concatenate the witness version opcode and the
# data prepended by a pushdata operation. Version 0 is encoded as OP_0 (0x00), while
# versions 1-16 are encoded as OP_1..OP_16 (0x51..0x60)
version_opcode = version + 0x50 if version else 0
spk = version_opcode.to_bytes(1, 'big') + pushbytes(script_data)
print('scriptPubKey: ', spk.hex())

Segwit version:  1
Script data:  a4af82136997976431f2c76a1179662f04c14f8fdfd24de49a0df51496e733d1
scriptPubKey:  5120a4af82136997976431f2c76a1179662f04c14f8fdfd24de49a0df51496e733d1


For the rest of the notebooks, we'll use the following functions to encode or decode bech32/bech32m addresses to and from scriptPubKeys.

In [15]:
# bech32/bech32m
def spk_to_bech32(spk, network):
    '''Creates a bech32 or bech32m address corresponding to a scriptPubkey'''
    version = spk[0] - 0x50 if spk[0] else 0
    program = spk[2:]
    if network == "testnet":
        prefix = 'tb'
    elif network == "regtest":
        prefix = 'bcrt'
    elif network == "mainnet":
        prefix = 'bc'
    else:
        return "Enter the network: testnet/regtest/mainnet"
    return bech32.encode(prefix, version, program)

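def bech32_to_spk(hrp, address):
    '''Decodes a bech32 or bech32m address to a scriptPubkey'''
    witver, witprog = bech32.decode(hrp, address)
    # Witness version 0 is OP_0 (0x00); versions 1-16 are OP_1..OP_16 (0x51..0x60)
    version_opcode = witver + 0x50 if witver else 0
    program = bytearray(witprog)
    return (
        version_opcode.to_bytes(1, byteorder="little", signed=False)
        + pushbytes(program)
    )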
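The mapping between a witness version and its opcode byte can be sketched on its own, without the bech32 library. The helper names `witness_to_spk` and `spk_to_witness` are hypothetical. The length limits (2 to 40 bytes) are the witness program bounds from BIP141.

```python
def witness_to_spk(version: int, program: bytes) -> bytes:
    """Serialize a witness version and program as a scriptPubKey."""
    if not 0 <= version <= 16:
        raise ValueError("witness version must be in the range 0..16")
    if not 2 <= len(program) <= 40:
        raise ValueError("witness program must be 2..40 bytes")
    # OP_0 is 0x00, while OP_1..OP_16 are 0x51..0x60
    version_opcode = 0x00 if version == 0 else 0x50 + version
    return bytes([version_opcode, len(program)]) + bytes(program)

def spk_to_witness(spk: bytes) -> tuple:
    """Inverse of witness_to_spk: recover (version, program)."""
    version = 0 if spk[0] == 0x00 else spk[0] - 0x50
    program = spk[2:]
    if spk[1] != len(program):
        raise ValueError("push length does not match program length")
    return version, program

x_only = bytes.fromhex("a4af82136997976431f2c76a1179662f04c14f8fdfd24de49a0df51496e733d1")
print(witness_to_spk(1, x_only).hex())
# 5120a4af82136997976431f2c76a1179662f04c14f8fdfd24de49a0df51496e733d1
```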

## Quiz

1. What type of outputs do these addresses encode? And for what network?
    - a `tb1q0n68nma39lfj2swn73hlq4435gc88nkpwvn976`
    - b `bcrt1pzpvm7fnxpqzvakdr9p4pvjta0ecxjtg5mszwzgsv9kl0xenmwnmse95m37`
    - c `bc1qpy8yjjs2l5neewx722mxve9w6m77zqsu7rldukggseflhwralerqdt85qc`

2. Are bitcoin addresses stored on the blockchain? If so, where?

In [16]:
## Answers

## Q1.a
# bech32_to_spk('tb', 'tb1q0n68nma39lfj2swn73hlq4435gc88nkpwvn976').hex()
## 00147cf479efb12fd32541d3f46ff056b1a23073cec1
## - testnet P2WPKH

## Q1.b
# bech32_to_spk('bcrt', 'bcrt1pzpvm7fnxpqzvakdr9p4pvjta0ecxjtg5mszwzgsv9kl0xenmwnmse95m37').hex()
## 51201059bf26660804ced9a3286a16497d7e70692d14dc04e1220c2dbef3667b74f7
## - regtest P2TR (witness version 1 is serialized as OP_1 = 0x51)

## Q1.c
# bech32_to_spk('bc', 'bc1qpy8yjjs2l5neewx722mxve9w6m77zqsu7rldukggseflhwralerqdt85qc').hex()
## 0020090e494a0afd279cb8de52b66664aed6fde1021cf0fede59088653fbb87dfe46
## - mainnet P2WSH

## Q2
## Addresses are not stored on the blockchain. They can be derived from the scriptPubKeys though, which are 
## in the output of a transaction.
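Since only scriptPubKeys appear in transaction outputs, a wallet or block explorer has to recognise the standard templates before it can display an address. Below is a minimal sketch of that pattern matching; the function name `classify_spk` is hypothetical and only the five output types covered in this chapter are handled.

```python
def classify_spk(spk: bytes) -> str:
    """Recognise the standard scriptPubKey templates by length and fixed bytes."""
    if len(spk) == 25 and spk[:3] == bytes.fromhex("76a914") and spk[-2:] == bytes.fromhex("88ac"):
        return "P2PKH"   # OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if len(spk) == 23 and spk[:2] == bytes.fromhex("a914") and spk[-1:] == bytes.fromhex("87"):
        return "P2SH"    # OP_HASH160 <20 bytes> OP_EQUAL
    if len(spk) == 22 and spk[:2] == bytes.fromhex("0014"):
        return "P2WPKH"  # OP_0 <20 bytes>
    if len(spk) == 34 and spk[:2] == bytes.fromhex("0020"):
        return "P2WSH"   # OP_0 <32 bytes>
    if len(spk) == 34 and spk[:2] == bytes.fromhex("5120"):
        return "P2TR"    # OP_1 <32 bytes>
    return "unknown"

print(classify_spk(bytes.fromhex("76a914531260aa2a199e228c537dfa42c82bea2c7c1f4d88ac")))
# P2PKH
```

Once the type is known, the hash or key inside the script can be re-encoded with the matching address format and network prefix.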

## Further reading
- Why P2WSH addresses are longer than P2SH addresses - [Stack Exchange Answer](https://bitcoin.stackexchange.com/questions/106140/why-are-p2wsh-addresses-larger-than-p2sh-addresses)
- TODO - more further reading, PRs welcome!