# Merkle Trees in Python
This Python notebook is a reference for understanding how Merkle trees work and what they offer, and it simulates how the data structure is typically used in blockchains.



## What Are Merkle Trees?
Merkle trees are hash-based binary tree data structures used to efficiently verify the integrity and consistency of large datasets. Named after Ralph Merkle, they are used in blockchain systems like Bitcoin and Ethereum to summarize and secure data, such as transactions in a block.

1. Key Properties 
- Tamper-Evidence: Merkle trees ensure data integrity by detecting tampering. A change in a data item (e.g., a transaction) alters its leaf hash, causing changes to propagate up and eventually change the Merkle root. A mismatch between the computed and expected root signals that tampering has occurred.

- Efficiency: Merkle trees allow a single item in a large dataset to be verified with minimal information. A Merkle proof shows that a specific data block is part of the dataset using only O(log n) hashes, so the entire dataset never has to be sent.

- Scalability: By summarizing thousands of data items into a single Merkle root, Merkle trees enable compact storage (e.g., in blockchain headers) and efficient processing, making them well suited to large datasets and distributed systems.
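
The tamper-evidence property can be illustrated with a minimal sketch. The helper names `sha256` and `merkle_root` are hypothetical, and duplicating the last hash when a level has an odd number of nodes is an assumption (one common convention), not something this notebook prescribes.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(items: list[str]) -> bytes:
    """Hypothetical helper: fold a non-empty list of strings into one root hash."""
    if not items:
        raise ValueError("need at least one item")
    # Leaf level: hash every data item.
    level = [sha256(item.encode("utf-8")) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd levels by duplicating the last hash.
            level.append(level[-1])
        # Parent hash = hash of the two child hashes concatenated.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = ["tx1", "tx2", "tx3", "tx4"]
original_root = merkle_root(transactions)

# Change a single item and recompute the root.
tampered = transactions.copy()
tampered[2] = "tx3-modified"
tampered_root = merkle_root(tampered)

print("original root:", original_root.hex())
print("tampered root:", tampered_root.hex())
print("roots match:  ", original_root == tampered_root)
```

Changing one item changes its leaf hash, which changes every hash on the path above it, so the two roots differ and the last line reports that they do not match.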

2. Structure
At its core, a Merkle tree is a balanced binary tree built from cryptographic hashes. As in any binary tree, each node has at most two children. A Merkle tree has:
- Leaf Nodes, which store cryptographic hashes of the data items (SHA-256 is a typical choice).

- Parent / Branch Nodes, each of which stores the hash of its two children's hashes. These hashes are computed level by level until a single Merkle Root (i.e., the Root Node) remains.

- A Balanced Hierarchical Structure where the tree has a height of O(log n) for n leaves, with leaf nodes at the bottom and a single root at the top, ensuring efficient operations.
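
To make the layered structure concrete, the sketch below builds every level of a tree and compares its height with ceil(log2 n). The function name `build_levels` is hypothetical, and padding odd levels with a copy of the last hash is an assumed convention.

```python
import hashlib
import math


def sha256(data: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()


def build_levels(items: list[str]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first and the root last."""
    if not items:
        raise ValueError("need at least one item")
    levels = [[sha256(item.encode("utf-8")) for item in items]]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2 == 1:
            # Assumption: pad a copy of the level; the stored level is unchanged.
            current = current + [current[-1]]
        parents = [sha256(current[i] + current[i + 1]) for i in range(0, len(current), 2)]
        levels.append(parents)
    return levels


for n in (1, 2, 5, 8, 1000):
    levels = build_levels([f"tx{i}" for i in range(n)])
    height = len(levels) - 1  # number of edges from a leaf to the root
    print(f"n={n:>4}  level sizes={[len(level) for level in levels]}  "
          f"height={height}  ceil(log2 n)={math.ceil(math.log2(n))}")
```

Each level holds about half as many nodes as the one below it, which is why the height grows logarithmically with the number of leaves.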

3. Merkle Proofs
Merkle proofs are a key feature of Merkle trees, enabling efficient and secure verification of data inclusion. A Merkle proof is a sequence of sibling hashes along the path from a leaf to the root, used to verify that a specific data item is part of the tree without needing the entire dataset.

- How it Works: To prove a leaf’s inclusion, the prover supplies the leaf’s hash and the sibling hash at each level, along with whether each sibling sits on the left or the right. The verifier recomputes the Merkle root by hashing the leaf with its sibling, then hashing the result with the next sibling, up to the root. If the recomputed root matches the known root, the proof is valid.

- Efficiency: Although the dataset itself grows linearly with n, a Merkle proof needs only O(log n) hashes because of the tree’s logarithmic height, which keeps verification fast even for large datasets.

- Lightweight: Proofs enable light clients (e.g., Bitcoin SPV clients and lightweight crypto wallets) to verify transactions without storing the full blockchain, which is ideal for resource-constrained devices.

- Trustless: Proofs rely on cryptographic hashes, allowing verification without trusting the data provider, as long as the Merkle root is trusted.

- Tamper-Evidence: An incorrect proof results in a mismatched root, ensuring data integrity.
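
The proof mechanics described above can be sketched as follows. This is an illustration under stated assumptions rather than a hardened implementation: `build_levels`, `make_proof`, and `verify_proof` are hypothetical names, odd levels are padded by duplicating the last hash, and each proof step records whether the sibling is on the left.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()


def build_levels(items: list[str]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first and the root last."""
    if not items:
        raise ValueError("need at least one item")
    levels = [[sha256(item.encode("utf-8")) for item in items]]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2 == 1:
            current = current + [current[-1]]  # assumption: duplicate last hash
        levels.append([sha256(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:  # the root level has no sibling
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # same padding rule as build_levels
        sibling = index ^ 1  # flip the lowest bit to get the neighbour
        proof.append((level[sibling], sibling < index))
        index //= 2  # move to the parent's position
    return proof


def verify_proof(item: str, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one item and its sibling hashes."""
    current = sha256(item.encode("utf-8"))
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            current = sha256(sibling + current)
        else:
            current = sha256(current + sibling)
    return current == root


items = [f"tx{i}" for i in range(7)]
levels = build_levels(items)
root = levels[-1][0]
proof = make_proof(levels, index=5)

print("proof length:", len(proof))
print("valid item:  ", verify_proof("tx5", proof, root))
print("forged item: ", verify_proof("tx5-forged", proof, root))
```

The verifier needs only the item, the proof, and a trusted root. For seven leaves the proof holds three sibling hashes, and a forged item fails because its recomputed root no longer matches.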

## Components of a Merkle Tree
To simulate a Merkle tree in Python, this notebook builds the data structure from the following components:

1. Hash Function
2. Leaf Node Creation
3. Tree Construction
4. Proof Generation
5. Proof Verification

### 1. Hash Function
First, let's start with the hash function, the basic building block of a Merkle tree. It converts data into a fixed-size hash; this notebook uses SHA-256 (Bitcoin applies SHA-256 twice, while Ethereum uses Keccak-256). We use Python’s `hashlib` module to hash a string input into a 32-byte hash. Notice how the hash always outputs 32 bytes, or 64 characters when displayed as hexadecimal (2 characters per byte). Run the cell below to see the output.

```python
import hashlib


def sha256_hash(data: str) -> bytes:
    """Hash a string using SHA-256, returning a 32-byte hash.
    Args:
        data: Input string to hash (encoded as UTF-8).
    Returns:
        32-byte SHA-256 hash.
    """
    return hashlib.sha256(data.encode('utf-8')).digest()


input_data1 = "transaction1"
input_data2 = "txn1"
output_str1 = sha256_hash(input_data1)
output_str2 = sha256_hash(input_data2)
print(f"output_str1 (bytes): {output_str1}")
print(f"output_str2 (bytes): {output_str2}")
print(f"output_str1 (hex): {output_str1.hex()}")
print(f"output_str2 (hex): {output_str2.hex()}")
```

```text
output_str1 (bytes): b'\xbd\xe4i>U\xa36\xff\x81\xab#\x8c\xe2\x0c\xae\x1d\xd9\xc8\xba\x03\xb9\xb8\xf49c\xf5V\x9b\xf3\xcfR)'
output_str2 (bytes): b'\xbc\xad\x10\xeeS\xaf1\xca\xc1.5\x8e\xd6\xb8F\xd5\x0e~\xc4\xb5f\x9e\xf6\xae\xd5\xb5s\x8f\xe6$\x8b\x16'
output_str1 (hex): bde4693e55a336ff81ab238ce20cae1dd9c8ba03b9b8f43963f5569bf3cf5229
output_str2 (hex): bcad10ee53af31cac12e358ed6b846d50e7ec4b5669ef6aed5b5738fe6248b16
```


### 2. Leaf Node Creation

### 3. Tree Construction

### 4. Proof Generation

### 5. Proof Verification

## Putting It All Together

## References
1. [Introduction to Merkle Tree by Geeks for Geeks](https://www.geeksforgeeks.org/introduction-to-merkle-tree/) 
2. [What are Merkle trees? by Alchemy](https://www.alchemy.com/docs/what-are-merkle-trees)
3. [Balanced Binary Tree by Geeks for Geeks](https://www.geeksforgeeks.org/balanced-binary-tree/)