cppNexus/blockchain-zc-parser

blockchain-zc-parser


A zero-copy, allocation-free parser for Bitcoin blockchain binary data written in Rust, designed for high-throughput indexers, analytics engines, and embedded environments.

A low-level Bitcoin binary parser focused on performance, memory locality, and predictable streaming behavior.


Features

  • Zero-copy: all parsed structures borrow &'a [u8] directly from the input; no memcpy, no String, no Vec.
  • No alloc: compatible with #![no_std] targets, so it can be used in embedded devices, WASM, and kernel modules.
  • Streaming: BlockTxIter and TransactionParser process transactions lazily via closures and never load an entire block into structured memory.
  • Fast: parsing an 80-byte block header requires only ~10 integer reads from a contiguous buffer. Block file iteration is a tight loop over magic bytes and size fields.
  • Safe: unsafe is used only inside cursor.rs for pointer arithmetic after explicit bounds checks. Every unsafe block is annotated.

Comparison

Feature                     blockchain-zc-parser   rust-bitcoin
Zero-copy                   Yes                    Partial
Alloc-free parsing          Yes                    No
Streaming block iteration   Yes                    No
Full protocol model         No                     Yes

Supported formats

  • Block headers (80 bytes, Bitcoin protocol)
  • Legacy and SegWit (BIP 141) transactions
  • Bitcoin script pattern matching:
    • P2PKH, P2SH, P2WPKH, P2WSH, P2TR, P2PK, OP_RETURN, bare multisig
  • blkNNNNN.dat raw block files written by Bitcoin Core
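
As a rough illustration of what script pattern matching involves, the standard output templates can be recognised with slice patterns alone. This is a minimal sketch using only core Rust; the `Pattern` enum and `classify` function are hypothetical names for this example and are not the crate's `ScriptType` API. P2PK and bare multisig are left out to keep it short.

```rust
/// Script templates recognised by this sketch (hypothetical, not the crate's `ScriptType`).
#[derive(Debug, PartialEq)]
pub enum Pattern<'a> {
    P2pkh(&'a [u8]),    // 20-byte public key hash
    P2sh(&'a [u8]),     // 20-byte script hash
    P2wpkh(&'a [u8]),   // 20-byte witness program
    P2wsh(&'a [u8]),    // 32-byte witness program
    P2tr(&'a [u8]),     // 32-byte x-only public key
    OpReturn(&'a [u8]), // bytes following OP_RETURN
    Unknown,
}

/// Classifies a scriptPubKey; every payload borrows from `script`, nothing is copied.
pub fn classify(script: &[u8]) -> Pattern<'_> {
    match script {
        // OP_DUP OP_HASH160 PUSH20 <hash> OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, hash @ .., 0x88, 0xac] if hash.len() == 20 => Pattern::P2pkh(hash),
        // OP_HASH160 PUSH20 <hash> OP_EQUAL
        [0xa9, 0x14, hash @ .., 0x87] if hash.len() == 20 => Pattern::P2sh(hash),
        // OP_0 PUSH20 <program>
        [0x00, 0x14, prog @ ..] if prog.len() == 20 => Pattern::P2wpkh(prog),
        // OP_0 PUSH32 <program>
        [0x00, 0x20, prog @ ..] if prog.len() == 32 => Pattern::P2wsh(prog),
        // OP_1 PUSH32 <key>
        [0x51, 0x20, key @ ..] if key.len() == 32 => Pattern::P2tr(key),
        // OP_RETURN <data...>
        [0x6a, rest @ ..] => Pattern::OpReturn(rest),
        _ => Pattern::Unknown,
    }
}
```

Because each variant holds a sub-slice of the input, classification performs no allocation.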

Quick start

[dependencies]
blockchain-zc-parser = "0.1"

Parse a block header

use blockchain_zc_parser::{Cursor, BlockHeader};

fn parse(raw_80_bytes: &[u8]) -> blockchain_zc_parser::ParseResult<()> {
    let mut cursor = Cursor::new(raw_80_bytes);
    let header = BlockHeader::parse(&mut cursor)?;

    println!("version   = {}", header.version);
    println!("timestamp = {}", header.timestamp);
    println!("nonce     = {:#010x}", header.nonce);
    println!("prev_hash = {}", header.prev_block);   // Display impl, no alloc

    // header.prev_block is a &[u8;32] pointing into raw_80_bytes — no copy.
    Ok(())
}
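
For orientation, the 80-byte header layout can also be decoded by hand. The sketch below uses only core Rust and a hypothetical `HeaderView` type; it shows the field offsets of the Bitcoin header format and is not how `BlockHeader::parse` is implemented.

```rust
use core::convert::TryInto;

/// Borrowed view of the six header fields (hypothetical, not the crate's `BlockHeader`).
pub struct HeaderView<'a> {
    pub version: i32,
    pub prev_block: &'a [u8; 32],  // bytes 4..36, borrowed
    pub merkle_root: &'a [u8; 32], // bytes 36..68, borrowed
    pub timestamp: u32,
    pub bits: u32,
    pub nonce: u32,
}

/// Decodes the first 80 bytes of `raw`; returns `None` if fewer than 80 bytes are given.
pub fn decode_header(raw: &[u8]) -> Option<HeaderView<'_>> {
    let raw: &[u8; 80] = raw.get(..80)?.try_into().ok()?;
    // All integer fields are little-endian.
    let u32_at = |off: usize| {
        u32::from_le_bytes([raw[off], raw[off + 1], raw[off + 2], raw[off + 3]])
    };
    Some(HeaderView {
        version: u32_at(0) as i32,
        prev_block: raw[4..36].try_into().ok()?,
        merkle_root: raw[36..68].try_into().ok()?,
        timestamp: u32_at(68),
        bits: u32_at(72),
        nonce: u32_at(76),
    })
}
```

The two hash fields are references into the input buffer; only the four integers are read into registers.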

Example: parse a raw block file

Download a raw block (example: Bitcoin genesis block):

curl -L \
"https://mempool.space/api/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f/raw" \
-o genesis.bin

Run the example parser:

cargo run --example parse_block -- genesis.bin

Summary-only mode (no per-transaction printing):

cargo run --example parse_block -- --summary genesis.bin

Limit printed transactions:

cargo run --example parse_block -- --limit-tx 5 genesis.bin

Print a specific transaction index:

cargo run --example parse_block -- --tx 100 genesis.bin

Raw block vs blkNNNNN.dat

There are two different binary formats you may encounter:

Raw block (.bin, RPC, mempool API)

This is the pure Bitcoin block payload:

[80-byte header]
[varint tx_count]
[transactions...]

It contains no magic bytes and no size prefix.
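
The `varint tx_count` field uses Bitcoin's CompactSize encoding. A minimal decoder, assuming only core Rust, might look like the following; `read_compact_size` is a hypothetical helper for illustration, and it does not reject non-canonical encodings.

```rust
use core::convert::TryInto;

/// Decodes a CompactSize integer from the start of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if `buf` is too short.
pub fn read_compact_size(buf: &[u8]) -> Option<(u64, usize)> {
    let (&first, rest) = buf.split_first()?;
    match first {
        // 0xfd: the next 2 bytes hold a little-endian u16
        0xfd => Some((u16::from_le_bytes(rest.get(..2)?.try_into().ok()?) as u64, 3)),
        // 0xfe: the next 4 bytes hold a little-endian u32
        0xfe => Some((u32::from_le_bytes(rest.get(..4)?.try_into().ok()?) as u64, 5)),
        // 0xff: the next 8 bytes hold a little-endian u64
        0xff => Some((u64::from_le_bytes(rest.get(..8)?.try_into().ok()?), 9)),
        // anything below 0xfd is the value itself
        n => Some((n as u64, 1)),
    }
}
```

For a raw block, the transaction count would be read from the bytes immediately after the 80-byte header, and the first transaction starts right after the consumed prefix.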

You typically obtain it via:

curl -L \
"https://mempool.space/api/block/<blockhash>/raw" \
-o block.bin

This format can be parsed directly with:

let (header, iter) = BlockTxIter::new(raw_block_bytes)?;

Bitcoin Core blkNNNNN.dat

Files in your local Bitcoin Core data directory:

~/.bitcoin/blocks/blk00000.dat

Each file contains multiple blocks, each prefixed by:

[4-byte magic][4-byte little-endian size][raw block]

To parse these files, use BlkFileIter:

use blockchain_zc_parser::block::{BlkFileIter, MAINNET_MAGIC};

let mut it = BlkFileIter::new(file_bytes, MAINNET_MAGIC);
while let Some(raw_block) = it.next_block()? {
    let (_header, mut tx_iter) = BlockTxIter::new(raw_block)?;
    // process block...
}
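
The record framing described above is simple enough to walk by hand. The following sketch, assuming only core Rust, splits one record off the front of a buffer; `next_record` and `MAINNET_MAGIC_BYTES` are hypothetical names for this example, not part of the crate. Real blk files can contain zero padding, which this sketch treats as the end of data rather than skipping it.

```rust
use core::convert::TryInto;

/// Mainnet network magic as it appears on disk (0xD9B4BEF9 stored little-endian).
pub const MAINNET_MAGIC_BYTES: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

/// Splits `[magic][u32 LE size][raw block]` off the front of `data`.
/// Returns the raw block and the remaining bytes, or `None` at end of data
/// or when the record is malformed or truncated.
pub fn next_record<'a>(data: &'a [u8], magic: &[u8; 4]) -> Option<(&'a [u8], &'a [u8])> {
    // The record must start with the expected magic bytes.
    let body = data.strip_prefix(&magic[..])?;
    // The 4-byte little-endian size covers the raw block only.
    let size_bytes: [u8; 4] = body.get(..4)?.try_into().ok()?;
    let size = u32::from_le_bytes(size_bytes) as usize;
    let end = 4usize.checked_add(size)?;
    // `get` fails cleanly if the file is truncated mid-block.
    let block = body.get(4..end)?;
    Some((block, &body[end..]))
}
```

Both returned slices borrow from `data`, so looping over a memory-mapped file this way copies nothing.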

Why this crate exists

Most blockchain parsers:

  • Allocate Vecs for every transaction
  • Copy script bytes into owned buffers
  • Build large in-memory representations
  • Optimize for convenience over throughput

blockchain-zc-parser takes the opposite approach:

  • Every structure borrows directly from &[u8]
  • No heap allocation in parsing paths
  • Streaming transaction iteration
  • Minimal and auditable unsafe
  • Designed for memory-mapped blkNNNNN.dat processing

If you care about throughput, memory locality, and predictable performance, this crate is built for you.


When not to use this crate

If you need:

  • Full Bitcoin protocol validation
  • Address encoding/decoding
  • PSBT, descriptors, miniscript
  • Wallet functionality

Use rust-bitcoin instead.


Important

If you pass a blkNNNNN.dat file directly to BlockTxIter::new, parsing will fail because the file contains magic bytes and size prefixes.

The parse_block example automatically detects and unwraps the first block from a blkNNNNN.dat file if necessary.


Why zero-copy matters

Bitcoin blocks are routinely 1–2 MB, can approach 4 MB under the SegWit weight limit, and may contain thousands of transactions.

A traditional parser typically:

  • Allocates Vecs for inputs and outputs
  • Copies script bytes into owned buffers
  • Builds full in-memory representations

blockchain-zc-parser avoids all of this.

Every parsed structure borrows directly from the original &[u8] buffer. No heap allocations. No memcpy. No string building.

This has several practical consequences:

  • High throughput (hundreds of MB/s on modern CPUs)
  • Very low memory usage
  • Suitable for streaming, indexers, and embedded environments
  • Works naturally with memory-mapped files (mmap)

For indexers and blockchain analytics pipelines, this allows processing entire block files with near-linear memory access patterns.
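
To make the borrowing concrete, here is a minimal sketch of a zero-copy transaction output parser using only core Rust. `OutputRef` and `parse_output` are hypothetical names, not the crate's `TxOutput` API, and the sketch assumes the script length fits in a single byte (below 0xfd).

```rust
use core::convert::TryInto;

/// Borrowed view of one transaction output (hypothetical, not the crate's `TxOutput`).
pub struct OutputRef<'a> {
    pub value: u64,       // amount in satoshis
    pub script: &'a [u8], // points into the input buffer, never copied
}

/// Parses one output from the start of `buf` and returns it with the unread tail.
/// Assumption: only single-byte script lengths (< 0xfd) are handled here.
pub fn parse_output(buf: &[u8]) -> Option<(OutputRef<'_>, &[u8])> {
    // 8-byte little-endian value
    let value = u64::from_le_bytes(buf.get(..8)?.try_into().ok()?);
    // 1-byte script length
    let len = *buf.get(8)?;
    if len >= 0xfd {
        return None; // multi-byte length prefix: out of scope for this sketch
    }
    let end = 9 + len as usize;
    // Borrow the script bytes in place.
    let script = buf.get(9..end)?;
    Some((OutputRef { value, script }, &buf[end..]))
}
```

The returned `script` has the same address as the corresponding bytes in `buf`, which is what lets the same code run unchanged over a memory-mapped file.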


Stream transactions from a block

use blockchain_zc_parser::{BlockTxIter, script::ScriptType};

fn scan_block(raw_block: &[u8]) -> blockchain_zc_parser::ParseResult<u64> {
    let (_header, mut iter) = BlockTxIter::new(raw_block)?;
    let mut total_satoshis: u64 = 0;

    while iter.next_tx(
        |_input| Ok(()),              // called for every TxInput
        |output| {                    // called for every TxOutput
            total_satoshis += output.value;
            if let ScriptType::P2WPKH { pubkey_hash } = output.script_pubkey.script_type() {
                // pubkey_hash: &[u8; 20] — zero-copy pointer into raw_block
                println!("  P2WPKH output to {:?}", pubkey_hash);
            }
            Ok(())
        },
    )? {}

    Ok(total_satoshis)
}

Iterate over a Bitcoin Core blkNNNNN.dat file

use blockchain_zc_parser::block::{BlkFileIter, MAINNET_MAGIC};

fn count_blocks(file_bytes: &[u8]) -> usize {
    let mut iter = BlkFileIter::new(file_bytes, MAINNET_MAGIC);
    let mut count = 0;
    // Stops at end of file or at the first parse error; errors are not reported.
    while let Ok(Some(_raw_block)) = iter.next_block() {
        count += 1;
    }
    count
}

Architecture

src/
├── lib.rs          — crate root, re-exports
├── cursor.rs       — zero-copy Cursor<'a> over &[u8]  ← start here
├── error.rs        — ParseError enum, no_std compatible
├── hash.rs         — Hash32<'a> / Hash20<'a> wrappers
├── script.rs       — Script<'a>, ScriptType, instruction iterator
├── transaction.rs  — TxInput, TxOutput, OutPoint, TransactionParser
└── block.rs        — BlockHeader, BlockTxIter, BlkFileIter

The Cursor type is the single entry point for all parsing. It advances a usize offset into a &'a [u8] and returns sub-slices with lifetime 'a, the same lifetime as the original input. No unsafe code exists outside cursor.rs.
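
The lifetime relationship is the key point: returned slices borrow from the input, not from the cursor. A minimal safe sketch of the idea follows; `MiniCursor` and its methods are hypothetical and only approximate what the description above says about `Cursor`.

```rust
/// Minimal safe cursor (hypothetical `MiniCursor`, not the crate's `Cursor`).
pub struct MiniCursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> MiniCursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        MiniCursor { data, pos: 0 }
    }

    /// Returns the next `n` bytes and advances. The slice carries lifetime `'a`,
    /// so it stays valid after the cursor itself has been dropped.
    pub fn read_bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.data.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    /// Reads a little-endian u32 and advances by 4 bytes.
    pub fn read_u32_le(&mut self) -> Option<u32> {
        let b = self.read_bytes(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Number of unread bytes.
    pub fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }
}
```

A failed read leaves `pos` unchanged, so a caller can report the error offset without extra bookkeeping.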


Core design principles

  1. Zero-copy first — data is never duplicated.
  2. Streaming over materialization — process blocks incrementally.
  3. no_std compatible — works outside of full OS environments.
  4. Explicit safety — all unsafe is documented and bounded.
  5. Performance transparency — benchmarked and reproducible.

Benchmarks

Run on an Apple M2 Pro (single-core, Rust stable 1.88 at time of measurement, --release):

Benchmark                            Throughput
block_header/parse_80_bytes          ~1.1 GB/s
transaction/parse/coinbase           ~860 MB/s
transaction/parse/p2pkh_2out         ~740 MB/s
block/streaming_iter/tx_count=1000   ~695 MB/s
Run yourself:

cargo bench
# HTML report: target/criterion/report/index.html

no_std usage

Disable the default feature set (which enables std):

[dependencies]
blockchain-zc-parser = { version = "0.1", default-features = false }

With default-features = false:

  • All std::error::Error impls are removed.
  • BlockHeader::block_hash() (requires SHA-256) is removed. Compute the hash yourself by applying SHA-256 twice to header.raw, for example with sha2::Sha256.
  • Everything else works identically.

Minimum supported Rust version (MSRV)

Rust 1.88+ (edition 2021). The crate uses only stable Rust features.


Safety

The only unsafe code lives in src/cursor.rs:

// SAFETY: `end` was checked to be ≤ data.len() on the line above.
let slice = unsafe { self.data.get_unchecked(self.pos..end) };

All other code is safe Rust. The crate passes cargo miri test (run it yourself with cargo +nightly miri test).
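
The soundness of this pattern depends on how `end` is computed as well as on the length check. The sketch below shows one way to write the full check; `take` is a hypothetical free function for illustration and is not copied from cursor.rs.

```rust
/// Returns `data[pos..pos + n]`, performing the bounds check once, up front.
/// (Hypothetical helper illustrating the pattern; not the crate's code.)
pub fn take(data: &[u8], pos: usize, n: usize) -> Option<&[u8]> {
    // checked_add guards against `pos + n` wrapping around on overflow,
    // which would otherwise let a huge `n` pass the length check below.
    let end = pos.checked_add(n)?;
    if end > data.len() {
        return None;
    }
    // SAFETY: `pos <= end` because the addition did not overflow,
    // and `end <= data.len()` was checked above.
    Some(unsafe { data.get_unchecked(pos..end) })
}
```

Both preconditions of `get_unchecked` (start not past end, end within the slice) are established before the unsafe block, which keeps the SAFETY comment easy to audit.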


Contributing

Pull requests are welcome. Please:

  1. Run:
   cargo test
   cargo clippy --all-targets --all-features -- -D warnings
  2. Add a unit test for any new parsing logic.
  3. Keep unsafe blocks minimal and documented.

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

