A zero-copy, allocation-free parser for Bitcoin blockchain binary data written in Rust, designed for high-throughput indexers, analytics engines, and embedded environments.
A low-level Bitcoin binary parser focused on performance, memory locality, and predictable streaming behavior.
| Property | Details |
|---|---|
| Zero-copy | All parsed structures borrow `&'a [u8]` directly from the input: no memcpy, no `String`, no `Vec`. |
| No alloc | Compatible with `#![no_std]` targets. Usable in embedded devices, WASM, and kernel modules. |
| Streaming | `BlockTxIter` and `TransactionParser` process transactions lazily via closures and never load an entire block into structured memory. |
| Fast | Parsing an 80-byte block header requires only ~10 integer reads from a contiguous buffer. Block file iteration is a tight loop over magic bytes and size fields. |
| Safe | `unsafe` is used only inside `cursor.rs` for pointer arithmetic after explicit bounds checks. Every `unsafe` block is annotated. |
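To make the zero-copy and "few integer reads" properties concrete, here is a minimal std-only sketch of a borrowed header view. `HeaderView` and `parse_header` are hypothetical names used for illustration and are not the crate's actual definitions; the field offsets follow the standard 80-byte Bitcoin header layout.

```rust
use std::convert::TryFrom;

/// Hypothetical zero-copy view over an 80-byte block header (illustration only).
#[derive(Debug)]
pub struct HeaderView<'a> {
    pub version: i32,
    pub prev_block: &'a [u8; 32],  // borrowed from the input, not copied
    pub merkle_root: &'a [u8; 32], // borrowed from the input, not copied
    pub timestamp: u32,
    pub bits: u32,
    pub nonce: u32,
}

/// Reads a little-endian u32 at `off`; the caller guarantees `off + 4 <= buf.len()`.
fn le_u32(buf: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}

/// Returns `None` if fewer than 80 bytes are available.
pub fn parse_header(buf: &[u8]) -> Option<HeaderView<'_>> {
    if buf.len() < 80 {
        return None;
    }
    Some(HeaderView {
        version: le_u32(buf, 0) as i32,
        prev_block: <&[u8; 32]>::try_from(&buf[4..36]).ok()?,
        merkle_root: <&[u8; 32]>::try_from(&buf[36..68]).ok()?,
        timestamp: le_u32(buf, 68),
        bits: le_u32(buf, 72),
        nonce: le_u32(buf, 76),
    })
}
```

The two hash fields are references into the caller's buffer, so the view cannot outlive the input and no bytes are duplicated.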
| Feature | blockchain-zc-parser | rust-bitcoin |
|---|---|---|
| Zero-copy | ✅ | Partial |
| Alloc-free parsing | ✅ | ❌ |
| Streaming block iteration | ✅ | ❌ |
| Full protocol model | ❌ | ✅ |
- Block headers (80 bytes, Bitcoin protocol)
- Legacy and SegWit (BIP 141) transactions
- Bitcoin script pattern matching: P2PKH, P2SH, P2WPKH, P2WSH, P2TR, P2PK, OP_RETURN, bare multisig
- `blkNNNNN.dat` raw block files written by Bitcoin Core
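As an illustration of what script pattern matching involves, the sketch below classifies a `scriptPubKey` by its standard byte template using slice patterns. `Kind` and `classify` are hypothetical names, not the crate's `ScriptType` API, and P2PK and bare multisig are left out to keep the example short.

```rust
/// Hypothetical classification result; payloads borrow from the script bytes.
#[derive(Debug, PartialEq)]
pub enum Kind<'a> {
    P2pkh(&'a [u8]),    // 20-byte pubkey hash
    P2sh(&'a [u8]),     // 20-byte script hash
    P2wpkh(&'a [u8]),   // 20-byte witness program
    P2wsh(&'a [u8]),    // 32-byte witness program
    P2tr(&'a [u8]),     // 32-byte x-only output key
    OpReturn(&'a [u8]), // everything after OP_RETURN, push opcodes included
    Other,
}

/// Matches the standard templates byte for byte; no allocation, no copying.
pub fn classify(script: &[u8]) -> Kind<'_> {
    match script {
        // OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, hash @ .., 0x88, 0xac] if hash.len() == 20 => Kind::P2pkh(hash),
        // OP_HASH160 <20> OP_EQUAL
        [0xa9, 0x14, hash @ .., 0x87] if hash.len() == 20 => Kind::P2sh(hash),
        // OP_0 <20>
        [0x00, 0x14, prog @ ..] if prog.len() == 20 => Kind::P2wpkh(prog),
        // OP_0 <32>
        [0x00, 0x20, prog @ ..] if prog.len() == 32 => Kind::P2wsh(prog),
        // OP_1 <32>
        [0x51, 0x20, key @ ..] if key.len() == 32 => Kind::P2tr(key),
        // OP_RETURN followed by arbitrary data
        [0x6a, rest @ ..] => Kind::OpReturn(rest),
        _ => Kind::Other,
    }
}
```

Because each variant holds a sub-slice of the input, the classification result stays zero-copy in the same sense as the rest of the parser.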
[dependencies]
blockchain-zc-parser = "0.1"

use blockchain_zc_parser::{Cursor, BlockHeader};
fn parse(raw_80_bytes: &[u8]) -> blockchain_zc_parser::ParseResult<()> {
let mut cursor = Cursor::new(raw_80_bytes);
let header = BlockHeader::parse(&mut cursor)?;
println!("version = {}", header.version);
println!("timestamp = {}", header.timestamp);
println!("nonce = {:#010x}", header.nonce);
println!("prev_hash = {}", header.prev_block); // Display impl, no alloc
// header.prev_block is a &[u8;32] pointing into raw_80_bytes — no copy.
Ok(())
}

Download a raw block (example: Bitcoin genesis block):
curl -L \
"https://mempool.space/api/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f/raw" \
-o genesis.bin

Run the example parser:

cargo run --example parse_block -- genesis.bin

Summary-only mode (no per-transaction printing):

cargo run --example parse_block -- --summary genesis.bin

Limit printed transactions:

cargo run --example parse_block -- --limit-tx 5 genesis.bin

Print a specific transaction index:

cargo run --example parse_block -- --tx 100 genesis.bin

There are two different binary formats you may encounter:
This is the pure Bitcoin block payload:
[80-byte header]
[varint tx_count]
[transactions...]
It contains no magic bytes and no size prefix.
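The `varint tx_count` field uses Bitcoin's CompactSize encoding. The following std-only sketch decodes it and splits a raw block into its three parts. `read_compact_size` and `split_raw_block` are hypothetical helper names for illustration, not functions exported by this crate.

```rust
/// Decodes a Bitcoin CompactSize integer.
/// Returns (value, bytes_consumed), or `None` if the input is truncated.
pub fn read_compact_size(buf: &[u8]) -> Option<(u64, usize)> {
    let (&first, rest) = buf.split_first()?;
    match first {
        // Values up to 0xfc are stored in the prefix byte itself.
        0x00..=0xfc => Some((first as u64, 1)),
        // 0xfd: the next 2 bytes hold a little-endian u16.
        0xfd => {
            let b = rest.get(..2)?;
            Some((u16::from_le_bytes([b[0], b[1]]) as u64, 3))
        }
        // 0xfe: the next 4 bytes hold a little-endian u32.
        0xfe => {
            let b = rest.get(..4)?;
            Some((u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as u64, 5))
        }
        // 0xff: the next 8 bytes hold a little-endian u64.
        0xff => {
            let mut a = [0u8; 8];
            a.copy_from_slice(rest.get(..8)?);
            Some((u64::from_le_bytes(a), 9))
        }
    }
}

/// Splits a raw block into (header bytes, tx_count, transaction bytes).
/// All returned slices borrow from `raw`.
pub fn split_raw_block(raw: &[u8]) -> Option<(&[u8], u64, &[u8])> {
    let header = raw.get(..80)?;
    let (tx_count, used) = read_compact_size(raw.get(80..)?)?;
    let txs = raw.get(80 + used..)?;
    Some((header, tx_count, txs))
}
```

This sketch accepts non-canonical encodings (for example `0xfd` followed by a value below 253); a stricter parser may want to reject those.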
You typically obtain a raw block via:
curl -L \
"https://mempool.space/api/block/<blockhash>/raw" \
-o block.bin

This format can be parsed directly with:

let (header, iter) = BlockTxIter::new(raw_block_bytes)?;

Files in your local Bitcoin Core data directory:
~/.bitcoin/blocks/blk00000.dat
Each file contains multiple blocks, each prefixed by:
[4-byte magic][4-byte little-endian size][raw block]
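The framing above can be walked with a few bounds-checked slice operations. This is a std-only sketch of the idea, not the crate's `BlkFileIter`; `FramedBlocks` and `MAINNET_MAGIC_BYTES` are hypothetical names, and representing the magic as a 4-byte array is an assumption made for the example.

```rust
/// Mainnet network magic as it appears on disk (assumed representation).
pub const MAINNET_MAGIC_BYTES: [u8; 4] = [0xf9, 0xbe, 0xb4, 0xd9];

/// Hypothetical iterator over `[magic][le size][raw block]` records.
pub struct FramedBlocks<'a> {
    data: &'a [u8],
    pos: usize,
    magic: [u8; 4],
}

impl<'a> FramedBlocks<'a> {
    pub fn new(data: &'a [u8], magic: [u8; 4]) -> Self {
        FramedBlocks { data, pos: 0, magic }
    }
}

impl<'a> Iterator for FramedBlocks<'a> {
    type Item = &'a [u8];

    /// Yields the next raw block. Iteration ends at the first record that is
    /// truncated or whose magic does not match (for example trailing padding).
    fn next(&mut self) -> Option<&'a [u8]> {
        let rest = self.data.get(self.pos..)?;
        let head = rest.get(..8)?;
        if head[..4] != self.magic {
            return None;
        }
        let size = u32::from_le_bytes([head[4], head[5], head[6], head[7]]) as usize;
        let end = 8usize.checked_add(size)?;
        let block = rest.get(8..end)?;
        self.pos += end;
        Some(block)
    }
}
```

Each yielded slice is a raw block in the first format, so it can be handed to a raw-block parser without copying.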
To parse these files, use BlkFileIter:
use blockchain_zc_parser::block::{BlkFileIter, MAINNET_MAGIC};
let mut it = BlkFileIter::new(file_bytes, MAINNET_MAGIC);
while let Some(raw_block) = it.next_block()? {
let (_header, mut tx_iter) = BlockTxIter::new(raw_block)?;
// process block...
}

Most blockchain parsers:
- Allocate `Vec`s for every transaction
- Copy script bytes into owned buffers
- Build large in-memory representations
- Optimize for convenience over throughput
blockchain-zc-parser takes the opposite approach:
- Every structure borrows directly from `&[u8]`
- No heap allocation in parsing paths
- Streaming transaction iteration
- Minimal and auditable `unsafe`
- Designed for memory-mapped `blkNNNNN.dat` processing
If you care about throughput, memory locality, and predictable performance, this crate is built for you.
If you need:
- Full Bitcoin protocol validation
- Address encoding/decoding
- PSBT, descriptors, miniscript
- Wallet functionality
Use rust-bitcoin instead.
If you pass a blkNNNNN.dat file directly to BlockTxIter::new, parsing will fail
because the file contains magic bytes and size prefixes.
The parse_block example automatically detects and unwraps the first block
from a blkNNNNN.dat file if necessary.
Bitcoin blocks are often 1–2 MB in size (and can approach 4 MB under the SegWit weight limit) and may contain thousands of transactions.
A traditional parser typically:
- Allocates `Vec`s for inputs and outputs
- Copies script bytes into owned buffers
- Builds full in-memory representations
blockchain-zc-parser avoids all of this.
Every parsed structure borrows directly from the original &[u8] buffer.
No heap allocations. No memcpy. No string building.
This has several practical consequences:
- High throughput (hundreds of MB/s on modern CPUs)
- Very low memory usage
- Suitable for streaming, indexers, and embedded environments
- Works naturally with memory-mapped files (`mmap`)
For indexers and blockchain analytics pipelines, this allows processing entire block files with near-linear memory access patterns.
use blockchain_zc_parser::{BlockTxIter, script::ScriptType};
fn scan_block(raw_block: &[u8]) -> blockchain_zc_parser::ParseResult<u64> {
let (_header, mut iter) = BlockTxIter::new(raw_block)?;
let mut total_satoshis: u64 = 0;
while iter.next_tx(
|_input| Ok(()), // called for every TxInput
|output| { // called for every TxOutput
total_satoshis += output.value;
if let ScriptType::P2WPKH { pubkey_hash } = output.script_pubkey.script_type() {
// pubkey_hash: &[u8; 20] — zero-copy pointer into raw_block
println!(" P2WPKH output to {:?}", pubkey_hash);
}
Ok(())
},
)? {}
Ok(total_satoshis)
}

use blockchain_zc_parser::block::{BlkFileIter, MAINNET_MAGIC};
fn count_blocks(file_bytes: &[u8]) -> usize {
let mut iter = BlkFileIter::new(file_bytes, MAINNET_MAGIC);
let mut count = 0;
while let Ok(Some(_raw_block)) = iter.next_block() {
count += 1;
}
count
}

src/
├── lib.rs — crate root, re-exports
├── cursor.rs — zero-copy Cursor<'a> over &[u8] ← start here
├── error.rs — ParseError enum, no_std compatible
├── hash.rs — Hash32<'a> / Hash20<'a> wrappers
├── script.rs — Script<'a>, ScriptType, instruction iterator
├── transaction.rs — TxInput, TxOutput, OutPoint, TransactionParser
└── block.rs — BlockHeader, BlockTxIter, BlkFileIter
The Cursor type is the single entry point for all parsing.
It advances a usize offset into a &'a [u8] and returns sub-slices with
lifetime 'a — identical to the original input. No unsafe code exists outside
this file.
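The cursor idea described above can be sketched in safe Rust as follows. `SketchCursor` and `UnexpectedEof` are hypothetical names; the real `Cursor` may expose different methods and, per the safety notes, uses `get_unchecked` after its bounds check where this sketch uses the checked `get`.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub struct UnexpectedEof;

/// Minimal cursor: an offset into a borrowed buffer.
pub struct SketchCursor<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> SketchCursor<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        SketchCursor { data, pos: 0 }
    }

    /// Returns the next `n` bytes as a sub-slice with the input's lifetime
    /// and advances the offset. Nothing is copied.
    pub fn take(&mut self, n: usize) -> Result<&'a [u8], UnexpectedEof> {
        let end = self.pos.checked_add(n).ok_or(UnexpectedEof)?;
        let slice = self.data.get(self.pos..end).ok_or(UnexpectedEof)?;
        self.pos = end;
        Ok(slice)
    }

    /// Reads a little-endian u32 and advances by 4 bytes.
    pub fn read_u32_le(&mut self) -> Result<u32, UnexpectedEof> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Number of unread bytes.
    pub fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }
}
```

Because `take` returns `&'a [u8]` rather than a slice tied to `&mut self`, parsed values can outlive the cursor itself as long as the input buffer is alive.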
- Zero-copy first: data is never duplicated.
- Streaming over materialization: process blocks incrementally.
- `no_std` compatible: works outside of full OS environments.
- Explicit safety: all `unsafe` is documented and bounded.
- Performance transparency: benchmarked and reproducible.
Run on an Apple M2 Pro (single-core, Rust stable 1.88 at time of measurement, --release):
| Benchmark | Throughput |
|---|---|
| `block_header/parse_80_bytes` | ~1.1 GB/s |
| `transaction/parse/coinbase` | ~860 MB/s |
| `transaction/parse/p2pkh_2out` | ~740 MB/s |
| `block/streaming_iter/tx_count=1000` | ~695 MB/s |
Run yourself:
cargo bench
# HTML report: target/criterion/report/index.html

Disable the default feature set (which enables std):
[dependencies]
blockchain-zc-parser = { version = "0.1", default-features = false }

With `default-features = false`:
- All `std::error::Error` impls are removed.
- `BlockHeader::block_hash()` (requires SHA-256) is removed; compute the hash yourself by applying `sha2::Sha256` twice to `header.raw` (a Bitcoin block hash is a double SHA-256).
- Everything else works identically.
Rust 1.88+ (edition 2021). The crate uses only stable Rust features.
The only unsafe code lives in src/cursor.rs:
// SAFETY: `end` was checked to be ≤ data.len() on the line above.
let slice = unsafe { self.data.get_unchecked(self.pos..end) };

All other code is safe Rust. The crate passes `cargo miri test` (run it yourself with `cargo +nightly miri test`).
Pull requests are welcome. Please:
- Run:
cargo test
cargo clippy --all-targets --all-features -- -D warnings
- Add a unit test for any new parsing logic.
- Keep `unsafe` blocks minimal and documented.
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.