@cogcoin/bitcoin-scrape@1.0.0 is the public scraper used to build Cogcoin's mainnet getblock archive artifacts for faster managed Bitcoin Core catch-up after the AssumeUTXO base height 910000.
Use Node 22 or newer.
This repo exists so the hosted getblock archive can be audited and reproduced.
It shows exactly how the public artifacts are assembled from a fully synchronized Bitcoin Core node using bitcoin-cli.
The scraper produces the artifact pair that Cogcoin clients download from:
- https://snapshots.cogcoin.org/getblock-910000-latest.dat
- https://snapshots.cogcoin.org/getblock-910000-latest.json
The archive is for bitcoind catch-up, not Cogcoin indexer replay.
The scraper reads blocks from a fully synchronized mainnet node via:
- bitcoin-cli getblockhash <height>
- bitcoin-cli getblock <hash> 0
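The two calls above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: `CliRunner`, `hexToBytes`, and `fetchRawBlock` are hypothetical names, and the injected `run` callback stands in for whatever spawns bitcoin-cli and returns its stdout (for example Node's `child_process.execFileSync`).

```typescript
// Hypothetical runner type: takes bitcoin-cli arguments, returns stdout as text.
type CliRunner = (args: string[]) => string;

// Decode the hex string printed by `getblock <hash> 0` into raw bytes.
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.trim();
  if (clean.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(clean)) {
    throw new Error("invalid hex from bitcoin-cli");
  }
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Fetch one block: height -> hash -> serialized block bytes (verbosity 0).
function fetchRawBlock(height: number, run: CliRunner): { hash: string; raw: Uint8Array } {
  const hash = run(["getblockhash", String(height)]).trim();
  const raw = hexToBytes(run(["getblock", hash, "0"]));
  return { hash, raw };
}
```

Injecting the runner keeps the fetch logic testable without a live node.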
By default it resolves bitcoin-cli from:
- --bitcoin-cli <path>
- BITCOIN_CLI
- plain bitcoin-cli on PATH
Extra Bitcoin Core CLI flags can be passed with repeated --bitcoin-cli-arg <arg> values such as -datadir=/path/to/bitcoin.
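A minimal sketch of how the binary and its extra flags might be resolved. It assumes the sources listed above are tried in that order (flag, then the BITCOIN_CLI environment variable, then PATH); the `CliOptions` shape and both function names are hypothetical.

```typescript
// Hypothetical option shape; field names are illustrative only.
interface CliOptions {
  bitcoinCli?: string;      // value of --bitcoin-cli <path>, if given
  bitcoinCliArgs: string[]; // values of repeated --bitcoin-cli-arg <arg>
}

// Assumed precedence: explicit flag, then BITCOIN_CLI, then plain
// `bitcoin-cli` looked up on PATH. Empty values are treated as unset.
function resolveBitcoinCli(
  opts: CliOptions,
  env: Record<string, string | undefined>,
): string {
  return opts.bitcoinCli || env["BITCOIN_CLI"] || "bitcoin-cli";
}

// Build the argument list for one RPC call. bitcoin-cli expects its own
// options (such as -datadir=...) before the RPC command name.
function buildArgv(opts: CliOptions, rpc: string[]): string[] {
  return [...opts.bitcoinCliArgs, ...rpc];
}
```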
The scraper writes:
- getblock-910000-latest.dat
- getblock-910000-latest.json
The .dat file is a blkfile-style binary container suitable for bitcoind -loadblock=..., with repeated records of:
- 4-byte mainnet magic
- 4-byte little-endian raw block length
- raw block bytes
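The record layout above can be illustrated with a short encoder and decoder. This is a sketch rather than the scraper's own code; `encodeRecord` and `decodeRecords` are hypothetical names. The magic bytes are the standard Bitcoin mainnet value `f9 be b4 d9`.

```typescript
// Bitcoin mainnet network magic bytes.
const MAINNET_MAGIC = new Uint8Array([0xf9, 0xbe, 0xb4, 0xd9]);

// Frame one raw block as: magic (4 bytes) | length, little-endian (4 bytes) | raw block.
function encodeRecord(raw: Uint8Array): Uint8Array {
  const out = new Uint8Array(8 + raw.length);
  out.set(MAINNET_MAGIC, 0);
  new DataView(out.buffer).setUint32(4, raw.length, true); // true = little-endian
  out.set(raw, 8);
  return out;
}

// Walk a buffer of concatenated records and return each raw block.
function decodeRecords(data: Uint8Array): Uint8Array[] {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const blocks: Uint8Array[] = [];
  let pos = 0;
  while (pos < data.length) {
    if (pos + 8 > data.length) throw new Error("truncated record header");
    for (let i = 0; i < 4; i++) {
      if (data[pos + i] !== MAINNET_MAGIC[i]) {
        throw new Error(`bad magic at offset ${pos}`);
      }
    }
    const len = view.getUint32(pos + 4, true);
    if (pos + 8 + len > data.length) throw new Error("truncated block body");
    blocks.push(data.subarray(pos + 8, pos + 8 + len));
    pos += 8 + len;
  }
  return blocks;
}
```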
The manifest records:
- format version
- chain and base snapshot height
- first and last included block heights
- block count
- artifact size and whole-file SHA-256
- chunk size and per-chunk SHA-256 list
- per-block offsets, lengths, hashes, and previous-block hashes
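As a rough picture of the manifest, the fields listed above could map onto a shape like the one below. Only `endHeight` is a name that appears in this README; every other field name, and the `checkManifest` helper, is hypothetical and may differ from the real manifest.

```typescript
// Hypothetical per-block entry; field names are illustrative.
interface BlockEntry {
  height: number;
  offset: number;   // byte offset within the .dat file
  length: number;   // byte length
  hash: string;     // block hash
  prevHash: string; // previous-block hash
}

// Hypothetical manifest shape mirroring the list above.
interface ArchiveManifest {
  formatVersion: number;
  chain: string;
  baseHeight: number;    // base snapshot height, 910000
  startHeight: number;   // first included block height
  endHeight: number;     // last included block height
  blockCount: number;
  size: number;          // artifact size in bytes
  sha256: string;        // whole-file SHA-256
  chunkSize: number;
  chunkSha256: string[]; // per-chunk SHA-256 list
  blocks: BlockEntry[];
}

// Cheap internal-consistency checks; returns a list of problems (empty if none).
function checkManifest(m: ArchiveManifest): string[] {
  const problems: string[] = [];
  if (m.blockCount !== m.endHeight - m.startHeight + 1) {
    problems.push("blockCount does not match height range");
  }
  if (m.blocks.length !== m.blockCount) {
    problems.push("blocks list length does not match blockCount");
  }
  let prev: BlockEntry | undefined;
  for (const b of m.blocks) {
    if (prev !== undefined) {
      if (b.height !== prev.height + 1) problems.push(`height gap before ${b.height}`);
      if (b.prevHash !== prev.hash) problems.push(`broken hash link at ${b.height}`);
    }
    prev = b;
  }
  return problems;
}
```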
Build the repo:
npm install
npm run build
Capture or extend the archive:
node dist/cli.js capture --to-height 945188
node dist/cli.js capture --to-height 945188 --reset
Point at a specific Bitcoin Core datadir if needed:
node dist/cli.js capture --to-height 945188 --bitcoin-cli-arg=-datadir=/path/to/bitcoin
Default behavior is safe append:
- existing getblock-910000-latest.dat and getblock-910000-latest.json are validated first
- append resumes from manifest.endHeight + 1
- if local state is inconsistent, the scraper fails instead of guessing
- --reset rebuilds the artifact pair from 910001
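The start-height rule described above can be sketched as a small pure function. The name `firstHeightToCapture` is hypothetical, and treating a missing artifact pair (`null`) like a fresh build is an assumption, not something this README states.

```typescript
const BASE_HEIGHT = 910000; // AssumeUTXO base height used by this archive

// Decide where a capture run starts.
// `manifestEndHeight` is null when no local artifact pair exists yet
// (assumed here to behave like a fresh build from BASE_HEIGHT + 1).
function firstHeightToCapture(manifestEndHeight: number | null, reset: boolean): number {
  if (reset || manifestEndHeight === null) return BASE_HEIGHT + 1;
  if (!Number.isInteger(manifestEndHeight) || manifestEndHeight < BASE_HEIGHT) {
    // Inconsistent local state: fail instead of guessing.
    throw new Error("inconsistent manifest endHeight; refusing to append");
  }
  return manifestEndHeight + 1;
}
```

For example, with a manifest ending at 945188 a default run would start at 945189, while the same run with --reset would start again at 910001.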
The scraper does not wait for future blocks in one long-running invocation.
If --to-height is ahead of the node's current tip, it stops cleanly at the highest block available at the moment it attempts the next height and leaves the local artifact pair committed at the last completed block.
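The stop-at-tip behavior can be sketched as a loop that re-checks the node's tip before each block. `captureUpTo`, `getTip`, and `commit` are hypothetical names; `getTip` could be backed by `bitcoin-cli getblockcount`, though how the scraper actually detects an unavailable height is not stated here.

```typescript
// Capture heights from `next` up to `toHeight`, checking the node's current
// tip before each block and stopping cleanly once a height is unavailable.
// Returns the last committed height (next - 1 if nothing was committed).
function captureUpTo(
  next: number,
  toHeight: number,
  getTip: () => number,
  commit: (height: number) => void,
): number {
  let last = next - 1;
  for (let h = next; h <= toHeight; h++) {
    if (h > getTip()) break; // node does not have this block yet: stop, do not wait
    commit(h);
    last = h;
  }
  return last;
}
```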
Each committed block is printed as it is added:
Added block 945189: <blockhash>
Added block 945190: <blockhash>
The scraper is designed so interruptions do not leave the public artifact pair in an unusable state:
- each appended block is written to the .dat file
- the file is synced before the manifest is advanced
- the manifest is rewritten atomically after each committed block
- if a previous run died after extending the .dat file but before advancing the manifest, the next run trims the uncommitted tail back to the last manifest-backed height before resuming
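The recovery step in the last bullet comes down to comparing the size the manifest vouches for with the actual size of the .dat file. The sketch below is illustrative only: `Recovery` and `planRecovery` are hypothetical names, and the real scraper may perform further validation. In Node, the truncation itself could be done with `fs.truncateSync(path, toSize)`.

```typescript
// Possible outcomes when reconciling the .dat file with the manifest.
type Recovery =
  | { action: "none" }
  | { action: "truncate"; toSize: number }
  | { action: "fail"; reason: string };

// committedSize: artifact size recorded in the manifest.
// actualDatSize: current size of the .dat file on disk.
function planRecovery(committedSize: number, actualDatSize: number): Recovery {
  if (actualDatSize === committedSize) {
    return { action: "none" }; // file and manifest agree
  }
  if (actualDatSize > committedSize) {
    // A previous run extended the file but never advanced the manifest:
    // drop the uncommitted tail and resume from the manifest-backed height.
    return { action: "truncate", toSize: committedSize };
  }
  // The file is shorter than the manifest claims: committed data is missing.
  return { action: "fail", reason: "dat file is shorter than manifest size" };
}
```

Because the manifest only advances after the file is synced, a file longer than the manifest is recoverable, whereas a shorter file indicates genuinely inconsistent state.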
This makes the local artifact pair safe against:
- process interruption
- laptop sleep
- network loss while scraping from bitcoin-cli
- partial append failures
The published artifact family uses height-only cache busting:
- getblock-910000-latest.json?end=<endHeight>
- getblock-910000-latest.dat?end=<endHeight>
The canonical cache-bust value is the manifest's endHeight.
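A small sketch of building the cache-busted URLs from a manifest's endHeight. The helper name `cacheBustedUrls` is hypothetical; the host and file names are the ones given in this README.

```typescript
const SNAPSHOT_BASE = "https://snapshots.cogcoin.org";

// Build the cache-busted URLs for the artifact pair from the manifest's endHeight.
function cacheBustedUrls(endHeight: number): { manifest: string; dat: string } {
  if (!Number.isInteger(endHeight) || endHeight < 0) {
    throw new Error("endHeight must be a non-negative integer");
  }
  return {
    manifest: `${SNAPSHOT_BASE}/getblock-910000-latest.json?end=${endHeight}`,
    dat: `${SNAPSHOT_BASE}/getblock-910000-latest.dat?end=${endHeight}`,
  };
}
```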
This repo is intentionally narrow:
- scraper code only
- no hosted secrets
- no private node credentials
- no generated public archive artifacts checked into source control
Local generated outputs are ignored by .gitignore.