@cogcoin/bitcoin-scrape

@cogcoin/bitcoin-scrape@1.0.0 is the public scraper Cogcoin uses to build its mainnet getblock archive artifacts. These artifacts let a managed Bitcoin Core node catch up faster after the AssumeUTXO base height of 910000.

Use Node 22 or newer.

Purpose

This repo exists so the hosted getblock archive can be audited and reproduced. It shows exactly how the public artifacts are assembled from a fully synchronized Bitcoin Core node using bitcoin-cli.

The scraper produces the artifact pair that Cogcoin clients download:

  • https://snapshots.cogcoin.org/getblock-910000-latest.dat
  • https://snapshots.cogcoin.org/getblock-910000-latest.json

The archive is for bitcoind catch-up, not Cogcoin indexer replay.

Inputs

The scraper reads blocks from a fully synchronized mainnet node via:

  • bitcoin-cli getblockhash <height>
  • bitcoin-cli getblock <hash> 0
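A minimal sketch of those two calls, assuming the scraper shells out with Node's child_process module. The names CliRunner, makeRunner, and fetchRawBlock are hypothetical and not taken from the scraper's source. The runner is injectable so the logic can be exercised without a live node.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical abstraction: runs bitcoin-cli with the given arguments, returns stdout.
type CliRunner = (args: string[]) => string;

// Builds a runner for a resolved bitcoin-cli path plus extra CLI flags
// (for example -datadir=...). Flags must precede the RPC method name.
function makeRunner(cliPath: string, extraArgs: string[]): CliRunner {
  return (args) =>
    execFileSync(cliPath, [...extraArgs, ...args], {
      encoding: "utf8",
      // Large blocks serialize to several MB of hex; the 1 MiB default is too small.
      maxBuffer: 64 * 1024 * 1024,
    }).trim();
}

// Fetches one block: getblockhash <height>, then getblock <hash> 0.
// Verbosity 0 makes getblock return the serialized block as one hex string.
function fetchRawBlock(run: CliRunner, height: number): { hash: string; raw: Buffer } {
  const hash = run(["getblockhash", String(height)]);
  const hex = run(["getblock", hash, "0"]);
  return { hash, raw: Buffer.from(hex, "hex") };
}
```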

By default it resolves bitcoin-cli from:

  1. --bitcoin-cli <path>
  2. BITCOIN_CLI
  3. plain bitcoin-cli on PATH

Extra Bitcoin Core CLI flags can be passed with repeated --bitcoin-cli-arg <arg> values such as -datadir=/path/to/bitcoin.
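The resolution order and repeated flags above can be sketched as follows. This is a hypothetical parser, not the scraper's own. It shows why the --bitcoin-cli-arg=<arg> form is convenient: the value itself starts with "-", and the "=" keeps it attached to its flag.

```typescript
// Hypothetical option shape; the real CLI's internals are not documented here.
interface CliOptions {
  bitcoinCli?: string;
  bitcoinCliArgs: string[];
}

function takeValue(argv: string[], i: number, flag: string): string {
  const value = argv[i];
  if (value === undefined) throw new Error(`${flag} requires a value`);
  return value;
}

// Collects --bitcoin-cli and every repeated --bitcoin-cli-arg, in either
// "--flag value" or "--flag=value" form. Other arguments are ignored here.
function parseCliOptions(argv: string[]): CliOptions {
  const opts: CliOptions = { bitcoinCliArgs: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--bitcoin-cli") opts.bitcoinCli = takeValue(argv, ++i, arg);
    else if (arg.startsWith("--bitcoin-cli=")) opts.bitcoinCli = arg.slice("--bitcoin-cli=".length);
    else if (arg === "--bitcoin-cli-arg") opts.bitcoinCliArgs.push(takeValue(argv, ++i, arg));
    else if (arg.startsWith("--bitcoin-cli-arg=")) opts.bitcoinCliArgs.push(arg.slice("--bitcoin-cli-arg=".length));
  }
  return opts;
}

// Documented precedence: --bitcoin-cli, then BITCOIN_CLI, then "bitcoin-cli",
// which child_process resolves through PATH.
function resolveBitcoinCli(flag: string | undefined, env: Record<string, string | undefined>): string {
  if (flag) return flag;
  if (env.BITCOIN_CLI) return env.BITCOIN_CLI;
  return "bitcoin-cli";
}
```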

Output

The scraper writes:

  • getblock-910000-latest.dat
  • getblock-910000-latest.json

The .dat file is a blkfile-style binary container suitable for bitcoind -loadblock=..., with repeated records of:

  • 4-byte mainnet magic
  • 4-byte little-endian raw block length
  • raw block bytes
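A sketch of that record framing, assuming the standard Bitcoin mainnet magic bytes f9 be b4 d9 as they appear on disk in Bitcoin Core's blk*.dat files. The function names are illustrative, and the decoder reports the offset of each record's magic. Whether the manifest's offsets point at the magic or at the block bytes is not stated here.

```typescript
import { Buffer } from "node:buffer";

// Bitcoin mainnet network magic, in on-disk byte order.
const MAINNET_MAGIC = Buffer.from([0xf9, 0xbe, 0xb4, 0xd9]);

// Frames one raw block: 4-byte magic, 4-byte little-endian length, block bytes.
function encodeRecord(rawBlock: Buffer): Buffer {
  const header = Buffer.alloc(8);
  MAINNET_MAGIC.copy(header, 0);
  header.writeUInt32LE(rawBlock.length, 4);
  return Buffer.concat([header, rawBlock]);
}

interface DecodedRecord {
  recordOffset: number; // byte offset of the record's magic within the file
  block: Buffer;        // raw block bytes (a view into the input, not a copy)
}

// Walks a container back into records, failing on any framing error.
function decodeRecords(data: Buffer): DecodedRecord[] {
  const out: DecodedRecord[] = [];
  let pos = 0;
  while (pos < data.length) {
    if (pos + 8 > data.length) throw new Error(`truncated record header at offset ${pos}`);
    if (!data.subarray(pos, pos + 4).equals(MAINNET_MAGIC)) throw new Error(`bad magic at offset ${pos}`);
    const length = data.readUInt32LE(pos + 4);
    const start = pos + 8;
    if (start + length > data.length) throw new Error(`truncated block at offset ${pos}`);
    out.push({ recordOffset: pos, block: data.subarray(start, start + length) });
    pos = start + length;
  }
  return out;
}
```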

The manifest records:

  • format version
  • chain and base snapshot height
  • first and last included block heights
  • block count
  • artifact size and whole-file SHA-256
  • chunk size and per-chunk SHA-256 list
  • per-block offsets, lengths, hashes, and previous-block hashes
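One way a consumer might check those fields is sketched below. Only endHeight is named elsewhere in this README; every other field name (formatVersion, chunkSha256, previousHash, and so on) is a hypothetical stand-in for the real manifest keys.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape; field names other than endHeight are assumptions.
interface BlockEntry {
  height: number;
  offset: number;
  length: number;
  hash: string;
  previousHash: string;
}
interface Manifest {
  formatVersion: number;
  chain: string;
  baseHeight: number;   // AssumeUTXO snapshot height, 910000
  startHeight: number;  // first included block
  endHeight: number;    // last included block
  blockCount: number;
  artifactSize: number;
  artifactSha256: string;
  chunkSize: number;
  chunkSha256: string[];
  blocks: BlockEntry[];
}

function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hashes fixed-size chunks of the artifact; the final chunk may be shorter.
function chunkHashes(artifact: Buffer, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) throw new RangeError("chunkSize must be a positive integer");
  const out: string[] = [];
  for (let off = 0; off < artifact.length; off += chunkSize) {
    out.push(sha256Hex(artifact.subarray(off, off + chunkSize)));
  }
  return out;
}

// Compares local bytes against the size, whole-file digest and chunk digests.
function verifyArtifact(
  artifact: Buffer,
  m: Pick<Manifest, "artifactSize" | "artifactSha256" | "chunkSize" | "chunkSha256">,
): boolean {
  if (artifact.length !== m.artifactSize) return false;
  if (sha256Hex(artifact) !== m.artifactSha256) return false;
  const chunks = chunkHashes(artifact, m.chunkSize);
  return chunks.length === m.chunkSha256.length && chunks.every((h, i) => h === m.chunkSha256[i]);
}

// Checks that consecutive entries have consecutive heights and link by previous-block hash.
function chainIsLinked(blocks: BlockEntry[]): boolean {
  for (let i = 1; i < blocks.length; i++) {
    const prev = blocks[i - 1];
    const cur = blocks[i];
    if (cur.height !== prev.height + 1 || cur.previousHash !== prev.hash) return false;
  }
  return true;
}
```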

Usage

Build the repo:

npm install
npm run build

Capture or extend the archive:

node dist/cli.js capture --to-height 945188
node dist/cli.js capture --to-height 945188 --reset

Point at a specific Bitcoin Core datadir if needed:

node dist/cli.js capture --to-height 945188 --bitcoin-cli-arg=-datadir=/path/to/bitcoin

Capture Behavior

By default, capture performs a safe append:

  • existing getblock-910000-latest.dat and getblock-910000-latest.json are validated first
  • append resumes from manifest.endHeight + 1
  • if local state is inconsistent, the scraper fails instead of guessing
  • --reset rebuilds the artifact pair from 910001
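A minimal sketch of how the starting height could be chosen under these rules. The LocalState shape and the "missing pair starts fresh" behavior are assumptions. The shorter-than-manifest check is one example of the inconsistencies that should fail rather than be guessed at.

```typescript
const BASE_HEIGHT = 910000; // AssumeUTXO base height; the archive starts at BASE_HEIGHT + 1

// Hypothetical summary of the validated local pair.
interface LocalState {
  endHeight: number;     // manifest.endHeight
  artifactSize: number;  // .dat size the manifest last committed
  datFileSize: number;   // actual .dat size on disk
}

function resumeHeight(local: LocalState | null, reset: boolean): number {
  // Assumption: with no existing pair, capture starts fresh as --reset would.
  if (reset || local === null) return BASE_HEIGHT + 1;
  if (local.endHeight < BASE_HEIGHT) throw new Error("manifest endHeight is below the base height");
  // A longer .dat can be trimmed back to the manifest; a shorter one cannot be repaired.
  if (local.datFileSize < local.artifactSize) throw new Error("dat file shorter than manifest; refusing to guess");
  return local.endHeight + 1;
}
```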

The scraper does not wait for future blocks in one long-running invocation. If --to-height is ahead of the node's current tip, it stops cleanly at the highest block available at the moment it attempts the next height and leaves the local artifact pair committed at the last completed block.
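The stop-at-tip behavior can be sketched as a loop that re-checks the tip before each height and exits instead of waiting. The CaptureHooks interface is hypothetical. Using bitcoin-cli getblockcount for the tip is an assumption, since the scraper might instead detect the end by a failing getblockhash call.

```typescript
// Hypothetical hooks; the scraper's real internals are not documented here.
interface CaptureHooks {
  tipHeight(): number;                  // e.g. parsed from `bitcoin-cli getblockcount`
  appendBlock(height: number): string;  // fetch, append, fsync and commit one block; returns its hash
  log(line: string): void;
}

// Captures heights start..toHeight, stopping cleanly at the current tip.
// Returns the last committed height (start - 1 if nothing was added).
function captureUpTo(start: number, toHeight: number, hooks: CaptureHooks): number {
  let last = start - 1;
  for (let h = start; h <= toHeight; h++) {
    // The tip is checked at the moment each height is attempted, not once up front.
    if (h > hooks.tipHeight()) break;
    const hash = hooks.appendBlock(h);
    hooks.log(`Added block ${h}: ${hash}`);
    last = h;
  }
  return last;
}
```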

Each committed block is printed as it is added:

Added block 945189: <blockhash>
Added block 945190: <blockhash>

Durability

The scraper is designed so interruptions do not leave the public artifact pair in an unusable state:

  • each appended block is written to the .dat file
  • the file is synced before the manifest is advanced
  • the manifest is rewritten atomically after each committed block
  • if a previous run died after extending the .dat file but before advancing the manifest, the next run trims the uncommitted tail back to the last manifest-backed height before resuming
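The ordering above (append, fsync, then atomically replace the manifest, with tail trimming on restart) can be sketched with Node's synchronous fs calls. The function names and the ".tmp" suffix are illustrative assumptions, not the scraper's own code. Full crash durability of the rename also needs the containing directory to be fsynced, which this sketch omits because directory fsync is not portable to Windows.

```typescript
import { closeSync, fsyncSync, openSync, renameSync, statSync, truncateSync, writeFileSync, writeSync } from "node:fs";

// Appends one encoded record and fsyncs before returning, so the manifest
// is never advanced past bytes that are not yet durable.
function appendDurably(datPath: string, record: Buffer): void {
  const fd = openSync(datPath, "a");
  try {
    let written = 0;
    while (written < record.length) {
      written += writeSync(fd, record, written, record.length - written);
    }
    fsyncSync(fd);
  } finally {
    closeSync(fd);
  }
}

// Writes the manifest to a temporary file, fsyncs it, then renames it over
// the old manifest. On POSIX filesystems the rename is atomic.
function writeManifestAtomically(manifestPath: string, manifest: object): void {
  const tmp = `${manifestPath}.tmp`;
  const fd = openSync(tmp, "w");
  try {
    writeFileSync(fd, JSON.stringify(manifest, null, 2) + "\n");
    fsyncSync(fd);
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, manifestPath);
}

// On restart: cut any bytes appended after the last committed manifest.
// Returns true if a tail was trimmed.
function trimUncommittedTail(datPath: string, committedSize: number): boolean {
  const size = statSync(datPath).size;
  if (size < committedSize) throw new Error("dat file shorter than manifest; refusing to guess");
  if (size === committedSize) return false;
  truncateSync(datPath, committedSize);
  return true;
}
```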

This makes the local artifact pair safe against:

  • process interruption
  • laptop sleep
  • network loss while scraping from bitcoin-cli
  • partial append failures

Cache Busting

The published artifact family uses height-only cache busting:

  • getblock-910000-latest.json?end=<endHeight>
  • getblock-910000-latest.dat?end=<endHeight>

The canonical cache-bust value is the manifest's endHeight.
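Building such a URL is a one-liner with the WHATWG URL API. The helper name is hypothetical. How a client first learns the current endHeight is not covered by this README.

```typescript
// Appends (or replaces) the end=<endHeight> cache-bust parameter on an artifact URL.
function cacheBustedUrl(artifactUrl: string, endHeight: number): string {
  const url = new URL(artifactUrl);
  url.searchParams.set("end", String(endHeight));
  return url.toString();
}
```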

Repository Scope

This repo is intentionally narrow:

  • scraper code only
  • no hosted secrets
  • no private node credentials
  • no generated public archive artifacts checked into source control

Local generated outputs are ignored by .gitignore.

About

Bitcoin getBlock scraper to speed up initial sync