Summary
Connection.IsNodeArchive() (src/apps/chifra/pkg/rpc/is_node.go) is misnamed and doesn't reliably answer "is this an archive node." It performs a consistency check between chifra's bundled <chain>/allocs.csv and the RPC's eth_getBalance(largestPrefund, blockNumber=0). The function name implies an archive-mode probe; the implementation tests something different. This has practical consequences for chifra scrape and chifra state, both of which call it as a gate.
Current implementation (v5.9.3, master)
// src/apps/chifra/pkg/rpc/is_node.go
func (conn *Connection) IsNodeArchive() bool {
	thePath := filepath.Join(config.MustGetPathToChainConfig(conn.Chain), "allocs.csv")
	if !file.FileExists(thePath) {
		logger.Warn("No pre-allocation file found at", thePath, "assuming an archive node")
		return true
	}
	largest, err := prefunds.GetLargestPrefund(conn.Chain, thePath)
	if err != nil {
		return false
	}
	bal, err := conn.GetBalanceAt(largest.Address, 0)
	if err != nil {
		return false
	}
	return bal.Cmp(&largest.Balance) == 0
}
Why this is wrong
A standard archive-mode check asks: "can this node still serve historical state?" The conventional probe is eth_getBalance(addr, oldBlock) (or eth_call, eth_getProof, etc.) and looking for a "missing trie node" / "state not available" / -32000 error from a pruned node. Geth, Reth, Erigon all surface the pruned state this way.
What chifra actually checks instead — does the value in my bundled CSV match what the node says block 0's largest prefund balance is — is a different question with several failure modes:
- Returns false for an actual archive node when chifra has no CSV for the chain. The "file doesn't exist" early-return masks this in the simple case, but LoadPrefunds (called from chifra scrape's prepare step, chifra names handle_clean, pkg/names/names.go, and pkg/rpc/get_transaction.go) opens the file with os.O_RDWR|os.O_CREATE. So as soon as any of those paths run, chifra auto-creates an empty allocs.csv. On the next IsNodeArchive call the file exists, gocsv.UnmarshalToCallback errors on the empty input, and the function returns false, even though nothing about the node's actual archive-ness changed.
- Returns false for an archive node serving a chain whose genesis does not match chifra's bundled CSV. Shadowforks, custom devnets, or any chain whose canonical genesis allocations differ from chifra's bundled values will mismatch and be declared non-archive even if they're full archives.
- Returns true for a pruned node whenever that node still serves block 0 state. Most pruned configurations keep genesis state available, so the check can return true for nodes that are emphatically not archive (they'd return "missing trie node" on a balance lookup at block 10_000_000 but happily serve block 0).
- Side-effect via Address.Hex() short-circuit. Because Address.Hex() returns "0x0" for the zero address (src/apps/chifra/pkg/base/address.go), if the largest-balance row in allocs.csv happens to be the zero address, IsValidAddress(record.Address.Hex()) rejects it (length check fails on "0x0" vs the required 42-char 0x…), the row gets filtered, LoadPrefunds returns no allocations, and IsNodeArchive returns false. That filtering happens in the LoadPrefunds callback in pkg/prefunds/prefunds.go.
Concrete repro on a known-archive Erigon
# 1. Run Erigon in archive mode (default behavior), confirm it's archive:
curl -X POST -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x0000000000000000000000000000000000000001","0xa0000"],"id":1}' \
http://erigon:8545
# returns a value (no "missing trie node" error)
# 2. Point chifra at it for a chain without a bundled allocs.csv (e.g., any
# fresh devnet, or any chain you've added via [chains.<name>] without
# shipping per-chain config files):
chifra status
# isArchive: true (file-doesn't-exist branch)
chifra scrape # touches LoadPrefunds, auto-creates empty allocs.csv
chifra status
# isArchive: false (empty file now exists, gocsv parse fails)
Same node, no node-side change, just chifra's own side effect on its config dir flipping the answer.
Proposed fix
Two options, in order of cleanliness:
Option A (recommended): replace IsNodeArchive with a real archive probe.
Call eth_getBalance(<deterministic non-zero address>, <recent-ish block>) and inspect the error. The conventional non-zero address is one of the precompiles (e.g. 0x0000…0001). The block can be either 1 (always exists once the chain has produced at least one block) or, for chains where the user might be talking to a snapshot-syncing node, a block ~256 blocks before head (comfortably past the 128 most recent block states that a default geth full node retains). Real archives return a value; pruned nodes return -32000 missing trie node or equivalent.
This eliminates the entire allocs.csv consistency-check failure mode and stops needing chifra to ship per-chain genesis-allocation CSVs purely to answer "is this archive."
Option B: keep the CSV-consistency check but rename it and stop using it as the archive gate.
The check is still potentially useful as a "the connected node is on the chain I think it is" sanity check at chifra startup — call it IsExpectedChain() or MatchesGenesis(), and let chifra scrape / chifra state gate on IsNodeArchive() (the real one from A).
Affected commands
chifra scrape, chifra state, plus any code path that calls Connection.IsNodeArchive() directly. Status output (chifra status → /status API) also exposes the wrong isArchive value back to callers.
Workaround we're using meanwhile
For context: I'm integrating chifra into ethpandaops/ethereum-package (kurtosis devnets). To get past the gate, our entrypoint queries the RPC for the first prefunded account's balance at block 0 and writes a self-consistent <chain>/allocs.csv at runtime so chifra's check evaluates trivially-true. This is self-consistent but obviously a workaround for the check's behavior, not a real archive-mode confirmation. Happy to drop it the moment IsNodeArchive is a real probe.
(I tried the [requires] archive = false config. At least in v5.9.3, chifra scrape still calls IsNodeArchive() directly and the [requires] setting doesn't bypass it. If that setting is only honored on master, let me know and I'll re-test.)