Rates Engine v0.5.0-rc.75
Pre-release
[v0.5.0-rc.75] — 2026-05-24
Changed
verify-archive-tier-a.service switched to Type=notify +
WatchdogSec=1h (#62). Replaces the Type=oneshot + fixed
TimeoutStartSec pair. The binary signals READY=1 at start and
WATCHDOG=1 every 30 s for the rest of its life; systemd SIGTERMs
only on real silence (binary hung / crashed / deadlocked), not on
a guessed wall-clock cap. The walk can take 25h+ on single-chunk
serial without anyone re-tuning a TimeoutStartSec; as long as
it's making progress, the watchdog stays satisfied. The previous
17h → 36h TimeoutStartSec raise from earlier in this Unreleased
block is superseded by this change. sd_notify is a no-op when
$NOTIFY_SOCKET isn't set, so manual ratesengine-ops verify-archive
invocations from a shell are unaffected. Adds
github.com/coreos/go-systemd/v22 as a Go dependency.
Added
verify-archive per-chunk resume across restarts (#62). The
state file previously was written only on clean exit ("at the end
on success" per the doc comment), so a SIGTERM during a bootstrap
walk discarded everything verified so far, and
-from-last-verified on the next fire restarted from genesis.
Yesterday's #62 walk burned 17h05m of clean work (ledger 2 →
40.5M, every line verified) before the timeout SIGTERMed it; with
state writing only on completion, that progress couldn't be
recovered.
- The state file now carries a per-tier InProgress section with
  the original run plan (from / to / workers / chunks) plus a
  Done flag per chunk. As each chunk's walker completes
  successfully, the orchestrator marks that chunk Done and
  rewrites the state file atomically. SIGTERM at any point leaves
  a coherent record of which chunks finished.
- On the next fire, verify-archive reads the prior InProgress.
  If the run plan still matches (same from / to / workers /
  chunk-count), only the chunks that aren't Done get re-issued;
  completed chunks are skipped. A plan mismatch (operator changed
  flags, tip moved) ignores the prior state cleanly and starts
  fresh, with a log line naming the difference.
- Worst case: a SIGTERM mid-walk loses one chunk's in-flight work
  (~1/12 of the bootstrap, not the whole thing). Mid-chunk
  resumption would require persisting a per-chunk-resume-from-hash
  anchor; deferred until the 1/12-loss proves too costly.
- The end-of-walk updateTierState unconditionally clears
  InProgress now, including on no-advance success (every chunk
  was already Done from the prior run). Legacy state files
  without the in_progress key parse cleanly as nil.