Rates Engine v0.5.0-rc.75

Pre-release

github-actions released this 24 May 12:34

c5da280

[v0.5.0-rc.75] — 2026-05-24

Changed

verify-archive-tier-a.service switched to Type=notify +
WatchdogSec=1h (#62). Replaces the Type=oneshot + fixed
TimeoutStartSec pair. The binary signals READY=1 at start and
WATCHDOG=1 every 30 s for the rest of its life; systemd SIGTERMs
only on real silence (binary hung, crashed, or deadlocked), not on
a guessed wall-clock cap. The walk can take 25h+ on single-chunk
serial without anyone re-tuning a TimeoutStartSec; as long as
it's making progress, the watchdog stays satisfied. The previous
17h → 36h TimeoutStartSec raise from earlier in this Unreleased
block is superseded by this change. sd_notify is a no-op when
$NOTIFY_SOCKET isn't set, so manual ratesengine-ops verify-archive
invocations from a shell are unaffected. Adds
github.com/coreos/go-systemd/v22 as a Go dependency.
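
The READY=1 / WATCHDOG=1 handshake described above can be sketched as follows. This is a minimal illustration, not the project's code: the release uses github.com/coreos/go-systemd/v22, whereas this sketch speaks the sd_notify datagram protocol directly with the standard library so it stays dependency-free. The helper names (notifyTo, notify, heartbeat) are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// notifyTo sends one sd_notify datagram (for example "READY=1" or
// "WATCHDOG=1") to the unix datagram socket at path. An empty path is
// treated as "not running under systemd": nothing is sent and the call
// reports (false, nil). Hypothetical helper, for illustration only.
func notifyTo(path, state string) (bool, error) {
	if path == "" {
		return false, nil
	}
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// notify reads the socket path from $NOTIFY_SOCKET, so a manual shell
// invocation (variable unset) takes the no-op path.
func notify(state string) (bool, error) {
	return notifyTo(os.Getenv("NOTIFY_SOCKET"), state)
}

// heartbeat sends WATCHDOG=1 every interval until stop is closed.
func heartbeat(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			notify("WATCHDOG=1") // errors ignored in this sketch
		case <-stop:
			return
		}
	}
}

func main() {
	sent, err := notify("READY=1")
	fmt.Println("READY sent:", sent, "err:", err)

	stop := make(chan struct{})
	go heartbeat(30*time.Second, stop)
	// ... the long-running walk would run here ...
	close(stop)
}
```

A 30 s heartbeat against WatchdogSec=1h leaves a wide margin, so a few dropped pings do not trip the watchdog; only sustained silence does.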

Added

verify-archive per-chunk resume across restarts (#62). The
state file was previously written only on clean exit ("at the end
on success" per the doc comment), so a SIGTERM during a bootstrap
walk discarded everything verified so far, and
-from-last-verified on the next fire restarted from genesis.
Yesterday's #62 walk burned 17h05m of clean work (ledger 2 →
40.5M, every line verified) before the timeout SIGTERMed it; with
state writing only on completion, that progress couldn't be
recovered.
- The state file now carries a per-tier InProgress section with
  the original run plan (from / to / workers / chunks) plus a
  Done flag per chunk. As each chunk's walker completes
  successfully, the orchestrator marks that chunk Done and
  rewrites the state file atomically. SIGTERM at any point leaves
  a coherent record of which chunks finished.
- On the next fire, verify-archive reads the prior InProgress.
  If the run plan still matches (same from / to / workers /
  chunk-count), only the chunks that aren't Done get re-issued;
  completed chunks are skipped. On a plan mismatch (operator
  changed flags, tip moved), the prior state is ignored cleanly
  and the run starts fresh, with a log line naming the difference.
- Worst case: a SIGTERM mid-walk loses one chunk's in-flight work
  (~1/12 of the bootstrap, not the whole thing). Mid-chunk
  resumption would require persisting a per-chunk
  -resume-from-hash anchor; that is deferred until the 1/12 loss
  proves too costly.
- The end-of-walk updateTierState now clears InProgress
  unconditionally, including on no-advance success (every chunk
  was already Done from the prior run). Legacy state files
  without the in_progress key parse cleanly, with InProgress
  left as nil.
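
The resume rules in the list above can be sketched roughly as below. Only the in_progress key, the InProgress name, and the from / to / workers / chunks plan fields come from these notes; the struct layout, the other JSON keys, and the helper names (pendingChunks, writeAtomic) are assumptions made for illustration, not the project's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Plan is the run plan recorded with the in-progress state (assumed layout).
type Plan struct {
	From    uint32 `json:"from"`
	To      uint32 `json:"to"`
	Workers int    `json:"workers"`
	Chunks  int    `json:"chunks"`
}

// InProgress holds the original plan plus one Done flag per chunk.
type InProgress struct {
	Plan Plan   `json:"plan"`
	Done []bool `json:"done"`
}

// TierState is one tier's entry. A legacy file with no "in_progress" key
// unmarshals with InProgress == nil.
type TierState struct {
	LastVerified uint32      `json:"last_verified"`
	InProgress   *InProgress `json:"in_progress,omitempty"`
}

// pendingChunks returns the chunk indexes that still need to run. Prior
// progress is honoured only when the recorded plan matches exactly;
// otherwise every chunk is re-issued.
func pendingChunks(prior *InProgress, plan Plan) []int {
	resume := prior != nil && prior.Plan == plan && len(prior.Done) == plan.Chunks
	var pending []int
	for i := 0; i < plan.Chunks; i++ {
		if resume && prior.Done[i] {
			continue // finished in the earlier run
		}
		pending = append(pending, i)
	}
	return pending
}

// writeAtomic writes the state to a temp file in the same directory,
// syncs it, then renames it over the target, so an interrupted write
// leaves either the old file or the new one, never a partial file.
func writeAtomic(path string, st TierState) error {
	data, err := json.MarshalIndent(st, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	plan := Plan{From: 2, To: 1200, Workers: 4, Chunks: 12}
	st := TierState{InProgress: &InProgress{Plan: plan, Done: make([]bool, plan.Chunks)}}
	st.InProgress.Done[0] = true // mark a chunk as its walker completes

	path := filepath.Join(os.TempDir(), "verify-state-example.json")
	if err := writeAtomic(path, st); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	defer os.Remove(path)

	fmt.Println("pending after resume:", pendingChunks(st.InProgress, plan))
}
```

Comparing the whole plan by value keeps the mismatch rule simple: any change to from, to, workers, or chunk count invalidates the recorded Done flags rather than risking a skipped range.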
