Rates Engine v0.5.0-rc.75

Pre-release

@github-actions github-actions released this 24 May 12:34
· 600 commits to main since this release

[v0.5.0-rc.75] — 2026-05-24

Changed

  • verify-archive-tier-a.service switched to Type=notify +
    WatchdogSec=1h (#62).
    Replaces the Type=oneshot + fixed
    TimeoutStartSec pair. The binary signals READY=1 at start and
    WATCHDOG=1 every 30 s for the rest of its life; systemd SIGTERMs
    only on real silence (binary hung / crashed / dead-locked), not on
    a guessed wall-clock cap. The walk can take 25h+ on single-chunk
    serial without anyone re-tuning a TimeoutStartSec; as long as
    it's making progress, the watchdog stays satisfied. The previous
    17h → 36h TimeoutStartSec raise from earlier in this Unreleased
    block is superseded by this change. sd_notify is a no-op when
    $NOTIFY_SOCKET isn't set, so manual ratesengine-ops
    verify-archive invocations from a shell are unaffected. Adds
    github.com/coreos/go-systemd/v22 as a Go dependency.

Added

  • verify-archive per-chunk resume across restarts (#62). The
    state file previously was written only on clean exit ("at the end
    on success" per the doc comment) — a SIGTERM during a bootstrap
    walk discarded everything verified so far, so
    -from-last-verified on the next fire restarted from genesis.
    Yesterday's #62 walk burned 17h05m of clean work (ledger 2 →
    40.5M, every line verified) before the timeout SIGTERMed it; with
    state writing only on completion, that progress couldn't be
    recovered.
    • The state file now carries a per-tier InProgress section with
      the original run plan (from / to / workers / chunks) plus a
      Done flag per chunk. As each chunk's walker completes
      successfully, the orchestrator marks that chunk Done and
      rewrites the state file atomically. SIGTERM at any point leaves
      a coherent record of which chunks finished.
    • On the next fire, verify-archive reads the prior InProgress.
      If the run plan still matches (same from / to / workers /
      chunk-count), only the chunks that aren't Done get re-issued —
      completed chunks are skipped. A plan mismatch (operator changed
      flags, tip moved) ignores the prior state cleanly and starts
      fresh, with a log line naming the difference.
    • Worst case: a SIGTERM mid-walk loses one chunk's in-flight work
      (~1/12 of the bootstrap, not the whole thing). Mid-chunk
      resumption would require persisting a per-chunk
      -resume-from-hash anchor; that is deferred until the 1/12 loss
      proves too costly.
    • The end-of-walk updateTierState unconditionally clears
      InProgress now, including on no-advance success (every chunk
      was already Done from the prior run). Legacy state files
      without the in_progress key parse cleanly as nil.