Skip to content

feat: crash-safe state persistence via periodic flush#303

Merged
BYK merged 1 commit into
mainfrom
byk/crash-safe-state-persistence
May 14, 2026
Merged

feat: crash-safe state persistence via periodic flush#303
BYK merged 1 commit into
mainfrom
byk/crash-safe-state-persistence

Conversation

@BYK
Copy link
Copy Markdown
Owner

@BYK BYK commented May 14, 2026

Summary

Closes #299

Persist remaining volatile session state across process restarts without relying on shutdown handlers (which SIGKILL/crashes bypass).

  • Immediate persistence for session identity (fingerprint, header promotion) — rare mutations
  • 30s periodic flush (dirty-flag gated) for gradient calibration, cache warming state, and cost tracker snapshot — max 30s of data loss

Changes

  • DB migration v24: 12 new columns on session_state — session identity (3), cache warming (2), gradient calibration (7)
  • gradient.ts: New saveGradientState() + atomic restore of gradient fields from DB on first getSessionState() access (uses lastTurnAt > 0 as "ever flushed" proxy)
  • pipeline.ts: Restore session identity + cache warming from DB on session creation; persist fingerprint/headers immediately on mutation; dirty-flag marking on each turn
  • idle.ts: 30s tick flushes gradient + cache warming + cost state for dirty sessions only (skips unchanged sessions)
  • types.ts: _dirty flag on SessionState for periodic flush gating
  • Tests: Schema version bump to 24, 5 new round-trip tests for v24 fields

@BYK BYK force-pushed the byk/crash-safe-state-persistence branch from 4d811d2 to 9937c02 Compare May 14, 2026 10:11
Persist remaining volatile session state across process restarts
without relying on shutdown handlers (which SIGKILL/crashes bypass).

Three-tier persistence strategy:
- Immediate: session identity (fingerprint, header promotion) — rare
- 30s periodic (dirty-flag gated): gradient calibration, cache warming
  state, and cost tracker snapshot via idle tick

DB migration v24 adds 12 columns to session_state for identity (3),
cache warming (2), and gradient calibration (7) fields.

Review fixes:
- Atomic gradient restore (lastTurnAt > 0 proxy, not per-field sentinels)
- Dirty flag on SessionState — only flush changed sessions on 30s tick
- Moved cost persistence from per-turn to 30s tick (reduces write amplification)
- Validate TTL from DB, log.warn on corrupt warmup JSON
@BYK BYK force-pushed the byk/crash-safe-state-persistence branch from 9937c02 to b77e89a Compare May 14, 2026 10:21
@BYK BYK merged commit 21cff91 into main May 14, 2026
7 checks passed
@BYK BYK deleted the byk/crash-safe-state-persistence branch May 14, 2026 10:25
BYK added a commit that referenced this pull request May 14, 2026
## Summary

Follow-up to #303. Fixes a pre-existing bug where `headerSessionIndex`
was empty after process restart, causing Tier 1 session identification
to fail and orphan persisted session state.

- Load header→session mappings from `session_state` table in
`initIfNeeded()` (before the first `identifySession()` call)
- Clear `headerSessionIndex` in `resetPipelineState()` for test
isolation
- New `loadHeaderSessionIndex()` DB function + tests
@craft-deployer craft-deployer Bot mentioned this pull request May 14, 2026
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persist remaining volatile session state across gateway restarts

1 participant