Description
A background watchdog thread monitors engine health and automatically recovers from stalls, deadlocks, or hung operations.
Implementation
- Watchdog checks every 5s:
- Compaction making progress? (check L0 count trend)
- WAL fsyncs completing? (check latency metric)
- Request queue draining? (check in-flight count)
- Last successful mutation timestamp
- On stall detection:
- LOG: dump current thread stacks (SIGQUIT on Unix)
- ACTION: abort hung operations
- ESCALATE: if no recovery after 3 attempts, enter READ_ONLY mode
- Expose watchdog state in
/health
Configuration
resilience:
watchdog:
interval_secs: 5
stall_threshold_secs: 30
max_escalations: 3
Labels
Description
A background watchdog thread monitors engine health and automatically recovers from stalls, deadlocks, or hung operations.
Implementation
/healthConfiguration
Labels