Description
A single panic in a compaction thread or request handler currently kills the entire server. ApexStore should recover from panics gracefully.
Implementation
- Wrap all thread entry points with std::panic::catch_unwind
- On panic:
  - Log full panic payload and backtrace
  - Increment panic counter metric
  - Restart the thread after a delay
- If panic rate exceeds threshold (5/min), enter DEGRADED mode
- Expose panic count in /metrics
- Add /admin/panic-info endpoint returning recent panic details
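The rate threshold in the list above could be tracked with a sliding window of panic timestamps. The following is a minimal sketch, not ApexStore code: `PanicRateTracker`, `Mode`, and their methods are hypothetical names, and it assumes "exceeds 5/min" means more than 5 panics within any 60-second window. The issue does not say how the server leaves DEGRADED mode, so this sketch simply reports the mode implied by the current window.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical health states; only DEGRADED is named in the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Normal,
    Degraded,
}

/// Sliding-window panic counter (hypothetical helper).
pub struct PanicRateTracker {
    window: Duration,
    threshold: usize,
    recent: VecDeque<Instant>,
    total: u64,
}

impl PanicRateTracker {
    pub fn new(threshold: usize, window: Duration) -> Self {
        Self { window, threshold, recent: VecDeque::new(), total: 0 }
    }

    /// Record a panic observed at `now` and return the mode implied by
    /// the number of panics still inside the window.
    pub fn record(&mut self, now: Instant) -> Mode {
        self.total += 1;
        self.recent.push_back(now);
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() > self.threshold {
            Mode::Degraded
        } else {
            Mode::Normal
        }
    }

    /// Lifetime panic count, e.g. the value exposed in /metrics.
    pub fn total(&self) -> u64 {
        self.total
    }
}
```

With `PanicRateTracker::new(5, Duration::from_secs(60))`, the sixth panic inside one minute is the first to report `Mode::Degraded`. Passing `now` explicitly keeps the tracker testable without sleeping.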
Code pattern
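let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
    // compaction logic
}));
if let Err(panic_payload) = result {
    // The payload is a Box<dyn Any + Send>; `{:?}` on it does not print the
    // message, so downcast to the two common payload types instead.
    let msg = panic_payload
        .downcast_ref::<&str>()
        .map(|s| s.to_string())
        .or_else(|| panic_payload.downcast_ref::<String>().cloned())
        .unwrap_or_else(|| "non-string panic payload".to_string());
    error!("Compaction thread panicked: {}", msg);
    metrics.inc_panic_count();
    // restart thread after delay
}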
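The "restart thread after delay" step left as a comment above might look like the loop below. This is a sketch under assumptions rather than the intended implementation: `run_supervised`, `panic_message`, and the `max_restarts` cap are hypothetical, and the worker is modeled as a plain `FnMut()` closure run on the current thread. In the server, this loop would be the body of the spawned thread.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::thread;
use std::time::Duration;

/// Best-effort extraction of a readable message from a panic payload.
/// `panic!("literal")` yields a `&'static str`; `panic!("{}", x)` yields a `String`.
pub fn panic_message(payload: &(dyn Any + Send + 'static)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Runs `work` until it returns normally, restarting it after each panic.
/// Gives up once more than `max_restarts` panics have been seen.
/// Returns the messages of all panics observed.
pub fn run_supervised<F>(mut work: F, restart_delay: Duration, max_restarts: usize) -> Vec<String>
where
    F: FnMut(),
{
    let mut panics = Vec::new();
    loop {
        match panic::catch_unwind(AssertUnwindSafe(|| work())) {
            // Worker finished without panicking: stop supervising.
            Ok(()) => return panics,
            Err(payload) => {
                panics.push(panic_message(&*payload));
                if panics.len() > max_restarts {
                    return panics;
                }
                // Back off before restarting the worker.
                thread::sleep(restart_delay);
            }
        }
    }
}
```

Two caveats apply to any design built on `catch_unwind`. It only catches unwinding panics, so it has no effect if the binary is built with `panic = "abort"`. The payload also carries no backtrace; capturing one would need a panic hook installed with `std::panic::set_hook`.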