A panic in the apply or WAL writer thread bypasses the orderly fsync sequence at crates/beava-server/src/server.rs:1051-1063. That sequence is only triggered by shutdown_signal() in shutdown.rs, which listens for SIGTERM/SIGINT; a thread panic raises neither signal. Result: un-acked events in the active WAL buffer can be lost.
Fix: install a global panic hook in main.rs::main at startup that raises SIGTERM to self (via nix::sys::signal::raise(Signal::SIGTERM) or equivalent). The existing handler then runs the same drain + fsync code path used for operator SIGTERM. One shutdown code path for SIGTERM / SIGINT / panic.
Constraints (worth pinning so a contributor doesn't re-litigate):
- Hook MUST NOT call std::process::abort(). Abort skips Drop impls and the explicit WAL writer join that runs the final fsync.
- If the panicking thread IS the WAL writer, orderly fsync is impossible by definition. The hook should log loudly via tracing::error! and accept the loss; document this case.
- Chain the existing default panic hook so panic messages still hit stderr / tracing.
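A minimal sketch of a hook that satisfies the three constraints above, using only std so it stands alone. The function name install_shutdown_panic_hook, the request_shutdown callback, and the "wal-writer" thread name are hypothetical and not taken from the codebase; eprintln! stands in for tracing::error!.

```rust
use std::panic;
use std::thread;

/// Hypothetical name of the WAL writer thread; the real name is an assumption.
const WAL_WRITER_THREAD: &str = "wal-writer";

/// Installs a global panic hook that chains the previous hook and then asks
/// for an orderly shutdown. `request_shutdown` is a hypothetical callback; in
/// the server it would raise SIGTERM to self so the existing handler runs.
pub fn install_shutdown_panic_hook<F>(request_shutdown: F)
where
    F: Fn() + Send + Sync + 'static,
{
    // Keep the previous (default) hook so the panic message still reaches stderr.
    let previous_hook = panic::take_hook();

    panic::set_hook(Box::new(move |info| {
        // Chain first: message and location are reported as before.
        previous_hook(info);

        // If the WAL writer itself panicked, the final fsync cannot run.
        // Log loudly and accept the loss (tracing::error! in the real hook).
        if thread::current().name() == Some(WAL_WRITER_THREAD) {
            eprintln!("panic in WAL writer thread: final fsync cannot run, buffered events may be lost");
        }

        // Hand off to the normal shutdown path. Never call
        // std::process::abort() here: it skips Drop impls and the writer join.
        request_shutdown();
    }));
}
```

In main.rs::main the callback would presumably be a closure that calls nix::sys::signal::raise(Signal::SIGTERM) and ignores the result, installed before any server bind. Taking a callback rather than raising the signal directly keeps the hook testable without killing the test process.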
Done when:
- Hook installed in main.rs::main before any server bind.
- Integration test panics an apply thread mid-push and asserts the shutdown path runs + acked events survive recovery on restart.
- Pairs naturally with the fail-parallel WAL coverage issue's wal::write::pre_fsync=panic fail-point.
~100 LOC + tests. Cohort Track-1 sized.