feat(server): panic hook routes thread panics through the SIGTERM shutdown path #49

@petrpan26

A panic in the apply or WAL writer thread bypasses the orderly fsync sequence at crates/beava-server/src/server.rs:1051-1063. That sequence is only triggered by shutdown_signal() in shutdown.rs, which listens for SIGTERM/SIGINT, and a thread panic raises neither signal. Result: un-acked events in the active WAL buffer can be lost.

Fix: install a global panic hook in main.rs::main at startup that raises SIGTERM to self (via nix::sys::signal::raise(Signal::SIGTERM) or equivalent). The existing handler then runs the same drain + fsync code path used for operator SIGTERM. One shutdown code path for SIGTERM / SIGINT / panic.

Constraints (worth pinning so a contributor doesn't re-litigate):

  • Hook MUST NOT call std::process::abort(). Abort skips Drop impls and the explicit WAL writer join that runs the final fsync.
  • If the panicking thread IS the WAL writer, orderly fsync is impossible by definition. The hook should log loudly via tracing::error! and accept the loss; document this case.
  • Chain the existing default panic hook so panic messages still hit stderr / tracing.
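
The fix and constraints above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the proposed implementation: `install_shutdown_panic_hook`, the `request_shutdown` callback, and the `wal-writer` thread name are hypothetical names that do not come from the codebase. The callback stands in for raising SIGTERM to self, and `eprintln!` stands in for `tracing::error!`.

```rust
use std::panic;
use std::thread;

/// Hypothetical thread name: the issue does not say how the WAL writer
/// thread is named, so this constant is an assumption for illustration.
const WAL_WRITER_THREAD: &str = "wal-writer";

/// Installs a global panic hook that chains the previously installed hook
/// and then calls `request_shutdown`.
///
/// `request_shutdown` is a stand-in for "raise SIGTERM to self"; it is
/// injected as a closure so the sketch has no external dependencies.
pub fn install_shutdown_panic_hook<F>(request_shutdown: F)
where
    F: Fn() + Send + Sync + 'static,
{
    // Take the current hook so it can be chained rather than replaced.
    let previous = panic::take_hook();

    panic::set_hook(Box::new(move |info| {
        // Chain first, so the panic message still reaches stderr.
        previous(info);

        // The hook runs on the panicking thread, so the current thread's
        // name identifies which thread panicked.
        let wal_writer_panicked = thread::current().name() == Some(WAL_WRITER_THREAD);
        if wal_writer_panicked {
            // Orderly fsync is impossible in this case; log loudly and
            // accept the loss (a real hook would use tracing::error!).
            eprintln!("panic in WAL writer thread: final fsync cannot run");
        }

        // Deliberately no std::process::abort(): control is handed to the
        // normal shutdown path instead.
        request_shutdown();
    }));
}
```

In `main.rs::main` this would be called before any server bind, with a closure that raises SIGTERM (for example through the `nix` call named in the fix). Passing the shutdown request in as a closure is only a convenience of the sketch: it lets a test observe that the hook fired without signalling the test process.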

Done when:

  • Hook installed in main.rs::main before any server bind.
  • Integration test panics an apply thread mid-push and asserts the shutdown path runs + acked events survive recovery on restart.
  • Pairs naturally with the fail-parallel WAL coverage issue's wal::write::pre_fsync=panic fail-point.

~100 LOC + tests. Cohort Track-1 sized.

Labels: area: server (Rust server / core / runtime-core crates)
