
feat(server): drain in-flight solves on SIGTERM / Ctrl-C #4

Open
KeyCode17 wants to merge 1 commit into main from feat/server-graceful-shutdown

@KeyCode17
Owner

Summary

Wire axum::serve(...).with_graceful_shutdown(shutdown_signal()) so the server stops accepting new connections on the first of SIGINT or SIGTERM and lets the in-flight /v1/solve requests finish before the process exits.

  • SIGINT (Ctrl-C from a TTY) and SIGTERM (systemd / Kubernetes / kill) both trigger the drain.
  • SIGTERM branch is Unix-only; on non-Unix targets it becomes a pending future so SIGINT alone still wins.
  • Signal-handler install failures are logged as warnings, not panics — worst case degrades to the previous "hard kill on close" behaviour.
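The "pending future" point in the second bullet can be shown without a runtime. The sketch below is a std-only model, not the code in this PR: it polls two boxed futures by hand and reports which one is ready first. `Cause`, `Signal`, `poll_first` and `unsupported_sigterm` are hypothetical names invented for the illustration; the real handler presumably races the two signals with a select-style construct, which the description does not show.

```rust
use std::future::{self, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; sufficient for polling futures by hand in a demo.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Which branch finished first (hypothetical names for this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Cause {
    Interrupt,
    Terminate,
}

/// Boxed future standing in for "wait until this signal arrives".
pub type Signal = Pin<Box<dyn Future<Output = ()>>>;

/// Poll both branches once, interrupt first, and report the first ready one.
/// Returns `None` when neither branch is ready yet.
pub fn poll_first(interrupt: &mut Signal, terminate: &mut Signal) -> Option<Cause> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    if interrupt.as_mut().poll(&mut cx).is_ready() {
        return Some(Cause::Interrupt);
    }
    if terminate.as_mut().poll(&mut cx).is_ready() {
        return Some(Cause::Terminate);
    }
    None
}

/// Stand-in for the non-Unix SIGTERM branch: a future that never completes.
pub fn unsupported_sigterm() -> Signal {
    Box::pin(future::pending::<()>())
}
```

Because `std::future::pending()` never becomes ready, racing it against the SIGINT branch behaves the same as waiting on SIGINT alone, which is the behaviour the bullet describes for non-Unix targets.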

Why

Under the previous axum::serve(...).await form, sending SIGTERM (the systemd default KillSignal) aborted the runtime and dropped any in-flight /v1/solve requests. That can leave the Chromium / Camoufox harvester pool in an inconsistent state (open browsers, zombie processes), which is exactly what MVP-AC-4 prohibits. A clean drain is also a prerequisite for the "weekly auto-soak" follow-up from ADR-0022, since it lets the soak script restart the server between runs without ungraceful kills.
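To make "drain" concrete, here is a minimal std-only model of the semantics described above: accept work until a shutdown flag is set, then wait for everything already started. It is an illustration under stated assumptions, not how axum implements graceful shutdown; `serve_until_shutdown` is a hypothetical name, and threads that sleep stand in for /v1/solve requests.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the server loop. Each message on `incoming` is a
/// job duration in milliseconds. Returns `(accepted, completed)`.
pub fn serve_until_shutdown(
    incoming: mpsc::Receiver<u64>,
    shutdown: Arc<AtomicBool>,
) -> (usize, usize) {
    let completed = Arc::new(AtomicUsize::new(0));
    let mut in_flight = Vec::new();

    // Accept phase: take new jobs only while no shutdown was requested.
    while !shutdown.load(Ordering::SeqCst) {
        match incoming.recv_timeout(Duration::from_millis(5)) {
            Ok(work_ms) => {
                let completed = Arc::clone(&completed);
                in_flight.push(thread::spawn(move || {
                    // Simulated solve.
                    thread::sleep(Duration::from_millis(work_ms));
                    completed.fetch_add(1, Ordering::SeqCst);
                }));
            }
            Err(mpsc::RecvTimeoutError::Timeout) => continue,
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
    }

    // Drain phase: no new jobs are accepted, but every started job finishes.
    let accepted = in_flight.len();
    for handle in in_flight {
        handle.join().expect("solve thread panicked");
    }
    (accepted, completed.load(Ordering::SeqCst))
}
```

In this model the old behaviour corresponds to returning right after the accept loop without joining the handles; the drain phase is what guarantees `completed == accepted` on exit.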

Test plan

  • cargo build -p px-server
  • cargo test -p px-server --lib — all 6 unit tests pass
  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features against the lefthook rule set
  • Lefthook pre-commit hooks pass locally (fmt, clippy, loc≤200, forbidden-patterns)
  • px-server/src/main.rs stays under the 200-LOC axum-best-practice rule (162 LOC)

Notes for the reviewer

  • The signal handler logs once per shutdown so operators can correlate the cause in journald.
  • This does not touch the harvester pools directly; their Drop impls already kill child Chromium / geckodriver processes once the dispatchers are released, and a clean drain makes those Drops actually run.
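The second note relies on a general Rust property: destructors run when a value goes out of scope on a normal return, and do not run if the process is torn down first. A small sketch of that difference follows, with the assumptions spelled out: `FakePool` is a hypothetical stand-in for the harvester pools, and `std::mem::forget` is used only to simulate a hard kill inside a single process.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a harvester pool that owns child processes.
pub struct FakePool {
    killed: Arc<AtomicUsize>,
    children: usize,
}

impl Drop for FakePool {
    fn drop(&mut self) {
        // A real pool would kill its child browser processes here;
        // this model only counts them.
        self.killed.fetch_add(self.children, Ordering::SeqCst);
    }
}

/// Models a clean drain: the function returns normally, so `Drop` runs.
pub fn run_and_drain(killed: Arc<AtomicUsize>, children: usize) {
    let _pool = FakePool { killed, children };
    // Serve, observe the shutdown signal, finish in-flight work, then return.
}

/// Models a hard kill: the destructor never runs.
/// `mem::forget` is a simulation device, not something the server does.
pub fn run_and_hard_kill(killed: Arc<AtomicUsize>, children: usize) {
    let pool = FakePool { killed, children };
    std::mem::forget(pool);
}
```

After `run_and_drain` the counter reflects every child; after `run_and_hard_kill` it is unchanged, which mirrors the zombie-process risk described under "Why".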

🤖 Generated with Claude Code

Wire `axum::serve(...).with_graceful_shutdown(shutdown_signal())`.
The signal handler resolves on the first of:

- SIGINT (Ctrl-C from a TTY)
- SIGTERM (sent by systemd, Kubernetes, `kill`) — Unix only; on
  other targets the SIGTERM branch is a pending future so SIGINT
  alone still wins.

After either signal we log a single "draining in-flight solves"
line; axum stops accepting new connections, lets the existing
`/v1/solve` requests finish, and the binary exits with a clean
"px-server stopped cleanly" log. Errors installing the handlers
log a warning but do not abort the process — the worst case
degrades to the previous "hard kill" behaviour.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>