feat(slots): push-driven failure detector for unexpected unit deaths#8

Merged
thinmintdev merged 1 commit into
mainfrom
feat/slot-fail-push-2026-05-15
May 16, 2026

@thinmintdev
Contributor

Summary

Test plan

  • tests/slots/test_fail_watcher.py green (simulates unit dying, asserts ERROR + SSE within 5s; clean unload cancels watcher with no spurious push)
  • Full slots suite green (45 pass / 3 skipped — integration tests gated on installed systemd template)

🤖 Generated with Claude Code

When a slot's systemd unit died mid-life (OOM, segfault, image-pull
failure during warmup), the state machine only flipped to ERROR on the
next status() poll or after _await_ready's 180s grace — leaving SSE
watchers stuck on "ready" for minutes before any error frame.

Add a per-slot async watcher task that polls `systemctl is-active`
every 2s while the slot is in READY/SERVING/IDLE.  On inactive/failed,
it pushes a forced ERROR transition (broadcast over SSE) within ~1s
of detection.  The watcher is spawned and cancelled automatically from
_transition, so entering or leaving a live state drives its lifecycle.
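
A minimal sketch of the watcher lifecycle described above, not the merged implementation. The `Slot` class, `watch_unit`, `unit_state`, the `probe` hook, and the `events` list (a stand-in for the SSE broadcast) are all hypothetical names chosen for illustration; only the `systemctl is-active` call, the 2s interval, and the READY/SERVING/IDLE live states come from the description.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

LIVE_STATES = {"READY", "SERVING", "IDLE"}   # live states named in the commit message
DEAD_RESULTS = {"inactive", "failed"}        # is-active results treated as a dead unit

Probe = Callable[[str], Awaitable[str]]


async def unit_state(unit: str) -> str:
    """Return the one-word state printed by `systemctl is-active <unit>`."""
    proc = await asyncio.create_subprocess_exec(
        "systemctl", "is-active", unit,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


async def watch_unit(unit: str, on_dead: Callable[[str], Awaitable[None]],
                     probe: Probe, interval: float) -> None:
    """Poll the unit until it reports a dead state, then notify once and exit."""
    while True:
        result = await probe(unit)
        if result in DEAD_RESULTS:
            await on_dead(result)
            return
        await asyncio.sleep(interval)


class Slot:
    """Hypothetical slot: only the watcher wiring is modelled here."""

    def __init__(self, unit: str, probe: Probe = unit_state, interval: float = 2.0):
        self.unit = unit
        self.state = "LOADING"
        self.events: List[Tuple[str, str]] = []   # stand-in for SSE frames
        self._probe = probe
        self._interval = interval
        self._watcher: Optional[asyncio.Task] = None

    async def _on_dead(self, result: str) -> None:
        # This runs inside the watcher task, so drop the handle first;
        # otherwise transition() would cancel the task that is calling it.
        self._watcher = None
        self.transition("ERROR", reason=f"unit {result}")

    def transition(self, new_state: str, reason: str = "") -> None:
        """Record the new state, then start or stop the watcher to match it."""
        self.state = new_state
        self.events.append((new_state, reason))
        live = new_state in LIVE_STATES
        if live and self._watcher is None:
            self._watcher = asyncio.create_task(
                watch_unit(self.unit, self._on_dead, self._probe, self._interval)
            )
        elif not live and self._watcher is not None:
            self._watcher.cancel()
            self._watcher = None
```

Tying the task to the transition function means no caller has to remember to start or stop the watcher; the injectable `probe` is an assumption made here so the loop can be exercised without a real systemd unit.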

Tests in tests/slots/test_fail_watcher.py simulate a unit dying and
assert that the ERROR transition and SSE frame land within 5s, and that
a clean unload cancels the watcher with no spurious ERROR push.
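
The two test cases can be illustrated with a self-contained sketch. It does not reproduce tests/slots/test_fail_watcher.py: `fake_watcher`, the `frames` queue (standing in for the SSE stream), and both `run_*` helpers are hypothetical, and only the 5s bound and the two scenarios are taken from the description.

```python
import asyncio


async def fake_watcher(probe, frames: asyncio.Queue, interval: float = 0.01) -> None:
    """Stand-in watcher: enqueue one ERROR frame once the probe reports a dead unit."""
    while True:
        if await probe() in ("inactive", "failed"):
            await frames.put({"state": "ERROR"})
            return
        await asyncio.sleep(interval)


async def run_unit_death_case() -> dict:
    """Flip the fake unit to 'failed' and wait for the frame, bounded at 5s."""
    alive = True

    async def probe() -> str:
        return "active" if alive else "failed"

    frames: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(fake_watcher(probe, frames))
    await asyncio.sleep(0.03)
    alive = False                      # simulate the unit dying mid-life
    frame = await asyncio.wait_for(frames.get(), timeout=5)
    await task                         # watcher exits after a single push
    return frame


async def run_clean_unload_case() -> bool:
    """Cancel the watcher while the unit is healthy; no frame should appear."""
    async def probe() -> str:
        return "active"

    frames: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(fake_watcher(probe, frames))
    await asyncio.sleep(0.03)
    task.cancel()                      # what a clean unload would trigger
    try:
        await task
    except asyncio.CancelledError:
        pass
    return frames.empty()
```

Using `asyncio.wait_for` with a timeout turns a missed push into a fast test failure instead of a hang.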

Shared fixtures (systemctl_stub, stub_await_ready, slot_root) moved
from test_manager.py to tests/slots/conftest.py so the new test file
reuses them.

Refs task #11.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit 00ae050 into main May 16, 2026
1 of 5 checks passed
@thinmintdev thinmintdev deleted the feat/slot-fail-push-2026-05-15 branch May 21, 2026 20:11