fix: redesign autolock timer with separated lifecycle layers#603
fix: redesign autolock timer with separated lifecycle layers#603raman325 wants to merge 11 commits intoFutureTense:mainfrom
Conversation
Builds the new persistent-autolock-timer architecture as a self-
contained subpackage. Phase 2 will wire it into the coordinator and
remove the old KeymasterTimer.
Three layers:
custom_components/keymaster/autolock/
├── store.py — TimerStore: persistence, owns asyncio.Lock,
│ atomic read/write/remove, parse-and-prune-on-corrupt
├── scheduler.py — ScheduledFire: single async_call_later wrapper
│ with awaitable cancel that joins in-flight execution
└── timer.py — AutolockTimer: orchestration with explicit state
machine (FRESH → ACTIVE → DONE), indirect kmlock
resolution via get_kmlock closure (no captured ref)
Public API: AutolockTimer, TimerEntry, TimerState, TimerStore,
TIMER_STORAGE_VERSION, TIMER_STORAGE_KEY.
Design decisions vs. the old KeymasterTimer:
- Three layers separated; each has its own narrow responsibility
- Explicit state machine with runtime-checked transitions
- get_kmlock closure resolves the LIVE kmlock at fire time —
reload-during-firing no longer mutates an orphaned kmlock
- Action failure preserves the store entry for replay on next restart
(the safety-critical lock is the priority; storage-cleanup failures
cannot suppress firing)
- Recovery and start go through the same single-store path
- No nullable-field-as-state pattern; no detach()/_detached/_inflight
flag soup that the old design accumulated
30 tests covering the persistence layer (9), scheduler (6), and
orchestration (15). Each named scenario in test_timer.py corresponds
to a race identified during the PR FutureTense#601 review iterations — addressed
by construction in the redesign.
Refs FutureTense#602.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 + 3 of FutureTense#602. Coordinator integration (Phase 2): - Replace `Store[dict[str, TimerStoreEntry]]` with `TimerStore` instance (which owns the asyncio.Lock internally) - `_setup_timer` constructs one `AutolockTimer` per kmlock with a get_kmlock closure resolving through `self.kmlocks`. The closure is the structural fix for the "action mutates orphaned kmlock" race — reload-during-firing transparently resolves the new kmlock. - New `_autolock_duration_seconds` helper centralizes the sun.is_up + autolock_min_day/night lookup (previously inside KeymasterTimer.start) - `_lock_unlocked` callsite computes duration and passes to start(...) - `_update_lock` collapses from ~70 lines to ~10: * `new.inherit_state_from(old)` carries autolock config + state + code slots in a single call (lives on KeymasterLock now) * Transfer the existing AutolockTimer instance from old → new (no detach/setup dance — same instance keeps running, action now resolves the new kmlock via get_kmlock) * No rollback path needed: `_unsubscribe_listeners` is the only destructive op before the kmlocks swap, and it's recoverable via `_update_listeners(new)` that runs after State-transfer methods (lock.py): - `KeymasterCodeSlotDayOfWeek.inherit_state_from(old)` — instance method - `KeymasterCodeSlot.inherit_state_from(old)` — instance method, only inherits DOW keys present on both sides (matches original behavior) - `KeymasterLock.inherit_state_from(old)` — instance method, delegates to slot/dow inherit_state_from for nested data (All three named identically — they do the same thing functionally.) Phase 3 cleanup: - Delete `KeymasterTimer` class (172 lines) from helpers.py - Delete `TIMER_STORAGE_*` constants and `TimerStoreEntry` from helpers.py (now in autolock/store.py) - Delete 24 `test_keymaster_timer_*` tests + `mock_store` fixture from test_helpers.py — coverage moved into tests/autolock/test_*.py - Update test_coordinator.py for new AutolockTimer constructor signature and the new `_autolock_duration_seconds` indirection Net: 653 tests passing (down 24, up 30 in autolock — net +6 tests with much cleaner coverage of the redesigned surface). mypy parity with main. Ruff format + check clean. Refs FutureTense#602. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #603 +/- ##
==========================================
+ Coverage 84.14% 90.11% +5.97%
==========================================
Files 10 32 +22
Lines 801 3673 +2872
==========================================
+ Hits 674 3310 +2636
- Misses 127 363 +236
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR redesigns Keymaster’s autolock timer subsystem by replacing the legacy KeymasterTimer (which mixed persistence, scheduling, and lock-binding lifecycles) with a layered custom_components/keymaster/autolock/ package. The goal is to structurally eliminate the persistence/cancellation/reload races behind #594/#601 and to simplify coordinator reload state transfer.
Changes:
- Introduces layered autolock components:
TimerStore(serialized persistence),ScheduledFire(awaitable cancellation), andAutolockTimer(orchestration with explicit state machine). - Updates coordinator integration to construct timers with a
get_kmlockresolver closure and moves duration computation into_autolock_duration_seconds(). - Centralizes reload state transfer via
inherit_state_from()onKeymasterLock/ code-slot dataclasses and deletes the oldKeymasterTimerimplementation and its tests (replaced with focusedtests/autolock/*coverage).
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
custom_components/keymaster/autolock/store.py |
Adds a serialized HA-Store persistence layer for timer entries. |
custom_components/keymaster/autolock/scheduler.py |
Adds a single-shot scheduler wrapper with awaitable cancellation. |
custom_components/keymaster/autolock/timer.py |
Adds the new AutolockTimer orchestration/state machine. |
custom_components/keymaster/autolock/__init__.py |
Defines the public autolock package surface and exports types/constants. |
custom_components/keymaster/coordinator.py |
Wires coordinator to the new autolock subsystem and simplifies reload handoff. |
custom_components/keymaster/lock.py |
Adds inherit_state_from() methods and updates autolock_timer typing. |
custom_components/keymaster/helpers.py |
Removes the legacy KeymasterTimer implementation and timer-store constants. |
tests/autolock/test_store.py |
Adds tests for the persistence layer. |
tests/autolock/test_scheduler.py |
Adds tests for cancellation and execution behavior of ScheduledFire. |
tests/autolock/test_timer.py |
Adds state-machine + regression/race coverage for AutolockTimer. |
tests/autolock/__init__.py |
Marks the autolock test package. |
tests/test_coordinator.py |
Updates coordinator tests for new timer wiring and duration passing. |
tests/test_helpers.py |
Removes KeymasterTimer tests and associated imports. |
.gitignore |
Ignores .serena/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1. switch.py runtime bug — `autolock_timer.start()` called with no args would TypeError after the new API requires duration. Add public `KeymasterCoordinator.autolock_duration_seconds(kmlock)` helper (was private `_autolock_duration_seconds`); switch.py uses it. 2. AutolockTimer.is_running stale after action failure — derived only from `_state == ACTIVE`, so a failed action that re-persisted (state stays ACTIVE) reported True even though no callback was scheduled. Switch's `not is_running` guard would block a fresh start(). Tie `is_running` to the ScheduledFire's `done` flag too. 3. AutolockTimer._fire when get_kmlock returns None — used to log and return, leaving the persisted entry to replay forever. Treat as terminal: clear entry, remove from store, transition to DONE. 4. ScheduledFire.cancel CancelledError handling — bare `except Exception` would let CancelledError (BaseException) escape and cancel the caller, violating cancel()'s no-re-raise contract. Catch CancelledError explicitly with debug log. 5. Misleading docstring in `_setup_timer` saying reload is "invisible" to the timer when in fact it IS visible — that's the whole point. Reword to "transparently picked up". New regression tests: - test_action_failure_preserves_entry_for_replay extended to assert is_running is False after a failed fire - test_action_with_missing_kmlock_clears_state_terminally (renamed) asserts terminal cleanup on missing kmlock - test_cancel_swallows_in_flight_cancelled_error verifies the BaseException/CancelledError handling 654 tests passing; ruff + mypy clean (1 pre-existing baseline error). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ilure If the recovery action raised, recover() left the timer in FRESH (since _fire's failure path doesn't transition state). Subsequent start() would raise its FRESH-state guard. Fix: set state to ACTIVE and assign _entry BEFORE calling _fire from recover(). _fire's success path still transitions to DONE; failure path leaves state=ACTIVE with _entry set and no _scheduled — symmetric with the in-process firing path. is_running reports False (no scheduled callback), and start() can re-arm cleanly. Strengthen test_recovery_action_failure_preserves_entry to assert the post-recover state and that a subsequent start() doesn't raise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Critical/high silent-failure fixes:
1. _timer_triggered: surface autolock-action failure with a persistent
notification ("Autolock failed for {lock_name}") so the user knows
the door didn't lock — previously the failure was logged + replayed
on next restart with zero UI signal.
2. KeymasterLock.inherit_state_from: log loudly when code_slots is
empty on either side. A transient empty new.code_slots would have
silently dropped all the user's PINs/schedules until they noticed
codes stopped working.
3. TimerEntry: validate end_time is timezone-aware and duration is
non-negative on construction (was only validated when parsing from
disk; direct callers like AutolockTimer.start got no guarantee).
Doc/comment polish:
- timer.py state-machine diagram: rewrite to show all reachable
transitions including ACTIVE→ACTIVE (re-arm/replay) and the kmlock-
missing terminal path; correct that recover() requires FRESH
- scheduler.py: drop stale "detach" reference (no detach in redesign)
- store.py: replace TimerEntry's misleading "inter-layer signatures"
comment with a clearer guarantee statement; add __init__ docstring
New regression test:
- test_start_after_in_process_action_failure_rearms — verifies a
failed in-process fire leaves the timer rearmable via start()
(mirrors the existing test for the recovery path)
655 tests passing, ruff + mypy clean (1 pre-existing baseline error).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three Copilot findings: 1. TimerStore._parse assumed `raw` was a dict; legacy/manually-edited stores could contain non-mapping values (list, string) and the `.get()` would AttributeError, breaking recovery. Add isinstance check; widen the parameter type to `object`. 2. TimerEntry's __post_init__ raises ValueError for negative durations or naive datetimes — but _parse didn't catch it. A bad on-disk duration would propagate and crash recovery. Catch ValueError around TimerEntry construction; treat as invalid + prune. 3. Switch test (test_switch_autolock_starts_timer_when_lock_unlocked) only asserted `start.assert_called_once()` without checking args. The test would pass even if we forgot to pass `duration=`. Strengthen to verify the duration kwarg is present and a positive int. New regression tests: - test_non_mapping_entry_is_removed_on_read — covers fix 1 - test_negative_duration_entry_is_removed_on_read — covers fix 2 657 tests passing; ruff + mypy clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The new persistent notification created when an autolock action fails (commit ae116ee) was never dismissed. A user who then locked the door manually or via the next autolock cycle would still see the stale "Autolock failed" banner. Add dismissal to _lock_locked alongside the existing door_open/ door_closed dismissals — every autolock-related notification now clears once the lock confirms locked. Update test_lock_locked_dismisses_retry_notifications and test_lock_locked_already_locked_clears_pending_retry to assert all three dismissals. 657 tests passing; ruff clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`dt_util.as_utc()` assumes local/default-zone for naive datetimes — wrong for our data, since we always write UTC. Use `replace(tzinfo=dt_util.UTC)` to actually interpret as UTC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…C test Round 2 internal-review fixes: 1. helpers.call_hass_service: add raise_on_missing kwarg. coordinator. _lock_lock now sets it so a removed/renamed lock entity surfaces ServiceNotFound up to _timer_triggered, which fires the existing "Autolock failed" persistent notification. Without this, the service-not-found case was the exact silent failure this PR was meant to eliminate — the timer would silently retire as if the action had succeeded. 2. coordinator._lock_locked: dismiss-by-suffix loop driven by a new AUTOLOCK_NOTIFICATION_SUFFIXES module constant, with each dismissal wrapped in try/except. A transient dismiss failure (e.g. HA not yet fully started) no longer aborts the rest of _lock_locked (timer cancel, async_set_updated_data). Single source of truth for the suffix list also prevents drift between create-sites and dismiss-sites. 3. tests/autolock/test_store.py::test_naive_end_time_interpreted_as_utc (renamed from _treated_as_utc): now compares the loaded value against a known UTC instant. Previously asserted only `tzinfo is not None`, which passed under both the buggy `as_utc` (assumes local zone) and the correct `replace(tzinfo=UTC)` — would not catch a regression on non-UTC test hosts. 4. tests/test_coordinator.py::test_timer_triggered_action_failure_ notifies_and_reraises (new): asserts the autolock_failed notification is sent AND the exception propagates. Previously reverting the try/except block would not fail any test. 5. autolock/store.py: rewrite stale comment in _parse's ValueError handler that referenced the removed `as_utc` call. 658 tests passing; ruff + mypy clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comments/docstrings in this PR accumulated PR-context phrasing across
the review iterations ("plagued the previous design", "matching the
original coordinator behavior", "no detach/setup dance required",
"now that the lock has succeeded", etc). That phrasing is useful to a
reviewer of this PR but noise to a future maintainer reading the code
without knowing the migration history.
Trim or rephrase to keep only the long-term WHY:
- timer.py module docstring: drop "plagued the previous design"
- _fire docstring: condense the three-outcome explanation; drop the
"ScheduledFire is now `done`" intra-method note (already in is_running)
- recover() expired-fire comment: shorter, no PR-history framing
- KeymasterCodeSlot.inherit_state_from: drop "matching the original
coordinator behavior"; describe behavior directly
- KeymasterLock.inherit_state_from: tighter docstring
- coordinator._setup_timer / _timer_triggered / _lock_lock /
_lock_locked: tighten the dismiss/raise-on-missing/transfer comments
- AUTOLOCK_NOTIFICATION_SUFFIXES: shorter intent comment
- TimerEntry: tighter class docstring; bare-line __post_init__ docstring
(D105 compliance)
No behavior change. 658 tests passing; ruff format + check clean;
mypy parity with main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Replaces the
KeymasterTimerclass with a layered redesign that addresses the originalAttributeErrorfrom #594 plus the family of races discovered during PR #601's 17 review iterations. Closes #602; supersedes #601.Why a redesign vs. another patch
PR #601 grew through 17 rounds of Copilot review, fixing ~30 races and edge cases without converging. Root cause: the old
KeymasterTimerconflated three layers with different lifetimes (persistent store, scheduled callbacks, kmlock binding) into one object backed by nullable fields and ad-hoc flags. Each fix patched one interleaving and exposed a narrower one.This PR makes the races structurally impossible (or trivially diagnosable) rather than progressively less likely.
Proposed change
New
custom_components/keymaster/autolock/package — three layersstore.py—TimerStoreasyncio.Lock. Atomicread/write/remove. Defensive parsing of legacy/corrupt entries (non-Mapping, naive datetimes, negative durations) with prune-on-read. Naive on-disk datetimes interpreted as UTC (not local).store.py—TimerEntry__post_init__validates tz-awareend_timeand non-negativedurationso direct callers get the same guarantees as parsed-from-disk entries.scheduler.py—ScheduledFireasync_call_laterwrapper.await scheduled.cancel()joins any in-flight execution. CatchesCancelledErrorexplicitly so it doesn't escape and cancel the caller.timer.py—AutolockTimerTimerStateenum (FRESH → ACTIVE → DONE). Resolves kmlock indirectly viaget_kmlockclosure (no captured reference). Action-failure path preserves the store entry for replay on next restart;is_runningreflectsScheduledFire.doneso callers can re-arm cleanly. Missing kmlock at fire time is terminal (entry cleared, no replay loop).Coordinator integration
Store[dict[str, TimerStoreEntry]]withTimerStoreinstance_setup_timerconstructs oneAutolockTimerper kmlock withget_kmlock=lambda: self.kmlocks.get(entry_id)— eliminates the orphaned-kmlock race because the action resolves the live kmlock at fire timeautolock_duration_seconds(kmlock)helper centralizessun.is_up+autolock_min_*lookup_update_lockcollapses from ~70 lines to ~10:new.inherit_state_from(old)carries autolock config + state + code slotsAutolockTimerinstance transfers from old → new (no detach/setup/rollback dance — same instance keeps running, action transparently resolves the new kmlock)_timer_triggeredwraps_lock_lockintry/exceptthat fires a persistent notification ({lock}_autolock_failed) on action failure, then re-raises so the timer's preserve-entry branch runs for replay-on-restart_lock_lockcallscall_hass_service(..., raise_on_missing=True)so a removed/renamed lock entity surfacesServiceNotFoundup the stack — previously swallowed silently_lock_lockeddismisses ALL autolock-related notifications via a single suffix-loop driven byAUTOLOCK_NOTIFICATION_SUFFIXES(door_open, door_closed, failed); each dismissal wrapped intry/exceptso a transient dismiss failure can't abort the rest of the handlerState-transfer methods on
KeymasterLock/KeymasterCodeSlot/KeymasterCodeSlotDayOfWeekinherit_state_from(old)— instance methods that mutate selfcode_slots) — silent loss would have dropped the user's PINs/schedulesCleanup
KeymasterTimerclass (~172 lines) fromhelpers.pytest_keymaster_timer_*tests fromtest_helpers.py— coverage migrated intotests/autolock/test_*.pywith much tighter scope per testraise_on_missingkwarg tocall_hass_service(default False — existing non-safety-critical callers unaffected)Status of the deferred
_lock_locked()race from #601The original concern:
_lock_lockedsetlock_state = LOCKEDand awaited notification dismissals BEFORE callingcancel(). A reload during those awaits left a stale store entry that fired a phantom autolock later.Resolved by construction in this PR, no code change required. Trace:
_lock_locked(old)starts, mutatesold.lock_state, awaits dismissals_update_lockruns concurrently: transfers the timer (new.autolock_timer = old.autolock_timer; old.autolock_timer = None), swapsself.kmlocks[id] = new_lock_lockedresumes, hitsif kmlock.autolock_timer:—old.autolock_timeris now None → skips cancel (existing guard, no crash)new) fires later_fireresolves the live kmlock viaget_kmlock()→newkmlockWorst case: one redundant idempotent service call. No phantom user-visible autolock, no crash, no lost state — because (a) the timer instance survives the swap via ownership transfer, and (b) the action resolves the live kmlock via closure rather than holding a captured reference.
Validation
tests/autolock/cover the layered classes; each named race-scenario test maps to one of the bugs identified in PR fix: autolock timer persistence, race conditions, and reload safety #601.AttributeErrorfrom ISSUE: Auto Lock Timer "Unknown" #594 is structurally impossible: the persist+cleanup paths are fully serialized throughTimerStore's lock, andScheduledFire.cancel()awaits in-flight callbacks before returning.Type of change
Additional information
🤖 Generated with Claude Code