Problem
The eval (Test 13, concurrency-boundaries cluster) found that the health endpoint reads scheduler state without synchronization. The scheduler runs in the main asyncio event loop (or in a background thread via asyncio.to_thread), while the health endpoint serves HTTP requests that read the scheduler's last-run timestamp, cycle count, and status fields. These fields are written and read one at a time, with nothing tying them together. As a result, the health endpoint can observe a partially updated scheduler state (e.g., cycle count incremented but status not yet updated). This can happen even on a single event loop whenever the scheduler awaits between field writes. It happens more readily when a worker thread performs the writes. This is part of the Silent-Staleness Cascade compound: the operator's primary observability signal cannot be trusted.
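A minimal, hypothetical reproduction of the torn read follows. It assumes the scheduler mutates its fields in place and awaits between writes. The Scheduler class, its field names and the health() coroutine are illustrative stand-ins, not the project's actual code. Everything runs on one event loop, with no threads involved.

```python
import asyncio
import time


class Scheduler:
    """Hypothetical scheduler that mutates separate fields in place (the unsafe pattern)."""

    def __init__(self) -> None:
        self.cycle_count = 0
        self.status = "idle"
        self.last_run = 0.0

    async def run_cycle(self, n: int) -> None:
        self.cycle_count += 1
        # Any await between field writes yields to other tasks, so a reader
        # scheduled here sees the new cycle_count next to the old status.
        await asyncio.sleep(0)
        self.status = f"cycle {n} ok"
        self.last_run = time.time()


async def health(s: Scheduler) -> dict:
    # Reads three fields independently; nothing guarantees they match.
    return {"cycle_count": s.cycle_count, "status": s.status, "last_run": s.last_run}


async def main() -> list[dict]:
    s = Scheduler()
    cycle = asyncio.create_task(s.run_cycle(1))
    await asyncio.sleep(0)  # let run_cycle advance to its await point
    torn = await health(s)
    await cycle
    settled = await health(s)
    return [torn, settled]


results = asyncio.run(main())
print(results)
```

The first health read reports cycle_count 1 while status is still "idle" and last_run is still 0.0. This is exactly the mismatch the eval describes. The second read, taken after the cycle finishes, is consistent.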
Acceptance Criteria
- The health response includes a last_updated timestamp indicating when the snapshot was taken
Implementation Notes
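The cleanest approach is an atomic snapshot pattern. On each cycle, the scheduler builds a new SchedulerSnapshot (a frozen, immutable dataclass) holding every field the health endpoint reports. It then publishes the snapshot by rebinding a single shared reference: self._latest_snapshot = new_snapshot. The health endpoint reads that reference once and takes all of its fields from the one object it got back.

Because the snapshot is immutable and rebinding an attribute is atomic in CPython (due to the GIL), a reader sees either the complete old snapshot or the complete new one, never a mix. No lock is needed for the reference itself. An asyncio.Event can still notify waiters that a new snapshot exists, but it is a signal, not a guard, and provides no consistency on its own.

If asyncio.to_thread is used for blocking I/O (the eval notes this as an extension precedent), the cross-thread boundary needs the same discipline. One option is to return the new snapshot from the worker and publish it on the event-loop thread. The other is to protect any in-place, multi-field mutation with a threading.Lock, since asyncio.Lock and asyncio.Event are not thread-safe. The eval finding on _Entry positional tuple fragility is a related but separate concern.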
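Here is a minimal sketch of the snapshot pattern, assuming the scheduler's blocking work goes through asyncio.to_thread. SchedulerSnapshot is the name proposed above. Its fields (including a last_updated timestamp), the _blocking_cycle method, the snapshot property and the health() function are hypothetical illustrations, not the project's actual API. The worker only returns a fresh snapshot, and the event-loop thread publishes it, so the worker thread never touches the shared reference.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerSnapshot:
    """Immutable view of scheduler state; all fields describe the same moment."""

    cycle_count: int
    status: str
    last_run: float
    last_updated: float  # when this snapshot was built


class Scheduler:
    def __init__(self) -> None:
        # The only shared mutable state is this single reference.
        self._latest_snapshot = SchedulerSnapshot(0, "idle", 0.0, time.time())

    @property
    def snapshot(self) -> SchedulerSnapshot:
        return self._latest_snapshot

    def _blocking_cycle(self, prev: SchedulerSnapshot) -> SchedulerSnapshot:
        # Runs in a worker thread via asyncio.to_thread. It only builds a new
        # object; the published snapshot is never mutated.
        time.sleep(0.01)  # stand-in for blocking I/O
        now = time.time()
        n = prev.cycle_count + 1
        return SchedulerSnapshot(n, f"cycle {n} ok", now, now)

    async def run_cycle(self) -> None:
        new_snapshot = await asyncio.to_thread(self._blocking_cycle, self._latest_snapshot)
        # Publish on the event-loop thread with a single reference rebind.
        self._latest_snapshot = new_snapshot


def health(s: Scheduler) -> dict:
    snap = s.snapshot  # read the shared reference exactly once
    return {
        "cycle_count": snap.cycle_count,
        "status": snap.status,
        "last_updated": snap.last_updated,
        "age_seconds": time.time() - snap.last_updated,
    }


async def main() -> list[dict]:
    s = Scheduler()
    seen: list[dict] = []

    async def poll() -> None:
        # Simulates health-endpoint traffic interleaved with scheduler cycles.
        for _ in range(50):
            seen.append(health(s))
            await asyncio.sleep(0.001)

    async def cycles() -> None:
        for _ in range(3):
            await s.run_cycle()

    await asyncio.gather(poll(), cycles())
    seen.append(health(s))
    return seen


responses = asyncio.run(main())
print(responses[-1]["cycle_count"], responses[-1]["status"])  # prints: 3 cycle 3 ok
```

Every polled response pairs a cycle_count with its own status, because each one comes from a single snapshot object. The frozen dataclass also turns an accidental in-place edit into dataclasses.FrozenInstanceError. That enforces the "publish a new object, never mutate the old one" rule at runtime.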
References
- Eval finding: Test 13 (health endpoint reads scheduler state without synchronization)
- Related files: health endpoint handler (likely in src/api.py or src/health.py), scheduler class (likely in src/scheduler.py), shared state module