Skip to content

Synchronize Health Endpoint Reads of Scheduler State #49

@henry0816191

Description

@henry0816191

Problem

The eval (Test 13, concurrency-boundaries cluster) found that the health endpoint reads scheduler state without synchronization. The scheduler runs in the main asyncio event loop (or a background thread via asyncio.to_thread), while the health endpoint serves HTTP requests that read the scheduler's last-run timestamp, cycle count, and status fields. These reads are not synchronized — the health endpoint can observe a partially-updated scheduler state (e.g., cycle count incremented but status not yet updated). This is part of the Silent-Staleness Cascade compound: the operator's primary observability signal cannot be trusted.

Acceptance Criteria

  • Shared state between scheduler and health endpoint protected by a synchronization primitive (asyncio.Lock, threading.Lock, or atomic snapshot pattern)
  • Health endpoint reads a consistent snapshot of scheduler state (all fields from the same point in time)
  • Scheduler updates state atomically — all fields updated under the same lock acquisition
  • Health endpoint response includes a last_updated timestamp indicating when the snapshot was taken
  • Tests verify: concurrent scheduler update and health endpoint read produce consistent state
  • Tests pass in CI (if applicable)
  • PR approved by at least 1 reviewer

Implementation Notes

The cleanest approach is an atomic snapshot pattern: the scheduler writes to a SchedulerSnapshot dataclass (frozen, immutable) and publishes it to an asyncio.Event-guarded shared reference. The health endpoint reads the latest snapshot — since the snapshot object is immutable and Python object assignment is atomic (due to the GIL), a simple self._latest_snapshot = new_snapshot suffices for thread safety. If asyncio.to_thread is used for blocking I/O (the eval notes this as an extension precedent), ensure the lock discipline covers the cross-thread boundary. The eval finding on _Entry positional tuple fragility is a related but separate concern.

References

  • Eval finding: Test 13 (health endpoint reads scheduler state without synchronization)
  • Related files: Health endpoint handler (likely in src/api.py or src/health.py), scheduler class (likely in src/scheduler.py), shared state module

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No fields configured for Task.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions