fix: honor request waiter deadlines before admission #681
Greptile Summary
This PR fixes a race where an async waiter could receive an admission lease after its deadline had already passed.
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/models/request_admission/controller.py | Adds _expire_waiters_locked to purge and wake deadline-exceeded waiters before admission or queue-ahead checks, and adds an early deadline guard at the top of both acquire loops to reject woken-but-expired waiters before they can be re-enqueued. |
| packages/data-designer-engine/src/data_designer/engine/models/request_admission/queue.py | Adds waiters() method returning a snapshot tuple of all queued waiters; the snapshot is necessary to avoid mutating the backing dict while _expire_waiters_locked iterates it. |
| packages/data-designer-engine/tests/engine/models/request_admission/test_controller.py | Adds a regression test that intentionally stalls the event loop with time.sleep after a cross-thread release to verify an expired async waiter raises RequestAdmissionError rather than receiving a lease; _wait_seconds_locked is patched to 10 s to force the wakeup to come only from _expire_waiters_locked, proving the specific race path. |
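The expiry pass described in the table can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Waiter`, `WaiterQueue`, and `expire_waiters` are hypothetical stand-ins, and only the idea of iterating a snapshot tuple while removing entries from the backing dict is taken from the review above.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Waiter:
    """Hypothetical stand-in for a queued waiter (not the real class)."""

    waiter_id: int
    deadline_monotonic: float
    wakeup: threading.Event = field(default_factory=threading.Event)


class WaiterQueue:
    """Hypothetical queue backed by a dict keyed on waiter id."""

    def __init__(self) -> None:
        self._waiters: dict[int, Waiter] = {}

    def add(self, waiter: Waiter) -> None:
        self._waiters[waiter.waiter_id] = waiter

    def remove(self, waiter: Waiter) -> None:
        # Removing an already-removed waiter is a no-op.
        self._waiters.pop(waiter.waiter_id, None)

    def waiters(self) -> tuple[Waiter, ...]:
        # Snapshot: safe to iterate while the dict is mutated.
        return tuple(self._waiters.values())


def expire_waiters(queue: WaiterQueue, now: float) -> list[int]:
    """Remove and wake every waiter whose deadline has passed.

    Assumes the caller already holds the controller lock.
    """
    expired = []
    for waiter in queue.waiters():
        if now >= waiter.deadline_monotonic:
            queue.remove(waiter)
            waiter.wakeup.set()  # let the waiter observe its timeout
            expired.append(waiter.waiter_id)
    return expired


queue = WaiterQueue()
queue.add(Waiter(1, deadline_monotonic=5.0))
queue.add(Waiter(2, deadline_monotonic=50.0))
expired_ids = expire_waiters(queue, now=10.0)
remaining_ids = [w.waiter_id for w in queue.waiters()]
```

Iterating `self._waiters.values()` directly while calling `remove` would raise `RuntimeError: dictionary changed size during iteration`, which is why the snapshot matters here.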
Sequence Diagram
sequenceDiagram
participant C as Caller (async)
participant L as Event Loop
participant AC as AdmissionController
participant RT as Release Thread
C->>AC: "acquire_async(item, timeout=10ms)"
AC->>AC: "enqueue waiter (deadline = now+10ms)"
AC->>AC: _admit_waiters_locked → no capacity
AC->>L: "await asyncio.wait_for(wakeup, timeout=10s)"
Note over C,RT: Event loop blocks on time.sleep(60ms)
RT->>AC: release(lease)
AC->>AC: _admit_waiters_locked()
AC->>AC: _expire_waiters_locked(now≥deadline)
AC->>AC: _remove_waiter_locked(waiter)
AC->>L: call_soon_threadsafe(wakeup.set)
Note over C,RT: Event loop resumes after sleep
L->>AC: wakeup fires → acquire loop iterates
AC->>AC: Check assigned_lease → None
AC->>AC: Check deadline → now ≥ deadline
AC->>AC: _remove_waiter_locked (no-op, already removed)
AC-->>C: raise RequestAdmissionError(queue_timeout)
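The flow in the diagram can be reproduced with a small, self-contained sketch. Everything here is assumed for illustration: `TinyController`, `Waiter`, `QueueTimeout`, and the `expire_first` toggle are hypothetical and much simpler than the real controller. The sketch keeps only the ordering shown above: release expires overdue waiters before admitting, and the acquire loop checks for a lease and then the deadline after each wakeup.

```python
import asyncio
import threading
import time


class QueueTimeout(Exception):
    """Hypothetical stand-in for RequestAdmissionError(queue_timeout)."""


class Waiter:
    def __init__(self, loop: asyncio.AbstractEventLoop, timeout: float) -> None:
        self.loop = loop
        self.deadline = time.monotonic() + timeout
        self.wakeup = asyncio.Event()
        self.lease = None


class TinyController:
    """Single-slot controller; `expire_first` toggles the fix for comparison."""

    def __init__(self, expire_first: bool) -> None:
        self._lock = threading.Lock()
        self._busy = True  # the only slot starts out held elsewhere
        self.waiters: list[Waiter] = []
        self._expire_first = expire_first

    def release(self) -> None:
        """Callable from any thread: free the slot, expire, then admit."""
        with self._lock:
            self._busy = False
            now = time.monotonic()
            if self._expire_first:
                for w in tuple(self.waiters):  # snapshot; list is mutated below
                    if now >= w.deadline:
                        self.waiters.remove(w)
                        w.loop.call_soon_threadsafe(w.wakeup.set)
            if self.waiters:
                w = self.waiters.pop(0)
                self._busy = True
                w.lease = "lease"
                w.loop.call_soon_threadsafe(w.wakeup.set)

    async def acquire_async(self, timeout: float) -> str:
        waiter = Waiter(asyncio.get_running_loop(), timeout)
        with self._lock:
            self.waiters.append(waiter)
        while True:
            with self._lock:
                if waiter.lease is not None:
                    return waiter.lease
                if time.monotonic() >= waiter.deadline:
                    if waiter in self.waiters:  # no-op if already expired
                        self.waiters.remove(waiter)
                    raise QueueTimeout("queue_timeout")
            waiter.wakeup.clear()
            try:
                # Long wait on purpose, so only an explicit wakeup ends it.
                await asyncio.wait_for(waiter.wakeup.wait(), timeout=10.0)
            except asyncio.TimeoutError:
                pass


async def demo(expire_first: bool) -> str:
    controller = TinyController(expire_first)
    task = asyncio.create_task(controller.acquire_async(timeout=0.01))
    while not controller.waiters:  # let the waiter enqueue and suspend
        await asyncio.sleep(0)
    timer = threading.Timer(0.03, controller.release)  # cross-thread release
    timer.start()
    time.sleep(0.06)  # deliberately stall the event loop past the deadline
    timer.join()
    try:
        return await task
    except QueueTimeout:
        return "queue_timeout"


with_fix = asyncio.run(demo(expire_first=True))
without_fix = asyncio.run(demo(expire_first=False))
```

With `expire_first=False` the release thread hands the slot to a waiter whose 10 ms deadline has already elapsed, which is the race this PR describes. With `expire_first=True` the waiter is removed and woken instead, and the acquire loop raises the timeout.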
Reviews (1): Last reviewed commit: "fix request waiter deadline admission"
Summary
Why
A queued async waiter could receive a request lease after its deadline_monotonic had already passed if the event loop was stalled and capacity was released from another thread. Queue wait timeouts should be terminal before admission assigns a lease.
Validation
.venv/bin/ruff check --fix .
.venv/bin/ruff format .
.venv/bin/pytest packages/data-designer-engine/tests/engine/models/request_admission/test_controller.py
.venv/bin/pytest packages/data-designer-engine/tests/engine/dataset_builders/scheduling/test_resolver.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py packages/data-designer-engine/tests/engine/models/request_admission/test_controller.py packages/data-designer-engine/tests/engine/models/clients/test_model_request_executor.py