hotfix(events): long-poll releases pool slot via commit between iterations (P0) #100
Merged
…tions (P0)

P0 hotfix port: a downstream consumer (cue.dock.svc) experienced pool exhaustion in prod due to `_run_long_poll_wait` holding an idle-in-transaction connection for the full ~30s wait window. At ~15 concurrent long-pollers (SQLAlchemy default pool_size=5 + max_overflow=10 = 15) the entire app-side pool gets saturated; new requests time out waiting for a free connection.

Root cause: the while loop runs `await asyncio.sleep` + `await pull_events(db, ...)` with NO commit between iterations. Each pull_events opens an implicit txn that the session holds for the full wait window.

Fix (+2 LOC): `await db.commit()` at function entry (releases caller's initial-pull implicit txn before first sleep window) + after each empty iteration (releases per-iteration txn before next sleep window).

Empirically verified (2026-05-22) that AsyncSession.commit() returns the pool slot AND ends the Postgres txn; session re-acquires from pool transparently on next operation.

Tests:
- 2 new regression guards added to tests/test_events_long_poll.py pinning the commit invariant (commit_count assertions)
- 13/13 long-poll tests pass locally

Cross-port from cueapi/cueapi#925 (mergeCommit 359961b).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
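The loop shape described in this commit message, as a minimal runnable sketch. Assumptions: the real `_run_long_poll_wait` signature and `pull_events` arguments are not shown in this PR, so `run_long_poll_wait` and its `pull`, `wait_s` and `interval_s` parameters are hypothetical; `db` only needs an awaitable `commit()`, which `AsyncSession` provides. The two tests reuse the names of the PR's regression guards and pin the commit invariant through an `AsyncMock` commit count, which may differ from how tests/test_events_long_poll.py actually counts commits.

```python
import asyncio
import unittest
from typing import Any, Awaitable, Callable, Sequence
from unittest.mock import AsyncMock


async def run_long_poll_wait(
    db: Any,
    pull: Callable[[Any], Awaitable[Sequence[Any]]],
    wait_s: float = 30.0,
    interval_s: float = 1.0,
) -> Sequence[Any]:
    """Hypothetical analogue of _run_long_poll_wait with the +2 LOC fix."""
    # Fix 1: end the caller's initial-pull implicit txn (and hand its pool
    # slot back) before the first sleep window.
    await db.commit()
    loop = asyncio.get_running_loop()
    deadline = loop.time() + wait_s
    while loop.time() < deadline:
        await asyncio.sleep(interval_s)
        events = await pull(db)  # autobegins a fresh implicit txn
        if events:
            return events
        # Fix 2: the pull was empty, so release the txn before sleeping again.
        await db.commit()
    return []


class LongPollCommitInvariant(unittest.IsolatedAsyncioTestCase):
    async def test_helper_commits_at_entry_and_between_empty_iterations(self) -> None:
        db = AsyncMock()  # child attributes such as db.commit are AsyncMocks too
        pull = AsyncMock(side_effect=[[], [], ["evt-1"]])
        events = await run_long_poll_wait(db, pull, wait_s=5.0, interval_s=0)
        self.assertEqual(events, ["evt-1"])
        # 1 entry commit + 1 commit per empty iteration (2) = 3.
        self.assertEqual(db.commit.await_count, 3)

    async def test_helper_commits_at_entry_even_when_event_arrives_immediately(self) -> None:
        db = AsyncMock()
        pull = AsyncMock(return_value=["evt-1"])
        events = await run_long_poll_wait(db, pull, wait_s=5.0, interval_s=0)
        self.assertEqual(events, ["evt-1"])
        # Only the entry commit runs, but the caller's initial-pull txn is still released.
        self.assertEqual(db.commit.await_count, 1)
```

Run the tests with `python -m unittest`. In this sketch the txn opened by a non-empty pull is left open on return, so ending it stays with the caller's request lifecycle; that is one reading of "after each empty iteration", not a confirmed detail of the real helper.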
Parity check
This PR modifies files tracked in
Please confirm one of the following in a reply or PR description update:
This is a soft check; it does not block merge. The goal is visibility, not friction. See HOSTED_ONLY.md for the open-core policy.
🚨 P0 HOTFIX — long-poll holds DB txn → pool exhaustion
Cross-port from cueapi/cueapi#925 (private, mergeCommit 359961b).

Incident: a downstream consumer (cue.dock.svc) experienced pool exhaustion in prod due to `_run_long_poll_wait` holding an `idle in transaction` connection for the full ~30s wait window.

Mechanism
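At ~15 concurrent long-pollers (SQLAlchemy default `pool_size=5` + `max_overflow=10` = 15), every connection in the app-side pool is checked out by an idle long-poll, so new requests time out waiting for a free connection.

Root cause: the `_run_long_poll_wait` while loop runs `await asyncio.sleep` + `await pull_events(db, ...)` with no commit between iterations. Each `pull_events` opens an implicit txn that the session holds for the full ~30s wait.

Fix (+2 LOC)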
- `await db.commit()` at function entry: releases the caller's initial-pull implicit txn before the first sleep window.
- `await db.commit()` after each empty `pull_events()` iteration: releases the per-iteration implicit txn before the next sleep window.

Empirical verification (load-bearing)
The fix relies on `AsyncSession.commit()` returning the connection to the pool. Verified empirically against real Postgres + asyncpg + SQLAlchemy 2.0.35 via `engine.pool.checkedin()`/`checkedout()` state inspection: the session remains usable across commits, and both Postgres `idle in transaction` AND SQLAlchemy pool slot exhaustion are addressed.

Tests
- `test_helper_commits_at_entry_and_between_empty_iterations`
- `test_helper_commits_at_entry_even_when_event_arrives_immediately`

Tag plan
Suggesting a `messaging-v1.1.5-hotfix` tag on merge: a single-purpose hotfix tag for downstream consumers to pin-bump on.

Parity reference
Same diff shape as private cueapi/cueapi#925 (mergeCommit 359961b). Per the CLAUDE.md OSS-vs-private parity policy, `app/routers/events.py:_run_long_poll_wait` is shared verbatim across both repos.

🤖 Generated with Claude Code