feat(clients/python): experimental cooperative consumers #216
REV Code Review Report
POTENTIAL ISSUES (4)
Issues with moderate confidence (4-7/10). Review manually; may be false positives.
Severities: MEDIUM, MEDIUM, MEDIUM, LOW
REV-assisted review (AI analysis by postgres-ai/rev)
Expose the experimental cooperative-consumers SQL surface on ``PgqueClient``: ``subscribe_subconsumer``, ``unsubscribe_subconsumer``, ``receive_coop``, and ``touch_subconsumer``. Each maps 1:1 to its ``pgque.*`` SQL counterpart, wraps psycopg errors via the existing ``_wrap_sql_error`` path, and decodes rows with ``Message.from_row``-style field mapping (matching ``receive``). Tests cover the per-method contract -- registration idempotency, batch ack flow, two-subconsumer split delivery, ``batch_handling`` modes, and ``touch_subconsumer`` heartbeat.
Add ``subconsumer`` and ``dead_interval`` parameters to ``Consumer``. When ``subconsumer`` is set, the poll loop calls ``client.receive_coop(...)`` instead of ``receive(...)``; otherwise the loop is unchanged. ``dead_interval`` only applies in coop mode and raises ``ValueError`` if passed without a subconsumer name. No heartbeat is sent automatically; ``client.touch_subconsumer`` is the manual primitive.
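The dispatch rule described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: ``SketchConsumer`` and ``FakeClient`` are hypothetical stand-ins, and the argument order of ``receive`` and ``receive_coop`` is assumed. Acking is left out because its call shape is not shown here.

```python
from typing import Any, Callable, List, Optional


class SketchConsumer:
    """Illustrative stand-in for Consumer; names and signatures are assumed."""

    def __init__(
        self,
        client: Any,
        queue: str,
        name: str,
        handler: Callable[[Any], None],
        subconsumer: Optional[str] = None,
        dead_interval: Optional[str] = None,
    ) -> None:
        # dead_interval only makes sense in cooperative mode, so reject it
        # up front when no subconsumer name was given.
        if dead_interval is not None and subconsumer is None:
            raise ValueError("dead_interval requires subconsumer to be set")
        self.client = client
        self.queue = queue
        self.name = name
        self.handler = handler
        self.subconsumer = subconsumer
        self.dead_interval = dead_interval

    def poll_once(self) -> int:
        """Run one poll iteration and return the number of messages handled."""
        if self.subconsumer is not None:
            # Cooperative mode: claim a batch on behalf of this subconsumer.
            messages = self.client.receive_coop(
                self.queue,
                self.name,
                self.subconsumer,
                dead_interval=self.dead_interval,
            )
        else:
            # Plain mode: the pre-existing receive path, unchanged.
            messages = self.client.receive(self.queue, self.name)
        for message in messages:
            self.handler(message)
        return len(messages)


class FakeClient:
    """In-memory fake that records which receive path was taken."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def receive(self, queue: str, consumer: str) -> List[str]:
        self.calls.append("receive")
        return ["m1"]

    def receive_coop(
        self,
        queue: str,
        consumer: str,
        subconsumer: str,
        dead_interval: Optional[str] = None,
    ) -> List[str]:
        self.calls.append("receive_coop")
        return ["m2"]
```

Constructing ``SketchConsumer(fake, "q", "c", handler, subconsumer="worker-1")`` and calling ``poll_once()`` records a ``receive_coop`` call on the fake. Omitting ``subconsumer`` records ``receive`` instead.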
Add an "Experimental: cooperative consumers" section to the Python client README with the recommended caveat block and a two-worker example, plus a parity-matrix row in clients/README.md flagging the feature as experimental and pointing readers to the per-client docs.
Two-thread cooperative consumers walkthrough that publishes events across multiple ticks so each subconsumer claims a different batch. Prints per-message handler lines and a per-worker count summary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
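The property the walkthrough is meant to show (each subconsumer claims a different batch) can be checked mechanically from the per-worker results. The helper below is a hypothetical sketch, not part of the demo script. It assumes each worker collects the IDs of the events it handled.

```python
from collections import Counter
from typing import Dict, List


def summarize_split(per_worker: Dict[str, List[int]], expected_total: int) -> dict:
    """Summarize how events were split across cooperative workers.

    per_worker maps a worker name to the event IDs it handled (assumed shape).
    """
    # Per-worker count summary, similar in spirit to what the demo prints.
    counts = {worker: len(ids) for worker, ids in per_worker.items()}
    # Count how many times each event ID was delivered across all workers.
    seen = Counter(i for ids in per_worker.values() for i in ids)
    duplicates = sorted(i for i, n in seen.items() if n > 1)
    return {
        "counts": counts,
        "duplicates": duplicates,
        # Complete means every expected event arrived exactly once.
        "complete": len(seen) == expected_total and not duplicates,
    }
```

With made-up IDs, ``summarize_split({"worker-1": [1, 2], "worker-2": [3, 4]}, 4)`` reports two events per worker, no duplicates, and ``complete`` as ``True``.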
Drives N cooperative subconsumers under one logical consumer through a tight receive_coop -> ack loop and reports CSV plus a PNG chart of total events/sec versus N. Workers use autocommit psycopg connections so the FOR UPDATE on the cooperative main row is released between batches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drop in a pointer to bench/coop_demo.py and embed bench/coop_scaling.png with a short interpretation paragraph that explains why the throughput curve plateaus at higher N. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Re-render with the latest measurement run so the PNG matches the CSV captured in the PR comment evidence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The default coop scaling curve with a no-op handler is monotonically decreasing in N -- pure FOR UPDATE contention on the cooperative main row. That's an honest result but it's the wrong story for users arriving at the README, which exists to motivate why cooperative subconsumers are useful. Add a --handler-work-ms flag (default 1.0) that has each worker time.sleep between receive_coop and ack to simulate per-message handler latency. Python releases the GIL during time.sleep so threads genuinely parallelize, and throughput then scales roughly linearly with N until it plateaus at the FOR UPDATE saturation point. Use --handler-work-ms 0 to reproduce the contention-only curve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
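The claim that sleeping threads overlap can be seen in isolation with the sketch below. It is not the benchmark: there is no database and no ``receive_coop`` or ``ack`` call. ``run_workers`` is a hypothetical helper in which ``time.sleep`` stands in for handler work.

```python
import threading
import time


def run_workers(n_workers: int, batches_per_worker: int, work_s: float) -> float:
    """Run n_workers threads that each sleep per batch; return wall-clock seconds."""

    def worker() -> None:
        for _ in range(batches_per_worker):
            # Stand-in for handler work done between receive_coop and ack.
            # time.sleep releases the GIL, so other threads run meanwhile.
            time.sleep(work_s)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Four workers finish in roughly the same wall-clock time as one, even though they do four times the simulated work. That is the source of the near-linear part of the curve, and it says nothing about contention on the cooperative main row.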
Re-runs coop_scaling.py with the new --handler-work-ms 1.0 default and commits the resulting CSV alongside the PNG. Curve rises from ~949 ev/s at N=1 to ~7,005 ev/s at N=16 (about 7.4x), with the slope softening between N=8 and N=16 as the FOR UPDATE on the cooperative main row starts to dominate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
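The "about 7.4x" figure follows directly from the two endpoints quoted above. A small worked calculation, using only those numbers, also gives the per-worker efficiency at N=16:

```python
baseline_eps = 949.0   # events/sec at N=1, from the commit message
peak_eps = 7005.0      # events/sec at N=16, from the commit message
workers = 16

# Speedup relative to a single subconsumer.
speedup = peak_eps / baseline_eps
# Fraction of ideal linear scaling achieved at N=16.
efficiency = speedup / workers

print(f"speedup: {speedup:.2f}x")        # speedup: 7.38x
print(f"efficiency: {efficiency:.0%}")   # efficiency: 46%
```

An efficiency below 50% at N=16 is consistent with the slope softening described in the commit message.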
Rewrite the interpretation paragraph under #### Scaling so it matches the new rise-then-soften shape produced with a 1 ms per-message handler. Adds a pointer to --handler-work-ms 0 for the contention-only curve and keeps the existing warning that adding normal consumers (separate register_consumer / subscribe rows) does not share work -- each one is its own fan-out cursor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier scaling benchmark used an artificial sleep handler that mostly measured GIL-released thread-sleep parallelism rather than PgQue cooperative-allocator scaling. A unified bench across all three drivers will land separately.
Remove the chart image, scaling interpretation, and reproduction note tied to the preliminary bench. The experimental caveat, API docs, and demo pointer remain.
778cfd7 to dc41078
Summary
- ``subscribe_subconsumer``, ``unsubscribe_subconsumer``, ``receive_coop``, and ``touch_subconsumer`` methods on ``PgqueClient``. Each maps 1:1 to its ``pgque.*`` SQL counterpart and reuses the existing ``_wrap_sql_error`` path.
- ``Consumer`` with ``subconsumer`` and ``dead_interval`` parameters. When ``subconsumer`` is set, the poll loop calls ``client.receive_coop(...)`` instead of ``client.receive(...)``. ``dead_interval`` only applies in coop mode (raises ``ValueError`` otherwise). No automatic heartbeat; ``client.touch_subconsumer`` is the manual primitive.
- Parity-matrix row in ``clients/README.md`` flagging it experimental.

The cooperative-consumers SQL surface itself is already on ``main``; this PR is purely the Python-client layer over it.

Test plan
- ``PGQUE_TEST_DSN=postgres://nik@localhost/pgque_coop_py pytest clients/python/tests``: 67 passed (9 new in ``test_coop.py``):
  - ``subscribe_subconsumer`` returns 1 then 0
  - ``receive_coop`` returns messages from a published batch and ``ack`` finishes them
  - ``unsubscribe_subconsumer(..., batch_handling=0)`` raises on active batch
  - ``unsubscribe_subconsumer(..., batch_handling=1)`` routes through retry/DLQ
  - ``touch_subconsumer`` returns 1 on a registered row
  - ``Consumer(subconsumer="worker-1")`` dispatches a handler and acks
  - ``Consumer`` without ``subconsumer`` is unchanged (regression)
  - ``Consumer(dead_interval=..., subconsumer=None)`` raises ``ValueError``
- Manual two-worker e2e (``/tmp/coop_e2e.py``): N=40 events across 4 ticks, both ``Consumer`` instances configured with ``subconsumer="worker-1"``/``worker-2``. Output:

Disjoint subsets, sums to N, no duplicate delivery, which confirms cooperative allocation works end-to-end through the high-level ``Consumer``.

🤖 Generated with Claude Code