fix: restrict cross-consumer primitives (#106)#198
Conversation
Two pgque_reader consumers, A and B. Assert that B cannot reach A's active batch / cursor via the low-level PgQ-compatible primitives: A) pgque.register_consumer_at() -- repositions A's cursor B) pgque.event_retry() -- pushes A's events to retry C) pgque.get_batch_events() -- leaks A's active payloads Test is RED on main: register_consumer_at lets B reposition cons_a's cursor under pgque_reader privileges. Lock the trusted-operator contract so this regresses loudly. Refs #106.
The PgQ-compatible primitives below operate by (queue, consumer) name or by raw batch id and do NOT validate caller context. Any role with the grant can reach into another consumer's active batch / cursor: - register_consumer_at(queue, victim, tick): repositions any consumer's cursor; clears sub_batch, rewrites sub_last_tick. - event_retry(batch_id, ev_id, n): pushes any consumer's events into the retry queue. - get_batch_events(batch_id): leaks any consumer's active payloads. Mitigation for v0.2.0: revoke from pgque_reader, grant only to pgque_admin. The high-level API (pgque.receive / ack / nack) is SECURITY DEFINER and reaches the primitives via the function owner, so application code that uses the modern API is unaffected. Apps that need the PgQ-compatible primitive layer must run as pgque_admin. Per-queue / per-consumer ACLs would close the consumer-vs-consumer boundary in a finer-grained way; that is a multi-week design exercise tracked separately. The smallest-blast-radius fix for v0.2.0 is the trusted-operator gate plus an explicit comment on each function. Source roles.sql, assembled pgque.sql, pg_tle pgque-tle.sql, and docs/reference.md are kept consistent. Closes #106.
|
Closing this PR per maintainer direction. This PR was created by an orchestration agent that picked #106 as a top-5 v0.2.0 must-fix without checking the merged-PR history. #106 was already addressed earlier in the session: producer-side via PR #163 (PgQ producer/consumer role split) and consumer-vs-consumer via PR #170 (3-arg This PR adds a third layer (revoke If the residual primitive-level exposure is worth re-opening for v0.2.0 or v0.2.1, the maintainer can re-open this PR and the corresponding issue. Closing without merge. No further automated work; session unsubscribed. Generated by Claude Code |
Testing evidenceAll commands run on local Postgres 16 (Ubuntu 24.04, PG 16.13) against Red / green for the regression testThe red phase reproduces finding A from the issue body Full SQL suite + acceptance + idempotencyDriver suitesNo driver suite needed adjustment. The Repro of finding A on origin/main (without the test fixture)For completeness, the bare-bones repro from the issue body, run as a Same call returns https://claude.ai/code/session_01TwKyarGpkGKN8LY1CV8dsG Generated by Claude Code |
Summary
Closes #106. Fix the cross-consumer interference paths in the low-level
PgQ-compatible primitive surface.
The three primitives flagged in #106 operate by
(queue, consumer)namepair or by raw batch id and do not validate caller context. Before
this PR, any role with the grant could reach into another consumer's
active batch / cursor. Each finding is an independent attack vector; in
v0.2.0 they all collapse onto whichever role is granted execute on these
functions.
Findings (verbatim from #106) and the surface that exposes each
register_consumer_at(queue, victim, tick)repositions anyconsumer's cursor. Clears
sub_batch, clearssub_next_tick, rewritessub_last_tick. Drops the victim's active batch state and can causemessage loss or duplicate delivery.
event_retry(batch_id, ev_id, n)pushes any consumer's eventsinto the retry queue, causing duplicate processing after the victim
acks and retry-count pollution. Both
(timestamptz)and(integer)overloads are separate privilege rows.
get_batch_events(batch_id)leaks any consumer's activebatch payloads (full
ev_data).Design choice
The issue lists two end-states. Option 1 (treat the primitives as
trusted-operator surface and document loudly) is shipped here because
option 2 (per-queue/per-consumer ACLs) is a multi-week design
exercise that touches the "sacred" PgQ engine and changes function
signatures. Option 1 closes the immediate exposure with a one-line
grant change per primitive.
Mitigation
Revoke from
pgque_reader, grant only topgque_adminfor:pgque.register_consumer_at(text, text, bigint)pgque.get_batch_events(bigint)pgque.event_retry(bigint, bigint, timestamptz)pgque.event_retry(bigint, bigint, integer)The high-level API (
pgque.receive,pgque.ack,pgque.nack,pgque.subscribe) isSECURITY DEFINERand reaches the primitives viathe function owner, so application code that uses the modern API is
unaffected. Apps that genuinely need the PgQ-compatible primitive
layer must run as
pgque_admin(or as a role grantedpgque_admin).A
comment on functionis added to each affected primitive so anyoneinspecting via
\df+sees the trust contract without having to readdocs/reference.md.docs/reference.mditself gets a callout box atthe top of the consumer-primitives section and per-function "Trusted
operator only" notes. The role grants table is updated to reflect the
new shape.
The source
sql/pgque-additions/roles.sql, the assembledsql/pgque.sql, and the pg_tle wrappersql/pgque-tle.sqlare allkept consistent.
Why not also fix
next_batch/finish_batchThese two primitives are also batch-id / consumer-name-keyed and have
the same shape, but they are reachable through the
pgque.receive/pgque.ackmodern API path that application code already uses. Pushingthem to admin-only would force every consumer to either run as
pgque_adminor rewrite viareceive+ack(the recommended pathanyway). Driver tests exercise
receive+ack, notnext_batch/finish_batchdirectly, so we keep them onpgque_readerfor v0.2.0.The narrower "writer cannot call them" boundary is already locked by
tests/test_security_producer_isolation.sql. A follow-up can revisitthis once finer-grained ACLs land.
Test plan
tests/test_security_cross_consumer.sql: red onorigin/main,green after the fix. Two
pgque_readerroles A and B; B cannotregister_consumer_atcons_a,event_retrycons_a's batch (bothoverloads), or
get_batch_eventscons_a's batch.psql -f tests/run_all.sql: ALL TESTS PASSED on PG16.psql -f tests/acceptance/run_acceptance.sql: ALL ACCEPTANCETESTS PASSED on PG16.
tests/test_install_idempotency.sql: green.Evidence is posted as a PR comment.
https://claude.ai/code/session_01TwKyarGpkGKN8LY1CV8dsG
Generated by Claude Code