fix: restrict cross-consumer primitives (#106) #198

Closed
NikolayS wants to merge 2 commits into main from fix/cross-consumer-isolation-106

NikolayS commented May 4, 2026

Summary

Closes #106. Fix the cross-consumer interference paths in the low-level
PgQ-compatible primitive surface.

The three primitives flagged in #106 operate on a (queue, consumer) name
pair or on a raw batch id and do not validate caller context. Before
this PR, any role granted EXECUTE on them could reach into another
consumer's active batch / cursor. Each finding is an independent attack
vector; in v0.2.0 all three are open to whichever role holds that grant.

Findings (verbatim from #106) and the surface that exposes each

  • A — register_consumer_at(queue, victim, tick) repositions any
    consumer's cursor. Clears sub_batch, clears sub_next_tick, rewrites
    sub_last_tick. Drops the victim's active batch state and can cause
    message loss or duplicate delivery.
  • B — event_retry(batch_id, ev_id, n) pushes any consumer's events
    into the retry queue, causing duplicate processing after the victim
    acks and retry-count pollution. Both (timestamptz) and (integer)
    overloads are separate privilege rows.
  • C — get_batch_events(batch_id) leaks any consumer's active
    batch payloads (full ev_data).

Design choice

The issue lists two end-states. Option 1 (treat the primitives as
trusted-operator surface and document loudly) is shipped here because
option 2 (per-queue/per-consumer ACLs) is a multi-week design
exercise that touches the "sacred" PgQ engine and changes function
signatures. Option 1 closes the immediate exposure with a one-line
grant change per primitive.

Mitigation

Revoke from pgque_reader, grant only to pgque_admin for:

  • pgque.register_consumer_at(text, text, bigint)
  • pgque.get_batch_events(bigint)
  • pgque.event_retry(bigint, bigint, timestamptz)
  • pgque.event_retry(bigint, bigint, integer)
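
As a rough sketch, the grant change for the four signatures above could look like the following. The statements are standard PostgreSQL, but the exact text in sql/pgque-additions/roles.sql may differ. The revoke from public is an assumption: PostgreSQL grants EXECUTE on new functions to PUBLIC by default, so revoking from pgque_reader alone only helps if PUBLIC does not also hold the grant.

```sql
-- Sketch only; signatures are taken from the list above.
revoke execute on function
  pgque.register_consumer_at(text, text, bigint),
  pgque.get_batch_events(bigint),
  pgque.event_retry(bigint, bigint, timestamptz),
  pgque.event_retry(bigint, bigint, integer)
from pgque_reader;

-- Assumption: also drop the default PUBLIC grant, in case it is still present.
revoke execute on function
  pgque.register_consumer_at(text, text, bigint),
  pgque.get_batch_events(bigint),
  pgque.event_retry(bigint, bigint, timestamptz),
  pgque.event_retry(bigint, bigint, integer)
from public;

-- Only the trusted-operator role keeps direct access.
grant execute on function
  pgque.register_consumer_at(text, text, bigint),
  pgque.get_batch_events(bigint),
  pgque.event_retry(bigint, bigint, timestamptz),
  pgque.event_retry(bigint, bigint, integer)
to pgque_admin;
```

Both event_retry overloads are listed because each overload is a separate privilege row.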

The high-level API (pgque.receive, pgque.ack, pgque.nack,
pgque.subscribe) is SECURITY DEFINER and reaches the primitives via
the function owner, so application code that uses the modern API is
unaffected. Apps that genuinely need the PgQ-compatible primitive
layer must run as pgque_admin (or as a role granted pgque_admin).
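
The SECURITY DEFINER behaviour described above can be illustrated in isolation. This is a minimal sketch with hypothetical names (the demo schema, the demo_reader role, and both functions); it is not pgque code and assumes a superuser session on a scratch database.

```sql
-- Hypothetical objects for illustration only.
create schema demo;
create role demo_reader;
grant usage on schema demo to demo_reader;

-- Stand-in for a low-level primitive: no caller validation.
create function demo.primitive(batch_id bigint) returns text
language sql
as $$ select 'payload for batch ' || batch_id::text $$;
revoke execute on function demo.primitive(bigint) from public;

-- Stand-in for a high-level API function: runs as its owner.
create function demo.wrapper(batch_id bigint) returns text
language sql
security definer
set search_path = demo, pg_temp
as $$ select demo.primitive(batch_id) $$;
revoke execute on function demo.wrapper(bigint) from public;
grant execute on function demo.wrapper(bigint) to demo_reader;

set role demo_reader;
select demo.wrapper(1);    -- allowed: the body executes with the owner's privileges
select demo.primitive(1);  -- fails with a permission-denied error for demo_reader
reset role;
```

Pinning search_path on the SECURITY DEFINER function is a general PostgreSQL hardening practice; whether pgque's wrappers do the same is not shown in this PR.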

A COMMENT ON FUNCTION is added to each affected primitive so that anyone
inspecting it via \df+ sees the trust contract without having to read
docs/reference.md. docs/reference.md itself gets a callout box at
the top of the consumer-primitives section and per-function "Trusted
operator only" notes. The role grants table is updated to match the
new grants.
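
A sketch of what such a function comment could look like. The comment wording here is illustrative and is not the text added by this PR.

```sql
-- Illustrative wording only.
comment on function pgque.get_batch_events(bigint) is
  'Trusted operator only (pgque_admin): does not validate caller context. See docs/reference.md.';

-- Read the comment back in plain SQL (psql shows the same text under \df+).
select obj_description('pgque.get_batch_events(bigint)'::regprocedure, 'pg_proc');
```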

The source sql/pgque-additions/roles.sql, the assembled
sql/pgque.sql, and the pg_tle wrapper sql/pgque-tle.sql are all
kept consistent.

Why not also fix next_batch / finish_batch

These two primitives are also keyed by batch id / consumer name and have
the same shape, but they are reachable through the pgque.receive /
pgque.ack modern API path that application code already uses. Making
them admin-only would force every consumer that calls them directly to
either run as pgque_admin or be rewritten to use receive + ack (the
recommended path anyway). Driver tests exercise receive + ack, not
next_batch / finish_batch directly, so we keep them on pgque_reader for
v0.2.0. The narrower "writer cannot call them" boundary is already
locked by tests/test_security_producer_isolation.sql. A follow-up can
revisit this once finer-grained ACLs land.

Test plan

  • tests/test_security_cross_consumer.sql: red on origin/main,
    green after the fix. Two pgque_reader roles A and B; B cannot
    register_consumer_at cons_a, event_retry cons_a's batch (both
    overloads), or get_batch_events cons_a's batch.
  • psql -f tests/run_all.sql: ALL TESTS PASSED on PG16.
  • psql -f tests/acceptance/run_acceptance.sql: ALL ACCEPTANCE
    TESTS PASSED on PG16.
  • Install + reinstall + tests/test_install_idempotency.sql: green.
  • Python driver suite (54 tests): all passed.
  • Go driver suite: all passed.
  • TypeScript driver suite (37 tests): all passed.
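
For readers unfamiliar with this kind of negative test, the check for finding A can be sketched roughly as follows. This is not the contents of tests/test_security_cross_consumer.sql: the queue name cc_queue and the tick value are assumptions, and the session is assumed to be allowed to SET ROLE to the second reader role.

```sql
-- Sketch of a "must be denied" assertion; cc_queue and tick 1 are hypothetical.
set role cc_consumer_b;

do $$
begin
  -- If this call succeeds, consumer B has repositioned consumer A's cursor.
  perform pgque.register_consumer_at('cc_queue', 'cons_a', 1);
  raise exception 'FAIL #106-A: cc_consumer_b repositioned cons_a cursor';
exception
  -- SQLSTATE 42501: the expected outcome once the grant is revoked.
  when insufficient_privilege then
    raise notice 'PASS #106-A (register_consumer_at): denied to pgque_reader';
end
$$;

reset role;
```

The handler catches only insufficient_privilege, so the explicit FAIL exception (SQLSTATE P0001) propagates and makes the script fail when psql runs it with ON_ERROR_STOP.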

Evidence is posted as a PR comment.

https://claude.ai/code/session_01TwKyarGpkGKN8LY1CV8dsG


Generated by Claude Code

claude added 2 commits May 4, 2026 07:22
Two pgque_reader consumers, A and B. Assert that B cannot reach
A's active batch / cursor via the low-level PgQ-compatible
primitives:

  A) pgque.register_consumer_at()  -- repositions A's cursor
  B) pgque.event_retry()           -- pushes A's events to retry
  C) pgque.get_batch_events()      -- leaks A's active payloads

Test is RED on main: register_consumer_at lets B reposition
cons_a's cursor under pgque_reader privileges. Lock the
trusted-operator contract so this regresses loudly.

Refs #106.

The PgQ-compatible primitives below operate by (queue, consumer) name
or by raw batch id and do NOT validate caller context. Any role with
the grant can reach into another consumer's active batch / cursor:

- register_consumer_at(queue, victim, tick): repositions any
  consumer's cursor; clears sub_batch, rewrites sub_last_tick.
- event_retry(batch_id, ev_id, n): pushes any consumer's events into
  the retry queue.
- get_batch_events(batch_id): leaks any consumer's active payloads.

Mitigation for v0.2.0: revoke from pgque_reader, grant only to
pgque_admin. The high-level API (pgque.receive / ack / nack) is
SECURITY DEFINER and reaches the primitives via the function owner,
so application code that uses the modern API is unaffected. Apps
that need the PgQ-compatible primitive layer must run as pgque_admin.

Per-queue / per-consumer ACLs would close the consumer-vs-consumer
boundary in a finer-grained way; that is a multi-week design
exercise tracked separately. The smallest-blast-radius fix for
v0.2.0 is the trusted-operator gate plus an explicit comment on
each function.

Source roles.sql, assembled pgque.sql, pg_tle pgque-tle.sql, and
docs/reference.md are kept consistent.

Closes #106.

NikolayS commented May 4, 2026

Closing this PR per maintainer direction.

This PR was created by an orchestration agent that picked #106 as a top-5 v0.2.0 must-fix without checking the merged-PR history. #106 was already addressed earlier in the session: producer-side via PR #163 (PgQ producer/consumer role split) and consumer-vs-consumer via PR #170 (3-arg ack(queue, consumer, batch_id) and 6-arg nack(...) with server-side ownership check). #106 was closed as completed.

This PR adds a third layer: it revokes register_consumer_at / event_retry / get_batch_events from pgque_reader and makes them admin-only. That is option 1 from the original issue, and it addresses the residual exposure on the low-level PgQ-compatible primitive surface. It is not a duplicate diff, but it is a scope expansion the maintainer didn't pick for v0.2.0.

If the residual primitive-level exposure is worth re-opening for v0.2.0 or v0.2.1, the maintainer can re-open this PR and the corresponding issue.

Closing without merge. No further automated work; session unsubscribed.


Generated by Claude Code

NikolayS closed this May 4, 2026

NikolayS commented May 4, 2026

Testing evidence

All commands run on local Postgres 16 (Ubuntu 24.04, PG 16.13) against
a fresh pgque_test database.

Red / green for the regression test

=== Phase 1 (RED): test_security_cross_consumer.sql against origin/main install ===
[install: origin/main sql/pgque.sql at e619bad]
psql:tests/test_security_cross_consumer.sql:124: ERROR:  FAIL #106-A: unexpected error: P0001 / FAIL #106-A: cc_consumer_b repositioned cons_a cursor
psql exit code: 3 (3 = error, expected RED)

=== Phase 2 (GREEN): same test against branch HEAD ===
[install: branch HEAD sql/pgque.sql]
psql:tests/test_security_cross_consumer.sql:124: NOTICE:  PASS #106-A (register_consumer_at): denied to pgque_reader
psql:tests/test_security_cross_consumer.sql:153: NOTICE:  PASS #106-B (event_retry int): denied to pgque_reader
psql:tests/test_security_cross_consumer.sql:153: NOTICE:  PASS #106-B (event_retry timestamptz): denied to pgque_reader
psql:tests/test_security_cross_consumer.sql:171: NOTICE:  PASS #106-C (get_batch_events): denied to pgque_reader
PASS: cross-consumer primitive isolation (#106)
psql exit code: 0 (0 = success, expected GREEN)

The red phase reproduces finding A from the issue body
(register_consumer_at lets a second pgque_reader reposition the
victim's cursor). Findings B and C are blocked indirectly on
origin/main only because event_retry and get_batch_events cascade
into batch_event_sql, which has no pgque_reader grant — that
defense is fragile and disappears the moment anyone broadens that
internal grant. The green commit closes the trust gap directly at the
public surface.
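
One way to see which layer is actually doing the blocking is to list the effective EXECUTE privileges of both roles on the public primitives and on the internal helper. The query below is a sketch using standard catalog functions; it assumes the pgque schema and the pgque_reader / pgque_admin roles exist in the target database.

```sql
-- Effective EXECUTE privileges per function signature.
select p.oid::regprocedure as signature,
       has_function_privilege('pgque_reader', p.oid, 'execute') as reader_can_execute,
       has_function_privilege('pgque_admin',  p.oid, 'execute') as admin_can_execute
from pg_proc p
join pg_namespace n on n.oid = p.pronamespace
where n.nspname = 'pgque'
  and p.proname in ('register_consumer_at', 'event_retry',
                    'get_batch_events', 'batch_event_sql')
order by p.proname, p.oid;
```

has_function_privilege reports effective privileges, including those inherited through PUBLIC or role membership, so it also catches a grant that was broadened elsewhere.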

Full SQL suite + acceptance + idempotency

=== tests/run_all.sql (full SQL regression suite, PG16) ===
=== ALL TESTS PASSED ===

=== tests/acceptance/run_acceptance.sql ===
US-11: PASSED
=== ALL ACCEPTANCE TESTS PASSED ===

=== install idempotency: re-run install + idempotency test ===
reinstall: ok
(install_idempotency_pass = 1)

Driver suites

=== Python driver suite ===
54 passed in 28.15s

=== Go driver suite ===
ok  github.com/NikolayS/pgque-go  12.350s

=== TypeScript driver suite ===
Test Files  3 passed (3)
     Tests  37 passed (37)

No driver suite needed adjustment. The pgque.receive / pgque.ack /
pgque.nack SECURITY DEFINER wrappers reach the now-admin-only
primitives via the function owner, so application code is unaffected.

Repro of finding A on origin/main (without the test fixture)

For completeness, the bare-bones repro from the issue body, run as a
naked pgque_reader against an origin/main install:

[install: origin/main sql/pgque.sql at e619bad]
--- attack A: register_consumer_at against cons_a ---
 register_consumer_at
----------------------
                    0
(1 row)

attack succeeded: B repositioned A's cursor (returned 0 = "already
registered, repositioned"; sub_batch and sub_next_tick are now NULL).

The same call on the branch HEAD returns permission denied for function
register_consumer_at.

https://claude.ai/code/session_01TwKyarGpkGKN8LY1CV8dsG


Generated by Claude Code

Development

Successfully merging this pull request may close these issues.

Security: low-level primitives let writers mutate/read other consumers' active batches
