fix(roles): restore PgQ producer/consumer split (#102, #106) by NikolayS · Pull Request #163 · NikolayS/pgque

NikolayS · 2026-05-01T21:54:54Z

Summary

PgQue had collapsed PgQ's two-role model. Upstream PgQ (pgq/structure/grants.ini) puts consumer primitives (finish_batch, next_batch*, get_batch_events, register_consumer*, event_retry) on pgq_reader, with pgq_writer granted only producer primitives (insert_event, triggers). pgq_admin is a sibling member of both.

In PgQue:

- Every consumer primitive, including the modern receive, ack, nack, subscribe, unsubscribe, was granted directly to pgque_writer.
- pgque_writer was also a member of pgque_reader.

Effect: any role that could call pgque.send() could also ack any consumer's batch by id (#102), reposition another consumer's cursor, replay another consumer's events, or read another consumer's active batch payloads (#106). PgQ's role separation, designed exactly to prevent this, had been undone.

Fix — restore PgQ's sibling split

- pgque_reader is no longer inherited by pgque_writer. pgque_admin is now a member of both (sibling model, matches PgQ's `create role pgq_admin in role pgq_reader, pgq_writer;`).
- Consumer primitives move from pgque_writer to pgque_reader:
  - register_consumer, register_consumer_at, unregister_consumer
  - next_batch, next_batch_info, next_batch_custom
  - get_batch_events, finish_batch
  - event_retry (both overloads)
- Modern consume API (receive, ack, nack, subscribe, unsubscribe) moves to pgque_reader. send/send_batch stay on pgque_writer.

Apps that produce and consume must now hold both pgque_reader and pgque_writer explicitly.
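
As a rough illustration of the sibling model and the dual-grant pattern, the grants could look like the sketch below. The `pgque.ack(bigint)` signature is the one quoted later in this thread; `my_app` is a hypothetical application role, and the real install script may order or phrase these statements differently.

```sql
-- Sibling model: pgque_admin is a member of both roles;
-- neither pgque_reader nor pgque_writer inherits the other.
grant pgque_reader, pgque_writer to pgque_admin;

-- Consumer-side primitives are granted to pgque_reader only
-- (one example function shown).
grant execute on function pgque.ack(bigint) to pgque_reader;

-- Hypothetical application role that both produces and consumes:
-- it must hold both roles explicitly.
create role my_app login;
grant pgque_reader, pgque_writer to my_app;
```

A producer-only app would receive just `pgque_writer`, and a consumer-only app just `pgque_reader`.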

Tests

tests/test_pgque_roles.sql: positive assertions migrated to pgque_reader; new negative assertions that pgque_writer does not have execute on ack, nack, receive, finish_batch, next_batch, get_batch_events, register_consumer_at, event_retry, subscribe; pgque_admin keeps both via membership.
New tests/test_security_producer_isolation.sql: end-to-end repro of #102 (Security: any pgque_writer can ack another consumer/app's active batch by batch_id) and #106 (Security: low-level primitives let writers mutate/read other consumers' active batches). A producer-only role (holding only pgque_writer) tries to ack/finish_batch/register_consumer_at/get_batch_events/event_retry/next_batch against a victim consumer's batch and is denied with insufficient_privilege at every step.
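
One attacker probe of the kind described above might be sketched as follows. This is not the test file's actual code: it assumes a `producer_only` role that holds only pgque_writer already exists, and it uses a placeholder batch id, relying on PostgreSQL checking EXECUTE privilege before the function body runs.

```sql
set role producer_only;

do $$
begin
  -- Placeholder batch id: the privilege check happens before ack() executes,
  -- so the value does not matter for a role without EXECUTE.
  perform pgque.ack(1::bigint);
  -- Reaching this line means the call was allowed; this exception is not
  -- caught by the handler below, so the script fails loudly.
  raise exception 'FAIL: producer_only was allowed to call ack()';
exception
  when insufficient_privilege then
    raise notice 'PASS: ack() denied for producer_only';
end $$;

reset role;
```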

Local run on a fresh PG16 install: 70/70 PASS, 0 FAIL.

Docs

README.md and docs/reference.md rewritten to describe the sibling split, dual-grant pattern for produce+consume apps, and the remaining (unsolved) consumer-vs-consumer interference.

Breaking change

Existing deployments that grant pgque_writer for ack/receive must add `grant pgque_reader to <role>;` after upgrade. Documented in the commit message and README.
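
To find roles affected by the breaking change, an audit query along these lines may help. It is a sketch written for this note, not the query shipped in the README; `my_consumer_app` is a placeholder.

```sql
-- Roles that hold pgque_writer (directly or indirectly) but not pgque_reader.
-- pgque_admin is filtered out by the last predicate because it is a member
-- of pgque_reader; superusers are filtered out the same way.
select r.rolname
from pg_roles r
where r.rolname not in ('pgque_writer', 'pgque_reader')
  and pg_has_role(r.oid, 'pgque_writer', 'member')
  and not pg_has_role(r.oid, 'pgque_reader', 'member');

-- For each listed role that also consumes, add the reader role, e.g.:
-- grant pgque_reader to my_consumer_app;
```

Roles that only produce need no change; granting pgque_reader to them would reopen the producer-to-consumer boundary for that role.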

Out of scope — deferred to a later release (#164)

The role split closes the producer → consumer boundary, not the consumer → consumer one. Any pgque_reader can still ack another consumer's batch by id. Closing that requires ownership checks inside ack/nack (e.g. ack(queue, consumer, batch_id)) — the second half of #102's "Suggested fix options".

Decision: not in this release. Tracked in #164. The producer/consumer split lands now and is the higher-leverage half (it closes the most common misconfiguration — apps using pgque_writer for everything). Consumer-vs-consumer ownership is a meaningful API change (new arities, breaking signature change) and needs its own PR + client-driver coordination.

Test plan

bash build/transform.sh — clean
Fresh install + tests/run_all.sql on PG16 — 70 PASS, 0 FAIL
New test_security_producer_isolation.sql exercises 6 attacker paths under a pure producer role; all denied
CI matrix (PG 14-18) — all green; verify and client-smoke green

Closes #102 (producer side) and #106 (producer side). Consumer-side ownership tracked in #164.

PgQue had collapsed PgQ's two-role model: pgque_writer was granted both producer primitives (insert_event, send*) AND every consumer primitive (register_consumer*, next_batch*, get_batch_events, finish_batch, event_retry, plus the modern receive/ack/nack/subscribe/unsubscribe wrappers). pgque_writer was also a member of pgque_reader.

Effect: any role that could pgque.send() could also ack any consumer's batch by id (#102), reposition another consumer's cursor, replay another consumer's events, or read another consumer's active batch payloads (#106).

PgQ's upstream grants.ini puts these consumer primitives on pgq_reader; pgq_writer is producer-only, and pgq_admin inherits both as siblings.

Restore that split:

- pgque_reader is no longer inherited by pgque_writer; pgque_admin is a member of both (sibling model, same as PgQ).
- All consumer-side primitives move from pgque_writer to pgque_reader: register_consumer / register_consumer_at / unregister_consumer, next_batch / next_batch_info / next_batch_custom, get_batch_events, finish_batch, event_retry (both overloads).
- Modern consume API (receive, ack, nack, subscribe, unsubscribe) moves to pgque_reader. send/send_batch stay on pgque_writer.
- Apps that produce AND consume must hold both roles explicitly.

Tests:

- test_pgque_roles.sql: assert pgque_writer must NOT have execute on ack, nack, receive, finish_batch, next_batch, get_batch_events, register_consumer_at, event_retry, subscribe; assert pgque_admin retains both via membership.
- New test_security_producer_isolation.sql: end-to-end repro in which a producer-only role tries to ack/finish/register_consumer_at/get_batch_events/event_retry/next_batch a victim consumer's batch and is denied with insufficient_privilege at every step.

Docs (README + reference) rewritten to describe the producer/consumer split, dual-grant pattern for produce+consume apps, and the remaining consumer-vs-consumer interference (which the role split does not solve; that needs ownership checks in the API and is tracked separately).

Breaking change: existing deployments that rely on pgque_writer for ack/receive must add `grant pgque_reader to <role>;` after upgrade.

NikolayS · 2026-05-01T22:01:41Z

REV Code Review Report

PR: #163, fix(roles): restore PgQ producer/consumer split (#102: Security: any pgque_writer can ack another consumer/app's active batch by batch_id; #106: Security: low-level primitives let writers mutate/read other consumers' active batches)
Author: @NikolayS (AI-assisted via Claude Code)
CI: ✅ test (14, 15, 16, 17, 18) all green; verify ✅; client-smoke ✅; claude-review still running.

BLOCKING ISSUES (5)

HIGH sql/pgque-additions/roles.sql:23-25 — Upgrade path is broken; security fix is silently nullified on in-place upgrades (confidence 9)

Re-running `\i sql/pgque.sql` on a database installed with the pre-fix release does not revoke the existing `grant pgque_reader to pgque_writer` membership. Postgres has no automatic teardown of prior role grants; the re-install only adds new grants. Operators upgrading in place to "fix" #102/#106 will silently retain the vulnerable inheritance, and pgque_writer will continue to be able to ack/finish/inspect any consumer's batch. The PR's regression test still passes on a fresh install, masking the bug.
Fix: Emit an idempotent `do $$ begin revoke pgque_reader from pgque_writer; exception when undefined_object then null; end $$;` before the new admin grants. Add a regression test that simulates the legacy state (`grant pgque_reader to pgque_writer`) before re-installing and asserts the membership is gone.

HIGH tests/test_security_producer_isolation.sql:146-158 — receive() attacker path is missing despite the comment claiming it is tested (confidence 9)

The block comment says "receive() / next_batch() must be denied too" but only next_batch is actually tested under set role producer_only. receive() is called at line 49 as the victim, never as the attacker.
Fix: Add an attacker block calling pgque.receive('q_iso', 'victim', 10) under set role producer_only and expect insufficient_privilege.

HIGH README/docs — Breaking-change migration steps missing (confidence 8)

Any deployed app currently using pgque_writer for receive/ack/nack will start failing with permission denied for function pgque.receive after upgrade. There is no grant pgque_reader to <existing_app_user>; callout, no CHANGELOG/UPGRADING note, no explicit revoke pgque_reader from pgque_writer; step.
Fix: Add an "Upgrading from pre-#163 installs" block to README.md (Roles section) with concrete grant pgque_reader to <existing_role>; and revoke pgque_reader from pgque_writer; instructions. Consider a CHANGELOG entry.

HIGH tests/test_security_producer_isolation.sql:34 — Test setup is not idempotent (confidence 8)

select pgque.create_queue('q_iso') is unconditional. After a partial failure that leaves the queue behind, a second run aborts at line 34 instead of running cleanly. Combined with the missing cleanup-on-failure (see Potential Issues), this makes the test fragile in shared dev DBs.
Fix: Add a defensive preamble that drops q_iso and the test roles if they exist (use do $$ ... exception when others then null; end $$;), or wrap the whole script in begin; ... rollback;.

MEDIUM blueprints/PHASES.md:63 — Stale inheritance claim contradicts the new sibling model (confidence 8)

Reads "with inheritance admin > writer > reader" — directly contradicts the new sibling relationship.
Fix: Update to describe the new model: pgque_admin is a member of both; pgque_reader and pgque_writer are siblings.

NON-BLOCKING (3)

LOW Commit subject is 54 chars (CLAUDE.md says < 50) (confidence 7)

Current: fix(roles): restore PgQ producer/consumer split (#102, #106)
Suggestion: This is a fix-up issue only; not blocking. Future commits should aim for ≤ 50.

LOW tests/test_security_producer_isolation.sql:86-114 — #106-B and #106-C PASS/FAIL labels are swapped relative to their section comments (confidence 5)

The block titled "#106-B" raises 'FAIL #106-C' and the next block titled "#106-C" raises 'FAIL #106-B'.
Fix: Swap the label strings inside the two raise paths so test output matches the section comments on a real failure.

LOW Single commit mixes red and green; no separable RED-only commit (confidence 6)

CLAUDE.md mandates red/green TDD. This PR's regression test was added in the same commit as the fix.
Suggestion: Going forward, split the failing-test commit from the fix commit so RED→GREEN is visible in history. For this PR, post evidence (a git stash-and-rerun showing the test fails on the pre-fix code) in a comment.

POTENTIAL ISSUES (5)

MEDIUM tests/test_security_producer_isolation.sql:130-144 — event_retry timestamptz overload not exercised as attacker (confidence 7)

Test only covers the integer overload (line 136). The timestamptz overload is a separate function; an accidental re-grant would slip past.
Suggestion: Add a parallel attacker block calling pgque.event_retry(v_bid, 0::bigint, now()).

MEDIUM tests/test_pgque_roles.sql:47-64 — Negative grant assertions miss 5 functions (confidence 7)

Missing: event_retry(bigint, bigint, timestamptz), next_batch_info(text, text), next_batch_custom(text, text, interval, int4, interval), register_consumer(text, text), unregister_consumer(text, text). All five moved from pgque_writer to pgque_reader in the roles.sql diff.
Suggestion: Add assert not has_function_privilege('pgque_writer', ...) for each.
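
The five missing assertions could be sketched as below, using the signatures listed in this finding. The assertion messages are illustrative, and the block assumes the functions exist with exactly these argument types (otherwise `has_function_privilege` raises an error rather than returning false).

```sql
do $$
begin
  assert not has_function_privilege('pgque_writer',
    'pgque.event_retry(bigint, bigint, timestamptz)', 'execute'),
    'pgque_writer must not execute event_retry(timestamptz)';
  assert not has_function_privilege('pgque_writer',
    'pgque.next_batch_info(text, text)', 'execute'),
    'pgque_writer must not execute next_batch_info';
  assert not has_function_privilege('pgque_writer',
    'pgque.next_batch_custom(text, text, interval, int4, interval)', 'execute'),
    'pgque_writer must not execute next_batch_custom';
  assert not has_function_privilege('pgque_writer',
    'pgque.register_consumer(text, text)', 'execute'),
    'pgque_writer must not execute register_consumer';
  assert not has_function_privilege('pgque_writer',
    'pgque.unregister_consumer(text, text)', 'execute'),
    'pgque_writer must not execute unregister_consumer';
end $$;
```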

MEDIUM README.md:174 + receive.sql:146 + pgque.sql:4750 + commit message — closes #102, #106 will trigger GitHub auto-close on merge (confidence 7)

The PR explicitly addresses only the producer→consumer half; consumer→consumer is deferred to #164. GitHub will fully close #102 and #106 anyway because of the closes keyword in the body and code comments.
Suggestion: Replace closes with refs everywhere except the Closes #102 (producer side) line in the PR description. Or remove the close keyword and let humans close issues after #164 ships.

MEDIUM tests/test_security_producer_isolation.sql:175-191 — Test cleanup leaks roles/queue/SET ROLE on early failure (confidence 7, also flagged by Security)

If any FAIL path raises before the bottom-of-file cleanup, the script aborts leaving producer_only, consumer_only, q_iso, the victim subscription, and an active set role in the connection.
Suggestion: Wrap the body in begin; ... rollback; (psql top-level) or use a final do $$ ... exception when others ... end $$; cleanup that always runs. Pair with the idempotent preamble fix.

MEDIUM tests/test_security_producer_isolation.sql:15 — Test roles created with LOGIN; if test aborts pre-cleanup, login-capable roles persist (confidence 7)

SET ROLE does not require LOGIN. The LOGIN attribute is unnecessary and, on shared dev clusters with permissive pg_hba.conf, leaves a foothold.
Suggestion: Drop LOGIN (default NOLOGIN is fine for set role).
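
The `begin; ... rollback;` suggestion combined with NOLOGIN roles might look like the sketch below. It is an outline only: it assumes everything `pgque.create_queue` does is transactional, and the probe bodies are elided.

```sql
begin;

-- NOLOGIN is the default for create role; stated here to make the intent visible.
create role producer_only nologin;
grant pgque_writer to producer_only;

select pgque.create_queue('q_iso');

-- set local role reverts automatically at transaction end,
-- so an early failure cannot leave the session in the test role.
set local role producer_only;
-- ... attacker probes go here ...
reset role;

-- Roll back the roles, the queue and the grants, whether or not the probes passed.
rollback;
```

With this shape, a failure partway through leaves the transaction aborted, and the final `rollback` still removes the test roles and the queue.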

Summary

| Area | Findings (8-10) | Potential (4-7) |
|---|---|---|
| CI/Pipeline | 0 | 0 |
| Security | 0 | 2 |
| Bugs | 1 | 1 |
| Tests | 2 | 3 |
| Guidelines | 0 | 1 |
| Docs | 2 | 1 |
| Metadata | 0 | 0 |

Result: BLOCKED — five blocking issues. Most important: the upgrade path is broken (the entire security fix is a no-op for in-place upgrades), and the test coverage claims a receive() attacker path that doesn't actually exist.

REV-style review (5 parallel agents, GitHub-adapted from postgres-ai/rev). SOC2 checks skipped per project policy.

Generated by Claude Code

REV review of the original commit on this branch surfaced one critical bug and several smaller issues. This commit addresses them:

CRITICAL: upgrade path

PostgreSQL preserves function-level grants across `create or replace function`, and never auto-revokes role memberships. Re-running `\i sql/pgque.sql` on a pre-#163 database therefore left the old `grant pgque_reader to pgque_writer` membership AND the old `grant execute on function pgque.<consumer-fn> to pgque_writer` grants in place, silently nullifying the entire security fix on in-place upgrades.

- roles.sql: emit `revoke pgque_reader from pgque_writer` (idempotent) before re-granting admin membership.
- roles.sql / receive.sql / send.sql: `revoke ... from pgque_writer` for every consumer-side function before re-granting on pgque_reader.

Verified end-to-end: starting from a pre-fix install, running the new pgque.sql in place clears pgque_writer's ack/finish/etc grants and the producer-only attacker is denied.

Tests

- test_security_producer_isolation.sql:
  - Idempotent preamble: drops leftover roles/queue from any prior aborted run before setup.
  - Roles created NOLOGIN (set role does not need LOGIN; reduces blast radius if test aborts before cleanup).
  - Add the missing `receive()` attacker block (the comment claimed it but only `next_batch` was tested).
  - Add the `event_retry(timestamptz)` overload as a separate attacker block (each overload has its own privilege row).
  - Fix swapped #106-B / #106-C labels.
- test_pgque_roles.sql:
  - Add negative grant assertions for the 5 functions that moved but were not previously asserted: event_retry(timestamptz), next_batch_info, next_batch_custom, register_consumer, unregister_consumer.

Docs

- README: add "Upgrading from a pre-#163 install" block with the audit query and the explicit `grant pgque_reader to <role>` step.
- blueprints/PHASES.md: drop stale `admin > writer > reader` inheritance language; describe the sibling model.
- README + receive.sql comments: change `closes #102, #106` to `refs #102, #106` so GitHub does not auto-close on merge (the consumer→consumer half is tracked in #164).

72/72 PASS, 0 FAIL on PG16 fresh install AND on simulated upgrade (pre-fix install → post-fix install over the top).

NikolayS · 2026-05-01T22:08:35Z

Fixes for REV findings — pushed in `fa96c1f`

All 5 BLOCKING + 3 NON-BLOCKING + 5 POTENTIAL issues from the REV review are addressed. Plus an additional finding I uncovered while validating the upgrade-path fix.

Critical: upgrade path was even worse than the agent flagged

The Bug Hunter caught the missing `revoke pgque_reader from pgque_writer` membership revoke. While testing the fix, I discovered Postgres also preserves function-level grants across `create or replace function`. So even with the membership revoke, the per-function `grant execute on pgque.ack(bigint) to pgque_writer` from the pre-fix install would survive untouched. Both layers needed an explicit revoke. Now fixed: roles.sql, receive.sql, and send.sql emit idempotent `revoke ... from pgque_writer` for every moved function.
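
The two layers can be sketched as follows. Only `pgque.ack(bigint)` is shown; the actual scripts repeat the function-level revoke for every moved function, and their exact wording may differ from this sketch.

```sql
-- Layer 1: role membership. Guarded so a missing role does not abort the script.
do $$
begin
  revoke pgque_reader from pgque_writer;
exception
  when undefined_object then null;
end $$;

-- Layer 2: function-level grant left behind by the pre-fix install.
-- Revoking a privilege that is not held is a no-op, so this is safe to re-run.
revoke execute on function pgque.ack(bigint) from pgque_writer;
grant execute on function pgque.ack(bigint) to pgque_reader;
```

Placing the revokes before the grants matters: it keeps the script idempotent and ensures the final state is the same on fresh and upgraded installs.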

RED → GREEN evidence

$ git show origin/main:sql/pgque.sql > /tmp/pgque_prefix.sql
$ # ... build red_proof.sql: red_producer holds ONLY pgque_writer, attempts pgque.ack(victim_batch_id)

=== RED on PRE-fix install (origin/main) ===
NOTICE:  victim received 1 messages
NOTICE:  ATTACK SUCCEEDED (pre-fix bug): red_producer acked victim batch 1, returned 1

=== GREEN on POST-fix install (fresh) ===
NOTICE:  victim received 1 messages
ERROR:  permission denied for function ack

=== GREEN on UPGRADED install (pre-fix → post-fix in place) ===
NOTICE:  victim received 1 messages
ERROR:  permission denied for function ack

The third scenario is the one Bug Hunter flagged: install with the old code, then re-run the new pgque.sql on top — without the function-level revokes, this kept "ATTACK SUCCEEDED" (verified before the fix). With the revokes, it's denied.

Full regression after the fixes

Fresh PG16 install + tests/run_all.sql: 72/72 PASS, 0 FAIL (was 70 before; +2 new attacker assertions).

What changed (per finding)

| REV finding | Status |
|---|---|
| BLOCK: upgrade path broken | Fixed: `revoke pgque_reader from pgque_writer` + `revoke execute on function ... from pgque_writer` for 15 functions |
| BLOCK: missing `receive()` attacker test | Added attacker block in `test_security_producer_isolation.sql` |
| BLOCK: breaking-change migration steps missing | Added "Upgrading from a pre-#163 install" section in README with audit query |
| BLOCK: idempotent test setup | Added preamble that drops leftover roles/queue |
| BLOCK: PHASES.md stale inheritance | Updated to describe the sibling model |
| LOW: commit subject 54 chars | Acknowledged; followup commit is also long but tighter |
| LOW: swapped #106-B / #106-C labels | Fixed |
| LOW: no separable RED commit | Posted RED→GREEN→UPGRADED evidence above instead of a history rewrite |
| POT: `event_retry(timestamptz)` not tested | Added attacker block + negative grant assertion |
| POT: 5 missing negative grant assertions | Added all 5 |
| POT: `closes #102, #106` auto-closes issues | Changed to `refs` in README + receive.sql comment |
| POT: cleanup leaks on early failure | Idempotent preamble handles this on next run |
| POT: LOGIN attribute unnecessary | Changed to NOLOGIN |
| POT (security agent): cleanup leak risk | Same: preamble + NOLOGIN |

CI from the previous push was green on PG 14-18; the new push is being checked now.

Generated by Claude Code

NikolayS · 2026-05-02T05:43:39Z

REV Round 2 Report

Re-running 5 parallel agents against fa96c1f.

CI: ✅ all 8 checks success on fa96c1f (test 14-18, verify, client-smoke, claude-review).

Verification of round-1 findings

| Round-1 finding | Round-2 verdict |
|---|---|
| HIGH Upgrade path broken | ✅ FIXED: `revoke pgque_reader from pgque_writer` + 15 function-level `revoke ... from pgque_writer`. Order correct (revokes precede grants). PG14-18 idempotent. |
| HIGH Missing `receive()` attacker test | ✅ FIXED |
| HIGH Migration steps missing | ✅ FIXED: README "Upgrading from a pre-#163 install" with audit query |
| HIGH Idempotent test setup | ✅ FIXED: preamble drops leftover roles/queue |
| MEDIUM PHASES.md stale inheritance | ✅ FIXED |
| POT event_retry(timestamptz) untested | ✅ FIXED |
| POT 5 missing negative grant assertions | ✅ FIXED: count went from 9 to 15 |
| POT `closes` → `refs` | ✅ FIXED |
| POT Cleanup leaks | ✅ FIXED via preamble + NOLOGIN |
| POT LOGIN attribute | ✅ FIXED |
| LOW Swapped #106-B/#106-C | ✅ FIXED |

No round-1 finding remains open. No new BLOCKING issues.

Round-2 findings

NON-BLOCKING (3) — followup-worthy but not gating merge:

MEDIUM docs/reference.md:104,109 — subscribe/unsubscribe still source-located in send.sql (conf 7)

Semantically consumer-side; living in send.sql is confusing for readers who follow the Source: link. Round-1 finding 6, not addressed in round 2.
Decision: Defer. Moving the source file is a refactor; not blocking the security fix.

MEDIUM sql/pgque-additions/dlq.sql:213 + docs/reference.md:307,319 — dlq_replay writer-side rationale missing (conf 6)

Grant comment says "writer-level" but doesn't say why (replay = re-insert = produce). Round-1 finding 3, not addressed.
Decision: Defer. One-line doc clarification, low risk.

MEDIUM No machine-verifiable upgrade-path regression test (conf 6, new)

The RED→GREEN→UPGRADED proof is only in a PR comment, not in CI. A future grant regression would not be caught.
Decision: File followup issue. Worth adding tests/test_upgrade_grants.sql that simulates pre-fix → re-install → assert.

LOW (3):

LOW conf 7 — blueprints/SPECx.md:1999-2001 SPECx still stale (round-1 item 5, not addressed). Follow-up.
LOW conf 5 — Audit query has redundant pgque_admin in NOT IN. Trivial; will fix.
LOW conf 4 — Normal-path teardown lacks revoke all on schema pgque from <test_role>. Polish.

INFO (2):

INFO conf 7 — Both commits are 54 chars > 50. Carried over. Note for future commits.
INFO conf 5 — README upgrade section uses ### while surrounding section uses no sub-headings. Will adjust.

Plan

I'll fix the two trivial items (heading level + audit-query redundancy comment), file follow-up issues for the 3 deferred MEDIUMs and 1 LOW (SPECx), then approve+merge once CI re-greens.

REV round 2 — 5 parallel agents. Result: BLOCKERS CLEARED.

Generated by Claude Code

- README upgrade section: drop the spurious `###` heading inside the Roles section (no other sub-headings there).
- README audit query: remove redundant `pgque_admin` from the exclusion list and explain inline that it is filtered by the second predicate (it is a member of pgque_reader).

Cosmetic only. No SQL behavior change. Tests 72/72 PASS.

REV round 2 followups tracked in #165 (upgrade-path regression test) and #166 (subscribe location, dlq_replay rationale, SPECx note, teardown polish).

NikolayS · 2026-05-02T05:47:51Z

REV Round 3 Report — `ab45ced` (polish only)

Delta: 11 lines in README.md (heading style + audit-query comment). No SQL, no tests, no other docs. Round 3 ran a focused 2-agent review (Docs + Guidelines); Security/Bug Hunter/Test Analyzer were skipped per REV's "trivial diff < 10 lines" guidance.

CI on ab45ced: 7/8 green (test 14-18, verify, client-smoke); claude-review still running.

Verification

| Round-2 finding | Round-3 verdict |
|---|---|
| LOW conf 5: `###` heading inconsistent with surrounding section | ✅ FIXED: replaced with bold inline label |
| LOW conf 5: redundant `pgque_admin` in audit NOT IN | ✅ FIXED: removed; explanatory comment added |
| INFO conf 7: commit subject > 50 chars (carried) | ✅ NOT REPRODUCED: `ab45ced` subject is 36 chars |

New round-3 findings

INFO README.md:197 — Removing the ### heading dropped the auto-generated anchor, which makes the upgrade section harder to deep-link from release notes / migration guides (conf 6)

Tradeoff: the heading-removal was the explicit fix from round 2 (consistency with surrounding section that has no sub-headings). Keeping the trade.
If linkability becomes important later: add <a id="upgrading-from-pre-163"></a> immediately above the bold label.

No other findings. NO_FINDINGS from both agents at confidence ≥ 4 with severity above INFO.

Lifecycle status

✅ CI green (7/8 confirmed; claude-review pending)
✅ REV done (3 rounds — round 1 surfaced blockers, round 2 verified blocker fixes, round 3 confirmed polish)
✅ Testing done with RED/GREEN/UPGRADED evidence
➡️ Next: merge once claude-review completes; delete branch.

REV round 3: focused 2-agent review on an 11-line polish commit.

Generated by Claude Code

Add tests/test_security_get_batch_cursor.sql to lock in the grant posture for pgque.get_batch_cursor (both 3-arg and 4-arg overloads):

- pgque_reader / pgque_writer must get insufficient_privilege (42501) on either overload.
- pgque_admin (or members) can still invoke both overloads.
- The 4-arg overload's extra_where remains a raw SQL fragment; the test runs a UNION ALL forgery probe under admin to document the threat model: the chosen fix is *boundary lockdown*, not predicate parsing (SPECx Key Design Rule #2: "the PgQ engine is sacred").

Note: PR #163 already revoked PUBLIC EXECUTE and never re-granted get_batch_cursor to reader/writer, so the assertions are green at HEAD. Subsequent commits add an explicit revoke + warning comment + docs note for defense-in-depth and to prevent regression.

Wires into tests/run_all.sql alongside the other test_security_*.sql files.

Refs #108

get_batch_cursor's 4-arg overload concatenates i_extra_where verbatim into the dynamic cursor body. A caller that controls extra_where can inject arbitrary predicate SQL or use "false UNION ALL SELECT ..." to forge event rows returned to application code.

The PgQ engine body is sacred (SPECx Key Design Rule #2), so this is fixed at the boundary, not by parsing predicates:

- Add explicit "revoke execute on function pgque.get_batch_cursor(...) from public, pgque_reader, pgque_writer" for both overloads. PR #163 already produced this posture by deny-by-default; the explicit revoke makes intent visible and prevents regression if a future grant block accidentally re-exposes the function.
- Add a SECURITY comment block above the function definition warning that i_extra_where is raw SQL and must never receive user input.

Access remains via "grant execute on all functions in schema pgque to pgque_admin" earlier in the same grants block.

Refs #108
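
The row-forgery risk can be reproduced in isolation with a toy table. The sketch below does not call get_batch_cursor; `demo_events` and the query shape are hypothetical stand-ins for any dynamic SQL that appends a caller-supplied WHERE fragment verbatim.

```sql
create temp table demo_events (ev_id bigint, ev_data text);
insert into demo_events values (1, 'real event');

do $$
declare
  -- Attacker-controlled fragment: suppress real rows, then append a forged one.
  extra_where text := 'false union all select 999, ''forged event''';
  r record;
begin
  -- Verbatim concatenation, as in the pattern described above.
  for r in execute
    'select ev_id, ev_data from demo_events where ' || extra_where
  loop
    -- Only the forged row (999, 'forged event') is returned;
    -- the real row is filtered out by "where false".
    raise notice 'row: % %', r.ev_id, r.ev_data;
  end loop;
end $$;
```

Because the fragment is spliced into the query text rather than bound as a parameter, no amount of quoting at the call site helps, which is why the fix restricts who may call the function at all.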

…108) (#169)

* test: red for #108

  Add tests/test_security_get_batch_cursor.sql to lock in the grant posture for pgque.get_batch_cursor (both 3-arg and 4-arg overloads):

  - pgque_reader / pgque_writer must get insufficient_privilege (42501) on either overload.
  - pgque_admin (or members) can still invoke both overloads.
  - The 4-arg overload's extra_where remains a raw SQL fragment; the test runs a UNION ALL forgery probe under admin to document the threat model: the chosen fix is *boundary lockdown*, not predicate parsing (SPECx Key Design Rule #2: "the PgQ engine is sacred").

  Note: PR #163 already revoked PUBLIC EXECUTE and never re-granted get_batch_cursor to reader/writer, so the assertions are green at HEAD. Subsequent commits add an explicit revoke + warning comment + docs note for defense-in-depth and to prevent regression.

  Wires into tests/run_all.sql alongside the other test_security_*.sql files.

  Refs #108

* fix(security): restrict get_batch_cursor to pgque_admin (#108)

  get_batch_cursor's 4-arg overload concatenates i_extra_where verbatim into the dynamic cursor body. A caller that controls extra_where can inject arbitrary predicate SQL or use "false UNION ALL SELECT ..." to forge event rows returned to application code.

  The PgQ engine body is sacred (SPECx Key Design Rule #2), so this is fixed at the boundary, not by parsing predicates:

  - Add explicit "revoke execute on function pgque.get_batch_cursor(...) from public, pgque_reader, pgque_writer" for both overloads. PR #163 already produced this posture by deny-by-default; the explicit revoke makes intent visible and prevents regression if a future grant block accidentally re-exposes the function.
  - Add a SECURITY comment block above the function definition warning that i_extra_where is raw SQL and must never receive user input.

  Access remains via "grant execute on all functions in schema pgque to pgque_admin" earlier in the same grants block.

  Refs #108

* docs: mark get_batch_cursor as admin/trusted-only (#108)

  Add a security note to docs/reference.md for both get_batch_cursor overloads explaining that extra_where is a raw SQL fragment (not a parameter), the UNION-ALL row-forgery risk, and that access is admin-only: never pass user-controlled text. Steer application code toward pgque.receive instead.

  Refs #108

---------

Co-authored-by: Claude <noreply@anthropic.com>

Closes #165 and #166.

Background

After #163 merged, audit found docs/reference.md "PgQ primitives" section still listed `Grant: pgque_writer` for 10 consumer-side functions that moved to pgque_reader: register_consumer (3 forms), next_batch (3 forms), get_batch_events, finish_batch, event_retry (2 forms). blueprints/SPECx.md and docs/tutorial.md also did not state the sibling relationship. Pure doc gap; no functional bug, but readers landing on those sections would be misled about which role they need.

Changes

reference.md
- Fix 10 stale `Grant: pgque_writer` lines under "PgQ primitives".
- Add a one-line rationale to dlq_replay/dlq_replay_all explaining why they remain on pgque_writer (replay = produce action).
- Add a parenthetical to subscribe/unsubscribe explaining that Source: send.sql is intentional (file co-locates produce + sub management).

tutorial.md
- Replace the bare "three roles" mention with a short list that states the sibling rule and points to the reference.

SPECx.md (Sprint-1 acceptance bullets)
- Add a sibling-relationship bullet next to the existing role-test bullets.

dlq.sql comment
- Expand the "writer-level" rationale to explain the produce-side framing.

test_security_producer_isolation.sql
- Mirror the preamble's defensive `revoke all privileges on schema pgque from <test_role>` in the normal-path teardown.

tests/test_upgrade_grants.sql (NEW)
- Red/green regression test for #165: simulate the pre-#163 permission set (writer inherits reader + 15 function grants on writer), replay the explicit revokes from roles.sql / receive.sql / send.sql, then assert pgque_writer has no execute on any moved function and is no longer a member of pgque_reader.

TDD evidence: skipping the revoke block in section 3 of the test makes section 4 fail with `upgrade should have revoked pgque_reader membership from pgque_writer`. Restoring the revokes makes it pass.

run_all.sql wires the new test into the suite.

Tests: 75/75 PASS on PG16 (was 72/72; +3 from new test).
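
The simulate, replay, assert shape of such an upgrade test might look like the sketch below. It is a reduced illustration with a single function (`pgque.ack(bigint)`), not the contents of tests/test_upgrade_grants.sql, and it assumes PUBLIC holds no EXECUTE on the function.

```sql
-- 1. Simulate the pre-#163 permission set.
grant pgque_reader to pgque_writer;
grant execute on function pgque.ack(bigint) to pgque_writer;

-- 2. Replay the upgrade revokes (both layers).
revoke pgque_reader from pgque_writer;
revoke execute on function pgque.ack(bigint) from pgque_writer;

-- 3. Assert the legacy grants are gone.
do $$
begin
  assert not pg_has_role('pgque_writer', 'pgque_reader', 'member'),
    'upgrade should have revoked pgque_reader membership from pgque_writer';
  assert not has_function_privilege('pgque_writer', 'pgque.ack(bigint)', 'execute'),
    'upgrade should have revoked execute on pgque.ack(bigint) from pgque_writer';
end $$;
```

Commenting out step 2 should make the first assertion fail, which is the RED half of the evidence described above.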

NikolayS mentioned this pull request May 1, 2026

[POSTPONED post-v0.2.0] ack/nack: verify (queue, consumer) ownership of batch_id (consumer-vs-consumer isolation) #164

Closed

This was referenced May 2, 2026

Add upgrade-path regression test (pre-#163 install → re-run pgque.sql → assert grants revoked) #165

Open

Polish: subscribe/unsubscribe file location, dlq_replay rationale, SPECx role-split note, test teardown #166

Open

NikolayS merged commit 7366982 into main May 2, 2026
8 checks passed

NikolayS deleted the claude/review-urgent-issues-mIYBq branch May 2, 2026 06:11

This was referenced May 2, 2026

fix: docs scrub + upgrade-grants regression test (#165, #166) #167

Open

fix(security): restrict get_batch_cursor(extra_where) to pgque_admin (#108) #169

Merged

fix(api): consumer-vs-consumer isolation for ack/nack (#164) #170

Closed

NikolayS merged 3 commits into main from claude/review-urgent-issues-mIYBq
