Skip to content

Umbrella: raw SQL interface audit findings from testing rounds 1–4 #113

@NikolayS

Description

@NikolayS

Summary

This is an umbrella / triage issue for the raw SQL interface audit performed against main at 9b3f89f.

The audit produced issues #95#112. This issue summarizes them, separates likely release blockers from docs/hardening/design questions, and calls out which findings appear inherited from PgQ assumptions vs locally introduced by PgQue wrappers/additions.

Recommended triage

Keep as real bugs / release blockers

Keep, but reframe / downgrade

Merge or downgrade depending product decision

These are real behaviors, but may be by design if pgque_writer is a coarse trusted global DB role like old PgQ.

Recommendation:

Merge with related issues if desired

PgQ-inherited vs PgQue-local

Likely inherited from PgQ assumptions / primitive API design

Clearly PgQue-local / introduced by modernization, wrappers, additions, docs, or transformations

Suggested immediate release-blocker shortlist

If the goal is v0.2 readiness, I would prioritize:

  1. Security: pgque roles are bypassed by default PUBLIC EXECUTE on functions #96 — role/public execute hardening
  2. Security: pgque_admin can escalate via queue_extra_maint executed by SECURITY DEFINER maint() #101 — SECURITY DEFINER queue_extra_maint escalation
  3. Bug: nack() DLQ path trusts caller-supplied pgque.message and allows forged DLQ rows #98 + Bug: repeated nack() of same DLQ-bound message creates duplicate replayable DLQ rows #104 — canonical/idempotent nack() / DLQ terminal handling
  4. Bug: receive() can strand consumer on empty active batch with no batch_id to ack #103 — empty-batch receive() wrapper trap
  5. Bug: batch_retry() fails on xid8 schema (NULL::int8 inserted into ev_txid xid8) #107 — broken batch_retry() on current schema
  6. Bug: pgque.maint() VACUUM branch fails because VACUUM cannot run inside PL/pgSQL function #110 — broken maint() VACUUM branch
  7. Docs: manual scheduler and reference are stale for retry_events, maint(), force_tick(), nack(), grants #99/Docs: README/examples client snippets have invalid ack usage, event-type mismatch, benchmark/version drift #105 — docs that would cause bad production usage

#97 is also important if same consumer may have multiple workers. If the intended model is “one worker per consumer name,” document it; otherwise fix locking.

Audit note

I do not think #95#112 are all equally severe. Some are true implementation bugs; some are docs/API contract issues; some are inherited PgQ/global-role design assumptions. This umbrella is meant to prevent overreacting to every finding as a release blocker while keeping the real sharp edges visible.


Status update — 2026-04-30 11:50 UTC

Closed / merged to main and locally verified

Verification evidence:

Open PRs verified PASS

Docs PRs

Still open / not fixed yet

Suggested merge / action order

  1. Merge fix: revoke PUBLIC EXECUTE and harden SECURITY DEFINER queue_extra_maint #118 (security roles + SECURITY DEFINER hardening).
  2. Merge fix(pgque.batch_retry): cast NULL to xid8 not int8 #120, fix(pgque.create_queue): reject queue names > 57 bytes #122, fix(pgque.maint): drop VACUUM (cannot run inside PL/pgSQL) #119, fix(clients/python): consumer warns + acks unhandled event types #121 (runtime/client fixes already verified).
  3. Rebase docs: reference + examples + roles cleanup (#99, #105, #112) #126, then merge docs cleanup.
  4. Decide Bug: concurrent receive() for same consumer can double-deliver same events #97 scope: v0.2 now vs v0.2.1 with explicit docs.
  5. Triage Security: any pgque_writer can ack another consumer/app's active batch by batch_id #102/Security: low-level primitives let writers mutate/read other consumers' active batches #106/Security/docs: get_batch_cursor(extra_where) executes raw SQL predicate and can forge returned rows #108 as design/docs vs future ACL after Docs: clarify PgQue roles are global/coarse, not per-queue multi-tenant isolation #112 lands.

Status update — 2026-04-30 16:55 UTC

Landed / closed since prior update

The main raw-SQL audit fixes have moved significantly. These issues are now closed:

Also merged and related:

Current open audit issues

Current open PRs relevant to this audit

Suggested next actions

  1. Finish/merge fix(clients/go): green up live-pg test suite #132 if checks go green; this should unblock Go live tests and help ci: client-tests job runs all 3 driver suites against live PG #84.
  2. Rebase/rework docs: reference + examples + roles cleanup (#99, #105, #112) #126 after current main:
  3. Decide Bug: concurrent receive() for same consumer can double-deliver same events #97 scope: merge [ON HOLD v0.2.1] fix(pgque.receive): prevent double-delivery on concurrent same-consumer calls #125 now or keep for v0.2.1 with explicit one-worker-per-consumer docs.
  4. Decide Security: any pgque_writer can ack another consumer/app's active batch by batch_id #102/Security: low-level primitives let writers mutate/read other consumers' active batches #106/Security/docs: get_batch_cursor(extra_where) executes raw SQL predicate and can forge returned rows #108 after Docs: clarify PgQue roles are global/coarse, not per-queue multi-tenant isolation #112 docs: close/downgrade as trusted-global-role design, or keep for future ACL/safe-wrapper roadmap.
  5. Leave Hardening: validate queue config values and external ticker monotonicity #100 as hardening backlog unless v0.2 includes external ticker/config validation work.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions