Skip to content

fix: follow-ups to #75 (quote_ident, dlq partial-success, Nack tests)#79

Merged
NikolayS merged 1 commit intomainfrom
fix/pr75-followups
Apr 30, 2026
Merged

fix: follow-ups to #75 (quote_ident, dlq partial-success, Nack tests)#79
NikolayS merged 1 commit intomainfrom
fix/pr75-followups

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Follow-up to #75 addressing review items and a build-pipeline reconciliation that #75 missed.

Why a follow-up rather than rewriting #75

#75 landed three valuable fixes from @hobostay — Go consumer per-message nack, dlq_replay_all exception isolation, queue-existence check in insert_event_raw. The substantive logic is correct; the issues are at the edges (one regression, one visibility gap, missing test, and several edits placed in an auto-generated file). Merging #75 as-is preserved their commits cleanly; this PR layers the fixes on top.

What this PR does

sql/pgque-api/maint.sql — quote_ident regression on schema-qualified VACUUM

quote_ident(f.func_arg) is wrong. func_arg comes from pgque.maint_tables_to_vacuum() as "schema.table" — one string with a literal dot — so quote_ident wraps the whole thing as a single identifier and produces vacuum "pgque.event_0" instead of vacuum "pgque"."event_0". The latter is what PostgreSQL actually needs. Fix: split on the dot and quote each side via format('vacuum %I.%I', ...). Safer than the pre-#75 raw concatenation and correct under quoting.

The bug is invisible to existing tests because maint_tables_to_vacuum() returns nothing when autovacuum = on (the default) — flagged in the original review.

sql/pgque-additions/dlq.sqldlq_replay_all partial-success visibility

The per-event exception block used raise notice. NOTICE is hidden under most production configs (log_min_messages = warning by default), so partial failures became silent. Two changes:

  1. Switch to raise warning — visible in default logs.
  2. Change the return type from bare integer to a record (replayed bigint, failed bigint, first_error text) so callers can detect partial success programmatically.

drop function if exists added before the create or replace because PG doesn't allow CREATE OR REPLACE to change a function's return type. docs/reference.md updated.

Breaking change: Existing callers doing select pgque.dlq_replay_all('q') need to switch to select replayed from pgque.dlq_replay_all('q') (or destructure all three columns). Acceptable at the v0.2 cut. No existing tests called this function.

clients/go/pgque_test.go — Nack and per-message nack coverage

Adds TestNack and TestConsumerHandlerNacksOnError covering the behavior #75 introduced. Verifies that a failing handler nacks its own message into pgque.retry_queue (not the DLQ — retry_count starts below max_retries) without aborting the rest of the batch. Env-gated via PGQUE_TEST_DSN, matching the existing test style.

build/transform.sh — reconcile direct edits to the generated pgque.sql

Two of #75's source-level changes were applied directly to sql/pgque.sql:

  • upgrade_schema() — flattened create role pgque_admin in role pgque_reader, pgque_writer into explicit grant ... to pgque_admin calls; changed return 0 to return cnt.
  • insert_event_raw() — raises queue not found: % when the lookup misses, instead of falling through to a NULL-deref later.

sql/pgque.sql is auto-generated from pgq/ + sql/pgque-additions/ + sql/pgque-api/ by build/transform.sh. Direct edits to it get clobbered on the next rebuild. Re-applied both as post-transform patches in transform.sh so the changes survive.

sql/pgque.sql — regenerated

Verification runner reports all checks pass: schema rename, search_path on every SECURITY DEFINER, no queue_per_tx_limit/default_with_oids leakage, idempotent CREATE TABLE/SEQUENCE/INDEX.

Build pipeline note

Anyone editing sql/pgque.sql should instead edit the corresponding source under sql/pgque-additions/, sql/pgque-api/, or add a patch in build/transform.sh. The build's self-verification catches transformation drift but does not catch direct edits to the generated file that lack a source.

Test plan

  • CI green across PG14–18
  • client-smoke job green (Go test compilation; integration tests env-gated)
  • verify job green (transform.sh reproducibility check)
  • Manual: \i sql/pgque.sql succeeds against a fresh PG18
  • Manual: select * from pgque.dlq_replay_all('q') returns the expected three-column shape

🤖 Generated with Claude Code

#75 landed three valuable fixes — Go consumer per-message nack, dlq_replay_all
exception isolation, queue-existence check in insert_event_raw — but the SQL
changes either lived in the wrong place or used patterns that don't survive
the build. This PR addresses each.

* sql/pgque-api/maint.sql — quote_ident(f.func_arg) is wrong: func_arg comes
  from pgque.maint_tables_to_vacuum() as "schema.table" (one string with a
  literal dot), so quote_ident wraps it as one identifier and produces
  vacuum "schema.table" instead of vacuum "schema"."table". Fix: split on
  the dot and quote each side via format('vacuum %I.%I', ...). The bug was
  invisible to tests because maint_tables_to_vacuum() returns nothing when
  autovacuum = on (the default).

* sql/pgque-additions/dlq.sql — dlq_replay_all's per-event exception block
  used raise notice, which is hidden under most production configs
  (log_min_messages = warning by default). Two changes: switch to raise
  warning, and change the return type from bare integer to a record
  (replayed bigint, failed bigint, first_error text) so callers can detect
  partial success programmatically. Drop function if exists added before
  the create or replace because PG does not allow CREATE OR REPLACE to
  change a function's return type. docs/reference.md updated; no existing
  tests call dlq_replay_all.

* clients/go/pgque_test.go — adds TestNack and TestConsumerHandlerNacksOnError
  covering the per-message nack behavior introduced in #75. Verifies a
  failing handler nacks its own message into pgque.retry_queue (not the
  DLQ, since retry_count starts below max_retries) without aborting the
  rest of the batch.

* build/transform.sh — two of #75's source-level changes (upgrade_schema
  role grants flattened, insert_event_raw raises a clear "queue not found"
  exception) were applied directly to sql/pgque.sql, but pgque.sql is
  auto-generated from pgq/ + pgque-additions/ + pgque-api/ on each build.
  Re-applied them as post-transform patches in transform.sh so the next
  rebuild does not silently drop them.

* sql/pgque.sql — regenerated. Verification runner reports all checks pass.

Build pipeline note: anyone touching sql/pgque.sql should also touch the
corresponding source under sql/pgque-additions/, sql/pgque-api/, or add a
patch in build/transform.sh. The build script's self-verification catches
schema-rename / search-path / queue_per_tx_limit drift but does not catch
direct edits to the generated pgque.sql that lack a source.

Co-Authored-By: Qiaochu Hu <110803307+hobostay@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NikolayS
Copy link
Copy Markdown
Owner Author

REV Review

CI: ✅ green (8/8 checks pass — test (14/15/16/17/18), verify, claude-review, client-smoke)
Functional evidence: test (14/15/16/17/18) on CI exercise the SQL fixes against real Postgres; Go integration tests env-gated and run via client-smoke job on CI (passing). The new TestNack and TestConsumerHandlerNacksOnError cover the per-message nack + batch-ack semantics from PR #75.
Verdict: READY TO MERGE

Blocking

none

Non-blocking

  • [LOW] [confidence 6/10] clients/go/pgque_test.go:122TestConsumerHandlerNacksOnError has a race window between the seen >= 2 break and consumerCancel(): the consumer may re-poll the redelivered message during that window, ack the second processing, and remove the row from retry_queue before the assertion runs. Probably stable in practice (CI passes), but flaky under heavy load.
    Fix: cancel the consumer context immediately when seen >= 2, then sleep ≥1× poll interval before the retry_queue query, or add a t.Cleanup that drops retry_queue rows at the end.
  • [LOW] [confidence 6/10] clients/go/pgque_test.go:63TestNack relies on the next setupQueue (rather than t.Cleanup) to wipe state; if the next test fails before setupQueue runs, this test's residue may pollute downstream runs.
    Fix: add t.Cleanup(func() { client.Pool().Exec(ctx, "DELETE FROM pgque.retry_queue ...") }) for explicit per-test cleanup.

Potential

  • [LOW] [confidence 7/10] sql/pgque-api/maint.sql:25format('vacuum %I.%I', split_part(f.func_arg, '.', 1), split_part(f.func_arg, '.', 2)) assumes func_arg is exactly schema.table. If func_arg lacks a dot, split_part(..., 2) returns '' and %I produces "" — invalid SQL surfaces at EXECUTE. Today maint_tables_to_vacuum() always returns schema.table, but a future schema change could regress silently.
    Fix: assert the dot is present, or fall back to quote_ident(f.func_arg) (with a check for embedded dot) — defensive guard, not a v0.2 blocker.
  • [LOW] [confidence 7/10] build/transform.sh:512 and :524 — the new post-transform patches use awk literal patterns that match specific upstream PgQ source lines (/create role pgque_admin in role pgque_reader, pgque_writer;/, /from pgque\.queue q where q\.queue_name = _qname into qstate;/). If upstream changes those lines (whitespace, comma ordering, identifier rename), the patch silently no-ops; only the echo "PASS: ..." runs unconditionally, so transform.sh reports success while the fix is missing.
    Fix: after each patch, grep -q for the expected new content and exit 1 if not found, mirroring the verification pattern used elsewhere in the file.

REV-style review (security, bugs, tests, guidelines, docs). SOC2 items skipped per project policy. Nothing blocking — non-blocking and potential items are good targets for v0.2.1.

@NikolayS NikolayS merged commit 2963324 into main Apr 30, 2026
8 checks passed
@NikolayS NikolayS deleted the fix/pr75-followups branch April 30, 2026 01:10
NikolayS added a commit that referenced this pull request Apr 30, 2026
client-smoke ran only TestSmoke-named tests and didn't set
PGQUE_TEST_DSN, so env-gated integration tests silently skipped.
That's how the Nack placeholder bug shipped green in PR #79.

New client-tests job:
- Sets PGQUE_TEST_DSN pointing at the docker PG.
- Runs full Go suite: go test -race -v ./...
- Runs full Python suite: pytest -v clients/python/tests/
- Runs full TypeScript suite: npx vitest --run (replaces tsx src/smoke.ts).

Switching the TS step to vitest also unblocks PR #83, which deletes
the hardcoded src/smoke.ts in favor of the vitest suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NikolayS added a commit that referenced this pull request Apr 30, 2026
client-smoke ran only TestSmoke-named tests and didn't set
PGQUE_TEST_DSN, so env-gated integration tests silently skipped.
That's how the Nack placeholder bug shipped green in PR #79.

New client-tests job:
- Sets PGQUE_TEST_DSN pointing at the docker PG.
- Runs full Go suite: go test -race -v ./...
- Runs full Python suite: pytest -v clients/python/tests/
- Runs full TypeScript suite: npx vitest --run (replaces tsx src/smoke.ts).

Switching the TS step to vitest also unblocks PR #83, which deletes
the hardcoded src/smoke.ts in favor of the vitest suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NikolayS added a commit that referenced this pull request Apr 30, 2026
client-smoke ran only TestSmoke-named tests and didn't set
PGQUE_TEST_DSN, so env-gated integration tests silently skipped.
That's how the Nack placeholder bug shipped green in PR #79.

New client-tests job:
- Sets PGQUE_TEST_DSN pointing at the docker PG.
- Runs full Go suite: go test -race -v ./...
- Runs full Python suite: pytest -v clients/python/tests/
- Runs full TypeScript suite: npx vitest --run (replaces tsx src/smoke.ts).

Switching the TS step to vitest also unblocks PR #83, which deletes
the hardcoded src/smoke.ts in favor of the vitest suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NikolayS added a commit that referenced this pull request Apr 30, 2026
client-smoke ran only TestSmoke-named tests and didn't set
PGQUE_TEST_DSN, so env-gated integration tests silently skipped.
That's how the Nack placeholder bug shipped green in PR #79.

New client-tests job:
- Sets PGQUE_TEST_DSN pointing at the docker PG.
- Runs full Go suite: go test -race -v ./...
- Runs full Python suite: pytest -v clients/python/tests/
- Runs full TypeScript suite: npx vitest --run (replaces tsx src/smoke.ts).

Switching the TS step to vitest also unblocks PR #83, which deletes
the hardcoded src/smoke.ts in favor of the vitest suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@NikolayS NikolayS mentioned this pull request May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant