Skip to content

fix(pgque.receive): reject max_return < 1 and document batch-ownership#114

Merged
NikolayS merged 4 commits intomainfrom
fix/receive-max-return-validation
Apr 30, 2026
Merged

fix(pgque.receive): reject max_return < 1 and document batch-ownership#114
NikolayS merged 4 commits intomainfrom
fix/receive-max-return-validation

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

Summary

Closes #95.

Two related issues in pgque.receive(queue, consumer, max_return) are fixed or documented in this PR:

  • Input validation (fix): max_return < 1 now raises pgque.receive: max_return must be >= 1, got <n> immediately, before any batch is opened. This eliminates the off-by-one where max_return = 0 silently returned 1 row.
  • Batch-ownership caveat (docs): A warning is added to docs/reference.md clarifying that ack(batch_id) always advances the consumer cursor past the full underlying batch, regardless of how many rows receive() returned. Changing batch semantics is deferred to v0.3.

Test evidence

Red commit confirmed failure before fix:

psql:tests/test_api_receive.sql:203: ERROR:  receive(..., 0) must raise an error
CONTEXT:  PL/pgSQL function inline_code_block line 13 at ASSERT

Green after fix — full suite output:

=== pgque regression test suite ===
...
psql:tests/test_api_receive.sql:203: NOTICE:  PASS: receive rejects max_return < 1
psql:tests/test_api_receive.sql:217: NOTICE:  PASS: receive + ack semantics
...
=== ALL TESTS PASSED ===

Commits

  1. test: red regression test asserting receive(..., 0) raises an error
  2. fix: input guard in sql/pgque.sqlraise exception when i_max_return < 1
  3. docs: batch-ownership caveat added to docs/reference.md under pgque.receive()

NikolayS and others added 3 commits April 30, 2026 02:15
Add failing regression test asserting that pgque.receive(..., 0)
raises an error containing 'max_return must be >= 1'. Test confirms
the off-by-one bug: currently returns 1 row instead of erroring.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add input validation at the top of pgque.receive(): raise exception
with message 'pgque.receive: max_return must be >= 1, got <n>' when
i_max_return < 1. Fixes the off-by-one where max_return=0 returned
1 row due to exit-after-return loop ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document that max_return must be >= 1 and that ack(batch_id) always
advances the consumer cursor past the full underlying batch regardless
of how many rows were returned by receive(). Warns callers using
partial receives to consume the full batch before acking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NikolayS
Copy link
Copy Markdown
Owner Author

REV Review — PR #114

CI: ❌ FAIL — test (14), test (15), test (16), test (17), test (18) all failing on Run regression tests. verify and client-smoke pass.
Functional evidence: The new red test tests/test_api_receive.sql:182-203 correctly fires on CI: psql:tests/test_api_receive.sql:203: ERROR: receive(..., 0) must raise an error — meaning the validation guard never raised, and the test fell through to assert v_raised. Root cause is the next item.
Verdict: FIXES NEEDED

Blocking

  1. Fix lives in the build artifact, not the source. The diff edits sql/pgque.sql (lines 4583-4585) but leaves sql/pgque-api/receive.sql untouched. CI runs bash build/transform.sh (see .github/workflows/ci.yml), which regenerates sql/pgque.sql from sql/pgque-api/* and overwrites the validation guard. After rebuild, pgque.receive(..., 0) falls through to next_batch/get_batch_events exactly as before, the new red test fires, and the suite aborts on PG 14-18. The fix must be added to sql/pgque-api/receive.sql:33 (right after begin) — and ideally sql/pgque.sql regenerated locally so the committed artifact matches.

Non-blocking

  1. Docs caveat is slightly imprecise. docs/reference.md:79 says "If max_return < ticker_max_count, calling ack() after a partial receive will drop the unreturned rows." The actual condition is max_return < (number of events in the underlying batch). A batch can be smaller than queue_ticker_max_count (small inserts, idle ticks, max_lag flush). The guidance "consume the full batch before acking" is correct; the threshold phrasing is a near-miss. Suggest: "If max_return is smaller than the batch produced by next_batch, …".
  2. Commit subject length. fix(pgque.receive): reject max_return < 1 is fine. docs(reference.md): add batch-ownership caveat for pgque.receive is 60 chars (over the 50-char Conventional Commits guideline in CLAUDE.md). Not blocking — project precedent allows ~60.

Potential

  1. NULL max_return. if i_max_return < 1 evaluates to NULL (not true) when i_max_return is null, so the guard does not fire. The function would then proceed with cnt >= null ⇒ never exits the loop and returns the whole batch. Issue Bug: receive(max_return) can return extra row and ack can skip unreturned messages #95 didn't ask for this, and the function has default 100, so callers passing explicit NULL is unusual — but a coalesce or explicit is null check would be more defensive. (Score: 5 — worth a follow-up, not blocking.)
  2. SQLSTATE class. raise exception '...' defaults to P0001 (raise_exception). Issue Bug: receive(max_return) can return extra row and ack can skip unreturned messages #95 commentary leaned toward "explicit error rejecting". The more idiomatic class for this case is 22023 invalid_parameter_value via using errcode = '22023', which would let driver code distinguish bad-input from generic plpgsql errors. Test asserts on message text so behavior is preserved either way. (Score: 4 — stylistic.)
  3. Test does not assert SQLSTATE. assert sqlerrm like '%max_return must be >= 1%' couples the test to the human-readable message. If the message text is later refined (e.g., to mention that max_return is the partial-batch knob), the test will need editing. Asserting on sqlstate would be more durable. (Score: 4.)

Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 1 (blocking) 1 (NULL handling) 0
Tests 0 1 (assert on message text) 0
Guidelines 0 (1 nit on commit subject length) 0 0
Docs 0 (1 nit on threshold wording) 0 0

Anti-leak: ✅ clean


REV-style review (security, bugs, tests, guidelines, docs). SOC2 items skipped per project policy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NikolayS
Copy link
Copy Markdown
Owner Author

Addressing the REV blocking finding from #114 (comment):

Root cause (1 sentence): build/transform.sh regenerates sql/pgque.sql from source files on every CI run, so any guard applied only to the built artifact is silently overwritten before the red test fires.

Fix: Guard moved to sql/pgque-api/receive.sql (the true source). Build re-run locally — all 17 pipeline checks passed, including the assembly verification. Regenerated sql/pgque.sql contains the guard at line 4584.

Commit: acc362efix: apply receive guard at source, not built artifact

Diff on this branch (git diff origin/main..HEAD --name-only) includes both:

  • sql/pgque-api/receive.sql — source-level fix
  • sql/pgque.sql — regenerated built artifact (unchanged from previous commit since the regen output is identical)

CI is running — will update with gh pr checks 114 output when complete.

@NikolayS
Copy link
Copy Markdown
Owner Author

CI results (gh pr checks 114):

verify          pass   10s
test (14)       pass   25s
test (15)       pass   22s
test (16)       pass   23s
test (17)       pass   25s
test (18)       pass   27s
claude-review   pending  (async)
client-smoke    pending  (async)

All 5 PG matrix jobs (14–18) green. verify green. Source fix confirmed working.

@NikolayS
Copy link
Copy Markdown
Owner Author

REV Review — PR #114 (round 2)

CI: PASS — test (14), test (15), test (16), test (17), test (18), verify, client-smoke all green. (claude-review is the bot job, not blocking.)
R1 BLOCKING resolution status: Addressed correctly.
Functional evidence: Loop fix commit acc362e adds the guard at sql/pgque-api/receive.sql:33 (right after begin), exactly where R1 prescribed. The committed sql/pgque.sql regen matches the source: grep "max_return must be" sql/pgque.sql returns line 4584 inside the pgque.receive() body. pgque.receive() is security definer set search_path = pgque, pg_catalog, so the guard runs under the pinned search_path (CLAUDE.md key design rule #1 satisfied). The full PR diff is 4 files / +36 / -1 — no incidental drift in the regen.
Verdict: READY FOR USER REVIEW

Blocking

None.

Non-blocking

  1. Atomicity of source + regen split across two commits. The original built-artifact edit (0e7fc3c, sql/pgque.sql) and the source edit (acc362e, sql/pgque-api/receive.sql) live in separate commits. The PR-as-a-whole is consistent (final tree has both source and regenerated artifact in sync), but the ideal pattern is "source change + regen in the same commit" so any single commit, including bisect points, builds cleanly. Carry-forward note for future fixes; not worth a rebase here.
  2. R1 docs nit still applies. docs/reference.md:79 still phrases the threshold as max_return < ticker_max_count; the actual condition is max_return < (size of the produced batch), which can be smaller than ticker_max_count due to ticker_max_lag / ticker_idle_period flushes. Not regressing — already flagged in R1 — left for a follow-up.

Potential

  1. NULL max_return still slips past the guard. if i_max_return < 1 is NULL-false when i_max_return is null, so an explicit pgque.receive('q','c', null) call falls through. R1 noted this; deferred to a follow-up. (Score: 5.)
  2. SQLSTATE not asserted. raise exception '...' defaults to P0001; test asserts on message text. R1 noted this; durability nit, not a regression. (Score: 4.)

Summary

Area Findings Potential Filtered
Security 0 0 0
Bugs 0 1 (NULL max_return, carried over) 0
Tests 0 1 (assert on message text, carried over) 0
Guidelines 0 (1 nit: source+regen atomicity) 0 0
Docs 0 (1 nit: threshold wording, carried over) 0 0

Anti-leak: clean — diff, PR body, comments, and commit messages all scanned.


REV-style review round 2 (security, bugs, tests, guidelines, docs). SOC2 items skipped per project policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: receive(max_return) can return extra row and ack can skip unreturned messages

1 participant