feat: configurable sub-second tick rate (default 10 ticks/sec) #204
Three small follow-ups from review:

1. Range 1..1000 (was 1..60000): pg_cron's minimum schedule is 1 s, so
   tick_period_ms > 1000 collapses to "one tick per pg_cron slot" inside
   ticker_loop's clamp. Reject explicitly rather than silently clamping.
   To tick less often than once per second, edit the pg_cron schedule
   string directly.

2. status() now includes the configured tick_period_ms even when pg_cron
   is unavailable, so operators driving the ticker manually can see the
   value they configured.

3. The README "Tick rate" section gains a note about cron.max_running_jobs:
   ticker_loop holds one pg_cron bg worker for ~1 s per slot (vs. ~10 ms
   previously), bounding ~30 pgque-bearing databases per cluster at
   pg_cron's default of 32.

Note on the statement_timeout guardrail considered earlier: NOT folded in.
pg_cron concatenates `SET statement_timeout = '...'; CALL pgque.ticker_loop()`
into a single multi-statement transaction, and the COMMIT inside the
procedure raises "invalid transaction termination" in that wrapper.
Documented inline in pgque.start().

Refs #69, PR #204
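A minimal sketch of the range check in (1), assuming `pgque.config` is a single-row settings table; the shipped function may differ:

```sql
-- Hypothetical shape of the 1..1000 validation; everything except the
-- pgque.set_tick_period_ms / pgque.config names is an assumption.
create or replace function pgque.set_tick_period_ms(p_ms integer)
returns void
language plpgsql
as $$
begin
    if p_ms is null or p_ms < 1 or p_ms > 1000 then
        raise exception 'tick_period_ms must be in 1..1000 (got %); to tick '
            'less often than once per second, edit the pg_cron schedule', p_ms;
    end if;
    update pgque.config set tick_period_ms = p_ms;  -- read on the next pg_cron slot
end;
$$;
```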
Two scenarios:
- idle: sweep tick_period_ms in {1, 10, 100, 1000} and measure
producer→consumer e2e latency at light load.
- held-xmin: a long-running RR transaction holds xmin while a 1000 ev/s
producer runs at the default 100 ms tick. Demonstrates how
tick/subscription metadata UPDATEs degrade under blocked vacuum.
Latency is measured server-side: producer stamps clock_timestamp() into
the payload, consumer subtracts at receive time. No client clock skew.
Refs #69, PR #204
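The measurement technique as one self-contained sketch (`bench_events` and the payload shape are invented here; the committed harness may differ):

```sql
-- Producer side: stamp the server clock into the payload at enqueue time.
create table if not exists bench_events (payload jsonb);
insert into bench_events (payload)
values (jsonb_build_object('sent_at', clock_timestamp()));

-- Consumer side: subtract at receive time. Both timestamps come from the
-- same server clock, so client clock skew never enters the number.
select clock_timestamp() - (payload->>'sent_at')::timestamptz as e2e_latency
from bench_events;
```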
Idle sweep (30 s per cell, 100 ev/s, single laptop, PG16):

  tick_period_ms=1000: p50 503, p99 994,  max 1004 ms  (1 tick/sec floor)
  tick_period_ms=100:  p50 53,  p99 103,  max 105 ms   (default; clean)
  tick_period_ms=10:   p50 8,   p99 864,  max 1013 ms  (tail blows up)
  tick_period_ms=1:    p50 3,   p99 460,  max 548 ms   (tail blows up)

Held-xmin (tick=100 ms, 1000 ev/s, 5 min RR tx open):

  baseline:  p50 52.6, p99 103.4, max 143.8 ms
  held-xmin: p50 53.8, p99 104.7, max 235.6 ms

Median essentially flat under held-xmin; worst-case roughly doubles.
Milder than expected — consistent with the design goal of stable latency
under load on moderate timescales. Longer runs / higher tick rates would
amplify the bloat-driven tail (PR #62 territory).

Refs #69, PR #204
Latency bench results

Harness landed in the benchmark commit above.

1. Idle sweep — tick rate vs. e2e latency

30 s per cell, 100 ev/s producer, queue configured to tick on every event.

| tick_period_ms | p50 (ms) | p99 (ms) | max (ms) |
|---:|---:|---:|---:|
| 1000 | 503 | 994 | 1004 |
| 100 | 53 | 103 | 105 |
| 10 | 8 | 864 | 1013 |
| 1 | 3 | 460 | 548 |

Reads:

- 1000 ms is the old 1 tick/sec floor: p50 of roughly half a second, max ~1 s.
- 100 ms (the default) is the clean point: p50 ~53 ms, max ~105 ms.
- At 10 ms and below the median improves but the tail blows up toward ~1 s: the loop can't always finish its inner iterations within one pg_cron slot, so the next slot lands on a still-running worker.

2. Held-xmin — does metadata bloat hurt the default path?

5 min at the default 100 ms tick / 1000 ev/s producer, with a separate session holding a repeatable-read transaction open.

| scenario | p50 (ms) | p99 (ms) | max (ms) |
|---|---:|---:|---:|
| baseline | 52.6 | 103.4 | 143.8 |
| held-xmin | 53.8 | 104.7 | 235.6 |

Reads:

- Median and p99 essentially flat; worst-case roughly doubles.

The takeaway is not "held-xmin is fine, deprioritise #62". It's that under a moderate held-vacuum window the median is robust and the tail degrades gracefully — which is the design goal. PR #62's rotation fix extends the ceiling on that stability. Out of scope here as previously noted.

Status of the "known gaps" from the PR description: two items still open — the pg_cron CI variant (gap #2) and the statement_timeout guardrail (gap #4); follow-up below.

Bench raw output and a methodology README are committed in benchmark/tick-rate/.
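For anyone reproducing scenario 2, the xmin-holding session has this general shape (a sketch; the exact commands are in benchmark/tick-rate/):

```sql
-- Open a repeatable-read transaction and take a snapshot; while this session
-- sits idle, the pinned xmin horizon keeps vacuum from reclaiming dead
-- tick/subscription metadata tuples.
begin transaction isolation level repeatable read;
select 1;  -- the first query materializes the snapshot
-- ...leave open for the 5-minute window, then: commit;
```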
Two follow-ups from the PR review.
1. pg_cron CI variant (closes the test-coverage gap)
New ci/Dockerfile.pgcron installs postgresql-NN-cron over the official
postgres image; new pgcron-test workflow job builds it, starts PG with
shared_preload_libraries=pg_cron and cron.use_background_workers=on,
runs the full regression suite, and explicitly fails if any
pg_cron-only test prints "SKIP: pg_cron not installed" (so a future
accidental loss of coverage is loud, not silent).
This makes test_tick_period's "pgque.start() schedules CALL
pgque.ticker_loop()" assertion run in CI for the first time, plus the
four pgque.start() / pgque.stop() cases in test_pgcron_lifecycle.
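The SKIP string the workflow watches for comes from a guard at the top of each pg_cron-dependent test; a sketch of the assumed shape (the real tests may detect pg_cron differently):

```sql
do $$
begin
    if not exists (select 1 from pg_extension where extname = 'pg_cron') then
        raise notice 'SKIP: pg_cron not installed';
        return;
    end if;
    -- ...pg_cron-dependent assertions run here...
end;
$$;
```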
2. statement_timeout guardrail: honestly cannot be enforced
Tried two approaches; neither works:
- SET statement_timeout = '...'; CALL pgque.ticker_loop() in the
pg_cron command -- pg_cron concatenates them into one
multi-statement transaction, and the procedure's COMMIT raises
"invalid transaction termination".
- set_config('statement_timeout', '1500ms', is_local := true) inside
the procedure body -- this updates the GUC value, but
statement_timeout is a top-level-statement timer. The CALL is the
statement; its timer is fixed at invocation. Setting the GUC
mid-procedure does not restart or re-arm the timer, so subsequent
pg_sleep / pgque.ticker() work runs unguarded. Verified by
reproduction (sketch below).
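A minimal repro sketch of the second attempt (the procedure name is invented):

```sql
-- Shows that a mid-procedure set_config does not re-arm the
-- statement_timeout timer of the enclosing CALL.
create or replace procedure demo_timeout_not_rearmed()
language plpgsql
as $$
begin
    perform set_config('statement_timeout', '1500ms', true);  -- is_local
    perform pg_sleep(10);  -- sleeps the full 10 s; no timeout fires
end;
$$;

call demo_timeout_not_rearmed();  -- completes despite the '1500ms' setting
```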
Reverted the set_config attempt and replaced the inline comment with
the full diagnosis. ticker_loop self-bounds via clock_timestamp() to
limit how many additional iterations a slow ticker can chain, but a
genuinely hung pgque.ticker() will pin the pg_cron worker until an
admin pg_cancel_backend()s it. ticker() has no indefinite-block code
paths under normal operation; we accept the residual risk over
shipping a guardrail that doesn't actually fire.
Refs #69, PR #204
Closing the remaining gaps

Both items left from the earlier "what's left" rundown are addressed in commit a43967d:

✅ pg_cron CI variant (gap #2)

New ci/Dockerfile.pgcron plus a pgcron-test workflow job: the full regression suite runs with pg_cron preloaded and fails explicitly if any test prints "SKIP: pg_cron not installed". This makes the pgque.start() schedule-wiring assertion and the pgque.start()/pgque.stop() lifecycle cases run in CI for the first time.

❌→📝 statement_timeout guardrail (gap #4)

Cannot be made to work; documented honestly. Tried: a SET prefix in the pg_cron command (pg_cron folds it into one multi-statement transaction, which breaks the procedure's COMMIT) and set_config() inside the procedure body (the CALL's timer is fixed at invocation and never re-arms). Reverted the set_config attempt and documented the full diagnosis inline.

PR description updated. Ready for another look.
REV review — PR #204

Scope: configurable sub-second tick rate / default 10 ticks/sec, v0.2.0 readiness.

Verdict

Request changes if the API keeps promising arbitrary tick_period_ms values: non-divisors of 1000 ms do not deliver the cadence the docs and status() advertise. The default (100 ms) is unaffected.

Blocking / must fix before merge

MAJOR — Non-divisor tick periods produce inaccurate effective rates. Evidence from the ticker_loop source:

    v_iter_budget := greatest(1, v_window_ms / v_period_ms);

This is integer floor division within the 1000 ms window, but status/docs report the idealized cadence. Examples: tick_period_ms = 300 gives floor(1000/300) = 3 iterations per slot (3 ticks/sec actual) against an advertised 3.33; 600 gives 1 iteration (1 tick/sec) against an advertised 1.67.

Tests cover clean divisor values only, so the gap is invisible to the suite. Recommended v0.2.0 fix: restrict accepted values to divisors of 1000 ms, or explicitly document/report the floored effective cadence. I prefer restriction for now: simple, honest, and testable.

Non-blocking docs fixes

- LOW — WAL math in the tick-rate trade-off docs doesn't hold up as written.
- LOW — the 1 ms figures overstate achievable latency: the docs imply millisecond-class e2e latency at tick_period_ms = 1, while the bench shows a p99 in the hundreds of ms.

Positive evidence

Summary

This is likely a good v0.2.0 feature, but only after the accepted-value contract matches actual scheduler behavior.
REV blocker fix: reject non-divisor tick periods

Pushed.

What changed

pgque.set_tick_period_ms now accepts only divisors of 1000 ms; any non-divisor is rejected with an explicit error instead of silently flooring the iteration budget.

Validation

Covered by the validation cases in tests/test_tick_period.sql.

This keeps the v0.2.0 behavior honest: every advertised tick period maps exactly to an integer number of ticker iterations inside pg_cron's 1000 ms slot.
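A sketch of the tightened check (hypothetical shape; the shipped validation may differ):

```sql
-- Accept only periods that divide the 1000 ms pg_cron slot evenly, so the
-- advertised cadence always equals the integer iteration budget.
do $$
declare
    p_ms int := 300;  -- the non-divisor example from the review
begin
    if p_ms < 1 or p_ms > 1000 or 1000 % p_ms <> 0 then
        raise exception 'tick_period_ms must be a divisor of 1000 ms, got %', p_ms;
    end if;
end;
$$;
```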
Pulled and verified.

The honest-cadence framing ("every advertised tick period maps exactly to an integer number of iterations inside the 1000 ms slot") is materially better than the silent-floor behavior the original implementation had. Doc fixes for the two LOW items look good as well.

LGTM. Ready for external review on my end.
pg_cron's minimum schedule is 1 second, which capped end-to-end latency at ~1 s for non-LISTEN consumers. Drive sub-second ticking from inside a single pg_cron slot via a new pgque.ticker_loop() procedure that re-invokes pgque.ticker() at pgque.config.tick_period_ms cadence (default 100 ms = 10 Hz). The procedure commits between iterations so each tick gets its own transaction and rotation isn't blocked by a held xmin (this is also why ticker_loop is a PROCEDURE, not a FUNCTION). Tunable at runtime with pgque.set_tick_period_ms(ms); changes apply on the next pg_cron slot without rescheduling. Refs #69
…iene

- README: replace "1 second tick" framing with "10 Hz default"; new "Tick rate"
  section covers `pgque.set_tick_period_ms`, trade-offs (WAL, NOTIFY,
  metadata-table dead tuples), and clarifies that the per-second pg_cron slot
  count does NOT increase with sub-second ticking — the cron.job_run_details
  growth rate is unchanged.
- docs/three-latencies.md: rewrite the cadence table around tick_period_ms
  with new rough numbers per rate; explicitly note the pg_cron logging problem
  is independent of sub-second ticking.
- docs/reference.md: document `pgque.ticker_loop()` and
  `pgque.set_tick_period_ms(ms)`; expand `start()` and `ticker()` notes.
- docs/tutorial.md: extend "Production cadence" with the 10 Hz default, why
  ticker_loop is a procedure (per-iteration commit / snapshot semantics /
  xmin), and the unchanged log-hygiene recipe.
- lifecycle.sql: switch status() detail strings from `||` concatenation to
  format().

Refs #69
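The last bullet's change, illustrated (the literal detail strings here are invented):

```sql
-- before: 'tick period ' || v_ms || ' ms (' || 1000 / v_ms || ' ticks/sec)'
-- after:  a single format() call per detail string
select format('tick period %s ms (%s ticks/sec)', 100, 1000 / 100);
```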
"Hz" reads as electronics/CPU-clock vocabulary in queue/DB context, and implies a precision the loop doesn't actually deliver — real cycle is tick_work + sleep, so 100 ms period yields ~9 ticks/s, not exactly 10. "ticks/sec" is more natural for this domain. Replaces all user-facing strings (status() detail, start() notice, README, docs, test message). No behaviour change.
Rebased on the current base branch; force-pushed from 5feed16 to 8b58cb2.
Summary

Configurable polling rate even when driven by pg_cron. Default jumps from 1 tick/sec (the pg_cron floor) to 10 ticks/sec (every 100 ms).
- `pgque.ticker_loop()` PROCEDURE: pg_cron fires it once a second; the procedure re-invokes `pgque.ticker()` every `tick_period_ms` ms inside that one slot, with a commit between iterations so each tick gets its own transaction and held xmin doesn't pile up against rotation.
- `pgque.config.tick_period_ms` (default `100`, range `1..1000`), added with a safe `alter table ... add column if not exists` so existing installs upgrade cleanly.
- `pgque.set_tick_period_ms(ms)` — takes effect on the next pg_cron slot (≤1 s) without rescheduling.
- `pgque.start()` now schedules `CALL pgque.ticker_loop()` instead of `SELECT pgque.ticker()`; `pgque.status()` reports the current cadence in ticks/sec.
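Assumed usage (call shapes are sketches; see docs/reference.md for the exact signatures):

```sql
select pgque.set_tick_period_ms(50);  -- 20 ticks/sec, applied on the next pg_cron slot
select * from pgque.status();         -- includes the configured cadence in ticks/sec
```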
Why a PROCEDURE (and why no `SET search_path` on it)

- Each `pgque.ticker()` call must run in its own transaction: it records a `pg_snapshot` to mark a batch boundary, and two ticks inside one transaction would record the same snapshot — consumers couldn't tell them apart.
- A transaction held open across the whole slot also pins the xmin floor and blocks `maint_rotate_tables`. Per-iteration commit bounds the held-xmin window to `tick_period_ms` (100 ms by default) instead of the 1-second pg_cron slot.
- Postgres only allows `COMMIT` mid-flight inside a procedure — and forbids combining `COMMIT` with a `SET` clause. The body is therefore fully schema-qualified, runs as `SECURITY INVOKER`, and is admin-only. The actual security boundary stays in the `SECURITY DEFINER` functions (`pgque.ticker`, the `pgque.config` updater) that ticker_loop calls.
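A minimal sketch of the loop shape under those constraints (assumes `pgque.config` is a single-row table; the shipped procedure is fully schema-qualified and carries more bookkeeping):

```sql
create or replace procedure pgque.ticker_loop()
language plpgsql
security invoker
as $$
declare
    v_period_ms int;
    v_deadline  timestamptz := clock_timestamp() + interval '1 second';
begin
    -- Read once per slot: set_tick_period_ms changes apply on the next slot.
    select tick_period_ms into v_period_ms from pgque.config;
    while clock_timestamp() < v_deadline loop
        perform pgque.ticker();
        commit;  -- own transaction (and pg_snapshot) per tick; releases xmin
        perform pg_sleep(v_period_ms / 1000.0);
    end loop;
end;
$$;
```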
Known limitation: no statement_timeout guardrail on ticker()

A misbehaving `pgque.ticker()` call can pin a pg_cron worker until an admin runs `pg_cancel_backend()`. Two approaches tried; neither fires:

- `SET statement_timeout = '...'; CALL pgque.ticker_loop()` in the pg_cron command — pg_cron joins them into one multi-statement transaction; the procedure's `COMMIT` then raises "invalid transaction termination".
- `set_config('statement_timeout', '...', is_local := true)` inside the procedure body — updates the GUC, but `statement_timeout` is a top-level-statement timer. The CALL is the statement; its timer is fixed at invocation, so changing the GUC mid-procedure has no effect on subsequent pg_sleep / ticker() calls.
The `clock_timestamp()`-based budget inside `ticker_loop` limits how many additional iterations a slow run can chain, but it cannot cancel a stuck inner call. `pgque.ticker()` has no indefinite-block code paths under normal operation; we accept the residual risk over shipping a guardrail that doesn't actually fire. Documented inline in `sql/pgque-additions/lifecycle.sql`.

Docs
- README: log hygiene clarified as not made worse by sub-second ticking (still one pg_cron slot per second, regardless of `tick_period_ms`); bg-worker / `cron.max_running_jobs` note added.
- docs/three-latencies.md: cadence table rewritten around `tick_period_ms`; trade-off bullets (WAL, NOTIFY, metadata dead tuples).
- docs/reference.md: documents `ticker_loop()` and `set_tick_period_ms`.
- docs/tutorial.md: the 10 ticks/sec default and why ticker_loop is a procedure.
Bench

`benchmark/tick-rate/` — full harness + results in the directory README.

Idle sweep (30 s / cell, 100 ev/s, single laptop, PG 16):

| tick_period_ms | p50 (ms) | p99 (ms) | max (ms) |
|---:|---:|---:|---:|
| 1000 | 503 | 994 | 1004 |
| 100 | 53 | 103 | 105 |
| 10 | 8 | 864 | 1013 |
| 1 | 3 | 460 | 548 |

Default `tick_period_ms = 100` is the clean point: median ~53 ms, max ~105 ms. Periods of 10 ms and below improve the median but the tail blows up to ~1 s (the procedure can't always finish its inner iterations within one pg_cron slot, so the next slot lands on a still-running worker).
Held-xmin (default 100 ms tick, 1000 ev/s):

| scenario | p50 (ms) | p99 (ms) | max (ms) |
|---|---:|---:|---:|
| baseline | 52.6 | 103.4 | 143.8 |
| held-xmin | 53.8 | 104.7 | 235.6 |

Median essentially flat; worst-case roughly doubles. Milder than expected on a 5-min window; longer durations / higher tick rates would amplify the tail (PR #62 territory, out of scope).
Test plan

- `tests/test_tick_period.sql` — defaults / setter / validation / multi-tick / single-tick / `pgque.start()` schedule wiring.
- `tests/run_all.sql` suite green locally on PG 16, with and without pg_cron.
- Existing CI (`test` job) — green.
- New pg_cron CI variant (`ci/Dockerfile.pgcron` + `pgcron-test` workflow). Runs the full suite with pg_cron preloaded; explicitly fails if any test prints "SKIP: pg_cron not installed", closing the coverage gap that previously kept the schedule-wiring test effectively unrun in CI.
- Bench: idle sweep over `tick_period_ms ∈ {1, 10, 100, 1000}` plus a 5-min held-xmin run.

Refs #69