
feat: configurable sub-second tick rate (default 10 ticks/sec)#204

Merged
NikolayS merged 8 commits into main from claude/configurable-polling-rate-BmevU on May 5, 2026

Conversation

@NikolayS
Owner

NikolayS commented May 4, 2026

Summary

Configurable polling rate even when driven by pg_cron. Default jumps from
1 tick/sec (the pg_cron floor) to 10 ticks/sec (every 100 ms).

  • New pgque.ticker_loop() PROCEDURE: pg_cron fires it once a second; the
    procedure re-invokes pgque.ticker() every tick_period_ms ms inside that
    one slot, with a commit between iterations so each tick gets its own
    transaction and held-xmin doesn't pile up against rotation (a rough
    sketch follows this list).
  • New column pgque.config.tick_period_ms (default 100, range 1..1000),
    added with a safe alter table ... add column if not exists so existing
    installs upgrade cleanly.
  • New helper pgque.set_tick_period_ms(ms) — takes effect on the next
    pg_cron slot (≤1 s) without rescheduling.
  • pgque.start() now schedules CALL pgque.ticker_loop() instead of
    SELECT pgque.ticker().
  • pgque.status() reports the current cadence in ticks/sec.
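
For orientation, a rough sketch of the loop's shape (illustrative only, with simplified variable names and budget check; the shipped body in sql/pgque-additions/lifecycle.sql is fully schema-qualified and admin-only):

create or replace procedure pgque.ticker_loop()
language plpgsql
as $$
declare
  v_period_ms int;
  v_deadline  timestamptz := clock_timestamp() + interval '1 second';
begin
  select tick_period_ms into v_period_ms from pgque.config;
  loop
    perform pgque.ticker();
    commit;  -- each tick gets its own transaction; bounds the held-xmin window
    -- stay inside the 1-second pg_cron slot
    exit when clock_timestamp() + make_interval(secs => v_period_ms / 1000.0) >= v_deadline;
    perform pg_sleep(v_period_ms / 1000.0);  -- wait out the rest of the tick period
  end loop;
end;
$$;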

Why a PROCEDURE (and why no SET search_path on it)

Each pgque.ticker() call must run in its own transaction:

  1. Snapshot semantics. A tick records pg_snapshot to mark a batch
    boundary. Two ticks inside one transaction would record the same
    snapshot — consumers couldn't tell them apart.
  2. Held-xmin / rotation. A long-running transaction holds the cluster
    xmin floor and blocks maint_rotate_tables. Per-iteration commit
    bounds the held-xmin window to tick_period_ms (100 ms by default)
    instead of the 1-second pg_cron slot.

Postgres only allows COMMIT mid-flight inside a procedure, and forbids it
in a procedure defined with a SET clause, so SET search_path is off the
table here. The body is therefore fully schema-qualified, runs as
SECURITY INVOKER, and is admin-only:

revoke execute on procedure pgque.ticker_loop() from public;
grant  execute on procedure pgque.ticker_loop() to pgque_admin;

The actual security boundary stays in the SECURITY DEFINER functions
(pgque.ticker, the pgque.config updater) that ticker_loop calls.

Known limitation: no statement_timeout guardrail on ticker()

A misbehaving pgque.ticker() call can pin a pg_cron worker until an admin
runs pg_cancel_backend(). Two approaches were tried; neither fires (a
minimal reproduction sketch follows the list):

  • SET statement_timeout = '...'; CALL pgque.ticker_loop() in the
    pg_cron command — pg_cron joins them into one multi-statement
    transaction; the procedure's COMMIT then raises "invalid transaction
    termination".
  • set_config('statement_timeout', '...', is_local := true) inside the
    procedure body — updates the GUC, but statement_timeout is a
    top-level-statement timer. The CALL is the statement; its timer is fixed
    at invocation, so changing the GUC mid-procedure has no effect on
    subsequent pg_sleep / ticker() calls.
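
To make the second failure mode concrete, a stand-alone reproduction along the lines of what was tested (demo_timeout_noop is a hypothetical stub, not pgque code; assumes statement_timeout is otherwise unset in the session):

create procedure demo_timeout_noop()
language plpgsql
as $$
begin
  -- updates the GUC (visible via current_setting), but the CALL's timer
  -- was armed at invocation and is not re-armed by this
  perform set_config('statement_timeout', '1500ms', true);
  perform pg_sleep(2.5);  -- returns cleanly; no timeout fires
end;
$$;

call demo_timeout_noop();  -- completes after ~2.5 s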

The clock_timestamp()-based budget inside ticker_loop limits how many
additional iterations a slow run can chain, but it cannot cancel a stuck
inner call. pgque.ticker() has no indefinite-block code paths under
normal operation; we accept the residual risk over shipping a guardrail
that doesn't actually fire. Documented inline in
sql/pgque-additions/lifecycle.sql.

Docs

  • README: new "Tick rate" section; latency framing updated; pg_cron
    log-hygiene note clarified: sub-second ticking does not make it worse
    (still one pg_cron slot per second, regardless of tick_period_ms);
    bg-worker / cron.max_running_jobs note added.
  • docs/three-latencies.md: cadence table rewritten around tick_period_ms;
    trade-off bullets (WAL, NOTIFY, metadata dead tuples).
  • docs/reference.md: entries for ticker_loop() and set_tick_period_ms.
  • docs/tutorial.md: production-cadence section explains the 10-tick/sec
    default and why ticker_loop is a procedure.

Bench

benchmark/tick-rate/ — full harness + results in the directory README.

Idle sweep (30 s / cell, 100 ev/s, single laptop, PG 16):

tick_period_ms   p50 (ms)   p95 (ms)   p99 (ms)   max (ms)
1000                  503        954        994       1004
100 (default)          53         99        103        105
10                      8        264        864       1013
1                       3        162        460        548

Default tick_period_ms = 100 is the clean point: median ~53 ms, max
~105 ms. Periods of 10 ms and below improve the median but the tail blows
up to ~1 s (the procedure can't always finish its inner iterations within
one pg_cron slot, so the next slot lands on a still-running worker).

Held-xmin (default 100 ms tick, 1000 ev/s):

condition                duration   p50 (ms)   p95 (ms)   p99 (ms)   max (ms)
baseline                 60 s           52.6       99.0      103.4      143.8
held-xmin (RR tx open)   300 s          53.8      100.2      104.7      235.6

Median essentially flat; worst-case roughly doubles. Milder than expected
on a 5-min window; longer durations / higher tick rates would amplify the
tail (PR #62 territory, out of scope).

Test plan

  • tests/test_tick_period.sql — defaults / setter / validation /
    multi-tick / single-tick / pgque.start() schedule wiring.
  • Full tests/run_all.sql suite green locally on PG 16, with and
    without pg_cron.
  • CI run on PG 14/15/16/17/18 (existing test job) — green.
  • NEW: pg_cron CI job (ci/Dockerfile.pgcron + pgcron-test
    workflow). Runs the full suite with pg_cron preloaded; explicitly
    fails if any test prints "SKIP: pg_cron not installed", closing the
    coverage gap that previously kept the schedule-wiring test
    effectively unrun in CI.
  • Latency bench: producer→consumer p50/p99 at
    tick_period_ms ∈ {1, 10, 100, 1000} plus 5-min held-xmin run.

Refs #69

NikolayS changed the title from "feat: configurable sub-second tick rate (default 10 Hz)" to "feat: configurable sub-second tick rate (default 10 ticks/sec)" on May 4, 2026
NikolayS marked this pull request as ready for review on May 4, 2026 23:35
NikolayS pushed a commit that referenced this pull request May 4, 2026
Three small follow-ups from review:

1. Range 1..1000 (was 1..60000): pg_cron's minimum schedule is 1 s, so
   tick_period_ms > 1000 collapses to "one tick per pg_cron slot" inside
   ticker_loop's clamp. Reject explicitly rather than silently clamping.
   To tick less often than once per second, edit the pg_cron schedule
   string directly.

2. status() now includes the configured tick_period_ms even when pg_cron
   is unavailable, so operators driving the ticker manually can see the
   value they configured.

3. README "Tick rate" section gains a note about cron.max_running_jobs:
   ticker_loop holds one pg_cron bg worker for ~1 s per slot (vs. ~10 ms
   previously), bounding ~30 pgque-bearing databases per cluster at
   pg_cron's default of 32.

Note on the statement_timeout guardrail considered earlier: NOT folded
in. pg_cron concatenates `SET statement_timeout = '...'; CALL
pgque.ticker_loop()` into a single multi-statement transaction, and the
COMMIT inside the procedure raises "invalid transaction termination" in
that wrapper. Documented inline in pgque.start().

Refs #69, PR #204
NikolayS pushed a commit that referenced this pull request May 4, 2026
Two scenarios:
- idle: sweep tick_period_ms in {1, 10, 100, 1000} and measure
  producer→consumer e2e latency at light load.
- held-xmin: a long-running RR transaction holds xmin while a 1000 ev/s
  producer runs at the default 100 ms tick. Demonstrates how
  tick/subscription metadata UPDATEs degrade under blocked vacuum.

Latency is measured server-side: producer stamps clock_timestamp() into
the payload, consumer subtracts at receive time. No client clock skew.

Refs #69, PR #204
NikolayS pushed a commit that referenced this pull request May 4, 2026
Idle sweep (30 s per cell, 100 ev/s, single laptop, PG16):
  tick_period_ms=1000: p50 503, p99 994, max 1004 ms (1 tick/sec floor)
  tick_period_ms=100:  p50  53, p99  103, max  105 ms (default; clean)
  tick_period_ms=10:   p50   8, p99  864, max 1013 ms (tail blows up)
  tick_period_ms=1:    p50   3, p99  460, max  548 ms (tail blows up)

Held-xmin (tick=100ms, 1000 ev/s, 5 min RR tx open):
  baseline:     p50 52.6, p99 103.4, max 143.8 ms
  held-xmin:    p50 53.8, p99 104.7, max 235.6 ms

Median essentially flat under held-xmin; worst-case roughly doubles.
Milder than expected — consistent with the design goal of stable latency
under load on moderate timescales. Longer runs / higher tick rates
would amplify the bloat-driven tail (PR #62 territory).

Refs #69, PR #204
Owner Author

NikolayS commented May 4, 2026

Latency bench results

Harness landed in benchmark/tick-rate/. Single laptop, Postgres 16, pg_cron 1.6.2 with cron.use_background_workers = on. Latency is server-side clock_timestamp() delta (producer stamps payload at send(), consumer subtracts at receive()).
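
The measurement in one statement on each side (illustrative only: the enqueue/dequeue calls and column names below are placeholders for the harness code in benchmark/tick-rate/, not pgque's actual API):

-- producer: stamp the send time into the payload (server clock)
select my_enqueue(jsonb_build_object('sent_at', clock_timestamp()));

-- consumer: subtract at receive time, same server clock, so no client skew
select clock_timestamp() - (payload ->> 'sent_at')::timestamptz as e2e_latency
from my_dequeue_batch();  -- hypothetical call returning rows with a payload jsonb column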

1. Idle sweep — tick rate vs. e2e latency

30 s per cell, 100 ev/s producer, queue configured to tick on every event (ticker_max_count=1, max_lag=0, idle_period=0).

tick_period_ms   rate (ev/s)   sent   recvd   p50 (ms)   p95 (ms)   p99 (ms)   max (ms)   mean (ms)
1000                     100   3000    3000     503.32     953.87     993.89    1004.16      503.32
100 (default)            100   3000    3000      52.62      98.86     103.48     105.28       52.74
10                       100   3000    3000       8.05     263.53     863.50    1013.39       41.86
1                        100   3000    3000       3.26     161.67     460.47     547.91       22.07

Reads:

  • tick_period_ms = 100 (default) — clean: median 53 ms, max 105 ms, exactly tracks "wait for next tick, mean ≈ period/2". This validates the 10-tick/sec design point.
  • Tick periods of 10 ms and below — median improves to single digits but the tail blows up to ~1 s. Suspected cause: at these periods the procedure can't complete its inner iterations within one pg_cron 1-second slot, so the next slot lands on a still-running worker and effectively skips a tick window.
  • The originally-requested 0.1 ticks/sec (tick_period_ms = 10000) is not measurable by design: ticker_loop clamps the internal period at the 1-second pg_cron slot. Pushed an explicit 1..1000 range check into set_tick_period_ms to remove the dead zone (commit aa75879 on this branch). Ticking less often than once per second is what the pg_cron schedule string itself controls.

2. Held-xmin — does metadata bloat hurt the default path?

5 min at the default 100 ms tick / 1000 ev/s producer, with a separate session running BEGIN ISOLATION LEVEL REPEATABLE READ; ... pg_sleep(...); for the duration so xmin is held and autovacuum can't reclaim dead tuples on pgque.tick / pgque.subscription. Compared against a 60 s baseline at the same tick + rate without the holder.
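
The xmin holder is just a second session left idle inside a repeatable-read transaction, roughly (duration hard-coded here for illustration):

begin isolation level repeatable read;
select pg_sleep(300);  -- first query takes the snapshot; this backend's xmin stays pinned for the full 5 minutes
commit;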

condition                 duration   sent      recvd     p50 (ms)   p95 (ms)   p99 (ms)   max (ms)   mean (ms)
baseline (no held-xmin)   60 s        60 000    60 000      52.60      98.96     103.40     143.76      52.75
held-xmin (RR tx open)    300 s      300 000   300 000      53.83     100.18     104.73     235.56      54.03

Reads:

  • p50 / p95 / p99 essentially unchanged. Median e2e at 53 ms; p99 at 104 ms.
  • Worst case roughly doubles (144 ms → 236 ms). Real, but mild.
  • This is much milder than the upstream-PgQ R7 90-min picture (peak dead tuples > 21k upstream vs. ≤ 1000 with PR #62, "Rotate subscription and tick tables to avoid held-xmin bloat", refs #61). Two reasons it looks calm here:
    • 5 min is short. Metadata bloat scales with held duration; 30+ min would visibly degrade the tail further.
    • 10 ticks/sec is the default. Higher tick rates amplify metadata-table churn proportionally.

The takeaway is not "held-xmin is fine, deprioritise #62". It's that under a moderate held-vacuum window the median is robust and the tail degrades gracefully — which is the design goal. PR #62's rotation fix extends the ceiling on that stability. Out of scope here as previously noted.

Status of the "known gaps" from the PR description

Bench raw output and a methodology README are committed in benchmark/tick-rate/.


Generated by Claude Code

NikolayS pushed a commit that referenced this pull request May 5, 2026
Two follow-ups from the PR review.

1. pg_cron CI variant (closes the test-coverage gap)
   New ci/Dockerfile.pgcron installs postgresql-NN-cron over the official
   postgres image; new pgcron-test workflow job builds it, starts PG with
   shared_preload_libraries=pg_cron and cron.use_background_workers=on,
   runs the full regression suite, and explicitly fails if any
   pg_cron-only test prints "SKIP: pg_cron not installed" (so a future
   accidental loss of coverage is loud, not silent).

   This makes test_tick_period's "pgque.start() schedules CALL
   pgque.ticker_loop()" assertion run in CI for the first time, plus the
   four pgque.start() / pgque.stop() cases in test_pgcron_lifecycle.

2. statement_timeout guardrail: honestly cannot be enforced
   Tried two approaches; neither works:

   - SET statement_timeout = '...'; CALL pgque.ticker_loop() in the
     pg_cron command -- pg_cron concatenates them into one
     multi-statement transaction, and the procedure's COMMIT raises
     "invalid transaction termination".
   - set_config('statement_timeout', '1500ms', is_local := true) inside
     the procedure body -- this updates the GUC value, but
     statement_timeout is a top-level-statement timer. The CALL is the
     statement; its timer is fixed at invocation. Setting the GUC
     mid-procedure does not restart or re-arm the timer, so subsequent
     pg_sleep / pgque.ticker() work runs unguarded. Verified by
     reproduction.

   Reverted the set_config attempt and replaced the inline comment with
   the full diagnosis. ticker_loop self-bounds via clock_timestamp() to
   limit how many additional iterations a slow ticker can chain, but a
   genuinely hung pgque.ticker() will pin the pg_cron worker until an
   admin pg_cancel_backend()s it. ticker() has no indefinite-block code
   paths under normal operation; we accept the residual risk over
   shipping a guardrail that doesn't actually fire.

Refs #69, PR #204
Owner Author

NikolayS commented May 5, 2026

Closing the remaining gaps

Both items left from the earlier "what's left" rundown are addressed in commit a43967d:

✅ pg_cron CI variant (gap #2)

New ci/Dockerfile.pgcron (mirrors the pg_tle pattern) plus a pgcron-test workflow job that:

  • builds a postgres image with postgresql-NN-cron installed,
  • starts PG with shared_preload_libraries=pg_cron, cron.use_background_workers=on, cron.database_name=pgque_test,
  • runs the full tests/run_all.sql,
  • explicitly fails if any test still prints SKIP: pg_cron not installed — so a future accidental loss of pg_cron coverage is loud, not silent.

This makes the assertion that pgque.start() schedules CALL pgque.ticker_loop() run in CI for the first time, plus the four pgque.start() / pgque.stop() cases in test_pgcron_lifecycle.

❌→📝 statement_timeout guardrail (gap #4)

Cannot be made to work; documented honestly. Tried:

  1. SET statement_timeout = '1500ms'; CALL pgque.ticker_loop() in the pg_cron command — pg_cron concatenates the two into one multi-statement transaction, the procedure's COMMIT then raises "invalid transaction termination". (This is what we already knew.)
  2. set_config('statement_timeout', '1500ms', is_local := true) at the top of each iteration inside the procedure body — confirmed: this updates the GUC value (verifiable with current_setting) but does not enforce the timeout. statement_timeout is a top-level-statement timer; the CALL is the statement; its timer was set at CALL invocation. Changing the GUC mid-procedure does not restart or re-arm the timer, so subsequent pg_sleep / ticker() work runs unguarded. Reproduced with a slow stub: pg_sleep(2.5) inside the procedure returned cleanly even with set_config('statement_timeout','1500ms',true) set immediately before it.

Reverted the set_config attempt. Inline source comment in sql/pgque-additions/lifecycle.sql records the diagnosis so the next person doesn't re-walk the same ground. Residual risk: a genuinely hung pgque.ticker() will pin the pg_cron worker until an admin pg_cancel_backend()s it. ticker() is short and has no indefinite-block code paths under normal operation; we accept the risk over shipping a guardrail that doesn't actually fire.

PR description updated. Ready for another look.


Generated by Claude Code

@NikolayS
Owner Author

NikolayS commented May 5, 2026

REV review — PR #204

Scope: configurable sub-second tick rate / default 10 ticks/sec, v0.2.0 readiness.

Verdict

Request changes if the API promises arbitrary 1..1000 ms cadence.

The default 100 ms / 10 ticks/sec path looks coherent and well covered. The issue is general configurability: non-divisor periods are reported/documented as if they run at the exact requested cadence, but the loop floors the number of iterations per 1-second cron window.

Blocking / must fix before merge

MAJOR — Non-divisor tick periods produce inaccurate effective rates

Evidence from sql/pgque-additions/lifecycle.sql:

v_iter_budget := greatest(1, v_window_ms / v_period_ms);

This is integer floor division within the 1000 ms window. But status/docs report idealized cadence as 1000.0 / tick_period_ms.

Examples:

  • tick_period_ms = 750 reports ~1.33 ticks/sec, but the loop runs 1 tick/sec.
  • tick_period_ms = 251 reports ~3.98 ticks/sec, but the loop runs 3 ticks/sec.
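
The mismatch is visible in one query (assuming the 1000 ms window from the snippet above and the reported-cadence formula):

select greatest(1, 1000 / 750)   as iterations_per_slot,    -- 1    → the loop actually runs 1 tick/sec
       round(1000.0 / 750, 2)    as reported_ticks_per_sec; -- 1.33 → what status()/docs advertise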

Tests cover clean divisor values such as 100, 200, 1000, but not values like 251 or 750.

Recommended v0.2.0 fix: restrict accepted values to divisors of 1000 ms, or explicitly document/report the floored effective cadence. I prefer restriction for now: simple, honest, and testable.

Non-blocking docs fixes

LOW — WAL math in docs/three-latencies.md

The docs say 25 ms is “25× WAL of default”. With the new default 100 ms, 25 ms is ~4× default, not 25×.

LOW — 1 ms docs overstate latency

Docs imply 1 ms gives sub-ms behavior, but the benchmark data shows roughly p50 3.26 ms, p99 161.67 ms, max 547.91 ms. Better to frame it as “very aggressive / not recommended broadly” rather than sub-ms.

Positive evidence

  • Default 100 ms behavior appears coherent.
  • Upgrade path adding pgque.config.tick_period_ms default 100 not null looks safe.
  • GitHub CI is green across PG 14–18, clients, pg_tle, and pg_cron-related checks.
  • The benchmark harness is useful and should stay.

Summary

This is likely a good v0.2.0 feature, but only after the accepted-value contract matches actual scheduler behavior.

@NikolayS
Owner Author

NikolayS commented May 5, 2026

REV blocker fix: reject non-divisor tick periods

Pushed 5feed16 to address the cadence-reporting blocker for arbitrary tick_period_ms values.

What changed

  • tick_period_ms is now restricted to exact divisors of 1000 in the 1..1000 ms range.
    • Examples accepted: 1, 2, 4, 5, 10, 20, 25, 40, 50, 100, 125, 200, 250, 500, 1000.
    • Examples rejected: 251, 750.
  • Enforcement is in both layers:
    • pgque.set_tick_period_ms(...) validates before update.
    • pgque.config has a check constraint so direct table writes cannot bypass it.
  • Upgrade path normalizes any invalid value left over from pre-constraint experimentation back to 100 before adding the stricter constraint.
  • Tests now cover rejecting 251 and 750.
  • Docs updated:
    • accepted periods must divide 1000 exactly;
    • 25ms WAL/churn wording fixed to ~4× default, not 25×;
    • 1ms no longer claims sub-ms e2e; docs now describe low single-digit ms in the current benchmark.
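
A minimal sketch of the two enforcement layers described above (illustrative only; the error text and constraint name match the validation output below, but the shipped function also handles permissions and a single-row pgque.config is assumed):

alter table pgque.config
  add constraint config_tick_period_ms_check
  check (tick_period_ms between 1 and 1000
         and 1000 % nullif(tick_period_ms, 0) = 0);  -- nullif avoids division by zero for 0

create or replace function pgque.set_tick_period_ms(ms int)
returns int language plpgsql as $$
begin
  if ms is null or ms not between 1 and 1000 then
    raise exception 'tick_period_ms must be between 1 and 1000 (got %)', ms;
  elsif 1000 % ms <> 0 then
    raise exception 'tick_period_ms must be an exact divisor of 1000 (got %)', ms;
  end if;
  update pgque.config set tick_period_ms = ms;  -- single-row config assumed
  return ms;
end;
$$;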

Validation

  • ./build/transform.sh
  • git diff --check
  • fresh install of generated sql/pgque.sql into local temp DB ✅
  • tests/test_tick_period.sql
    • pg_cron-specific section skipped locally because pg_cron is not installed in that DB
  • direct DB checks ✅
    • set_tick_period_ms(250) accepted
    • direct pgque.config updates to 750 and 0 rejected by the check constraint

This keeps the v0.2.0 behavior honest: every advertised tick period maps exactly to an integer number of ticker iterations inside pg_cron's 1000 ms slot.

Owner Author

NikolayS commented May 5, 2026

Pulled and verified 5feed16 locally:

  • Build green: bash build/transform.sh → ASSEMBLY + PACKAGING complete.

  • Full tests/run_all.sql green on a fresh PG 16 database (without pg_cron) — including the new test-3 cases that reject 251 and 750.

  • Both enforcement layers fire as intended:

    select pgque.set_tick_period_ms(250);   -- → 250
    select pgque.set_tick_period_ms(251);   -- ERROR: tick_period_ms must be an exact divisor of 1000 (got 251)
    select pgque.set_tick_period_ms(750);   -- ERROR: tick_period_ms must be an exact divisor of 1000 (got 750)
    update pgque.config set tick_period_ms = 750;  -- ERROR: new row violates check constraint "config_tick_period_ms_check"
    

    Function-level rejection plus table-level CHECK is the right shape — keeps invariants honest even if a future caller bypasses the setter.

The honest-cadence framing ("every advertised tick period maps exactly to an integer number of iterations inside the 1000 ms slot") is materially better than the silent-floor behavior the original implementation had. Doc fixes (25 ms ≈ 4× WAL not 25×, 1 ms no longer claims sub-ms e2e) match the bench numbers.

LGTM. Ready for external review on my end.


Generated by Claude Code

claude and others added 8 commits May 5, 2026 08:05
pg_cron's minimum schedule is 1 second, which capped end-to-end latency
at ~1 s for non-LISTEN consumers.  Drive sub-second ticking from inside
a single pg_cron slot via a new pgque.ticker_loop() procedure that
re-invokes pgque.ticker() at pgque.config.tick_period_ms cadence
(default 100 ms = 10 Hz).

The procedure commits between iterations so each tick gets its own
transaction and rotation isn't blocked by a held xmin (this is also why
ticker_loop is a PROCEDURE, not a FUNCTION).

Tunable at runtime with pgque.set_tick_period_ms(ms); changes apply on
the next pg_cron slot without rescheduling.

Refs #69
…iene

- README: replace "1 second tick" framing with "10 Hz default"; new "Tick
  rate" section covers `pgque.set_tick_period_ms`, trade-offs (WAL,
  NOTIFY, metadata-table dead tuples), and clarifies that the per-second
  pg_cron slot count does NOT increase with sub-second ticking — the
  cron.job_run_details growth rate is unchanged.
- docs/three-latencies.md: rewrite the cadence table around tick_period_ms
  with new rough numbers per rate; explicitly note the pg_cron logging
  problem is independent of sub-second ticking.
- docs/reference.md: document `pgque.ticker_loop()` and
  `pgque.set_tick_period_ms(ms)`; expand `start()` and `ticker()` notes.
- docs/tutorial.md: extend "Production cadence" with the 10 Hz default,
  why ticker_loop is a procedure (per-iteration commit / snapshot
  semantics / xmin), and the unchanged log-hygiene recipe.
- lifecycle.sql: switch status() detail strings from `||` concatenation
  to format().

Refs #69
"Hz" reads as electronics/CPU-clock vocabulary in queue/DB context, and
implies a precision the loop doesn't actually deliver — real cycle is
tick_work + sleep, so 100 ms period yields ~9 ticks/s, not exactly 10.
"ticks/sec" is more natural for this domain.

Replaces all user-facing strings (status() detail, start() notice,
README, docs, test message). No behaviour change.
@NikolayS
Owner Author

NikolayS commented May 5, 2026

Rebased on current main

Rebased PR #204 on current main after #203/#206 landed, resolving the docs conflict by keeping the new canonical force_next_tick() wording from #206 and the exact-divisor tick-period wording from this PR.

Current head includes:

  • default 100 ms / 10 ticks/sec ticker loop;
  • tick_period_ms restricted to exact divisors of 1000;
  • pg_cron CI path;
  • pg_tle generated install script regenerated on current main;
  • docs use force_next_tick() in user-facing examples where applicable.

Local validation after rebase:

  • bash build/transform.sh
  • git diff --check
  • fresh install of generated sql/pgque.sql
  • tests/test_tick_period.sql
    • pg_cron-specific section skipped locally because this DB lacks pg_cron; GitHub pg_cron job covers it.

My current verdict: still ready after CI reruns. The previous REV blocker was the non-divisor cadence issue, and this branch now rejects those values honestly at setter + config constraint layers.

NikolayS force-pushed the claude/configurable-polling-rate-BmevU branch from 5feed16 to 8b58cb2 on May 5, 2026 08:09
NikolayS merged commit ab9183a into main on May 5, 2026
11 checks passed
NikolayS deleted the claude/configurable-polling-rate-BmevU branch on May 5, 2026 08:10