
Sub-second end-to-end latency #69

@NikolayS

Description


Context

PgQue's end-to-end latency (latency #3: producer INSERT → consumer visibility) is currently bounded by the pg_cron tick period (default 1 second). A discussion today (2026-04-18) on LinkedIn between Nikolay Samokhvalov (PgQue author) and Hannu Krosing (ex-Skype / Google database engineer; relevant because Skype originated PgQ) surfaced several techniques for breaking past the 1-second floor, most of them without requiring upstream pg_cron changes.

See the LinkedIn comments under today's HN front-page post covering the R4–R7 bench work.

The ideas

1. Single cron job with internal pg_sleep loop (Hannu Krosing)

A single pg_cron callout fires every 1s but internally loops pg_sleep() 100× at 10ms intervals (or 1000× at 1ms for kHz delivery). One schedule slot covers the full second of tick activity.

Trade-offs:

  • Pros: works today; no pg_cron changes needed; single worker per queue.
  • Cons: heavy per-slot work; if a sleep cycle overruns, handover to the next slot gets messy.
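
A minimal sketch of this approach, assuming pg_cron >= 1.5 (which adds second-granularity schedules) and a hypothetical per-queue tick routine pgque.tick(text); the real pgque entry point and parameters may differ:

```sql
-- Hypothetical loop wrapper: one pg_cron slot per second, ~100 ticks inside it.
CREATE OR REPLACE FUNCTION pgque.tick_loop(q text, iterations int DEFAULT 100, pause float8 DEFAULT 0.01)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  i int;
BEGIN
  FOR i IN 1..iterations LOOP
    PERFORM pgque.tick(q);    -- one tick's worth of work (hypothetical API)
    PERFORM pg_sleep(pause);  -- 10 ms pause -> ~100 ticks per 1-second slot
  END LOOP;
END;
$$;

-- A single ordinary schedule slot then covers the whole second:
SELECT cron.schedule('pgque-tick-myqueue', '1 seconds',
                     $$SELECT pgque.tick_loop('myqueue')$$);
```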

2. Two (or more) coordinating cron jobs with a shared advisory lock (Hannu Krosing, refinement)

Register N pg_cron jobs at 1-second cadence, each naturally offset, coordinating via a common advisory lock. 10 jobs = 10 ticks/sec (100ms granularity); 100 jobs = 10ms granularity.

Trade-offs:

  • Pros: no pg_cron API changes; scales naturally; clean handover via advisory lock.
  • Cons: cron.job registration overhead grows with N; pg_cron may dedupe or throttle at scale.
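
A rough sketch under the same assumptions (pg_cron >= 1.5, hypothetical pgque.tick(text)); the advisory-lock key is arbitrary, and the transaction-level try-lock releases itself even if a tick errors out:

```sql
-- Each of N identical jobs ticks once per run; their start times drift apart
-- naturally, so N jobs at 1-second cadence approximate N ticks/second.
CREATE OR REPLACE FUNCTION pgque.tick_once(q text)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
  -- Skip this run if another job is mid-tick; the xact lock auto-releases.
  IF pg_try_advisory_xact_lock(42001) THEN
    PERFORM pgque.tick(q);  -- hypothetical API
  END IF;
END;
$$;

-- Register 10 jobs for ~100 ms effective granularity (job names are illustrative):
SELECT cron.schedule(format('pgque-tick-myqueue-%s', n), '1 seconds',
                     $$SELECT pgque.tick_once('myqueue')$$)
FROM generate_series(1, 10) AS n;
```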

3. Support function that yields when the next pg_cron run is pending (Hannu Krosing)

A pgque-provided function, called from pg_cron, that tight-loops doing tick work, checks whether the next pg_cron run is pending, and exits gracefully so the new run takes over immediately. The pre-pg_cron Skype/PgQ ticker did immediate handover this way.

Trade-offs:

  • Pros: simplest mental model; minimal new code.
  • Cons: requires reading pg_cron internal state from inside a job; not a stable API.
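
pg_cron does not expose a stable "next run is pending" signal, so the sketch below approximates it with a wall-clock budget just under the schedule period; handover happens because the function returns right before the next run fires. pgque.tick(text) is again hypothetical:

```sql
CREATE OR REPLACE FUNCTION pgque.tick_until_handover(q text, budget interval DEFAULT interval '950 milliseconds')
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  started timestamptz := clock_timestamp();
BEGIN
  LOOP
    PERFORM pgque.tick(q);  -- one tick's worth of work (hypothetical API)
    EXIT WHEN clock_timestamp() - started >= budget;  -- yield before the next slot
    PERFORM pg_sleep(0.005);  -- small pause between ticks (5 ms)
  END LOOP;
END;
$$;
```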

4. Upstream: pg_cron supports sub-second scheduling (Nikolay Samokhvalov)

Longer-term: ask the pg_cron project to support sub-second schedules, e.g. '100 milliseconds'. 10 ticks/sec is enough for many workloads.

Trade-offs:

  • Pros: cleanest semantics; no workarounds.
  • Cons: upstream dependency; not available today.

Concern (Nikolay): metadata-table bloat at high tick rates

Ticking more often means pgque.subscription and pgque.tick (and potentially pgque.queue) are UPDATEd / INSERTed far more frequently. Under any held-xmin condition (idle-in-tx, long-running writer, stale logical replication slot, physical standby with hot_standby_feedback=on), those dead tuples can't be vacuumed → index bloat → next_batch / finish_batch lookups slow down → the very latency we're trying to minimize gets worse.

This is the motivation behind #61 and the rotation fix in #62. The R7 bench confirmed this at a 1-second tick: pgque with PR #62 keeps peak dead tuples in subscription+tick at ≤ 1000, while upstream pgq reaches 21k+. At 10× or 100× higher tick rates, bloat scales linearly; PR #62's rotation is what makes sub-second ticking viable at all.

Any high-frequency-tick design must budget for metadata rotation cadence that scales with tick rate (e.g., at 10ms ticking, rotation cadence should drop from 30s to ~3s to keep per-table dead-tuple peak bounded). See #66 for bench methodology.
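
A rough way to budget that cadence (illustrative arithmetic, not bench numbers): peak dead tuples per metadata table ≈ rows written per tick × ticks per second × rotation interval, so shrinking the rotation interval as the tick rate rises is what keeps that product, and therefore the peak, bounded.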

Next steps

Provenance

  • LinkedIn discussion, 2026-04-18, under today's HN front-page post of pgque.com / the R4–R7 bench work.
  • Hannu Krosing's public writeups on the Skype-era PgQ tick design are a secondary reference.
  • Ideas 1, 2, 3 credited to Hannu Krosing. Bloat concern and idea 4 credited to Nikolay Samokhvalov.
