Skip to content

Clean up world-postgres LISTEN self-healing for upstream PR#1

Closed
Pom4H wants to merge 2 commits intobotify/world-postgres-self-healingfrom
botify/world-postgres-self-healing-v2
Closed

Clean up world-postgres LISTEN self-healing for upstream PR#1
Pom4H wants to merge 2 commits intobotify/world-postgres-self-healingfrom
botify/world-postgres-self-healing-v2

Conversation

@Pom4H
Copy link
Copy Markdown
Owner

@Pom4H Pom4H commented Apr 27, 2026

What this PR does

Reviews the cleanup pass over botify/world-postgres-self-healing before submitting upstream to vercel#1855. Base = the original branch, so this PR's diff shows everything the cleanup removes/changes vs. the first draft.

The end result on the head branch is a single commit reset onto vercel:stable that contains only the actual fix from issue vercel#1855.

What changed vs. the original two commits

Dropped — out of scope for vercel#1855:

  • ListenAdapter interface, createPgListenAdapter, and ListenAdapter config option (commit 11e860b3). Adding a public pluggable transport ahead of any consumer asking for it. Worth doing, but separately.
  • createBunSqlListenAdapter stub. Exporting a function that throws referencing an unmerged Bun PR (add sql listen, unlisten, notify for postgres oven-sh/bun#29710) creates a public surface that can never be removed without a breaking change. Out.
  • queue.ts duplex: 'half' removal. Different bug, different package layer, different PR.

Kept and tightened:

  • listenChannel reconnect with bounded exponential backoff (250 ms - 30 s).
  • readFromStream polling fallback.
  • pollIntervalMs is now configurable via PostgresWorldConfig.streamPollIntervalMs (default 5000 ms; 0 disables). Production deployments with many readers can dial it up; tests can disable.
  • Polling tick is race-safe: re-checks closed after the await, and enqueue short-circuits if closed is set or controller.close() already fired on EOF.
  • Removed the // eslint-disable-next-line no-console comments. The repo lints with biome, which doesn't have a no-console rule, and existing world packages use console.warn directly. The disables were dead.

Verified

  • pnpm exec tsc --noEmit clean.
  • pnpm vitest run for world-postgres: 11 unit tests pass; the 2 integration test files fail because no docker/Postgres is running locally (preexisting, unrelated).
  • biome check: 3 cognitive-complexity warnings on getStreamChunks (preexisting, 21), start (17, was 16 before), enqueue (17, was 16 before). All warns, not errors.

Still TODO before this can be sent upstream

This is the part that is not done by this PR — it is staged for follow-up:

  • Rebase onto vercel:main. The current head sits on vercel:stable. On main, streamer.ts has been refactored — writeToStream is now nested under streams: { write }. The fix needs to be re-applied to main's structure first; [core] Move stream reconnect logic to getReadable level vercel/workflow#1847 explicitly says fixes go to main then backport to stable.
  • Integration test. No new test in this PR. The reliability fix needs an integration test in packages/world-postgres/test/ that drops the LISTEN client mid-stream (e.g. via pg_terminate_backend) and asserts that subsequent chunks still reach the reader via the polling fallback. streamPollIntervalMs: 100 would make the test fast.

Pom4H added 2 commits April 27, 2026 17:17
The dedicated `Client` used for NOTIFY subscription is long-lived and will
eventually be dropped by the server (idle TCP timeout, pgbouncer rotation,
k8s CNI eviction). The unpatched implementation does not reconnect, so a
process running for more than a few hours stops receiving notifications and
only a restart restores delivery (cf. brianc/node-postgres#967).

Two layers, matching vercel#1855:

  * `listenChannel` now wraps the dedicated `pg.Client` in a reconnect loop
    with bounded exponential backoff (250 ms cap 30 s). Initial connect
    must succeed; subsequent reconnects are best-effort. `error`/`end`
    re-arm; `close()` stops further attempts.

  * `readFromStream` runs a periodic re-query of `streams WHERE chunk_id >
    lastChunkId` as an always-on safety net for chunks delivered while the
    LISTEN socket was down. Dedup is via the existing `enqueue` ordering
    check; the poll skips when an EOF has already closed the controller.
    Interval is configurable via `PostgresWorldConfig.streamPollIntervalMs`
    (default 5000 ms; set to 0 to disable).

Tracks vercel#1855.
Strategy 'ours' merges the original two commits (7214a62 + 11e860b)
into v2 without taking any of their tree contents. Resolves the conflict
between v2's clean rewrite and the original's scope-creep commits while
keeping a record that the original branch was superseded.
@github-actions
Copy link
Copy Markdown

🧪 E2E Test Results

Tests are running...

This comment will be updated with the results when the tests complete.


Started at: 2026-04-27T14:47:33Z

@Pom4H
Copy link
Copy Markdown
Owner Author

Pom4H commented Apr 27, 2026

Superseded by v3 branch which is rebased on vercel:main and applies the fix to the new streams namespace API. New PR coming next.

@Pom4H Pom4H closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant