Clean up world-postgres LISTEN self-healing for upstream PR #1
Closed
Pom4H wants to merge 2 commits into botify/world-postgres-self-healing
The dedicated `Client` used for NOTIFY subscription is long-lived and will eventually be dropped by the server (idle TCP timeout, pgbouncer rotation, k8s CNI eviction). The unpatched implementation does not reconnect, so a process running for more than a few hours stops receiving notifications and only a restart restores delivery (cf. brianc/node-postgres#967).

Two layers, matching vercel#1855:

* `listenChannel` now wraps the dedicated `pg.Client` in a reconnect loop with bounded exponential backoff (starting at 250 ms, capped at 30 s). The initial connect must succeed; subsequent reconnects are best-effort. `error`/`end` events re-arm the loop; `close()` stops further attempts.
* `readFromStream` runs a periodic re-query of `streams WHERE chunk_id > lastChunkId` as an always-on safety net for chunks delivered while the LISTEN socket was down. Dedup is via the existing `enqueue` ordering check; the poll skips when an EOF has already closed the controller. The interval is configurable via `PostgresWorldConfig.streamPollIntervalMs` (default 5000 ms; set to 0 to disable).

Tracks vercel#1855.
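The retry schedule of the first layer can be sketched as below. This is a minimal illustration, not the PR's code: `backoffDelayMs` and `reconnectWithBackoff` are hypothetical names, the doubling factor is an assumption, and `connect`, `sleep`, and `isClosed` are injected so the loop does not depend on a database. Only the 250 ms starting delay and the 30 s cap come from the description above.

```typescript
// Sketch only. `backoffDelayMs` and `reconnectWithBackoff` are hypothetical names;
// the 250 ms starting delay and 30 s cap are the only values taken from the PR text.
const BASE_DELAY_MS = 250;
const MAX_DELAY_MS = 30000;

// Delay before retry number `attempt` (0-based): 250, 500, 1000, ... capped at 30 s.
function backoffDelayMs(attempt: number): number {
  // Clamp the exponent so very large attempt counts cannot overflow to Infinity.
  const exponent = Math.min(Math.max(attempt, 0), 30);
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** exponent);
}

// Retries `connect` until it succeeds or the caller reports that close() was called.
// Resolves to true on a successful reconnect, false if closed first.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  isClosed: () => boolean,
): Promise<boolean> {
  for (let attempt = 0; !isClosed(); attempt += 1) {
    try {
      await connect();
      return true;
    } catch {
      // Best-effort: swallow the failure and wait before the next attempt.
      await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

In a real listener, `connect` would presumably create a fresh `pg.Client` (an ended client cannot be reused), issue `LISTEN`, and attach `error`/`end` handlers that start the loop again.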
🧪 E2E Test Results
⏳ Tests are running... This comment will be updated with the results when the tests complete.
Started at: 2026-04-27T14:47:33Z
Superseded by v3 branch which is rebased on vercel:main and applies the fix to the new |
What this PR does
Reviews the cleanup pass over `botify/world-postgres-self-healing` before submitting upstream to vercel#1855. Base = the original branch, so this PR's diff shows everything the cleanup removes/changes vs. the first draft.

The end result on the head branch is a single commit reset onto `vercel:stable` that contains only the actual fix from issue vercel#1855.

What changed vs. the original two commits
Dropped (out of scope for vercel#1855):

* `ListenAdapter` interface, `createPgListenAdapter`, and `ListenAdapter` config option (commit `11e860b3`). This adds a public pluggable transport before any consumer has asked for it. Worth doing, but separately.
* `createBunSqlListenAdapter` stub. Exporting a function that throws while referencing an unmerged Bun PR (add sql listen, unlisten, notify for postgres oven-sh/bun#29710) creates a public surface that can never be removed without a breaking change. Out.
* `queue.ts` `duplex: 'half'` removal. Different bug, different package layer, different PR.

Kept and tightened:
* `listenChannel` reconnect with bounded exponential backoff (250 ms - 30 s).
* `readFromStream` polling fallback. `pollIntervalMs` is now configurable via `PostgresWorldConfig.streamPollIntervalMs` (default 5000 ms; `0` disables). Production deployments with many readers can dial it up; tests can disable it.
* The poll checks `closed` after the `await`, and `enqueue` short-circuits if `closed` is set or `controller.close()` already fired on EOF.
* Removed the `// eslint-disable-next-line no-console` comments. The repo lints with biome, which doesn't have a no-console rule, and existing world packages use `console.warn` directly. The disables were dead.

Verified
* `pnpm exec tsc --noEmit` clean.
* `pnpm vitest run` for `world-postgres`: 11 unit tests pass; the 2 integration test files fail because no docker/Postgres is running locally (preexisting, unrelated).
* `biome check`: 3 cognitive-complexity warnings on `getStreamChunks` (preexisting, 21), `start` (17, was 16 before), `enqueue` (17, was 16 before). All warnings, not errors.

Still TODO before this can be sent upstream
This is the part that is not done by this PR; it is staged for follow-up:

* Rebase onto `vercel:main`. The current head sits on `vercel:stable`. On `main`, `streamer.ts` has been refactored: `writeToStream` is now nested under `streams: { write }`. The fix needs to be re-applied to `main`'s structure first; "[core] Move stream reconnect logic to getReadable level" (vercel/workflow#1847) explicitly says fixes go to `main` then backport to `stable`.
* Add an integration test under `packages/world-postgres/test/` that drops the LISTEN client mid-stream (e.g. via `pg_terminate_backend`) and asserts that subsequent chunks still reach the reader via the polling fallback. `streamPollIntervalMs: 100` would make the test fast.
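Before wiring that test to a real Postgres instance, the behaviour it should assert can be pinned down with an in-memory model. The sketch below is illustrative only: `Chunk`, `StreamReader`, `applyPollResult`, and `fetchAfter` are hypothetical names, and numeric chunk IDs are an assumption made to keep the ordering check simple. It models the two properties described above: duplicates from the NOTIFY and poll paths are dropped by the ordering check, and nothing is delivered once EOF has closed the reader.

```typescript
// In-memory model only; no names here are taken from the world-postgres source.
interface Chunk {
  id: number; // assumed monotonically increasing per stream
  data: string;
  eof?: boolean;
}

class StreamReader {
  lastChunkId = 0;
  closed = false;
  delivered: string[] = [];

  // Accepts a chunk from either path (NOTIFY or poll); drops duplicates and stale chunks.
  enqueue(chunk: Chunk): void {
    if (this.closed || chunk.id <= this.lastChunkId) return;
    this.lastChunkId = chunk.id;
    if (chunk.eof) {
      this.closed = true;
      return;
    }
    this.delivered.push(chunk.data);
  }

  // Applies rows returned by a poll query, unless the reader closed in the meantime.
  applyPollResult(rows: Chunk[]): void {
    if (this.closed) return;
    for (const row of rows) this.enqueue(row);
  }

  // One poll tick: re-query everything newer than the last delivered chunk.
  async poll(fetchAfter: (lastId: number) => Promise<Chunk[]>): Promise<void> {
    if (this.closed) return;
    const rows = await fetchAfter(this.lastChunkId);
    // applyPollResult re-checks `closed`, since EOF may arrive while the query is in flight.
    this.applyPollResult(rows);
  }
}

// Scenario: chunk 1 arrives via NOTIFY, the LISTEN socket drops, and the poll recovers 2 and 3.
const reader = new StreamReader();
reader.enqueue({ id: 1, data: "a" });
reader.applyPollResult([
  { id: 1, data: "a" }, // duplicate, ignored
  { id: 2, data: "b" },
  { id: 3, data: "c" },
]);
reader.enqueue({ id: 3, data: "c" }); // late NOTIFY after reconnect, ignored
reader.enqueue({ id: 4, data: "", eof: true });
reader.applyPollResult([{ id: 5, data: "x" }]); // after EOF, ignored
```

An integration test would replace `applyPollResult` inputs with real rows and trigger the gap by terminating the LISTEN backend, but the assertions on delivered chunks could stay the same.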