realm-server: wait for port release in test fixture closeServer #4647
The cached-template builder for `setupPermissionedRealmsCached` (a) starts
a real RealmServer on the realm URL's port, (b) populates the template DB,
(c) tears down via closeServer + DB cleanup. Then the actual test's `before`
hook starts a *second* RealmServer on the same port via the same fixture
plumbing.
`server.close(cb)` only stops accepting new connections; the kernel still
holds the bind slot briefly, and the next listen() races into EADDRINUSE.
Locally this reproduces on the very first prerendering test as
`EADDRINUSE :::4455`.
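A bind attempt can report `EADDRINUSE` instead of throwing, which makes the collision itself easy to observe. This is a minimal sketch. `tryListen` is a hypothetical helper, not part of the realm-server fixtures. It shows how the failure surfaces, not how often the race fires on a given kernel.

```typescript
import { createServer } from "node:net";

type ListenOutcome = "listening" | "EADDRINUSE";

// Hypothetical helper: try to bind host:port once, then release it again.
// Resolves "EADDRINUSE" when the bind collides; any other listen error rejects.
export function tryListen(port: number, host: string): Promise<ListenOutcome> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", (err: Error) => {
      const code = (err as Error & { code?: string }).code;
      if (code === "EADDRINUSE") {
        resolve("EADDRINUSE");
      } else {
        reject(err);
      }
    });
    server.listen(port, host, () => {
      // The bind succeeded; close again so the probe leaves no listener behind.
      server.close(() => resolve("listening"));
    });
  });
}
```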
Add `awaitPortRelease(host, port, { timeoutMs })` and call it from
`closeServer` after the existing close path (idle/all connections + close
callback). It opens a TCP probe and waits for ECONNREFUSED, with a 2s
ceiling and a clear diagnostic warning on timeout so the next failure
points at the leaked port rather than the downstream EADDRINUSE.
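The probe described above can be sketched with Node's `node:net` module. This is a minimal illustration, not the PR's code. The 25ms interval and 2s ceiling come from the PR description. Two details are assumptions added here. A 250ms per-probe socket timeout keeps a hung connect from stalling the loop. A boolean return value lets a caller act on a timeout.

```typescript
import { connect } from "node:net";

interface AwaitPortReleaseOptions {
  timeoutMs?: number; // overall ceiling before giving up with a warning
  intervalMs?: number; // pause between probes
}

// One TCP connect probe. Resolves true when nothing accepted the connection
// (e.g. ECONNREFUSED), false when something still accepts on host:port.
function probeOnce(host: string, port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = connect({ host, port });
    // Assumption: a probe that neither connects nor errors within 250ms
    // counts as "still bound" so the outer loop keeps polling.
    socket.setTimeout(250, () => {
      socket.destroy();
      resolve(false);
    });
    socket.once("connect", () => {
      socket.destroy(); // a listener accepted the probe: port still bound
      resolve(false);
    });
    socket.once("error", () => {
      socket.destroy(); // refused (or similar): nothing is listening
      resolve(true);
    });
  });
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the port stops accepting connections or the ceiling is reached.
// Returns true once released, or false (after logging a warning) on timeout.
export async function awaitPortRelease(
  host: string,
  port: number,
  { timeoutMs = 2000, intervalMs = 25 }: AwaitPortReleaseOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probeOnce(host, port)) {
      return true;
    }
    if (Date.now() >= deadline) {
      break;
    }
    await sleep(intervalMs);
  }
  console.warn(`awaitPortRelease: ${host}:${port} still appears bound after ${timeoutMs}ms`);
  return false;
}
```

The function polls instead of retrying `listen()` directly. That keeps the wait in the teardown path, so the next fixture's bind sees an already-released port.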
Per-cycle port assignment for the builder (so it could bind a different
port from the test) was considered but ruled out: `boxel_index` rows are
keyed by realm_url in the primary key, so changing the URL during build
would invalidate the template DB the test reads back.
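The reasoning above can be shown with a toy model. This sketch rests on loudly stated assumptions. An in-memory `Map` stands in for the `boxel_index` table, and only the primary-key columns named in the PR body (`url`, `realm_version`, `realm_url`) are modeled. `TemplateIndex`, `withPort`, and the example URLs are hypothetical illustrations, not realm-server code.

```typescript
// Toy stand-in for boxel_index: only the primary-key columns are modeled.
interface IndexKey {
  url: string;
  realmVersion: number;
  realmUrl: string;
}

// Hypothetical in-memory table keyed by the composite (url, realm_version, realm_url).
class TemplateIndex {
  private rows = new Map<string, string>();

  private static pk(key: IndexKey): string {
    return JSON.stringify([key.url, key.realmVersion, key.realmUrl]);
  }

  insert(key: IndexKey, payload: string): void {
    this.rows.set(TemplateIndex.pk(key), payload);
  }

  lookup(key: IndexKey): string | undefined {
    return this.rows.get(TemplateIndex.pk(key));
  }
}

// Hypothetical helper: the same realm URL moved to a different port.
function withPort(realmUrl: string, port: number): string {
  const u = new URL(realmUrl);
  u.port = String(port);
  return u.href;
}

const realmUrl = "http://localhost:4455/test/";
const builderRealmUrl = withPort(realmUrl, 4455 + 100); // the ruled-out "original_port + 100"

// The builder indexes rows under its own realm URL...
const index = new TemplateIndex();
index.insert({ url: `${builderRealmUrl}card-1`, realmVersion: 1, realmUrl: builderRealmUrl }, "indexed");

// ...so the test, reading back under the original realm URL, finds nothing.
const seenByTest = index.lookup({ url: `${realmUrl}card-1`, realmVersion: 1, realmUrl });
console.log(seenByTest); // undefined
```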
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d018ba4be
Host Test Results: 1 file (±0), 1 suite (±0), 1h 41m 57s (-1m 16s). Results for commit 0aba64c, compared against earlier commit 7d018ba.
Realm Server Test Results: 1 file (±0), 1 suite (±0), 18m 14s (-1m 4s). Results for commit 0aba64c, compared against earlier commit 7d018ba.
Codex review caught: when `server.address().address` is `'::'` (IPv6 wildcard), probing `127.0.0.1` can falsely report the port as released on systems with IPv6-only binding behavior. In that case the IPv4 probe gets ECONNREFUSED while the IPv6 listener is still bound. Map `'::'` to `'::1'` instead so the probe runs in the same address family as the listener.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
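The sketch below shows one way to choose the probe target from the listener's own address. Only the `'::'` → `'::1'` mapping comes from the commit message. Two details are assumptions added for completeness: mapping the IPv4 wildcard `'0.0.0.0'` to `'127.0.0.1'`, and skipping pipe or missing addresses. `probeTargetFor` is a hypothetical name.

```typescript
import type { AddressInfo } from "node:net";

interface ProbeTarget {
  host: string;
  port: number;
}

// Derive a loopback probe target in the same address family as the listener.
// `listenAddress` is whatever server.address() returned before closing.
export function probeTargetFor(listenAddress: AddressInfo | string | null): ProbeTarget | undefined {
  // null: not listening; string: a pipe/UNIX socket path with no TCP port.
  if (listenAddress === null || typeof listenAddress === "string") {
    return undefined;
  }
  const { address, port } = listenAddress;
  switch (address) {
    case "::":
      // IPv6 wildcard: probe IPv6 loopback so an IPv4 refusal can't be misread as a release.
      return { host: "::1", port };
    case "0.0.0.0":
      // Assumption: the IPv4 wildcard is probed via IPv4 loopback.
      return { host: "127.0.0.1", port };
    default:
      // A specific bind address is probed as-is.
      return { host: address, port };
  }
}
```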
Pull request overview
This PR hardens the realm-server test fixture teardown by ensuring the OS has fully released a server port after server.close() resolves, preventing intermittent EADDRINUSE when back-to-back fixtures re-bind the same port (notably in setupPermissionedRealmsCached flows).
Changes:
- Capture `server.address()` info before closing, then wait for the port to stop accepting TCP connections after `server.close()`.
- Add an `awaitPortRelease(host, port, { timeoutMs, intervalMs })` helper that polls via a TCP connect probe and emits a targeted warning on timeout.
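The capture has to happen first because `server.address()` returns `null` once the server has been closed. Below is a minimal sketch of that close path for a `node:http` server. `closeAndCapture` is a hypothetical name. It returns the captured address instead of probing it, so the port-release wait can be layered on separately. The code calls `close()` first and only then `closeIdleConnections()` and `closeAllConnections()` (available since Node 18.2). That order is a deliberate choice: the listener stops accepting before existing sockets are torn down, which is the pattern the Node.js docs show for `closeAllConnections()`.

```typescript
import type { Server } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: close an HTTP server and return the TCP address it was
// listening on, captured before close() because address() is null afterwards.
export async function closeAndCapture(server: Server): Promise<AddressInfo | undefined> {
  const addr = server.address();
  const captured = addr !== null && typeof addr === "object" ? { ...addr } : undefined;

  // Stop accepting new connections; the callback fires once every socket is gone.
  const closed = new Promise<void>((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });

  // Drop keep-alive and in-flight connections so close() can finish promptly.
  server.closeIdleConnections();
  server.closeAllConnections();

  await closed;
  return captured;
}
```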
Summary
Running `packages/realm-server/tests/prerendering-test.ts` locally bails on the very first test with `EADDRINUSE :::4455` from `RealmServer.listen` inside `runTestRealmServer`. The mechanism: `setupPermissionedRealmsCached` installs two `before` hooks:

1. `acquirePermissionedRealmsTemplate` → `buildPermissionedRealmsTemplate` → `startPermissionedRealmsFixture` → `runTestRealmServer` → binds the realm URL's port (e.g. 4455), runs from-scratch indexing, then calls `teardownPermissionedRealmsFixture` (which calls `closeServer`).
2. `setupPermissionedRealms`'s inner `before` → `startPermissionedRealmsFixture` again, trying to bind the same 4455.

`closeServer` already does `closeIdleConnections()` + `closeAllConnections()` + `await server.close(...)`, but Node's `server.close(cb)` only waits for the listener to stop accepting new connections. The kernel can still hold the bind slot briefly, and the next `bind()` races into EADDRINUSE.

A related symptom of the same lifecycle gap is the warning:
The page-pool's `#sharedContexts` entry from the cache builder's run is still registered when the test's setup tries to register a fresh context for the same realm URL. That warning is informational; the EADDRINUSE is the bug that actually breaks the run.

Fix
Add `awaitPortRelease(host, port, { timeoutMs })` and call it from `closeServer` after the existing close path. It opens a TCP probe and resolves on the first `ECONNREFUSED` (or any error indicating the socket isn't listening), polling every 25ms with a 2s ceiling. On timeout it logs a clear `awaitPortRelease: 127.0.0.1:4455 still appears bound after 2000ms` warning, so the next failure points at the leaked port rather than the downstream EADDRINUSE.

Centralizing the wait in `closeServer` means every fixture path (single-realm, multi-realm, cached, builder) gets the guarantee for free.

Per-cycle port for the builder: ruled out
The other fix considered was making the cache builder bind a different port from the test (e.g. `original_port + 100`). That turns out to be too invasive: `boxel_index` rows are keyed by `realm_url` in the primary key (`packages/postgres/migrations/1733253128046_remove-type-from-index-pk.js` makes `[url, realm_version, realm_url]` the PK). If the builder rewrote the realm URL, the template DB rows would be keyed under the wrong URL. The test reads them back via the original URL and wouldn't find them. Shipping the port-release wait in `closeServer` alone is what unblocks local runs.

Test plan
- Run `pnpm --filter @cardstack/realm-server test-module --module 'prerender - mutating tests'` (or another `setupPermissionedRealmsCached` consumer) locally and confirm it no longer fails with `EADDRINUSE` on the first test.
- … `awaitPortRelease` timeout warnings.

🤖 Generated with Claude Code