Skip to content

realm-server: wait for port release in test fixture closeServer#4647

Merged
habdelra merged 2 commits intomainfrom
realm-server-fixture-port-cleanup
May 5, 2026
Merged

realm-server: wait for port release in test fixture closeServer#4647
habdelra merged 2 commits intomainfrom
realm-server-fixture-port-cleanup

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented May 4, 2026

Summary

Running packages/realm-server/tests/prerendering-test.ts locally bails on the very first test with EADDRINUSE :::4455 from RealmServer.listen inside runTestRealmServer. The mechanism:

setupPermissionedRealmsCached installs two before hooks:

  1. acquirePermissionedRealmsTemplatebuildPermissionedRealmsTemplatestartPermissionedRealmsFixturerunTestRealmServer → binds the realm URL's port (e.g. 4455), runs from-scratch indexing, then calls teardownPermissionedRealmsFixture (which calls closeServer).
  2. setupPermissionedRealms's inner beforestartPermissionedRealmsFixture again, trying to bind the same 4455.

closeServer already does closeIdleConnections() + closeAllConnections() + await server.close(...), but Node's server.close(cb) only waits for the listener to stop accepting new connections — the kernel can still hold the bind slot briefly and the next bind() races into EADDRINUSE.

A related symptom of the same lifecycle gap is the warning:

Shared-context invariant violated for realm:http://127.0.0.1:4455/test/: existing BrowserContext does not match the one being registered.

The page-pool's #sharedContexts entry from the cache builder's run is still registered when the test's setup tries to register a fresh context for the same realm URL. That warning is informational; the EADDRINUSE is the bug that actually breaks the run.

Fix

Add awaitPortRelease(host, port, { timeoutMs }) and call it from closeServer after the existing close path. It opens a TCP probe and resolves on first ECONNREFUSED (or any error indicating the socket isn't listening), polling every 25ms with a 2s ceiling. On timeout it logs a clear awaitPortRelease: 127.0.0.1:4455 still appears bound after 2000ms warning so the next failure points at the leaked port rather than the downstream EADDRINUSE.

Centralizing the wait in closeServer means every fixture path (single-realm, multi-realm, cached, builder) gets the guarantee for free.

Per-cycle port for the builder — ruled out

The other fix considered was making the cache builder bind a different port from the test (e.g. original_port + 100). That turns out to be too invasive: boxel_index rows are keyed by realm_url in the primary key (packages/postgres/migrations/1733253128046_remove-type-from-index-pk.js makes [url, realm_version, realm_url] the PK). If the builder rewrote the realm URL, the template DB rows would be keyed under the wrong URL and the test reading them back via the original URL wouldn't find them. Shipping (1) alone is what unblocks local runs.

Test plan

  • Run pnpm --filter @cardstack/realm-server test-module --module 'prerender - mutating tests' (or another setupPermissionedRealmsCached consumer) locally and confirm it no longer fails with EADDRINUSE on the first test.
  • Confirm full prerendering-test runs cleanly through the cached-template setup without awaitPortRelease timeout warnings.
  • CI green for the realm-server suite.

🤖 Generated with Claude Code

The cached-template builder for `setupPermissionedRealmsCached` (a) starts
a real RealmServer on the realm URL's port, (b) populates the template DB,
(c) tears down via closeServer + DB cleanup. Then the actual test's `before`
hook starts a *second* RealmServer on the same port via the same fixture
plumbing.

`server.close(cb)` only stops accepting new connections; the kernel still
holds the bind slot briefly, and the next listen() races into EADDRINUSE.
Locally this reproduces on the very first prerendering test as
`EADDRINUSE :::4455`.

Add `awaitPortRelease(host, port, { timeoutMs })` and call it from
`closeServer` after the existing close path (idle/all connections + close
callback). It opens a TCP probe and waits for ECONNREFUSED, with a 2s
ceiling and a clear diagnostic warning on timeout so the next failure
points at the leaked port rather than the downstream EADDRINUSE.

Per-cycle port assignment for the builder (so it could bind a different
port from the test) was considered but ruled out: `boxel_index` rows are
keyed by realm_url in the primary key, so changing the URL during build
would invalidate the template DB the test reads back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d018ba4be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/realm-server/tests/helpers/index.ts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Host Test Results

    1 files  ±0      1 suites  ±0   1h 41m 57s ⏱️ - 1m 16s
2 563 tests ±0  2 548 ✅ ±0  15 💤 ±0  0 ❌ ±0 
2 582 runs  ±0  2 567 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Results for commit 0aba64c. ± Comparison against earlier commit 7d018ba.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   18m 14s ⏱️ - 1m 4s
1 234 tests ±0  1 234 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 306 runs  ±0  1 306 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 0aba64c. ± Comparison against earlier commit 7d018ba.

Codex review caught: when server.address().address is '::' (IPv6
wildcard), probing 127.0.0.1 can falsely report the port as released on
systems with IPv6-only binding behavior — the IPv4 probe gets
ECONNREFUSED while the IPv6 listener is still bound. Map '::' to '::1'
instead so the probe runs in the same address family as the listener.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the realm-server test fixture teardown by ensuring the OS has fully released a server port after server.close() resolves, preventing intermittent EADDRINUSE when back-to-back fixtures re-bind the same port (notably in setupPermissionedRealmsCached flows).

Changes:

  • Capture server.address() info before closing, then wait for the port to stop accepting TCP connections after server.close().
  • Add an awaitPortRelease(host, port, { timeoutMs, intervalMs }) helper that polls via a TCP connect probe and emits a targeted warning on timeout.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@habdelra habdelra requested a review from a team May 4, 2026 23:07
@habdelra habdelra merged commit 10daf5b into main May 5, 2026
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants