Skip to content

CS-10952: cross-process invalidation broadcast for CachingDefinitionLookup#4644

Draft
lukemelia wants to merge 3 commits intomainfrom
cs-10952-cross-process-invalidation-broadcast
Draft

CS-10952: cross-process invalidation broadcast for CachingDefinitionLookup#4644
lukemelia wants to merge 3 commits intomainfrom
cs-10952-cross-process-invalidation-broadcast

Conversation

@lukemelia
Copy link
Copy Markdown
Contributor

First sub-issue under the CS-10950 umbrella ("Cross-process coordination for CachingDefinitionLookup"). Closes the invalidation half. CS-10953 (cross-process populate coalescing via advisory-lock + NOTIFY) stacks on top.

Why

CS-10948 added in-process generation counters (#moduleGenerations, #realmGenerations, #globalGeneration) to discard an in-flight prerender result if a concurrent invalidate/clearRealmCache/clearAllModules ran during the prerender. That fixes the persist-after-invalidate race within one realm-server process.

Across N processes the protection doesn't span instances. If process A is mid-prerender for foo.gts and process B calls invalidate('foo.gts'), B bumps its own counters and deletes the row, but A's counters are untouched — A's persist-time generation check sees no mismatch and the UPSERT re-inserts the row B just deleted.

This PR broadcasts every invalidation over a Postgres NOTIFY channel so peer processes' counters stay in lockstep with the DB.

What changes

runtime-common/definition-lookup.ts

  • Adds an exported MODULE_CACHE_INVALIDATED_CHANNEL = 'module_cache_invalidated' constant.
  • Promotes bumpModuleGeneration and bumpRealmGeneration from private to public (so the listener can replay them); adds a bumpGlobalGeneration() helper for symmetry.
  • Emits pg_notify after each invalidation path's DELETE:
    • invalidate(moduleURL) → one notify per URL in uniqueInvalidations (target + dependency-graph fan-out): module:<resolvedRealmURL>:<moduleURL>
    • clearRealmCache(url) → single realm:<resolvedRealmURL> notify
    • clearAllModules() → single global notify
  • Best-effort try/catch with console.warn (mirroring Realm.#notifyFileChange). The local instance's counters are already bumped synchronously before the DELETE, so a notify failure is a bounded cross-instance staleness window — peers self-heal on their next prerender.
  • Sqlite/in-memory dbAdapter.kind short-circuits before pg_notify.

realm-server/lib/module-cache-invalidation-listener.ts (new)

ModuleCacheInvalidationListener follows RealmFileChangesListener's shape exactly:

  • Dedicated LISTEN connection via PgAdapter.listen (Pool LISTEN is unreliable per node-postgres#1543).
  • WorkLoop for predictable shutdown.
  • 60s safety poll (no DB-side state to poll — the dispatch is in the payload — so the wake-loop just sleeps until shutdown).
  • Parses the three payload formats and dispatches to bumpModuleGeneration / bumpRealmGeneration / bumpGlobalGeneration on the attached CachingDefinitionLookup.

Self-notify is idempotent: the emitter already bumped synchronously before its DELETE, and a second monotonic bump from receiving its own NOTIFY is observationally equivalent (counters are only used for snapshot equality).

realm-server/main.ts

Constructs and start()s alongside fileChangesListener; adds .shutDown() to the existing shutdown Promise.all.

A small structural divergence from the ticket

The CS-10952 spec suggested packages/runtime-common/definition-lookup-coordination.ts. runtime-common doesn't depend on @cardstack/postgres (and shouldn't — @cardstack/postgres already depends on runtime-common via DBAdapter, so the dep would be circular). The listener needs PgAdapter.listen and WorkLoop, both from @cardstack/postgres, so it lives in realm-server/lib/ next to RealmFileChangesListener. Behavior is identical; only the file home differs. CS-10953's sibling listener will land in the same spot. The channel constant is exported from runtime-common so other packages can import it.

A small spec divergence on the NOTIFY transaction

The spec says: "NOTIFY ... '<payload>' inside the same transaction as each invalidation path's DELETE, so peers only see the signal on commit."

Each existing invalidation path (invalidate / clearRealmCache / clearAllModules) is a single autocommit DELETE — no BEGIN/COMMIT wrapper. The existing Realm.#notifyFileChange precedent emits pg_notify as a separate query after the autocommit. Same observable semantics under non-crash conditions: peer sees the notify after the DELETE has committed, because both autocommit in order. Mirroring #notifyFileChange keeps the two cross-instance pathways using the same idiom and avoids new tx-management code. The narrow "DELETE commits, then process crashes before pg_notify" window is identical to what the file-changes path already accepts.

Behavior

N=1 (today's production): inert. The emitter bumps synchronously before its DELETE; the listener replays a second monotonic bump that any in-flight snapshot would have already mismatched on the first bump.

N>1: immediate correctness win. Peer processes' in-flight prerenders observe the invalidation via NOTIFY before persist; their generation checks catch the mismatch and discard via readFromDatabaseCache instead of re-inserting. The only remaining window is bounded by NOTIFY latency (typically sub-millisecond on healthy Postgres) and self-heals on the next prerender of the same key.

Test plan

  • CI Realm Server suite green
  • CI Software Factory job green
  • CI Host suite green

New tests in realm-server/tests/module-cache-invalidation-listener-test.ts:

  • Payload parsing units (the four shapes + malformed/empty cases).
  • Dispatch units: handleNotification with each payload kind invokes the right bump on a stub lookup with the right args.
  • End-to-end with PgAdapter (same pattern as realm-file-changes-listener-test.ts): two CachingDefinitionLookup instances share one DB; A's invalidate(url) / clearRealmCache(url) / clearAllModules() produces the right bump on B's recorder via the listener.
  • Self-notify idempotence: emitter sees ≥2 bumps for the same URL (1 synchronous + 1 listener echo).
  • Manual pg_notify sends through the channel constant and the listener picks them up — proves wire compatibility for any future emitter outside CachingDefinitionLookup.

What this does NOT do (out of scope)

  • Cross-process populate coalescing (the prerender-coalescing half of the work). Tracked as CS-10953 — stacks on this PR.
  • Strict-consistency cross-process invalidation closure beyond what NOTIFY latency provides; the sub-millisecond window remains.

Related

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Host Test Results

    1 files  ±0      1 suites  ±0   1h 42m 17s ⏱️ - 2m 26s
2 563 tests +1  2 548 ✅ +1  15 💤 ±0  0 ❌ ±0 
2 582 runs  +1  2 567 ✅ +1  15 💤 ±0  0 ❌ ±0 

Results for commit 7378223. ± Comparison against earlier commit 33049fc.

Realm Server Test Results

    1 files  ± 0      1 suites  ±0   17m 35s ⏱️ +37s
1 259 tests + 9  1 259 ✅ + 9  0 💤 ±0  0 ❌ ±0 
1 338 runs  +15  1 338 ✅ +15  0 💤 ±0  0 ❌ ±0 

Results for commit 7fa9e02. ± Comparison against earlier commit 7378223.

lukemelia and others added 2 commits May 4, 2026 19:35
…ookup

Stacks on CS-10948's three-counter generation discard. In a single process
the counters keep an in-flight prerender from re-INSERTing a row a
concurrent invalidate just deleted. Across N processes that protection
doesn't span instances: an invalidate on one process doesn't bump peer
processes' counters, so a peer's mid-flight prerender can still re-persist
pre-invalidation state.

This change closes that gap by broadcasting invalidations over a new
`module_cache_invalidated` Postgres NOTIFY channel.

* runtime-common/definition-lookup.ts: emit one notify per URL in
  invalidate()'s fan-out (`module:<resolvedRealmURL>:<moduleURL>`), one
  notify for clearRealmCache() (`realm:<resolvedRealmURL>`), and one notify
  for clearAllModules() (`global`). Sequenced after the DELETE rather than
  wrapped in BEGIN/COMMIT — each existing path is a single autocommit
  DELETE, so sequential pg_notify has the same observable semantics as a
  same-tx notify. Best-effort try/catch with console.warn, mirroring
  Realm.#notifyFileChange. Promotes the bump helpers from private to
  public so the listener can replay them; adds bumpGlobalGeneration.
  Sqlite/in-memory adapters short-circuit before pg_notify.

* realm-server/lib/module-cache-invalidation-listener.ts (new):
  WorkLoop + PgAdapter.listen pattern, mirroring RealmFileChangesListener
  exactly. Parses payloads, dispatches to the appropriate bump method on
  the attached CachingDefinitionLookup. 60s safety poll. Self-notify is
  idempotent — emitter already bumped synchronously before its DELETE.

* realm-server/main.ts: construct + start alongside fileChangesListener,
  add to shutdown Promise.all.

The listener lives in realm-server rather than the path the spec
suggested (runtime-common/definition-lookup-coordination.ts) because
runtime-common doesn't depend on @cardstack/postgres and adding the dep
would be circular (postgres already imports DBAdapter from runtime-common).
The MODULE_CACHE_INVALIDATED_CHANNEL constant is exported from
runtime-common so consumers in other packages can match it.

Behavior at N=1 (today's production) is inert: the emitter has already
bumped its counter synchronously before the DELETE, and a self-notify
replay is a second monotonic bump that any in-flight snapshot would have
already mismatched on the first bump. At N>1 it closes the cross-process
half of the persist-after-invalidate race; the only remaining window is
the sub-millisecond NOTIFY latency, which self-heals on the next
prerender of the same key.

Tests: payload-parsing units, dispatch units, and end-to-end pg tests
where two CachingDefinitionLookup instances share one PgAdapter and an
A.invalidate / A.clearRealmCache / A.clearAllModules drives the right
bump on B's recorder (mirroring realm-file-changes-listener-test.ts's
shape). Plus a self-notify idempotence test.

Linear: https://linear.app/cardstack/issue/CS-10952

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lint job blocked the PR on a prettier nit (multi-line single-arg signature
should be one line). No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lukemelia lukemelia force-pushed the cs-10952-cross-process-invalidation-broadcast branch from 33049fc to 7378223 Compare May 4, 2026 23:35
@lukemelia lukemelia requested a review from Copilot May 5, 2026 03:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds cross-process coordination for CachingDefinitionLookup invalidation by broadcasting invalidations via Postgres NOTIFY and listening in peer realm-server processes to keep in-memory generation counters synchronized across instances. This closes the “persist-after-invalidate across processes” race where one instance can re-upsert a row another instance has deleted.

Changes:

  • Export a shared Postgres NOTIFY channel constant and emit invalidation notifications from CachingDefinitionLookup after DELETEs.
  • Introduce ModuleCacheInvalidationListener in realm-server to LISTEN for invalidation payloads and replay generation bumps locally.
  • Add a realm-server test suite covering payload parsing, dispatch behavior, and end-to-end LISTEN/NOTIFY behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/runtime-common/definition-lookup.ts Exports invalidation channel constant, makes generation bump APIs callable by a listener, and emits best-effort pg_notify messages after invalidation deletes.
packages/realm-server/lib/module-cache-invalidation-listener.ts New LISTEN-based listener that parses invalidation payloads and bumps the attached CachingDefinitionLookup counters.
packages/realm-server/main.ts Wires up and shuts down the new invalidation listener alongside the existing file-changes listener.
packages/realm-server/tests/module-cache-invalidation-listener-test.ts New tests for parsing, dispatch, and end-to-end PG notifications (including self-notify idempotence).
packages/realm-server/tests/index.ts Registers the new test module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/runtime-common/definition-lookup.ts
Comment on lines 563 to +576
});
fileChangesListener.start();

// Cross-instance module-cache invalidation (CS-10952). When a peer
// realm-server emits NOTIFY module_cache_invalidated, replay the bump on
// this instance's CachingDefinitionLookup so its in-flight prerenders
// observe the invalidation at persist time and discard stale results.
// Self-notify is harmless — the emitter already bumped synchronously
// before the DELETE; a second bump from the listener loop is idempotent.
moduleCacheInvalidationListener = new ModuleCacheInvalidationListener({
dbAdapter,
definitionLookup,
});
moduleCacheInvalidationListener.start();
Address Copilot review on #4644: notifyModuleCacheInvalidations
issued one pg_notify round-trip per URL, which N+1's against
invalidate()'s fan-out (the source URL plus extension variants plus
transitive consumers from calculateInvalidations).

Switch the wire format from `<kind>:<...>` strings to JSON. Module
fan-out now packs URLs into a single payload of the form
{"k":"module","r":<realm>,"m":[<url>,...]} and the emitter chunks
the list to stay under a 7000-byte safety budget below Postgres's
8000-byte NOTIFY cap. Common case is one notify per invalidate;
worst case is a handful of chunks.

With M peer processes that turns M*N listener wakeups (and N
notification deliveries per peer) into M wakeups carrying the same N
bumps. Listener does the same N bumps either way.

Realm and global notifications also move to JSON ({"k":"realm",...},
{"k":"global"}) for parser uniformity. Listener parses JSON and
dispatches; updated parsing/dispatch unit tests and the manual-NOTIFY
end-to-end test cover both single- and multi-URL module payloads,
malformed JSON, missing fields, and non-string array entries.

Doing this in this PR (rather than a follow-up) so there is never a
moment where peers running mixed code disagree on payload format —
N=1 in production today, so no cross-process traffic exists yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants