CS-10952: cross-process invalidation broadcast for CachingDefinitionLookup#4644
CS-10952: cross-process invalidation broadcast for CachingDefinitionLookup#4644
Conversation
Host Test Results 1 files ±0 1 suites ±0 1h 42m 17s ⏱️ - 2m 26s Results for commit 7378223. ± Comparison against earlier commit 33049fc. Realm Server Test Results 1 files ± 0 1 suites ±0 17m 35s ⏱️ +37s Results for commit 7fa9e02. ± Comparison against earlier commit 7378223. |
…ookup Stacks on CS-10948's three-counter generation discard. In a single process the counters keep an in-flight prerender from re-INSERTing a row a concurrent invalidate just deleted. Across N processes that protection doesn't span instances: an invalidate on one process doesn't bump peer processes' counters, so a peer's mid-flight prerender can still re-persist pre-invalidation state. This change closes that gap by broadcasting invalidations over a new `module_cache_invalidated` Postgres NOTIFY channel. * runtime-common/definition-lookup.ts: emit one notify per URL in invalidate()'s fan-out (`module:<resolvedRealmURL>:<moduleURL>`), one notify for clearRealmCache() (`realm:<resolvedRealmURL>`), and one notify for clearAllModules() (`global`). Sequenced after the DELETE rather than wrapped in BEGIN/COMMIT — each existing path is a single autocommit DELETE, so sequential pg_notify has the same observable semantics as a same-tx notify. Best-effort try/catch with console.warn, mirroring Realm.#notifyFileChange. Promotes the bump helpers from private to public so the listener can replay them; adds bumpGlobalGeneration. Sqlite/in-memory adapters short-circuit before pg_notify. * realm-server/lib/module-cache-invalidation-listener.ts (new): WorkLoop + PgAdapter.listen pattern, mirroring RealmFileChangesListener exactly. Parses payloads, dispatches to the appropriate bump method on the attached CachingDefinitionLookup. 60s safety poll. Self-notify is idempotent — emitter already bumped synchronously before its DELETE. * realm-server/main.ts: construct + start alongside fileChangesListener, add to shutdown Promise.all. The listener lives in realm-server rather than the path the spec suggested (runtime-common/definition-lookup-coordination.ts) because runtime-common doesn't depend on @cardstack/postgres and adding the dep would be circular (postgres already imports DBAdapter from runtime-common). The MODULE_CACHE_INVALIDATED_CHANNEL constant is exported from runtime-common so consumers in other packages can match it. Behavior at N=1 (today's production) is inert: the emitter has already bumped its counter synchronously before the DELETE, and a self-notify replay is a second monotonic bump that any in-flight snapshot would have already mismatched on the first bump. At N>1 it closes the cross-process half of the persist-after-invalidate race; the only remaining window is the sub-millisecond NOTIFY latency, which self-heals on the next prerender of the same key. Tests: payload-parsing units, dispatch units, and end-to-end pg tests where two CachingDefinitionLookup instances share one PgAdapter and an A.invalidate / A.clearRealmCache / A.clearAllModules drives the right bump on B's recorder (mirroring realm-file-changes-listener-test.ts's shape). Plus a self-notify idempotence test. Linear: https://linear.app/cardstack/issue/CS-10952 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lint job blocked the PR on a prettier nit (multi-line single-arg signature should be one line). No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
33049fc to
7378223
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds cross-process coordination for CachingDefinitionLookup invalidation by broadcasting invalidations via Postgres NOTIFY and listening in peer realm-server processes to keep in-memory generation counters synchronized across instances. This closes the “persist-after-invalidate across processes” race where one instance can re-upsert a row another instance has deleted.
Changes:
- Export a shared Postgres NOTIFY channel constant and emit invalidation notifications from
CachingDefinitionLookupafter DELETEs. - Introduce
ModuleCacheInvalidationListenerin realm-server toLISTENfor invalidation payloads and replay generation bumps locally. - Add a realm-server test suite covering payload parsing, dispatch behavior, and end-to-end LISTEN/NOTIFY behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/runtime-common/definition-lookup.ts | Exports invalidation channel constant, makes generation bump APIs callable by a listener, and emits best-effort pg_notify messages after invalidation deletes. |
| packages/realm-server/lib/module-cache-invalidation-listener.ts | New LISTEN-based listener that parses invalidation payloads and bumps the attached CachingDefinitionLookup counters. |
| packages/realm-server/main.ts | Wires up and shuts down the new invalidation listener alongside the existing file-changes listener. |
| packages/realm-server/tests/module-cache-invalidation-listener-test.ts | New tests for parsing, dispatch, and end-to-end PG notifications (including self-notify idempotence). |
| packages/realm-server/tests/index.ts | Registers the new test module. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }); | ||
| fileChangesListener.start(); | ||
|
|
||
| // Cross-instance module-cache invalidation (CS-10952). When a peer | ||
| // realm-server emits NOTIFY module_cache_invalidated, replay the bump on | ||
| // this instance's CachingDefinitionLookup so its in-flight prerenders | ||
| // observe the invalidation at persist time and discard stale results. | ||
| // Self-notify is harmless — the emitter already bumped synchronously | ||
| // before the DELETE; a second bump from the listener loop is idempotent. | ||
| moduleCacheInvalidationListener = new ModuleCacheInvalidationListener({ | ||
| dbAdapter, | ||
| definitionLookup, | ||
| }); | ||
| moduleCacheInvalidationListener.start(); |
Address Copilot review on #4644: notifyModuleCacheInvalidations issued one pg_notify round-trip per URL, which N+1's against invalidate()'s fan-out (the source URL plus extension variants plus transitive consumers from calculateInvalidations). Switch the wire format from `<kind>:<...>` strings to JSON. Module fan-out now packs URLs into a single payload of the form {"k":"module","r":<realm>,"m":[<url>,...]} and the emitter chunks the list to stay under a 7000-byte safety budget below Postgres's 8000-byte NOTIFY cap. Common case is one notify per invalidate; worst case is a handful of chunks. With M peer processes that turns M*N listener wakeups (and N notification deliveries per peer) into M wakeups carrying the same N bumps. Listener does the same N bumps either way. Realm and global notifications also move to JSON ({"k":"realm",...}, {"k":"global"}) for parser uniformity. Listener parses JSON and dispatches; updated parsing/dispatch unit tests and the manual-NOTIFY end-to-end test cover both single- and multi-URL module payloads, malformed JSON, missing fields, and non-string array entries. Doing this in this PR (rather than a follow-up) so there is never a moment where peers running mixed code disagree on payload format — N=1 in production today, so no cross-process traffic exists yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First sub-issue under the CS-10950 umbrella ("Cross-process coordination for CachingDefinitionLookup"). Closes the invalidation half. CS-10953 (cross-process populate coalescing via advisory-lock + NOTIFY) stacks on top.
Why
CS-10948 added in-process generation counters (
#moduleGenerations,#realmGenerations,#globalGeneration) to discard an in-flight prerender result if a concurrent invalidate/clearRealmCache/clearAllModules ran during the prerender. That fixes the persist-after-invalidate race within one realm-server process.Across N processes the protection doesn't span instances. If process A is mid-prerender for
foo.gtsand process B callsinvalidate('foo.gts'), B bumps its own counters and deletes the row, but A's counters are untouched — A's persist-time generation check sees no mismatch and the UPSERT re-inserts the row B just deleted.This PR broadcasts every invalidation over a Postgres NOTIFY channel so peer processes' counters stay in lockstep with the DB.
What changes
runtime-common/definition-lookup.tsMODULE_CACHE_INVALIDATED_CHANNEL = 'module_cache_invalidated'constant.bumpModuleGenerationandbumpRealmGenerationfromprivateto public (so the listener can replay them); adds abumpGlobalGeneration()helper for symmetry.pg_notifyafter each invalidation path's DELETE:invalidate(moduleURL)→ one notify per URL inuniqueInvalidations(target + dependency-graph fan-out):module:<resolvedRealmURL>:<moduleURL>clearRealmCache(url)→ singlerealm:<resolvedRealmURL>notifyclearAllModules()→ singleglobalnotifyRealm.#notifyFileChange). The local instance's counters are already bumped synchronously before the DELETE, so a notify failure is a bounded cross-instance staleness window — peers self-heal on their next prerender.dbAdapter.kindshort-circuits beforepg_notify.realm-server/lib/module-cache-invalidation-listener.ts(new)ModuleCacheInvalidationListenerfollowsRealmFileChangesListener's shape exactly:PgAdapter.listen(Pool LISTEN is unreliable per node-postgres#1543).WorkLoopfor predictable shutdown.bumpModuleGeneration/bumpRealmGeneration/bumpGlobalGenerationon the attachedCachingDefinitionLookup.Self-notify is idempotent: the emitter already bumped synchronously before its DELETE, and a second monotonic bump from receiving its own NOTIFY is observationally equivalent (counters are only used for snapshot equality).
realm-server/main.tsConstructs and
start()s alongsidefileChangesListener; adds.shutDown()to the existing shutdownPromise.all.A small structural divergence from the ticket
The CS-10952 spec suggested
packages/runtime-common/definition-lookup-coordination.ts.runtime-commondoesn't depend on@cardstack/postgres(and shouldn't —@cardstack/postgresalready depends onruntime-commonviaDBAdapter, so the dep would be circular). The listener needsPgAdapter.listenandWorkLoop, both from@cardstack/postgres, so it lives inrealm-server/lib/next toRealmFileChangesListener. Behavior is identical; only the file home differs. CS-10953's sibling listener will land in the same spot. The channel constant is exported fromruntime-commonso other packages can import it.A small spec divergence on the NOTIFY transaction
The spec says: "
NOTIFY ... '<payload>'inside the same transaction as each invalidation path's DELETE, so peers only see the signal on commit."Each existing invalidation path (
invalidate/clearRealmCache/clearAllModules) is a single autocommit DELETE — noBEGIN/COMMITwrapper. The existingRealm.#notifyFileChangeprecedent emitspg_notifyas a separate query after the autocommit. Same observable semantics under non-crash conditions: peer sees the notify after the DELETE has committed, because both autocommit in order. Mirroring#notifyFileChangekeeps the two cross-instance pathways using the same idiom and avoids new tx-management code. The narrow "DELETE commits, then process crashes before pg_notify" window is identical to what the file-changes path already accepts.Behavior
N=1 (today's production): inert. The emitter bumps synchronously before its DELETE; the listener replays a second monotonic bump that any in-flight snapshot would have already mismatched on the first bump.
N>1: immediate correctness win. Peer processes' in-flight prerenders observe the invalidation via NOTIFY before persist; their generation checks catch the mismatch and discard via
readFromDatabaseCacheinstead of re-inserting. The only remaining window is bounded by NOTIFY latency (typically sub-millisecond on healthy Postgres) and self-heals on the next prerender of the same key.Test plan
New tests in
realm-server/tests/module-cache-invalidation-listener-test.ts:handleNotificationwith each payload kind invokes the right bump on a stub lookup with the right args.realm-file-changes-listener-test.ts): twoCachingDefinitionLookupinstances share one DB; A'sinvalidate(url)/clearRealmCache(url)/clearAllModules()produces the right bump on B's recorder via the listener.pg_notifysends through the channel constant and the listener picks them up — proves wire compatibility for any future emitter outsideCachingDefinitionLookup.What this does NOT do (out of scope)
Related
realm_file_changesNOTIFY pattern this mirrors🤖 Generated with Claude Code