CS-11156: cross-replica clearLocalCaches broadcast via NOTIFY #4842
Draft
lukemelia wants to merge 5 commits into
The CS-11043 publish-realm fix invalidates the publish-handling replica's `#sourceCache` / `#moduleCache` before the reindex enqueues, so the reindex's prerender doesn't see pre-swap bytes. That fix is correct on one replica. On two or more replicas behind a load balancer, peers still hold pre-swap bytes in their own caches, and the reindex's HTTP fan-out to peers serves stale source, which lands back in `boxel_index.isolated_html` and is served forever.

Extends the existing per-path `realm_file_changes` NOTIFY channel with a bulk payload `<realmURL>:*` meaning "drop every cached path for this realm". Wired into publish, unpublish, and delete realm handlers; on receive, peers call `Realm.clearLocalCaches()`.

* runtime-common/realm.ts: `REALM_FILE_CHANGES_WILDCARD` sentinel, standalone `notifyAllFileChanges(dbAdapter, realmURL)` emitter, and `Realm.notifyAllFileChanges()` instance form. Same fire-and-forget semantics as `Realm.#notifyFileChange`; missed NOTIFY is a bounded staleness window per §9 of the registry doc, not data corruption.
* realm-file-changes-listener.ts: dispatch branches on the wildcard payload to `Realm.clearLocalCaches()`. Existing per-path parser + realm lookup reused as-is.
* handle-publish-realm.ts: keeps the sync local `clearLocalCaches()` before the reindex enqueue (replica's own prerender fan-out must bypass its cache) and adds the broadcast after. Self-NOTIFY is a no-op since `clearLocalCaches` is idempotent.
* handle-unpublish-realm.ts and handle-delete-realm.ts: broadcast after the FS removal. Defense-in-depth against the brief window before peers unmount via `NOTIFY realm_registry`.
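The wildcard branch in the listener can be sketched roughly as follows. This is a minimal illustration, not the code in `realm-file-changes-listener.ts`: the `CacheOwner` interface, the `dispatch` function, and the last-colon splitting rule are assumptions made for the example (the real parser is regex-based and its exact rules are not shown here).

```typescript
// Sentinel path meaning "every cached path for this realm" (named in the PR).
const REALM_FILE_CHANGES_WILDCARD = '*';

// Hypothetical slice of the peer-side realm the listener talks to.
interface CacheOwner {
  clearLocalCaches(): void;
  invalidateCache(path: string): void;
}

interface ParsedPayload {
  realmURL: string;
  path: string;
}

// Assumption: the payload is `<realmURL>:<path>` and the path holds no colon,
// so the LAST colon is the separator. That keeps `host:port` URLs intact.
function parsePayload(payload: string): ParsedPayload | undefined {
  const separator = payload.lastIndexOf(':');
  if (separator <= 0 || separator === payload.length - 1) {
    return undefined; // no separator, empty realm URL, or empty path
  }
  return {
    realmURL: payload.slice(0, separator),
    path: payload.slice(separator + 1),
  };
}

// Route one NOTIFY payload to the matching realm mounted on this replica.
function dispatch(payload: string, realms: Map<string, CacheOwner>): boolean {
  const parsed = parsePayload(payload);
  if (!parsed) return false;
  const realm = realms.get(parsed.realmURL);
  if (!realm) return false; // realm is not mounted here, nothing to clear
  if (parsed.path === REALM_FILE_CHANGES_WILDCARD) {
    realm.clearLocalCaches(); // bulk payload: drop every cached path
  } else {
    realm.invalidateCache(parsed.path); // existing per-path behaviour
  }
  return true;
}
```

Because the wildcard reuses the per-path payload shape, the listener needs only one extra branch after parsing.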
Tests in realm-file-changes-listener-test.ts:
* parsePayload returns `path: '*'` for both `host:port` and port-less URLs
* dispatch routes wildcard to `clearLocalCaches`, not `invalidateCache`
* end-to-end through the live LISTEN client: the new emitter → Postgres NOTIFY → the listener → `clearLocalCaches` on a fake peer-side realm

Stacks on #4840 (CS-11125: per-realm advisory locks on the data plane). The lock is what makes the broadcast's "after the swap" ordering meaningful: without serialization, a concurrent same-realm write could land in the staleness window.
Linear: https://linear.app/cardstack/issue/CS-11156
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host Test Results: 1 files ±0, 1 suites ±0, 1h 37m 53s ⏱️ +1m 37s. Results for commit bbdeef8. ± Comparison against earlier commit 4691944.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 9m 28s ⏱️ -10s. Results for commit bbdeef8. ± Comparison against earlier commit 4691944.
Follow-up to the initial CS-11156 PR. The publish-realm handler had to
call two methods in sequence to fully invalidate the publishing
replica's cache plus all peers' caches:
mountedRealmForCacheClear.clearLocalCaches();
await mountedRealmForCacheClear.notifyAllFileChanges();
Every future emitter would have to remember both lines. Mirroring
`CachingDefinitionLookup.clearRealmCache(url)` — which bundles local
generation bump + DB DELETE + cross-instance NOTIFY in one method —
introduce `Realm.clearLocalCachesAndBroadcast()` that does both steps
and let the handler make one call.
Also drop `Realm.notifyAllFileChanges()`. It was a thin wrapper around
the standalone free function `notifyAllFileChanges(dbAdapter, url)` and
they were used inconsistently — publish used the method, unpublish
used the free function despite having a Realm instance in scope. The
two surfaces collapse to one clear rule:
- Need local clear AND broadcast (publish handler, realm staying up):
`realm.clearLocalCachesAndBroadcast()`.
- Need ONLY the peer broadcast (unpublish/delete handlers, realm
being torn down — local cache is about to be GC'd with the Realm
instance): `notifyAllFileChanges(dbAdapter, url)`.
`Realm.clearLocalCaches()` stays as the local-only primitive the
LISTEN handler calls on receive (no broadcast, no NOTIFY loop). The
free function `notifyAllFileChanges` is the single cross-replica emit
surface — the Realm class no longer needs to know about channel names
or payload formats.
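A rough sketch of the resulting surface, under stated assumptions: `NotifyAdapter` and `RealmSketch` are hypothetical stand-ins (the real `DBAdapter` and `Realm` are much larger), and the only adapter capability assumed is a `notify(channel, payload)` method.

```typescript
// Hypothetical slice of the DB adapter: only a `notify` method is assumed.
interface NotifyAdapter {
  notify(channel: string, payload: string): Promise<void>;
}

// Channel and sentinel as named in this PR.
const REALM_FILE_CHANGES_CHANNEL = 'realm_file_changes';
const REALM_FILE_CHANGES_WILDCARD = '*';

// Peer-only broadcast: the one place that knows channel name and payload format.
async function notifyAllFileChanges(
  dbAdapter: NotifyAdapter,
  realmURL: string,
): Promise<void> {
  try {
    await dbAdapter.notify(
      REALM_FILE_CHANGES_CHANNEL,
      `${realmURL}:${REALM_FILE_CHANGES_WILDCARD}`,
    );
  } catch {
    // Fire-and-forget: a missed NOTIFY means bounded staleness, so do not rethrow.
  }
}

class RealmSketch {
  private url: string;
  private dbAdapter: NotifyAdapter;
  private sourceCache = new Map<string, string>();

  constructor(url: string, dbAdapter: NotifyAdapter) {
    this.url = url;
    this.dbAdapter = dbAdapter;
  }

  cache(path: string, bytes: string): void {
    this.sourceCache.set(path, bytes);
  }

  cachedPathCount(): number {
    return this.sourceCache.size;
  }

  // Local-only primitive: safe for the LISTEN handler because it never re-emits.
  clearLocalCaches(): void {
    this.sourceCache.clear();
  }

  // Local clear first, then the peer broadcast, so callers make a single call.
  async clearLocalCachesAndBroadcast(): Promise<void> {
    this.clearLocalCaches();
    await notifyAllFileChanges(this.dbAdapter, this.url);
  }
}
```

In this shape a handler that keeps the realm mounted calls `realm.clearLocalCachesAndBroadcast()`, while a teardown handler calls `notifyAllFileChanges(dbAdapter, url)` directly.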
No behavior change. All 16 realm-file-changes-listener tests still
pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The publish-realm handler had a hand-rolled `DELETE FROM modules WHERE
resolved_realm_url = $1` to drop stale error entries before the reindex
fan-out. That covers the DB rows but is strictly weaker than
`CachingDefinitionLookup.clearRealmCache(url)`, which:
1. bumps the per-realm generation counter so in-flight prerenders
on this replica that started before the DELETE see a mismatch at
persist time and discard their result instead of re-inserting a
row this invalidation just removed,
2. drops in-flight prerender promises for the realm so new callers
install their own pending against post-swap state rather than
joining a stale shared transpile,
3. runs the same DELETE, and
4. broadcasts on `module_cache_invalidated` so peer realm-server
replicas perform 1-3 on their own state.
The raw DELETE did only step 3. The reindex worker's prerender fan-out
fires immediately after this code path through HTTP into both this
realm-server and its peers, so missing steps 1, 2, and 4 was exactly
the modules-cache analog of the byte-cache staleness this PR fixes via
`clearLocalCachesAndBroadcast()`.
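The four steps above can be illustrated with a small in-memory model. This is a sketch under assumptions, not `CachingDefinitionLookup` itself: `DefinitionCacheSketch`, `Ticket`, `begin`, `persist`, and `hasRow` are hypothetical names, a synchronous ticket stands in for an in-flight prerender promise, and a `Map` stands in for the `modules` table.

```typescript
// A ticket stands in for an in-flight prerender; it remembers the generation
// that was current when the work started.
interface Ticket {
  realmURL: string;
  moduleURL: string;
  generation: number;
}

class DefinitionCacheSketch {
  private generations = new Map<string, number>();
  private inFlight = new Map<string, Ticket>(); // keyed by module URL
  private rows = new Map<string, { realmURL: string; definition: string }>();
  private broadcast: (realmURL: string) => void;

  constructor(broadcast: (realmURL: string) => void) {
    this.broadcast = broadcast;
  }

  private generationOf(realmURL: string): number {
    return this.generations.get(realmURL) ?? 0;
  }

  // Join the pending work for this module, or install a new ticket.
  begin(realmURL: string, moduleURL: string): Ticket {
    const pending = this.inFlight.get(moduleURL);
    if (pending) return pending;
    const ticket: Ticket = {
      realmURL,
      moduleURL,
      generation: this.generationOf(realmURL),
    };
    this.inFlight.set(moduleURL, ticket);
    return ticket;
  }

  // Persist only if no invalidation happened since the ticket was issued.
  persist(ticket: Ticket, definition: string): boolean {
    if (this.inFlight.get(ticket.moduleURL) === ticket) {
      this.inFlight.delete(ticket.moduleURL);
    }
    if (ticket.generation !== this.generationOf(ticket.realmURL)) {
      return false; // stale result: discard instead of re-inserting the row
    }
    this.rows.set(ticket.moduleURL, { realmURL: ticket.realmURL, definition });
    return true;
  }

  hasRow(moduleURL: string): boolean {
    return this.rows.has(moduleURL);
  }

  clearRealmCache(realmURL: string): void {
    // 1. Bump the generation so older tickets fail the check in persist().
    this.generations.set(realmURL, this.generationOf(realmURL) + 1);
    // 2. Drop in-flight work so new callers start against post-swap state.
    for (const [moduleURL, ticket] of Array.from(this.inFlight)) {
      if (ticket.realmURL === realmURL) this.inFlight.delete(moduleURL);
    }
    // 3. In-memory analog of the DELETE.
    for (const [moduleURL, row] of Array.from(this.rows)) {
      if (row.realmURL === realmURL) this.rows.delete(moduleURL);
    }
    // 4. Tell peers to run steps 1-3 on their own state.
    this.broadcast(realmURL);
  }
}
```

A raw delete corresponds to step 3 alone: a ticket issued before the delete would still pass `persist()` and write its stale row back.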
`clearRealmCache` already runs via the post-fullIndex completion path
in `Realm.startReindex` (realm.ts:1068), but that's at the *end* of
the reindex — too late for the prerender fan-out at the start. Running
it pre-reindex ensures the rebuild starts against a coherent cache on
every replica.
`definitionLookup` is already plumbed through `CreateRoutesArgs`; the
handler just needed to destructure it.
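The ordering argument can be made concrete with a short sketch. The names `PublishDeps`, `invalidateThenReindex`, and `enqueueReindex` are hypothetical, and the relative order of the two invalidation calls is an assumption; the text only requires that both finish before the reindex is enqueued.

```typescript
// Hypothetical slices of the real objects: only methods named in this PR appear.
interface DefinitionLookupLike {
  clearRealmCache(realmURL: string): Promise<void>;
}

interface RealmLike {
  clearLocalCachesAndBroadcast(): Promise<void>;
}

interface PublishDeps {
  definitionLookup: DefinitionLookupLike;
  realm: RealmLike;
  // Hypothetical stand-in for however the handler enqueues the reindex job.
  enqueueReindex: (realmURL: string) => Promise<void>;
}

async function invalidateThenReindex(
  deps: PublishDeps,
  publishedRealmURL: string,
): Promise<void> {
  const { definitionLookup, realm, enqueueReindex } = deps;
  // Definitions cache: generation bump + in-flight drop + DELETE + NOTIFY.
  await definitionLookup.clearRealmCache(publishedRealmURL);
  // Byte caches: local wipe plus the cross-replica wildcard broadcast.
  await realm.clearLocalCachesAndBroadcast();
  // Only now may the reindex start its prerender fan-out.
  await enqueueReindex(publishedRealmURL);
}
```

Awaiting both invalidations before the enqueue is what keeps the fan-out from starting while this replica still holds stale state.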
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two completely different caches were both called "the module cache" in
this codebase:
- `Realm.#moduleCache` — in-process bytes of transpiled JS (the
prerender's input)
- `CachingDefinitionLookup`'s `modules` DB table — assembled card
definitions (the prerender's output)
Both even had a type named `ModuleCacheEntry` with different shapes. The
juxtaposition in `handle-publish-realm.ts` after #4842
(`definitionLookup.clearRealmCache(url)` next to
`realm.clearLocalCachesAndBroadcast()`) made the collision impossible
to ignore.
This commit renames the Realm-side cache to make the "transpiled JS
bytes" framing explicit at the API surface, and renames the public
cache-wipe methods so each call site self-documents which cache it
touches.
- `Realm.#moduleCache` → `Realm.#transpiledModuleCache`
- Type `ModuleCacheEntry` (in `realm.ts`, local to that file) →
`TranspiledModuleEntry`
- `Realm.clearLocalCaches()` → `Realm.clearLocalSourceCaches()`
- `Realm.clearLocalCachesAndBroadcast()` →
`Realm.clearLocalSourceCachesAndBroadcast()`
- Internal helpers renamed consistently
(`#dropModuleCacheEntry`, `#bumpModuleCacheGeneration`, the
generation maps, etc.)
Mechanical rename — no behavior change. 16/16 listener tests pass.
Tier 2 (DefinitionLookup-side renames: `ModuleCacheEntry` →
`DefinitionCacheEntry`, `clearRealmCache` → `clearRealmDefinitions`,
`clearAllModules` → `clearAllDefinitions`, etc.) is a separate
follow-up commit. Tier 3 (DB column + NOTIFY channel rename, needs a
deploy plan for rolling-update compatibility) is deliberately deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`CachingDefinitionLookup` caches assembled card *definitions* (per-export
results, error entries, dependency lists). It's not a module-byte cache.
But every public type and method on it was named "module cache" or
"modules" — which collided directly with `Realm.#transpiledModuleCache`
(renamed last commit), the actual JS-bytes cache.
Public API now reads as what it does:
- `ModuleCacheEntry` → `DefinitionCacheEntry`
- `ModuleCacheEntries` → `DefinitionCacheEntries`
- `ModuleCacheEntryQuery` → `DefinitionCacheEntryQuery`
- `getModuleCacheEntry` → `getCachedDefinitions`
- `getModuleCacheEntries` → `getCachedDefinitionsBatch`
- `clearAllModules` → `clearAllDefinitions`
- `clearRealmCache` → `clearRealmDefinitions`
Plus internal-consistency renames on the notify-emitter helpers
(`notifyModuleCacheInvalidations` → `notifyDefinitionCacheInvalidations`,
etc.).
What deliberately did NOT move (Tier 3, deferred — needs a deploy
plan for rolling-update compatibility between replicas listening on
the old vs. new channel name):
- `modules` DB table name and the `MODULES_TABLE` JS constant
- `module_cache_invalidated` NOTIFY channel name and the
`MODULE_CACHE_INVALIDATED_CHANNEL` constant
- File names containing "module-cache-*"
All 16 realm-file-changes-listener tests, 21 module-cache-invalidation-
listener tests, and 9 module-cache-coordination tests pass after the
rename. `tsc` clean across runtime-common / realm-server / host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- The CS-11043 publish-realm fix invalidates `#sourceCache` / `#moduleCache` only on the replica that processes the request. On 2+ replicas, peers keep pre-swap bytes and the reindex's prerender HTTP fan-out lands stale source in `boxel_index.isolated_html`, served forever. Same shape on unpublish / delete during the window between the registry-row commit and the peer-side reconciler unmount.
- Extends the existing per-path `realm_file_changes` NOTIFY channel with a bulk payload `<realmURL>:*` meaning "drop every cached path for this realm." Wired into publish, unpublish, and delete realm handlers; on receive, peers call `Realm.clearLocalCaches()`.
- Replaces the hand-rolled `DELETE FROM modules WHERE resolved_realm_url = $1` with `definitionLookup.clearRealmCache(url)` so the modules cache invalidation gets the same cross-replica treatment (generation bump + in-flight drop + DELETE + NOTIFY) instead of just the DB delete.

Stacks on #4840.
Linear: CS-11156.
API surface
Mirrors `CachingDefinitionLookup.clearRealmCache(url)`: one method bundles local invalidation + cross-instance NOTIFY so handlers don't have to remember both steps. Three entry points after this PR:
- `realm.clearLocalCachesAndBroadcast()`
- `notifyAllFileChanges(dbAdapter, url)` (free function)
- `realm.clearLocalCaches()`
The modules cache (separate from the in-process byte caches above) keeps its existing `definitionLookup.clearRealmCache(url)` entry point; this PR just stops bypassing it from the publish handler.
What's in
- `runtime-common/realm.ts`: `REALM_FILE_CHANGES_WILDCARD = '*'` sentinel, standalone `notifyAllFileChanges(dbAdapter, realmURL)` emitter (the single cross-replica emit surface; Realm doesn't need to know about channel names or payload formats), and `Realm.clearLocalCachesAndBroadcast()` instance method that bundles `clearLocalCaches()` + the free-function emit. Same best-effort fire-and-forget shape as `Realm.#notifyFileChange`; missed NOTIFY is a bounded staleness window per §9 of `docs/db-authoritative-realm-registry.md`, not data corruption.
- `realm-file-changes-listener.ts`: dispatch branches on `path === '*'` to `Realm.clearLocalCaches()`. Existing regex parser + realm lookup reused as-is (the wildcard payload parses cleanly with `path = '*'`).
- `handle-publish-realm.ts`:
  - Replaces the hand-rolled `DELETE FROM modules WHERE resolved_realm_url = $1` with `await definitionLookup.clearRealmCache(publishedRealmURL)` so the modules-cache invalidation also bumps the per-realm generation counter, drops in-flight prerender promises, and broadcasts on `module_cache_invalidated`. This is the modules-table analog of the byte-cache fix this PR is making. Without those extra steps, an in-flight prerender that started before the DELETE could re-insert a stale row at persist time, and peer replicas would keep their cached rows + generation counters until their own next invalidation arrived. `clearRealmCache` already runs via the post-fullIndex completion path (realm.ts:1068), but that's at the end of the reindex, too late for the prerender fan-out at the start.
  - One call (`await mountedRealmForCacheClear.clearLocalCachesAndBroadcast()`) for the byte-cache wipe + cross-replica broadcast before the reindex enqueue. Self-NOTIFY is a no-op since `clearLocalCaches` is idempotent.
- `handle-unpublish-realm.ts` and `handle-delete-realm.ts`: call the standalone `notifyAllFileChanges(dbAdapter, url)` after the FS removal. No local clear needed: the realm is about to be unmounted, so the in-process cache will be garbage-collected with the `Realm` instance. Defense-in-depth against the brief window before peers unmount via `NOTIFY realm_registry`. (Per-file `deleteAll` in unpublish already emits per-path NOTIFYs; this is the catch-all for the registry-commit-to-unmount window.)
Tests
`packages/realm-server/tests/realm-file-changes-listener-test.ts` (12 existing pass; 4 new):
- `parsePayload` round-trips `<realmURL>:*` to `path: '*'` for both port-bearing and port-less URLs.
- Dispatch routes the wildcard to `clearLocalCaches()` exactly once and never `invalidateCache()`.
- End-to-end through the live LISTEN client: `notifyAllFileChanges` emitter → Postgres NOTIFY → listener → `clearLocalCaches` on a fake peer-side Realm.
Why stack on CS-11125
The advisory lock from #4840 is what makes the broadcast's "after the swap" ordering meaningful. Without serialization, a concurrent same-realm write could land in the staleness window between the registry pointer flip and the NOTIFY landing on a peer.
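To show why serialization matters for the ordering, here is an in-process analogy only. `PerRealmLock` is a hypothetical promise-chain mutex, not the Postgres advisory lock from #4840; it only demonstrates that a same-realm write queued behind the publish cannot run between the swap and the broadcast.

```typescript
// Hypothetical in-memory stand-in for a per-realm lock: tasks for the same
// realm URL run strictly one after another, in submission order.
class PerRealmLock {
  private tails = new Map<string, Promise<void>>();

  run<T>(realmURL: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(realmURL) ?? Promise.resolve();
    const result = tail.then(task);
    // Keep the chain alive even if this task rejects.
    this.tails.set(
      realmURL,
      result.then(
        () => undefined,
        () => undefined,
      ),
    );
    return result;
  }
}

// Demo: a publish (swap, then broadcast) racing with a same-realm write.
async function publishRacingWrite(lock: PerRealmLock, realmURL: string): Promise<string[]> {
  const log: string[] = [];
  const publish = lock.run(realmURL, async () => {
    log.push('swap');
    await Promise.resolve(); // yield, giving the write a chance to interleave
    log.push('notify');
  });
  const write = lock.run(realmURL, async () => {
    log.push('write');
  });
  await Promise.all([publish, write]);
  return log;
}
```

With the lock in place the write is recorded only after both publish steps; without it, the yield between swap and notify is where the write could slip in.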
Compatibility
- `clearLocalCaches()` inside `clearLocalCachesAndBroadcast()` still runs; the NOTIFY is a no-op when no other replicas are LISTENing.
- `clearRealmCache` was already in use in the post-fullIndex completion path; calling it pre-reindex too is purely additive.
- `notify` is a no-op there.
Test plan
- `realm-file-changes-listener-test.ts` (16/16) including 4 new wildcard tests
- `tsc` on `packages/runtime-common` + `packages/realm-server`: no new errors
- `clearLocalCaches()` restoration on its branch)
🤖 Generated with Claude Code