CS-11119: cross-replica clearInFlightSearch via realm_index_updated NOTIFY #4862
…OTIFY
Phase 1 of CS-11115 added `#inFlightSearch` on `RealmIndexQueryEngine` to dedupe concurrent same-key `searchCards` calls. The local clear is wired to every same-replica swap event (`Realm.update` onInvalidation + `Realm.fullReindex` after `await completed`). Cross-replica writes don't reach those callbacks: a peer's worker commits a swap to the shared `boxel_index`, but this replica's `#inFlightSearch` keeps coalescing post-swap callers into pre-swap pending promises until each promise self-resolves. The window is bounded: typically <1 s, but 5–10 s for piranha-class deep `populateQueryFields` walks.
Close the window with a new per-realm post-swap NOTIFY channel:
- `REALM_INDEX_SWAPPED_CHANNEL = 'realm_index_swapped'`, payload is the realm URL.
- `notifyRealmIndexSwapped(dbAdapter, realmURL)` free function, parallel in shape to CS-11156's `notifyAllFileChanges`. Best-effort, same bounded-staleness semantics as the other realm-server NOTIFY channels.
- New `Realm.clearInFlightSearches()` public method (for the listener) + `Realm.clearInFlightSearchesAndBroadcast()` that bundles the local clear with the cross-replica NOTIFY. Same shape as `clearLocalSourceCachesAndBroadcast`.
- Three existing swap sites in `realm.ts` (sync `Realm.update` onInvalidation, deferred-enqueue onInvalidation, and `Realm.fullReindex` post-`completed`) now call `clearInFlightSearchesAndBroadcast()` instead of reaching into the private query engine.
- New `RealmIndexSwappedListener` in `packages/realm-server/lib/`, parallel in shape to `RealmFileChangesListener`. On NOTIFY, looks up the mounted realm and calls `clearInFlightSearches()`. Wired into `main.ts` startup + shutDown alongside the file-changes listener.
Why a new channel, not `realm_file_changes` or `jobs_finished`:
- `realm_file_changes` fires at file-WRITE time (before indexing). At that moment the shared `boxel_index` hasn't swapped yet, so clearing `#inFlightSearch` then would make new callers re-do the same search against unchanged DB state. We need a swap-time signal, not a write-time one; mixing layers would re-introduce that bug.
- `jobs_finished` is payload-less and fires for every worker job system-wide. Forcing every replica to scan every mounted realm's `#inFlightSearch` on every job completion is a sledgehammer.
The new channel fires once per swap per realm.
Composes orthogonally with CS-11156: that PR keeps the wildcard `*` on `realm_file_changes` for byte-cache invalidation at publish/unpublish/delete (write-event). This PR adds the swap-event channel. The two together close cross-replica staleness across the byte caches and the search-coalesce map.
Tests: 6 new (4 dispatch, 2 LISTEN end-to-end through a real `PgAdapter.subscribe` round-trip). All pass locally.
Linear: https://linear.app/cardstack/issue/CS-11119/audit-phase-1-inflightsearch-for-multi-replica-staleness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
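The coalescing behaviour this commit describes can be sketched as a small TypeScript class. It is a minimal, hypothetical stand-in for `RealmIndexQueryEngine#inFlightSearch`: the class name `InFlightSearchMap`, its `search`/`clear` methods, and the `SearchResult` type are illustrative assumptions, not the PR's code. It shows why a post-update clear is needed: until `clear()` runs, a same-key caller joins the pending pre-update promise instead of starting a new search.

```typescript
// Hypothetical stand-in for the searchCards result type; the real map holds
// Promise<LinkableCollectionDocument>.
type SearchResult = { data: string[] };

// Sketch of a per-replica in-flight coalescing map (illustrative names only).
class InFlightSearchMap {
  #inFlight = new Map<string, Promise<SearchResult>>();

  // Concurrent callers with the same key share one pending promise.
  search(key: string, run: () => Promise<SearchResult>): Promise<SearchResult> {
    const existing = this.#inFlight.get(key);
    if (existing) {
      return existing; // coalesce onto the search that is already running
    }
    const entry: Promise<SearchResult> = run().finally(() => {
      // Only delete our own entry: a clear() followed by a new search may
      // have replaced it with a fresh post-update promise.
      if (this.#inFlight.get(key) === entry) {
        this.#inFlight.delete(key);
      }
    });
    this.#inFlight.set(key, entry);
    return entry;
  }

  // Called once the index has been updated, so later callers start a fresh
  // search instead of joining a promise that read pre-update state.
  clear(): void {
    this.#inFlight.clear();
  }

  get size(): number {
    return this.#inFlight.size;
  }
}
```

The identity check in the `finally` handler is a design choice of the sketch: after a `clear()`, a late-settling pre-update promise must not evict the post-update entry that replaced it.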
The channel name should not leak the boxel_index_working → boxel_index swap mechanism; "update" matches the public `Realm.update()` method name and the past-participle pattern of sibling channels (`jobs_finished`, `module_cache_invalidated`). Renames the constant, free function, listener class, file names, and channel string. Comments that specifically describe the implementation detail (e.g. "boxel_index has just swapped") keep "swap" as appropriate jargon. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The local/broadcast pair is now named for the umbrella concept ("realm
index caches") rather than the single specific cache currently under
that umbrella (#inFlightSearch). Matches the framing of the
realm_index_updated channel and mirrors CS-11156's
clearLocalSourceCachesAndBroadcast naming, where "source caches"
groups #sourceCache + the transpiled-module cache under one method.
Forward-compatible if future caches join the post-update invalidation
group; comments still document that #inFlightSearch is what's cleared
today.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-fullReindex completion path already paired the in-flight search clear with `invalidateCachedRealmInfo()`. Move that into the umbrella method so the two read-side caches (#inFlightSearch + #cachedRealmInfo) clear together at every realm-index-update event, and peer replicas receiving the realm_index_updated NOTIFY pick up the cachedRealmInfo invalidation for free — closing a real gap where a from-scratch reindex that re-parsed /realm.json on the publishing replica left peer replicas serving stale ETags. The other invalidateCachedRealmInfo callsites (pre-fullIndex setup, the route handler for /realm.json edits, the publish/unpublish cross-realm metadata effects) stay direct — they aren't index-update events. The doc on clearRealmIndexCaches now names both members of the umbrella. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
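A minimal sketch of the local/broadcast pair after this commit, assuming an injected notify callback in place of `notifyRealmIndexUpdated(dbAdapter, realmURL)`. `RealmCachesSketch`, `NotifyRealmIndexUpdated`, `CachedRealmInfo` and the demo helpers are hypothetical names. The point it illustrates is that one umbrella method drops both `#inFlightSearch` and `#cachedRealmInfo`, and only the `...AndBroadcast` variant sends the NOTIFY.

```typescript
// Hypothetical signature for the cross-replica NOTIFY sender; in the PR this
// role is played by notifyRealmIndexUpdated(dbAdapter, realmURL).
type NotifyRealmIndexUpdated = (realmURL: string) => Promise<void>;

// Illustrative shape for the cached /realm.json info and its ETag hash.
interface CachedRealmInfo {
  info: Record<string, unknown>;
  etag: string;
}

class RealmCachesSketch {
  readonly url: string;
  #notify: NotifyRealmIndexUpdated;
  #inFlightSearch = new Map<string, Promise<unknown>>();
  #cachedRealmInfo: CachedRealmInfo | undefined;

  constructor(url: string, notify: NotifyRealmIndexUpdated) {
    this.url = url;
    this.#notify = notify;
  }

  // Local clear only: the entry point a peer's NOTIFY listener calls.
  clearRealmIndexCaches(): void {
    this.#inFlightSearch.clear();
    this.#cachedRealmInfo = undefined;
  }

  // For this replica's own post-update sites: clear locally, then tell peers.
  async clearRealmIndexCachesAndBroadcast(): Promise<void> {
    this.clearRealmIndexCaches();
    try {
      await this.#notify(this.url);
    } catch (err) {
      // Best-effort: a missed NOTIFY leaves peers with stale read-side caches
      // for a bounded window; it does not corrupt data.
      console.warn(`realm_index_updated NOTIFY failed for ${this.url}:`, err);
    }
  }

  // Demo-only helpers so the sketch can be exercised without a real index.
  primeCachesForDemo(): void {
    this.#inFlightSearch.set("demo-query", new Promise(() => {}));
    this.#cachedRealmInfo = { info: { name: "demo" }, etag: "demo-etag" };
  }

  hasCachedState(): boolean {
    return this.#inFlightSearch.size > 0 || this.#cachedRealmInfo !== undefined;
  }
}
```

Keeping the listener-facing method broadcast-free matters: if a peer re-broadcast on receipt, every NOTIFY would trigger another round of NOTIFYs across replicas.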
Pull request overview
Adds a new Postgres NOTIFY channel realm_index_updated so that when one replica's worker commits an index swap to the shared boxel_index, peer replicas drop their RealmIndexQueryEngine#inFlightSearch (and #cachedRealmInfo) caches. This closes a small cross-replica staleness window where post-update callers on a peer would coalesce into pre-update pending search promises.
Changes:
- Add `REALM_INDEX_UPDATED_CHANNEL` constant + `notifyRealmIndexUpdated()` helper and new `Realm.clearRealmIndexCaches()`/`clearRealmIndexCachesAndBroadcast()` methods; route the three local post-update sites (sync/deferred `Realm.update` onInvalidation and `Realm.fullReindex`) through the umbrella.
- New `RealmIndexUpdatedListener` in `packages/realm-server/lib/`, parallel in shape to `RealmFileChangesListener`, wired into `main.ts` startup and shutdown.
- Add `realm-index-updated-listener-test.ts` (4 dispatch + 2 LISTEN end-to-end tests) and register it in the test index.
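For illustration, the sending side of the new channel can be as small as one `pg_notify` call. The `QueryRunner` interface and `notifyRealmIndexUpdatedSketch` are hypothetical: the real `notifyRealmIndexUpdated` goes through the realm's `dbAdapter`, whose exact API isn't shown here. `pg_notify(channel, payload)` itself is a standard PostgreSQL function.

```typescript
// Channel name as defined by this PR.
const REALM_INDEX_UPDATED_CHANNEL = "realm_index_updated";

// Hypothetical minimal query interface; the real DBAdapter in runtime-common
// may expose a different method name and signature.
interface QueryRunner {
  query(sql: string, params: unknown[]): Promise<unknown>;
}

// Sketch of a best-effort post-update NOTIFY. Returns false if sending failed.
async function notifyRealmIndexUpdatedSketch(
  db: QueryRunner,
  realmURL: string,
): Promise<boolean> {
  try {
    // pg_notify(channel, payload) is the function form of NOTIFY; unlike the
    // NOTIFY statement, it accepts channel and payload as bind parameters.
    await db.query("SELECT pg_notify($1, $2)", [REALM_INDEX_UPDATED_CHANNEL, realmURL]);
    return true;
  } catch (err) {
    // Best-effort: log and carry on rather than failing the index update.
    console.warn(`could not NOTIFY ${REALM_INDEX_UPDATED_CHANNEL} for ${realmURL}:`, err);
    return false;
  }
}
```

Postgres delivers notifications only when the sending transaction commits, so if the NOTIFY were issued inside the transaction that updates `boxel_index`, listeners could not see it before the update is visible. Payloads must also be shorter than 8000 bytes in the default configuration, which a realm URL easily satisfies.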
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/runtime-common/realm.ts | Define new NOTIFY channel + helper; add clearRealmIndexCaches/...AndBroadcast; route the three post-update sites through the umbrella and drop the redundant invalidateCachedRealmInfo at fullReindex completion. |
| packages/realm-server/lib/realm-index-updated-listener.ts | New listener subscribing to realm_index_updated, dispatching to realm.clearRealmIndexCaches() for mounted realms. |
| packages/realm-server/main.ts | Instantiate RealmIndexUpdatedListener at startup and include it in shutdown sequence. |
| packages/realm-server/tests/realm-index-updated-listener-test.ts | Unit + LISTEN end-to-end tests for dispatch, unmounted-realm drop, and empty/undefined payload guards. |
| packages/realm-server/tests/index.ts | Register the new test module in the runner. |
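The dispatch behaviour exercised by the four dispatch tests can be sketched independently of the Postgres subscription. `RealmIndexUpdatedDispatchSketch`, `RealmIndexCacheOwner` and the `FindMountedRealm` lookup are hypothetical names, and the lookup here is an exact string match on the payload; the `PgAdapter.subscribe` wiring is left out because its signature isn't part of this description.

```typescript
// Anything the listener can clear; in the PR this is a mounted Realm.
interface RealmIndexCacheOwner {
  clearRealmIndexCaches(): void;
}

// Hypothetical lookup from a NOTIFY payload (realm URL) to a realm mounted on
// this replica; the real listener may normalize URLs differently.
type FindMountedRealm = (realmURL: string) => RealmIndexCacheOwner | undefined;

class RealmIndexUpdatedDispatchSketch {
  #findMountedRealm: FindMountedRealm;

  constructor(findMountedRealm: FindMountedRealm) {
    this.#findMountedRealm = findMountedRealm;
  }

  // Handles one realm_index_updated payload. Returns true if a realm was cleared.
  handleNotification(payload: string | undefined): boolean {
    // Guard undefined/empty payloads: there is no realm to target.
    if (!payload) {
      return false;
    }
    const realm = this.#findMountedRealm(payload);
    // A realm not mounted on this replica has no caches here to drop.
    if (!realm) {
      return false;
    }
    realm.clearRealmIndexCaches();
    return true;
  }
}
```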
Host Test Results: 1 files ±0, 1 suites ±0, 1h 45m 42s ⏱️ -1h 42m 22s. Results for commit 8328ba5. ± Comparison against earlier commit 9359cff.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 7m 51s ⏱️ -9s. Results for commit 8328ba5. ± Comparison against earlier commit 9359cff.
…1-inflightsearch-for-multi-replica-staleness # Conflicts: # packages/runtime-common/realm.ts
Summary
Phase 1 of CS-11115 added `#inFlightSearch` on `RealmIndexQueryEngine` (a `Map<string, Promise<LinkableCollectionDocument>>`) to dedupe concurrent same-key `searchCards` calls per replica. Local clears are wired to every same-replica index-update event (`Realm.update`'s `onInvalidation` + `Realm.fullReindex` after `await completed`). The cross-replica gap: when a peer's worker commits a swap to the shared `boxel_index`, this replica's `#inFlightSearch` keeps coalescing post-update callers into pre-update pending promises until each self-resolves. The window is bounded: typically <1 s, but 5–10 s for piranha-class deep `populateQueryFields` walks.
This PR closes the window with a new per-realm post-update NOTIFY channel, and folds the existing `#cachedRealmInfo` invalidation, which already cohabited the same lifecycle point, into the same umbrella.
What's in
- `REALM_INDEX_UPDATED_CHANNEL = 'realm_index_updated'` (constant in `runtime-common/realm.ts`), payload = realm URL.
- `notifyRealmIndexUpdated(dbAdapter, realmURL)` free function, parallel in shape to CS-11156's `notifyAllFileChanges`. Same best-effort semantics: a missed NOTIFY is a bounded staleness window, not data corruption.
- `Realm.clearRealmIndexCaches()`: new public method that drops every read-side cache deriving from the realm's index, currently `#inFlightSearch` + `#cachedRealmInfo`. For the listener; mirrors `Realm.invalidateCache` as a public NOTIFY-receiver entry point.
- `Realm.clearRealmIndexCachesAndBroadcast()`: bundles the local clear + cross-replica NOTIFY in one call. Same shape as `clearLocalSourceCachesAndBroadcast` from CS-11156.
- The three post-update sites in `realm.ts` (sync `Realm.update` onInvalidation, deferred-enqueue onInvalidation, `Realm.fullReindex` post-`completed`) now call `clearRealmIndexCachesAndBroadcast()` instead of reaching into the private query engine. The redundant sibling `invalidateCachedRealmInfo()` at the `fullReindex` completion path is dropped, since the umbrella now covers it.
- New `RealmIndexUpdatedListener` in `packages/realm-server/lib/`, parallel in shape to `RealmFileChangesListener`. On NOTIFY, looks up the mounted realm and calls `clearRealmIndexCaches()`. Wired into `main.ts` startup + shutDown alongside the file-changes listener.
Umbrella scope
`clearRealmIndexCaches` covers the read-side caches that go stale when `boxel_index` moves:
- `#inFlightSearch`: the `searchCards` coalesce map (CS-11115 Phase 1). This was the original audit target.
- `#cachedRealmInfo` + its ETag hash: derived from indexed metadata and `realm_registry` rows. A from-scratch reindex pass that re-parses `/realm.json` invalidates it, and it was already paired with the in-flight clear at the `fullReindex` completion site. Folding it into the umbrella gives peer replicas cross-replica invalidation for free, closing a small parallel gap.
The other `invalidateCachedRealmInfo()` callsites (pre-fullIndex setup, the `/realm.json` route handler, publish/unpublish cross-realm metadata effects) stay direct; they aren't index-update events.
Naming
The channel is named "updated" rather than "swapped": "swap" is internal jargon for the atomic `boxel_index_working` → `boxel_index` moment, while "updated" matches the dominant public terminology (`Realm.update()`, `RealmIndexUpdater`) and the past-participle pattern of sibling channels (`jobs_finished`, `module_cache_invalidated`). The methods use "realm index caches" as the umbrella term, mirroring CS-11156's "source caches" framing.
Why a new channel, not an existing one
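| Channel | Assessment |
|---|---|
| `realm_file_changes` | Fires at file-write time (before indexing). At that moment the shared `boxel_index` hasn't been updated yet, so clearing `#inFlightSearch` then would make new callers re-do the same search against unchanged DB state. Wrong layer + wrong time. |
| `jobs_finished` | Payload-less and fires for every worker job system-wide. Forcing every replica to scan every mounted realm's `#inFlightSearch` on every job completion is a sledgehammer. |
| `realm_index_updated` (new) | Fires once per update per realm, with the realm URL as payload. |
Composition with CS-11156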
CS-11156 keeps its `*` wildcard on `realm_file_changes` for byte-cache invalidation at publish/unpublish/delete (write-event triggered). This PR adds the orthogonal update-event channel. Together the two close cross-replica staleness across both the byte caches and the read-side index caches.
Tests
`packages/realm-server/tests/realm-index-updated-listener-test.ts`: 6 new tests, all pass locally:
- 4 dispatch tests (`handleNotification` to mounted realm, dropping unmounted, undefined/empty payload guards)
- 2 LISTEN end-to-end tests through a real `PgAdapter.subscribe` round-trip (mounted realm clears; unmounted realm is dropped)
Test plan
- `realm-index-updated-listener-test.ts` (6/6 pass locally)
- `tsc` on `runtime-common` + `realm-server`: clean
Linear
CS-11119.
🤖 Generated with Claude Code