
CS-11119: cross-replica clearInFlightSearch via realm_index_updated NOTIFY #4862

Merged
lukemelia merged 5 commits into main from cs-11119-audit-phase-1-inflightsearch-for-multi-replica-staleness on May 18, 2026


@lukemelia lukemelia commented May 18, 2026

Summary

Phase 1 of CS-11115 added #inFlightSearch on RealmIndexQueryEngine (a Map<string, Promise<LinkableCollectionDocument>>) to dedupe concurrent same-key searchCards calls per replica. Local clears are wired to every same-replica index-update event (Realm.update's onInvalidation + Realm.fullReindex after await completed). The cross-replica gap: when a peer's worker commits a swap to the shared boxel_index, this replica's #inFlightSearch keeps coalescing post-update callers into pre-update pending promises until each self-resolves — bounded window, typically <1 s but 5–10 s for piranha-class deep populateQueryFields walks.
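To make the staleness window concrete, here is a minimal sketch of a per-replica coalescing map in the spirit of #inFlightSearch. The class name InFlightSearches, its methods, and the SearchResult type are hypothetical stand-ins, not the real RealmIndexQueryEngine API. It shows that a same-key caller joins the pending promise, and that only an explicit clear makes the next caller start a fresh search.

```typescript
// Hypothetical stand-in for LinkableCollectionDocument.
type SearchResult = { data: unknown[] };

// Sketch of a same-key search coalescer; names are illustrative only.
class InFlightSearches {
  #inFlight = new Map<string, Promise<SearchResult>>();

  search(key: string, run: () => Promise<SearchResult>): Promise<SearchResult> {
    const pending = this.#inFlight.get(key);
    if (pending) {
      // A concurrent same-key caller joins the running search. After a peer
      // updates boxel_index, this is exactly the stale join we want to avoid.
      return pending;
    }
    const promise = run().finally(() => {
      // Self-resolution: drop the entry, unless a clear() already replaced it.
      if (this.#inFlight.get(key) === promise) {
        this.#inFlight.delete(key);
      }
    });
    this.#inFlight.set(key, promise);
    return promise;
  }

  // Called when the index moves, so later callers start a fresh search
  // instead of joining a promise that began against the old index state.
  clear(): void {
    this.#inFlight.clear();
  }
}
```

The identity check in the `finally` handler matters once `clear()` exists: without it, a pre-clear search that settles late would delete the entry belonging to a newer post-clear search and break coalescing for that key.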

This PR closes the window with a new per-realm post-update NOTIFY channel. It also folds the existing #cachedRealmInfo invalidation, which already ran at the same lifecycle point, under the same umbrella.

What's in

  • REALM_INDEX_UPDATED_CHANNEL = 'realm_index_updated' (constant in runtime-common/realm.ts), payload = realm URL.
  • notifyRealmIndexUpdated(dbAdapter, realmURL) free function, parallel in shape to CS-11156's notifyAllFileChanges. Same best-effort semantics — a missed NOTIFY is a bounded staleness window, not data corruption.
  • Realm.clearRealmIndexCaches(), a new public method that drops every read-side cache derived from the realm's index (currently #inFlightSearch + #cachedRealmInfo). It is the listener's entry point, mirroring Realm.invalidateCache as a public NOTIFY-receiver method.
  • Realm.clearRealmIndexCachesAndBroadcast() that bundles local clear + cross-replica NOTIFY in one call. Same shape as clearLocalSourceCachesAndBroadcast from CS-11156.
  • Three index-update sites updated in realm.ts (sync Realm.update onInvalidation, deferred-enqueue onInvalidation, Realm.fullReindex post-completed) now call clearRealmIndexCachesAndBroadcast() instead of reaching into the private query engine. The redundant sibling invalidateCachedRealmInfo() at the fullReindex completion path is dropped — now covered by the umbrella.
  • RealmIndexUpdatedListener in packages/realm-server/lib/, parallel in shape to RealmFileChangesListener. On NOTIFY, looks up the mounted realm and calls clearRealmIndexCaches(). Wired into main.ts startup + shutDown alongside the file-changes listener.
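A minimal sketch of the best-effort NOTIFY helper listed above, assuming a hypothetical `NotifyAdapter` with an `execute(sql, bind)` method; the real dbAdapter interface in runtime-common may look different. It uses Postgres's `pg_notify(channel, payload)` function, which, unlike the bare `NOTIFY` statement, accepts bind parameters for the realm URL payload.

```typescript
// Assumed minimal adapter shape; not the real runtime-common DBAdapter.
interface NotifyAdapter {
  execute(sql: string, bind: unknown[]): Promise<unknown>;
}

const REALM_INDEX_UPDATED_CHANNEL = "realm_index_updated";

async function notifyRealmIndexUpdated(
  dbAdapter: NotifyAdapter,
  realmURL: string,
): Promise<void> {
  try {
    // Channel and payload are both bound, so a realm URL containing quotes
    // cannot break the statement.
    await dbAdapter.execute("SELECT pg_notify($1, $2)", [
      REALM_INDEX_UPDATED_CHANNEL,
      realmURL,
    ]);
  } catch (err) {
    // Best effort: a lost NOTIFY only leaves peers stale until their pending
    // searches self-resolve, so log and carry on rather than fail the update.
    console.warn(`NOTIFY ${REALM_INDEX_UPDATED_CHANNEL} failed for ${realmURL}`, err);
  }
}
```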

Umbrella scope

clearRealmIndexCaches covers the read-side caches that go stale when boxel_index moves:

  • #inFlightSearch — the searchCards coalesce map (CS-11115 Phase 1). Was the original audit target.
  • #cachedRealmInfo + its ETag hash: derived from indexed metadata and realm_registry rows. A from-scratch reindex pass that re-parses /realm.json invalidates it, and that invalidation was already paired with the in-flight clear at the fullReindex completion site. Folding it into the umbrella gives peer replicas the same invalidation for free, closing a small parallel gap.

The other invalidateCachedRealmInfo() callsites (pre-fullIndex setup, the /realm.json route handler, publish/unpublish cross-realm metadata effects) stay direct — they aren't index-update events.
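The umbrella can be sketched as one local clear plus a broadcast wrapper. Only the two method names come from this PR; the class, the `RealmInfo` shape, the ETag field, the injected `broadcast` callback, and the `seed`/`isEmpty` helpers are hypothetical and exist only to make the sketch self-contained and testable.

```typescript
// Hypothetical shape of the cached realm metadata.
type RealmInfo = { name: string };

// Sketch only: the real Realm keeps these caches in different places.
class RealmIndexCachesSketch {
  readonly url: string;
  #broadcast: (realmURL: string) => Promise<void>;
  #inFlightSearch = new Map<string, Promise<unknown>>();
  // Realm info and its ETag hash are cached and invalidated together.
  #cachedRealmInfo: { info: RealmInfo; etag: string } | undefined;

  constructor(url: string, broadcast: (realmURL: string) => Promise<void>) {
    this.url = url;
    this.#broadcast = broadcast;
  }

  // Local-only clear of every read-side cache derived from the index.
  // On peer replicas, the NOTIFY listener is the caller.
  clearRealmIndexCaches(): void {
    this.#inFlightSearch.clear();
    this.#cachedRealmInfo = undefined;
  }

  // For the local index-update sites: clear here first, then tell peers.
  async clearRealmIndexCachesAndBroadcast(): Promise<void> {
    this.clearRealmIndexCaches();
    await this.#broadcast(this.url);
  }

  // Demo helpers so the sketch can be exercised; not part of any real API.
  seed(): void {
    this.#inFlightSearch.set("query", new Promise(() => {}));
    this.#cachedRealmInfo = { info: { name: "demo" }, etag: "etag-1" };
  }

  get isEmpty(): boolean {
    return this.#inFlightSearch.size === 0 && this.#cachedRealmInfo === undefined;
  }
}
```

Clearing locally before awaiting the broadcast means this replica is fresh even if the NOTIFY is lost; peers then fall back to the bounded self-resolution window.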

Naming

The channel is named "updated" rather than "swapped": "swap" is internal jargon for the boxel_index_working → boxel_index atomic moment, while "updated" matches the dominant public terminology (Realm.update(), RealmIndexUpdater) and the past-participle pattern of sibling channels (jobs_finished, module_cache_invalidated). The methods use "realm index caches" as the umbrella term mirroring CS-11156's "source caches" framing.

Why a new channel, not an existing one

  • realm_file_changes: fires at file-WRITE time (before indexing). At that moment the shared boxel_index hasn't been updated yet, so clearing #inFlightSearch then would make new callers re-do the same search against unchanged DB state. Wrong layer and wrong time.
  • jobs_finished: payload-less, fires for every worker job system-wide. Forcing every replica to scan every mounted realm's #inFlightSearch on every job completion is a sledgehammer.
  • realm_index_updated (new, chosen): per-realm payload, fires only at index-update commit, roughly once per indexing batch per realm. Targeted.
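To show why a per-realm payload keeps the receiving side targeted, here is a sketch of the listener's dispatch logic: drop empty payloads, drop realms not mounted on this replica, otherwise clear exactly one realm's caches. The `Subscribe` callback type, `IndexCacheOwner`, and the exact-string URL lookup are assumptions for illustration; the real RealmIndexUpdatedListener subscribes through PgAdapter.subscribe, whose signature is not shown here, and may normalize URLs before lookup.

```typescript
// Hypothetical minimal interfaces for the sketch.
interface IndexCacheOwner {
  clearRealmIndexCaches(): void;
}

type Subscribe = (
  channel: string,
  handler: (payload: string | undefined) => void,
) => Promise<() => Promise<void>>;

const REALM_INDEX_UPDATED_CHANNEL = "realm_index_updated";

class RealmIndexUpdatedListenerSketch {
  #realms: ReadonlyMap<string, IndexCacheOwner>;
  #unsubscribe: (() => Promise<void>) | undefined;

  constructor(realms: ReadonlyMap<string, IndexCacheOwner>) {
    this.#realms = realms;
  }

  async start(subscribe: Subscribe): Promise<void> {
    this.#unsubscribe = await subscribe(REALM_INDEX_UPDATED_CHANNEL, (payload) =>
      this.handleNotification(payload),
    );
  }

  handleNotification(payload: string | undefined): void {
    // Guard: an undefined or empty payload names no realm to target.
    if (!payload) return;
    const realm = this.#realms.get(payload);
    // Not mounted on this replica: nothing is cached here, so drop it.
    if (!realm) return;
    realm.clearRealmIndexCaches();
  }

  async shutDown(): Promise<void> {
    await this.#unsubscribe?.();
    this.#unsubscribe = undefined;
  }
}
```

Because the payload names one realm, each NOTIFY costs a single map lookup per replica rather than a sweep over every mounted realm.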

Composition with CS-11156

CS-11156 keeps its realm_file_changes:* wildcard for byte-cache invalidation at publish/unpublish/delete (write-event triggered). This PR adds the orthogonal update-event channel. The two together close cross-replica staleness across both byte caches and the read-side index caches.

Tests

packages/realm-server/tests/realm-index-updated-listener-test.ts — 6 new tests, all pass locally:

  • 4 dispatch (handleNotification to mounted realm, dropping unmounted, undefined/empty payload guards)
  • 2 LISTEN end-to-end through a real PgAdapter.subscribe round-trip (mounted realm clears; unmounted realm is dropped)

Test plan

  • New realm-index-updated-listener-test.ts (6/6 pass locally)
  • tsc on runtime-common + realm-server — clean
  • Prettier clean on touched files
  • CI realm-server suite

Linear

CS-11119.

🤖 Generated with Claude Code

lukemelia and others added 2 commits May 18, 2026 14:03
…OTIFY

Phase 1 of CS-11115 added `#inFlightSearch` on `RealmIndexQueryEngine` to
dedupe concurrent same-key `searchCards` calls. The local clear is wired
to every same-replica swap event (`Realm.update` onInvalidation +
`Realm.fullReindex` after `await completed`). Cross-replica writes don't
reach those callbacks: a peer's worker commits a swap to the shared
`boxel_index`, but this replica's `#inFlightSearch` keeps coalescing
post-swap callers into pre-swap pending promises until each promise
self-resolves. Bounded window — typically <1 s, but 5–10 s for piranha-
class deep `populateQueryFields` walks.

Close the window with a new per-realm post-swap NOTIFY channel:

- `REALM_INDEX_SWAPPED_CHANNEL = 'realm_index_swapped'`, payload is the
  realm URL.
- `notifyRealmIndexSwapped(dbAdapter, realmURL)` free function, parallel
  in shape to CS-11156's `notifyAllFileChanges`. Best-effort, same
  bounded-staleness semantics as the other realm-server NOTIFY channels.
- New `Realm.clearInFlightSearches()` public method (for the listener)
  + `Realm.clearInFlightSearchesAndBroadcast()` that bundles local clear
  with the cross-replica NOTIFY. Same shape as
  `clearLocalSourceCachesAndBroadcast`.
- Three existing swap sites in `realm.ts` (sync `Realm.update`
  onInvalidation, deferred-enqueue onInvalidation, and `Realm.fullReindex`
  post-`completed`) now call `clearInFlightSearchesAndBroadcast()`
  instead of reaching into the private query engine.
- New `RealmIndexSwappedListener` in `packages/realm-server/lib/`,
  parallel in shape to `RealmFileChangesListener`. On NOTIFY, looks up
  the mounted realm and calls `clearInFlightSearches()`. Wired into
  `main.ts` startup + shutDown alongside the file-changes listener.

Why a new channel, not `realm_file_changes` or `jobs_finished`:

- `realm_file_changes` fires at file-WRITE time (before indexing). At
  that moment the shared `boxel_index` hasn't swapped yet — clearing
  `#inFlightSearch` then would make new callers re-do the same search
  against unchanged DB state. We need a swap-time signal, not a write-
  time one. Mixing layers would re-introduce that bug.
- `jobs_finished` is payload-less and fires for every worker job
  system-wide. Forcing every replica to scan every mounted realm's
  `#inFlightSearch` on every job completion is a sledgehammer. The new
  channel fires once per swap per realm.

Composes orthogonally with CS-11156: that PR keeps the wildcard `*` on
`realm_file_changes` for byte-cache invalidation at publish/unpublish/
delete (write-event). This PR adds the swap-event channel. Two
together close cross-replica staleness across byte caches and the
search-coalesce map.

Tests: 6 new (4 dispatch, 2 LISTEN end-to-end through a real
`PgAdapter.subscribe` round-trip). All pass locally.

Linear: https://linear.app/cardstack/issue/CS-11119/audit-phase-1-inflightsearch-for-multi-replica-staleness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The channel name should not leak the boxel_index_working → boxel_index
swap mechanism; "update" matches the public `Realm.update()` method name
and the past-participle pattern of sibling channels (`jobs_finished`,
`module_cache_invalidated`). Renames the constant, free function,
listener class, file names, and channel string. Comments that
specifically describe the implementation detail (e.g. "boxel_index has
just swapped") keep "swap" as appropriate jargon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lukemelia lukemelia changed the title CS-11119: cross-replica clearInFlightSearch via realm_index_swapped NOTIFY CS-11119: cross-replica clearInFlightSearch via realm_index_updated NOTIFY May 18, 2026
lukemelia and others added 2 commits May 18, 2026 14:14
The local/broadcast pair is now named for the umbrella concept ("realm
index caches") rather than the single specific cache currently under
that umbrella (#inFlightSearch). Matches the framing of the
realm_index_updated channel and mirrors CS-11156's
clearLocalSourceCachesAndBroadcast naming, where "source caches"
groups #sourceCache + the transpiled-module cache under one method.
Forward-compatible if future caches join the post-update invalidation
group; comments still document that #inFlightSearch is what's cleared
today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-fullReindex completion path already paired the in-flight
search clear with `invalidateCachedRealmInfo()`. Move that into the
umbrella method so the two read-side caches (#inFlightSearch +
#cachedRealmInfo) clear together at every realm-index-update event,
and peer replicas receiving the realm_index_updated NOTIFY pick up
the cachedRealmInfo invalidation for free — closing a real gap where
a from-scratch reindex that re-parsed /realm.json on the publishing
replica left peer replicas serving stale ETags.

The other invalidateCachedRealmInfo callsites (pre-fullIndex setup,
the route handler for /realm.json edits, the publish/unpublish
cross-realm metadata effects) stay direct — they aren't index-update
events. The doc on clearRealmIndexCaches now names both members of
the umbrella.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds a new Postgres NOTIFY channel realm_index_updated so that when one replica's worker commits an index swap to the shared boxel_index, peer replicas drop their RealmIndexQueryEngine#inFlightSearch (and #cachedRealmInfo) caches. This closes a small cross-replica staleness window where post-update callers on a peer would coalesce into pre-update pending search promises.

Changes:

  • Add REALM_INDEX_UPDATED_CHANNEL constant + notifyRealmIndexUpdated() helper and new Realm.clearRealmIndexCaches() / clearRealmIndexCachesAndBroadcast() methods; route the three local post-update sites (sync/deferred Realm.update onInvalidation and Realm.fullReindex) through the umbrella.
  • New RealmIndexUpdatedListener in packages/realm-server/lib/, parallel in shape to RealmFileChangesListener, wired into main.ts startup and shutdown.
  • Add realm-index-updated-listener-test.ts (4 dispatch + 2 LISTEN end-to-end tests) and register it in the test index.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Summary per file:

  • packages/runtime-common/realm.ts: Define new NOTIFY channel + helper; add clearRealmIndexCaches/...AndBroadcast; route the three post-update sites through the umbrella and drop the redundant invalidateCachedRealmInfo at fullReindex completion.
  • packages/realm-server/lib/realm-index-updated-listener.ts: New listener subscribing to realm_index_updated, dispatching to realm.clearRealmIndexCaches() for mounted realms.
  • packages/realm-server/main.ts: Instantiate RealmIndexUpdatedListener at startup and include it in shutdown sequence.
  • packages/realm-server/tests/realm-index-updated-listener-test.ts: Unit + LISTEN end-to-end tests for dispatch, unmounted-realm drop, and empty/undefined payload guards.
  • packages/realm-server/tests/index.ts: Register the new test module in the runner.



github-actions Bot commented May 18, 2026

Host Test Results

1 file (±0), 1 suite (±0), 1h 45m 42s ⏱️ (-1h 42m 22s)
2 661 tests (±0): 2 646 ✅ (±0), 15 💤 (±0), 0 ❌ (±0)
2 680 runs (-2 586): 2 665 ✅ (-2 571), 15 💤 (-15), 0 ❌ (±0)

Results for commit 8328ba5. ± Comparison against earlier commit 9359cff.

Realm Server Test Results

1 file (±0), 1 suite (±0), 7m 51s ⏱️ (-9s)
1 405 tests (+5): 1 405 ✅ (+5), 0 💤 (±0), 0 ❌ (±0)
1 492 runs (+5): 1 492 ✅ (+5), 0 💤 (±0), 0 ❌ (±0)

Results for commit 8328ba5. ± Comparison against earlier commit 9359cff.

…1-inflightsearch-for-multi-replica-staleness

# Conflicts:
#	packages/runtime-common/realm.ts
@lukemelia lukemelia merged commit f71047b into main May 18, 2026
67 of 68 checks passed

3 participants