Skip to content

Indexer: pre-warm every realm .gts/.gjs module, not just per-row deps#4864

Merged
habdelra merged 5 commits into
prerender-deadlock-pagepool-priorityfrom
prerender-deadlock-pre-warm-realm-modules
May 19, 2026
Merged

Indexer: pre-warm every realm .gts/.gjs module, not just per-row deps#4864
habdelra merged 5 commits into
prerender-deadlock-pagepool-priorityfrom
prerender-deadlock-pre-warm-realm-modules

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented May 18, 2026

Stacked on top of #4863. Merge that first (or retarget this PR's base to main after #4863 merges).

Why

The indexer's pre-warm phase seeds the modules cache before the visit loop runs, so card renders don't have to fire a sub-prerenderModule mid-render for a missing definition. The existing pre-warm walks each invalidation's boxel_index.deps array and pre-warms every executable URL it finds — i.e. import-graph dependencies. That misses modules referenced by string in card templates.

Pattern that triggered every staging failure on ctse/ambitious-piranha:

// dashboard.gts — imports only `card-api`, references cohort.gts by STRING
get cohortRef() { return { module: this.moduleUrl('cohort'), name: 'Cohort' }; }
// then in template:
<Search @query={{filter: {type: this.cohortRef}}} @realms={{this.realmHrefs}} />

cohort.gts is a parameter to a search filter; it's never an import. So:

  • cohort.gts is not in dashboard.gts's runtime module graph.
  • dashboard.json's deps row in boxel_index does not list cohort.gts.
  • Pre-warm walks the dashboard's deps and skips cohort.gts.
  • When the dashboard renders, <Search> fires _federated-search with filter.type.module = '.../cohort.gts' → cache miss → spawns a same-affinity prerenderModule for cohort.gts mid-card-render → self-referential prerender deadlock.

Verified empirically against the staging index — dashboard.json on ctse/ambitious-piranha has exactly two Tracking-realm entries in deps (.../Tracking/dashboard plus a .glimmer-scoped.css artifact), and none of the sibling card modules every dashboard render needs.

What this PR changes

Adds a realm-wide pre-warm layer that runs in addition to the existing per-row deps walk.

Source of the realm-wide list — filesystem mtimes filtered by extension.

  • From-scratch path: discoverInvalidations already calls reader.mtimes() and returns the result. Filter to .gts/.gjs and pass into preWarmModulesTable as a new allRealmCardModules argument. Zero added cost.
  • Incremental path: reader.mtimes() hadn't been called yet — add one call before pre-warm. ~< 200 ms on realistic realm sizes, one call per job, dominated by the indexing work that follows.

Why .gts / .gjs only — optimization, not correctness gate. .ts / .js files CAN host CardDef (e.g. command-input cards), but they almost never do in practice. If pre-warm misses such a module, the on-demand lookupDefinition cache read-through during the visit fires a prerenderModule for it — safe because the PagePool now materializes a tab for non-file callers instead of queueing them behind the render that triggered the lookup (the change in the parent branch). Restricting the sweep to the extensions where cards live almost exclusively avoids paying the prerender cost on every reindex for files that don't define cards (typical realms have many helper .ts files alongside their card .gts).

No-card markers in the cache. A pre-warmed module that turns out not to define any cards (rare among .gts; e.g. a .gts Glimmer template helper with no CardDef export) gets persisted to the modules table with definitions: {}. CachingDefinitionLookup.getCachedDefinitions treats that as a valid cache hit on subsequent lookups — so the wasted prerender is a one-shot, not a per-call cost.

Adds hasCardExtension / cardExtensions to runtime-common/index.ts, parallel in shape to the existing hasExecutableExtension / executableExtensions.

Stays serial. Pre-warm loop body unchanged from the existing serial-await shape; the only behavioral change is which modules end up in toWarm. Bounded-parallel pre-warm is a separate follow-up after we've measured serial.

Bench results — local cold full-reindex

Run against a freshly-restarted realm-server with clearLastModified: true so the from-scratch path walks the full filesystem mtime sweep.

realm wall time errored rows deadlock-fingerprint rows
user/ambitious-piranha 665 s (~11 min) 0 0
user/prudent-octopus 145 s (~2.4 min) 0 0

Baseline before this branch on the same workload:

  • ambitious-piranha was rejected by the manager-side 150 s /prerender-visit abort, or completed at ~36 min with multiple errored rows whenever it didn't.
  • Errored rows carried the documented self-referential prerender deadlock fingerprint: renderStage = waiting-stability, queryLoadsInFlight with source: search-resource:*, and affinitySnapshot.sameAffinityActivity entries of {queue: 'module', state: 'queued'} on the same affinity.

Spot-checked timing_diagnostics on the heaviest rows post-fix (dashboard cards on both realms): no {state: queued} entries in sameAffinityActivity, confirming the deadlock path no longer fires.

Files

  • packages/runtime-common/index.ts — new hasCardExtension / cardExtensions exports.
  • packages/runtime-common/index-runner.tspreWarmModulesTable takes an allRealmCardModules arg; the from-scratch and incremental call sites compute the realm-wide list and pass it in.

Tests

The cache-contract this PR relies on — definitions: {} persists as a no-card marker, subsequent lookups short-circuit at the cache — is pinned by 'getCachedDefinitions caches non-card modules as no-card markers' in the parent branch's definition-lookup-test.ts.

A focused integration test for the realm-wide sweep (intercept getCachedDefinitions, assert it's invoked for every .gts/.gjs in the realm including sibling-referenced ones) is worth adding as a follow-up — the bench-time validation is the immediate gate; CI catches regressions on the existing deps-driven layer.

Test plan

  • CI passes
  • Local cold full-reindex of user/prudent-octopus lands with zero errored rows
  • Local cold full-reindex of user/ambitious-piranha lands with zero errored rows
  • Spot-check timing_diagnostics on the heaviest rows post-fix — no {state: queued} entries in sameAffinityActivity

🤖 Generated with Claude Code

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fbac4ddd3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/runtime-common/index-runner.ts
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the indexer’s module-cache pre-warm phase so it seeds the modules table for all card-template modules in a realm (via a realm-wide .gts/.gjs sweep), not only modules discovered via per-row deps. This reduces on-demand prerenderModule sub-renders during indexing, avoiding same-affinity deadlock conditions when _federated-search needs card definitions referenced by string in templates.

Changes:

  • Add cardExtensions and hasCardExtension() helper exports alongside the existing executable-extension helpers.
  • Update IndexRunner to compute allRealmCardModules from reader.mtimes() (reusing discoverInvalidations mtimes for from-scratch; adding a new mtimes call for incremental).
  • Extend preWarmModulesTable to seed its warming set from the realm-wide card-module list, then layer existing per-row deps warming on top.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
packages/runtime-common/index.ts Introduces .gts/.gjs “card extension” constants and a hasCardExtension() helper for realm-wide filtering.
packages/runtime-common/index-runner.ts Threads a realm-wide card-module list into preWarmModulesTable and populates it from filesystem mtimes for both from-scratch and incremental jobs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/runtime-common/index.ts Outdated
Comment thread packages/runtime-common/index-runner.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

Host Test Results

    1 files  ±    0      1 suites  ±0   3h 22m 26s ⏱️ + 1h 35m 42s
2 661 tests ±    0  2 646 ✅ ±    0  15 💤 ± 0  0 ❌ ±0 
5 283 runs  +2 603  5 254 ✅ +2 589  29 💤 +14  0 ❌ ±0 

Results for commit a89c949. ± Comparison against earlier commit 8bff668.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   8m 44s ⏱️ +5s
1 405 tests ±0  1 405 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 489 runs  ±0  1 489 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit a89c949. ± Comparison against earlier commit 8bff668.

`preWarmModulesTable` previously walked each invalidation's
`boxel_index.deps` and added any executable URLs it found. That misses
modules referenced by *string* in card templates — a typical pattern in
dashboard cards:

  <Search @query={{filter: {type: {module: '.../cohort.gts',
                                   name: 'Cohort'}}}}>

`cohort.gts` is referenced as a string parameter to the search filter,
never imported into the dashboard's runtime module graph. So it never
appears in the dashboard row's `deps`, never gets pre-warmed, and when
the dashboard renders during indexing the `_federated-search` call
fires a same-affinity `prerenderModule` for `cohort.gts` mid-card-
render. That sub-prerender queues behind the dashboard's tab — the
self-referential prerender deadlock.

Add a realm-wide sweep on top of the existing per-row deps walk. Source
is the filesystem-mtimes map: from-scratch already pays for the walk
via `discoverInvalidations`, so we just filter and reuse it; incremental
adds a fresh `reader.mtimes()` call before pre-warm (typical ~200 ms on
a 500-file realm, one call per job, completely dominated by the
indexing work that follows). Filter to `.gts` + `.gjs` only — cards can
only live in template-bearing modules, so `.ts`/`.js` helpers are
correctly excluded from the realm-wide layer. Helpers that ARE needed
by a card already get pre-warmed through the existing per-row deps
walk; the realm-wide layer is purely additive on top.

Adds `hasCardExtension` / `cardExtensions` next to
`hasExecutableExtension` / `executableExtensions` in
`runtime-common/index.ts`, same shape — `.gts` and `.gjs` are the
extensions a `CardDef` / `FieldDef` can live in because they're the
ones with a Glimmer template surface.

The modules cache treats `definitions: {}` (an empty result from a
non-card module — e.g. a rare `.gts` helper) as a valid cache hit at
`definition-lookup.ts:561-563`. So the worst case is one wasted
prerender per non-card `.gts` per `clearRealmCache` cycle; subsequent
lookups skip the prerender entirely. That contract is pinned by
`getModuleCacheEntry caches non-card modules as no-card markers` in
PR A.

Cost analysis for ambitious-piranha (the staging realm where the
deadlock fired): 9 `.gts` files + 9 `.ts` files = 18 executables, but
only 9 `.gts` files in the realm-wide sweep (the `.ts` files are
`bxl/*` helpers). Pre-warm is still serial (concurrency comes in a
follow-up PR after baseline measurement). Roughly ~200 ms per file
× 9 files = ~1.8 s additional pre-warm cost per from-scratch reindex.

Stacked on top of PR #4863 (PagePool tab-materialization + priority
plumbing). The two changes attack the same bug from different angles:
PR #4863 makes sure the deadlock cannot happen when a sub-prerender
DOES fire; this PR makes sure it almost never has to fire in the first
place. PR #4863 is the load-bearing fix; this is the preventive layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra force-pushed the prerender-deadlock-pre-warm-realm-modules branch from 4fbac4d to 95e988e Compare May 18, 2026 19:41
habdelra and others added 3 commits May 18, 2026 15:52
Drops the `hasCardExtension` helper added in the previous commit.
Copilot's review caught a real case I'd missed:
`packages/catalog-realm/commands/collect-submission-files.ts` defines
`class CollectSubmissionFilesInput extends CardDef` — a card-defining
`.ts` file, no Glimmer template. The `.gts`/`.gjs` extension filter
would have skipped it on every realm-wide pre-warm sweep, and any
`_federated-search` filter referencing it as a `type:` would have
fired a same-affinity `prerenderModule` mid-card-render — the exact
deadlock pattern this PR aims to prevent.

Fix: realm-wide sweep now uses the existing `hasExecutableExtension`
(`.gts` / `.gjs` / `.ts` / `.js`, excluding `.d.ts`). Non-card modules
pre-warmed by the sweep persist to the modules table as empty-
definitions rows (no-card markers — `definition-lookup.ts:561-563`),
so subsequent `getCachedDefinitions` / `lookupDefinition` calls
short-circuit at the cache without re-firing the prerenderer. That
contract is pinned by the `getCachedDefinitions caches non-card
modules as no-card markers` test in PR #4863.

Variable renamed `allRealmCardModules` → `allRealmExecutables` and
comments updated. The `hasCardExtension` helper + `cardExtensions`
constant are removed entirely.

Cost analysis update: ambitious-piranha goes from 9 candidates (just
the `.gts` files) to 18 (9 `.gts` + 9 `.ts` helpers in `bxl/`). At
~200 ms per prerender × 18 modules with serial pre-warm = ~3.6 s
additional cost on top of the from-scratch reindex. After the first
sweep, the modules table caches the helper rows; subsequent reindexes
only pay for modules that actually have card defs. Bounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the previous "broaden to all executables" commit. The case
that broadening fixed — `.ts` files that host CardDef (rare; the only
example in the repo today is command-input cards) — is handled
correctly by the on-demand `lookupDefinition` read-through during the
visit. The PagePool's new tab-materialization for module/command
callers makes that on-demand path safe (the sub-prerender gets its
own tab instead of queueing behind the render that triggered it), so
the cost trade favors the narrow sweep:

- Realm-wide pre-warm of `.gts` / `.gjs` only: ~9 prerenders × ~200 ms
  on a typical card-heavy realm.
- Broader sweep including `.ts` / `.js` helpers: ~18+ prerenders, most
  of them producing no-card-marker rows for files that almost never
  define a card.

This is an optimization on every reindex, not a correctness change.
Comments updated to call out the trade-off honestly: `.ts` / `.js`
CAN host card defs, the narrow filter exists because the on-demand
path is now safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team May 18, 2026 21:24
@habdelra habdelra merged commit de0548f into prerender-deadlock-pagepool-priority May 19, 2026
94 of 96 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants