CS-11123 Phase 1: Pre-warm modules table before indexing (serial)#4799
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new “modules table pre-warm” step in the indexing pipeline to proactively populate module-cache rows for modules that upcoming renders are likely to need, aiming to prevent nested-prerender deadlocks during indexing (especially those triggered by cold lookupDefinition paths).
Changes:
- Add a serial pre-warm pass (
preWarmModulesTable) before the main visit loop in bothfromScratchandincrementalindexing. - Derive pre-warm candidates from existing dependency rows, plus
adoptsFrom.modulefor novel JSON cards. - Add debug/warn logging for pre-warm duration and failures.
Comments suppressed due to low confidence (1)
packages/runtime-common/index-runner.ts:608
- This
adoptsFromModulepath is also gated onhasExecutableExtension(...), butadoptsFrom.modulevalues can be extensionless (the runtime normalizes modules by stripping executable extensions, andDefinitionLookup.getModuleCacheEntry()already supports extensionless URLs via its candidate-extension fallback logic). Consider addingadoptsFromModuletotoWarmeven when it’s extensionless (or normalizing consistently), otherwise novel.jsoncards may not actually pre-warm their base module.
let adoptsFromModule = await this.#readAdoptsFromModuleFromDisk(url);
if (adoptsFromModule && hasExecutableExtension(adoptsFromModule)) {
toWarm.add(adoptsFromModule);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
004c9b8 to
bdd34dc
Compare
Host Test Results 1 files ± 0 1 suites ±0 2h 6m 0s ⏱️ - 1h 26m 56s Results for commit c85633d. ± Comparison against earlier commit bdd34dc. Realm Server Test Results 1 files ±0 1 suites ±0 11m 57s ⏱️ -45s Results for commit c85633d. ± Comparison against earlier commit bdd34dc. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bdd34dca64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Walk the invalidation set before the file-visit loop fires and ensure every module URL the renders are likely to need has a populated `modules` row. With the table primed, `lookupDefinition` calls fired from inside renders (via `populateQueryFields` → `_federated-search`) hit a populated DB row instead of spawning a nested prerender — clearing the affinity-scoped deadlock that PR #4777 papered over with `cacheOnlyDefinitions:true`. Signal sources for the URL set, in priority order: 1. Existing `boxel_index.deps` — the runtime-captured dep list from the URL's prior successful render. Used for any URL with a row in `boxel_index_working` ∪ `boxel_index`. 2. `adoptsFrom.module` read from disk — used for novel `.json` URLs without a prior `deps` row. 3. The URL itself — used for novel executable files; the file IS a module, pre-warm it directly. Reads go through `CachingDefinitionLookup.getModuleCacheEntry`, which is the URL-only entry into the `loadModuleCacheEntry` pipeline that `lookupDefinition` itself uses. On a DB hit the call returns immediately; on a miss it walks through the in-flight dedup map, the cross-process coalescer (CS-10953), the prerender, and the post- prerender persist — so two callers asking for the same URL share one prerender, and a pre-warm racing an invalidation never overwrites the new generation snapshot. Phase 1: serial. The pre-warm loop awaits each URL in turn. Validates that pre-warm as a concept clears the deadlock in isolation from any concurrency-driven variability. Phase 2 will bound-parallelize this loop. Failures in the pre-warm phase are warned but do not fail the batch. A mid-render sub-prerender will still fire on demand if a module fails to pre-warm — degrading to the pre-PR behavior for that specific module, not deadlocking the batch. Wired into both `fromScratch` and `incremental`, called once after `orderInvalidationsByDependencies` and before the visit loop opens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdd34dc to
c85633d
Compare
Summary
modulesrow.CachingDefinitionLookup.getModuleCacheEntry— same read-through pathlookupDefinitionuses, so DB hits return immediately and misses share the in-flight dedup + cross-process coalescer with regular lookups.Why this matters: A file render that fires
_federated-search→populateQueryFields→lookupDefinitionfor a cold definition spawns a nested prerender into the same affinity-scoped tab queue the original render is holding — deadlock.cacheOnlyDefinitions:truepapers over this; pre-warming kills it at the root. With the table primed,lookupDefinitionhits a populated DB row instead of spawning a sub-prerender.Signal sources for the URL set, in priority order:
boxel_index.deps— the runtime-captured dep list from the URL's prior successful render.adoptsFrom.moduleread viareader.readFile— for novel.jsonURLs without a priordepsrow.Failures in the pre-warm phase are warned but do not fail the batch. A mid-render sub-prerender will still fire on demand if pre-warm misses a module — degrading to the pre-PR behavior for that specific module, not deadlocking the batch.
Wired into both
fromScratchandincremental, called once afterorderInvalidationsByDependenciesand before the visit loop opens.Test plan
/prudent-octopusfrom-scratch reindex on top of this PR —wall_secondsno worse than the main baseline (~190 s prod-built)./ambitious/prianhafrom-scratch reindex on top of this PR — must complete (no rejection at the 150 s prerender abort on the heavyTracking/Cohort/*.jsonrender). Capturewall_seconds,instancesIndexed,instanceErrors, and the_federated-searchHTTP count.This is Phase 1 of a 3-PR stack:
cacheOnlyDefinitionsworkaround now that pre-warm covers the same ground.🤖 Generated with Claude Code
Bench results (vite-dev rig, 2026-05-12)
Protocol:
mise run kill-all→TRUNCATE job_reservations, jobs, job_progress→mise run dev-all→ boot drain (unfulfilled=0 active=0) → readiness-check on each realm → mount drain →time curl /_grafana-reindex?realm=user/<realm>. Same-rig as the priormainbaseline (commit0119bac76d) captured in the Linear ticket comment.Phase 1 — current SHA (
c85633d237), post the extensionless-adoptsFromfixuser/prudent-octopusmain 0119bac76duser/prudent-octopusc85633d237user/ambitious-piranhamain 0119bac76duser/ambitious-piranhac85633d237Octopus
Phase 1 trims 150 s (~24%) off the main baseline — pre-warm front-loads the modules a cohort render needs so the visit phase doesn't have to spawn nested prerenders inside
populateQueryFields. 91 instances / 0 errors matches main exactly; the extensionless-adoptsFromfix on the latest push resolved the earlier 90/1-error result (adoptsFrom.modulewas being filtered out for the common relative-extensionless form like"../author", so the cohort instance referencing it never got its module pre-warmed and errored at index time). fed-search count goes 120 → 176 (+47%) — consistent with more renders completing the livepopulateQueryFieldspath rather than degrading.Piranha
Still rejects with the 150 s prerender-server abort during the first cohort render — same outcome as
main, which also rejects piranha. The bar is piranha completes, and pre-warm alone does not clear it (only 18 fed-search hits before the 150 s single-render budget is exhausted on the heavyTracking/Cohort/*.jsonrender). Comparing wall-clock-to-rejection (195 s vs 652 s) is noise — both runs fail the bar, so time-to-failure is not meaningful.Pre-warm Phase 1 alone is insufficient to clear piranha. The next dependent ticket (CS-11117 — width-aware parallel visit loop) plus other cohort-render optimizations will be needed. Phase 2 (bounded-parallel pre-warm) and Phase 3 (unwind cacheOnlyDefinitions) bench results will be appended as they come in.
Pass-bar status for Phase 1
main, so pre-warm alone is not sufficient. Tracking as a follow-up signal for the broader pre-warm + parallel-visit-loop combination.