
Fix flaky boxel-cli push --delete: guard #sourceCache against set-after-invalidate #5123

Merged: habdelra merged 1 commit into main from worktree-fix-source-cache-set-after-invalidate-flake on Jun 5, 2026
@habdelra habdelra commented Jun 5, 2026

The boxel-cli push --delete integration test (removes remote-only files) fails intermittently in CI: after the delete-push returns 204 and the file is gone from disk, the test's immediate source GET for the removed file returns 200 instead of 404.

Root cause

The realm server's getSourceOrRedirect serves .gts / .json source from an in-memory #sourceCache. It reads the bytes from disk under an await (getFileWithFallbacks + materializeFileRef) and only then calls #sourceCache.set. If invalidateCache(path) fires inside that window it clears the slot synchronously — but the in-flight read's set re-populates it afterward with the now-stale bytes. #sourceCache had no guard against this, even though #transpiledModuleCache was already hardened against the identical set-after-invalidate race.

The failing run's realm-server log shows the exact interleave: the prior push's waitForIndex:false indexing job for the same file was still in flight (its source fetch had read the bytes) when the delete-push's DELETE landed. The DELETE removed the file and invalidated the cache, then the indexing fetch's set re-cached the just-deleted bytes — and the next source GET came back a 200 cache hit (dur=2ms) for a file already gone from disk.
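
The interleave described above can be reproduced in miniature. The following is a minimal sketch, not the realm server's code: `disk`, `cache`, `unguardedGet`, `deleteFile`, and `demoRace` are hypothetical stand-ins, and a manually resolved promise takes the place of the slow disk read.

```typescript
// Hypothetical stand-ins for the realm's disk and source cache.
const disk = new Map<string, string>();
const cache = new Map<string, string>();

// A promise the caller resolves by hand, used to hold a read "in flight".
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => {
    resolve = () => r();
  });
  return { promise, resolve };
}

// Unguarded read-through cache: reads under an await, then sets unconditionally.
async function unguardedGet(
  path: string,
  readDone: Promise<void>,
): Promise<string | undefined> {
  const hit = cache.get(path);
  if (hit !== undefined) return hit;
  const bytes = disk.get(path); // bytes captured at request time
  await readDone; // stands in for the awaited disk read
  // No check that the slot was invalidated while we were suspended.
  if (bytes !== undefined) cache.set(path, bytes);
  return bytes;
}

// DELETE: remove the file and synchronously invalidate the cache slot.
function deleteFile(path: string): void {
  disk.delete(path);
  cache.delete(path);
}

async function demoRace(): Promise<{ onDisk: boolean; cached: string | undefined }> {
  disk.set("card.gts", "v1");
  const gate = deferred();
  const inflight = unguardedGet("card.gts", gate.promise); // indexing fetch starts
  deleteFile("card.gts"); // DELETE lands inside the read window
  gate.resolve();
  await inflight; // the stale set happens here
  return { onDisk: disk.has("card.gts"), cached: cache.get("card.gts") };
}
```

In this sketch `demoRace()` resolves with the file absent from `disk` while `cache` still holds `"v1"`, which is the state that lets a later GET answer 200 from cache for a deleted file.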

Fix

Give #sourceCache the same per-path + global generation guard #transpiledModuleCache already uses:

  • getSourceOrRedirect snapshots the source-cache generation before its first await and drops its set when the resolved path's generation moved during the read (keyed on the canonical path, so extensionless-alias requests are covered too).
  • invalidateCache, the bypass-cache early invalidate, and the bulk clears bump the generation.
  • Every dropped stale set is logged, so if the race ever recurs through a path the snapshot didn't cover, the next occurrence still leaves a signal.

The in-flight reader still serves the bytes it read at request time (consistent with its happens-before ordering); only the cache write is discarded.
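
A minimal sketch of the guard's shape, under stated assumptions: `GuardedSourceCache`, `setIfCurrent`, `clearAll`, and `guardedGet` are hypothetical names, the dropped-set "log" is just an array, and the path is known up front (the PR keys on the canonical path resolved during the read). It is meant to illustrate the snapshot-then-compare idea, not to mirror realm.ts.

```typescript
interface GenerationSnapshot {
  path: number;
  global: number;
}

// Hypothetical cache with per-path and global generation counters.
class GuardedSourceCache {
  private entries = new Map<string, string>();
  private pathGenerations = new Map<string, number>();
  private globalGeneration = 0;
  readonly droppedSets: string[] = []; // stands in for the log line

  snapshot(path: string): GenerationSnapshot {
    return {
      path: this.pathGenerations.get(path) ?? 0,
      global: this.globalGeneration,
    };
  }

  get(path: string): string | undefined {
    return this.entries.get(path);
  }

  // Writes only if neither counter moved since the snapshot was taken.
  setIfCurrent(path: string, bytes: string, snap: GenerationSnapshot): boolean {
    const now = this.snapshot(path);
    if (now.path !== snap.path || now.global !== snap.global) {
      this.droppedSets.push(path); // record the discarded stale set
      return false;
    }
    this.entries.set(path, bytes);
    return true;
  }

  // Single-path invalidate: bump that path's generation, then clear the slot.
  invalidate(path: string): void {
    this.pathGenerations.set(path, (this.pathGenerations.get(path) ?? 0) + 1);
    this.entries.delete(path);
  }

  // Bulk clear: bump the global generation instead of touching every path.
  clearAll(): void {
    this.globalGeneration++;
    this.entries.clear();
  }
}

// Read-through lookup that snapshots before its first await.
async function guardedGet(
  cache: GuardedSourceCache,
  path: string,
  read: (path: string) => Promise<string | undefined>,
): Promise<string | undefined> {
  const hit = cache.get(path);
  if (hit !== undefined) return hit;
  const snap = cache.snapshot(path); // taken before the first await
  const bytes = await read(path);
  if (bytes !== undefined) cache.setIfCurrent(path, bytes, snap);
  return bytes; // the reader still gets what it read; only the write may be dropped
}
```

In this sketch the bulk clear bumps only the global counter, so a reader whose per-path counter never moved is still caught by the global comparison.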

Tests

A new deterministic module in module-cache-race-test.ts uses a test-only gate that parks a source read at the post-read / pre-set point and races concurrent work against it:

  • invalidate during the read → next GET is a cache miss (set discarded)
  • delete during the read → next GET is 404, not a stale 200 — the realm-server-level reproduction of the flake
  • unrelated-path invalidate → the set survives (cache hit; per-path scoping intact)
  • __testOnlyClearCaches during the read → cache miss (the global generation catches what the path counter alone would miss)
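
The gate technique can be sketched as follows. This is an illustration only: `ParkingGate`, `TinySourceServer`, and `deleteDuringRead` are hypothetical names, and the server is a toy with a per-path generation guard rather than the real realm server or its test hook.

```typescript
// Hypothetical test-only gate: parks one caller until the test releases it.
class ParkingGate {
  readonly parked: Promise<void>;
  private readonly released: Promise<void>;
  private markParked: () => void = () => {};
  private markReleased: () => void = () => {};

  constructor() {
    this.parked = new Promise<void>((resolve) => {
      this.markParked = () => resolve();
    });
    this.released = new Promise<void>((resolve) => {
      this.markReleased = () => resolve();
    });
  }

  // Called by the code under test at the post-read / pre-set point.
  async park(): Promise<void> {
    this.markParked();
    await this.released;
  }

  // Called by the test once it has finished racing work against the read.
  release(): void {
    this.markReleased();
  }
}

// Toy source server with a per-path generation guard.
class TinySourceServer {
  readonly disk = new Map<string, string>();
  private cache = new Map<string, string>();
  private generation = new Map<string, number>();
  gate: ParkingGate | undefined; // test-only hook, consumed by the next read

  async get(path: string): Promise<{ status: number; cacheHit: boolean }> {
    if (this.cache.has(path)) return { status: 200, cacheHit: true };
    const before = this.generation.get(path) ?? 0; // snapshot before any await
    const bytes = this.disk.get(path);
    const gate = this.gate;
    this.gate = undefined; // park only one read
    if (gate) await gate.park();
    else await Promise.resolve();
    if (bytes === undefined) return { status: 404, cacheHit: false };
    // Drop the set if the path was invalidated while the read was parked.
    if ((this.generation.get(path) ?? 0) === before) this.cache.set(path, bytes);
    return { status: 200, cacheHit: false };
  }

  delete(path: string): void {
    this.disk.delete(path);
    this.generation.set(path, (this.generation.get(path) ?? 0) + 1);
    this.cache.delete(path);
  }
}

// Race a delete against a parked read, then report the next GET's status.
async function deleteDuringRead(): Promise<number> {
  const server = new TinySourceServer();
  server.disk.set("card.gts", "v1");
  const gate = new ParkingGate();
  server.gate = gate;
  const inflight = server.get("card.gts");
  await gate.parked; // the read now sits between "read bytes" and "cache set"
  server.delete("card.gts");
  gate.release();
  await inflight;
  return (await server.get("card.gts")).status;
}
```

Because the test waits on `parked` before deleting, the ordering is fixed by construction rather than by timing, so the delete-during-read case does not depend on sleeps or scheduler luck.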

eslint + prettier clean and the type declarations build; the realm-server test shard exercises the new tests in CI.

🤖 Generated with Claude Code

getSourceOrRedirect's #sourceCache had no protection against a write
landing after a concurrent invalidate, unlike #transpiledModuleCache. A
source read that read bytes from disk and then had its path invalidated
mid-flight — e.g. a DELETE removing the file while a worker's indexing
fetch of the same source was still in flight — would re-populate the
cache with the now-deleted bytes. A subsequent GET then served a 200 for
a file already gone from disk, which is the flaky boxel-cli
`push --delete` integration failure (the post-delete source GET returned
a stale 200 cache hit instead of 404).

Mirror the per-path + global generation guard #transpiledModuleCache
already uses onto #sourceCache: snapshot the generation before the first
await in getSourceOrRedirect and drop the set when the path's generation
moved during the read; invalidate and the bulk clears bump the
generation. Log every dropped stale set so a future recurrence through a
path the snapshot didn't cover still leaves a signal. Add a deterministic
regression test that parks a source read at the post-read/pre-set point
via a test-only gate and races invalidate / delete / clear against it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR fixes an intermittent stale-read race in the realm server’s source-serving path by hardening Realm.#sourceCache against “set-after-invalidate” when a source read is in-flight during a concurrent invalidation (notably DELETE). The change mirrors the existing generation-guard strategy already used for #transpiledModuleCache, preventing deleted or outdated bytes from being re-cached and subsequently served as a fast cache hit.

Changes:

  • Add per-path and global generation counters for #sourceCache, and drop cache writes when an invalidate/clear occurs during an in-flight read.
  • Route all single-path and bulk source-cache invalidations through generation-bumping helpers.
  • Add deterministic realm-server regression tests using a test-only gate to park a request between “read bytes” and “cache set”, reproducing the race without timing flakiness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/runtime-common/realm.ts | Adds generation guards + helper invalidation methods for #sourceCache, and a test-only delay hook to deterministically reproduce the race window. |
| packages/realm-server/tests/module-cache-race-test.ts | Adds a new test module that deterministically verifies #sourceCache does not re-populate with stale bytes after concurrent invalidate/delete/clear. |



github-actions Bot commented Jun 5, 2026

Host Test Results

    1 files      1 suites   1h 52m 20s ⏱️
2 936 tests 2 921 ✅ 15 💤 0 ❌
2 955 runs  2 940 ✅ 15 💤 0 ❌

Results for commit eb7d9a5.

Realm Server Test Results

    1 files      1 suites   11m 46s ⏱️
1 563 tests 1 562 ✅ 1 💤 0 ❌
1 654 runs  1 653 ✅ 1 💤 0 ❌

Results for commit eb7d9a5.

@habdelra habdelra requested a review from a team June 5, 2026 02:20
@habdelra habdelra merged commit fe89b87 into main Jun 5, 2026
67 checks passed