RTC: fix stale local snapshots overwriting object+query map operations#77887
Open
danluu wants to merge 3 commits intoWordPress:trunkfrom
Open
RTC: fix stale local snapshots overwriting object+query map operations#77887danluu wants to merge 3 commits intoWordPress:trunkfrom
danluu wants to merge 3 commits intoWordPress:trunkfrom
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
17f5c91 to
9c5dba1
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).
What?
Here's a video demonstrating a repro for this issue (specifically, italicizing some text causes HTML to get corrupted):
table-stale-snapshot-repro.mp4
Unlike a lot of PRs in the #77716 series of PRs, this makes a moderately large change to the code. For a lot of the tiny tactical fixes, I think there's a very strong case for putting the fix in even if the approach should change because real users are running into issues frequently (or, that's my experience when using RTC, anyway), but if you prefer a different approach here it's possible the fix should be something fairly different. Please note the "open questions" at the bottom of the AI description for the issue and PR here.
AI TEXT
Object-backed
queryattributes are stored as nestedY.Mapvalues, but thecurrent local merge path treats every incoming plain object snapshot as the full
desired value. If user A has a stale local editor snapshot of
attributes.hero, receives user B's remote update tohero.caption, and thenchanges only
hero.headline, the stale local snapshot writes the oldhero.captionback into the CRDT document. The same path resurrects a remotedelete when the stale object still contains the deleted key.
The focused example uses a custom block with this attribute schema:
Status against known fixes
Base tested for the issue branch:
5ddf4ad1b34bb0798185a663437955e7f10da01b(origin/trunk), which alreadycontains the merged large-update fix from
#77669 and the merged
follow-up listed in the fuzz tracking issue,
#77681.
I reviewed the tracking issue
#77716, its listed PRs,
comments, and related open RTC fixes. Most listed fixes are not on the
object+query map merge path: the title reload fix
#77666, awareness fixes
#77673 and
#77678, room storage work
#77675, and the test-only
cursor scope PR #77662.
I applied the focused repros to the relevant branches that do touch nearby RTC
merge code:
origin/trunkat5ddf4ad1b34danluu/try/offset-space-bug-pratcfcfe75228f(#77658)danluu/try/rtc-duplicate-table-rows-stock-repro-pr-trunkat2e153b32280(#77723)danluu/try/fuzz-known-issues-fixed-campaignat220392e83e9mergeCrdtBlocks()update and delete repros still fail with the stale caption restored. The post-adapter helper diverges on that campaign branch, so the merge-level result is the useful signal there.Current trunk and newer danluu branch revalidation
I re-fetched current refs on 2026-05-01:
git fetch origin trunk git fetch danluu '+refs/heads/*:refs/remotes/danluu/*'Current
origin/trunkwas68484244df2d. The proposed PR branchdanluu/try/stale-query-object-map-pris rebased on that trunk and has thisthree-commit structure:
7fa898389e0(origin/trunkplus repro tests, no fix)WP_ENV_PORT=8922 WP_BASE_URL=http://localhost:8922 WP_ARTIFACTS_PATH=/Users/danluu/dev/fuzz/gutenberg-stale-query-object-map-baseline-rebased-check/artifacts/repeat-baseline-post-rebase-8922 npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-table-stale-snapshot.spec.ts --project=chromium --grep "stale HTML snapshot" --repeat-each=5 --retries=0failed 5/5 on user-visible table preservation assertions.17f5c915e93(danluu/try/stale-query-object-map-pr)WP_ENV_PORT=8921, with artifacts under/Users/danluu/dev/fuzz/gutenberg-stale-query-object-map-pr/artifacts/repeat-fix-post-rebase-8921, passed 5/5. The full file also passed 2/2.17f5c915e93(danluu/try/stale-query-object-map-pr)npm run lint:js -- packages/core-data/src/utils/crdt-blocks.ts packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts packages/core-data/src/utils/test/crdt-stale-query-array.test.ts packages/core-data/src/utils/test/crdt-stale-query-array-post.test.ts test/e2e/specs/editor/collaboration/collaboration-table-stale-snapshot.spec.tspassed.npm run test:unit -- packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts packages/core-data/src/utils/test/crdt-stale-query-array.test.ts packages/core-data/src/utils/test/crdt-stale-query-array-post.test.ts --runInBandpassed 31 tests.npm run build -- --skip-typespassed.danluu/try/stale-top-level-blocks-pratd4f43fbf6954danluu/try/stale-rich-text-sibling-prat1455411c8049danluu/try/stale-content-overwrite-prat54ff99db2227crdt-blocks.tsand post sync plumbing.danluu/try/form-content-overwrite-pratcd6822b89c95crdt-blocks.tsand post sync plumbing.danluu/try/rtc-duplicate-table-body-revision-loss-pratc7ef8332801bdanluu/try/rtc-table-stale-snapshot-prat876398df67b8WP_ENV_PORT=8923; it failed 3/3. Two repeats failed on visible data loss (A-new/B-newmissing or A's HTML edit missing), and one repeat hit an old-branch harness timeout after the textarea left HTML mode. Artifacts:/Users/danluu/dev/fuzz/issue2-table-candidate-e2e/artifacts/rtc-table-stale-snapshot-pr-html-repro-8923/.danluu/try/offset-space-bug-pratf136c427533bcrdt-blocks.ts.danluu/try/draft-reopens-blank-prata3e4dbc6e81adanluu/try/nav-menu-stale-save-prat82c7692768c4danluu/fix/rtc-autodraft-autosave-loss-pratb677576fbfe8danluu/try/rich-text-html-corruptionat0e3f72522e2danluu/try/stale-query-arrayat97b9e88e363The scratch minimal candidate command was:
The copied scratch test keeps only the two
mergeCrdtBlocks()object+queryupdate/delete histories to avoid unrelated old-branch post-adapter and editor
dependency drift. Its result summary and per-branch logs are under:
Conclusion: this issue is still active after the relevant known fixes I could
identify from #77716 and related open branches.
Reproductions
New focused repro file:
packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts.Focused merge-level repro:
npm run test:unit -- packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts --runInBand --testNamePattern="mergeCrdtBlocks preserves a remote sibling object property update"Expected: after user B changes
hero.captionand user A later changes onlyhero.headlinefrom a stale local snapshot, the final object is:Actual on
5ddf4ad1b34:captionis restored tocaption before.Focused delete repro:
npm run test:unit -- packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts --runInBand --testNamePattern="mergeCrdtBlocks preserves a remote sibling object property delete"Expected: after user B deletes
hero.caption, user A's later stale headlineedit must not bring it back.
Actual on
5ddf4ad1b34:caption beforeis resurrected.Post CRDT adapter repro:
npm run test:unit -- packages/core-data/src/utils/test/crdt-object-query-stale-snapshot-repro.test.ts --runInBand --testNamePattern="post CRDT adapter preserves remote object\\+query sibling changes"Expected:
applyPostChangesToCRDTDoc()andgetPostChangesFromCRDTDoc()preserve B's caption update after A's stale headline snapshot is applied.
Actual on
5ddf4ad1b34: the adapter returnscaption before.Run all lower-level repros:
Actual on
5ddf4ad1b34: all 3 tests fail.Playwright repro:
test/e2e/specs/editor/collaboration/collaboration-object-query-stale-snapshot.spec.ts.The test creates a draft post, opens it in two collaborative editor sessions,
registers a custom block with the object+query schema, inserts the block through
the block inserter, and edits visible textboxes with normal keyboard actions.
The custom block keeps a local draft after the user has edited the form, which
is a realistic block UI pattern and creates the stale local object snapshot
without direct Y.Doc mutation, fault injection, clock changes, or network hacks.
Expected: after B edits the caption and A edits the headline, both editors show:
Actual on
5ddf4ad1b34: A's final attributes are:I first tried a simpler Playwright flow where both editors edited the shared
fields directly without a local draft held by the block. That passed because the
block UI immediately consumed the remote prop update and did not retain a stale
object snapshot. The retained local draft variant is still natural for a block
form and exercises the real editor data path.
Stock Table browser repro evidence
The proposed fix branch also carries a browser-level repro using only the stock
core/tableblock and ordinary editor controls:test/e2e/specs/editor/collaboration/collaboration-table-stale-snapshot.spec.ts.This is the realistic browser repro used for PR-readiness because it does not
register a custom block, mutate a Y.Doc directly, mutate editor stores, pause
clocks or networks, or use fault injection.
Minimal user flow:
A1/B1andA2/B2.retaining a local HTML snapshot.
A-new/B-newinto the inserted row.A1in the stock HTML textarea toA1 local HTML edit, then switches the block back to visual mode.Expected final visible table on both editors:
Repeated baseline command, run from a detached baseline worktree at
7fa898389e0(trunkplus the repro tests, without the fix):WP_ENV_PORT=8922 WP_BASE_URL=http://localhost:8922 \ WP_ARTIFACTS_PATH=/Users/danluu/dev/fuzz/gutenberg-stale-query-object-map-baseline-rebased-check/artifacts/repeat-baseline-post-rebase-8922 \ npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-table-stale-snapshot.spec.ts --project=chromium --grep "stale HTML snapshot" --repeat-each=5 --retries=0Result on the unfixed baseline: 5/5 repeats failed the user-visible preservation
assertion. In 3/5 repeats the remotely inserted
A-new/B-newrow disappearedand the final visible table was:
In 2/5 repeats the stale HTML handoff lost User A's local
A1edit whileretaining User B's row:
Both outcomes are visible editor data loss from the same normal stock Table
workflow; the first is the stale-snapshot overwrite symptom this fix targets,
and the second is the same workflow failing the preservation invariant in the
other direction. The repeated baseline artifacts are under:
Repeated fix-branch command, run from
/Users/danluu/dev/fuzz/gutenberg-stale-query-object-map-prat17f5c915e93:WP_ENV_PORT=8921 WP_BASE_URL=http://localhost:8921 \ WP_ARTIFACTS_PATH=/Users/danluu/dev/fuzz/gutenberg-stale-query-object-map-pr/artifacts/repeat-fix-post-rebase-8921 \ npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-table-stale-snapshot.spec.ts --project=chromium --grep "stale HTML snapshot" --repeat-each=5 --retries=0Result on the fix branch: 5/5 repeats passed. The repeated fix artifacts are
under:
I also tried a keyboard-typing variant of the HTML textarea edit. It was less
useful as evidence because it sometimes failed before the final preservation
assertion when the textarea had already left HTML mode. The committed repro uses
Playwright's normal form-control
fill()action on the stock HTML textarea,which produced a clean 5/5 baseline-fails and 5/5 fix-passes split.
Verification commands that passed before writing this handoff:
Failure mechanism
mergeCrdtBlocks()iterates the incoming block snapshot and compares eachincoming attribute against the current value in the local Y.Doc. For nested Yjs
values, including object+query attributes stored as
Y.Map, it always delegatesto
mergeYValue()becausefastDeepEqual()cannot compare a Y type with a plainobject.
For
schema.type === 'object' && schema.query,mergeYValue()callsmergeYMapValues().mergeYMapValues()then:mergeYValue()forthat key;
Y.Mapthat is absent from the incomingobject.
That is correct only if the incoming object is causally the latest intended
state of the whole object. In the stale-snapshot history, it is not. The
incoming object is a local UI snapshot that only proves user A changed
hero.headline; it says nothing reliable abouthero.caption. Because themerge diffs against the current Y.Doc instead of against user A's previous local
snapshot, the old caption is misclassified as a local write, and missing keys
are misclassified as local deletes.
How this was introduced
The exact introducing PR is uncertain because this is a semantic bug in the
snapshot merge model, not a crash introduced by one isolated line.
Evidence from
packages/core-data/src/utils/crdt-blocks.tshistory:post/block CRDT merge infrastructure (
84019935998). That established thebroad pattern of merging incoming editor snapshots into the CRDT document.
schema-aware nested Yjs representation for table/query attributes
(
09a21c64b5b). That commit added the relevantobjectwithqueryhandlingthrough
Y.MapandmergeYMapValues(), which is the active failure path forthis issue.
array stability (
a6bfd3e5543). It is related to stale snapshots for nestedarrays, but this object+query map repro fails without needing array structural
matching.
My best supported attribution is that the stale full-snapshot model originates
with the initial block CRDT merge design, while the object+query map-specific
failure became observable through the nested Y.Map handling added in #76913.
Initial fix plan
Track the previous local editor snapshot for each local merge stream and derive
local operations by diffing:
Then apply only those local operations to the current Y.Doc. For object+query
maps:
local snapshots;
from the next local snapshot;
has no operation for that key.
The implementation should be schema-driven and not special-case
test/object-query-cardorhero.Fix plan audit
Linus Torvalds lens
The bug is an invariant failure: a stale snapshot is being treated as a set of
operations. Patching
mergeYMapValues()with heuristics such as "do not replaceif the Y.Doc value differs" would hide one symptom while breaking legitimate
local overwrites. The fix needs an explicit local-base invariant: writes must be
derived from a known prior local state, and the merge code must be small enough
that updates and deletes follow the same rule.
Kyle Kingsbury / Jepsen lens
The property to preserve is operation causality and convergence, not just final
deep equality for a happy path. The stale local snapshot has not observed B's
caption operation, so it cannot safely overwrite or delete that field. Tests
must include histories for remote update, remote add, remote delete, replayed
updates, late join, and both application orders. Deletes need special attention:
a missing field is only a delete if there is local-base evidence that the field
was removed locally.
Dan Luu lens
The Playwright repro matters because this is not just an artificial model test.
Stateful block UIs commonly keep draft form objects, debounce updates, or stage
multi-field edits before committing them. A fix that only handles the exact unit
shape may still fail under real editor interleavings. The tests should leave a
debuggable trail with ordinary block attributes and should avoid hidden metadata
that leaks into serialized content or plugin-visible block data.
Revised fix plan
applyPostChangesToCRDTDoc()or its caller, somergeCrdtBlocks()canreceive both the previous local blocks and the next local blocks for local
writes.
CRDT document, or explicitly trusted full-state replacement.
should emit key update/delete operations from previous-local to next-local
and apply those operations to the current Y.Map. For nested Y.Text and
Y.Array values, delegate to the corresponding operation-aware merge path.
preserving remote data and forcing an editor resync over applying a
destructive full-object overwrite.
crdt-object-query-stale-snapshot-repro.test.ts;stateful block UI behavior.
Open questions
editor stream, and not confused with remote state received from Yjs?
and migrations do not accidentally use the delta-only path?
query arrays, and top-level block arrays, or should object maps land as the
first narrow slice behind a common operation interface?
END AI TEXT