Skip to content

fix: always trust worker storage over stale da-admin on DO rehydration#138

Closed
kptdobe wants to merge 1 commit into
mainfrom
fix/trust-worker-storage
Closed

fix: always trust worker storage over stale da-admin on DO rehydration#138
kptdobe wants to merge 1 commit into
mainfrom
fix/trust-worker-storage

Conversation

@kptdobe
Copy link
Copy Markdown
Contributor

@kptdobe kptdobe commented May 8, 2026

Problem

When a Durable Object is evicted (e.g. during an image upload that briefly drops the WebSocket), bindState refetches the document from da-admin and compares fromStorage === current. This comparison fails whenever worker storage holds edits that have not yet been flushed to da-admin (debounced PUT still in flight). The mismatch sets restored = false, fires the 1-second reload timer, and overwrites the Yjs document with the stale da-admin snapshot — causing recently-inserted images to disappear.

Root cause

The fromStorage === current string comparison was meant to detect "da-admin was externally modified while the DO was sleeping." In practice it fires in the normal case: any edit that hasn't been saved to da-admin yet will produce fromStorage !== current, triggering the destructive reload.

Solution

Remove the string comparison and always set restored = true when valid storage exists.

Why this is safe: Worker storage is updated synchronously on every Yjs transaction (via the update listener) and therefore always holds the most-recent collaborative state — potentially newer than da-admin. The syncadmin and deleteadmin admin APIs are the correct mechanism for clearing stale storage when da-admin is modified externally.

Tests

Added: Test bindstate trusts storage when storage is newer than da-admin — verifies that the reload timer is NOT registered when storage contains edits not yet in da-admin, and that the storage content survives without being overwritten.

🤖 Generated with Claude Code

Comment thread src/shareddoc.js
Comment on lines 383 to +396
// Read the stored state from internal worker storage (errors are non-fatal)
try {
const timingBeforeReadState = Date.now();
const stored = await readState(docName, storage);
timingReadStateDuration = Date.now() - timingBeforeReadState;

if (stored && stored.length > 0) {
Y.applyUpdate(ydoc, stored);

// Check if the state from the worker storage is the same as the current state in da-admin.
// So for example if da-admin doesn't have the doc any more, or if it has been altered in
// another way, we don't use the state of the worker storage.
const fromStorage = docType === 'json' ? doc2json(ydoc) : doc2aem(ydoc);
if (fromStorage === current) {
restored = true;

// eslint-disable-next-line no-console
console.log('[docroom] Restored from worker persistence', docName);
}
// Worker storage holds the most-recent Yjs state, including edits that have not yet
// been flushed to da-admin (e.g. because the debounced PUT was still in flight when the
// DO was evicted). Always trust storage over da-admin; the syncadmin/deleteadmin APIs
// are responsible for clearing stale storage when da-admin is modified externally.
restored = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the document is updated directly in da-admin (eg via a PUT there) while someone has an editor open on the same document? It seems like this change will then basically ignore that change made in da-admin and overwrite it with the editor content...?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a real use case ? If you PUT a completely different doc while someone is editing, I think it is much better that the editor wins...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real use case for situations like localizing documents. Someone may have a doc open in their editor when a new localization batch is started and pushed. In that case we would want the da-admin PUT to overwrite the content in an open editor.

The `fromStorage === current` string comparison failed whenever worker
storage held edits newer than da-admin (e.g. debounced PUT still in
flight when the DO was evicted). This set restored=false, fired the
1-second reload timer, and overwrote the ydoc with the stale da-admin
snapshot — causing recently-inserted images to disappear.

Worker storage is always the most-recent Yjs state for the collaborative
session. The syncadmin/deleteadmin APIs are responsible for clearing
storage when da-admin is modified externally, so removing the string
comparison is safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kptdobe kptdobe force-pushed the fix/trust-worker-storage branch from a761e94 to 9c8e251 Compare May 11, 2026 07:55
@kptdobe
Copy link
Copy Markdown
Contributor Author

kptdobe commented May 13, 2026

This one is too controversial, let's close it for now.

@kptdobe kptdobe closed this May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants