Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interleaving can happen if you insert elements in reverse order into a list #395

Open
pvh opened this issue May 25, 2021 · 2 comments
Open
Milestone

Comments

@pvh
Copy link
Member

pvh commented May 25, 2021

Martin thinks this might not be a big deal in practice, but it is technically a breaking API change, so if we want to change that algorithm we need to slightly change the way we record insert operations by adding an extra field to the operations.

We should either fix this before 1.0 or at least put the extra field into the data format so that we don't need to do a second migration.

(We should probably just adopt the algorithm Y-js uses for this which is marginally more expensive but not significantly, so let's just standardize.)

@pvh pvh added this to the 1.0 milestone May 25, 2021
@dmonad
Copy link

dmonad commented Jun 18, 2021

Thanks for opening this issue. Just wanted to drop in here to say that the current YATA implementation in Yjs (which is based on the same mathematical model as the original YATA algorithm) outperforms RGA. Yjs applies any update in O(c), where c is the number of conflicting insertions at the same position. On average, it applies updates in O(c/2). When applying several updates in bulk (e.g. loading the initial state of the document), each contained update is applied in O(1). There are a lot of interesting caveats to the algorithm that make it shine in the crdt-benchmarks repository.

However, it is true that YATA requires more fields (one more integer). But the size of the encoded Yjs document is usually smaller than Automerge, even without garbage collection. Furthermore, since everything is kept in a single linked list, Yjs/YATA doesn't require additional bookkeeping for potential child elements and therefore has a smaller memory footprint.

If you are looking to switch algorithms, it might make sense to adapt Yjs/Yrs in the backend. You can express a custom view (e.g. using immutable data structures). Another advantage might be that you could make use of all the existing editor bindings for Yjs. Also, it would mean that we could work together :)

One critique point I heard a lot from Automerge folks is that Yjs does perform garbage collection. You can turn that off to keep the complete history or configure it to keep only relevant parts of the history (e.g. information that is necessary to restore snapshots).

@ept
Copy link
Member

ept commented Jun 28, 2021

Thank you @dmonad for chiming in. I have looked at this a bit further, with help from @josephg, and found that Yjs can also exhibit interleaving when list elements are inserted in reverse order. The following code demonstrates this happening:

const Y = require('yjs')
let doc1 = new Y.Doc(), doc2 = new Y.Doc(), doc3 = new Y.Doc()
doc1.clientID = 1
doc2.clientID = 2
doc3.clientID = 3

// First doc3 inserts a bunch of 'a's, then doc1 prepends another bunch of 'a's
// (note these might actually be the same user in different editing sessions)
doc3.getArray().insert(0, ['a', 'a', 'a', 'a', 'a'])
Y.applyUpdateV2(doc1, Y.encodeStateAsUpdateV2(doc3, Y.encodeStateVector(doc1)))
doc1.getArray().insert(0, ['a', 'a', 'a', 'a', 'a'])

// Concurrently, doc2 inserts a bunch of 'B's
doc2.getArray().insert(0, ['B', 'B', 'B', 'B', 'B'])

// In the merged final state, the 'B's are interleaved among the 'a's
Y.applyUpdateV2(doc1, Y.encodeStateAsUpdateV2(doc2, Y.encodeStateVector(doc1)))
console.log(JSON.stringify(doc1.getArray().toArray()))
// Prints: ["a","a","a","a","a","B","B","B","B","B","a","a","a","a","a"]

Interleaving happens less often in Yjs than in RGA because (as far as I can tell) it happens only if the reverse order insertions use multiple clientIDs. However, I can easily imagine scenarios in which that would be the case (e.g. a user does some work on one device, and then continues that work on another device, while concurrently another user is editing the document offline).

I believe @josephg's variant of YATA, which is called integrateYjsMod in his reference CRDTs repo, fixes this interleaving issue. For that reason I am thinking of adopting this modified algorithm. Ideally I would like to prove it correct, though, because it's a very subtle algorithm that would be easy to get wrong. My colleagues and I have previously proved RGA correct, and I'm considering a similar approach here. We don't necessarily need to wait until the proof is done before adopting the algorithm in Automerge: I think @josephg's fuzzing already gives us pretty good confidence in the algorithm's correctness, and I'm just looking for extra certainty. Especially as my last attempt to produce an interleaving-free variant of RGA turned out to be disastrously wrong.

Thank you for the offer of collaborating on Yjs. For me, Automerge is a research project as much as it is an open source product, and I have tons of CRDT research ideas that I want to try. For that purpose it is useful to have a self-contained implementation with minimal dependencies, giving us the freedom to change the internals in whatever way supports our research. For example, we want to explore Git-style branching workflows in which users don't just immediately apply every edit as soon as they receive it, but where users can explicitly manage and compare several versions/branches/forks of their document. This requires not only retaining the editing history, but also being able to diff document versions to produce "suggested changes", and selectively merging those changes. I'm not sure how well Yjs would be able to support such workflows currently.

More generally, the CRDT space is big, and I think there is plenty of room for several CRDT libraries to coexist happily, each targeting a different point in the trade-off space. It's healthy to have several projects that try different approaches, so that we can learn what works and what doesn't. For that reason I think it's good for Automerge and Yjs to remain independent, and I strongly believe that both are good projects that can both thrive.

Moreover, I hope we can still work together, even if our projects remain independent! For example, I would like to write a paper on this interleaving issue, and I'd love for you and @josephg to contribute to this if you're interested. We can discuss that privately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants