Efficient sync #12
Earthstar still needs efficient sync. I've got an idea for a new method, but first it's worth revisiting how we previously thought we could solve this issue: Cinnamon was developing a 'local index' approach which gave log-like properties to Earthstar's k/v stores.
This approach was promising, but it's not tolerant to a few scenarios:

- Telephone sync
- Causality faults
So we need another approach. I've been looking at vector clocks, version vectors, and even interval tree clocks, but I don't think any of these are suitable for Earthstar: they would require recording a version per replica ID per document, for starters. They also seem to be built with the assumption that a cluster is under the control of a single entity, which is not the case with Earthstar.

In a thread on SSB it was suggested to send the content hashes of all documents from one replica to another, and derive which docs are wanted from that. This can't work for Earthstar because Earthstar is able to overwrite and even delete values from its store: the absence of a hash is not enough to determine whether a replica wants a doc or not.

What do replicas want? Replicas want the latest version of a valid document by any given author. A replica will discard documents written by the same author with a lower timestamp. If we provide this minimum level of information (path, author, timestamp), we can determine whether a replica wants that document or not.

Let's imagine how this sync would work with the participants of our share. Nature's network has three participants: a computer owned by Cinnamon, a computer owned by gwil, and a server owned by gwil. Each device has its own replica.

Let's take a peek inside these replicas and remind ourselves how they work. Here's replica A:

Each path (e.g. …)

Now let's look at the insides of replica B:

Replica B has some different documents to A. It has an updated version of gwil's document at …

In an ideal world, we would send only these new documents from replica B to replica A.
When Replica A syncs with Replica B, it's going to ask Replica B for the minimum amount of information needed to determine whether it wants any of Replica B's documents. Replica B will answer with something like this: Each item has a path, a collection of authors it has documents by, and the timestamps of those documents. Each item also has a temporary ID generated for this sync. Replica A will proceed through each of these items like this:
Replica A will then send the IDs of the documents it wants, and Replica B will send over the documents associated with those temporary IDs.

That's my idea. I think that's about as good as I'm capable of at this point. It's not terribly complicated, which is an acceptance criterion for the Earthstar project.
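Here's a minimal sketch in TypeScript of how I picture that exchange. The `SyncItem` shape, the `latestTimestamp` method, and `wantedIds` are hypothetical names for illustration, not existing Earthstar APIs:

```ts
type AuthorAddress = string;
type Path = string;

// What Replica B offers first: just enough to decide whether a doc is wanted.
interface SyncItem {
  id: string; // temporary ID generated for this sync session
  path: Path;
  versions: { author: AuthorAddress; timestamp: number }[];
}

interface ThumbnailStore {
  // Timestamp of the doc we hold at (path, author), or undefined if we have none.
  latestTimestamp(path: Path, author: AuthorAddress): number | undefined;
}

// Replica A works out which temporary IDs it wants from Replica B.
function wantedIds(store: ThumbnailStore, offered: SyncItem[]): string[] {
  const wanted: string[] = [];
  for (const item of offered) {
    const hasNewer = item.versions.some(({ author, timestamp }) => {
      const ours = store.latestTimestamp(item.path, author);
      // Want it if we have nothing by this author at this path,
      // or if the offered version is newer than ours.
      return ours === undefined || timestamp > ours;
    });
    if (hasNewer) wanted.push(item.id);
  }
  return wanted;
}
```

Replica B would then respond to the returned IDs with the full documents.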
I like the simplicity of this approach. Here are some additional considerations:
You're not overthinking it: messages with the pathname + versions will be streamed from one peer to the other.
My goal is to make syncing quick and cheap enough that restarting doesn't matter. Ideally replicas shouldn't have to remember any state about each other.
I think we should definitely have some tools to do stuff like this more easily. I don't have much experience with workers, though.
Sounds reasonable, I'm up for trying it.
I've been researching this week, and I found the talk CRDTs for Mortals (Offline First Apps). It gives a good overview of the problems with conflict resolution and syncing. The talk mostly skips over the syncing part, but the repo of the associated example app says: "This also contains an implementation of a merkle tree to check consistency of the data to make sure all clients are in sync." This repo is a very small implementation, which helps to better understand how it works.

As I understand it, when the client syncs with the server, the client sends its latest messages (they would be "documents" for Earthstar) and a Merkle tree representing the contents of the client's data. The messages are added to the server's data, and a new tree is created based on that. Then a diff is made between the client tree and the new server tree. That diff results in a timestamp used to query all messages the server has that the client does not have. Then the server responds with those messages and the server's tree. The client then does another tree diff and confirms that the two sides are in sync.

The Hypercore protocol also uses Merkle trees "to only download the parts of the log they are interested in." I wonder if using a Merkle tree could help the syncing process for Earthstar. It certainly makes the library implementation more complex than what you were outlining, but if it improves syncing, then it may be worth it. Have Merkle trees been considered for Earthstar yet? I would need to research more to get a better sense of how it would work.
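For what it's worth, here is a rough TypeScript sketch of how I understand that timestamp-bucketed hash comparison. It is not the implementation from the linked repo; the one-minute bucket size and the function names are just illustrative:

```ts
import { createHash } from "node:crypto";

// Hash the IDs of all messages that fall into each one-minute bucket.
function bucketHashes(messages: { id: string; timestamp: number }[]): Map<number, string> {
  const buckets = new Map<number, string[]>();
  for (const msg of messages) {
    const minute = Math.floor(msg.timestamp / 60_000);
    buckets.set(minute, [...(buckets.get(minute) ?? []), msg.id]);
  }
  const hashes = new Map<number, string>();
  for (const [minute, ids] of buckets) {
    hashes.set(minute, createHash("sha256").update(ids.sort().join("\n")).digest("hex"));
  }
  return hashes;
}

// Find the earliest minute where the two sides disagree; the server would then
// send every message it has from that point onwards.
function earliestDivergence(
  client: Map<number, string>,
  server: Map<number, string>,
): number | undefined {
  const minutes = [...new Set([...client.keys(), ...server.keys()])].sort((a, b) => a - b);
  return minutes.find((m) => client.get(m) !== server.get(m));
}
```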
I've also been doing my research, and am wondering whether this is worth exploring some other (funded) time. I think Merkle trees do not work for Earthstar, as Earthstar shares store mutable sets of documents with identity: you can't know whether you want a document simply by whether you have it or not. A document peer A has but which peer B does not may be a document peer B has since overwritten. I have been reading this outline of Simple And Efficient Set Reconciliation, which I think could be applied to Earthstar. I'm trying to assess whether it's something I can really fit into this year's work or not.
That's a fascinating proposal. This is how I'm interpreting it: searching based on a range means there has to be some agreed-upon way to sort items within that range. For Earthstar, this range could be a few different things. Although I suspect there are problems I'm overlooking, because this is just a quick thought.
Before leaving on holiday, I had a call with @AljoschaMeyer to talk about using their range-based set reconciliation method with Earthstar. Long story short, we can do it! We also determined that Earthstar will still need to perform the step where a subset of document information (path + author + timestamp) needs to be sent over to determine if the peer actually wants the document it doesn't have yet. So the final procedure will be:
@basham Like you say, to do this we need a total order on documents. As it is possible for two documents to have the same timestamp (however remote the chance), we need to do either path-then-author or author-then-path. I think I am going to go with path-then-author, just because this is how things are organised in other places in the codebase.
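Something like this, perhaps (`DocThumbnail` is just a hypothetical shape, and exactly where the timestamp slots into the order is still up for discussion below):

```ts
interface DocThumbnail {
  path: string;
  author: string;
  timestamp: number;
}

// Total order: path first, then author, with timestamp as a final tiebreaker.
function compareThumbnails(a: DocThumbnail, b: DocThumbnail): number {
  if (a.path !== b.path) return a.path < b.path ? -1 : 1;
  if (a.author !== b.author) return a.author < b.author ? -1 : 1;
  return a.timestamp - b.timestamp;
}
```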
Step 2 should not end up being an explicit step, however; the items that are being exchanged by the range-based set reconciliation procedure could/should be those 'thumbnails'.
Shouldn't the timestamp also appear in there somewhere? The same author is expected to write to the same path multiple times, aren't they?

The total order might indeed be the most design-intensive remaining decision. Remember that the items over which you define the total order are also items you need to transmit when communicating the boundaries of a range fingerprint. So if you define the order on the thumbnails directly, each range fingerprint contributes an author, a path and a timestamp to the amount of bits that need to be sent. A more efficient approach is to hash each thumbnail and reconcile those hashes, simply using numeric ordering on those hashes. When transmitting range item sets, you could still transmit the actual thumbnails rather than their hashes. The main drawback is that this order is completely meaningless. This has an actual impact on the replication performance, as the protocol performs better in sessions where the items that differ between two peers are successive in the total order. Ordering hashes destroys all locality, whereas ordering by timestamp/author/path would lead to nice clustering in many realistic scenarios. The worst-case setting stays identical though.

A meaningful order on thumbnails is more in the spirit of #6, as asking for a specific subrange which, for example, contains only documents under a single path naturally corresponds to a limited query. Keep in mind that multi-dimensional ranges are also a possibility; you could define a range as a triplet of a least and greatest path, least and greatest author, and least and greatest timestamp. This does however increase the implementation effort, as the auxiliary tree data structure would need to be a kd-tree.

From a purely aesthetic view, it feels to me like it should either be an ordering on hashes, or three-dimensional ranges on thumbnails. But ordering thumbnails by an arbitrary linearization of their three dimensions seems, well, arbitrary. And since multi-dimensional ranges are more complicated, I would lean toward ordering hashes and leaving meaningful, three-dimensional ranges as a future possibility. But of course I'm a project outsider, so take this recommendation with a large grain of salt.
Very happy to have your guidance here @AljoschaMeyer! I think I muddied the waters with the casual introduction of new phrasing.
In our last call, I think we talked about thumbnails being inappropriate for use as a fingerprint, as they'd be too big: a document thumbnail as a string would be a minimum of ~70 characters. My current thinking was that the fingerprint would be a much shorter hash derived from the document thumbnail. Could I map the total order of the document thumbnails (path / author) to their derived fingerprints? For example... Peer A making fingerprints:
Peer B, with a newer document at
Yes, but once an author writes to a path, it replaces the previous document. A path will only ever have a single version from each author. The timestamp needs to be transmitted because a peer might have a newer version, in which case it doesn't care about the older version of the document.
Looks like we are on the same page; I used "thumbnail" in my last comment synonymously with how you just defined "document thumbnail".
Two different things are being exchanged: fingerprints and actual items. For Earthstar, the actual items would be document thumbnails. The fingerprints would be (sums of) hashes of document thumbnails. In the terminology of the thesis, the function that lifts document thumbnails into the universe of the monoid is some arbitrary hash function, and the monoid could be addition modulo the number of distinct digests.
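A minimal sketch of how that could look, assuming SHA-256 as the arbitrary hash function and treating digests as 256-bit integers (so the monoid is addition modulo 2^256); none of these concrete choices are prescribed by the thesis:

```ts
import { createHash } from "node:crypto";

const M = 2n ** 256n; // the number of distinct SHA-256 digests

// Lift a document thumbnail into the universe of the monoid by hashing
// the concatenation of its parts (null byte as a separator).
function lift(t: { path: string; author: string; timestamp: number }): bigint {
  const digest = createHash("sha256")
    .update(`${t.path}\u0000${t.author}\u0000${t.timestamp}`)
    .digest("hex");
  return BigInt("0x" + digest) % M;
}

// The monoid operation: the fingerprint of a range is the fold of this
// over the lifted values of all thumbnails in that range.
function combine(a: bigint, b: bigint): bigint {
  return (a + b) % M;
}
```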
That is unfortunately not possible. If I give you nothing but a (secure) hash of some document thumbnail, you are by definition unable to compute a pre-image, so how would you know which document thumbnail to use for determining your position in the order? And you cannot really hope to find special hash functions that produce identically ordered hashes either; handwavingly speaking, hashes need to be too random to preserve such a high amount of structure.
Just to emphasize this: the timestamp must also influence the fingerprints, as otherwise peers could not distinguish between different versions during reconciliation. (I think you already know this; I just want us to have it explicitly in this comment thread.)
Ignore this comment if I'm not following all the intricacies of what's being proposed here. But I suspect the timestamp may need to be included. Since the three parts can't be hashed together, what if the parts within timestamp+path+author could be individually hashed? If the parts can't have a consistent and predictable length, they could be joined together with some delimiter that's outside of the hash range. Let's say, for the sake of this example, the hashes are only alphanumeric (…). So, perhaps the fingerprints could be something like this? (I just made up some characters in the derived fingerprint to illustrate some examples.)
Indeed it has to be. Just not explicitly; it suffices if it is somehow involved in computing the fingerprint.
This assumption is not true: it is (thankfully) completely sufficient to concatenate those three components and hash the result. Hence I cannot see the necessity of your proposal - but I unfortunately cannot see where our mismatching perspectives come from either at this point.

There is however also a subtle flaw with your proposal: it is actually important that the timestamp is not hashed on its own but in combination with other data, else fingerprint collisions become too likely. In your scheme, you would probably define the fingerprint of a set as the component-wise sum (or XOR or whichever monoid you prefer) over all document thumbnails in the set. Given the comparatively small number of timestamps, collisions become dangerously realistic. Particularly when simply combining literal timestamps, this can easily happen. But even when hashing timestamps before combining them, 2^64 distinct values are simply not enough to consider the probability of collisions negligible.

That is actually the main reason why I couldn't figure out how to get rid of the step of requesting all the thumbnails for which you actually want the content. I couldn't figure out how to thread timestamp information through the recursive fingerprint comparisons without a high probability of collisions.
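To make the risk of hashing parts separately concrete, here is my own illustrative example (made-up values, and possibly not the exact scenario meant above). With component-wise fingerprints, these two different sets are indistinguishable, because nothing binds a path to the author and timestamp it was originally paired with:

- Set 1: (pathA, author1, t1) and (pathB, author2, t2)
- Set 2: (pathA, author2, t2) and (pathB, author1, t1)

Both sets combine the same path hashes, the same author hashes, and the same timestamp hashes, so their component-wise sums agree. Hashing the concatenated triple per thumbnail avoids this, since each thumbnail then contributes a single digest that depends on the whole triple.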
Maybe part of the misunderstanding comes from the value of the author. My interpretation of "author" for the sake of this conversation would more accurately be Earthstar Identities.
There could be multiple …

Is the problem with collisions resolved when considering the combined parts of …?
Neat!
It is. Suppose you have four different document thumbnails …
Yep, that makes sense. Thanks!
(First, probably do #14 set up RPC)
Right now we sync by sending everything. Instead, do something like:
hash(list of individual item hashes)
This could obviously recurse further into the 1000 items, but there are tradeoffs:
We can't pre-compute the hashes because replication filters might choose a different subset of items each time.
This should eventually support #6 Replication filters.
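A rough TypeScript sketch of that hash-of-hashes comparison, bucketing by a shared hash prefix so both peers split their sets at the same boundaries (the names and the bucket size of 1000 are made up for illustration):

```ts
import { createHash } from "node:crypto";

// Hash a list of individual item hashes into one summary hash.
function summaryHash(hashes: string[]): string {
  const h = createHash("sha256");
  for (const itemHash of [...hashes].sort()) h.update(itemHash);
  return h.digest("hex");
}

// Group item hashes by their first `depth` hex characters, so both peers
// split their sets at exactly the same boundaries.
function bucketBy(hashes: string[], depth: number): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const itemHash of hashes) {
    const key = itemHash.slice(0, depth);
    out.set(key, [...(out.get(key) ?? []), itemHash]);
  }
  return out;
}

// Find the item hashes each side is missing. Network round trips are elided;
// in practice each level of recursion would be one request/response that
// exchanges only summary hashes, not the items themselves.
function reconcile(a: string[], b: string[], depth = 1): { aOnly: string[]; bOnly: string[] } {
  if (summaryHash(a) === summaryHash(b)) return { aOnly: [], bOnly: [] };
  if (a.length + b.length <= 1000) {
    const setA = new Set(a);
    const setB = new Set(b);
    return { aOnly: a.filter((h) => !setB.has(h)), bOnly: b.filter((h) => !setA.has(h)) };
  }
  const bucketsA = bucketBy(a, depth);
  const bucketsB = bucketBy(b, depth);
  const keys = new Set([...bucketsA.keys(), ...bucketsB.keys()]);
  const result = { aOnly: [] as string[], bOnly: [] as string[] };
  for (const key of keys) {
    const sub = reconcile(bucketsA.get(key) ?? [], bucketsB.get(key) ?? [], depth + 1);
    result.aOnly.push(...sub.aOnly);
    result.bOnly.push(...sub.bOnly);
  }
  return result;
}
```

One tradeoff this makes visible: deeper recursion saves bandwidth on large, mostly identical sets, but each extra level costs a round trip.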