Reduce size of vector clocks #28

ept · 2017-06-22T15:41:32Z

At present, we create a new actor ID every time a node is started (either as a completely fresh node, or loading its state from file). That approach ensures that even if you run a process twice on the same machine, they will have different actor IDs, and so they won't step on each others' toes. However, the downside is that the rate of churn of actor IDs is high, and so the vector clocks (which have an entry for each actor) get rather large.

I did a first step in this direction in 9d8f136: instead of including the full vector clock in each changeset, we now only include the actor ID of the originator, the sequence number (which starts at 1 for a new actor ID and is incremented on every changeset), and a minimal set of dependencies (as actorId-seqNum pairs). The set of dependencies does not include any dependencies that can be reached transitively through other dependencies, and it implicitly includes seqNum–1 for the same actorId. In other words, it only references changesets from other actors that were received by the originator between seqNum–1 and seqNum, making the dependencies much like a merge commit in git.

However, the API for figuring out which changesets to send between peers (getVClock(), getDeltasAfter()) still uses full vector clocks without any truncation. Here, reducing the size is a little trickier. It can be done, but it will require more than one round-trip between peers to figure out what deltas need to be sent.

The reason is as follows: for example, say the latest changeset known by peer A (the "HEAD" in git terminology) has sequence number 42. All other changesets known by A can be reached transitively by following the dependency chains from A:42. So it is technically sufficient for getVClock() to return {A:42}, since that contains all the necessary information. However, any peer that does not have recent changesets from A cannot interpret the clock {A:42}. All it can tell is that it is missing a bunch of changesets from A, but it may also be missing changesets from other actors, and it won't find out what those are until it has received the changesets from A.

To solve this, we can probably take a look at the git transfer protocol as implemented by git-send-pack and git-upload-pack. Git has a similar problem, since it identifies the state of a repository by a commit hash, and if you only know a commit hash, you don't know what history you're missing in order to get there.

The text was updated successfully, but these errors were encountered:

ept · 2017-06-23T19:33:57Z

Current implementation of network protocol for exchanging changesets/deltas is in DeltaRouter. It might make sense to move some of that into Tesseract (as far as it can be done abstractly, without introducing a dependency on WebRTC).

pvh · 2018-02-14T23:48:08Z

Is this basically done, then, Martin? If so we might close this issue and reopen a new one that described any additional work you envision.

ept · 2018-02-15T21:57:44Z

@pvh Sort of. The vector clock attached to each individual change is now quite compact, since it only includes changes that arrived since the last change that was sent. Full vector clocks are still used in the Automerge.Connection network protocol for synchronising peers. This issue was intended to cover vector clock compaction in Automerge.Connection as well. However, I don't think we particularly need to do anything at the moment, since I think other areas of Automerge are in greater need of optimisation, and the vector clocks don't seem to be a bottleneck at the moment.

So I will close this issue and set it aside for now. We can always revisit it in the future if necessary.

ept mentioned this issue Jun 22, 2017

API for communicating deltas #20

Closed

ept mentioned this issue Jul 4, 2017

Channel concept, granularity of documents and dependency tracking #31

Open

ept mentioned this issue Aug 18, 2017

Connection abstraction to manage peer communication #36

Merged

ept closed this as completed Feb 15, 2018

edahlseng mentioned this issue Aug 6, 2019

Using hash chaining to encode the dependency graph #200

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce size of vector clocks #28

Reduce size of vector clocks #28

ept commented Jun 22, 2017

ept commented Jun 23, 2017

pvh commented Feb 14, 2018

ept commented Feb 15, 2018

Reduce size of vector clocks #28

Reduce size of vector clocks #28

Comments

ept commented Jun 22, 2017

ept commented Jun 23, 2017

pvh commented Feb 14, 2018

ept commented Feb 15, 2018