New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce size of vector clocks #28
Comments
Current implementation of network protocol for exchanging changesets/deltas is in DeltaRouter. It might make sense to move some of that into Tesseract (as far as it can be done abstractly, without introducing a dependency on WebRTC). |
Is this basically done, then, Martin? If so we might close this issue and reopen a new one that described any additional work you envision. |
@pvh Sort of. The vector clock attached to each individual change is now quite compact, since it only includes changes that arrived since the last change that was sent. Full vector clocks are still used in the So I will close this issue and set it aside for now. We can always revisit it in the future if necessary. |
At present, we create a new actor ID every time a node is started (either as a completely fresh node, or loading its state from file). That approach ensures that even if you run a process twice on the same machine, they will have different actor IDs, and so they won't step on each others' toes. However, the downside is that the rate of churn of actor IDs is high, and so the vector clocks (which have an entry for each actor) get rather large.
I did a first step in this direction in 9d8f136: instead of including the full vector clock in each changeset, we now only include the actor ID of the originator, the sequence number (which starts at 1 for a new actor ID and is incremented on every changeset), and a minimal set of dependencies (as actorId-seqNum pairs). The set of dependencies does not include any dependencies that can be reached transitively through other dependencies, and it implicitly includes seqNum–1 for the same actorId. In other words, it only references changesets from other actors that were received by the originator between seqNum–1 and seqNum, making the dependencies much like a merge commit in git.
However, the API for figuring out which changesets to send between peers (
getVClock()
,getDeltasAfter()
) still uses full vector clocks without any truncation. Here, reducing the size is a little trickier. It can be done, but it will require more than one round-trip between peers to figure out what deltas need to be sent.The reason is as follows: for example, say the latest changeset known by peer A (the "HEAD" in git terminology) has sequence number 42. All other changesets known by A can be reached transitively by following the dependency chains from A:42. So it is technically sufficient for
getVClock()
to return{A:42}
, since that contains all the necessary information. However, any peer that does not have recent changesets from A cannot interpret the clock{A:42}
. All it can tell is that it is missing a bunch of changesets from A, but it may also be missing changesets from other actors, and it won't find out what those are until it has received the changesets from A.To solve this, we can probably take a look at the git transfer protocol as implemented by git-send-pack and git-upload-pack. Git has a similar problem, since it identifies the state of a repository by a commit hash, and if you only know a commit hash, you don't know what history you're missing in order to get there.
The text was updated successfully, but these errors were encountered: