Skip to content

Logootish

Nathan Pennie edited this page Nov 4, 2019 · 3 revisions

This assumes that you've read the section in "Home" and are familiar with Logoot.

Terminology

This may be a little bit confusing at first. I'd recommend that you skim the terminology section first and use it for reference when reading the rest of this.

  • Atom, character -- The two terms are used somewhat interchangeably. This is the smallest segment of text that cannot be split into smaller pieces. In this implementation, characters and atoms are the same thing, but in other Logoot implementations, lines could be used as atoms, for example. The downside to this would be that you couldn't insert text in the middle of the line. So for this algorithm, characters are used as atoms
  • Position -- A unique value for the atom that specifies its position
    • These positions must be able to be ordered in a set (I will explain ordering a bit later)
    • The current implementation used for a position is an array of numbers in a class used to provide utility functions. Examples of positions in text format are [0] or [0,0] or [1,2,3]
    • For any two positions, another position must be able to be created between them. If just integers are used, this poses a problem, since there's nothing between 0 and 1, for example. The solution to this is to add another integer to the array to make the position more specific
  • Levels -- The number of elements in the position array. For example, [0] would have 1 level. Be careful though! When getting the variable levels from a LogootPosition, 0 represents one level, not 1 like you would expect from an array length. That way, myposition.level(myposition.levels) will return the lowest level.
  • Vector Clock, rclk -- Imagine that you first insert a single atom at a particular position and then remove it to insert a different letter. If these edits came out of order, how would the client know that the second letter should be used and not removed? The solution is the vector clock, which is named rclk in the code and in events. The vector clock is incremented with every removal
  • Events -- This refers to the Matrix events actually sent in JSON form. Since it would be inefficient to have one event per insertion or removal, any length properties used are simply added on to the lowest level to find the end position. For example, if I insert hi at [0,1,0], the message would have a start position of [0,1,0] and the body of hi. The client would then figure out that the end position, or the position where the next node would be, is [0,1,2]. Thanks to in browser spell check, I now realize that it's insertion instead of insertation. If you see that, I meant to change it. Oops. Not a big deal
  • Node -- Since text is sent in longer chunks, to save on RAM, it is stored in longer chunks as well. A single segment spanning between two positions with (and this is key!) no nodes in the middle of it is a single node. Any nodes placed in the middle must cause the parent node to be split in two. (Though I will explain it in more detail later, this is one of the many jobs of the _mergeNode function) The reason for the necessary split is because the binary search tree must not have overlap to find nodes

Insertions

(Look, I even spelled it right this time!) These are the simplest type of modification. Let's assume that our document looks like 134, where the 1 has the position of [0] and the 3 and 4 have the positions [2] and [3], respectively. If 2 is inserted between the 1 and 3, it's easy to find a position in between their positions. Since 1 has the position [0] and 3 has the position [2], the position [1] is conveniently located right between them. So the new 2 can be given the position [1].

The final document would be represented like this:

[0]     - '1'
[1]     - '2'
[2]-[3] - '34'

Of course, usually it is never that easy. Let's assume that we have the same 134 as before, but this time the positions are [0], [1] and [2], respectively. Now, we try to insert 2 between 1 and 3, as before. This time, we must create a new position between [0] and [1] by adding a new level. Remember that the node with more levels comes first, so the new position would be [1,0]. It seems counter-intuitive to have [1,0] come before [1], but it greatly simplifies the code.

The final document would be represented like this:

[0]     - '1'
[1,0]   - '2'
[1]-[2] - '34'

Removals

Obviously, having text that can only be added is not super useful. The solution is another type of event: The removal event. Removal events have an array of removals. This is because one region of text that is continuous on the user's screen may have positions are not continuous. In order to reduce the number of events sent, multiple different removals can be packed into one event. Removals function pretty much like you'd expect them to: An array of removals has elements with both a start position and length. Like additions, the end position of a removal is determined by adding the length to the lowest level of the start position.

For example, let's assume that our document looks like 153 and the positions are sequential from [0] to [2]. Let's say that we want to remove 5 with the position [1]. A removal would be created with the first position at [1] and a length of 1, of course. The text 2 could be added at position [1], effectively replacing the 5.

However, this creates a problem. Let's say the events are reversed in order. The first document would read 123, but then the removal and addition would be performed second leading to a document reading 153. This is the exact opposite of what we want! The solution is to establish an order of events. This is where the vector clock comes in. Though I didn't mention it before, each event (including insertions) has a variable called rclk, which is the vector clock. The vector clock is incremented with each removal.

To re-explain the previous example in context of the vector clock, let's assume that our document looks like 153 and the positions are sequential from [0] to [2] all with the vector clock of 0. Now we remove 5 at the position [1] with a length of 1. The removal event would be sent with a vector clock of 0, but the vector clock would be incremented. The future 2 that would be inserted would be sent with the vector clock of 1. This way, if the 5 and the removal event are received second, they are rejected.

Now, hopefully, I clarified a few things. If I explained it poorly and it's confusing, feel free to ask any questions on #matrix-collaboration:kb1rd.net and I'd be happy to clarity it. (And hopefully fix this page!)

Contributing

Clone this wiki locally
You can’t perform that action at this time.