There are several features that need to know when the local Storage instance received a certain document.
Implementation
When ingesting a document, give it an extra field, _receivedTimestamp, set to this peer's local time at the moment of ingestion.
That field is sort of a private local field; don't send it to other nodes when syncing. That's why it has an underscore.
Allow queries to specify sorting by _receivedTimestamp and continuing from after a given timestamp. This might mean adding an extra index to the storage backends.
This timestamp should be enforced to be monotonic, i.e. despite changes to the computer's clock, this number should only ever increase.
It could also just be an incrementing sequence number but there are a few benefits to using timestamps, such as the ability to merge event streams across different workspaces.
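A minimal sketch of the ingestion side, assuming integer-microsecond timestamps. `MonotonicClock`, `ClockStore`, and `stampReceived` are hypothetical names for illustration, not Earthstar APIs; the `ClockStore` stands in for wherever the last value is persisted (for example the workspace config storage). The clock returns the wall-clock time, or one more than the last value it issued if the wall clock went backwards or stood still.

```typescript
// Sketch only: MonotonicClock, ClockStore and stampReceived are hypothetical,
// not Earthstar APIs. Timestamps are assumed to be integer microseconds.

interface ClockStore {
  loadLast(): number | undefined; // e.g. read from workspace config storage
  saveLast(value: number): void; // persist the new high-water mark
}

class MonotonicClock {
  private last: number;
  private store: ClockStore;
  private now: () => number;

  constructor(store: ClockStore, now: () => number = () => Date.now() * 1000) {
    this.store = store;
    this.now = now;
    // Resume from the persisted value so a restart never goes backwards.
    this.last = store.loadLast() ?? 0;
  }

  // Wall-clock time, or last + 1 if the clock went backwards or stood still.
  next(): number {
    const t = Math.max(this.now(), this.last + 1);
    this.last = t;
    this.store.saveLast(t);
    return t;
  }
}

// Attach the local-only field when a document is ingested.
function stampReceived<D extends object>(
  doc: D,
  clock: MonotonicClock,
): D & { _receivedTimestamp: number } {
  return { ...doc, _receivedTimestamp: clock.next() };
}
```

Persisting on every call keeps the sketch simple; a real backend might write the high-water mark less often, as long as it never hands out a value at or below one it already issued.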
Implications
The Document type sometimes needs to hold this local-only field, _receivedTimestamp. Maybe the type needs to split into two, DocumentWithLocalData and DocumentForWire...
Storage backends need to support the extra field, and sorting by it.
The field should never be sent across the wire to another peer.
Document validation should never encounter this field, or should ignore it.
Monotonic time tracking needs a little bit of storage to record the last value; it could go in the workspace config storage.
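The split between `DocumentWithLocalData` and `DocumentForWire` could look something like this hedged sketch. The field lists are illustrative only (a real Earthstar document has more fields, such as the signature); only `_receivedTimestamp` comes from this issue. `toWire` and `isLocalField` are hypothetical helpers that drop every underscore-prefixed field, so the same function can run before sending a doc to a peer and before handing it to validation.

```typescript
// Sketch only: field lists are illustrative, and toWire / isLocalField are
// hypothetical helpers, not Earthstar APIs.

interface DocumentForWire {
  path: string;
  content: string;
  author: string;
  timestamp: number; // authored time, set by the author
}

interface DocumentWithLocalData extends DocumentForWire {
  _receivedTimestamp: number; // local-only: when THIS peer ingested the doc
}

// Underscore-prefixed fields are local-only by convention.
function isLocalField(key: string): boolean {
  return key.startsWith("_");
}

// Return a copy without local-only fields; the input object is not modified.
function toWire(doc: DocumentForWire): DocumentForWire {
  const entries = Object.entries(doc).filter(([key]) => !isLocalField(key));
  return Object.fromEntries(entries) as unknown as DocumentForWire;
}
```

Filtering on the underscore convention rather than on the one named field means any future local-only field is also kept off the wire without touching the sync code.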
Features this unlocks
Reliable live streaming: Live syncing relies on a livestream of changes to a Storage. If that stream is interrupted, we want to pick up where it left off. The reliable way to do that is to sort by _receivedTimestamp and resume the stream with anything after a given _receivedTimestamp.
Reliable indexing of workspace data: If you want to build an index of a Storage, it's much the same problem: you need a feed of changes for updating your index. You can't just sort by the regular document timestamp, because sometimes you receive documents long after they were authored, so their authored timestamps don't match the order in which you received them. In this case you want a tuple: (generation, _receivedTimestamp). Generation is an integer that increments whenever the entire storage is forgotten and reset, when some documents are locally forgotten (other than ephemeral documents), or when the storage is recreated from scratch. If the generation changes, the index has to start over and re-index everything. Generation could also be a plain timestamp recorded at each of those events.
Possibly more efficient syncing: I haven't worked this out yet, but it might help two peers figure out what data to trade with each other.
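A minimal sketch of an indexer that keeps only a (generation, _receivedTimestamp) cursor and resumes from it. `ReceivedFeed`, `docsReceivedAfter`, `ResumableIndexer`, and `MemoryFeed` are hypothetical names, not the Earthstar IStorage API, and `_receivedTimestamp` values are assumed to be positive so that a cursor of 0 means "from the beginning".

```typescript
// Sketch only: ReceivedFeed, ResumableIndexer and MemoryFeed are hypothetical,
// not the Earthstar IStorage API. Assumes _receivedTimestamp values are > 0.

interface LocalDoc {
  path: string;
  content: string;
  _receivedTimestamp: number;
}

interface ReceivedFeed {
  generation(): number;
  // Docs with _receivedTimestamp strictly greater than `after`, ascending.
  docsReceivedAfter(after: number): LocalDoc[];
}

interface Cursor {
  generation: number;
  receivedTimestamp: number;
}

class ResumableIndexer {
  cursor: Cursor | null = null; // the only state that must survive restarts
  index = new Map<string, string>(); // path -> content

  // Apply everything received since the cursor; returns how many docs were applied.
  catchUp(feed: ReceivedFeed): number {
    const gen = feed.generation();
    let after = 0;
    if (this.cursor !== null && this.cursor.generation === gen) {
      after = this.cursor.receivedTimestamp;
    } else {
      // First run, or the storage was reset / docs were forgotten: start over.
      this.index.clear();
    }
    const batch = feed.docsReceivedAfter(after);
    for (const doc of batch) {
      this.index.set(doc.path, doc.content);
      after = doc._receivedTimestamp;
    }
    this.cursor = { generation: gen, receivedTimestamp: after };
    return batch.length;
  }
}

// In-memory stand-in for a Storage, for trying the indexer out.
class MemoryFeed implements ReceivedFeed {
  gen = 1;
  docs: LocalDoc[] = [];
  generation(): number {
    return this.gen;
  }
  docsReceivedAfter(after: number): LocalDoc[] {
    return this.docs
      .filter((d) => d._receivedTimestamp > after)
      .sort((a, b) => a._receivedTimestamp - b._receivedTimestamp);
  }
}
```

Re-applying a doc is harmless here (`Map.set` overwrites), so if the indexer crashes mid-batch and the cursor was not saved, replaying from the old cursor gives the same result. The same cursor works for a peer resuming a livestream.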
cinnamon-bun changed the title to "Track when docs were received by a peer: _receivedTimestamp" (Feb 14, 2021)
cinnamon-bun changed the title from "Track when docs were received by a peer: _receivedTimestamp" to "Track when docs were received by a peer: _receivedTimestamp or _localSeq" (Mar 18, 2021)
A big diagram explaining this situation from the perspective of an App or Layer that wants to index an Earthstar IStorage.
This is the "Reliable Indexing" use case. The "Reliable live streaming" situation is very similar, just replace the orange box with another Peer instead of an App. In both cases the other party wants to track how much of a Storage it has processed using a minimal amount of state, like just a single index integer, so it can resume indexing later when it has been away and missed some events.
If an app or peer is always there and receives all the events, none of this is needed; the events alone tell it what it needs to know.
The downside of adding _localSeq metadata to documents is that now we have yet another way we need to query and sort them. IStorages will need to have an index for this purpose...
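For the sequence-number variant, the extra index an IStorage needs could be as simple as an array kept sorted by _localSeq and queried with binary search. This in-memory sketch assumes each stored doc is keyed by path and that replacing a doc assigns it a fresh, larger _localSeq; `LocalSeqStore`, `upsert`, and `docsAfter` are hypothetical names, not Earthstar APIs.

```typescript
// Sketch only: LocalSeqStore and its methods are hypothetical, not the
// Earthstar IStorage API. One doc per path; a replacement gets a new seq.

interface SeqDoc {
  path: string;
  content: string;
  _localSeq: number;
}

class LocalSeqStore {
  private nextSeq = 1;
  private byPath = new Map<string, SeqDoc>();
  // Sorted ascending by _localSeq; appends stay sorted because seq only grows.
  private bySeq: SeqDoc[] = [];

  upsert(path: string, content: string): SeqDoc {
    const old = this.byPath.get(path);
    if (old !== undefined) {
      // Drop the superseded doc so it is not emitted again by docsAfter.
      this.bySeq.splice(this.lowerBound(old._localSeq), 1);
    }
    const doc: SeqDoc = { path, content, _localSeq: this.nextSeq++ };
    this.byPath.set(path, doc);
    this.bySeq.push(doc);
    return doc;
  }

  // Docs with _localSeq > after, ascending, at most `limit` of them.
  docsAfter(after: number, limit = Infinity): SeqDoc[] {
    const start = this.lowerBound(after + 1);
    return this.bySeq.slice(start, start + limit);
  }

  // First index i with bySeq[i]._localSeq >= seq (binary search).
  private lowerBound(seq: number): number {
    let lo = 0;
    let hi = this.bySeq.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.bySeq[mid]._localSeq < seq) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }
}
```

A consumer then only needs to remember the last _localSeq it processed and call `docsAfter` with it. The `splice` on replacement is O(n); a persistent backend would more likely rely on a database index on the sequence column.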
cinnamon-bun changed the title from "Track when docs were received by a peer: _receivedTimestamp or _localSeq" to "Track when docs were received by a peer: _receivedTimestamp or _localSeq - to enable reliable indexing and reliable livestreaming" (Mar 18, 2021)