New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up src/seq fields and use a sparse index? #94
Milestone
Comments
Things that we discussed:
We're going to punt on this for now, to let us stew it over a bit more |
Open
Marking this is a goal for v2, because we'd need a breaking change in the |
alecgibson
added a commit
to share/sharedb
that referenced
this issue
May 17, 2024
The `DB.getCommitedOpVersion()` function is only used [in one place][1]: in a corner case that differentiates between: - a create op that conflicts with another create op - a create op that has been re-submitted because the connection was closed before the ack was received The first of these cases should result in an error, and the second is non-fatal and the error can be swallowed. At the moment, this differentiation is made using the special `DB.getCommittedOpVersion()` function, which - given an op with a `src` and `seq` combination - will return `null` if the op hasn't been committed, or its version number if it has. If the op has a committed version number, we know that the submit is a duplicate, and we can safely ignore it. This approach is problematic, because: - it [needs a whole op index][2] just to handle this niche corner case - the default behaviour [fetches **all** ops][3] unless the driver overrides this behaviour This change proposes that we actually don't need this function **at all**, and implements an alternative approach to differentiate between the above cases. Instead of having to fetch all ops, we just fetch a *single* op, by version number, which [is already indexed][4] anyway. We then check to see if this retrieved op is our creation op. You can't do this in the general case, because an op's version number may be changed between submit and commit because of conflicts with remote ops, which cause it to be transformed up, and its version number incremented. **However**, the create op is a **special case**, which [**cannot** be transformed against **any** other op][5], which means that the version number it's submitted with **must** be the version number it's committed with. We can therefore use this special property of the create op to safely retrieve itself from the data store by version if it has been committed. [1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112 [2]: share/sharedb-mongo#94 [3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69 [4]: https://github.com/share/sharedb-mongo/blob/7e88b0a23e968e3672ee151bf6abaf3bfdf62484/index.js#L401 [5]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/test/ot.js#L152-L162
alecgibson
added a commit
that referenced
this issue
May 20, 2024
- Fixes #94 - Follows on from share/sharedb#657 This change is to work alongside upstream work to remove ShareDB's usage of `getCommittedOpVersion()`. In `sharedb-mongo` this function requires an entire extra op index, just to handle a corner case where two `create` ops are submitted at the same time, which should happen relatively infrequently. This change allows consumers to opt out of individual indexes. For example, if we're not using `getCommittedOpVersion()`, consumers can opt out of the automatic `src`/`seq`/`v` index creation with: ```js new ShareDbMongo( mongoUrl, { disableIndexCreation: { src_seq_v: true, }, }, ); ``` Note that if this index already exists, consumers will need to manually remove it. Previous behaviour will still work, so setting: ```js new ShareDbMongo( mongoUrl, { disableIndexCreation: true, }, ); ``` disables **all** index creation.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The
o_XXX
collection for holding ops has an index on{src: 1, seq: 1, v: 1}
. Since this index points to every single op, it's pretty big. However, from what I can tell, its only use is to differentiate between slightly different errors when getting a document creation error, and it won't be needed for most of the time under "happy" conditions.It seems pretty mad to have an entire index just for this corner case, but there we are. I was wondering if we could at least contain the index size by doing something like:
src
is a transient value meant to be unique per client, once a client disconnects we could potentiallynull
all of thesrc
fields on the ops they've touched in their session (could be expensive?), which would then release them from the index (I think? Not 100% sure this is how Mongo handles sparse indexes). Should note that clients now have a stablesrc
between reconnects, so handling a "true" disconnect may be tricky?I don't know if there's anything else we can do about this? Like change how ShareDB acts upstream?
The text was updated successfully, but these errors were encountered: