Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy up src/seq fields and use a sparse index? #94

Open
alecgibson opened this issue Nov 5, 2020 · 2 comments · May be fixed by #157
Open

Tidy up src/seq fields and use a sparse index? #94

alecgibson opened this issue Nov 5, 2020 · 2 comments · May be fixed by #157
Milestone

Comments

@alecgibson
Copy link
Contributor

The o_XXX collection for holding ops has an index on {src: 1, seq: 1, v: 1}. Since this index points to every single op, it's pretty big. However, from what I can tell, its only use is to differentiate between slightly different errors when getting a document creation error, and it won't be needed for most of the time under "happy" conditions.

It seems pretty mad to have an entire index just for this corner case, but there we are. I was wondering if we could at least contain the index size by doing something like:

  • Making the index sparse
  • Adding a mechanism for clients to "tidy up", like a disconnection hook or similar. Given that src is a transient value meant to be unique per client, once a client disconnects we could potentially null all of the src fields on the ops they've touched in their session (could be expensive?), which would then release them from the index (I think? Not 100% sure this is how Mongo handles sparse indexes). Should note that clients now have a stable src between reconnects, so handling a "true" disconnect may be tricky?

I don't know if there's anything else we can do about this? Like change how ShareDB acts upstream?

@ericyhwang
Copy link
Contributor

ericyhwang commented Nov 11, 2020

Things that we discussed:

  • Adding a partial index on {d: 1, create: 1, v: 1} with {create: {$exists: true}} and using that to find all create ops, which in most cases is one per doc. Could be O(n) in the common case though
  • Or {d:1, src: 1, create: 1} with a sparse index?
  • Removing getCommittedOpVersion and instead scanning with d and v to find the create op. May need to scan the entire op history since create ops usually don't contain a v when submitted, since creates are usually done "blind" without fetching the doc by id first

We're going to punt on this for now, to let us stew it over a bit more

@alecgibson
Copy link
Contributor Author

Marking this is a goal for v2, because we'd need a breaking change in the sharedb API.

alecgibson added a commit to share/sharedb that referenced this issue May 17, 2024
The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between:

 - a create op that conflicts with another create op
 - a create op that has been re-submitted because the connection was
   closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function
**at all**, and implements an alternative approach to differentiate
between the above cases.

Instead of having to fetch all ops, we just fetch a *single* op, by
version number, which [is already indexed][4] anyway. We then check to
see if this retrieved op is our creation op.

You can't do this in the general case, because an op's version number
may be changed between submit and commit because of conflicts with
remote ops, which cause it to be transformed up, and its version number
incremented.

**However**, the create op is a **special case**, which
[**cannot** be transformed against **any** other op][5], which means
that the version number it's submitted with **must** be the version
number it's committed with.

We can therefore use this special property of the create op to safely
retrieve itself from the data store by version if it has been committed.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-mongo/blob/7e88b0a23e968e3672ee151bf6abaf3bfdf62484/index.js#L401
[5]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/test/ot.js#L152-L162
alecgibson added a commit that referenced this issue May 20, 2024
- Fixes #94
- Follows on from share/sharedb#657

This change is to work alongside upstream work to remove ShareDB's
usage of `getCommittedOpVersion()`.

In `sharedb-mongo` this function requires an entire extra op index,
just to handle a corner case where two `create` ops are submitted at the
same time, which should happen relatively infrequently.

This change allows consumers to opt out of individual indexes. For
example, if we're not using `getCommittedOpVersion()`, consumers can
opt out of the automatic `src`/`seq`/`v` index creation with:

```js
new ShareDbMongo(
  mongoUrl,
  {
    disableIndexCreation: {
      src_seq_v: true,
    },
  },
);
```

Note that if this index already exists, consumers will need to manually
remove it.

Previous behaviour will still work, so setting:

```js
new ShareDbMongo(
  mongoUrl,
  {
    disableIndexCreation: true,
  },
);
```

disables **all** index creation.
@alecgibson alecgibson linked a pull request May 20, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants