Data model for storing revision history in FoundationDB #1957

Closed
kocolosk opened this issue Feb 28, 2019 · 8 comments

Comments

@kocolosk
Member

kocolosk commented Feb 28, 2019

Introduction

This is a proposal for the storage of document revision history metadata as a set of KVs in FoundationDB.

Abstract

This design stores each edit branch as its own KV, and all of the edit branches are stored separately from the actual document data. Document reads can avoid retrieving this information, while writes can avoid retrieving the document body.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Terminology


Detailed Description

The size limits in FoundationDB preclude storing the entire revision tree as a single value; in pathological situations the tree could exceed 100KB. Rather, we propose to store each edit branch as a separate KV. Specifically, we create a "revisions" subdirectory in each database directory to store the revision trees with keys and values that look like

(“revisions”, DocID, NotDeleted, RevFormat, RevPosition, RevHash) = (Versionstamp, [ParentRev, GrandparentRev, …])

where the individual elements of the key and value are defined as follows:

  • DocID: the document ID
  • NotDeleted: \x00 if the leaf of the edit branch is deleted, \x01 otherwise
  • RevFormat: enum for the revision encoding being used, start at \x01 with this proposal
  • RevPosition: positive integer encoded using standard tuple layer encoding (signed, variable-length, order-preserving)
  • RevHash: 16 bytes uniquely identifying this revision
  • Versionstamp: the FoundationDB versionstamp associated with the last transaction that modified the document (NB: not necessarily the last edit to this branch).
  • [ParentRev, GrandparentRev, ...]: 16 byte identifiers of ancestors, up to 1000 by default
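To make the layout concrete, here is a minimal sketch of one edit branch KV modelled with plain Python tuples. The names BranchKey, BranchValue and make_branch are hypothetical and used only for illustration; a real implementation would pack these elements with the FoundationDB tuple layer inside the "revisions" subdirectory rather than keep them as Python objects.

```python
from typing import List, NamedTuple, Optional, Tuple


class BranchKey(NamedTuple):
    """Hypothetical in-memory stand-in for the key of one edit branch."""
    subspace: str      # always "revisions"
    doc_id: str
    not_deleted: int   # 0 if the leaf of the branch is deleted, 1 otherwise
    rev_format: int    # 1 for the encoding described in this proposal
    rev_position: int
    rev_hash: bytes    # 16 bytes


class BranchValue(NamedTuple):
    """Hypothetical in-memory stand-in for the value of one edit branch."""
    versionstamp: Optional[bytes]  # None models a branch without a stamp
    ancestors: List[bytes]         # [ParentRev, GrandparentRev, ...]


def make_branch(
    doc_id: str,
    rev_position: int,
    rev_hash: bytes,
    ancestors: List[bytes],
    deleted: bool = False,
    versionstamp: Optional[bytes] = None,
) -> Tuple[BranchKey, BranchValue]:
    # Every revision identifier in this model is exactly 16 bytes.
    if len(rev_hash) != 16 or any(len(a) != 16 for a in ancestors):
        raise ValueError("revision identifiers must be 16 bytes")
    key = BranchKey("revisions", doc_id, 0 if deleted else 1, 1,
                    rev_position, rev_hash)
    return key, BranchValue(versionstamp, list(ancestors))


key, value = make_branch("doc1", 3, b"\x03" * 16,
                         [b"\x02" * 16, b"\x01" * 16])
```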

Limits

In order to stay compatible with FoundationDB size limits we need to prevent administrators from increasing _revs_limit beyond what we can fit into a single value. We suggest 4000 as the maximum.
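A rough back-of-the-envelope check of that ceiling, as a sketch: it counts only the raw 16-byte ancestor identifiers plus an assumed 12-byte versionstamp and ignores any encoding overhead, so the real headroom is somewhat smaller. The 100,000-byte figure is FoundationDB's documented maximum value size; the constant and function names are hypothetical.

```python
REV_ID_BYTES = 16                 # size of each ancestor identifier
FDB_VALUE_LIMIT_BYTES = 100_000   # FoundationDB maximum value size
VERSIONSTAMP_BYTES = 12           # assumption: 10-byte stamp + 2-byte user version


def raw_value_bytes(revs_limit: int) -> int:
    """Raw payload size for a branch holding revs_limit ancestors.

    Encoding overhead is deliberately not modelled here.
    """
    return VERSIONSTAMP_BYTES + revs_limit * REV_ID_BYTES


def fits_in_one_value(revs_limit: int) -> bool:
    return raw_value_bytes(revs_limit) <= FDB_VALUE_LIMIT_BYTES


default_size = raw_value_bytes(1000)   # 16,012 bytes
max_size = raw_value_bytes(4000)       # 64,012 bytes
```

Under these assumptions a limit of 4000 leaves roughly a third of the value size unused, which gives some room for encoding overhead.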

Update Path

Multiple edit branches on a document are largely independent of one another in this design, but some coordination is required around the Versionstamp. Recall that the _changes feed includes each document exactly once, so we do not want to be able to extend different edit branches in parallel and end up adding both stamps to the feed. We address this by storing the Versionstamp only on the so-called "winning" branch. Other branches set this to null.

If a writer comes in and tries to extend a losing edit branch, it will find the Versionstamp to be null and will do an additional edit branch read to retrieve the winning branch. It can then compare both branches to see which one will be the winner following that edit, and can assign the new Versionstamp to that branch accordingly.

A writer attempting to delete the winning branch (i.e., setting NotDeleted to 0) will need to read two contiguous KVs, the one for the winner and the one right before it. If the branch before it will be the winner following the deletion then we move the storage of the new Versionstamp to it accordingly. If the tombstoned branch remains the winner for this document then we only update that branch.

A writer extending the winning branch with an updated document (the common case) will proceed reading just the one branch.

Summarizing the performance profile:

  • Extending a losing branch: 2 KVs, 2 roundtrips
  • Deleting the winning branch: 2 KVs, 1 roundtrip
  • Extending the winning branch: 1 KV, 1 roundtrip
  • new_edits=false update: <N> KVs, 1 roundtrip
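The three interactive cases can be sketched against an in-memory dict for a single document. This is only an illustration of the Versionstamp bookkeeping under simplifying assumptions: apply_edit is a hypothetical helper, versionstamps are modelled as integers, ancestors are omitted, roundtrips are not modelled, and the rejection of edits against deleted leaves is not shown. The calls to sorted and max stand in for the range reads a real transaction would issue.

```python
from typing import Dict, Optional, Tuple

# (NotDeleted, RevFormat, RevPosition, RevHash) for one document
Key = Tuple[int, int, int, bytes]
# key -> Versionstamp, or None on every branch except the winner
Branches = Dict[Key, Optional[int]]


def apply_edit(branches: Branches, old_key: Key, new_key: Key,
               new_stamp: int) -> int:
    """Replace old_key with new_key and return how many branch KVs were read."""
    was_winner = branches[old_key] is not None
    rival: Optional[Key] = None
    if not was_winner:
        # Losing branch: one extra read to fetch the current winner.
        rival = max(branches)
    elif new_key[0] == 0:
        # Tombstoning the winner: also read the KV right before it.
        others = sorted(k for k in branches if k != old_key)
        rival = others[-1] if others else None
    kvs_read = 1 if rival is None else 2

    del branches[old_key]
    branches[new_key] = None
    if rival is not None:
        branches[rival] = None
    # Only the winning branch carries the new Versionstamp.
    winner = new_key if rival is None or new_key > rival else rival
    branches[winner] = new_stamp
    return kvs_read


winner = (1, 1, 3, b"c" * 16)
loser = (1, 1, 2, b"b" * 16)
branches: Branches = {winner: 100, loser: None}

winner2 = (1, 1, 4, b"d" * 16)
reads_extend_winner = apply_edit(branches, winner, winner2, 101)

loser2 = (1, 1, 3, b"e" * 16)
reads_extend_loser = apply_edit(branches, loser, loser2, 102)

tombstone = (0, 1, 5, b"f" * 16)
reads_delete_winner = apply_edit(branches, winner2, tombstone, 103)
```

After the last call the former loser holds the Versionstamp, because the tombstoned branch now sorts before it.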

Advantages

We can read a document revision without retrieving the revision tree, which in the case of frequently-edited documents may be larger than the doc itself.

We ensure that an interactive document update against the winning branch only needs to read the edit branch KV against which the update is being applied, and it can read that branch immediately knowing only the content of the edit that is being attempted (i.e., it does not need to read the current version of the document itself). The less common scenario of updating a losing branch is only slightly less efficient, requiring two roundtrips.

Interactively updating a document with a large number of edit branches is therefore dramatically cheaper, as no more than two edit branches are read or modified regardless of the number of branches that exist, and no tree merge logic is required.

Including NotDeleted in the key ensures that we can efficiently accept the case where we upload a new document with the same ID where all previous edit branches have been deleted; i.e. we can construct a key selector which automatically tells us there are no deleted=false edit branches.
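As an illustration of that check, the sketch below emulates a "last key less than" lookup over a sorted list of keys using bisect. The function has_live_branch and the shortened key shape (DocID, NotDeleted, RevFormat, RevPosition, RevHash) are assumptions made for the example; in FoundationDB the equivalent would be a key selector resolved by the database.

```python
import bisect
from typing import List, Tuple

Key = Tuple  # (DocID, NotDeleted, RevFormat, RevPosition, RevHash)


def has_live_branch(sorted_keys: List[Key], doc_id: str) -> bool:
    """True if doc_id has at least one branch with NotDeleted == 1."""
    # (doc_id, 2) sorts after every key of doc_id, since NotDeleted is 0 or 1,
    # so the key just before that position is the last key of the document.
    idx = bisect.bisect_left(sorted_keys, (doc_id, 2))
    if idx == 0:
        return False
    last = sorted_keys[idx - 1]
    # Live branches sort last, so checking the last key is sufficient.
    return last[0] == doc_id and last[1] == 1


keys = sorted([
    ("a", 0, 1, 2, b"x" * 16),   # doc "a": only a tombstoned branch
    ("b", 1, 1, 1, b"y" * 16),   # doc "b": one live branch ...
    ("b", 0, 1, 3, b"z" * 16),   # ... and one tombstoned branch
])
```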

The RevFormat enum gives us the ability to evolve revision history storage over time, and to support alternative conflict resolution policies like Last Writer Wins.

Access to Versionstamp ensures we can clear the old entry in the by_seq space during an edit. The set_versionstamped_value API is used to store this value automatically.

The key structure above naturally sorts so that the "winning" revision is the last one in the list, which we leverage when deleting the winning edit branch (and thus promoting the one next in line) and when extending a conflict branch (to coordinate the update to the Versionstamp). This is also a small optimization for reads with ?revs=true or ?revs_info=true, where we want the details of the winning edit branch but don't actually know the RevPosition and RevHash of that branch.
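A small sketch of that ordering, using Python tuple comparison as a stand-in for the order-preserving tuple layer encoding. The sample keys and the winning_branch helper are hypothetical; keys are shown per document as (NotDeleted, RevFormat, RevPosition, RevHash).

```python
from typing import List, Tuple

Key = Tuple[int, int, int, bytes]


def winning_branch(keys: List[Key]) -> Key:
    """The winner is simply the last key in sort order."""
    return max(keys)


live_short = (1, 1, 2, b"\xaa" * 16)
live_long = (1, 1, 5, b"\x01" * 16)
live_long_tie = (1, 1, 5, b"\x02" * 16)
dead_longer = (0, 1, 9, b"\xff" * 16)

# A live branch beats a tombstoned one even at a lower RevPosition.
w1 = winning_branch([live_short, dead_longer])
# Among live branches the higher RevPosition wins.
w2 = winning_branch([live_short, live_long])
# Equal positions are ordered by RevHash.
w3 = winning_branch([live_long, live_long_tie])
```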

Disadvantages

Historical revision identifiers shared by multiple edit branches are duplicated.

Key Changes

Administrators cannot set _revs_limit larger than 4,000 (previously unlimited?). Default stays the same at 1,000.

The intention with this data model is that an interactive edit that supplies a revision identifier of a deleted leaf will always fail with a conflict. This is a subtle departure from CouchDB 2.3 behavior, where an attempt to extend a deleted edit branch can succeed if some other deleted=false edit branch exists. This is an undocumented and seemingly unintentional behavior. If we need to match that behavior it will require reading 3 KVs in 2 roundtrips for every edit that we reject with a conflict.

Modules affected

TBD depending on exact code layout going forward, but the couch_key_tree module contains the current revision tree implementation.

HTTP API additions

None.

HTTP API deprecations

None.

Security Considerations

None have been identified.

References

Original mailing list discussion

Acknowledgements

Thanks to @iilyak, @davisp, @janl, @garrensmith and @rnewson for comments on the mailing list discussion.

@kocolosk
Member Author

If one were inclined to save a byte it would be easy to combine NotDeleted and RevFormat into a single byte that still sorts optimally.
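One possible way to do that, as a sketch only: put NotDeleted in the high bit and RevFormat in the low seven bits, so the combined byte still sorts by NotDeleted first and RevFormat second. The helper names and the 127-format ceiling are assumptions of this example, not part of the proposal.

```python
from typing import Tuple


def pack_flags(not_deleted: bool, rev_format: int) -> bytes:
    """Combine NotDeleted (high bit) and RevFormat (low 7 bits) in one byte."""
    if not 0 <= rev_format < 128:
        raise ValueError("rev_format must fit in 7 bits")
    return bytes([(0x80 if not_deleted else 0x00) | rev_format])


def unpack_flags(packed: bytes) -> Tuple[bool, int]:
    value = packed[0]
    return bool(value & 0x80), value & 0x7F


deleted_newest_format = pack_flags(False, 127)
live_first_format = pack_flags(True, 1)
```

Any deleted branch still sorts before any live branch, whichever RevFormat each one uses.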

@rnewson
Member

rnewson commented Feb 28, 2019

a) I love it. b) do we really need to let revs limit go above 1000? In fact, should we allow this to be edited at all?

I note that we didn't finish the RFC thread but the last comment was to add a "security considerations" section, so if you would add that it's appreciated. I suspect it will be short.

@kocolosk
Member Author

Thanks! I added the Security Considerations section, and tweaked the Advantages section to replace a redundant description of the write path behavior with a description of the read path.

@wohali
Member

wohali commented Mar 1, 2019

Sorry about that - the RFC PR is going to be merged shortly: #1914

@kocolosk
Member Author

kocolosk commented Mar 1, 2019

@davisp pointed out something quite important in a discussion on IRC that I want to capture here. The _changes feed has the property that each document shows up exactly once regardless of the number of edit branches. The current design in the RFC isn’t quite compatible with that requirement, because we’re saying each edit branch can be extended independently without paying attention to the others.

We talked about a way to address this, which I’ll describe here. We use the fact that a new edit branch can only be created by a new_edits=false write, where we anticipate retrieving the entire set of branch KVs. The writer in that case should store VersionstampForCurrentRev only in the “winning” edit branch and set the others to null.

If a writer comes in and tries to extend a losing edit branch, it will find the VersionstampForCurrentRev to be null and will do an additional edit branch read to retrieve the winning branch. It can then compare both branches to see which one will be the winner following that edit, and can assign VersionstampForCurrentRev accordingly.

A writer attempting to delete the winning branch (i.e., setting NotDeleted to 0) will need to read two contiguous KVs, the one for the winner and the one right before it. If the branch before it will be the winner following the deletion then we move the VersionstampForCurrentRev to it accordingly. If the tombstoned branch remains the winner for this document then we only update that branch.

A writer extending the winning branch with an updated document (the common case) will proceed as before with no loss in efficiency.

I’ve tried to think through all possible concurrency issues here but I think that FoundationDB’s transaction isolation model delivers the goods in every case.

As far as the data model is concerned, the only change is that the Versionstamp is stored exclusively in the winning edit branch KV at all times, with all other branches having a null byte there instead. Also we should rename the field as it’s really VersionstampOfLatestEdit, regardless of whether that corresponds to the revision in the same KV.

@kocolosk
Member Author

kocolosk commented Mar 5, 2019

I updated the text of the proposal to incorporate the details from my last comment.

Separately, @wohali pointed me to an old bug report in #1418 which has some bearing here. It seems that we are currently rather inconsistent in how we handle attempts to extend a tombstoned edit branch. If every branch in the document is tombstoned, we reject edit attempts that specify an explicit _rev with a conflict. But if at least one branch is deleted=false, we will allow other tombstoned branches to be extended. This is inconsistent and seemingly unintentional behavior.

The proposed data model is most efficient if we block all explicit updates to tombstoned edit branches (i.e., updates that supply the _rev of the tombstone as a base). If we do need to support the current CouchDB behavior, it will require attempting to read 3 KVs for every edit that gets rejected with a conflict: the deleted=false KV for that revision, the deleted=true KV for that revision, and the last KV for the document (to see if all branches are deleted=true). We would likely do this in 2 roundtrips to avoid paying the extra cost on every edit. It's doable, but I think we may want to take this opportunity to make our API more consistent. I've updated the proposal to point us in that direction.

Digging even deeper, folks should be aware that when users create a new document (with no _rev) over top of a document where all branches are tombstoned, current CouchDB will internally extend the "winning" tombstoned branch with the new edit. I believe this was intended solely as a performance optimization, to avoid creating a too-wide revision tree, but as #1418 makes clear this optimization has real user-visible implications in the API. The data model in this RFC naturally supports the same optimization. We can have a detailed discussion of what we think the right behavior ought to be going forward, but I don't think it should hold up the decision on the data model.

@kocolosk
Member Author

kocolosk commented Mar 5, 2019

There has been some discussion of whether it makes more sense to model RFCs as PRs against the documentation repo. There are merits to both options. I filed the PR version of this issue at apache/couchdb-documentation#397

@rnewson
Member

rnewson commented Mar 5, 2019

There was general agreement on couchdb-dev (IRC) to use the PR approach, as it allows clarifying commits, reviews and a trail of approval.
