Data model for storing revision history in FoundationDB #1957

kocolosk · 2019-02-28T21:25:05Z

Introduction

This is a proposal for the storage of document revision history metadata as a set of KVs in FoundationDB.

Abstract

This design stores each edit branch as its own KV, and all of the edit branches are stored separately from the actual document data. Document reads can avoid retrieving this information, while writes can avoid retrieving the document body.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Terminology

Detailed Description

The size limits in FoundationDB preclude storing the entire revision tree as a single value; in pathological situations the tree could exceed 100KB. Rather, we propose to store each edit branch as a separate KV. Specifically, we create a "revisions" subdirectory in each database directory to store the revision trees with keys and values that look like

(“revisions”, DocID, NotDeleted, RevFormat, RevPosition, RevHash) = (Versionstamp, [ParentRev, GrandparentRev, …])

where the individual elements of the key and value are defined as follows:

DocID: the document ID
NotDeleted: \x00 if the leaf of the edit branch is deleted, \x01 otherwise
RevFormat: enum for the revision encoding being used, start at \x01 with this proposal
RevPosition: positive integer encoded using standard tuple layer encoding (signed, variable-length, order-preserving)
RevHash: 16 bytes uniquely identifying this revision
Versionstamp: the FoundationDB versionstamp associated with the last transaction that modified the document (NB: not necessarily the last edit to this branch).
[ParentRev, GrandparentRev, ...]: 16 byte identifiers of ancestors, up to 1000 by default
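
To make the layout concrete, here is a minimal in-memory sketch of how one edit turns a branch KV into its successor. The names `BranchKey`, `BranchValue`, and `extend_branch` are hypothetical and not part of the proposal. The sketch assumes the old leaf becomes the first ancestor and that the ancestor list is simply truncated to the limit. Versionstamp handling is left out of scope, so the stamp is carried over unchanged.

```python
from typing import List, NamedTuple, Optional, Tuple

REVS_LIMIT = 1000  # default ancestor cap mentioned in the proposal


class BranchKey(NamedTuple):
    doc_id: str
    not_deleted: int   # 0 if the leaf of the branch is deleted, 1 otherwise
    rev_format: int    # 1 for the encoding described in this proposal
    rev_position: int
    rev_hash: bytes    # 16 bytes


class BranchValue(NamedTuple):
    versionstamp: Optional[bytes]  # None stands in for a null stamp
    ancestors: List[bytes]         # [parent, grandparent, ...]


def extend_branch(
    key: BranchKey,
    value: BranchValue,
    new_hash: bytes,
    deleted: bool = False,
    revs_limit: int = REVS_LIMIT,
) -> Tuple[BranchKey, BranchValue]:
    """Return the KV that would replace (key, value) after one more edit."""
    new_key = BranchKey(
        key.doc_id,
        0 if deleted else 1,
        key.rev_format,
        key.rev_position + 1,
        new_hash,
    )
    # Assumption: the previous leaf is pushed onto the front of the
    # ancestor list, and the oldest entries fall off the end.
    ancestors = ([key.rev_hash] + value.ancestors)[:revs_limit]
    return new_key, BranchValue(value.versionstamp, ancestors)
```

Because the leaf revision lives in the key, an edit replaces the old KV with a new one rather than updating a value in place.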

Limits

To stay within the FoundationDB value size limit, we need to prevent administrators from increasing _revs_limit beyond what fits into a single value. We suggest 4000 as the maximum.
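
A rough back-of-the-envelope check of that ceiling can be sketched as follows. It assumes FoundationDB's documented 100,000-byte value limit and a 12-byte tuple-layer versionstamp, and it ignores the per-element overhead of the tuple encoding, so the real sizes would be somewhat larger. The function name is hypothetical.

```python
VALUE_LIMIT_BYTES = 100_000  # FoundationDB's documented maximum value size
REV_ID_BYTES = 16            # size of each ancestor identifier
VERSIONSTAMP_BYTES = 12      # assumption: 10-byte stamp + 2-byte user version


def approx_value_size(revs_limit: int) -> int:
    """Approximate value size in bytes, ignoring tuple-encoding overhead."""
    return VERSIONSTAMP_BYTES + revs_limit * REV_ID_BYTES


for limit in (1000, 4000):
    size = approx_value_size(limit)
    print(f"_revs_limit={limit}: ~{size} bytes, fits={size <= VALUE_LIMIT_BYTES}")
```

Under these assumptions a limit of 4000 leaves roughly a third of the value budget as headroom for encoding overhead.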

Update Path

Multiple edit branches on a document are largely independent of one another in this design, but some coordination is required around the Versionstamp. Recall that the _changes feed includes each document exactly once, so we do not want to be able to extend different edit branches in parallel and end up adding both stamps to the feed. We address this by storing the Versionstamp only on the so-called "winning" branch. Other branches set this to null.

If a writer comes in and tries to extend a losing edit branch, it will find the Versionstamp to be null and will do an additional edit branch read to retrieve the winning branch. It can then compare both branches to see which one will be the winner following that edit, and can assign the new Versionstamp to that branch accordingly.

A writer attempting to delete the winning branch (i.e., setting NotDeleted to 0) will need to read two contiguous KVs, the one for the winner and the one right before it. If the branch before it will be the winner following the deletion then we move the storage of the new Versionstamp to it accordingly. If the tombstoned branch remains the winner for this document then we only update that branch.

A writer extending the winning branch with an updated document (the common case) will proceed reading just the one branch.

Summarizing the performance profile:

Extending a losing branch: 2 KVs, 2 roundtrips
Deleting the winning branch: 2 KVs, 1 roundtrip
Extending the winning branch: 1 KV, 1 roundtrip
new_edits=false update: <N> KVs, 1 roundtrip
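
The three interactive cases above can be sketched with an in-memory dictionary standing in for one document's slice of the "revisions" subspace. This is an illustration under stated assumptions, not the proposed implementation: `apply_edit` is a hypothetical name, versionstamps are plain strings, Python tuple ordering stands in for the tuple layer, and only KV reads are counted (roundtrips are not modeled).

```python
from typing import Dict, Optional, Tuple

# (NotDeleted, RevFormat, RevPosition, RevHash) for a single document
Key = Tuple[int, int, int, bytes]


def apply_edit(
    branches: Dict[Key, Optional[str]],
    old_key: Key,
    new_hash: bytes,
    new_stamp: str,
    deleted: bool = False,
) -> int:
    """Extend the branch at old_key in place; return the number of KVs read."""
    kvs_read = 1
    had_stamp = branches[old_key] is not None
    new_key: Key = (0 if deleted else 1, old_key[1], old_key[2] + 1, new_hash)

    # Stand-in for a range read over the document's other branch KVs.
    others = sorted(k for k in branches if k != old_key)
    rival: Optional[Key] = None
    if not had_stamp:
        # Losing branch: the stamp is null, so fetch the winner (last KV).
        rival = others[-1]
        kvs_read += 1
    elif deleted and others:
        # Deleting the winner: also read the KV right before it.
        rival = others[-1]
        kvs_read += 1

    del branches[old_key]
    if rival is not None and rival > new_key:
        # The other branch wins after this edit and carries the new stamp.
        branches[new_key] = None
        branches[rival] = new_stamp
    else:
        branches[new_key] = new_stamp
        if rival is not None:
            branches[rival] = None
    return kvs_read
```

In each case exactly one branch holds a non-null stamp afterwards, and it is the branch that sorts last.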

Advantages

We can read a document revision without retrieving the revision tree, which in the case of frequently-edited documents may be larger than the doc itself.

We ensure that an interactive document update against the winning branch only needs to read the edit branch KV against which the update is being applied, and it can read that branch immediately knowing only the content of the edit that is being attempted (i.e., it does not need to read the current version of the document itself). The less common scenario of updating a losing branch is only slightly less efficient, requiring two roundtrips.

Interactively updating a document with a large number of edit branches is therefore dramatically cheaper, as no more than two edit branches are read or modified regardless of the number of branches that exist, and no tree merge logic is required.

Including NotDeleted in the key lets us efficiently handle the case where a new document is uploaded with the same ID as one whose edit branches have all been deleted; i.e., we can construct a key selector that tells us immediately whether any deleted=false edit branches exist.

The RevFormat enum gives us the ability to evolve revision history storage over time, and to support alternative conflict resolution policies like Last Writer Wins.

Access to Versionstamp ensures we can clear the old entry in the by_seq space during an edit. The set_versionstamped_value API is used to store this value automatically.

The key structure above naturally sorts so that the "winning" revision is the last one in the list, which we leverage when deleting the winning edit branch (and thus promoting the one next in line), and when extending a conflict branch (to coordinate the update to the Versionstamp). This is also a small optimization for reads with ?revs=true or ?revs_info=true, where we want the details of the winning edit branch but don't actually know the RevPosition and RevHash of that branch.
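
The sort-order property can be illustrated with plain Python tuples. This sketch assumes that Python's tuple comparison orders integers and byte strings the same way the tuple layer does; the helper names `winner` and `has_live_branch` are hypothetical.

```python
from typing import List, Tuple

# (NotDeleted, RevFormat, RevPosition, RevHash) for a single document
Key = Tuple[int, int, int, bytes]


def winner(keys: List[Key]) -> Key:
    """The winning branch is the one whose key sorts last."""
    return max(keys)


def has_live_branch(keys: List[Key]) -> bool:
    """True if any branch is deleted=false; only the last key is inspected."""
    return bool(keys) and winner(keys)[0] == 1


keys: List[Key] = [
    (0, 1, 7, b"\xaa" * 16),  # tombstoned branch, sorts first despite position 7
    (1, 1, 2, b"\xbb" * 16),
    (1, 1, 3, b"\x01" * 16),
    (1, 1, 3, b"\x02" * 16),  # same position, higher hash
]

print(winner(keys))           # the (1, 1, 3, b"\x02"...) branch
print(has_live_branch(keys))  # True
```

Removing the winner from the list and calling `winner` again yields the branch that would be promoted, which is why deleting the winner only needs the two contiguous KVs at the end of the range.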

Disadvantages

Historical revision identifiers shared by multiple edit branches are duplicated.

Key Changes

Administrators cannot set _revs_limit larger than 4,000 (previously unlimited?). Default stays the same at 1,000.

The intention with this data model is that an interactive edit that supplies a revision identifier of a deleted leaf will always fail with a conflict. This is a subtle departure from CouchDB 2.3 behavior, where an attempt to extend a deleted edit branch can succeed if some other deleted=false edit branch exists. This is an undocumented and seemingly unintentional behavior. If we need to match that behavior it will require reading 3 KVs in 2 roundtrips for every edit that we reject with a conflict.

Modules affected

TBD depending on exact code layout going forward, but the couch_key_tree module contains the current revision tree implementation.

HTTP API additions

None.

HTTP API deprecations

None.

Security Considerations

None have been identified.

References

Original mailing list discussion

Acknowledgements

Thanks to @iilyak, @davisp, @janl, @garrensmith and @rnewson for comments on the mailing list discussion.

kocolosk · 2019-02-28T21:25:53Z

If one were inclined to save a byte it would be easy to combine NotDeleted and RevFormat into a single byte that still sorts optimally.
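
One possible way to do this, sketched under the assumption that the high bit carries NotDeleted and the low seven bits carry RevFormat (the comment above does not specify a bit layout, and the function names are hypothetical):

```python
from typing import Tuple


def pack_flags(not_deleted: bool, rev_format: int) -> int:
    """Combine NotDeleted and RevFormat into a single byte value."""
    if not 1 <= rev_format <= 0x7F:
        raise ValueError("rev_format must fit in 7 bits and start at 1")
    # NotDeleted occupies the most significant bit so it dominates the sort.
    return (0x80 if not_deleted else 0x00) | rev_format


def unpack_flags(flags: int) -> Tuple[bool, int]:
    """Recover (not_deleted, rev_format) from the combined byte."""
    return bool(flags & 0x80), flags & 0x7F
```

With this layout the combined byte sorts exactly like the (NotDeleted, RevFormat) pair, at the cost of capping RevFormat at 127 values.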

rnewson · 2019-02-28T22:31:45Z

a) I love it. b) do we really need to let revs limit go above 1000? In fact, should we allow this to be edited at all?

I note that we didn't finish the RFC thread but the last comment was to add a "security considerations" section, so if you would add that it's appreciated. I suspect it will be short.

kocolosk · 2019-02-28T22:42:02Z

Thanks! I added the Security Considerations section, and tweaked the Advantages section to replace a redundant description of the write path behavior with a description of the read path.

wohali · 2019-03-01T04:16:46Z

Sorry about that - the RFC PR is going to be merged shortly: #1914

kocolosk · 2019-03-01T20:33:56Z

@davisp pointed out something quite important in a discussion on IRC that I want to capture here. The _changes feed has the property that each document shows up exactly once regardless of the number of edit branches. The current design in the RFC isn’t quite compatible with that requirement, because we’re saying each edit branch can be extended independently without paying attention to the others.

We talked about a way to address this, which I’ll record here first. We use the fact that a new edit branch can only be created by a new_edits=false write, where we anticipate retrieving the entire set of branch KVs. The writer in that case should store VersionstampForCurrentRev only in the “winning” edit branch and set the other ones to null.

If a writer comes in and tries to extend a losing edit branch, it will find the VersionstampForCurrentRev to be null and will do an additional edit branch read to retrieve the winning branch. It can then compare both branches to see which one will be the winner following that edit, and can assign VersionstampForCurrentRev accordingly.

A writer attempting to delete the winning branch (i.e., setting NotDeleted to 0) will need to read two contiguous KVs, the one for the winner and the one right before it. If the branch before it will be the winner following the deletion then we move the VersionstampForCurrentRev to it accordingly. If the tombstoned branch remains the winner for this document then we only update that branch.

A writer extending the winning branch with an updated document (the common case) will proceed as before with no loss in efficiency.

I’ve tried to think through all possible concurrency issues here but I think that FoundationDB’s transaction isolation model delivers the goods in every case.

As far as the data model is concerned, the only change is that the Versionstamp is stored exclusively in the winning edit branch KV at all times, with all other branches having a null byte there instead. Also we should rename the field as it’s really VersionstampOfLatestEdit, regardless of whether that corresponds to the revision in the same KV.

kocolosk · 2019-03-05T19:02:27Z

I updated the text of the proposal to incorporate the details from my last comment.

Separately, @wohali pointed me to an old bug report in #1418 which has some bearing here. It seems that we are currently rather inconsistent in how we handle attempts to extend a tombstoned edit branch. If every branch in the document is tombstoned, we reject edit attempts that specify an explicit _rev with a conflict. But if at least one branch is deleted=false, we will allow other tombstoned branches to be extended. This is inconsistent and seemingly unintentional behavior.

The proposed data model is most efficient if we block all explicit updates to tombstoned edit branches (i.e., updates that supply the _rev of the tombstone as a base). If we do need to support the current CouchDB behavior, it will require attempting to read 3 KVs for every edit that gets rejected with a conflict: the deleted=false KV for that revision, the deleted=true KV for that revision, and the last KV for the document (to see if all branches are deleted=true). We would likely do this in 2 roundtrips to avoid paying the extra cost on every edit. It's doable, but I think we may want to take this opportunity to make our API more consistent. I've updated the proposal to point us in that direction.

Digging even deeper, folks should be aware that when users create a new document (with no _rev) over top of a document where all branches are tombstoned, current CouchDB will internally extend the "winning" tombstoned branch with the new edit. I believe this was intended solely as a performance optimization, to avoid creating a too-wide revision tree, but as #1418 makes clear this optimization has real user-visible implications in the API. The data model in this RFC naturally supports the same optimization. We can have a detailed discussion of what we think the right behavior ought to be going forward, but I don't think it should hold up the decision on the data model.

kocolosk · 2019-03-05T21:41:58Z

There was some discussion of whether it makes more sense to model RFCs as PRs against the documentation repo. There are merits to both options. I filed the PR version of this issue at apache/couchdb-documentation#397

rnewson · 2019-03-05T21:52:58Z

General agreement on couchdb-dev (IRC) to use the PR approach, as it allows clarifying commits, reviews and a trail of approval.

kocolosk added discussion rfc labels Feb 28, 2019

rnewson closed this as completed Mar 5, 2019

kocolosk mentioned this issue Mar 5, 2019

BUG: Recreating deleted document can break replication when VDU function is active #1418

Open
