
Conversation

kocolosk
Member

Overview

Describes the implementation of _all_docs and the GET /dbname (database info) endpoint on FoundationDB.

Checklist

  • Documentation is written and is accurate
  • make check passes with no errors
  • Update rebar.config.script with the commit hash once this PR is rebased and merged

term representation of the JSON. This is a somewhat awkward choice as the
internal Erlang term representation is liable to change over time (e.g. with the
introduction of Maps in newer Erlang releases, or plausibly even a JSON decoder
that directly emits the format defined in the document storage RFC).
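
As an aside (not part of the RFC text), the two internal representations being contrasted look roughly like this for the same JSON object; this is only an illustration of the jiffy-style EJSON term versus an Erlang map:

```erlang
%% Illustration only: the same JSON object {"_id": "doc1", "value": 42}
%% as a jiffy-style EJSON term (a proplist wrapped in a 1-tuple) and as a map.
Ejson = {[{<<"_id">>, <<"doc1">>}, {<<"value">>, 42}]},
Map = #{<<"_id">> => <<"doc1">>, <<"value">> => 42}.
```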
Member

This changed recently: apache/couchdb#1983

@rnewson
Member

rnewson commented Apr 18, 2019

Firstly, I feel we can deprecate or change anything in the db info blob as we see fit. We are planning a major version bump, 3.0, to signal the path forward, and a further major version bump, 4.0, which is when the foundationdb backplane appears.

The paragraphs on the _all_docs subspace are clear to me as are the rules on incrementing and decrementing doc_count and doc_del_count. Both of those values retain meaning to the user, given that tombstones will still exist.
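
For illustration, here is a minimal sketch of the kind of counter maintenance being described, assuming the erlfdb bindings; the {<<"meta">>, ...} key layout and function name are invented for this example and are not the RFC's actual schema:

```erlang
%% Sketch only: maintain doc_count / doc_del_count with FDB atomic adds so
%% concurrent transactions never conflict on the counter keys.
update_doc_counts(Tx, DbPrefix, Before, After) ->
    DocCount = erlfdb_tuple:pack({<<"meta">>, <<"doc_count">>}, DbPrefix),
    DelCount = erlfdb_tuple:pack({<<"meta">>, <<"doc_del_count">>}, DbPrefix),
    case {Before, After} of
        {missing, live}    -> erlfdb:add(Tx, DocCount, 1);
        {missing, deleted} -> erlfdb:add(Tx, DelCount, 1);
        {live, deleted}    -> erlfdb:add(Tx, DocCount, -1),
                              erlfdb:add(Tx, DelCount, 1);
        {deleted, live}    -> erlfdb:add(Tx, DocCount, 1),
                              erlfdb:add(Tx, DelCount, -1);
        _                  -> ok   % live -> live, deleted -> deleted
    end.
```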

It would be good if purge_seq were to vanish, replaced by atomic removal of all trace of a document including secondary indexes, but that does seem unlikely (or, at least, it makes third party indexing difficult to impossible). Purge is a large topic of its own and so I won't say more here.

On sizes, the new couch_ejson_size:encoded_size(Body) calculation should yield the same value as today and is worth preserving. Beyond that, given the absence of compaction, is there any need for other sizes? The only other obvious one we can calculate is the sum of the lengths of all the keys and values used to store the database (that is, including all the overhead of the exploded form and the metadata). I think that's doable with atomic operations but, I'm imagining, involves housekeeping in many places.

We need to decide what these values are for first: what decisions do they help users make? external exists, largely, for Cloudant, as it represents the data as the user understands it, and Cloudant charges by the GB for that. It is obviously also useful to other users, but I suspect most users would only care about the file value, in order to direct compaction efforts or other cleanup (including deleting databases entirely). The distinction between active data and trash is eliminated by FoundationDB and our choice of update semantics (the old revision is atomically removed when the new revision supersedes it), so we should drop that item.
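
To make the discussion concrete, a sketch of how the surviving fields might be read back for GET /dbname, assuming they are kept as little-endian counters maintained by atomic adds; the key names and layout are invented for illustration:

```erlang
%% Sketch: serve GET /dbname metadata from a handful of 8-byte counters.
db_info(Db, DbPrefix) ->
    erlfdb:transactional(Db, fun(Tx) ->
        Read = fun(Name) ->
            Key = erlfdb_tuple:pack({<<"meta">>, Name}, DbPrefix),
            case erlfdb:wait(erlfdb:get(Tx, Key)) of
                not_found -> 0;
                <<Val:64/unsigned-little-integer>> -> Val
            end
        end,
        {[
            {doc_count, Read(<<"doc_count">>)},
            {doc_del_count, Read(<<"doc_del_count">>)},
            {sizes, {[{external, Read(<<"size_external">>)}]}}
        ]}
    end).
```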

For r/w/n/q I feel strongly we should drop them. They are intimately coupled to the current architecture and I think it would be misleading to carry these values across. The configuration of the FoundationDB cluster can take many forms and can change at runtime; we should cleanly decouple ourselves from it. I don't think we should pass through any FoundationDB status or config either, though our documentation / release notes for 4.0 should include a user guide for the basics. Others might disagree on how much handholding we should be doing. Are we, in effect, embedding FoundationDB or are we a layer above it? As code, it's emphatically the latter, but a user could legitimately hold either perspective. This matters for this RFC as it's the deciding factor on the scope of the "info" in /dbname and elsewhere.

+1 on removing total_rows, offset, compact_running, other, data_size and disk_size but I note that the replicator looks for instance_start_time. We can enhance the replicator in 3.0 to no longer do so and then state that 4.0 cannot replicate to < 3.0 systems, but perhaps it wouldn't hurt to include the field either hardcoded to 0 (as we've done from 2.0 onward for compatibility reasons) or set it to the start time of the fdb cluster if that's cheaply available.

@janl janl left a comment
Member

+1 mostly to what Bob says modulo my question about offset and total_rows.

Also:

> Beyond that, given the absence of compaction, is there any need for other sizes?

I think if a user need arises for getting anything but the Erlang-term data size (e.g. the exploded docs as @rnewson posits), we can think about adding this later, but I’d leave this and any disk-level data sizes out.

I can’t yet tell how much introspection we get from FDB, but I assume there is enough in there to make storage ops decisions.


## HTTP API deprecations

The `total_rows` and `offset` fields are removed from the response to
Member

Why are we dropping those, and if we do, how do we tell users to “iterate over all docs in a db”? I realise the semantics of this are going to change with the 5s transaction limit, but users need at least guidance on how to do this, and what to expect.

Member Author

My initial thinking was to drop offset because there's a fair bit of overhead to compute it (FDB doesn't maintain counts of keys within a range by default), and it seemed to be of limited value. I thought we had gotten to the point where the stock advice on paging was to use start_key / end_key and limit. And then I just went ahead and dropped total_rows to avoid the extra lookup and for parity with _changes.

If there's an argument to keep total_rows, that's a pretty cheap thing to include.
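
For reference, a hedged sketch of what key-based paging looks like at the FDB level, assuming erlfdb's get_range/4 (returning {Key, Value} pairs) and an invented ?ALL_DOCS subspace tag; the point is that resuming from the last key seen needs no offset bookkeeping:

```erlang
%% Sketch of offset-free paging: callers resume from the last doc id they
%% saw rather than skipping N rows. Subspace tag and layout are illustrative.
-define(ALL_DOCS, 16#10).

all_docs_page(Tx, DbPrefix, AfterDocId, Limit) ->
    Prefix = erlfdb_tuple:pack({?ALL_DOCS}, DbPrefix),
    EndKey = <<Prefix/binary, 16#FF>>,
    StartKey = case AfterDocId of
        undefined ->
            Prefix;
        _ ->
            %% Append a zero byte so the range starts strictly after the
            %% key for AfterDocId itself.
            Packed = erlfdb_tuple:pack({?ALL_DOCS, AfterDocId}, DbPrefix),
            <<Packed/binary, 16#00>>
    end,
    %% Each Key is the tuple-packed {?ALL_DOCS, DocId}; the Value would hold
    %% whatever the RFC ends up storing there (e.g. the winning revision).
    erlfdb:get_range(Tx, StartKey, EndKey, [{limit, Limit}]).
```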

Member Author

In fact, the current implementation appears to include total_rows. It also includes an offset field set to null.


* `compact_running`

* `disk_format_version`: this is a tricky one. We define "format versions" for
Member

Just a thought, but would choosing smart values for format versions and ORing them into a chmod-style bitfield work?

If that’s impractical, do we expect format definitions to mix and match over time, or could we define “the set of per-let format versions XYZ is 1, XZZ is 2” and so on? This would be close to what we have now, where we make arbitrary changes to the db header record and each change gets incremented by one.

Member Author

I think this would devolve to "CouchDB version 4.x.y writes data using on-disk format version 12"; i.e., it'd be a property of the running CouchDB code rather than a property of a database. Individual databases are just portions of a global key space, and we don't have any background process that's sitting down and rewriting an entire database into the latest format, so at first glance we'll need to support full schema evolution for each data type we're storing in FDB.
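
A minimal sketch of what per-value schema evolution could look like; the version numbers, encoding, and helper names here are made up for illustration, not taken from the implementation:

```erlang
%% Sketch: each stored value carries a small version tag and the decoder
%% dispatches on it, so old and new encodings can coexist in one keyspace.
encode_doc_body(Body) ->
    term_to_binary({2, Body}).          % 2 = hypothetical current version

decode_doc_body(Bin) ->
    case binary_to_term(Bin) of
        {2, Body}    -> Body;
        {1, OldBody} -> upgrade_v1_to_v2(OldBody)
    end.

upgrade_v1_to_v2(OldBody) ->
    %% Placeholder: rewrite the older representation into the current one.
    OldBody.
```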

atomic operations, and a record of the size of each revision in the ?REVISIONS
subspace so that a transaction can compute the delta for each document.
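
A sketch of the delta computation being described, assuming (for illustration only) that the per-revision value is just an 8-byte size and using invented subspace tags and key names:

```erlang
%% Sketch: with each revision's size recorded in the ?REVISIONS subspace, an
%% update transaction can adjust the external size counter by a delta
%% without re-reading the old document body.
-define(REVISIONS, 16#20).

update_external_size(Tx, DbPrefix, DocId, OldRev, NewRev, NewBody) ->
    NewSize = couch_ejson_size:encoded_size(NewBody),
    OldSize = case OldRev of
        undefined ->
            0;
        _ ->
            OldKey = erlfdb_tuple:pack({?REVISIONS, DocId, OldRev}, DbPrefix),
            case erlfdb:wait(erlfdb:get(Tx, OldKey)) of
                not_found -> 0;
                <<Size:64/unsigned-little-integer>> -> Size
            end
    end,
    NewKey = erlfdb_tuple:pack({?REVISIONS, DocId, NewRev}, DbPrefix),
    erlfdb:set(Tx, NewKey, <<NewSize:64/unsigned-little-integer>>),
    SizeKey = erlfdb_tuple:pack({<<"meta">>, <<"size_external">>}, DbPrefix),
    erlfdb:add(Tx, SizeKey, NewSize - OldSize).
```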

### Clustering
Member

I agree with @rnewson that we should drop these.

@wohali wohali marked this pull request as ready for review August 12, 2020 00:42
@wohali wohali merged commit e692ef4 into master Aug 12, 2020