Implement flattened doc storage #2539
Closed
Most of these tests are for quorum and clustered response handling which will no longer exist with FoundationDB. Eventually we'll want to go through these and pick out anything that is still applicable and ensure that we re-add them to the new test suite.
This provides a base implementation of a fabric API backed by FoundationDB. While a lot of functionality is provided, there are a number of places that still require work. An incomplete list includes:
1. Document bodies are currently a single key/value
2. Attachments are stored as a range of key/value pairs
3. There is no support for indexing
4. Request size limits are not enforced directly
5. Auth is still backed by a legacy CouchDB database
6. No support for before_doc_update/after_doc_read
7. Various implementation shortcuts need to be expanded for full API support.
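As an illustration of item 1 above, here is a minimal Python sketch (a plain dict stands in for the FoundationDB key space; all names are hypothetical, not the actual Erlang implementation) contrasting a single-key body with the flattened layout the PR title refers to:

```python
import json

def store_single_kv(kv, db, doc_id, body):
    # Entire JSON body under one key; subject to FDB's per-value size limit.
    kv[(db, "docs", doc_id)] = json.dumps(body).encode()

def store_flattened(kv, db, doc_id, body):
    # One key per top-level field; a large doc spans many small values.
    for field, value in body.items():
        kv[(db, "docs", doc_id, field)] = json.dumps(value).encode()

kv = {}
store_flattened(kv, "mydb", "doc1", {"type": "user", "name": "ann"})
```

The flattened form trades a single large read/write for a range scan over small values, which fits FoundationDB's key/value size limits better.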
This provides a good bit of code coverage for the new implementation. We'll want to expand this to include relevant tests from the previous fabric test suite along with reading through the various other tests and ensuring that we cover the API as deeply as is appropriate for this layer.
This is not an exhaustive port of the entire chttpd API. However, it is enough to support basic CRUD operations to the point that replication works.
This still holds all attachment data in RAM which we'll have to revisit at some point.
When uploading an attachment we hadn't yet flushed data to FoundationDB which caused the md5 to be empty. The `new_revid` algorithm then declared that was because it was an old style attachment and thus our new revision would be a random number. This fix just flushes our attachments earlier in the process of updating a document.
I was accidentally skipping this step around properly serializing/deserializing attachments. Note to self: if someone specifies attachment headers, this will likely break when we attempt to pack the value tuple here.
The older chttpd/fabric split configured filters as one step in the coordinator instead of within each RPC worker.
This fixes the behavior when validating a document update that recreates a previously deleted document. Before this fix we were sending a document body with `"_deleted":true` as the existing document. However, CouchDB expects the previous document passed to VDUs to be `null` in this case.
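The fixed behavior can be sketched as follows (a Python stand-in with hypothetical names, not the actual couch code): when the existing document is deleted, the validate_doc_update function should receive null (None) rather than a body carrying `"_deleted":true`.

```python
def prev_doc_for_vdu(existing):
    # A recreate-after-delete must look like a fresh create to the VDU,
    # so a deleted previous revision is presented as null/None.
    if existing is None or existing.get("_deleted") is True:
        return None
    return existing
```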
This was a remnant from before we used a version per database.
This changes `chttpd_auth_cache` to use FoundationDB to back the `_users` database including the `before_doc_update` and `after_doc_read` features.
RFC: apache/couchdb-documentation#409. The main API is in the `couch_jobs` module. Additional description of the internals is in the README.md file.
Neither partitioned databases nor shard splitting will exist in a FoundationDB layer.
This adds the mapping of CouchDB start/end keys and so on to the similar yet slightly different concepts in FoundationDB. The handlers for `_all_dbs` and `_all_docs` have been updated to use this new logic.
The existing logic around return codes and term formats is labyrinthine. This is the result of much trial and error to get the new logic to behave exactly the same as the previous implementation.
Simple function change to `fabric2_db:name/1`
Previously I was forgetting to keep the previous history around, which ended up limiting the revision depth to two.
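A minimal sketch of the bug and the fix (a Python stand-in with hypothetical names, not the actual revision-tree code): extending a revision path must keep the entire prior history, not just the immediate parent, or the recorded depth caps out at two.

```python
def extend_history_buggy(history, new_rev):
    # Keeps only the immediate parent, dropping older ancestors.
    return [new_rev, history[0]]

def extend_history_fixed(history, new_rev):
    # Prepends the new revision to the whole prior path.
    return [new_rev] + history
```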
The old test got around this by using couch_httpd_auth cache in its tests which is fairly odd given that we run chttpd_auth_cache in production. This fixes that mistake and upgrades chttpd_auth_cache so that it works in the test scenario of changing the authentication_db configuration.
This API allows for listing all database info blobs in a single request. It accepts the same parameters as `_all_dbs` for controlling pagination of results and so on.
Previously only `POST` with a list of keys was supported. The new `GET` support just dumps all database info blobs in a single ordered response.
Previously, changes feeds would fail if they streamed data for more than five seconds because of FoundationDB's transaction time limit. After the timeout fired, a 1007 (transaction_too_old) error was raised and the transaction was retried. The emitted changes feed would often crash or simply hang because the HTTP state would be garbled as response data was re-sent over the same socket stream. To fix the issue, introduce a new `{restart_tx, true}` option for `fold_range/4`. This option sets up a new transaction to continue iterating over the range from where the last one left off. To avoid data being re-sent in the response stream, user callback functions must first read all the data they plan on sending during that callback, send it out, and then do no further db reads so as not to trigger a `transaction_too_old` error.
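The restart-and-resume idea can be sketched like this (a Python stand-in with hypothetical names, not the Erlang `fabric2_fdb` implementation): when the timeout error fires, open a fresh transaction and resume the fold just past the last key already handed to the callback, so nothing is emitted twice.

```python
class TxTooOld(Exception):
    """Stand-in for FDB error 1007 (transaction_too_old)."""

def fold_range(read_range, start_key, end_key, callback, acc):
    last_key = None  # last key whose value was already delivered
    while True:
        # Resume strictly after last_key; "\x00" is the next possible key.
        begin = last_key + "\x00" if last_key is not None else start_key
        try:
            for key, value in read_range(begin, end_key):
                acc = callback(key, value, acc)
                last_key = key
            return acc
        except TxTooOld:
            # Transaction timed out: restart and continue after last_key.
            continue
```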
Index builder performs writes in the same transaction as the changes feed so we can't use iterators as they disable writes.
I accidentally ported part of the old couch_att test suite into an actual "feature" that's not actually accessible through any API.
This tracks the number of bytes that would be required to store the contents of a database as flat files on disk. Currently the following items are tracked:
* Doc ids
* Revisions
* Doc body as JSON
* Attachment names
* Attachment type
* Attachment length
* Attachment md5s
* Attachment headers
* Local doc id
* Local doc revision
* Local doc bodies
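A hedged sketch of this kind of accounting (Python with hypothetical names, covering only a subset of the items listed above; the real tracking lives in the Erlang code): the external size is the sum of the flat-file byte sizes of each tracked component.

```python
import json

def external_doc_size(doc_id, rev, body, attachments):
    # Doc id and revision string bytes.
    size = len(doc_id.encode()) + len(rev.encode())
    # Doc body serialized as JSON.
    size += len(json.dumps(body).encode())
    for att in attachments:
        size += len(att["name"].encode())   # attachment name
        size += len(att["type"].encode())   # content type
        size += att["length"]               # attachment data length
        size += len(att["md5"])             # raw md5 digest bytes
    return size
```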
Versionstamp sequences should always be binaries when retrieved from a rev info map.
Previously each doc was read in a separate transaction. It turns out that size limits do not apply to read-only transactions, so we don't have to worry about that here. Also, transaction restarts are already implemented, so we don't have to worry about timeouts either.
We already handle them in couch_jobs_type_monitor, so let's do it in `couch_jobs:wait_pending` as well. Recent fixes in FDB 6.2 didn't completely fix the issue, and there are still spurious 1009 errors dumped in the logs. As the type monitor code already showed, they seem to be benign as far as couch_jobs operation goes, so let's not pollute the logs with them.
Previously, if the metadata key was bumped in a transaction, the same transaction could not be used to add jobs with `couch_jobs`. That's because metadata is a versionstamped value and, once set, it cannot be read back until that transaction has committed. In `fabric2_fdb` there is a process dict key, set before any db update, which declares that metadata was already read; however, `couch_jobs` uses its own caching mechanism and doesn't know about that pdict key. Ideally we'd implement a single `couch_fdb` module to be shared between `couch_jobs` and `fabric2_db`, but until then it may be simpler to just let `couch_jobs` use its own metadata key. This way it doesn't get invalidated or bumped every time dbs are recreated or design docs are updated. The only time it would be bumped is if the FDB layer prefix changed at runtime.
It's possible for other couch_epi plugins to interfere with this test, so mock `couch_epi:decide/5` to always return `no_decision`.
We started to emit that in CouchDB 4.x for temporary views and possibly other endpoints.
eiri force-pushed the prototype/flattened-doc-storage branch from 1c97a07 to ae3d86c on February 21, 2020 17:53.
eiri force-pushed the prototype/flattened-doc-storage branch from ae3d86c to 26097d6 on February 21, 2020 18:45.
davisp force-pushed the prototype/fdb-layer branch from b3bd36b to bdd0578 on March 2, 2020 22:53.
Closing as obsolete.
Overview
Implementation of the "flattened" doc storage format per the Document Storage RFC.
Testing recommendations