
Prototype/fdb layer final rebase #3137

Closed · wants to merge 401 commits

Conversation

@davisp (Member) commented Sep 9, 2020

Overview

This is prototype/fdb-layer rebased onto master. Let's merge it already.

Testing recommendations

make check

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

nickva and others added 30 commits September 9, 2020 09:36
Use `fabric2_db:is_clustered/1` instead of `couch_db:is_clustered`
Seeing log errors like the following:

```
CRASH REPORT Process epep_fdb_decision_cache (<0.633.0>) with 0 neighbors exited with reason: {bad_info,{#Ref<0.2506675824.3127640065.49278>,ready}} at gen_server:handle_common_reply/8(line:726) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_expiring_cache_server,init,['Argument__1']}, ancestors: [epep_sup,<0.596.0>], message_queue_len: 0, messages: [], links: [<0.614.0>], dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,next => #Fun<..>,...},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 27, reductions: 7493102
```

This should handle those errors and prevent the crashes.
Previously those endpoints would break when transactions timed out and were
retried. To fix it, we reuse the mechanism from the changes feeds.

There is a longer discussion about this on the mailing list:

https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
* There was a good amount of duplication between `_db_crud_tests` and
  `_changes_fold_tests`, so create a common test utility module that both
  suites can use.

* Clean up test names. Previously some were named `tx_too_long` but since the
  official FDB error is `transaction_too_old` rename them to match a bit
  better.

* The `list_dbs_info` implementation maintains a queue of 100 futures to
  parallelize fetching, so its test was updated to create more than 100 dbs.
  Creating 100 dbs took about 3 seconds, so add a small parallel map (pmap)
  utility function to help with that.
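The idea behind a pmap helper can be sketched as follows. This is a minimal Python sketch of the technique, not CouchDB's actual Erlang fabric2 code; all names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def pmap(fun, items, max_workers=10):
    # Apply `fun` to every item concurrently, preserving input order.
    # pool.map yields results in the same order as the input iterable.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fun, items))

# e.g. create many test dbs in parallel instead of one at a time:
#     pmap(create_test_db, db_names)
```

This is why creating 100+ test databases stops being a serial, multi-second setup step: the per-db latency overlaps instead of accumulating.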
 * Made a silly error building the accumulator list:
   `[Row, Acc]` -> `[Row | Acc]`

 * Left some debugging code in the `list_dbs_tx_too_old` test.
   The test was supposed to set up only 1 failure in the user callback.
`list_dbs_info/3` maintains a queue of up to 100 futures which are used to
concurrently fetch data. Previously, if the transaction was reset, the
accumulator inside the fold may have held futures from the previous
transaction; any of those that had not yet gotten their results threw a
transaction_canceled (1025) error.

To fix this, if we're in a read-only transaction, we return the tx object in
the opaque db info record. Then, if `erlfdb:wait/1` throws a transaction
canceled error, we re-fetch the future from the now restarted transaction.

Potentially, the transaction may also time out while the futures queue is
drained after the main range fold has already finished. Handle that case by
resetting the transaction and then re-fetching the futures. To avoid an
infinite loop we allow up to 2 retries only.

This approach is not optimal, but it is simpler, as it hides the complexity
inside the fabric2_fdb module where we already handle these conditions. It
means that every 5 or so seconds we might have to re-fetch fewer than 100
extra futures from the queue (as some or all may have gotten their results
back already).
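The bounded-retry drain described above can be modeled like this. It is a toy Python sketch, not the real erlfdb API; the Tx and Future classes are stand-ins invented for illustration:

```python
class TransactionCanceled(Exception):
    """Stand-in for erlfdb's transaction_canceled (1025) error."""

class Future:
    def __init__(self, value, cancelled):
        self._value, self._cancelled = value, cancelled
    def wait(self):
        # Waiting on a future issued before a reset fails.
        if self._cancelled:
            raise TransactionCanceled()
        return self._value

class Tx:
    """Toy read-only transaction whose first batch of futures gets cancelled."""
    def __init__(self, data, fail_first=False):
        self._data, self._fail = data, fail_first
    def get(self, key):
        return Future(self._data[key], self._fail)
    def reset(self):
        self._fail = False  # restarted transaction issues fresh futures

def drain_futures(tx, keys, max_retries=2):
    # Re-fetch the futures from the (re)started transaction and wait again,
    # giving up after max_retries restarts to avoid an infinite loop.
    for attempt in range(max_retries + 1):
        futures = [tx.get(k) for k in keys]
        try:
            return [f.wait() for f in futures]
        except TransactionCanceled:
            if attempt == max_retries:
                raise
            tx.reset()
```

The key point the sketch shows: the retry re-issues the reads rather than reusing futures tied to the dead transaction.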
Previously transactions could time out, retry and re-emit the same data. Use
the same mechanism as the _list_dbs and _changes feeds to fix it. Additional
detail in the mailing list discussion:

https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
 * Use the `row/3` helper function in a few more places

 * Make a `run_query/3` function to shorten all the query calls

 * A few minor emilio suggestions (whitespace, comma issues, ...)
Previously it was possible for a database to be re-created while a `Db` handle
was open and the `Db` handle would continue operating on the new db without any
error.

To avoid that situation, ensure the instance UUID is explicitly checked during open
and reopen calls. This includes checking it after the metadata is loaded in
`fabric2_fdb:open/2` and when fetching the handle from the cache.

Also, create a `{uuid, UUID}` option to allow specifying a particular instance
UUID when opening a database. If that instance doesn't exist, raise a
`database_does_not_exist` error.
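The instance-UUID check can be sketched as follows. This is hypothetical Python, not fabric2_fdb's actual code; the handle/metadata shapes are invented for illustration:

```python
class DatabaseDoesNotExist(Exception):
    pass

def check_instance(handle, current_meta, requested_uuid=None):
    # A cached handle (or an explicit {uuid, UUID}-style request) is only
    # valid if it matches the UUID in the database's current metadata; a
    # mismatch means the db was deleted and re-created since the handle
    # was opened, so treat it as a missing database.
    wanted = requested_uuid or handle["uuid"]
    if current_meta is None or current_meta["uuid"] != wanted:
        raise DatabaseDoesNotExist(handle["name"])
    return handle
```

Running the check both after metadata load on open and when pulling a handle from the cache is what closes the "silently operating on the new db" window.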
Check that on transaction restart a `database_does_not_exist` error is
properly thrown if the database was re-created.

Also, we forgot to properly unload the mocked erlfdb module in
`tx_too_old_mock_erlfdb/0`, so we make sure to do that; otherwise it
has a chance of messing up subsequent tests.
The e520294 commit inadvertently
changed the `fabric2_fdb:refresh/1` head matching to accept db
instances with no names. `couch_jobs` uses those but they are not
cached in fabric2_server. So here we return to the previous matching
rule where contexts without names don't get refreshed from the cache.
* add info endpoint

This commit adds the info endpoint for design docs stored in fdb.
Previously we used the DbName to set DbPrefix, for clarity. To support
soft-deletion while still allocating DbPrefix values efficiently, we now use
a value allocated with erlfdb_hca for DbPrefix.
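The indirection this buys can be sketched as follows. This is a toy Python model, with the Allocator class standing in for erlfdb_hca and all names invented for illustration:

```python
class Allocator:
    """Stand-in for erlfdb_hca: hands out short unique prefixes that are
    unrelated to the database name."""
    def __init__(self):
        self._next = 0
    def allocate(self):
        self._next += 1
        return self._next

def create_db(directory, hca, name):
    # The prefix is allocated, not derived from the name.
    directory[name] = hca.allocate()
    return directory[name]

def soft_delete(directory, deleted, name):
    # Move the name -> prefix mapping aside; the data under the old prefix
    # stays addressable, and the name is free to be re-created with a
    # fresh prefix.
    deleted[name] = directory.pop(name)
```

Because the name and the prefix are decoupled, a re-created database gets a new prefix while the soft-deleted data keeps its old one.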
This is a more efficient method to get all of the design documents than
relying on fabric2_db:fold_docs, which doesn't load doc bodies in
parallel.
 * Avoid a case clause error in `after 0` when the database is deleted

 * Handle db re-creation by checking the instance UUID during fabric2_db:open/2

Since we added a few extra arguments, switch to using a map as the State.
Add the db instance id to indexing job data. During indexing ensure the
database is opened with the `{uuid, DbUUID}` option. After that any stale db
reads in `update/3` will throw the `database_does_not_exist` error.

In addition, when the indexing job is re-submitted in `build_view_async/2`,
check if it contains a reference to an old db instance id and replace the job.
That has to happen since couch_jobs doesn't overwrite job data for running
jobs.
After the recent upgrade to using HCA we forgot to check all the places where
the db prefix was constructed, so a few places still used the old pattern of
`{?DBS, DbName}`.

In the case of `check_metadata_version` we also have to account for the fact
that during db creation, there might not be a db_prefix in the `Db` handle yet.
Previously we didn't reset the metadata flag in case of a transaction retry, so
we could have used a stale `?PDICT_CHECKED_MD_IS_CURRENT = true` value.
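The fix amounts to making that flag per-attempt state. A minimal Python sketch of the pattern (the real code uses the Erlang process dictionary inside a transaction wrapper; names here are invented):

```python
class Conflict(Exception):
    """Stand-in for a retryable transaction error."""

def transactional(body, max_attempts=3):
    for _ in range(max_attempts):
        # Per-attempt context: the "metadata already checked" flag starts
        # False on every retry instead of leaking from a failed attempt.
        ctx = {"md_checked": False}
        try:
            return body(ctx)
        except Conflict:
            continue
    raise RuntimeError("transaction retried too many times")
```

With the flag scoped to the attempt, a retried transaction is forced to re-verify the metadata version instead of trusting the stale check.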
For use by the native couchdb at-rest encryption feature.

* From NIST Special Publication 800-38F.
Currently we return a 500, but a 400 status code makes more sense:

```
$ http $DB1/db1/_changes?since=0-1345

HTTP/1.1 400 Bad Request

{
    "error": "invalid_since_seq",
    "reason": "0-1345",
    "ref": 442671026
}
```
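The shape of the fix can be sketched as validating the parameter up front and mapping the failure to a 400 response. This is an illustrative Python sketch, not CouchDB's actual sequence parser; treating values containing `-` as the old clustered-sequence format is an assumption made for the example:

```python
class InvalidSinceSeq(ValueError):
    pass

def parse_since_seq(since):
    # Hypothetical validator: accept the well-known values, reject
    # old-style clustered sequences such as "0-1345" instead of letting
    # them crash deeper in the request handling.
    if since in ("0", "now"):
        return since
    if "-" in since:
        raise InvalidSinceSeq(since)
    return since

def changes_endpoint(since):
    # Validate first, then translate the failure into a 400 body that
    # names the error and echoes the offending value as the reason.
    try:
        parse_since_seq(since)
    except InvalidSinceSeq as err:
        return 400, {"error": "invalid_since_seq", "reason": str(err)}
    return 200, {"ok": True}
```

The point is that a malformed client input is a client error (400), not an internal failure (500).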
Removes the following features from the welcome message:
- reshard
- partitioned
- pluggable-storage-engines
- scheduler

Although `scheduler` at least will presumably be re-added once that
feature is complete.
nickva and others added 25 commits September 9, 2020 09:44
Previously, if a database was re-created on another node, a request for that
database might have found the previous db instance in the cache. In that case
it would have correctly reopened the db while in a transaction, but, because
the old db instance had been deleted, it would throw a database_does_not_exist
error, which was not the correct behavior.

To prevent that from happening, introduce an interactive = true|false option
when opening a database. User requests may specify that option, and then when
the db is re-opened it will automatically upgrade to the new db instance
instead of returning an error.

Background processes will still get a database_does_not_exist error if they
keep open a db which has since been re-created.

The interactive option may also be used in the future to set other transaction
parameters like timeouts and retries that might be different for interactive
requests vs background tasks.
Add the local_seq option to views.
Fixes the case where, if no writes are done for an index, the rate limiter
assumed it was a failure.
Any error there would just generate a case clause error.

Remove the `{not_found, missing}` clause since it was accidentally matching on
the Rev string and the case was included in the `_Else` clause anyway.
During job removal, the job was not cleared from the active area, so
active_tasks would mistakenly believe the job still existed. When we then
tried to actually open the job data, it was not there and a not_found error
would be issued. @nickva found this issue during replication work.
The pagination relied on the id of the document. However, for views it should
use the combination of key and id.
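Why the pair matters can be sketched as follows. A view can emit the same key for several documents, so a bookmark carrying only the id cannot say where in a run of duplicate keys to resume. This is an illustrative Python sketch, not the actual pagination code:

```python
def next_page(rows, bookmark, limit):
    # rows are sorted by (key, id); bookmark is the (key, id) of the last
    # row already emitted, or None for the first page. Comparing on the
    # pair keeps duplicate keys from being skipped or repeated across
    # page boundaries.
    if bookmark is not None:
        rows = [r for r in rows if (r["key"], r["id"]) > bookmark]
    return rows[:limit]
```

Resuming after `(1, "a")` correctly yields the other row with key 1 before moving on, which an id-only bookmark cannot guarantee.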
Empty maps may be useful to initialize the data in some cases, but we don't
want to emit an entry in the output with just an empty map.

While at it, add some tests to check the basics.
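The behavior amounts to filtering empty-map values out at emit time. A minimal Python sketch (names are illustrative, not the actual module's):

```python
def prune_empty_maps(props):
    # Keep empty maps as internal accumulator defaults, but drop them
    # from the emitted output so consumers never see bare `{}` entries.
    return {k: v for k, v in props.items() if v != {}}
```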
This makes it compatible with CouchDB <= 3.x where we can create deleted
documents.

How to check:

```
$ http put $DB1/mydb
$ http put $DB1/mydb/foo _deleted:='true' a=b
{
    "id": "foo",
    "ok": true,
    "rev": "1-ad7eb689fcae75e7a7edb57dc1f30939"
}
```
This allows for batch insertion of keys in order to minimize node
serialization and collation costs.
This allows looking up multiple keys simultaneously which reduces the
amount of overhead due to node serialization and collation.
Inner nodes of the B+Tree are now immutable so that they can be cached.
Using lists:umerge/3 adds extra invocations of the collation algorithm
because it uses `=<` semantics, while ebtree's collation is capable of
producing `lt`, `eq`, `gt` results.
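The saving comes from merging with one three-way comparison per step, where a `=<`-style merge may need a second comparison to distinguish "equal" from "less than". An illustrative Python sketch of such a merge (not ebtree's actual code):

```python
def merge3(a, b, cmp3):
    # Merge two sorted, duplicate-free lists using a single three-way
    # comparator call ("lt" / "eq" / "gt") per step. On "eq" only one
    # copy is kept, matching lists:umerge/3 semantics.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        c = cmp3(a[i], b[j])
        if c == "lt":
            out.append(a[i]); i += 1
        elif c == "gt":
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1; j += 1
    return out + a[i:] + b[j:]
```

When each comparison is an expensive ICU collation call, halving the comparisons per merge step is a meaningful win.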
This keeps validation during tests but disables the validation during
production to avoid the overhead of collation.
@davisp (Member Author) commented Sep 9, 2020

The comparison that's most important is probably against prototype/fdb-layer which you can see here:

prototype/fdb-layer...prototype/fdb-layer-final-rebase

@davisp mentioned this pull request Sep 9, 2020
@davisp (Member Author) commented Sep 9, 2020

Turns out a merge is much cleaner:

#3140

@davisp davisp closed this Sep 9, 2020
@wohali wohali deleted the prototype/fdb-layer-final-rebase branch October 21, 2020 19:07