
Prototype/fdb layer final rebase #3137

Closed · wants to merge 401 commits

Conversation

@davisp (Member) commented Sep 9, 2020

Overview

This is prototype/fdb-layer rebased onto master. Let's merge it already.

Testing recommendations

make check

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

nickva and others added 30 commits September 9, 2020 09:36
Use `fabric2_db:is_clustered/1` instead of `couch_db:is_clustered`
Seeing log errors like the following:

```
CRASH REPORT Process epep_fdb_decision_cache (<0.633.0>) with 0 neighbors exited with reason: {bad_info,{#Ref<0.2506675824.3127640065.49278>,ready}} at gen_server:handle_common_reply/8(line:726) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_expiring_cache_server,init,['Argument__1']}, ancestors: [epep_sup,<0.596.0>], message_queue_len: 0, messages: [], links: [<0.614.0>], dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,next => #Fun<..>,...},...}},...], trap_exit: false, status: running, heap_size: 2586, stack_size: 27, reductions: 7493102
```

This should handle those errors and prevent the crashes.
Previously those endpoints would break when transactions timed out and were
retried. To fix it, we reuse the mechanism from the changes feeds.

There is a longer discussion about this on the mailing list:

https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
* There was a good amount of duplication between `_db_crud_tests` and
  `_changes_fold_tests`, so create a common test utility module that both
  suites can use.

* Clean up test names. Previously some were named `tx_too_long` but since the
  official FDB error is `transaction_too_old` rename them to match a bit
  better.

* The `list_dbs_info` implementation maintains a queue of 100 futures to
  parallelize fetching, so its test was updated to create more than 100 dbs.
  Creating 100 dbs took about 3 seconds, so add a small parallel map (pmap)
  utility function to help with that.
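The idea behind a pmap helper can be sketched as follows. This is a minimal Python sketch of the technique, not CouchDB's actual Erlang fabric2 code; all names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def pmap(fun, items, max_workers=10):
    # Apply `fun` to every item concurrently, preserving input order.
    # pool.map yields results in the same order as the input iterable.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fun, items))

# e.g. create many test dbs in parallel instead of one at a time:
#     pmap(create_test_db, db_names)
```

This is why creating 100+ test databases stops being a serial, multi-second setup step: the per-db latency overlaps instead of accumulating.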
 * Made a silly error building the accumulator list:
   `[Row, Acc]` -> `[Row | Acc]`

 * Left some debugging code in the `list_dbs_tx_too_old` test.
   The test was supposed to set up only 1 failure in the user callback.
`list_dbs_info/3` maintains a queue of up to 100 futures which are used to
concurrently fetch data. Previously, if the transaction was reset, the
accumulator inside the fold may have held futures from the previous
transaction; any of those that had not yet gotten their results threw a
transaction_canceled (1025) error.

To fix this, if we're in a read-only transaction, we return the tx object in
the opaque db info record. Then, if `erlfdb:wait/1` throws a transaction
canceled error, we re-fetch the future from the now restarted transaction.

Potentially, the transaction may also time out while the futures queue is
drained after the main range fold has already finished. Handle that case by
resetting the transaction and then re-fetching the futures. To avoid an
infinite loop we allow up to 2 retries only.

This approach is not optimal, but it is simpler, as it hides the complexity
inside the fabric2_fdb module where we already handle these conditions. It
means that every 5 or so seconds we might have to re-fetch fewer than 100
extra futures from the queue (as some or all may have gotten their results
back already).
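The bounded-retry drain described above can be modeled like this. It is a toy Python sketch, not the real erlfdb API; the Tx and Future classes are stand-ins invented for illustration:

```python
class TransactionCanceled(Exception):
    """Stand-in for erlfdb's transaction_canceled (1025) error."""

class Future:
    def __init__(self, value, cancelled):
        self._value, self._cancelled = value, cancelled
    def wait(self):
        # Waiting on a future issued before a reset fails.
        if self._cancelled:
            raise TransactionCanceled()
        return self._value

class Tx:
    """Toy read-only transaction whose first batch of futures gets cancelled."""
    def __init__(self, data, fail_first=False):
        self._data, self._fail = data, fail_first
    def get(self, key):
        return Future(self._data[key], self._fail)
    def reset(self):
        self._fail = False  # restarted transaction issues fresh futures

def drain_futures(tx, keys, max_retries=2):
    # Re-fetch the futures from the (re)started transaction and wait again,
    # giving up after max_retries restarts to avoid an infinite loop.
    for attempt in range(max_retries + 1):
        futures = [tx.get(k) for k in keys]
        try:
            return [f.wait() for f in futures]
        except TransactionCanceled:
            if attempt == max_retries:
                raise
            tx.reset()
```

The key point the sketch shows: the retry re-issues the reads rather than reusing futures tied to the dead transaction.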
Previously transactions could time out, retry and re-emit the same data. Use
the same mechanism as the _list_dbs and _changes feeds to fix it. Additional
detail in the mailing list discussion:

https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
 * Use the `row/3` helper function in a few more places

 * Make a `run_query/3` function to shorten all the query calls

 * A few minor emilio suggestions (whitespace, comma issues, ...)
Previously it was possible for a database to be re-created while a `Db` handle
was open and the `Db` handle would continue operating on the new db without any
error.

To avoid that situation, ensure the instance UUID is explicitly checked during open
and reopen calls. This includes checking it after the metadata is loaded in
`fabric2_fdb:open/2` and when fetching the handle from the cache.

Also, create a `{uuid, UUID}` option to allow specifying a particular instance
UUID when opening a database. If that instance doesn't exist, raise a
`database_does_not_exist` error.
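The instance-UUID check can be sketched as follows. This is hypothetical Python, not fabric2_fdb's actual code; the handle/metadata shapes are invented for illustration:

```python
class DatabaseDoesNotExist(Exception):
    pass

def check_instance(handle, current_meta, requested_uuid=None):
    # A cached handle (or an explicit {uuid, UUID}-style request) is only
    # valid if it matches the UUID in the database's current metadata; a
    # mismatch means the db was deleted and re-created since the handle
    # was opened, so treat it as a missing database.
    wanted = requested_uuid or handle["uuid"]
    if current_meta is None or current_meta["uuid"] != wanted:
        raise DatabaseDoesNotExist(handle["name"])
    return handle
```

Running the check both after metadata load on open and when pulling a handle from the cache is what closes the "silently operating on the new db" window.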
Check that on transaction restart a `database_does_not_exist` error is
properly thrown if the database was re-created.

Also, we forgot to properly unload the mocked erlfdb module in
`tx_too_old_mock_erlfdb/0`, so we make sure to do that; otherwise it
has a chance of messing up subsequent tests.
The e520294 commit inadvertently
changed the `fabric2_fdb:refresh/1` head matching to accept db
instances with no names. `couch_jobs` uses those but they are not
cached in fabric2_server. So here we return to the previous matching
rule where contexts without names don't get refreshed from the cache.
* add info endpoint

This commit adds the info endpoint for design docs stored in fdb.
Previously we used the DbName to set DbPrefix, for clarity. To support
soft-deletion while still allocating DbPrefix values efficiently, we now use
a value allocated with erlfdb_hca for DbPrefix.
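The indirection this buys can be sketched as follows. This is a toy Python model, with the Allocator class standing in for erlfdb_hca and all names invented for illustration:

```python
class Allocator:
    """Stand-in for erlfdb_hca: hands out short unique prefixes that are
    unrelated to the database name."""
    def __init__(self):
        self._next = 0
    def allocate(self):
        self._next += 1
        return self._next

def create_db(directory, hca, name):
    # The prefix is allocated, not derived from the name.
    directory[name] = hca.allocate()
    return directory[name]

def soft_delete(directory, deleted, name):
    # Move the name -> prefix mapping aside; the data under the old prefix
    # stays addressable, and the name is free to be re-created with a
    # fresh prefix.
    deleted[name] = directory.pop(name)
```

Because the name and the prefix are decoupled, a re-created database gets a new prefix while the soft-deleted data keeps its old one.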
This is a more efficient method to get all of the design documents than
relying on fabric2_db:fold_docs, which doesn't load doc bodies in
parallel.
 * Avoid a case clause error in `after 0` when the database is deleted

 * Handle db re-creation by checking the instance UUID during fabric2_db:open/2

Since we added a few extra arguments, switch to using a map as the State.
Add the db instance id to indexing job data. During indexing ensure the
database is opened with the `{uuid, DbUUID}` option. After that any stale db
reads in `update/3` will throw the `database_does_not_exist` error.

In addition, when the indexing job is re-submitted in `build_view_async/2`,
check if it contains a reference to an old db instance id and replace the job.
That has to happen since couch_jobs doesn't overwrite job data for running
jobs.
After the recent upgrade to using HCA we forgot to check all the places where
the db prefix was constructed, so a few places still used the old pattern of
`{?DBS, DbName}`.

In the case of `check_metadata_version` we also have to account for the fact
that during db creation, there might not be a db_prefix in the `Db` handle yet.
Previously we didn't reset the metadata flag in case of a transaction retry, so
we could have used a stale `?PDICT_CHECKED_MD_IS_CURRENT = true` value.
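The fix amounts to making that flag per-attempt state. A minimal Python sketch of the pattern (the real code uses the Erlang process dictionary inside a transaction wrapper; names here are invented):

```python
class Conflict(Exception):
    """Stand-in for a retryable transaction error."""

def transactional(body, max_attempts=3):
    for _ in range(max_attempts):
        # Per-attempt context: the "metadata already checked" flag starts
        # False on every retry instead of leaking from a failed attempt.
        ctx = {"md_checked": False}
        try:
            return body(ctx)
        except Conflict:
            continue
    raise RuntimeError("transaction retried too many times")
```

With the flag scoped to the attempt, a retried transaction is forced to re-verify the metadata version instead of trusting the stale check.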
For use by the native couchdb at-rest encryption feature.

* From NIST Special Publication 800-38F.
Currently we return a 500, but a 400 status code makes more sense:

```
$ http $DB1/db1/_changes?since=0-1345

HTTP/1.1 400 Bad Request

{
    "error": "invalid_since_seq",
    "reason": "0-1345",
    "ref": 442671026
}
```
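The shape of the fix can be sketched as validating the parameter up front and mapping the failure to a 400 response. This is an illustrative Python sketch, not CouchDB's actual sequence parser; treating values containing `-` as the old clustered-sequence format is an assumption made for the example:

```python
class InvalidSinceSeq(ValueError):
    pass

def parse_since_seq(since):
    # Hypothetical validator: accept the well-known values, reject
    # old-style clustered sequences such as "0-1345" instead of letting
    # them crash deeper in the request handling.
    if since in ("0", "now"):
        return since
    if "-" in since:
        raise InvalidSinceSeq(since)
    return since

def changes_endpoint(since):
    # Validate first, then translate the failure into a 400 body that
    # names the error and echoes the offending value as the reason.
    try:
        parse_since_seq(since)
    except InvalidSinceSeq as err:
        return 400, {"error": "invalid_since_seq", "reason": str(err)}
    return 200, {"ok": True}
```

The point is that a malformed client input is a client error (400), not an internal failure (500).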
Removes the following features from the welcome message:
- reshard
- partitioned
- pluggable-storage-engines
- scheduler

Although `scheduler` at least will presumably be re-added once that
feature is complete.
nickva and others added 25 commits September 9, 2020 09:44
Previously, if a database was re-created on another node, a request for that
database might have found the previous db instance in the cache. In that case
it would have correctly reopened the db while in a transaction, but, because
the old db instance had been deleted, it would throw a database_does_not_exist
error, which was not the correct behavior.

To prevent that from happening, introduce an interactive = true|false option
when opening a database. User requests may specify that option, and then when
the db is re-opened it will automatically upgrade to the new db instance
instead of returning an error.

Background processes will still get a database_does_not_exist error if they
keep open a db which has since been re-created.

The interactive option may also be used in the future to set other transaction
parameters like timeouts and retries that might be different for interactive
requests vs background tasks.
Add the local_seq option to views.
Fixes the case where, if no writes are done for an index, the rate limiter
assumed it was a failure.
Any error there would just generate a case clause error.

Remove the `{not_found, missing}` clause since it was accidentally matching on
the Rev string and the case was included in the `_Else` clause anyway.
During job removal, the job was not cleared from the active area, so
active_tasks would mistakenly believe the job still existed. When we then
tried to actually open the job data, it was not there and a not_found error
would be issued. @nickva found this issue during replication work.
The pagination relied on the id of the document. However, for views it should
use the combination of key and id.
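Why the pair matters can be sketched as follows. A view can emit the same key for several documents, so a bookmark carrying only the id cannot say where in a run of duplicate keys to resume. This is an illustrative Python sketch, not the actual pagination code:

```python
def next_page(rows, bookmark, limit):
    # rows are sorted by (key, id); bookmark is the (key, id) of the last
    # row already emitted, or None for the first page. Comparing on the
    # pair keeps duplicate keys from being skipped or repeated across
    # page boundaries.
    if bookmark is not None:
        rows = [r for r in rows if (r["key"], r["id"]) > bookmark]
    return rows[:limit]
```

Resuming after `(1, "a")` correctly yields the other row with key 1 before moving on, which an id-only bookmark cannot guarantee.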
Empty maps may be useful to initialize the data in some cases, but we don't
want to emit an entry in the output with just an empty map.

While at it, add some tests to check the basics.
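The behavior amounts to filtering empty-map values out at emit time. A minimal Python sketch (names are illustrative, not the actual module's):

```python
def prune_empty_maps(props):
    # Keep empty maps as internal accumulator defaults, but drop them
    # from the emitted output so consumers never see bare `{}` entries.
    return {k: v for k, v in props.items() if v != {}}
```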
This makes it compatible with CouchDB <= 3.x where we can create deleted
documents.

How to check:

```
$ http put $DB1/mydb
$ http put $DB1/mydb/foo _deleted:='true' a=b
{
    "id": "foo",
    "ok": true,
    "rev": "1-ad7eb689fcae75e7a7edb57dc1f30939"
}
```
This allows for batch insertion of keys in order to minimize node
serialization and collation costs.
This allows looking up multiple keys simultaneously which reduces the
amount of overhead due to node serialization and collation.
Inner nodes of the B+Tree are now immutable so that they can be cached.
Using lists:umerge/3 adds extra invocations of the collation algorithm
because it uses `=<` semantics, while ebtree's collation is capable of
producing `lt`, `eq`, `gt` results.
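The saving comes from merging with one three-way comparison per step, where a `=<`-style merge may need a second comparison to distinguish "equal" from "less than". An illustrative Python sketch of such a merge (not ebtree's actual code):

```python
def merge3(a, b, cmp3):
    # Merge two sorted, duplicate-free lists using a single three-way
    # comparator call ("lt" / "eq" / "gt") per step. On "eq" only one
    # copy is kept, matching lists:umerge/3 semantics.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        c = cmp3(a[i], b[j])
        if c == "lt":
            out.append(a[i]); i += 1
        elif c == "gt":
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1; j += 1
    return out + a[i:] + b[j:]
```

When each comparison is an expensive ICU collation call, halving the comparisons per merge step is a meaningful win.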
This keeps validation during tests but disables the validation during
production to avoid the overhead of collation.
@davisp (Member Author) commented Sep 9, 2020

The comparison that's most important is probably against prototype/fdb-layer which you can see here:

prototype/fdb-layer...prototype/fdb-layer-final-rebase

@davisp mentioned this pull request Sep 9, 2020
@davisp (Member Author) commented Sep 9, 2020

Turns out a merge is much cleaner:

#3140

@davisp davisp closed this Sep 9, 2020
@wohali wohali deleted the prototype/fdb-layer-final-rebase branch October 21, 2020 19:07