Prototype/fdb layer final rebase #3137
Use `fabric2_db:is_clustered/1` instead of `couch_db:is_clustered/1`
Seeing log errors like the following:

```
CRASH REPORT Process epep_fdb_decision_cache (<0.633.0>) with 0 neighbors exited
with reason: {bad_info,{#Ref<0.2506675824.3127640065.49278>,ready}} at
gen_server:handle_common_reply/8(line:726) <= proc_lib:init_p_do_apply/3(line:247);
initial_call: {couch_expiring_cache_server,init,['Argument__1']},
ancestors: [epep_sup,<0.596.0>], message_queue_len: 0, messages: [],
links: [<0.614.0>],
dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,next => #Fun<..>,...},...}},...],
trap_exit: false, status: running, heap_size: 2586, stack_size: 27,
reductions: 7493102
```

This should handle those errors and prevent the crashes.
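The `{bad_info, ...}` exit happens when a `{Ref, ready}` message (the shape erlfdb uses to signal a completed future, visible in the crash above) reaches a gen_server that is no longer waiting for it. A minimal sketch of the kind of guard that would prevent the crash, with module details assumed:

```erlang
%% Sketch only: swallow stale future-ready messages instead of exiting
%% with {bad_info, ...}. The {Ref, ready} shape matches the crash above.
handle_info({Ref, ready}, St) when is_reference(Ref) ->
    couch_log:debug("~p : ignoring stale future ~p", [?MODULE, Ref]),
    {noreply, St};
handle_info(Msg, St) ->
    {stop, {bad_info, Msg}, St}.
```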
Previously those endpoints would break when transactions time out and are retried. To fix it we reuse the mechanism from the changes feeds. There is a longer discussion about this on the mailing list: https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
* There was a good amount of duplication between `_db_crud_tests` and `_changes_fold_tests`, so create a common test utility module that both suites can use.
* Clean up test names. Previously some were named `tx_too_long`, but since the official FDB error is `transaction_too_old`, rename them to match a bit better.
* The `list_dbs_info` implementation maintains a queue of 100 futures to parallelize fetching, so its test was updated to create more than 100 dbs. Creating 100 dbs took about 3 seconds, so add a small parallel map (pmap) utility function to help with that (a sketch follows this list).
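A minimal pmap sketch along those lines (the actual utility in the tree may differ):

```erlang
%% Spawn one linked worker per item, then gather results in order.
%% Assumes Fun needs no timeout handling or error isolation.
pmap(Fun, Items) ->
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), Fun(Item)} end)
        || Item <- Items],
    [receive {Pid, Res} -> Res end || Pid <- Pids].
```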
* Fixed a silly error building the accumulator list: `[Row, Acc]` -> `[Row | Acc]`.
* Removed some debugging code left in the `list_dbs_tx_too_old` test. The test was supposed to set up only 1 failure in the user callback.
`list_dbs_info/3` maintains a queue of up to 100 futures which are used to concurrently fetch data. Previously, if the transaction was reset, the accumulator inside the fold may have held futures from the previous transaction which had not gotten their results yet; waiting on them threw a transaction_canceled (1025) error. To fix this, if we're in a read-only transaction, we return the tx object in the opaque db info record. Then, if `erlfdb:wait/1` throws a transaction canceled error, we re-fetch the future from the now restarted transaction. The transaction may also time out while the futures queue is drained after the main range fold has already finished. Handle that case by resetting the transaction and then re-fetching the futures. To avoid an infinite loop we allow up to 2 retries only. This approach is not the most optimal, but it is simpler, as it hides the complexity inside the fabric2_fdb module where we already handle these conditions. It means that every 5 or so seconds we might have to re-fetch fewer than 100 extra futures from the queue (as some or all may have gotten their results back already).
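An illustrative sketch of the retry, assuming a `RefetchFun` closure that re-creates the future from the restarted transaction (names and shapes are not from the actual patch):

```erlang
wait_with_refetch(Future, RefetchFun) ->
    wait_with_refetch(Future, RefetchFun, 2).

wait_with_refetch(Future, _RefetchFun, 0) ->
    erlfdb:wait(Future);
wait_with_refetch(Future, RefetchFun, RetriesLeft) ->
    try
        erlfdb:wait(Future)
    catch
        %% 1025 = transaction_canceled: the future belonged to a
        %% transaction that has since been reset.
        error:{erlfdb_error, 1025} ->
            wait_with_refetch(RefetchFun(), RefetchFun, RetriesLeft - 1)
    end.
```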
Previously transactions could time out, retry, and re-emit the same data. Use the same mechanism as the _list_dbs and _changes feeds to fix it. Additional detail is in the mailing list discussion: https://lists.apache.org/thread.html/r02cee7045cac4722e1682bb69ba0ec791f5cce025597d0099fb34033%40%3Cdev.couchdb.apache.org%3E
* Use the `row/3` helper function in a few more places.
* Make a `run_query/3` function to shorten all the query calls.
* A few minor emilio suggestions (whitespace, comma issues, ...).
Previously it was possible for a database to be re-created while a `Db` handle was open, and the `Db` handle would continue operating on the new db without any error. To avoid that situation, ensure the instance UUID is explicitly checked during open and reopen calls. This includes checking it after the metadata is loaded in `fabric2_fdb:open/2` and when fetching the handle from the cache. Also, create a `{uuid, UUID}` option to allow specifying a particular instance UUID when opening a database. If that instance doesn't exist, raise a `database_does_not_exist` error.
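A rough sketch of that check, assuming the handle carries its instance UUID (field and helper names are illustrative):

```erlang
%% Raise database_does_not_exist if the caller asked for a specific
%% instance UUID and the opened db is a different (re-created) instance.
check_instance_uuid(#{uuid := DbUUID} = Db, Options) ->
    case proplists:get_value(uuid, Options) of
        undefined -> Db;
        DbUUID -> Db;
        _OtherUUID -> erlang:error(database_does_not_exist)
    end.
```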
Check that on transaction restarts a `database_does_not_exist` error is thrown properly if the database was re-created. Also, we forgot to properly unload the mocked erlfdb module in `tx_too_old_mock_erlfdb/0`, so make sure to do that; otherwise it has a chance of messing up subsequent tests.
The e520294 commit inadvertently changed the `fabric2_fdb:refresh/1` head matching to accept db instances with no names. `couch_jobs` uses those, but they are not cached in fabric2_server. So here we return to the previous matching rule, where contexts without names don't get refreshed from the cache.
* Add an info endpoint for design docs stored in FDB.
Previously we were using the DbName to set the DbPrefix, for clarity. In order to support soft-deletion while keeping DbPrefix allocation efficient, we now use a value allocated with erlfdb_hca for the DbPrefix.
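A hedged sketch of the allocation, assuming erlfdb's HCA helper (`erlfdb_hca:create/1`, `erlfdb_hca:allocate/2`); the HCA subspace key is made up for the example, and `?DBS` is the dbs subspace tag mentioned below:

```erlang
allocate_db_prefix(Tx) ->
    %% Allocate a short unique value rather than embedding DbName, so
    %% the name->prefix mapping can be re-pointed for soft-deletion.
    HCA = erlfdb_hca:create(erlfdb_tuple:pack({<<"hca">>})),
    Allocated = erlfdb_hca:allocate(HCA, Tx),
    erlfdb_tuple:pack({?DBS, Allocated}).
```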
This is a more efficient method to get all of the design documents than relying on `fabric2_db:fold_docs`, which doesn't load doc bodies in parallel.
* Avoid a case clause error in `after 0` when the database is deleted.
* Handle db re-creation by checking the instance UUID during `fabric2_db:open/2`. Since we added a few extra arguments, switch to using a map as the State.
Add the db instance id to indexing job data. During indexing ensure the database is opened with the `{uuid, DbUUID}` option. After that any stale db reads in `update/3` will throw the `database_does_not_exist` error. In addition, when the indexing job is re-submitted in `build_view_async/2`, check if it contains a reference to an old db instance id and replace the job. That has to happen since couch_jobs doesn't overwrite job data for running jobs.
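A sketch of the re-submission check; the job-data key and the job type macro are assumptions, not the actual patch:

```erlang
%% If the stored job data references an older db instance, remove and
%% re-add the job, since couch_jobs won't overwrite a running job's data.
maybe_replace_job(Tx, JobId, JobData, DbUUID) ->
    case JobData of
        #{<<"db_uuid">> := DbUUID} ->
            ok;
        #{} ->
            couch_jobs:remove(Tx, ?INDEX_JOB_TYPE, JobId),
            couch_jobs:add(Tx, ?INDEX_JOB_TYPE, JobId,
                JobData#{<<"db_uuid">> => DbUUID})
    end.
```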
After the recent upgrade to using HCA, we forgot to check all the places where the db prefix was constructed, so a few places still used the old pattern of `{?DBS, DbName}`. In the case of `check_metadata_version`, we also have to account for the fact that during db creation there might not be a db_prefix in the `Db` handle yet.
Previously we didn't reset the metadata flag in case of a transaction retry, so we could have used a stale `?PDICT_CHECKED_MD_IS_CURRENT = true` value.
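Illustratively, the reset can live in the transactional wrapper so every retry attempt starts clean (a sketch, not the actual fabric2_fdb code):

```erlang
transactional(Db, Fun) ->
    erlfdb:transactional(Db, fun(Tx) ->
        %% Clear the cached check so a retried transaction re-verifies
        %% the metadata version instead of trusting a stale flag.
        erase(?PDICT_CHECKED_MD_IS_CURRENT),
        Fun(Tx)
    end).
```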
For use by the native CouchDB at-rest encryption feature. The algorithm is from NIST Special Publication 800-38F.
Currently we return a 500, but a 400 return code makes more sense:

```
$ http $DB1/db1/_changes?since=0-1345
HTTP/1.1 400 Bad Request

{
    "error": "invalid_since_seq",
    "reason": "0-1345",
    "ref": 442671026
}
```
Removes the following features from the welcome message:

- reshard
- partitioned
- pluggable-storage-engines
- scheduler

Although `scheduler` at least will presumably be returned once that feature is complete.
Previously, if a database was re-created on another node, a request with that database might have found the previous db instance in the cache. In that case it would have correctly reopened the db while in a transaction, but, because the old db instance was deleted, it would throw a database_does_not_exist error, which was not the correct behavior. To prevent that from happening, introduce an `interactive = true | false` option when opening a database. User requests may specify that option, and then, when the db is re-opened, it will be allowed to automatically upgrade to the new db instance instead of returning an error. Background processes will still get a database_does_not_exist error if they keep open a db which has since been re-created. The interactive option may also be used in the future to set other transaction parameters, like timeouts and retries, that might differ between interactive requests and background tasks.
Add a `local_seq` option to views.
Fixes the case where no writes are done for an index; the rate limiter assumed it was a failure.
Any error there would just generate a case clause error. Remove the `{not_found, missing}` clause, since it was accidentally matching on the Rev string and the case was covered by the `_Else` clause anyway.
During job removal, the job was not cleared from the active area, so active_tasks would mistakenly believe the job still existed. When we then tried to actually open the job data, it was not there and a not_found error would be issued. @nickva found this issue during replication work.
The pagination relied on the id of the document. However, for views it should use a combination of key and id.
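For example, a view pagination bookmark would need to encode both fields (the row shape is assumed for illustration):

```erlang
%% A key alone is ambiguous because many docs can emit the same key;
%% {Key, Id} pins down the exact row to resume from.
view_bookmark(#{key := Key, id := Id}) ->
    couch_util:encodeBase64Url(term_to_binary({Key, Id})).
```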
Empty maps may be useful to initialize the data in some cases, but we don't want to emit an entry in the output with just an empty map. While at it, add some tests to check the basics.
This makes it compatible with CouchDB <= 3.x, where we can create deleted documents. How to check:

```
$ http put $DB1/mydb
$ http put $DB1/mydb/foo _deleted:='true' a=b
{
    "id": "foo",
    "ok": true,
    "rev": "1-ad7eb689fcae75e7a7edb57dc1f30939"
}
```
This allows for batch insertion of keys in order to minimize node serialization and collation costs.
This allows looking up multiple keys simultaneously which reduces the amount of overhead due to node serialization and collation.
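A hedged usage sketch of the two batch operations together; treat the exact ebtree signatures as assumptions:

```erlang
batch_example(Db, Tree0) ->
    %% One call serializes and collates each touched node once, instead
    %% of once per key as repeated single-key insert/lookup calls would.
    Tree1 = ebtree:insert_multi(Db, Tree0, [
        {<<"a">>, 1},
        {<<"b">>, 2},
        {<<"c">>, 3}
    ]),
    Found = ebtree:lookup_multi(Db, Tree1, [<<"a">>, <<"c">>]),
    {Tree1, Found}.
```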
Inner nodes of the B+Tree are now immutable so that they can be cached.
Using lists:umerge/3 adds extra invocations of the collation algorithm because it uses `=<` semantics, while ebtree collations are capable of producing `lt`, `eq`, `gt` results.
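A sketch of a unique merge driven by a three-way collator, so each pair of keys costs a single collation call (the function name and collator shape are assumptions):

```erlang
umerge_collate(_CollateFun, [], B) ->
    B;
umerge_collate(_CollateFun, A, []) ->
    A;
umerge_collate(CollateFun, [X | RestA] = A, [Y | RestB] = B) ->
    case CollateFun(X, Y) of
        lt -> [X | umerge_collate(CollateFun, RestA, B)];
        eq -> [X | umerge_collate(CollateFun, RestA, RestB)];
        gt -> [Y | umerge_collate(CollateFun, A, RestB)]
    end.
```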
This keeps validation during tests but disables the validation during production to avoid the overhead of collation.
Overview

This is prototype/fdb-layer rebased onto master. Let's merge it already.

Testing recommendations

`make check`

Checklist

- rel/overlay/etc/default.ini