No DB shards could be opened - req_err(1577439538) #3733

nickva · 2021-09-08T22:52:30Z

Discussed in #3573

Originally posted by BrodaUa May 19, 2021

Description

I am trying to build a messaging app with CouchDB + Node.js using the nano library. For that, I create a large number of user-message dbs (up to 10k) and listen to the changes feed of each db to track new messages and reroute them accordingly.
Currently, when I create 10k dbs and subscribe to them, I get `No DB shards could be opened` (see the error logs under Additional Context). This issue is linked to a ticket in the couchdb-nano repo: apache/couchdb-nano#267. In short, I previously had a performance issue where nano picked up the latest updates very slowly. After following the advice to increase maxSockets of the HTTP agent, I am now stuck with the `No DB shards could be opened` error.

I increased the file descriptor limit of the Docker container to 128k, but the issue persists.

Steps to Reproduce

create 10k databases
try to subscribe to changes feed
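
For readers who want to reproduce the two steps above without nano, here is a minimal sketch using only the Python standard library. The query parameters mirror the ones visible in the app error log in this report; the database naming scheme (`user-messages-0001`), the helper names, and the unauthenticated base URL are assumptions made for illustration and are not taken from the original setup.

```python
import urllib.parse
import urllib.request

# Query parameters copied from the request shown in the app error log.
CHANGES_PARAMS = {
    "feed": "longpoll",
    "timeout": 60000,
    "since": "now",
    "limit": 100,
    "include_docs": "true",
}


def db_url(base_url, db_name):
    """Build the URL of a database (db_name is percent-encoded)."""
    return base_url.rstrip("/") + "/" + urllib.parse.quote(db_name, safe="")


def changes_url(base_url, db_name):
    """Build the long-poll _changes URL for one database."""
    query = urllib.parse.urlencode(CHANGES_PARAMS)
    return db_url(base_url, db_name) + "/_changes?" + query


def create_db(base_url, db_name):
    """Create a database with PUT /{db}; returns the HTTP status code.

    Raises urllib.error.HTTPError if the server rejects the request,
    for example when the database already exists. Authentication is
    omitted here and would be needed on a secured node.
    """
    request = urllib.request.Request(db_url(base_url, db_name), method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status


def db_names(count):
    """Hypothetical naming scheme: user-messages-0000, user-messages-0001, ..."""
    return ["user-messages-%04d" % i for i in range(count)]
```

Calling `create_db` for each name in `db_names(10000)` and then opening one request per `changes_url` approximates the load described in this report; how many of those requests are in flight at once depends on the client, which this sketch does not model.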

Expected Behaviour

no errors

Your Environment

CouchDB version used: couchdb:3.1 from https://hub.docker.com/_/couchdb/
Browser name and version:
Operating system and version: ubuntu 18.04

Additional Context

App error message:

{"message":" No DB shards could be opened.","level":"error","timestamp":"2021-05-19T12:08:48.160Z","metadata":{"scope":"couch","statusCode":500,"request":{"method":"post","headers":{"content-type":"application/json","accept":"application/json","user-agent":"nano/9.0.3 (Node.js v12.22.1)","Accept-Encoding":"deflate, gzip"},"agent":null,"qsStringifyOptions":{"arrayFormat":"repeat"},"url":"http://XXXXXX:XXXXXX@127.0.0.1:5984/sampleDB/_changes","params":{"feed":"longpoll","timeout":60000,"since":"now","limit":100,"include_docs":true},"data":"{}","maxRedirects":0,"httpAgent":null,"httpsAgent":null},"headers":{"uri":"http://XXXXXX:XXXXXX@127.0.0.1:5984/sampleDB/_changes","statusCode":500,"cache-control":"must-revalidate","connection":"close","content-type":"application/json","date":"Wed, 19 May 2021 12:06:07 GMT","x-couch-request-id":"6165cd9925","x-couch-stack-hash":"1577439538","x-couchdb-body-time":"0"},"errid":"non_200","name":"Error","description":"No DB shards could be opened.","error":"internal_server_error","reason":"No DB shards could be opened.","ref":1577439538,"stack":"Error: No DB shards could be opened.\n at responseHandler (/home/user/project/node_modules/nano/lib/nano.js:175:20)\n at /home/user/project/node_modules/nano/lib/nano.js:405:13\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:97:5)"}}

CouchDB error:

[error] 2021-05-19T12:10:00.576437Z nonode@nohost <0.17889.57> 95d02fadcb req_err(1577439538) internal_server_error : No DB shards could be opened.
[<<"fabric_util:get_shard/4 L111">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric:get_security/2 L183">>,<<"chttpd_auth_request:db_authorization_check/1 L110">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L321">>,<<"chttpd:process_request/1 L306">>]

linked issue apache/couchdb-nano#267


Previously, users with low {Q, N} dbs often got the `"No DB shards could be opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout was too low to open the few available shards, and the whole request would crash with a 500 error.

Attempt to calculate an optimal timeout value based on the number of shards and the max fabric request timeout limit. The sequence of doubling (by default) timeouts forms a geometric progression. Use the well-known closed-form formula for the sum [0], and the maximum request timeout, to calculate the initial timeout. The test case illustrates a few examples with some default Q and N values.

Because we don't want the timeout value to be too low, since it takes time to open shards, and we don't want to quickly cycle through a few initial shards and discard the results, the minimum initial timeout is clipped to the previously hard-coded 100 msec timeout. Unlike previously, however, this minimum value can now also be configured.

Another issue with the previous code was that it emitted a generic error without a specific reason why the shards could not be opened. Timeout was the most likely reason, but to confirm it the user either had to enable debug logging or apply clever Erlang tracing on the `couch_log:debug/2` call. So, as an improvement, pass the reason string into the get_shard/5 recursive call so it can be bubbled up with the error tuple.

[0] https://en.wikipedia.org/wiki/Geometric_series

Fixes: #3733
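
The timeout calculation described in this commit message can be illustrated with a short sketch. This is not the Erlang code from the PR: the function and parameter names are hypothetical, and the number of terms in the series and the rounding are assumptions based only on the description above. It solves the geometric sum T + T*F + ... + T*F^(n-1) = T * (F^n - 1) / (F - 1) for the initial timeout T, then clips the result to a minimum.

```python
def initial_shard_timeout(num_shards, max_timeout_ms, factor=2, min_timeout_ms=100):
    """Return a first-attempt timeout (msec) for opening a shard.

    Assumes one attempt per shard copy, each attempt's timeout being
    `factor` times the previous one, with the total bounded by
    `max_timeout_ms`. All names here are illustrative.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if factor == 1:
        # Degenerate case: every attempt gets the same timeout.
        series_sum = num_shards
    else:
        # Closed form of 1 + F + F^2 + ... + F^(n-1).
        series_sum = (factor ** num_shards - 1) / (factor - 1)
    # Clip to the minimum so the first attempts are not hopelessly short.
    return max(min_timeout_ms, int(max_timeout_ms / series_sum))


def timeout_sequence(num_shards, max_timeout_ms, factor=2, min_timeout_ms=100):
    """List the timeout used for each successive attempt."""
    first = initial_shard_timeout(num_shards, max_timeout_ms, factor, min_timeout_ms)
    return [first * factor ** i for i in range(num_shards)]
```

Under these assumptions, three shard copies and a 60000 msec request limit give a first timeout of 8571 msec instead of 100 msec, while a db with many shard copies falls back to the 100 msec minimum because the unclipped value would be far smaller.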


nickva · 2021-09-09T15:00:22Z

PR merged

nickva mentioned this issue Sep 8, 2021

Improve fabric_util get_db timeout logic #3734

Merged

nickva closed this as completed Sep 9, 2021
