No DB shards could be opened - req_err(1577439538) #3733

Closed
nickva opened this issue Sep 8, 2021 Discussed in #3573 · 1 comment

nickva commented Sep 8, 2021

Discussed in #3573

Originally posted by BrodaUa May 19, 2021

Description

I am trying to build a messaging app with CouchDB and Node.js using the nano library. To do that, I create a large number of per-user message databases (up to 10k) and listen to the changes feed of each database to track new messages and reroute them accordingly.
Currently, when I create 10k databases and subscribe to them, I get `No DB shards could be opened` (please see the attached error log). This issue is linked to a ticket in the couchdb-nano repo: apache/couchdb-nano#267. In short, I previously had a performance issue where nano picked up the latest updates very slowly. After following the advice to increase `maxSockets` of the HTTP agent, I am now stuck with the `No DB shards could be opened` issue.

I tried increasing the file descriptor limit of the Docker container to 128k, but the issue persists.

Steps to Reproduce

  1. create 10k databases
  2. try to subscribe to changes feed
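
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: it swaps nano for the global `fetch` (Node.js 18+), and `COUCH_URL`, the credentials, the `user_msgs_` prefix, and the helper names are all hypothetical. Only the `_changes` query parameters are taken from the error log below.

```javascript
// Sketch only: COUCH_URL, the credentials, and the "user_msgs_" prefix are
// hypothetical placeholders, not values from the original report.
const COUCH_URL = "http://127.0.0.1:5984";
const AUTH = "Basic " + Buffer.from("admin:password").toString("base64");

// Name of the i-th per-user message database (hypothetical naming scheme).
function dbName(i) {
  return `user_msgs_${i}`;
}

// Build a longpoll _changes URL with the parameters seen in the error log.
function changesUrl(base, db) {
  const url = new URL(`${base}/${encodeURIComponent(db)}/_changes`);
  url.searchParams.set("feed", "longpoll");
  url.searchParams.set("timeout", "60000");
  url.searchParams.set("since", "now");
  url.searchParams.set("limit", "100");
  url.searchParams.set("include_docs", "true");
  return url.toString();
}

// Step 1: create `count` databases one at a time.
// Step 2: open a changes feed on every database at once.
// Resolves to the number of subscriptions answered with HTTP 500.
// Nothing runs until reproduce(n) is called explicitly.
async function reproduce(count) {
  const headers = { Authorization: AUTH };
  for (let i = 0; i < count; i++) {
    await fetch(`${COUCH_URL}/${dbName(i)}`, { method: "PUT", headers });
  }
  const feeds = [];
  for (let i = 0; i < count; i++) {
    feeds.push(fetch(changesUrl(COUCH_URL, dbName(i)), { headers }));
  }
  const results = await Promise.allSettled(feeds);
  return results.filter(
    (r) => r.status === "fulfilled" && r.value.status === 500
  ).length;
}
```

Calling `reproduce(10000)` against a test instance would open all feeds simultaneously, which is the load pattern described in this report.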

Expected Behaviour

no errors

Your Environment

Additional Context

App error message:

{"message":" No DB shards could be opened.","level":"error","timestamp":"2021-05-19T12:08:48.160Z","metadata":{"scope":"couch","statusCode":500,"request":{"method":"post","headers":{"content-type":"application/json","accept":"application/json","user-agent":"nano/9.0.3 (Node.js v12.22.1)","Accept-Encoding":"deflate, gzip"},"agent":null,"qsStringifyOptions":{"arrayFormat":"repeat"},"url":"http://XXXXXX:XXXXXX@127.0.0.1:5984/sampleDB/_changes","params":{"feed":"longpoll","timeout":60000,"since":"now","limit":100,"include_docs":true},"data":"{}","maxRedirects":0,"httpAgent":null,"httpsAgent":null},"headers":{"uri":"http://XXXXXX:XXXXXX@127.0.0.1:5984/sampleDB/_changes","statusCode":500,"cache-control":"must-revalidate","connection":"close","content-type":"application/json","date":"Wed, 19 May 2021 12:06:07 GMT","x-couch-request-id":"6165cd9925","x-couch-stack-hash":"1577439538","x-couchdb-body-time":"0"},"errid":"non_200","name":"Error","description":"No DB shards could be opened.","error":"internal_server_error","reason":"No DB shards could be opened.","ref":1577439538,"stack":"Error: No DB shards could be opened.\n at responseHandler (/home/user/project/node_modules/nano/lib/nano.js:175:20)\n at /home/user/project/node_modules/nano/lib/nano.js:405:13\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:97:5)"}}

CouchDB error:

[error] 2021-05-19T12:10:00.576437Z nonode@nohost <0.17889.57> 95d02fadcb req_err(1577439538) internal_server_error : No DB shards could be opened.
[<<"fabric_util:get_shard/4 L111">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric:get_security/2 L183">>,<<"chttpd_auth_request:db_authorization_check/1 L110">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L321">>,<<"chttpd:process_request/1 L306">>]

Linked issue: apache/couchdb-nano#267

nickva added a commit that referenced this issue Sep 8, 2021
Previously, users with low {Q, N} dbs often got the `"No DB shards could be
opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout
was too low to open the few available shards, and the whole request would crash
with a 500 error.

Attempt to calculate an optimal timeout value based on the number of shards and
the max fabric request timeout limit.

The sequence of doubling (by default) timeouts forms a geometric progression.
Use the well-known closed-form formula for the sum [0], and the maximum request
timeout, to calculate the initial timeout. The test case illustrates a few
examples with some default Q and N values.
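
As a worked illustration of that calculation (the symbols are introduced here and do not come from the commit): assume $n$ attempts, an initial timeout $t_0$, a growth factor $r$, and a maximum request timeout $T_{\max}$. Requiring the timeouts to sum to $T_{\max}$ gives the initial value. The numbers in the last line are assumed example values, not confirmed defaults.

```latex
\sum_{k=0}^{n-1} t_0 r^{k} = t_0 \,\frac{r^{n}-1}{r-1} = T_{\max}
\quad\Longrightarrow\quad
t_0 = T_{\max}\,\frac{r-1}{r^{n}-1}, \qquad r \neq 1 .

% Example with assumed values r = 2, n = 3, T_max = 60000 msec:
t_0 = \frac{60000}{2^{3}-1} = \frac{60000}{7} \approx 8571 \text{ msec}
```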

Because we don't want the timeout value to be too low, since it takes time to
open shards, and we don't want to quickly cycle through a few initial shards
and discard the results, the minimum initial timeout is clipped to the
previously hard-coded 100 msec timeout. Unlike before, however, this minimum
value can now also be configured.
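
A minimal sketch of the calculation described above, written in JavaScript purely for illustration (the actual change is in Erlang and may differ in detail). The function name, parameter names, and the 60000 msec maximum are assumptions; only the doubling factor and the 100 msec minimum come from the commit message.

```javascript
// Illustrative only: not the CouchDB implementation.
// Returns the first timeout (msec) such that `shardCount` timeouts, each
// `factor` times the previous one, add up to roughly `maxTimeoutMs`.
function initialShardTimeout(
  shardCount,
  maxTimeoutMs = 60000, // assumed maximum request timeout
  factor = 2,           // doubling by default, per the commit message
  minTimeoutMs = 100    // the previously hard-coded value
) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  // Closed form of 1 + factor + ... + factor^(shardCount - 1).
  const seriesSum =
    factor === 1
      ? shardCount
      : (Math.pow(factor, shardCount) - 1) / (factor - 1);
  const ideal = Math.floor(maxTimeoutMs / seriesSum);
  // Clip to the range [minTimeoutMs, maxTimeoutMs].
  return Math.min(maxTimeoutMs, Math.max(minTimeoutMs, ideal));
}
```

With these assumed defaults, `initialShardTimeout(3)` gives 8571 msec, while `initialShardTimeout(10)` would compute 58 msec and is therefore clipped up to the 100 msec minimum.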

Another issue with the previous code was that it emitted a generic error
without a specific reason why the shards could not be opened. Timeout was the
most likely reason, but to confirm it the user either had to enable debug
logging or apply clever Erlang tracing on the `couch_log:debug/2` call. So as an
improvement, pass the reason string into the get_shard/5 recursive call so it
can be bubbled up with the error tuple.
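
The idea of threading the failure reason through the recursion can be sketched as follows. This is a simplified JavaScript analogy, not the Erlang code: `getShard`, `openShard`, and the result shapes are hypothetical names chosen for the example.

```javascript
// Illustrative analogy only: `openShard` is a hypothetical callback that
// returns { ok: true, db } on success or { ok: false, reason } on failure.
function getShard(shards, openShard, timeoutMs, factor = 2, lastReason = "no shards available") {
  if (shards.length === 0) {
    // Every copy failed: report the most recent specific reason
    // instead of only a generic message.
    return { ok: false, reason: lastReason };
  }
  const [first, ...rest] = shards;
  const result = openShard(first, timeoutMs);
  if (result.ok) {
    return result;
  }
  // Try the next copy with a longer timeout, carrying the reason along.
  return getShard(rest, openShard, timeoutMs * factor, factor, result.reason);
}
```

If every call to `openShard` fails with a reason such as `"timeout"`, the final result carries that reason, so the caller can include it in the error response.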

[0] https://en.wikipedia.org/wiki/Geometric_series

Fixes: #3733
nickva added a commit that referenced this issue Sep 8, 2021
nickva added a commit that referenced this issue Sep 9, 2021
nickva added a commit that referenced this issue Sep 9, 2021

nickva commented Sep 9, 2021

PR merged

@nickva nickva closed this as completed Sep 9, 2021