No DB shards could be opened - req_err(1577439538) #3733
nickva added a commit that referenced this issue on Sep 8, 2021:
Previously, users with low {Q, N} dbs often got the `"No DB shards could be opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout was too low to open the few available shards, and the whole request would crash with a 500 error.

Attempt to calculate an optimal timeout value based on the number of shards and the max fabric request timeout limit. The sequence of doubling (by default) timeouts forms a geometric progression; use the well-known closed-form formula for its sum [0], together with the maximum request timeout, to calculate the initial timeout. The test case illustrates a few examples with some default Q and N values.

Because we don't want the timeout value to be too low (it takes time to open shards, and we don't want to quickly cycle through the first few shards and discard the results), the minimum initial timeout is clipped to the previously hard-coded 100 msec. Unlike before, however, this minimum value can now also be configured.

Another issue with the previous code was that it emitted a generic error without the specific reason why the shards could not be opened. A timeout was the most likely reason, but to confirm this the user either had to enable debug logging or apply clever Erlang tracing on the `couch_log:debug/2` call. As an improvement, thread the reason string through the `get_shard/5` recursive call so it can be bubbled up with the error tuple.

[0] https://en.wikipedia.org/wiki/Geometric_series

Fixes: #3733
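The timeout calculation the commit message describes can be sketched as follows. This is a minimal JavaScript illustration of the geometric-series math, not the actual Erlang code in fabric; the function and parameter names are made up:

```javascript
// Doubling retry timeouts t, t*f, t*f^2, ... over n shard-open attempts
// form a geometric progression whose sum is t * (f^n - 1) / (f - 1).
// Solving sum <= maxRequestTimeoutMs for the first term t gives the
// initial timeout; it is then clipped to the old hard-coded 100 msec
// minimum so the first few shards aren't cycled through too quickly.
function initialTimeout(numShards, maxRequestTimeoutMs, factor = 2, minTimeoutMs = 100) {
  const t = (maxRequestTimeoutMs * (factor - 1)) / (Math.pow(factor, numShards) - 1);
  return Math.max(minTimeoutMs, t);
}
```

For example, with 6 shards (Q=2, N=3) and a 60-second request limit, the initial timeout works out to roughly 952 msec; with many more shards the computed value becomes tiny and clips to the 100 msec floor.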
nickva added a commit that referenced this issue on Sep 8, 2021
nickva added a commit that referenced this issue on Sep 9, 2021
nickva added a commit that referenced this issue on Sep 9, 2021
PR merged
Discussed in #3573
Originally posted by BrodaUa May 19, 2021
Description
I am trying to build a messaging app with CouchDB + NodeJS using the nano lib. For that, I create a large number of user-message dbs (up to 10k) and start listening to the changes feed of each db to track new messages and reroute them accordingly.
Currently, when I try to create 10k dbs and subscribe to them, I get `No DB shards could be opened`; please see the attached error log. This issue is linked to a ticket in the couchdb-nano repo: apache/couchdb-nano#267. In short, I previously had a performance issue where nano was picking up the latest updates very slowly. Following the advice to increase `maxSockets` of the http agent, I am now stuck with the `No DB shards could be opened` issue.
I tried increasing the file descriptor limit of the docker container to 128k, but I still have the issue.
Steps to Reproduce
Expected Behaviour
no errors
Your Environment
Additional Context
App error message:
CouchDB error:
linked issue apache/couchdb-nano#267