Improve fabric_util get_db timeout logic #3734
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previously, users with low {Q, N} dbs often got the
"No DB shards could be opened."
error when the cluster is overloaded. The hard-coded 100 msec timeout was too low to open the few available shards and the whole request would crash with a 500 error.Attempt to calculate an optimal timeout value based on the number of shards and the max fabric request timeout limit.
The sequence of doubling (by default) timeouts forms a geometric progression. Use the well known closed form formula for the sum [0], and the maximum request timeout, to calculate the initial timeout. The test case illustrates a few examples with some default Q and N values.
Because we don't want the timeout value to be too low, since it takes time to open shards, and we don't want to quickly cycle through a few initial shards and discard the results, the minimum initial timeout is clipped to the
previously hard-coded 100 msec timeout. Unlike previously however, this minimum value can now also be configured.
[0] https://en.wikipedia.org/wiki/Geometric_series
Fixes: #3733