Conversation
With `q=8` being the default shard setting, and the Linux default of 1024 file handles per process out of the box on virtually every distro, the maximum number of databases an unmodified user environment could conceivably open simultaneously is 128. (Assuming one view/index per database open as well, that number drops to 64.) When we hit these limits (enfile / emfile Erlang errors) CouchDB starts acting very erratically, leading to lots of scary errors and in some cases inconsistent databases. We have a recent report of mem3 dying as a result, which took the ets tables with it - and this occurred with just 32 databases, with multiple views . In another case a user reported pereceived database corruption (which was actually a combination of error reports + app-server layer errors). While we should recommend ulimit/fs.file-max be bumped in production use, and consequently bumping max_dbs_open as well, we should defend against these kinds of errors for users who compile and install from source (our official distribution option). max_dbs_open = 100 makes this kind of error less likely. This also matches our couchdb-documentation repository, which still reflects a setting of 100 and was never updated to match our increased 500 limit. The alternate approach is for us to better handle enfile / emfile errors and react accordingly, but there are are a significant number of touch points in the code where this can cause issues.
|
Note that tests are failing because a change in a separate repo, couchdb-config is required. See reference above. |
|
For the failing tests I'd just add the dependency update in this PR as a separate commit. Did we just talk about a log message during startup that looked at the max file handle count and added a warning or did we actually add that. And/or is that just a hard thing to do portably for our target platforms? I seem to recall a similar discussion about this issue for the test suite. |
|
The I don't think reducing this value to 100 is a good change, as 100 is quite low for a default. |
|
I will close this PR, but before I do, I'll run some tests as to what we can do to fix this issue. Given recent issues filed I'm not sure what @chewbranca says is true in practice. The especially difficult part is if the system wide If what we're running into is an issue with view shards not db shards, then yeah, we need more. @sagelywizard was working on this in #477 so perhaps we should close that out as well. |
|
also @davisp even if I was to move forward with merging the other repo, it's a circular dependency problem: couchdb-config's test depends on the definition of MAX_DBS_OPEN which is in this repo, and the overall couchdb repo test suite depends on couchdb-config's test suite passing. What a pain. |
|
Best thing I can think of without thinking too hard is to just merge things that aren't this repo to master and then we gate that on the CI on this repo passing? Can always revert. Second option is to pull the hash as a temporary bump here maybe? |
|
Closing until I can confirm what we should do re: @chewbranca 's comments. |
With
q=8being the default shard setting, and the Linux default of1024 file handles per process out of the box on virtually every distro,
the maximum number of databases an unmodified user environment could
conceivably open simultaneously is 128. (Assuming one view/index per
database open as well, that number drops to 64.)
When we hit these limits (enfile / emfile Erlang errors) CouchDB starts
acting very erratically, leading to lots of scary errors and in some
cases inconsistent databases. We have a recent report of mem3 dying as a
result, which took the ets tables with it - and this occurred with just
32 databases, with multiple views . In another case a user reported
pereceived database corruption (which was actually a combination of
error reports + app-server layer errors).
While we should recommend ulimit/fs.file-max be bumped in production
use, and consequently bumping max_dbs_open as well, we should defend
against these kinds of errors for users who compile and install from
source (our official distribution option). max_dbs_open = 100 makes this
kind of error less likely.
This also matches our couchdb-documentation repository, which still
reflects a setting of 100 and was never updated to match our increased
500 limit.
The alternate approach is for us to better handle enfile / emfile errors
and react accordingly, but there are are a significant number of touch
points in the code where this can cause issues.