Skip to content

Reduce default max_dbs_open to 100#763

Closed
wohali wants to merge 1 commit intomasterfrom
reduce_default_max_dbs_open
Closed

Reduce default max_dbs_open to 100#763
wohali wants to merge 1 commit intomasterfrom
reduce_default_max_dbs_open

Conversation

@wohali
Copy link
Member

@wohali wohali commented Aug 20, 2017

With q=8 being the default shard setting, and the Linux default of
1024 file handles per process out of the box on virtually every distro,
the maximum number of databases an unmodified user environment could
conceivably open simultaneously is 128. (Assuming one view/index per
database open as well, that number drops to 64.)

When we hit these limits (enfile / emfile Erlang errors) CouchDB starts
acting very erratically, leading to lots of scary errors and in some
cases inconsistent databases. We have a recent report of mem3 dying as a
result, which took the ets tables with it - and this occurred with just
32 databases, with multiple views . In another case a user reported
pereceived database corruption (which was actually a combination of
error reports + app-server layer errors).

While we should recommend ulimit/fs.file-max be bumped in production
use, and consequently bumping max_dbs_open as well, we should defend
against these kinds of errors for users who compile and install from
source (our official distribution option). max_dbs_open = 100 makes this
kind of error less likely.

This also matches our couchdb-documentation repository, which still
reflects a setting of 100 and was never updated to match our increased
500 limit.

The alternate approach is for us to better handle enfile / emfile errors
and react accordingly, but there are are a significant number of touch
points in the code where this can cause issues.

With `q=8` being the default shard setting, and the Linux default of
1024 file handles per process out of the box on virtually every distro,
the maximum number of databases an unmodified user environment could
conceivably open simultaneously is 128. (Assuming one view/index per
database open as well, that number drops to 64.)

When we hit these limits (enfile / emfile Erlang errors) CouchDB starts
acting very erratically, leading to lots of scary errors and in some
cases inconsistent databases. We have a recent report of mem3 dying as a
result, which took the ets tables with it - and this occurred with just
32 databases, with multiple views . In another case a user reported
pereceived database corruption (which was actually a combination of
error reports + app-server layer errors).

While we should recommend ulimit/fs.file-max be bumped in production
use, and consequently bumping max_dbs_open as well, we should defend
against these kinds of errors for users who compile and install from
source (our official distribution option). max_dbs_open = 100 makes this
kind of error less likely.

This also matches our couchdb-documentation repository, which still
reflects a setting of 100 and was never updated to match our increased
500 limit.

The alternate approach is for us to better handle enfile / emfile errors
and react accordingly, but there are are a significant number of touch
points in the code where this can cause issues.
@wohali
Copy link
Member Author

wohali commented Aug 20, 2017

Note that tests are failing because a change in a separate repo, couchdb-config is required. See reference above.

@davisp
Copy link
Member

davisp commented Aug 21, 2017

For the failing tests I'd just add the dependency update in this PR as a separate commit.

Did we just talk about a log message during startup that looked at the max file handle count and added a warning or did we actually add that. And/or is that just a hard thing to do portably for our target platforms? I seem to recall a similar discussion about this issue for the test suite.

@chewbranca
Copy link
Contributor

The max_dbs_open field is a bit of a misnomer in that it would be more accurate to call it max_shards_open because it tracks the number of open shards, not the number of open dbs. You should not be hitting file open limits with max_dbs_open=500 on a system with file limit of 1024 with just dbs. If we're letting views fall through the cracks then we should fix it so view shards are regulated as well.

I don't think reducing this value to 100 is a good change, as 100 is quite low for a default.

@wohali
Copy link
Member Author

wohali commented Aug 21, 2017

I will close this PR, but before I do, I'll run some tests as to what we can do to fix this issue. Given recent issues filed I'm not sure what @chewbranca says is true in practice.

The especially difficult part is if the system wide fs.file-max setting is too low, we may never hit the couchdb process's file handle limit before hitting the kernel max limit, and we start getting emfile/enfile errors anyway. Again for reference on CentOS/RHEL the default fs.file-max is 4096, so if you're running 3-4 other fd-hungry daemons on the box, even with the default 1024 fd limit, you're going to run into this situation. Perhaps we need to do more probing on this front, os-dependent as much as possible...

If what we're running into is an issue with view shards not db shards, then yeah, we need more. @sagelywizard was working on this in #477 so perhaps we should close that out as well.

@wohali
Copy link
Member Author

wohali commented Aug 21, 2017

also @davisp even if I was to move forward with merging the other repo, it's a circular dependency problem: couchdb-config's test depends on the definition of MAX_DBS_OPEN which is in this repo, and the overall couchdb repo test suite depends on couchdb-config's test suite passing. What a pain.

@davisp
Copy link
Member

davisp commented Aug 21, 2017

Best thing I can think of without thinking too hard is to just merge things that aren't this repo to master and then we gate that on the CI on this repo passing? Can always revert.

Second option is to pull the hash as a temporary bump here maybe?

@wohali
Copy link
Member Author

wohali commented Sep 20, 2017

Closing until I can confirm what we should do re: @chewbranca 's comments.

@wohali wohali closed this Sep 20, 2017
@nickva nickva deleted the reduce_default_max_dbs_open branch December 18, 2019 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants