CouchDB degrades and starts timeouting on all index operations #1341

pzduniak · 2018-05-24T20:20:12Z

Expected Behavior

CouchDB 2.1.1, running in Docker, should start up with ~100 records in its databases and ~14 views. It fails when run as either root or the couchdb user.

Current Behavior

https://gist.github.com/pzduniak/4d4d9c148ee910bb053d9cfdd1b04216
Indexing is never done.

Steps to Reproduce (for bugs)

We're casually running into it using our Dockerfile.
couchdb.zip

I honestly have absolutely no idea how to fix it. As you can see in the Dockerfile, timeout values are high.

Context

Our application waits until CouchDB is done starting up (using /healthcheck.sh) and then tries to create design documents in its database (cubefiles), eventually querying for a user by his username (where everything fails, because indexes can't be built).

Your Environment

Ubuntu 16.04 running Docker 18.03

wohali · 2018-05-24T21:03:52Z

How are you launching your container? What is your host's CPU/RAM/disk setup?

I don't have time to look at your database, but assuming they're straightforward views, you probably don't have enough CPUs allocated. See #1301 for instance.

pzduniak · 2018-05-24T22:43:35Z

> How are you launching your container?

Default Docker image params + my small script, basically the couchdb-docker repo.

> What is your host's CPU/RAM/disk setup?

Google Cloud, 1 vCPU, 3.75GB of RAM, their standard 300GB disk

> you probably don't have enough CPUs allocated.

Are you seriously suggesting that it should take more than 4 minutes to generate views for 15 rows?

pzduniak · 2018-05-24T22:45:51Z

Let me rephrase: the database doesn't start at all. The view functionality is dead. This should not happen. This is some obscure bug with Erlang failing to start up, and I have not modified the scripts in the container enough to warrant any suspicion that it could happen in my code.

pzduniak · 2018-05-24T22:51:38Z

Alright, what the hell, if CouchDB can't even start on a single core (it worked just fine on 1.x), then why is this behavior accepted? Is there absolutely no way to increase the timeouts on the initial handshake or whatever?

wohali · 2018-05-24T23:57:07Z

@pzduniak I'll remind you we have a Code of Conduct at this project. Please choose your words a bit more carefully. You're also getting all of this support for free. We are a volunteer-run project here.

/cc @janl

"The database doesn't start at all" vs. "the view functionality is dead" are two different complaints. If you are able to GET / and get a response back, CouchDB is running.

CouchDB runs inside of Erlang. The JavaScript view functionality is provided by 1 or more additional, forked, external couchjs processes. When CouchDB is busy, Erlang will gladly eat an entire core, meaning you may need to provide an additional core for couchjs.

Further, in 2.x, the default is to split each database into 8 shards on disk. CouchDB 2.x forks a separate couchjs process for each of these shards, meaning your Docker container is running 8 additional, separate processes per design document to build those views. More databases and more design documents mean multiplicatively more external couchjs processes.
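As a rough illustration of that multiplication, here is a back-of-the-envelope sketch. It is not CouchDB's actual scheduling logic; the function name and the example counts are made up for illustration, and only the default of 8 shards comes from the comment above.

```python
def estimated_couchjs_processes(databases: int,
                                design_docs_per_db: int,
                                shards_per_db: int = 8) -> int:
    """Upper-bound estimate: one couchjs process per shard per design document.

    Hypothetical helper, assuming every view build starts at the same time.
    """
    return databases * design_docs_per_db * shards_per_db


# 1 database x 3 design documents x 8 shards = 24 external processes
print(estimated_couchjs_processes(databases=1, design_docs_per_db=3))

# The same workload with a single shard per database
print(estimated_couchjs_processes(databases=1, design_docs_per_db=3, shards_per_db=1))
```

On a 1 vCPU host, all of those processes compete with the Erlang VM for the same core, which is consistent with the timeouts reported here.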

There are ways to alter how many processes can run simultaneously, and ways to adjust the number of shards for each newly created database as well as a global default for newly created databases. You may also wish to pipeline your view builds at startup so that not all of the views are attempted to be built at once (by querying them one at a time). Finally, if you simply want to wait longer, you can increase the os process timeout, and have your startup script try multiple times to GET the results of a view before it gives up.
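The "one view at a time, with retries" idea could look roughly like the sketch below. It assumes a CouchDB reachable over plain HTTP without authentication; the function names, the retry counts, and the design document and view names in the usage note are all hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_get(url: str, timeout: float = 60.0) -> int:
    """Return the HTTP status of a GET, or 0 if the request failed or timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # covers URLError and socket timeouts
        return 0


def warm_views(view_urls: Iterable[str],
               fetch: Callable[[str], int] = http_get,
               attempts: int = 5,
               delay: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Query views strictly one after another; return the URLs that never answered 200."""
    failed = []
    for url in view_urls:
        for _ in range(attempts):
            if fetch(url) == 200:
                break  # this index is built; move on to the next view
            sleep(delay)
        else:
            failed.append(url)  # every attempt failed or timed out
    return failed
```

A startup script might call `warm_views(["http://localhost:5984/cubefiles/_design/users/_view/by_username?limit=0"])` (a made-up design document and view) before starting the application. Querying a view triggers its index build, and `limit=0` keeps the response small.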

If you're looking to reduce CPU requirements, look into whether or not you can replace your views with declarative Mango secondary indexes. With the addition of Mango-powered secondary indexes to CouchDB, JS views are becoming less necessary. As Mango runs inside of the Erlang process, it shares CPU with the main process more reasonably.
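For the "query a user by username" case described in this issue, a Mango equivalent might look like the sketch below. It only builds the requests for CouchDB's `/{db}/_index` and `/{db}/_find` endpoints. The field name `username`, the index name, and the helper names are assumptions; the database name `cubefiles` is taken from the report.

```python
import json
import urllib.request


def _json_post(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mango_index_request(base_url: str, db: str, field: str, name: str) -> urllib.request.Request:
    """Request that creates a JSON (Mango) index on a single field."""
    body = {"index": {"fields": [field]}, "name": name, "type": "json"}
    return _json_post(f"{base_url}/{db}/_index", body)


def mango_find_request(base_url: str, db: str, field: str, value: str) -> urllib.request.Request:
    """Request that looks up one document whose field equals the given value."""
    body = {"selector": {field: value}, "limit": 1}
    return _json_post(f"{base_url}/{db}/_find", body)


# Hypothetical usage: index once at startup, then query by username.
create = mango_index_request("http://localhost:5984", "cubefiles", "username", "by-username")
lookup = mango_find_request("http://localhost:5984", "cubefiles", "username", "alice")
```

Each request can be sent with `urllib.request.urlopen(...)`. Authentication is left out of the sketch.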

pzduniak · 2018-05-25T05:29:58Z

Thank you, limiting the process count seems to work. But:

What's up with the sudden CoC mention? Are we supposed to be G-rated? Or did you think that some message was a personal attack? I just don't get it. I expressed my shock at how CouchDB has diverted from its roots. The same project that my colleague used on Raspberry Pis on farms in Africa suddenly started choking on relatively powerful hardware when it has like 10 views on barely any data, who would've thought? Maybe if 2.x breaks so easily, a dynamic default should be used for the process limit, probably one scaling with the CPU count? For example, Go doesn't spawn "up to 100 processes" when it starts up a hello world app.
There were no docs to assist me. A guy on IRC had to tell me about the existence of something called "fabric" (that I still don't know what it is, I only know that it has a setting that I had to increase), which is nearly unmentioned in the docs (I only found some references in the changelog). Additionally, I feel like there should be a huge disclaimer that says "the software might break if you run it on perfectly fine 2 server cores, please change the following settings to prevent it" somewhere in the docs.
Additionally, it feels like my issue was discarded nearly immediately without anyone looking into the actual configuration. Out of 3 interlinked issues, 1 was someone starving the database on purpose (50MB of RAM, come on), 1 person with a non-insignificant amount of data and 1 person dealing with different behaviour (degradation vs startup issue). Most of the timeouts that you've suggested were already in place.

I still believe that what I reported is a valid issue. CouchDB somehow manages to break its startup process by spawning tens of compute-hungry processes, even if only 1-2 cores are available. Is 2.x only supposed to be run on beefy servers now? A product that I'm working on (where I was forced to use CouchDB to in fact limit resource usage) started out with 1.x running on mobile servers (think notebook CPUs in small boxes) that are used to deliver educational content to students in developing countries. It would be a shame if I had to rewrite the database interaction layer just because it's impossible to both use modern features of CouchDB and have any reliability on slower hardware.

janl · 2018-05-25T08:09:54Z

> I expressed my shock at how CouchDB has diverted from its roots.

Let me express my shock at your analysis of this in light of insufficient data.

As for the CoC mention, I’m not issuing a formal notice yet, but keep the aggressive undertone out of this please. If you don’t get what I mean, let someone who you trust read over what you wrote and see if they can find what we see. I’m not suggesting your feelings aren’t warranted here, but you express them in a way we usually don’t conduct communication in this project. We appreciate your cooperation. In general: if you expect support from someone, best not antagonise them up front, however warranted. We are all doing this in our spare time and getting on our nerves is the easiest way to get ignored here ;) — As for quickly closing issues: it’s just how we do things here to make sure we give everyone the same amount of attention. If you can’t live with your issues being closed while we wait for more info, I can’t help you.

For the technical stuff.

Above anything we don’t recommend running CouchDB in Docker for reasons that have nothing to do with CouchDB and everything to do with the Docker ecosystem not being mature enough to handle low latency network and io-bound applications to a degree that would make us comfortable. That said, there are happy Docker CouchDB users, and I don’t want to discourage them being successful with their configurations.

> CouchDB somehow manages to break its startup process by spawning tens of compute-hungry processes

CouchDB startup succeeds just fine, otherwise you wouldn’t get to the error messages you are seeing.

> There were no docs to assist me

This point is well taken. CouchDB is designed to run on beefy servers and CouchDB 2.x especially so. That doesn’t mean it can’t also run on low-spec hardware, but the defaults are not set that way (unless you follow one of the documented setup paths, which you aren't). What’s lacking are docs on configuring CouchDB 2 for low-spec hardware. I’m sorry that’s the case, and once you work through all this, we’d gladly accept a PR to the docs.

(FWIW: replying to this issue was eating about all the time I can spend on CouchDB for a couple of days, so don’t expect rapid follow-ups here).

pzduniak · 2018-05-25T10:00:34Z

> CouchDB is designed to run on beefy servers and CouchDB 2.x especially so.

Huh, my bad then, seems like the applications that I've worked on where Couch's replication was the perfect solution (replication over intermittent connectivity on "edge deployments" and stuff like that) weren't what the project was made for. Looks like PouchDB took over that role after it matured?

> That doesn’t mean it can’t also run on low-spec hardware, but the defaults are not set that way (unless you follow one of the documented setup paths, which you aren't).

Do you mean http://docs.couchdb.org/en/2.1.1/config/query-servers.html#query_server_config/os_process_limit? I'm struggling to find anything else.
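For reference, the settings named in this thread (the query server process limit, the OS process timeout, and the default shard count for new databases) could be applied through the node-local `_config` HTTP API, roughly as sketched below. The values are hypothetical low-spec guesses rather than recommendations, the node name `nonode@nohost` is an assumption about a single-node container, and the helper name is made up.

```python
import json
import urllib.request
from typing import List, Sequence, Tuple

# (section, key, value) triples; the values are illustrative assumptions only.
LOW_SPEC_SETTINGS: Sequence[Tuple[str, str, str]] = (
    ("query_server_config", "os_process_limit", "4"),  # cap on couchjs processes
    ("couchdb", "os_process_timeout", "60000"),        # milliseconds
    ("cluster", "q", "1"),                             # shards for newly created databases
)


def config_requests(base_url: str,
                    node: str,
                    settings: Sequence[Tuple[str, str, str]] = LOW_SPEC_SETTINGS,
                    ) -> List[urllib.request.Request]:
    """Build one PUT /_node/{node}/_config/{section}/{key} request per setting."""
    requests = []
    for section, key, value in settings:
        url = f"{base_url}/_node/{node}/_config/{section}/{key}"
        requests.append(urllib.request.Request(
            url,
            data=json.dumps(value).encode("utf-8"),  # the body is a JSON string
            headers={"Content-Type": "application/json"},
            method="PUT",
        ))
    return requests


# Hypothetical usage against a local single-node container.
pending = config_requests("http://localhost:5984", "nonode@nohost")
```

Each request would be sent with `urllib.request.urlopen(...)`, with admin credentials added if the server requires them. The same keys can instead go in `local.ini`. The `q` value only affects databases created after the change.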

As much as being overly PC conflicts with my nature (and culture!), whatever, I can do that. Thank you for maintaining the project. I wish you didn't jump to a Code of Conduct as soon as you see something that you don't like. Just correct the person instead of making threats (of a notice). No one's trying to attack you, no one is trying to undermine your competence. Different people talk in different ways; in the end we all want mostly the same thing.

Oh, also, writing a paragraph about your issue workflow might be useful for the CONTRIBUTING.md - I'm used to issues staying open until either of the sides marks them as resolved, unless they're blatantly invalid (or duplicate).

EDIT: That got lost in a few iterations of the comment trying to be PC: sorry if you felt attacked, I usually don't deal with projects with such a strict CoC (if any, usually meritocracy is sufficient).

EDIT2: Oh wow, I just realized how you interpreted my comments. I'm talking about the code. Not about you. Code. I'm surprised by how the code works. Not your comments. The comment about accepting behavior was about the code too. And by closing the issue I feel like you "accept" it. It's a bug for me - it may not be for you. That's cool, I'm not the maintainer, you have the power. I got that it's by design (or lack of safety guards for an edge case), there's no point in offending me too :^)

wohali · 2018-05-25T15:47:19Z

I will open a new ticket on documenting how to successfully configure and run CouchDB on low-spec hardware.

FWIW @chewbranca and I both run CouchDB 2.x just fine on Raspberry Pis (but don't use the JavaScript view engine at all!)

As to bringing up the CoC: The Apache Software Foundation has a saying: Code is community. You act disrespectful to the code, you are acting disrespectful to the community. We have a lot going on here, as you can see from IRC, Slack, our issue tracker, the repositories, the mailing lists, and so on. We are all volunteers. We shouldn't also have to tolerate being yelled at while we try to solve your issues. This isn't about being PC, it's about being polite. Try holding back your anger for a little longer next time - you might find you'll get better help.

wohali closed this as completed May 24, 2018

wohali added docker labels May 24, 2018

wohali mentioned this issue Oct 17, 2022

Document best configuration settings for low-spec HW #4211

Open
