
Erlang native query server consumes too much RAM when filtering a document with many revisions #1209

Closed
mmarinchenko opened this issue Mar 7, 2018 · 7 comments


@mmarinchenko

Hi all! We have an online service that uses CouchDB heavily. It has many databases with frequently updated documents, and we have run into a problem.

When changes are requested with the style=all_docs parameter, the native query server processes a document together with all of its revisions as one single entity. As a result, when a document with very many revisions (including deleted ones) is pushed through a native Erlang filter, CouchDB eats an inordinate amount of memory.

The reason is as follows (the master branch is referenced here just for simplicity; we actually use CouchDB 1.6):

  1. couch_changes:filter() reads all revisions of a document into Docs and calls couch_query_servers:filter_docs().
  2. couch_query_servers:filter_docs() calls couch_native_process:prompt() through ddoc_prompt()/proc_prompt().
  3. couch_native_process:handle_call() then tries to convert the document with all of its revisions via to_binary() at once (see the sketch below)!
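
For illustration, here is a rough paraphrase of that hot path (a condensed sketch, not the literal CouchDB source; the function head and variable names are approximations):

    %% Condensed sketch of the call chain above, not the literal source.
    %% With style=all_docs, Docs holds one #doc{} record per open revision
    %% of the changed document.
    filter_docs(DDoc, FName, Docs, JsonReq) ->
        %% Every revision is serialized up front, into one list...
        JsonDocs = [couch_doc:to_json_obj(Doc, [revs]) || Doc <- Docs],
        %% ...and couch_native_process then walks the whole structure with
        %% its internal to_binary/1 in one pass, so peak memory grows with
        %% the total number of revisions rather than with a single one.
        [true, Passes] = ddoc_prompt(DDoc, [<<"filters">>, FName],
                                     [JsonDocs, JsonReq]),
        {ok, Passes}.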

The filter processes each revision one by one, right? So why try to hold all revisions in memory? And if you can hold all revisions in memory, then why not hold the entire database in memory? :)

Of course, this problem can be worked around from the maintenance perspective: old revisions can be purged from the databases, databases can be compacted, and so on. But that does not mean the current behavior is an example of good software design, I guess.

Expected Behavior

Assume that the RAM in our galaxy is not infinite and that a document may have a great many revisions. Simply put: do not load all revisions at once. After all, each revision is a document in itself.
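
To make that concrete, here is a hypothetical sketch of the shape I mean (an illustration only, not a proposed patch; it reuses the ddoc_prompt() helper mentioned above and trades extra query-server round trips for bounded memory):

    %% Hypothetical sketch: prompt the query server once per revision so
    %% that peak memory is bounded by the largest single revision, not by
    %% the sum of all of them.
    filter_docs_one_by_one(DDoc, FName, Docs, JsonReq) ->
        lists:map(
            fun(Doc) ->
                JsonDoc = couch_doc:to_json_obj(Doc, [revs]),
                %% A one-element list keeps each to_binary() conversion small.
                [true, [Pass]] = ddoc_prompt(DDoc, [<<"filters">>, FName],
                                             [[JsonDoc], JsonReq]),
                Pass
            end,
            Docs).

Batching revisions up to a size cap would amortize the extra prompts, but even the naive one-by-one version keeps memory usage flat.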

Steps to Reproduce

  1. Create a database with a document that has a lot of revisions (if you have a very large amount of RAM, you may try to create 10 000 revisions or even more).
  2. Enable the Erlang native query server in local.ini and write any simple filter in Erlang (the document from step 1 does not need to pass this filter; see the example after this list).
  3. Request the _changes feed through the created Erlang filter with the style=all_docs parameter. To amplify the effect, you can emulate 16 parallel queries as follows:
    for i in {1..16}; do curl -u admin:admin "http://localhost:5984/db/_changes?feed=normal&style=all_docs&filter=app%2FverySimpleErlangFilter" & done
  4. Run htop and watch how CouchDB eats all your memory!
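
For reference, the filter from step 2 can be trivial. On 1.x/2.x the native query server is enabled in local.ini with erlang = {couch_native_process, start_link, []} under [native_query_servers], and the function below would be stored as a string under filters.verySimpleErlangFilter in a _design/app document that also sets "language": "erlang" (names chosen to match the curl command above):

    %% A filter that rejects everything is enough to reproduce the problem:
    %% the memory blow-up happens while CouchDB serializes all revisions to
    %% hand them to this function, i.e. before the filter body even runs.
    fun({_Doc}, {_Req}) ->
        false
    end.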

Environment

  • Version used: 1.6.0
  • Operating System and version: Ubuntu 16.04
@janl
Member

janl commented Mar 7, 2018

Are you setting your revs_limit to 10k as well? Otherwise, creating 1000 revisions will suffice.

And just to clarify, you’re talking about many updates to a doc, but not that many conflicts on the doc?

@mmarinchenko
Author

mmarinchenko commented Mar 12, 2018

Thank you for the response, Janl!

But why are you asking? I've already pointed to the source code :)

I'm talking about any kind of revision: regular, conflicted, or deleted. It doesn't matter in this particular context, although with many conflicts the situation will be even worse, of course. If the style=all_docs parameter is specified in a request along with a native filter, then CouchDB calls to_binary() for each document with all of its revisions. All at once!

I'm not claiming that a document with such a number of revisions (especially conflicts) is in a good state. No. This happened to us by accident, I guess. I'm not responsible for the application side of things, so I don't know why it happened.

I'm just trying to say that loading a document with all of its associated revisions into memory at once, and then converting all of this data into a plain list using to_binary() without checking the size of the loaded data, without any buffering, etc., is a very bad idea for a database management system.

If we look at the changes enumerator, we'll see that the situation is even worse. Imagine that we need to filter several sequential changes involving such a big, bad document through the native query server. CouchDB will load all the revisions and call to_binary() on all of them at once for every single change. To what end?

@wohali
Member

wohali commented Mar 27, 2018

I see this is on 1.6.0. Have you tried this against 1.7.1? Or 2.1.1?

@wohali wohali added the 1.x label Mar 27, 2018
@mmarinchenko
Author

Yes, we tried this on 1.7.1 (built by us for Ubuntu 16.04) and 2.1.1 (from the Apache repository) before digging deeply into the source code. The behavior is the same on all versions.

Later I checked the master branch, and the issue is there too.

@wohali wohali added bug and removed 1.x labels Mar 27, 2018
@nickva
Contributor

nickva commented Nov 13, 2023

This should be fixed in 23e5f92

There is a new query_server_config option, revs_limit, defaulting to 20, which limits the number of branches of heavily conflicted documents.
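
For anyone landing here later, this would sit in the query_server_config section of local.ini; a sketch of the setting spelled out at its default (the section and key names follow the comment above; check your version's docs before relying on them):

    [query_server_config]
    ; cap on how many branches (leaf revisions) of a heavily conflicted
    ; document are handed to the query server at once; defaults to 20
    revs_limit = 20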

@nickva nickva closed this as completed Nov 13, 2023
@mmarinchenko
Author

@nickva

> This should be fixed in 23e5f92

No, this was fixed in 0678742.

@mmarinchenko
Author

Thanks to @rnewson! 👍
