
Erlang native query server consumes too much RAM when filtering a document with many revisions #1209

Closed
mmarinchenko opened this issue Mar 7, 2018 · 7 comments


@mmarinchenko

Hi all! We have an online service that uses CouchDB heavily. It has many databases with frequently updated documents, and we have run into a problem.

When changes are requested with the style=all_docs parameter, the native query server processes a document together with all of its revisions as one single entity. As a result, when a document with very many revisions (including deleted ones) is pushed through a native Erlang filter, CouchDB eats an inordinate amount of memory.

The reason is as follows (the master branch is referenced here just for simplicity; we actually use CouchDB 1.6):

  1. couch_changes:filter() reads all revisions of a document into Docs and calls couch_query_servers:filter_docs().
  2. couch_query_servers:filter_docs() calls couch_native_process:prompt() through ddoc_prompt()/proc_prompt().
  3. couch_native_process:handle_call() then tries to convert the document with all of its revisions via to_binary() at once (see the sketch below)!
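
For illustration, here is a rough paraphrase of that hot path (a condensed sketch, not the literal CouchDB source; the function head and variable names are approximations):

    %% Condensed sketch of the call chain above, not the literal source.
    %% With style=all_docs, Docs holds one #doc{} record per open revision
    %% of the changed document.
    filter_docs(DDoc, FName, Docs, JsonReq) ->
        %% Every revision is serialized up front, into one list...
        JsonDocs = [couch_doc:to_json_obj(Doc, [revs]) || Doc <- Docs],
        %% ...and couch_native_process then walks the whole structure with
        %% its internal to_binary/1 in one pass, so peak memory grows with
        %% the total number of revisions rather than with a single one.
        [true, Passes] = ddoc_prompt(DDoc, [<<"filters">>, FName],
                                     [JsonDocs, JsonReq]),
        {ok, Passes}.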

The filter processes each revision one by one, right? So why try to hold all revisions in memory? And if you can hold all revisions in memory, then why not hold the entire database in memory? :)

Of course, this problem can be worked around from the maintenance perspective: old revisions can be purged from the databases, databases can be compacted, and so on. But that does not mean the current behavior is an example of good software design, I guess.

Expected Behavior

Assume that the RAM in our galaxy is not infinite and that a document may have a great many revisions. Simply put: do not load all revisions at once. After all, each revision is a document in itself.
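
To make that concrete, here is a hypothetical sketch of the shape I mean (an illustration only, not a proposed patch; it reuses the ddoc_prompt() helper mentioned above and trades extra query-server round trips for bounded memory):

    %% Hypothetical sketch: prompt the query server once per revision so
    %% that peak memory is bounded by the largest single revision, not by
    %% the sum of all of them.
    filter_docs_one_by_one(DDoc, FName, Docs, JsonReq) ->
        lists:map(
            fun(Doc) ->
                JsonDoc = couch_doc:to_json_obj(Doc, [revs]),
                %% A one-element list keeps each to_binary() conversion small.
                [true, [Pass]] = ddoc_prompt(DDoc, [<<"filters">>, FName],
                                             [[JsonDoc], JsonReq]),
                Pass
            end,
            Docs).

Batching revisions up to a size cap would amortize the extra prompts, but even the naive one-by-one version keeps memory usage flat.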

Steps to Reproduce

  1. Create a database with a document that has a lot of revisions (if you have a very large amount of RAM, you may try to create 10 000 revisions or even more).
  2. Enable the Erlang native query server in local.ini and write any simple filter in Erlang (the document from step 1 does not need to pass this filter; see the example after this list).
  3. Request the _changes feed through the created Erlang filter with the style=all_docs parameter. To amplify the effect, you can emulate 16 parallel queries as follows:
    for i in {1..16}; do curl -u admin:admin "http://localhost:5984/db/_changes?feed=normal&style=all_docs&filter=app%2FverySimpleErlangFilter" & done
  4. Run htop and watch how CouchDB eats all your memory!
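
For reference, the filter from step 2 can be trivial. On 1.x/2.x the native query server is enabled in local.ini with erlang = {couch_native_process, start_link, []} under [native_query_servers], and the function below would be stored as a string under filters.verySimpleErlangFilter in a _design/app document that also sets "language": "erlang" (names chosen to match the curl command above):

    %% A filter that rejects everything is enough to reproduce the problem:
    %% the memory blow-up happens while CouchDB serializes all revisions to
    %% hand them to this function, i.e. before the filter body even runs.
    fun({_Doc}, {_Req}) ->
        false
    end.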

Environment

  • Version used: 1.6.0
  • Operating System and version: Ubuntu 16.04
@janl
Member

janl commented Mar 7, 2018

Are you setting your revs_limit to 10k as well? Otherwise, creating 1000 revisions will suffice.

And just to clarify, you’re talking about many updates to a doc, but not that many conflicts on the doc?

@mmarinchenko
Author

mmarinchenko commented Mar 12, 2018

Thank you for the response, Janl!

But why are you asking? I've already pointed to the source code :)

I'm talking about any kind of revision: regular, conflicted, or deleted. It doesn't matter in this particular context, although with many conflicts the situation will be even worse, of course. If the style=all_docs parameter is specified in a request along with a native filter, then CouchDB calls to_binary() for each document with all of its revisions. All at once!

I'm not claiming that a document with such a number of revisions (especially conflicts) is in a good state. No. This happened to us by accident, I guess. I'm not responsible for the application side of things, so I don't know why it happened.

I'm just trying to say that loading a document with all of its associated revisions into memory at once, and then converting all of this data into a plain list using to_binary() without checking the size of the loaded data, without any buffering, etc., is a very bad idea for a database management system.

If we look at the changes enumerator, we'll see that the situation is even worse. Imagine that we need to filter several sequential changes involving such a big, bad document through the native query server. CouchDB will load all the revisions and call to_binary() on all of them at once for every single change. To what end?

@wohali
Member

wohali commented Mar 27, 2018

I see this is on 1.6.0. Have you tried this against 1.7.1? Or 2.1.1?

@wohali wohali added the 1.x label Mar 27, 2018
@mmarinchenko
Author

Yes, we tried this on 1.7.1 (built by us for Ubuntu 16.04) and 2.1.1 (from the Apache repository) before digging deeply into the source code. The behavior is the same on all versions.

Later I checked the master branch, and the issue is there too.

@wohali wohali added bug and removed 1.x labels Mar 27, 2018
@nickva
Contributor

nickva commented Nov 13, 2023

This should be fixed in 23e5f92

There is a new query_server_config option, revs_limit, defaulting to 20, which limits the number of branches of heavily conflicted documents.
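
For anyone landing here later, this would sit in the query_server_config section of local.ini; a sketch of the setting spelled out at its default (the section and key names follow the comment above; check your version's docs before relying on them):

    [query_server_config]
    ; cap on how many branches (leaf revisions) of a heavily conflicted
    ; document are handed to the query server at once; defaults to 20
    revs_limit = 20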

@nickva nickva closed this as completed Nov 13, 2023
@mmarinchenko
Author

@nickva

> This should be fixed in 23e5f92

No, this was fixed in 0678742.

@mmarinchenko
Author

Thanks to @rnewson! 👍
