Document query_response option #628

tkoeppen · 2020-08-05T19:03:54Z

Description

When querying all documents from a database, the HTTP client sometimes receives an incomplete result without any indication that not all data could be read from the database.

We send the same request several times a day, and the response nearly always contains the expected 10k documents.

From time to time (every other week), a response to the same query contains only, e.g., 8k documents instead of the expected 10k.
When this happens, we see the message "aborted" in couchdb.log, followed shortly by the error "timeout".

The CouchDB instance contains approx. 100 databases, but the problem occurs only occasionally, and only in 3 of them.
Each affected database contains approx. 10k docs (each doc approx. 3 KB).

We run the same application in multiple environments but see this behaviour in only one of them.
We can run thousands of identical requests over several days without it happening.

This happens in a clustered CouchDB.

Steps to Reproduce

We cannot reproduce this issue systematically.
We send a request like
curl "http://[user]:[pw]@localhost/database1/_all_docs?include_docs=true"

Every other week (not at a specific time) we find in couchdb.log the related error.

We are not sure how to trace the "aborted" symptom further inside CouchDB; we may add tracing to the Go driver to learn more and to decide whether this is a CouchDB issue or a client driver issue.

Expected Behaviour

The response should always contain all documents instead of stopping in the middle.
When a CouchDB query runs into "aborted", wouldn't it be better to return an HTTP 500 status instead of HTTP 200? (That way, the Go driver in our case could handle the error.)

Your Environment

{
  "couchdb": "Welcome",
  "version": "3.1.0",
  "git_sha": "ff0feea20",
  "uuid": "5efc7bd500f26ded5b4119e82fa659cc",
  "features": [
    "access-ready",
    "partitioned",
    "pluggable-storage-engines",
    "reshard",
    "scheduler"
  ],
  "vendor": {
    "name": "The Apache Software Foundation"
  }
}

CouchDB version used: 3.1.0
Browser name and version: Not applicable
Operating system and version: couchdb-docker on CentOS 7

Additional Context

We upgraded to 3.1.0 shortly after it was released, so we cannot tell whether this behaviour started only then.
At the times when it happened, no other unusual errors or activity appeared in couchdb.log.
CPU and memory usage show no major anomalies either.

Lines from couchdb.log

[notice] 2020-07-23T03:00:45.960786Z couchdb@x.x.x.x <0.10328.1588> 3ece2982af couchdb-service:5984 1.2.3.4 user GET /database1/_all_docs?include_docs=true 200 aborted 17983
[error] 2020-07-23T03:00:45.960831Z couchdb@x.x.x.x <0.10328.1588> 3ece2982af Response abnormally terminated: timeout

[notice] 2020-08-04T03:00:30.455723Z couchdb@x.x.x.x <0.4075.5619> 9f5b0c41b5 couchdb-service:5984 1.2.3.4 user GET /database2/_all_docs?include_docs=true 200 aborted 20067
[error] 2020-08-04T03:00:30.455758Z couchdb@x.x.x.x <0.4075.5619> 9f5b0c41b5 Response abnormally terminated: timeout


denyeart · 2020-10-18T03:41:01Z

We see the same intermittent problem in CouchDB 3.1.0 (single node). _all_docs requests are returning fine (1000 records each in about 300ms) and then all of a sudden we get a 200 aborted on the next similar query after about 5 seconds:

[notice] 2020-10-17T07:22:18.239062Z nonode@nohost <0.24353.1500> d1c70dc3f1 couchdb-peer0-org1:5984 172.30.188.173 user GET /testorgschannel0_mapcc$$hcollection$simple/_all_docs?attachments=true&include_docs=true&inclusive_end=false&limit=1001&startkey=%22l%2Bxj9Mibwp%2F%2FQ9ZWKb49wURav%2FV7Qll4YkWogbSwhrQ%3D%22 200 aborted 5219
[error] 2020-10-17T07:22:18.239114Z nonode@nohost <0.24353.1500> d1c70dc3f1 Response abnormally terminated: timeout

A second later another query fails with a different error:

[error] 2020-10-17T07:22:19.131115Z nonode@nohost <0.15619.1999> 39d3b723e2 req_err(1577439538) internal_server_error : No DB shards could be opened.
    [<<"fabric_util:get_shard/4 L111">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric:get_security/2 L183">>,<<"chttpd_auth_request:db_authorization_check/1 L110">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L321">>,<<"chttpd:process_request/1 L306">>]

And then subsequent queries are fine again.

I agree that CouchDB should not return a 200 status for an error response.

denyeart · 2020-10-19T16:23:49Z

After a discussion on the CouchDB Slack, it appears that v3.1.1 adds a new API option, ?buffer_response=true, for view queries, which ensures that the response is complete before a 200 is returned. If it is not set, the response is streamed by default: the 200 is sent as soon as streaming starts, and the response may still abort later. Given this enhancement, I think we could close this issue (although it would be good to update the API docs to describe this behavior).
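A minimal Go sketch of opting in per request, assuming CouchDB 3.1.1 or later and that _all_docs honours buffer_response like other view queries. The helper name allDocsURL and the localhost base URL are hypothetical; credentials are left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// allDocsURL (hypothetical helper) builds an _all_docs URL that asks CouchDB
// to buffer the whole response before sending the status line.
// Assumes a simple database name without '/' or other reserved characters.
func allDocsURL(base, db string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/" + db + "/_all_docs"

	q := url.Values{}
	q.Set("include_docs", "true")
	// Assumption: accepted per request from CouchDB 3.1.1 on.
	q.Set("buffer_response", "true")
	u.RawQuery = q.Encode() // Encode sorts the parameters by key
	return u.String(), nil
}

func main() {
	s, err := allDocsURL("http://localhost:5984", "database1")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The trade-off is that the server has to hold the complete result before replying, so time to first byte and memory use grow with the response size.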

wohali · 2021-03-29T20:33:47Z

FYI this is also available as a server-wide configuration, [chttpd] buffer_response.
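To turn the setting on at runtime, it can be written through CouchDB's node configuration API (PUT /_node/{node-name}/_config/{section}/{key}, with the value sent as a JSON string). The Go sketch below only builds that request; the helper name, base URL and credentials are placeholders. The config API changes a single node, so in a cluster it would need to be applied to every node, or buffer_response = true placed under [chttpd] in each node's local.ini.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newBufferResponseConfigRequest (hypothetical helper) builds a PUT that sets
// [chttpd] buffer_response = true on one node. "_local" addresses the node
// that receives the request. The body must be a JSON string, hence the quotes.
func newBufferResponseConfigRequest(base, nodeName string) (*http.Request, error) {
	endpoint := base + "/_node/" + nodeName + "/_config/chttpd/buffer_response"
	req, err := http.NewRequest(http.MethodPut, endpoint, strings.NewReader(`"true"`))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newBufferResponseConfigRequest("http://localhost:5984", "_local")
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("admin", "password") // placeholder admin credentials
	// Sending it would be: resp, err := http.DefaultClient.Do(req)
	fmt.Println(req.Method, req.URL.String())
}
```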

janl transferred this issue from apache/couchdb Mar 14, 2021

janl changed the title from "GET documents aborted and timeout - query all documents does not return all data" to "Document query_response option" Mar 14, 2021

wohali mentioned this issue Mar 29, 2021

Add [chttpd] buffer_response option #645

Merged

wohali closed this as completed in #645 Mar 30, 2021
