This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

Document query_response option #628

Closed
tkoeppen opened this issue Aug 5, 2020 · 3 comments · Fixed by #645

Comments

tkoeppen commented Aug 5, 2020

Description

When querying all documents from a database, the HTTP client sometimes gives no indication that not all data could be read from the database.

We send the same request several times a day, and the response nearly always contains the expected 10k documents.

From time to time (about every other week), a response to the same query contains, e.g., only 8k documents instead of the expected 10k.
When this happens, we see the message "aborted" in couchdb.log, followed shortly afterwards by the error "timeout".

The CouchDB instance contains approx. 100 databases, but this happens from time to time for only 3 of them.
The affected databases each contain approx. 10k docs (1 doc is approx. 3k bytes).

We run the same application in multiple environments but find this behaviour in one environment only.
We can run thousands of the same request and it does not happen for multiple days.

This happens in a clustered CouchDB.

Steps to Reproduce

We cannot reproduce this issue systematically.
We send a request like
curl "http://[user]:[pw]@localhost/database1/_all_docs?include_docs=true"

Every other week (not at a specific time) we find the related error in couchdb.log.

We are not sure how to trace the "aborted" symptom further inside CouchDB. We may add tracing to the go-driver to find out more and to decide whether this is a CouchDB issue or a client driver issue.

Expected Behaviour

  1. The response should always contain all documents without stopping in the middle of the response.
  2. When a CouchDB query runs into "aborted", wouldn't it be better for the HTTP response status to be 500 instead of 200? (Then, in our case, the go-driver could handle the error.)
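
Until the server reports such failures through the status code, a client can check the body itself. The following is a minimal sketch, not the go-driver's behaviour: it assumes the usual _all_docs response shape (total_rows, offset, rows), a request without limit, keys, startkey or endkey, and that a cut-off stream leaves either invalid JSON or fewer rows than total_rows reports. The names IncompleteResponse and parse_all_docs are hypothetical.

```python
import json


class IncompleteResponse(Exception):
    """Raised when an _all_docs body looks truncated (hypothetical helper type)."""


def parse_all_docs(body: bytes) -> list:
    """Return the rows of an unrestricted _all_docs response, or raise.

    Assumption: without limit/keys/startkey/endkey, a complete response
    carries exactly total_rows - offset rows.
    """
    try:
        payload = json.loads(body)
    except ValueError as exc:  # JSONDecodeError is a ValueError subclass
        raise IncompleteResponse("body is not valid JSON, likely cut off") from exc

    rows = payload["rows"]
    offset = payload.get("offset") or 0  # offset may be null in some responses
    expected = payload["total_rows"] - offset
    if len(rows) != expected:
        raise IncompleteResponse(f"got {len(rows)} rows, expected {expected}")
    return rows
```

A caller would pass the fully read response body to parse_all_docs and retry the request when IncompleteResponse is raised.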

Your Environment

{
  "couchdb": "Welcome",
  "version": "3.1.0",
  "git_sha": "ff0feea20",
  "uuid": "5efc7bd500f26ded5b4119e82fa659cc",
  "features": [
    "access-ready",
    "partitioned",
    "pluggable-storage-engines",
    "reshard",
    "scheduler"
  ],
  "vendor": {
    "name": "The Apache Software Foundation"
  }
}
  • CouchDB version used: 3.1.0
  • Browser name and version: Not applicable
  • Operating system and version: couchdb-docker on CentOS 7

Additional Context

We upgraded to 3.1.0 shortly after it was released, but cannot prove whether we have seen this behaviour only since then.
At the times when it happens, no other unusual errors or activities are seen in couchdb.log.
CPU and memory also do not show major anomalies.

Lines from couchdb.log

[notice] 2020-07-23T03:00:45.960786Z couchdb@x.x.x.x <0.10328.1588> 3ece2982af couchdb-service:5984 1.2.3.4 user GET /database1/_all_docs?include_docs=true 200 aborted 17983
[error] 2020-07-23T03:00:45.960831Z couchdb@x.x.x.x <0.10328.1588> 3ece2982af Response abnormally terminated: timeout

[notice] 2020-08-04T03:00:30.455723Z couchdb@x.x.x.x <0.4075.5619> 9f5b0c41b5 couchdb-service:5984 1.2.3.4 user GET /database2/_all_docs?include_docs=true 200 aborted 20067
[error] 2020-08-04T03:00:30.455758Z couchdb@x.x.x.x <0.4075.5619> 9f5b0c41b5 Response abnormally terminated: timeout
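
To find such events without reading the log by hand, the notice lines can be matched with a regular expression. This is a rough sketch based only on the line layout shown above. The field names, and the reading of the trailing number as request time in milliseconds, are assumptions, and find_aborted is a hypothetical helper.

```python
import re

# Matches request notice lines that end in "<status> aborted <number>".
ABORTED = re.compile(
    r"^\[notice\] (?P<ts>\S+) (?P<node>\S+) \S+ (?P<req_id>\S+) .* "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) aborted (?P<elapsed>\d+)$"
)


def find_aborted(lines):
    """Return one dict per aborted request found in the given log lines."""
    hits = []
    for line in lines:
        match = ABORTED.match(line.strip())
        if match:
            hit = match.groupdict()
            hit["status"] = int(hit["status"])
            hit["elapsed"] = int(hit["elapsed"])  # assumed to be milliseconds
            hits.append(hit)
    return hits
```

Called as find_aborted(open("couchdb.log")), this yields the timestamp, request id, path and status of each aborted request. The request id can then be used to look up the matching [error] line.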
@denyeart

We see the same intermittent problem in CouchDB 3.1.0 (single node). _all_docs requests are returning fine (1000 records each in about 300ms) and then all of a sudden we get a 200 aborted on the next similar query after about 5 seconds:

[notice] 2020-10-17T07:22:18.239062Z nonode@nohost <0.24353.1500> d1c70dc3f1 couchdb-peer0-org1:5984 172.30.188.173 user GET /testorgschannel0_mapcc$$hcollection$simple/_all_docs?attachments=true&include_docs=true&inclusive_end=false&limit=1001&startkey=%22l%2Bxj9Mibwp%2F%2FQ9ZWKb49wURav%2FV7Qll4YkWogbSwhrQ%3D%22 200 aborted 5219
[error] 2020-10-17T07:22:18.239114Z nonode@nohost <0.24353.1500> d1c70dc3f1 Response abnormally terminated: timeout

A second later another query fails with a different error:

[error] 2020-10-17T07:22:19.131115Z nonode@nohost <0.15619.1999> 39d3b723e2 req_err(1577439538) internal_server_error : No DB shards could be opened.
    [<<"fabric_util:get_shard/4 L111">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric_util:get_shard/4 L128">>,<<"fabric:get_security/2 L183">>,<<"chttpd_auth_request:db_authorization_check/1 L110">>,<<"chttpd_auth_request:authorize_request/1 L19">>,<<"chttpd:handle_req_after_auth/2 L321">>,<<"chttpd:process_request/1 L306">>]

And then subsequent queries are fine again.

I agree that CouchDB should not return a 200 status for an error response.

@denyeart

After a discussion on the CouchDB Slack, it appears that v3.1.1 adds a new API option for view queries, ?buffer_response=true, which ensures that the response is complete before a 200 is returned. If it is not set, the response is streamed by default, in which case a 200 is returned while the response is still streaming back (and may eventually abort). Given the new enhancement, I think we could close this issue (although it would be good to get the API docs updated to describe this behavior).
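
As an illustration of the option described above, the parameter can be appended to an existing query URL. This is only a sketch: with_buffer_response is a hypothetical helper, and the URL is the one from the original report with credentials omitted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_buffer_response(url: str) -> str:
    """Return url with buffer_response=true set exactly once in its query."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "buffer_response"  # drop any earlier setting
    ]
    query.append(("buffer_response", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_buffer_response(
    "http://localhost/database1/_all_docs?include_docs=true"
)
```

With buffering requested, a client should be able to rely on the status code, because a 200 is then only sent once the response is complete.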

janl transferred this issue from apache/couchdb on Mar 14, 2021
janl changed the title from "GET documents aborted and timeout - query all documents does not return all data" to "Document query_response option" on Mar 14, 2021
@wohali
Member

wohali commented Mar 29, 2021

FYI this is also available as a server-wide configuration, [chttpd] buffer_response.
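
One way to apply the server-wide setting at runtime is CouchDB's node configuration endpoint. This is a hedged sketch, not an official recipe: the helper name and base URL are placeholders, admin credentials are needed and left out here, and the request is only built, not sent. Setting buffer_response = true under [chttpd] in the server's ini configuration should have the same effect.

```python
import json
from urllib.request import Request


def buffer_response_config_request(base_url: str, enabled: bool = True) -> Request:
    """Build a PUT request that sets [chttpd] buffer_response on the local node."""
    value = "true" if enabled else "false"
    return Request(
        url=f"{base_url.rstrip('/')}/_node/_local/_config/chttpd/buffer_response",
        data=json.dumps(value).encode("utf-8"),  # config values are JSON strings
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder base URL; pass the request to urllib.request.urlopen to send it.
request = buffer_response_config_request("http://localhost:5984")
```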
