Description
During backups we create a JSON dump of each database with GET [DB]/_all_docs?include_docs=true&attachments=true, so that backups are searchable and stored in a binary-neutral format. Once a document carries large attachments, this request times out.
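For reference, the backup request can be sketched in Python roughly as follows. This is a minimal sketch, not the actual backup tooling: the function names, the base URL and the client-side timeout value are assumptions, and authentication is left out.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlencode


def all_docs_backup_url(base_url: str, db_name: str) -> str:
    """Build the _all_docs URL that returns every document with inline attachments."""
    query = urlencode({"include_docs": "true", "attachments": "true"})
    return f"{base_url.rstrip('/')}/{quote(db_name, safe='')}/_all_docs?{query}"


def dump_database(base_url: str, db_name: str, out_path: str, timeout_s: float = 600.0) -> None:
    """Stream the JSON dump to a file without holding it all in memory.

    timeout_s is a client-side socket timeout (an assumption of this sketch);
    it does not change any server-side fabric timeout.
    """
    url = all_docs_backup_url(base_url, db_name)
    with urllib.request.urlopen(url, timeout=timeout_s) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Calling `dump_database("http://localhost:5984", "u_82fde7554b", "backup.json")` against a database with large attachments should exercise the same code path as the failing request in the log below.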
Steps to Reproduce
- Insert a document into a database.
- Using Fauxton, add one or two large attachments to it. Failures start at around 2 MB and are almost certain by about 40 MB; my most recent test used a 6 MB PNG and a 30 MB PDF.
- GET [DB]/_all_docs?include_docs=true&attachments=true now times out after 5 seconds.
- Tweaking the new fabric (and related) timeout settings and maximum sizes seems to have no effect.
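The steps above might also be scripted instead of going through Fauxton. The sketch below only builds a document body with one inline (base64) attachment of a chosen size; the helper name and the use of random bytes instead of a real PNG or PDF are assumptions. The resulting JSON would then be sent with PUT to [DB]/[doc id].

```python
import base64
import json
import os


def doc_with_attachment(doc_id: str, name: str, size_bytes: int,
                        content_type: str = "application/octet-stream") -> dict:
    """Return a document body carrying one inline attachment of size_bytes random bytes."""
    payload = os.urandom(size_bytes)  # stand-in for a real file (assumption)
    return {
        "_id": doc_id,
        "_attachments": {
            name: {
                "content_type": content_type,
                # CouchDB expects inline attachment data as base64 text
                "data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }


if __name__ == "__main__":
    body = json.dumps(doc_with_attachment("bigdoc", "blob.bin", 30 * 1024 * 1024))
    print(f"request body is {len(body)} characters")
```

Server-side request or document size limits may reject a large inline upload; in that case the standalone attachment endpoint (PUT [DB]/[doc id]/[attachment name]) is probably the better route.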
[notice] 2019-09-26T15:07:49.207007Z couchdb@couchregistrar <0.29645.4> f02116e9b2 couchregistrar:5984 10.1.3.1 adminuser GET /u_82fde7554b/_all_docs?include_docs=true&attachments=true 200 aborted 5096
[error] 2019-09-26T15:07:49.207246Z couchdb@couchregistrar <0.29645.4> f02116e9b2 Response abnormally terminated: timeout
Raising the per-item timeout passed to rexi_utils:recv (from 5000 to 50000 in the patch below) seems to be necessary, since a single large attachment can take several seconds:
diff --git a/src/fabric/src/fabric_view_all_docs.erl b/src/fabric/src/fabric_view_all_docs.erl
index 30c8e8d51..198f29541 100644
--- a/src/fabric/src/fabric_view_all_docs.erl
+++ b/src/fabric/src/fabric_view_all_docs.erl
@@ -125,7 +125,7 @@ go(DbName, _Options, Workers, QueryArgs, Callback, Acc0) ->
         update_seq = case UpdateSeq of true -> []; false -> nil end
     },
     case rexi_utils:recv(Workers, #shard.ref, fun handle_message/3,
-        State, infinity, 5000) of
+        State, infinity, 50000) of
         {ok, NewState} ->
             {ok, NewState#collector.user_acc};
         {timeout, NewState} ->
Expected Behaviour
- It should be possible to set one timeout in the configuration and have every timeout that is not set explicitly fall back to it. As it stands, each newly introduced timeout has to be tuned separately, or existing systems start failing after an update depending on fairly arbitrary specifics of their data and infrastructure.
- The log should report which timeout was hit. "timeout" can now refer to more than a dozen configurable settings, plus a few that cannot be configured.
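To illustrate the two requests above, here is a minimal Python sketch of the fallback behaviour and of an error that names the timeout. It is not CouchDB code: the key names (`default_timeout`, `all_docs_timeout`), the built-in default and the exception class are all hypothetical.

```python
from typing import Mapping, Optional


class NamedTimeout(Exception):
    """Timeout error that records which setting was exceeded (hypothetical)."""

    def __init__(self, name: str, limit_ms: int) -> None:
        super().__init__(f"timeout: {name} exceeded ({limit_ms} ms)")
        self.name = name
        self.limit_ms = limit_ms


def resolve_timeout(config: Mapping[str, str], name: str,
                    default_key: str = "default_timeout",
                    builtin_ms: int = 60000) -> Optional[int]:
    """Look up a specific timeout, then the shared default, then a built-in value.

    Returns milliseconds, or None when the configured value is "infinity".
    """
    for key in (name, default_key):
        raw = config.get(key)
        if raw is not None:
            return None if raw == "infinity" else int(raw)
    return builtin_ms
```

With this scheme, setting only `default_timeout = 50000` would also cover a per-item timeout that has no explicit setting, and a log line built from `NamedTimeout` would say which limit was hit.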
Your Environment
- CouchDB Version used: 2.3.1-17f8c2265
- Browser name and version: Chrome 75
- Operating System and version: Debian 9 (fix tested on an amd64 node; bug seen on both amd64 and arm7 nodes)
Additional context
#2153 ("Fix badmatch in fabric_view_all_docs") looks similar, but seems to concern the call with key arguments.