Strange behavior in views and when listing all documents, possible error -> mfa: fabric_rpc:all_docs/3 error:invalid_compression #2307

Closed
af-silva opened this issue Nov 11, 2019 · 6 comments


@af-silva

Description

I will try to explain as best I can the behavior CouchDB is displaying right now. I have a database with 111455 documents and 34 views configured. It was working until a few days ago; now we are experiencing problems accessing some of the views and listing all the documents (one thing to keep in mind is that some of the views still work).

Here is the log from the couch server:

[error] 2019-11-11T10:54:55.185501Z couchdb@127.0.0.1 <0.30219.528> -------- 
gen_server <0.30219.528> terminated with reason: no match of right hand value eof at couch_file:read_raw_iolist_int/3(line:627) <= couch_file:handle_call/3(line:449) <= gen_server:try_handle_call/4(line:615) <= gen_server:handle_msg/5(line:647) <= proc_lib:init_p_do_apply/3(line:247)
  last msg: {pread_iolist,169636056}
     state: [{data,[{"State",{file,{file_descriptor,prim_file,{#Port<0.148544>,758}},false,169976024,#Ref<0.0.71565313.2171>,infinity}},{"InitialFilePath","./data/shards/60000000-7fffffff/pt_2018_reports.1553264630.couch"}]}]
    extra: []


[error] 2019-11-11T10:54:55.185727Z couchdb@127.0.0.1 <0.30219.528> -------- CRASH REPORT Process  (<0.30219.528>) with 1 neighbors exited with reason: no match of right hand value eof at couch_file:read_raw_iolist_int/3(line:627) <= couch_file:handle_call/3(line:449) <= gen_server:try_handle_call/4(line:615) <= gen_server:handle_msg/5(line:647) <= proc_lib:init_p_do_apply/3(line:247) at gen_server:terminate/7(line:812) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.24437.529>,<0.20258.529>], messages: [], links: [<0.24437.529>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,{#Port<0.148544>,758}},"./..."}},...], trap_exit: false, status: running, heap_size: 987, stack_size: 27, reductions: 64488

[error] 2019-11-11T10:54:55.186036Z couchdb@127.0.0.1 emulator -------- Error in process <0.2091.531> on node 'couchdb@127.0.0.1' with exit value:
{{badmatch,{'EXIT',{{badmatch,eof},[{couch_file,read_raw_iolist_int,3,[{file,"src/couch_file.erl"},{line,627}]},{couch_file,handle_call,3,[{file,"src/couch_file.erl"},{line,449}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,615}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,647}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}},[{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,169}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,157}]},{couch_bt_engine,read_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,343}]},{couch_db,make_doc,5,[{file,"src/couch_db.erl"},{line,1699}]},{couch_db,open_doc_int,3,[{file,"src/couch_db.erl"},{line,1630}]},{couch_index_updater,'-update/3-fun-2-',5,[{file,"src/couch_index_updater.erl"},{line,170}]},{couch_index_updater,'-update/3-fun-3-',7,[{file,"src/couch_index_updater.erl"},{line,180}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1081}]}]}


[error] 2019-11-11T10:54:55.186352Z couchdb@127.0.0.1 <0.13911.531> 437a055316 rexi_server: from: couchdb@127.0.0.1(<0.25557.530>) mfa: fabric_rpc:map_view/5 throw:{{badmatch,{'EXIT',{{badmatch,eof},[{couch_file,read_raw_iolist_int,3,[{file,"src/couch_file.erl"},{line,627}]},{couch_file,handle_call,3,[{file,"src/couch_file.erl"},{line,449}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,615}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,647}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}},[{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,169}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,157}]},{couch_bt_engine,read_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,343}]},{couch_db,make_doc,5,[{file,"src/couch_db.erl"},{line,1699}]},{couch_db,open_doc_int,3,[{file,"src/couch_db.erl"},{line,1630}]},{couch_index_updater,'-update/3-fun-2-',5,[{file,"src/couch_index_updater.erl"},{line,170}]},{couch_index_updater,'-update/3-fun-3-',7,[{file,"src/couch_index_updater.erl"},{line,180}]},{couch_bt_engine,drop_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1081}]}]} [{couch_mrview_util,get_view_index_state,5,[{file,"src/couch_mrview_util.erl"},{line,137}]},{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,81}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,247}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]

[error] 2019-11-11T10:54:55.187343Z couchdb@127.0.0.1 <0.25557.530> 437a055316 req_err(4089041121) {badmatch,{'EXIT',{{badmatch,eof},
                   [{couch_file,read_raw_iolist_int,3,
                                [{file,"src/couch_file.erl"},{line,627}]},
                    {couch_file,handle_call,3,
                                [{file,"src/couch_file.erl"},{line,449}]},
                    {gen_server,try_handle_call,4,
                                [{file,"gen_server.erl"},{line,615}]},
                    {gen_server,handle_msg,5,
                                [{file,"gen_server.erl"},{line,647}]},
                    {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,247}]}]}}} : [{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,169}]},
 {couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,157}]},
 {couch_bt_engine,read_doc_body,2,
                  [{file,"src/couch_bt_engine.erl"},{line,343}]},
 {couch_db,make_doc,5,[{file,"src/couch_db.erl"},{line,1699}]},
 {couch_db,open_doc_int,3,[{file,"src/couch_db.erl"},{line,1630}]},
 {couch_index_updater,'-update/3-fun-2-',5,
                      [{file,"src/couch_index_updater.erl"},{line,170}]},
 {couch_index_updater,'-update/3-fun-3-',7,
                      [{file,"src/couch_index_updater.erl"},{line,180}]},
 {couch_bt_engine,drop_reductions,4,
                  [{file,"src/couch_bt_engine.erl"},{line,1081}]}]
    [<<"couch_mrview_util:get_view_index_state/5 L137">>,<<"couch_mrview_util:get_view/4 L81">>,<<"couch_mrview:query_view/6 L247">>,<<"rexi_server:init_p/3 L140">>]

[info] 2019-11-11T10:54:55.187624Z couchdb@127.0.0.1 <0.218.0> -------- db shards/60000000-7fffffff/pt_2018_reports.1553264630 died with reason {{badmatch,eof},[{couch_file,read_raw_iolist_int,3,[{file,"src/couch_file.erl"},{line,627}]},{couch_file,handle_call,3,[{file,"src/couch_file.erl"},{line,449}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,615}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,647}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]} 

I have removed the [notice] tags for brevity.

Some things that caught my eye were:

1st: CRASH REPORT Process (<0.30219.528>) with 1 neighbors exited with reason: no match of right hand value eof at couch_file:read_raw_iolist_int/3(line:627)

2nd: rexi_server: from: couchdb@127.0.0.1(<0.25557.530>) mfa: fabric_rpc:map_view/5 throw:{{badmatch,{'EXIT',{{badmatch,eof} ...

3rd: When executing the //_all_docs I receive SyntaxError: JSON.parse: end of data when ',' or ']' was expected at line 2367 column 97 of the JSON data and when looking at the couch logs I found this:

[error] 2019-11-11T12:39:26.521064Z couchdb@127.0.0.1 <0.16819.1> d0dd1f1c8f rexi_server: from: couchdb@127.0.0.1(<0.7780.1>) mfa: fabric_rpc:all_docs/3 error:invalid_compression [{couch_compress,decompress,1,
[{file,"src/couch_compress.erl"},{line,71}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,158}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,434}]},{couch_btree,stream_node,7,
[{file,"src/couch_btree.erl"},{line,761}]},{couch_btree,stream_kp_node,7,[{file,"src/couch_btree.erl"},{line,775}]},{couch_btree,stream_kp_node,8,[{file,"src/couch_btree.erl"},{line,819}]},{couch_btree,fold,4,
[{file,"src/couch_btree.erl"},{line,222}]},{couch_bt_engine,fold_docs_int,5,[{file,"src/couch_bt_engine.erl"},{line,1045}]}]

Here I noticed mfa: fabric_rpc:all_docs/3 error:invalid_compression.

From the above information, my guess is that something is wrong with the fabric_rpc call, but I have no idea what is causing this behavior. Any help would be appreciated.
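
One way to see which file the failing reads point at is to pull the InitialFilePath entries and the mfa/error pairs out of the log text. The following is only a rough sketch: the function name `summarize` and both regular expressions are my own assumptions, written against the log lines quoted above rather than against any documented CouchDB log format.

```python
import re

# Patterns written against the log excerpts above; the log format itself
# is not specified here, so treat these as assumptions.
FILE_PATH_RE = re.compile(r'\{"InitialFilePath","([^"]+)"\}')
MFA_ERROR_RE = re.compile(r'mfa: (\S+) (\w+):')


def summarize(log_text):
    """Return (sorted unique file paths, list of (mfa, error class) pairs)."""
    files = sorted(set(FILE_PATH_RE.findall(log_text)))
    calls = MFA_ERROR_RE.findall(log_text)
    return files, calls
```

Run against the excerpts above, this should list a single shard file and the two fabric_rpc calls (map_view/5 and all_docs/3), which hints that both failures may share one underlying file rather than one faulty RPC.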

Steps to Reproduce

When trying to access
https://<server address>/<database>/_all_docs I receive -> SyntaxError: JSON.parse: end of data when ',' or ']' was expected at line 2367 column 97 of the JSON data

and

https://<server address>/<database>/_design/<design document>/_view/<view> I receive -> {"error":"unknown_error","reason":"shutdown","ref":4089041121}

These requests correspond to the errors described above.
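
To reproduce this outside the browser, a small script can tell a cut-off JSON body apart from a proper CouchDB error object. This is a minimal sketch using only the Python standard library; the names `classify_response` and `probe` are hypothetical, and the URL passed to `probe` has to be filled in with the real server, database and view.

```python
import json
import urllib.error
import urllib.request


def classify_response(status, body):
    """Classify a CouchDB HTTP response by status code and body text."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # The body is not valid JSON, e.g. the stream stopped mid-response.
        return "truncated_or_invalid_json"
    if status >= 400 or (isinstance(payload, dict) and "error" in payload):
        return "error"
    return "ok"


def probe(url):
    """Fetch a URL and classify the result (performs a real HTTP request)."""
    try:
        with urllib.request.urlopen(url) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error body is still readable.
        return classify_response(err.code, err.read().decode("utf-8", "replace"))
```

With this, the _all_docs request above would be expected to come back as truncated_or_invalid_json and the view request as error, matching what the browser showed.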

Expected Behaviour

I expected the queries to execute correctly and return the requested data (i.e., the documents).

Your Environment

Server: Ubuntu 16.04
Couchdb: 2.3.1
Browser used for tests: Firefox 71.0b8 (64-bit)

{"couchdb":"Welcome","version":"2.3.1","git_sha":"c298091a4","uuid":"7909e5447fbc83e0699155a53b776bf9","features":["pluggable-storage-engines","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
@davisp
Member

davisp commented Dec 17, 2019

Are you by chance storing your data directories on an NFS mount? There was a recent report similar to this one.

The file ./data/shards/60000000-7fffffff/pt_2018_reports.1553264630.couch appears to have been corrupted somehow. The easiest fix would be to move that file aside to something like ./data/shards/60000000-7fffffff/pt_2018_reports.1553264630.couch.old and then let internal replication recreate it (which should happen automatically).
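
A minimal sketch of the "move aside" step, assuming a healthy copy of the shard exists on another node for internal replication to copy from. The helper name `move_shard_aside` is hypothetical, and the sketch does not stop or restart CouchDB.

```python
from pathlib import Path


def move_shard_aside(shard_path, suffix=".old"):
    """Rename a shard file to <name><suffix> and return the new path."""
    src = Path(shard_path)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Never clobber an earlier copy that was already moved aside.
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    src.rename(dst)  # raises FileNotFoundError if the shard file is missing
    return dst
```

For the file named in the log this would be called as move_shard_aside("./data/shards/60000000-7fffffff/pt_2018_reports.1553264630.couch"), run from the CouchDB directory that the relative path in the log refers to. Renaming rather than deleting keeps the damaged file around in case anything can still be salvaged from it.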

If you are using NFS then the long term fix would be to move your data directories to a locally mounted file system. NFS fakes POSIX compliance in ways that break the requirements for CouchDB.
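
To check whether a data directory sits on NFS, one option on Linux is to look up its mount in /proc/mounts. The sketch below assumes the usual whitespace-separated format (device, mount point, file system type, ...); the function name `fs_type_for` is hypothetical, and mount points containing escaped spaces are not handled.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains `path`."""
    target = os.path.abspath(path) + "/"
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest matching mount point; on ties the later entry
        # wins, since later mounts shadow earlier ones.
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

Typical usage would be fs_type_for("./data", open("/proc/mounts").read()); a result of nfs or nfs4 means the directory is on an NFS mount. Note that a check run inside a virtual machine only sees the guest's file systems, not the storage the hypervisor keeps the disk image on.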

@af-silva
Author

af-silva commented Dec 19, 2019

Hi @davisp thanks for the insight.

I think you are right: the shard appears to have been corrupted somehow. In the meantime we have recovered some of the data by reverting to a previous backup.

Our current setup is a single virtual machine hosted on a Hyper-V server. We are now starting to think that a better choice would be to configure a CouchDB cluster inside the Hyper-V server connected to another, external instance, i.e., three CouchDB instances working in parallel. The problem was that our server was shut down by a power cut, without notice, and the UPSs were not able to handle the load and also failed.

Nonetheless, our virtual machine is Debian based and only holds the CouchDB server, so the file system is ext4. The Hyper-V image of the machine, however, is stored on the NFS file system of the server.

We just don't know or understand how the file became corrupted... this has never happened before, even though we have been through a similar situation.

If you have any advice, we would be thankful.
Best regards, and thanks for the answer.

@davisp
Member

davisp commented Dec 19, 2019

The only times I've ever seen file corruption in CouchDB were due to hardware failure, operator error (i.e., someone truncated the really wrong thing), or storing .couch files on NFS mounts. I don't know enough about Hyper-V to have a solid opinion, but it sounds like you've got ext4 inside the VM and outside those writes are going to NFS? If so, that coupled with your power outage is likely the cause of the issue.

@af-silva
Author

Hi @davisp ,

but it sounds like you've got ext4 inside the VM and outside those writes are going to NFS?

Yep, you are right, but I think (I'm also not an expert in Hyper-V) that the image file of the VM is itself saved as a whole on the NFS file system, so if this file were corrupted the virtual machine would not even load.
My objective with this issue was to understand what happened and to reduce as much as possible the chance of it occurring again. Also, I'm still learning the ropes around CouchDB, and this error could be a misconfiguration or a bad architecture choice (at least we have identified one).

But thanks for the reply and your insight into this issue.
Do you want me to close the issue, or leave it open for discussion?

@davisp
Member

davisp commented Dec 20, 2019

@af-silva Hiya!

Doing a quick Google, it looks like there are various recommendations about data consistency from within a Hyper-V guest [1]. I'd look through those to see whether your Hyper-V configuration is set up properly. I'd pay particularly close attention to the last couple of bullet points on how the host needs to be configured to honor a guest's write flushing and so on. Also, a heads up that you may want to Google for hosting MySQL or PostgreSQL in Hyper-V, as those searches will likely turn up more hits on how to configure the Hyper-V host properly.

I'll leave this ticket open in case you find any settings that need tweaking that we might want to add to the docs somewhere.

[1] https://support.microsoft.com/en-us/help/2801713/hyper-v-storage-caching-layers-and-implications-for-data-consistency

@wohali
Member

wohali commented Mar 13, 2020

Closing due to no response for 3 months.
