Replication fails with 500 in couch 1.6.0 #912

Closed
azul opened this issue Oct 24, 2017 · 3 comments

azul commented Oct 24, 2017

We regularly replicate some of our CouchDB databases to a different database on the same CouchDB instance. Occasionally this fails with a 500 error code.

Expected Behavior

Replication should run as usual. In this particular case it should replicate the tokens_581 database to tokens_582.

Current Behavior

The request to start the replication results in a 500 HTTP response code. The CouchDB logs look like this:

[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - HEAD /tokens_582 404
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - PUT /tokens_582 201
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - PUT /tokens_582/_design/rotation_filter 201
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - GET /tokens_582/_design/Token 404
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - PUT /tokens_582/_design/Token 201
[Fri, 20 Oct 2017 00:00:07 GMT] [error] [<0.18896.725>] Could not open file /var/lib/couchdb/tokens.couch: no such file or directory
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.8646.722>] 127.0.0.1 - - HEAD /tokens 404
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.2371.726>] 127.0.0.1 - - HEAD /tokens_581 200
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.790.723>] 127.0.0.1 - - PUT /tokens_581/_design/rotation_filter 201
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 0.25 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:07 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 0.5 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:08 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 1.0 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:09 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 2.0 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:11 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 4.0 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:15 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 8.0 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:23 GMT] [info] [<0.15850.724>] Retrying HEAD request to http://admin:*****@localhost:4000/tokens_582/ in 16.0 seconds due to error {conn_failed,{error,econnrefused}}
[Fri, 20 Oct 2017 00:00:37 GMT] [info] [<0.790.723>] 127.0.0.1 - - POST /_replicate 500
[Fri, 20 Oct 2017 00:00:37 GMT] [error] [<0.22669.724>] ** Generic server <0.22669.724> terminating
** Last message in was {'EXIT',<0.15850.724>,killed}
** When Server state == {state,"http://admin:...@localhost:4000/tokens_582/",
                               20,[],[],
                               {[],[]}}
** Reason for termination ==
** killed

[Fri, 20 Oct 2017 00:00:37 GMT] [error] [<0.22669.724>] {error_report,<0.31.0>,
                         {<0.22669.724>,crash_report,
                          [[{initial_call,
                             {couch_replicator_httpc_pool,init,
                              ['Argument__1']}},
                            {pid,<0.22669.724>},
                            {registered_name,[]},
                            {error_info,
                             {exit,killed,
                              [{gen_server,terminate,6,
                                [{file,"gen_server.erl"},{line,737}]},
                               {proc_lib,init_p_do_apply,3,
                                [{file,"proc_lib.erl"},{line,237}]}]}},
                            {ancestors,
                             [<0.15850.724>,couch_replicator_job_sup,
                              couch_primary_services,couch_server_sup,
                              <0.32.0>]},
                            {messages,[]},
                            {links,[]},
                            {dictionary,[]},
                            {trap_exit,true},
                            {status,running},
                            {heap_size,376},
                            {stack_size,27},
                            {reductions,488}],
                           []]}}
[Fri, 20 Oct 2017 00:00:37 GMT] [error] [<0.790.723>] httpd 500 error response:
 {"error":"timeout"}
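
Reading the retry lines in the log above: the waits are 0.25 s, 0.5 s, 1 s, 2 s, 4 s, 8 s and 16 s, doubling each time. The sketch below only reproduces that arithmetic. It is not CouchDB's code, and the function name backoffSchedule is hypothetical. The first six waits add up to 15.75 s, which fits the last retry being logged at 00:00:23, and the POST /_replicate 500 at 00:00:37 arrives 30 s after the first request at 00:00:07, before the final 16 s wait could have finished.

```javascript
// Hypothetical helper: doubling waits, as seen in the log (not CouchDB's implementation).
function backoffSchedule(firstWait, retries) {
  const waits = [];
  let wait = firstWait;
  for (let i = 0; i < retries; i++) {
    waits.push(wait);
    wait *= 2;
  }
  return waits;
}

const waits = backoffSchedule(0.25, 7);
// Time spent waiting before the last (16 s) retry is scheduled.
const elapsedBeforeLastWait = waits.slice(0, -1).reduce((a, b) => a + b, 0);

console.log(waits.join(", "));      // 0.25, 0.5, 1, 2, 4, 8, 16
console.log(elapsedBeforeLastWait); // 15.75
```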

Maybe this is already fixed in a later version of CouchDB. Upgrading will take some time and, as I said, this is non-deterministic for us, but I thought I'd report it anyway. Maybe it's still an issue and it's clear from the error message what is happening.

Context

The filter mentioned in the logs is:

{"_id":"_design/rotation_filter","_rev":"1-d4f7b95e8ad4e8f6d3721038178dd428","filters":{"not_expired":"function(doc, req) {
  if (doc._deleted) {
    return false;
  } else if (typeof(doc.last_seen_at) != "undefined") {
    return Date.now() < (new Date(doc.last_seen_at)).getTime() + 3600;
  } else {
    return true;
  }
}"}}
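
A minimal sketch for trying the filter logic outside CouchDB, assuming Node.js. The function body is copied from not_expired above; the third parameter now and the sample documents are hypothetical additions so the result does not depend on the clock. One thing it makes visible: getTime() returns milliseconds, so + 3600 keeps a document for 3.6 seconds after last_seen_at. Whether that is the intended window is not stated in this thread.

```javascript
// Same logic as the not_expired filter. CouchDB calls it as (doc, req);
// `now` is a hypothetical extra parameter used here for deterministic checks.
function notExpired(doc, req, now = Date.now()) {
  if (doc._deleted) {
    return false;
  } else if (typeof doc.last_seen_at != "undefined") {
    return now < new Date(doc.last_seen_at).getTime() + 3600;
  } else {
    return true;
  }
}

// Made-up sample documents, evaluated at a fixed point in time.
const now = Date.parse("2017-10-20T00:00:07Z");
const samples = [
  { _id: "a", _deleted: true },                        // deleted: filtered out
  { _id: "b" },                                        // no last_seen_at: kept
  { _id: "c", last_seen_at: "2017-10-20T00:00:05Z" },  // seen 2 s ago: kept
  { _id: "d", last_seen_at: "2017-10-20T00:00:00Z" },  // seen 7 s ago: filtered out
];

for (const doc of samples) {
  console.log(doc._id, notExpired(doc, null, now));
}
```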

We are currently rotating our databases with temporary data to keep them from growing due to deleted records. We're aware that this is not what couch is meant for and will migrate to a different solution in the long run.

Your Environment

  • Version used: 1.6.0
  • Operating System and version (desktop or mobile): debian jessie linux
  • Link to your project: https://leap.se

janl commented Oct 24, 2017

Can you show us your replication definition?

{error,econnrefused} suggests that the source or target isn’t available.

janl commented Oct 27, 2017

Please re-open with additional info, closing in the meantime, thanks!

janl closed this as completed Oct 27, 2017

azul commented Nov 2, 2017

Thanks for the hint @janl. Had a look at the replication definition:

{
"source":"tokens_582",
"filter":"rotation_filter/not_expired",
"target":"http://admin:...@localhost:5000/tokens_583",
"continuous":true
}
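
For reference, a definition like this is sent as the JSON body of POST /_replicate. A minimal sketch, assuming Node.js and a CouchDB reachable at http://localhost:5984 (the default port, an assumption here, as is the helper name buildReplicateRequest). It only builds the request; admin authentication is not included and the elided credentials stay elided.

```javascript
// Replication definition as posted above; the elided credentials ("...") are left as-is.
const definition = {
  source: "tokens_582",
  filter: "rotation_filter/not_expired",
  target: "http://admin:...@localhost:5000/tokens_583",
  continuous: true,
};

// Hypothetical helper: builds the URL and fetch options for POST /_replicate.
function buildReplicateRequest(couchUrl, def) {
  return {
    url: new URL("/_replicate", couchUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(def),
    },
  };
}

// couchUrl is an assumption, not taken from the thread.
const request = buildReplicateRequest("http://localhost:5984", definition);
console.log(request.url);
// To actually send it (Node 18+): fetch(request.url, request.options)
```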

The interesting part is the port, which corresponds to an stunnel from the couch client to the server, so this port might not work for server-to-server communication. Funny thing is ... it works most of the time and has only failed once. Maybe the stunnel was down. Investigating this on our side. Keeping this issue closed until I can rule out a missing stunnel / wrong port number.
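
One way to rule out a missing stunnel is to check whether the target port accepts TCP connections right before starting the replication. A rough sketch, assuming Node.js; canConnect is a hypothetical helper and port 5000 is taken from the replication definition above. A refused connection here would be consistent with the {conn_failed,{error,econnrefused}} retries in the log, though it says nothing about why the tunnel was down.

```javascript
const net = require("node:net");

// Resolves to true if a TCP connection to host:port succeeds within timeoutMs,
// and to false on any socket error (e.g. ECONNREFUSED) or on timeout.
function canConnect(host, port, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Port 5000 is the target port from the replication definition above.
canConnect("localhost", 5000).then((ok) => {
  console.log(ok ? "target port accepts connections" : "target port unreachable");
});
```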
