EUnit couchdb_os_process_pool timeouts #631

Closed
wohali opened this Issue Jul 3, 2017 · 4 comments

wohali commented Jul 3, 2017

Seeing a single instance of this failure in Travis so far. (This is a recurrence of COUCHDB-3419.) Need someone to go through the couch.log for me please.

Makefile Log

module 'couchdb_os_proc_pool'
  OS processes pool tests
    couchdb_os_proc_pool:50: should_block_new_proc_on_full_pool...*failed*
in function couchdb_os_proc_pool:'-should_block_new_proc_on_full_pool/0-fun-0-'/2 (test/couchdb_os_proc_pool.erl, line 55)
in call from couchdb_os_proc_pool:'-should_block_new_proc_on_full_pool/0-fun-13-'/0 (test/couchdb_os_proc_pool.erl, line 55)
**error:{assertEqual_failed,[{module,couchdb_os_proc_pool},
                     {line,55},
                     {expression,"ping_client ( Client1 )"},
                     {expected,ok},
                     {value,timeout}]}
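
For readers unfamiliar with the test, here is a minimal sketch in Python of the behaviour its name describes. It assumes the pool acts like a counting semaphore with a hard limit of 3 (the `os_process_limit` value the test sets). `ProcPool`, `acquire` and `release` are hypothetical names for illustration only, not CouchDB's `couch_proc_manager` API.

```python
import threading


class ProcPool:
    """Toy stand-in for a process pool with a hard limit (hypothetical)."""

    def __init__(self, limit):
        # One semaphore slot per process the pool is allowed to hand out.
        self._slots = threading.Semaphore(limit)

    def acquire(self, timeout):
        # "ok" if a slot became free within `timeout` seconds, else "timeout".
        return "ok" if self._slots.acquire(timeout=timeout) else "timeout"

    def release(self):
        # Return a slot to the pool so a blocked client can proceed.
        self._slots.release()


pool = ProcPool(limit=3)

# The first three clients fit under the limit.
results = [pool.acquire(timeout=0.1) for _ in range(3)]

# A fourth client finds the pool full and gives up after the timeout.
blocked = pool.acquire(timeout=0.1)

# Once one client releases its slot, a new request succeeds.
pool.release()
after_release = pool.acquire(timeout=0.1)
```

In the failure above, the assertion on `Client1` itself returned `timeout`. That would be consistent with a process not becoming available in time, rather than with the pool being full.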

Instance(s)

@wohali wohali added the testsuite label Jul 3, 2017

@wohali wohali added this to the 2.1.0 milestone Jul 3, 2017

wohali commented Jul 4, 2017

[info] 2017-06-06T15:01:57.314634Z nonode@nohost <0.8966.0> -------- alarm_handler: {set,{system_memory_high_watermark,[]}}
[info] 2017-06-06T15:01:57.348491Z nonode@nohost <0.8982.0> -------- Apache CouchDB has started. Time to relax.

[info] 2017-06-06T15:01:57.348709Z nonode@nohost <0.8982.0> -------- Apache CouchDB has started on http://127.0.0.1:33103/
[info] 2017-06-06T15:01:57.348886Z nonode@nohost <0.7.0> -------- Application couch started on node nonode@nohost
[notice] 2017-06-06T15:01:57.349079Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-06-06T15:01:57.349397Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-06-06T15:01:58.353276Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-06-06T15:01:58.353573Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-06-06T15:01:59.357433Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-06-06T15:01:59.514736Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-06-06T15:02:00.519414Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-06-06T15:02:00.519855Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-06-06T15:02:00.520260Z nonode@nohost <0.8831.0> -------- config: [query_server_config] os_process_idle_limit set to 1 for reason nil
[info] 2017-06-06T15:02:01.527024Z nonode@nohost <0.7.0> -------- Application couch exited with reason: stopped

Looks like we're possibly running out of memory in the Travis instances now, too. :(

wohali added a commit that referenced this issue Jul 4, 2017

wohali commented Jul 5, 2017

In the build for the PR I submitted, we actually still get a failure: https://travis-ci.org/apache/couchdb/jobs/250141409

The couch.log still shows very little data, except for that worrying high memory watermark:

[info] 2017-07-04T21:58:23.790382Z nonode@nohost <0.9522.0> -------- alarm_handler: {set,{system_memory_high_watermark,[]}}
[info] 2017-07-04T21:58:24.495814Z nonode@nohost <0.9537.0> -------- Apache CouchDB has started on http://127.0.0.1:39669/
[info] 2017-07-04T21:58:24.496132Z nonode@nohost <0.31.0> -------- Application couch started on node nonode@nohost
[notice] 2017-07-04T21:58:24.496476Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-07-04T21:58:24.497220Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-07-04T21:58:27.501516Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-07-04T21:58:27.502859Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[error] 2017-07-04T21:58:29.521926Z nonode@nohost <0.9599.0> -------- OS Process Error <0.9604.0> :: {os_process_error,"OS process timed out."}
[info] 2017-07-04T21:58:29.522011Z nonode@nohost <0.9548.0> -------- couch_proc_manager <0.9604.0> died normal
[info] 2017-07-04T21:58:29.528694Z nonode@nohost <0.9548.0> -------- couch_proc_manager <0.9603.0> died normal
[error] 2017-07-04T21:58:29.528825Z nonode@nohost <0.9598.0> -------- OS Process Error <0.9603.0> :: {os_process_error,"OS process timed out."}
[info] 2017-07-04T21:58:29.540705Z nonode@nohost <0.9548.0> -------- couch_proc_manager <0.9605.0> died normal
[error] 2017-07-04T21:58:29.540817Z nonode@nohost <0.9597.0> -------- OS Process Error <0.9605.0> :: {os_process_error,"OS process timed out."}
[notice] 2017-07-04T21:58:30.507581Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-07-04T21:58:30.508469Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-07-04T21:58:33.512546Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_limit set to 3 for reason nil
[notice] 2017-07-04T21:58:33.512954Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_soft_limit set to 2 for reason nil
[notice] 2017-07-04T21:58:33.513273Z nonode@nohost <0.9385.0> -------- config: [query_server_config] os_process_idle_limit set to 1 for reason nil
[info] 2017-07-04T21:58:36.520899Z nonode@nohost <0.31.0> -------- Application couch exited with reason: stopped
[info] 2017-07-04T21:58:36.524966Z nonode@nohost <0.9522.0> -------- alarm_handler: {clear,system_memory_high_watermark}
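
For anyone going through a couch.log by hand, a rough Python sketch like the one below can pull out the interesting entries. It assumes every line follows the layout of the excerpts quoted in this issue (`[level] timestamp node <pid> -------- message`). The names `LINE_RE` and `summarize` are made up for this example and are not part of CouchDB.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this issue:
# [level] timestamp node <pid> -------- message
LINE_RE = re.compile(
    r"^\[(?P<level>\w+)\] (?P<ts>\S+) (?P<node>\S+) "
    r"(?P<pid><[\d.]+>) -+ (?P<msg>.*)$"
)


def summarize(lines):
    """Count entries per level and collect alarm and OS process timeout entries."""
    levels = Counter()
    alarms = []
    timeouts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # blank or unrecognised line
        levels[m.group("level")] += 1
        msg = m.group("msg")
        if msg.startswith("alarm_handler:"):
            alarms.append((m.group("ts"), msg))
        elif "OS process timed out" in msg:
            timeouts.append((m.group("ts"), msg))
    return levels, alarms, timeouts


# A few lines copied from the log above.
sample = [
    "[info] 2017-07-04T21:58:23.790382Z nonode@nohost <0.9522.0> -------- alarm_handler: {set,{system_memory_high_watermark,[]}}",
    '[error] 2017-07-04T21:58:29.521926Z nonode@nohost <0.9599.0> -------- OS Process Error <0.9604.0> :: {os_process_error,"OS process timed out."}',
    "[info] 2017-07-04T21:58:29.522011Z nonode@nohost <0.9548.0> -------- couch_proc_manager <0.9604.0> died normal",
    "[info] 2017-07-04T21:58:36.524966Z nonode@nohost <0.9522.0> -------- alarm_handler: {clear,system_memory_high_watermark}",
]

levels, alarms, timeouts = summarize(sample)
```

Reading the full file with `summarize(open("couch.log"))` (path assumed) would give the same three summaries for a whole run.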

wohali commented Jul 9, 2017

davisp commented Jul 12, 2017

I'm gonna take a stab at this by replacing os processes with native processes since we don't care about JS calls here.

davisp added a commit that referenced this issue Jul 12, 2017

Fix couchdb_os_proc_pool eunit timeouts

There's a theory that the low memory limits on our CI instances are
causing the tests spawning JS processes to fail. Given that we don't
need them here we can trivially exclude that as a cause of the test
failures.

Fixes #631

@davisp davisp closed this in 94f2907 Jul 12, 2017
