frozen riak stop [JIRA: RIAK-1663] #685

Closed
cipy opened this issue Feb 19, 2015 · 10 comments

Comments

@cipy

commented Feb 19, 2015

Hi,

`riak stop` hangs with the errors below instead of stopping cleanly:

```
2015-02-19 10:59:38.299 [info] <0.96.0>@riak_core_sysmon_handler:handle_event:97 Monitor got {suppressed,port_events,1}
2015-02-19 10:59:38.730 [error] <0.11487.0> Supervisor {<0.11487.0>,poolboy_sup} had child riak_repl_fullsync_worker started with riak_repl_fullsync_worker:start_link([{name,{local,riak_repl2_rtsink_pool}},{worker_module,riak_repl_fullsync_worker},{worker_args,[]},...]) at undefined exit with reason shutdown in context shutdown_error
2015-02-19 10:59:38.731 [info] <0.1441.0>@riak_repl_app:stop:119 Stopped application riak_repl
2015-02-19 10:59:38.744 [info] <0.798.0>@riak_kv_app:prep_stop:209 Stopping application riak_kv - marked service down.
2015-02-19 10:59:38.745 [info] <0.798.0>@riak_kv_app:prep_stop:213 Unregistered pb services
2015-02-19 10:59:38.745 [info] <0.798.0>@riak_kv_app:prep_stop:218 unregistered webmachine routes
2015-02-19 10:59:38.745 [info] <0.798.0>@riak_kv_app:prep_stop:220 all active put FSMs completed
2015-02-19 10:59:38.775 [info] <0.876.0>@riak_kv_js_vm:terminate:240 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.876.0>)
2015-02-19 10:59:38.775 [info] <0.883.0>@riak_kv_js_vm:terminate:240 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.883.0>)
```

@cipy cipy added the Bug label Feb 19, 2015

@cipy cipy added this to the 1.4.12 milestone Feb 19, 2015

@cmeiklejohn

Contributor

commented Feb 19, 2015

This bug should be filed against riak_ee.

@ssylvester87


commented Apr 1, 2015

I have observed this behavior when all nodes in a multi-cluster realtime replication configuration are issued a stop command simultaneously (via an automation tool such as Ansible). A failed repl leadership negotiation seems to cause a race condition: the operation times out and `riak stop` never finishes.

@Basho-JIRA Basho-JIRA changed the title frozen riak stop frozen riak stop [JIRA: RIAK-1663] Apr 1, 2015

@Basho-JIRA


commented Apr 7, 2015

An ESL customer has been experiencing a similar issue. Attaching riak-debug.

_[posted via JIRA by Bryan Hunt]_

@Basho-JIRA


commented Apr 28, 2015

#685

_[posted via JIRA by Deborah Rakow]_

@seancribbs seancribbs removed their assignment May 8, 2015

@Basho-JIRA


commented Jun 26, 2015

Jon M confirmed the code was merged to 2.0.6 on 6/25/15. Dev is done; now just waiting for it to go into additional releases.

_[posted via JIRA by Patricia Brewer]_

@Basho-JIRA


commented Jun 29, 2015

Can the fix or commit be linked on this ticket?

_[posted via JIRA by Dan Brown]_

@Basho-JIRA


commented Jul 9, 2015

This is linked to GH Riak/685. [~dbrown] are you looking for something different?

_[posted via JIRA by Patricia Brewer]_

@Basho-JIRA


commented Jul 9, 2015

This is linked to GH Riak/685. [~dbrown] are you looking for something different? This fix didn't go into 2.0.6. Needs to go in a future release.

_[posted via JIRA by Patricia Brewer]_

@Basho-JIRA


commented Jul 21, 2015

Thanks Patricia. I would like to know which version of Riak this will be available in and would like a link to the commit (not the GH issue) of the code change that fixes this.

_[posted via JIRA by Dan Brown]_

@Basho-JIRA


commented Mar 4, 2016

It's been long enough that Jon M and others do not remember the specific context of any fix which may or may not have been merged into riak_kv/leveldb/eleveldb. There don't seem to be any commits in the time range of this ticket that would make sense as a fix for this issue.

The attached riak_debug shows a Riak node that is out of disk space failing to stop at all (much less gracefully). It's difficult to say from the debug output whether running out of disk space is the root cause of the failure or whether there is some deeper defect for which the disk space problem is coincidental.

For now we should close this ticket and open a backlog item about the out-of-disk-space issue (which requires investigation and a debug session). If this recurs in the future, we ought to open a fresh ticket.

_[posted via JIRA by Mark Allen]_

6 participants