
Force-remove scenario leaving node stuck in ring. #458

Closed
lukebakken opened this Issue Nov 22, 2013 · 3 comments


@lukebakken
Basho Technologies member

Reproduction steps:

  • Build out cluster of N nodes. Mis-configure one node so that -name in vm.args remains as riak@127.0.0.1. Add all nodes to cluster and commit cluster plan.

  • Stop riak@127.0.0.1 node.

  • Use riak-admin force-remove -f riak@127.0.0.1 on another node.

  • Re-configure the riak@127.0.0.1 node to have the correct -name in vm.args and remove ring/ data. Re-start node, re-join it to the cluster, and re-commit the cluster plan.

  • Use riak attach to attach to another node in the cluster and rp() the ring. You will see that riak@127.0.0.1 remains in the ring indefinitely.

One Riak EE user reports that the presence of riak@127.0.0.1 causes errors after setting up fullsync repl and executing the fullsync. Please see ZD ticket 6100 for details.

@jrwest
jrwest commented Dec 6, 2013

@lukebakken just wondering if you had made any more progress on that ticket that might shed light on this issue. iirc we discussed that the 'riak@127.0.0.1' values remained in the vector clock (and the vector clocks in the seen set) but could not be found elsewhere and this was determined not to be a problem.

Does the scenario you describe leave membership in an incorrect state, or were the leftover traces of the old node name the only concern? If it's the latter, can this issue be closed?

@lukebakken
Basho Technologies member

@jrwest - I don't think that this is related to 6100 anymore. However, I'm going to wait until that ticket is closed to be sure.

Otherwise, this just appears to be a super-low-priority bug.

@jrwest
jrwest commented Mar 24, 2014

I'm going to go ahead and close this issue. From what I can tell, the problem either lived in repl's handling of the ring or not at all. The actor ids from old nodes will certainly live on in the vector clocks and should not be a problem. Happy to re-open if we run into it again.
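
For readers wondering why the old name can still show up when the ring is printed, here is a minimal toy sketch of the behaviour described in this thread. It is not riak_core's code or data layout: `ToyRing`, `merge_vclocks`, and the node name `riak@10.0.0.2` are hypothetical, made up for illustration. It assumes only that the ring carries a membership set plus a vector clock keyed by actor id, and that a force-remove edits membership without pruning the clock.

```python
# Toy model only: NOT riak_core's actual data structures or API.
from dataclasses import dataclass, field


@dataclass
class ToyRing:
    members: set = field(default_factory=set)   # current cluster membership
    vclock: dict = field(default_factory=dict)  # actor id -> update counter

    def update(self, actor):
        """Record a ring change made by `actor` (hypothetical helper)."""
        self.vclock[actor] = self.vclock.get(actor, 0) + 1

    def join(self, node, actor):
        self.members.add(node)
        self.update(actor)

    def force_remove(self, node, actor):
        # Membership drops the node, but nothing prunes its vclock entry.
        self.members.discard(node)
        self.update(actor)


def merge_vclocks(a, b):
    """Standard vector clock merge: take the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


ring = ToyRing()
ring.join("riak@127.0.0.1", actor="riak@127.0.0.1")  # misconfigured node joins
ring.join("riak@10.0.0.2", actor="riak@10.0.0.2")    # hypothetical second node
ring.force_remove("riak@127.0.0.1", actor="riak@10.0.0.2")

stale_in_members = "riak@127.0.0.1" in ring.members
stale_in_vclock = "riak@127.0.0.1" in ring.vclock
print(stale_in_members, stale_in_vclock)  # prints: False True
```

In this model the stale name is gone from membership but survives in the clock, and because a merge keeps every actor seen on either side, later merges carry it forward too. That matches the observation above that the old name appears in the printed ring without membership being wrong.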

@jrwest jrwest closed this Mar 24, 2014