Fix ring events monitoring by riak_core_node_watcher #399

Open
wants to merge 8 commits into from

3 participants

@slfritchie
Owner

Fixes #343: riak_core_node_watcher should tolerate death of riak_core_ring_events

Before this patch, the riak_core_node_watcher.erl code assumed
that if the riak_core_ring_events proc (a gen_event server) died,
then a {gen_event_EXIT,_,_} message would be sent. However, that
assumption is not correct. If it dies, we get a regular {'EXIT',_,_}
message.

Tested by repeated use of alternating:

  • exit(whereis(riak_core_ring_events), kill).

... and looking at the length of the links list
of process_info(whereis(riak_core_ring_events), links) -- it should
be three, not two.
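
The distinction is easy to reproduce outside Riak. Below is a minimal, self-contained sketch (the exit_msg_demo module is illustrative, not riak_core code): gen_event:add_sup_handler/3 links the caller to the event manager, so a brutal kill arrives as a plain exit signal, never as gen_event_EXIT.

-module(exit_msg_demo).
-behaviour(gen_event).

-export([run/0]).
%% gen_event callbacks, minimal so add_sup_handler/3 succeeds
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

run() ->
    process_flag(trap_exit, true),
    {ok, Mgr} = gen_event:start_link(),
    ok = gen_event:add_sup_handler(Mgr, {?MODULE, make_ref()}, []),
    exit(Mgr, kill),
    receive
        {gen_event_EXIT, _Handler, Reason} ->
            {gen_event_exit, Reason};   %% does NOT arrive for a kill
        {'EXIT', Mgr, Reason} ->
            {plain_exit, Reason}        %% arrives: {plain_exit, killed}
    after 1000 ->
            timeout
    end.

init(_) -> {ok, no_state}.
handle_event(_Event, State) -> {ok, State}.
handle_call(_Req, State) -> {ok, ok, State}.
handle_info(_Info, State) -> {ok, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

Calling exit_msg_demo:run() returns {plain_exit, killed}; the {gen_event_EXIT,_,_} clause never matches for a kill, which is exactly the assumption the old code made.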

@slfritchie slfritchie Fixes #343
6be124b
@engelsanchez engelsanchez was assigned
src/riak_core_node_watcher.erl
((5 lines not shown))
     Self = self(),
     Fn = fun(R) ->
                  gen_server:cast(Self, {ring_update, R})
          end,
-    riak_core_ring_events:add_sup_callback(Fn).
+    riak_core_ring_events:add_sup_callback(Fn),
+    case riak_core_ring_events:get_pid() of
+        P when P == RingEventsPid ->
+            RingEventsPid;
+        _ ->
@engelsanchez Collaborator

It is possible although unlikely that at this point the gen_event server died after the call to get_pid and before the call to add_sup_callback. In that case, we are going to re-enter this function and register a second callback. Gen event is dumb and will happily add two, invoking the module twice for each event. It will also send double notifications in the event of a termination, etc. It might be safer, although hacky, to do the following instead:

  • Get list of linked processes
  • Get pid of current event mgr after registering the callback. If in the list above, all is well. Otherwise retry, possibly trying to drain any gen_event_EXIT messages from a fast dying event handler, which would cause registration to blindly happen again, possibly ending up in a double registration.
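
In code, that second bullet might look roughly like this (confirm_registration/1 and the drain/retry details are illustrative, not an existing riak_core API; only add_sup_callback/1 and the get_pid/0 added in this PR are real). It still carries the double-registration caveat described above:

confirm_registration(Fn) ->
    riak_core_ring_events:add_sup_callback(Fn),
    {links, Links} = process_info(self(), links),
    case lists:member(riak_core_ring_events:get_pid(), Links) of
        true ->
            %% We are linked to the live manager: registration stuck.
            ok;
        false ->
            %% Manager died mid-registration; drain a stale
            %% gen_event_EXIT (if any) so we don't also re-register
            %% from the handle_info clause, then retry.
            %% NB: still exits noproc if the manager stays down,
            %% as discussed later in this thread.
            receive {gen_event_EXIT, _, _} -> ok after 0 -> ok end,
            confirm_registration(Fn)
    end.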

Or maybe we could protect ourselves on entry: check the pid of the current event handler. Are we linked to it? Then the gen_event_EXIT message is probably stale and we shouldn't re-register. Or go ahead and try to unregister, then register again instead.

What do you think?

@slfritchie Owner

Hrm. It's probably best if we just unregister if we're going to loop. Except that the riak_core_ring_events:add_sup_callback/1 function is silly and using make_ref() in a way that makes it impossible to delete. Hrm.
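
For reference, this is the current shape of the code (quoted from riak_core_ring_events.erl): the {?MODULE, Ref} handler id never escapes the function, so there is nothing to hand to gen_event:delete_handler/3 later.

add_sup_callback(Fn) when is_function(Fn) ->
    %% The {?MODULE, Ref} handler id is created and thrown away here,
    %% so the handler can never be deleted afterwards.
    gen_event:add_sup_handler(?MODULE, {?MODULE, make_ref()}, [Fn]).

Returning the handler name, as the later commits in this branch do, is enough to make the handler deletable.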

@slfritchie
Owner

@engelsanchez How about now?

@engelsanchez

Minor nitpick: better to use the ?assertEqual macro for a non-totally-useless error message if something breaks in the future.

@engelsanchez

Thanks for making these more useful :)

@engelsanchez
Collaborator

@slfritchie this is what is happening now: the death of the ring event handler causes the node watcher to immediately try to re-register, which most likely kills it because the new event handler is not up yet. I suppose that "fixes" the situation :), but would it make sense to have a less tragic ending with fewer possible side effects? We could make the node watcher wait a bit for a new ring event handler to pop up before retrying, and die only if that doesn't happen. Or we could let the two phoenixes come back from the ashes together and let the Erlang way do its thing.

@jrwest

@slfritchie @engelsanchez haven't looked but if there is something subtle going on here it would be nice to leave a comment so others don't spend time figuring it out down the road :).

@engelsanchez
Collaborator

@jrwest My comment above explains the current situation. I'll try to be clearer: with the changes as they stand now, killing the riak_core_ring_events process has the effect of killing the node watcher, because it now detects that death and tries to call riak_core_ring_events to re-register... but it's dead! So down goes node watcher too with a noproc error. That is something that could happen once in a while anyway: anybody calling any of the gen servers we have the moment they die, before respawning, will die too. I'm stating that this PR as it is makes the node watcher death happen very reliably (every time I tried), so perhaps making it wait a bit for riak_core_ring_events to respawn would be nicer. Is that better?
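
For anyone following along, the noproc mechanics can be shown in isolation (a standalone fragment with assumed names, not riak_core code):

demo_noproc() ->
    {ok, Mgr} = gen_event:start(),
    exit(Mgr, kill),
    %% Exit signals between two processes are ordered, so the kill
    %% has landed by the time this check returns:
    false = is_process_alive(Mgr),
    %% Any gen_event request to the dead manager now exits the caller
    %% with noproc -- without the catch, the caller dies, which is
    %% what happens to the node watcher here.
    catch gen_event:add_handler(Mgr, {no_such_module, x}, []).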

@jrwest

@engelsanchez I just meant it would be nice to have that comment in the code, if a change is not made to address it. Sorry if I'm totally off context.

@slfritchie
Owner

Engel, did you want to see something like this?

diff --git a/src/riak_core_node_watcher.erl b/src/riak_core_node_watcher.erl
index ab5dc12..fd5a6c7 100644
--- a/src/riak_core_node_watcher.erl
+++ b/src/riak_core_node_watcher.erl
@@ -372,7 +372,9 @@ update_avsn(State) ->
     State#state { avsn = State#state.avsn + 1 }.

 watch_for_ring_events() ->
-    RingEventsPid = riak_core_ring_events:get_pid(),
+    %% Polling isn't (and cannot be) perfect, good enough 99% of the time
+    %% Our supervisor takes care of the last 1%.
+    RingEventsPid = poll_for_riak_core_ring_events_pid(10),
     Self = self(),
     Fn = fun(R) ->
                  gen_server:cast(Self, {ring_update, R})
@@ -392,6 +394,17 @@ watch_for_ring_events() ->
             watch_for_ring_events()
     end.

+poll_for_riak_core_ring_events_pid(0) ->
+    undefined;
+poll_for_riak_core_ring_events_pid(N) ->
+    case riak_core_ring_events:get_pid() of
+        Pid when is_pid(Pid) ->
+            Pid;
+        _ ->
+            timer:sleep(100),
+            poll_for_riak_core_ring_events_pid(N-1)
+    end.
+
 delete_service_mref(Id) ->
     %% Cleanup the monitor if one exists
     case erlang:get(Id) of
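
With those numbers the watcher gives the events process up to about a second (10 tries × 100 ms) to reappear before poll_for_riak_core_ring_events_pid/1 returns undefined; past that, the watcher's own crash hands the problem back to the supervisor, per the comment in the hunk.
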
@slfritchie
Owner

Only X more weeks until the next review deadline. Comments?

@slfritchie
Owner

Engel or Jordan, any opinions on the diff above?

If this cross-supervisee-link thing isn't present, then an event handler that riak_core_node_watcher requires is missing, which means that it's broken. Their common supervisor is one_for_one, so we can't rely on the supervisor to solve the problem.

HRM. OK, here's a crazy idea. What if I changed the fix drastically?

  1. Remove all the racy/fiddly crap I've submitted so far.
  2. Add a middle supervisor, with one_for_all strategy.
  3. If either of these fiddly processes dies, the new sup kills the remainder, then restarts both in the required order (rough sketch below).
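
A rough sketch of that middle supervisor's init/1 (child specs written out longhand; the restart intensities are made up here, and the PR's eventual riak_core_ring_events_sup in the diff below differs by starting the watcher dynamically):

init([]) ->
    RingEvents = {riak_core_ring_events,
                  {riak_core_ring_events, start_link, []},
                  permanent, 5000, worker, [riak_core_ring_events]},
    Watcher = {riak_core_node_watcher,
               {riak_core_node_watcher, start_link, []},
               permanent, 5000, worker, [riak_core_node_watcher]},
    %% one_for_all: if either child dies, the supervisor terminates
    %% the other, then restarts both in list order.
    {ok, {{one_for_all, 10, 10}, [RingEvents, Watcher]}}.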

Before:

              |-... many others ...
              |-riak_core_capability----
riak_core_sup-|-riak_core_node_watcher--
              |-riak_core_ring_events---
              |-riak_core_sysmon_minder-
              |-... many others ...

After:

              |-... many others ...
              |-riak_core_capability----------
              |
riak_core_sup-|-riak_core_new_ONE_FOR_ALL_sup-|-riak_core_node_watcher-
              |                               |-riak_core_ring_events--
              |-riak_core_sysmon_minder-------
              |-... many others ...
@jrwest

@slfritchie I like it. It's basically implementing the workaround and seems "more OTP". @engelsanchez?

@engelsanchez
Collaborator

@slfritchie this is pretty much what I was about to propose, except that instead of a one_for_all intermediate supervisor, I was thinking of rest_for_one with riak_core_ring_events coming before the node watcher. The death of the node watcher will already be noted by ring_events; we just need the death of ring_events to bring down the node watcher and start them in that sequence again (ring_events, watcher). I would feel a lot better if we removed the hacks built while going down the rabbit hole and implemented this.
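
In supervisor terms the two proposals differ only in the strategy tuple and in whether child order matters. A sketch of this variant, reusing the ?CHILD macro from the new riak_core_ring_events_sup.erl (the restart intensities are again made up):

init([]) ->
    %% rest_for_one: a riak_core_ring_events death also takes down
    %% and restarts every child listed after it (the watcher), while
    %% a watcher death restarts the watcher alone.
    {ok, {{rest_for_one, 10, 10},
          [?CHILD(riak_core_ring_events, worker),
           ?CHILD(riak_core_node_watcher, worker)]}}.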

@slfritchie
Owner

Alright ... so, I'm thinking it's good to keep the riak_core_ring_events.erl changes: being able to cancel those callbacks is worthwhile, and the old way made that impossible.

The problem is, ha ha, changing the order of startup items makes Riak unstable. Oi oi, do I even computer?

@slfritchie
Owner

Background on last comment: the basic ./riak_test -c rtdev -t verify_build_cluster fails quite early:

18:07:31.768 [info] joining Node 2 to the cluster... It takes two to make a thing go right
18:07:31.792 [info] [join] 'dev2@127.0.0.1' to ('dev1@127.0.0.1'): ok
18:07:31.792 [info] Wait until all nodes are ready and there are no pending changes
18:07:31.792 [info] Wait until nodes are ready : ['dev1@127.0.0.1','dev2@127.0.0.1']
18:07:32.795 [info] Wait until all members ['dev1@127.0.0.1','dev2@127.0.0.1'] ['dev1@127.0.0.1','dev2@127.0.0.1']
18:07:32.796 [info] Wait until no pending changes on ['dev1@127.0.0.1','dev2@127.0.0.1']
[... timeout after 10 minutes...]
@slfritchie
Owner

Engel, the startup order problem ... my last commit, 6e4f156, still doesn't get it right. If riak_core_ring_events dies & is restarted, things work well. But if riak_core_node_watcher dies & is restarted, things break. Namely, there is no worker process under the riak_core_eventhandler_sup supervisor.

If I use commit 01c0ada, then the riak_core_eventhandler_sup supervisor always has a worker process.

I stared at this for a long time today and didn't understand why the supervisor tree reorganization method doesn't work correctly. So, I recommend using the 01c0ada commit.

@slfritchie
Owner

Ping?

@slfritchie
Owner

Hrm, we're stuck in code freeze prep logjam. To be revisited soon.

@jrwest

In the end this is an issue that has plagued several previous versions of Riak. Like similar ones, I'm marking this as 2.0.1 (which slips it out of the planned 2.0-RC).

@jrwest jrwest added this to the 2.0.1 milestone
Commits on Sep 27, 2013
  1. @slfritchie

    Fixes #343

    slfritchie authored
Commits on Oct 1, 2013
  1. @slfritchie

    Address review comments

    slfritchie authored
  2. @slfritchie
Commits on Oct 4, 2013
  1. @slfritchie
Commits on Oct 15, 2013
  1. @slfritchie
Commits on Oct 16, 2013
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie
1  ebin/riak_core.app
@@ -54,6 +54,7 @@
   riak_core_repair,
   riak_core_ring,
   riak_core_ring_events,
+  riak_core_ring_events_sup,
   riak_core_ring_handler,
   riak_core_ring_manager,
   riak_core_ring_util,
62 src/riak_core_ring_events.erl
@@ -31,10 +31,12 @@
          add_callback/1,
          add_sup_callback/1,
          add_guarded_callback/1,
+         delete_handler/2,
          ring_update/1,
          force_update/0,
          ring_sync_update/1,
-         force_sync_update/0]).
+         force_sync_update/0,
+         get_pid/0]).
 
 %% gen_event callbacks
 -export([init/1, handle_event/2, handle_call/2,
@@ -42,6 +44,10 @@
 -record(state, { callback }).
 
+-ifdef(TEST).
+-include_lib("eunit/include/eunit.hrl").
+-endif.
+
 %% ===================================================================
 %% API functions
 %% ===================================================================
@@ -59,13 +65,22 @@ add_guarded_handler(Handler, Args) ->
     riak_core:add_guarded_event_handler(?MODULE, Handler, Args).
 
 add_callback(Fn) when is_function(Fn) ->
-    gen_event:add_handler(?MODULE, {?MODULE, make_ref()}, [Fn]).
+    HandlerName = {?MODULE, make_ref()},
+    gen_event:add_handler(?MODULE, HandlerName, [Fn]),
+    HandlerName.
 
 add_sup_callback(Fn) when is_function(Fn) ->
-    gen_event:add_sup_handler(?MODULE, {?MODULE, make_ref()}, [Fn]).
+    HandlerName = {?MODULE, make_ref()},
+    gen_event:add_sup_handler(?MODULE, HandlerName, [Fn]),
+    HandlerName.
 
 add_guarded_callback(Fn) when is_function(Fn) ->
-    riak_core:add_guarded_event_handler(?MODULE, {?MODULE, make_ref()}, [Fn]).
+    HandlerName = {?MODULE, make_ref()},
+    riak_core:add_guarded_event_handler(?MODULE, HandlerName, [Fn]),
+    HandlerName.
+
+delete_handler(HandlerName, ReasonArgs) ->
+    gen_event:delete_handler(?MODULE, HandlerName, ReasonArgs).
 
 force_update() ->
     {ok, Ring} = riak_core_ring_manager:get_my_ring(),
@@ -81,6 +96,9 @@
 ring_sync_update(Ring) ->
     gen_event:sync_notify(?MODULE, {ring_update, Ring}).
 
+get_pid() ->
+    whereis(?MODULE).
+
 %% ===================================================================
 %% gen_event callbacks
 %% ===================================================================
@@ -106,3 +124,39 @@ terminate(_Reason, _State) ->
 
 code_change(_OldVsn, State, _Extra) ->
     {ok, State}.
+-ifdef(TEST).
+
+delete_handler_test_() ->
+    RingMgr = riak_core_ring_manager,
+    {setup,
+     fun() ->
+             meck:new(RingMgr, [passthrough]),
+             meck:expect(RingMgr, get_my_ring,
+                         fun() -> {ok, fake_ring_here} end),
+             ?MODULE:start_link()
+     end,
+     fun(_) ->
+             meck:unload(RingMgr)
+     end,
+     [
+      fun () ->
+              Name = ?MODULE:add_sup_callback(
+                       fun(Arg) -> RingMgr:test_dummy_func(Arg) end),
+
+              BogusRing = bogus_ring_stand_in,
+              ?MODULE:ring_update(BogusRing),
+              ok = ?MODULE:delete_handler(Name, unused),
+              {error, _} = ?MODULE:delete_handler(Name, unused),
+
+              %% test_dummy_func is called twice: once by add_sup_callback()
+              %% and once by ring_update().
+              [
+               {_, {RingMgr, get_my_ring, _}, _},
+               {_, {RingMgr, test_dummy_func, _}, _},
+               {_, {RingMgr, test_dummy_func, [BogusRing]}, _}
+              ] = meck:history(RingMgr),
+              ok
+      end
+     ]}.
+
+-endif. % TEST
60 src/riak_core_ring_events_sup.erl
@@ -0,0 +1,60 @@
+%% -------------------------------------------------------------------
+%%
+%% riak_core: Core Riak Application
+%%
+%% Copyright (c) 2007-2013 Basho Technologies, Inc. All Rights Reserved.
+%%
+%% This file is provided to you under the Apache License,
+%% Version 2.0 (the "License"); you may not use this file
+%% except in compliance with the License. You may obtain
+%% a copy of the License at
+%%
+%%   http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+
+-module(riak_core_ring_events_sup).
+
+-behaviour(supervisor).
+
+%% API
+-export([start_link/0, start_riak_core_node_watcher/0]).
+
+%% Supervisor callbacks
+-export([init/1]).
+
+%% Helper macro for declaring children of supervisor
+-define(CHILD(I, Type, Timeout), {I, {I, start_link, []}, permanent, Timeout, Type, [I]}).
+-define(CHILD(I, Type), ?CHILD(I, Type, 5000)).
+
+%% ===================================================================
+%% API functions
+%% ===================================================================
+
+start_link() ->
+    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
+
+riak_core_node_watcher_childspec() ->
+    ?CHILD(riak_core_node_watcher, worker).
+
+start_riak_core_node_watcher() ->
+    supervisor:start_child(?MODULE, riak_core_node_watcher_childspec()).
+
+%% ===================================================================
+%% Supervisor callbacks
+%% ===================================================================
+
+init([]) ->
+    Children = lists:flatten(
+                 [
+                  ?CHILD(riak_core_ring_events, worker)
+                 ]),
+
+    {ok, {{one_for_all, 9999, 10}, Children}}.
4 src/riak_core_ring_manager.erl
@@ -97,6 +97,7 @@
         }).
 
 -export([setup_ets/1, cleanup_ets/1, set_ring_global/1]). %% For EUnit testing
+-export([test_dummy_func/1]).
 
 -ifdef(TEST).
 -include_lib("eunit/include/eunit.hrl").
 -endif.
@@ -622,6 +623,9 @@ prune_write_ring(Ring, State) ->
     State2 = set_ring(Ring, State),
     State2.
 
+test_dummy_func(_Arg) ->
+    ok.
+
 %% ===================================================================
 %% Unit tests
 %% ===================================================================
9 src/riak_core_sup.erl
@@ -50,14 +50,19 @@ init([]) ->
     [?CHILD(riak_core_sysmon_minder, worker),
      ?CHILD(riak_core_vnode_sup, supervisor, 305000),
      ?CHILD(riak_core_eventhandler_sup, supervisor),
-     ?CHILD(riak_core_ring_events, worker),
+     ?CHILD(riak_core_ring_events_sup, supervisor),
      ?CHILD(riak_core_ring_manager, worker),
      ?CHILD(riak_core_metadata_manager, worker),
      ?CHILD(riak_core_metadata_hashtree, worker),
      ?CHILD(riak_core_broadcast, worker),
      ?CHILD(riak_core_vnode_proxy_sup, supervisor),
      ?CHILD(riak_core_node_watcher_events, worker),
-     ?CHILD(riak_core_node_watcher, worker),
+     %% riak_core_node_watcher is no longer started here. It is
+     %% now a *dynamic* child of the riak_core_ring_events_sup
+     %% and started by riak_core_vnode_manager:init/1. If it is
+     %% started by riak_core_ring_events_sup during its startup,
+     %% it is too early, and Riak rings never settle after a ring
+     %% change.
      ?CHILD(riak_core_vnode_manager, worker),
      ?CHILD(riak_core_capability, worker),
      ?CHILD(riak_core_handoff_sup, supervisor),
3  src/riak_core_vnode_manager.erl
@@ -206,6 +206,9 @@ get_all_vnodes(Mod) ->
 
 %% @private
 init(_State) ->
+    %% See riak_core_sup:init/1 for why this call is here.
+    {ok, _Pid} = riak_core_ring_events_sup:start_riak_core_node_watcher(),
+
     {ok, Ring, CHBin} = riak_core_ring_manager:get_raw_ring_chashbin(),
     Mods = [Mod || {_, Mod} <- riak_core:vnode_modules()],
     State = #state{forwarding=dict:new(), handoff=dict:new(),